Which of the following statements about working with data tables is not true?

data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.

Why data.table?

  • concise syntax: fast to type, fast to read
  • fast speed
  • memory efficient
  • careful API lifecycle management
  • community
  • feature rich

Features

  • fast and friendly delimited file reader: ?fread, see also convenience features for small data
  • fast and feature rich delimited file writer: ?fwrite
  • low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
  • fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
  • fast add/update/delete columns by reference by group using no copies at all
  • fast and feature rich reshaping data: ?dcast (pivot/wider/spread) and ?melt (unpivot/longer/gather)
  • any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is as old as possible for as long as possible, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
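
Several of the features listed above can be sketched on a few toy tables. This is only an illustration under stated assumptions: the tables `sales`, `regions`, `prices` and `trades` and all of their columns are invented for the example, while `fwrite`, `fread`, `dcast`, `melt`, `on=`, `roll=` and `:=` are data.table's own.

```r
library(data.table)

# Hypothetical toy data, invented for this sketch
sales = data.table(
  id    = c(1L, 1L, 2L, 2L),
  year  = c(2020L, 2021L, 2020L, 2021L),
  units = c(10.5, 12, 7.25, 9)
)

# Delimited file writer and reader: round trip through a temporary CSV
tmp = tempfile(fileext = ".csv")
fwrite(sales, tmp)
sales2 = fread(tmp)

# Reshape long -> wide (one column per year), then wide -> long again
wide = dcast(sales, id ~ year, value.var = "units")
long = melt(wide, id.vars = "id", variable.name = "year", value.name = "units")

# Update on join: copy region from the lookup table into sales by reference
regions = data.table(id = c(1L, 2L), region = c("north", "south"))
sales[regions, on = "id", region := i.region]

# Add a column by reference, computed within each group
sales[, share := units / sum(units), by = year]

# Rolling join: for each trade time, take the most recent earlier price
prices = data.table(time = c(1, 5, 10), price = c(100, 105, 103))
trades = data.table(time = c(2, 7, 12))
rolled = prices[trades, on = "time", roll = TRUE]
```

Note that `:=` modifies `sales` in place, so no assignment with `=` or `<-` is needed for the two update steps.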

Installation

install.packages("data.table")

# latest development version that has passed all tests:
data.table::update_dev_pkg()

See the Installation wiki for more details.

Usage

Use data.table subset [ operator the same way you would use data.frame one, but...

  • no need to prefix each column with DT$ (like subset() and with() but built-in)
  • any R expression using any package is allowed in j argument, not just list of columns
  • extra argument by to compute j expression by group

library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i,     j,      by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
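
The same pattern extends to richer j and by expressions. The following is a minimal sketch on the built-in iris data; the result names (`res`, `q`, `counts`) and the new column `ratio` are chosen here purely for illustration.

```r
library(data.table)
DT = as.data.table(iris)

# j can return several named summaries; .N is the number of rows per group
res = DT[, .(n = .N, mean_len = mean(Petal.Length)), by = Species]

# any R expression is allowed in j, here a function from the stats package
q = DT[, .(q90 = stats::quantile(Sepal.Length, 0.9)), by = Species]

# by can itself be an expression rather than a plain column
counts = DT[, .N, by = .(wide = Petal.Width > 1.0)]

# := adds a column by reference, with no copy of DT
DT[, ratio := Petal.Length / Petal.Width]
```

Columns are referred to by bare name throughout, which is what the first bullet above means by not having to prefix each column.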

Getting started

  • Introduction to data.table vignette
  • Getting started wiki page
  • Examples produced by example(data.table)

Community

data.table is widely used by the R community. It is used directly by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the most starred R packages on GitHub, and was highly rated by the Depsy project. If you need help, the data.table community is active on StackOverflow.

Which of the following is not true when adding criteria to a data input form?

The last matching record displays in the form.

When creating a one-variable data table with input values in a column, where must data table formulas start?

Add a formula to a one-variable data table. Formulas that are used in a one-variable data table must refer to the same input cell. Do either of these: If the data table is column-oriented, enter the new formula in a blank cell to the right of an existing formula in the top row of the data table.

Which of the following statements is not true for assigning a macro to a button?

You can resize a button, but you cannot move the button.

Which is not true of filters in a PivotTable?

Summary statistics do not change to reflect the values selected in the slicer.