Generic shell function that produces a summary of a variable (or of each variable in an entire dataset), using a set of summary functions chosen according to the variable's data class.

summarize(v, reportstyleOutput = FALSE, summaries = setSummaries(), ...)

Arguments

v

The variable (vector) or dataset (data.frame) to be summarized.

reportstyleOutput

Logical indicating whether the output should be formatted for inclusion in the report (escaped matrix) or not. Defaults to FALSE.

summaries

A list of summaries to use on each supported variable type. We recommend creating this list with setSummaries; see the documentation of that function for more details.

...

Additional arguments passed to data class specific methods.

Value

The return value depends on the value of reportstyleOutput.

If reportstyleOutput = FALSE (the default): If v is a variable, a list of summaryResult objects, one summaryResult for each summary function called on v. If v is a dataset, summarize() instead returns a list of lists of summaryResult objects, one list for each variable in v.

If reportstyleOutput = TRUE: If v is a single variable: A matrix with two columns, feature and result, and one row for each summary function that was called. Character strings in this matrix are escaped such that they are ready for Rmarkdown rendering.

If v is a full dataset: A list of matrices as described above, one for each variable in the dataset.
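
The return shapes described above can be inspected along the following lines. This is a minimal sketch, assuming that summarize() is the function provided by the dataMaid package and that the package is installed; the names res, mat and resData are illustrative only.

```r
library(dataMaid)  # assumption: the package that provides summarize()

charV <- c("a", "b", "c", "a", "a", NA, "b", "0")

# Single variable, default output: a list with one entry per summary function
res <- summarize(charV)
names(res)

# Report-style output: a two-column matrix with one row per summary function
mat <- summarize(charV, reportstyleOutput = TRUE)
dim(mat)

# Full dataset: one list element per variable, each holding that variable's summaries
data(cars)
resData <- summarize(cars)
names(resData)
```

Checking the shape first (list, matrix, or list of lists) can make it easier to decide how to post-process the result.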

Details

Summary functions are supplied using their names (as character strings) in the class-specific argument, e.g. characterSummaries = c("countMissing", "uniqueValues") for character variables and similarly for the remaining six data classes (factor, Date, labelled, numeric, integer, logical). Note that an overview of all available summaryFunctions can be obtained by calling allSummaryFunctions.

The default choices of summaryFunctions are available in data class specific functions, e.g. defaultCharacterSummaries() and defaultNumericSummaries(). A complete overview of all default options can be obtained by calling setSummaries().
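
As a rough illustration of the two paragraphs above, the following sketch lists the available summary functions, inspects the defaults for numeric variables, and restricts the summaries used for integer variables. It assumes that the functions come from the dataMaid package; intV and res are illustrative names.

```r
library(dataMaid)  # assumption: the package that provides summarize()

# Overview of all available summary functions
allSummaryFunctions()

# Default summary functions for numeric variables
defaultNumericSummaries()

# Use only two summaries for integer variables; other classes keep their defaults
intV <- c(0:10)
res <- summarize(intV, summaries = setSummaries(integer = c("countMissing", "uniqueValues")))
names(res)
```

Only the integer entry is overridden here, so a character or numeric variable summarized with the same summaries argument would still get its default set.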

A user-defined summary function can be supplied using its function name. Note, however, that it should take a vector as argument and return a list of the form list(feature="Feature name", result="The result"). More details on how to construct valid summary functions are found in summaryFunction.
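
The required input and output of a user-defined summary function can be sketched in base R as follows. The function countBlank is hypothetical and exists only for illustration; it returns a plain list of the form described above and is called directly rather than through summarize().

```r
# Hypothetical summary function: counts entries that are empty or whitespace-only.
# It takes a vector (plus ...) and returns a list with elements feature and result.
countBlank <- function(v, ...) {
  n <- sum(!is.na(v) & trimws(as.character(v)) == "")
  list(feature = "No. blank values", result = n)
}

# Call the function on its own to check the structure of what it returns
out <- countBlank(c("a", "", "  ", NA, "b"))
out$feature
out$result
```

Testing a new summary function on a small vector like this, before passing its name to setSummaries(), makes it easier to confirm that it returns the expected list structure.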

Examples

#Default summary for a character vector:
charV <- c("a", "b", "c", "a", "a", NA, "b", "0")
summarize(charV)
#> $variableType
#> Variable type: character
#> $countMissing
#> Number of missing obs.: 1 (12.5 %)
#> $uniqueValues
#> Number of unique values: 4
#> $centralValue
#> Mode: "a"

#Inspect default character summary functions:
defaultCharacterSummaries()
#> [1] "variableType" "countMissing" "uniqueValues" "centralValue"

#Define a new summary function and add it to the summary for character vectors:
countZeros <- function(v, ...) {
  res <- length(which(v == 0))
  summaryResult(list(feature="No. zeros", result = res, value = res))
}
summarize(charV, summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))
#> Error in countZeros(v = c("a", "b", "c", "a", "a", NA, "b", "0")): could not find function "countZeros"

#Does nothing, as intV is not affected by characterSummaries
intV <- c(0:10)
summarize(intV, summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))
#> $variableType
#> Variable type: integer
#> $countMissing
#> Number of missing obs.: 0 (0 %)
#> $uniqueValues
#> Number of unique values: 11
#> $centralValue
#> Median: 5
#> $quartiles
#> 1st and 3rd quartiles: 2.5; 7.5
#> $minMax
#> Min. and max.: 0; 10

#But supplying the argument for integer variables changes the summary:
summarize(intV, summaries = setSummaries(integer = "countZeros"))
#> Error in countZeros(v = 0:10): could not find function "countZeros"

#Summarize a full dataset:
data(cars)
summarize(cars)
#> $speed
#> $speed$variableType
#> Variable type: numeric
#> $speed$countMissing
#> Number of missing obs.: 0 (0 %)
#> $speed$uniqueValues
#> Number of unique values: 19
#> $speed$centralValue
#> Median: 15
#> $speed$quartiles
#> 1st and 3rd quartiles: 12; 19
#> $speed$minMax
#> Min. and max.: 4; 25
#>
#> $dist
#> $dist$variableType
#> Variable type: numeric
#> $dist$countMissing
#> Number of missing obs.: 0 (0 %)
#> $dist$uniqueValues
#> Number of unique values: 35
#> $dist$centralValue
#> Median: 36
#> $dist$quartiles
#> 1st and 3rd quartiles: 26; 56
#> $dist$minMax
#> Min. and max.: 2; 120
#>

#Summarize a variable and obtain report-style output (formatted for markdown)
summarize(charV, reportstyleOutput = TRUE)
#>      Feature                   Result
#> [1,] "Variable type"           "character"
#> [2,] "Number of missing obs."  "1 (12.5 %)"
#> [3,] "Number of unique values" "4"
#> [4,] "Mode"                    "\"a\""