Generic shell function that produces a summary of a variable (or of each variable in an entire dataset), given a number of summary functions and depending on the variable's data class.
summarize(v, reportstyleOutput = FALSE, summaries = setSummaries(), ...)
Argument | Description |
---|---|
v | The variable (vector) or dataset (data.frame) to be summarized. |
reportstyleOutput | Logical indicating whether the output should be formatted for inclusion in the report (escaped matrix) or not. Defaults to FALSE. |
summaries | A list of summaries to use on each supported variable type. We recommend using setSummaries() to construct this list. |
... | Additional arguments passed to data class specific methods. |
The return value depends on the value of reportstyleOutput.
If reportstyleOutput = FALSE (the default): If v is a variable, a list of summaryResult objects, one summaryResult for each summary function called on v. If v is a dataset, then summarize() returns a list of lists of summaryResult objects instead; one list for each variable in v.
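The two return shapes described above can be illustrated with a small base-R sketch. It does not use dataMaid: `sketchSummarize`, `nMissing`, and `nUnique` are hypothetical stand-ins, and plain lists take the place of summaryResult objects.

```r
# Hypothetical stand-ins for summary functions (not part of dataMaid).
nMissing <- function(v) {
  list(feature = "Number of missing obs.", result = sum(is.na(v)))
}
nUnique <- function(v) {
  list(feature = "Number of unique values", result = length(unique(v[!is.na(v)])))
}

# Sketch only: one result per summary function for a single variable,
# and one such list per column when v is a data.frame.
sketchSummarize <- function(v, funs = list(nMissing = nMissing, nUnique = nUnique)) {
  if (is.data.frame(v)) {
    return(lapply(v, sketchSummarize, funs = funs))
  }
  lapply(funs, function(f) f(v))
}

charV <- c("a", "b", "c", "a", "a", NA, "b", "0")
single <- sketchSummarize(charV)  # list with one element per summary function
nested <- sketchSummarize(cars)   # list with one element per variable (speed, dist)
```

Here `single$nUnique` is one result for one variable, whereas `nested$speed$nUnique` sits one level deeper because the outer list is indexed by variable name.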
If reportstyleOutput = TRUE:
If v is a single variable: A matrix with two columns, feature and result, and one row for each summary function that was called. Character strings in this matrix are escaped such that they are ready for Rmarkdown rendering.
If v is a full dataset: A list of matrices as described above, one for each variable in the dataset.
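To make the report-style shape concrete, the following base-R sketch stacks feature/result pairs into a two-column character matrix. It is an illustration under stated assumptions, not dataMaid's implementation: `toReportMatrix` is a hypothetical helper, plain lists stand in for summaryResult objects, and the escaping step for Rmarkdown is left out.

```r
# Plain lists standing in for summaryResult objects; the values are copied
# from the character-vector example on this page.
results <- list(
  list(feature = "Variable type", result = "character"),
  list(feature = "Number of missing obs.", result = "1 (12.5 %)"),
  list(feature = "Number of unique values", result = "4")
)

# Sketch only: one row per summary, one column each for feature and result.
# Column names follow the capitalisation of the printed example output.
toReportMatrix <- function(res) {
  m <- cbind(
    vapply(res, function(r) as.character(r$feature), character(1)),
    vapply(res, function(r) as.character(r$result), character(1))
  )
  colnames(m) <- c("Feature", "Result")
  m
}

reportMat <- toReportMatrix(results)
dim(reportMat)  # rows = number of summaries, columns = 2
```

For a dataset, the same helper could be applied to each variable's list of results with `lapply()`, giving a list of matrices.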
Summary functions are supplied by name (as character strings) in the class-specific argument of setSummaries(), e.g. setSummaries(character = c("countMissing", "uniqueValues")) for character variables, and similarly for the remaining 6 data classes (factor, Date, labelled, numeric, integer, logical). An overview of all available summaryFunctions can be obtained by calling allSummaryFunctions().
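The idea of naming summary functions per data class can be sketched in base R as follows. This is only an illustration of name-based lookup, not how dataMaid resolves functions internally; `applyByClass`, `sketchSummaries`, and the three `sketch*` functions are hypothetical names.

```r
# Hypothetical summary functions (stand-ins, not dataMaid's own).
sketchCountMissing <- function(v) sum(is.na(v))
sketchUniqueValues <- function(v) length(unique(v[!is.na(v)]))
sketchMinMax <- function(v) range(v, na.rm = TRUE)

# One character vector of function names per data class.
sketchSummaries <- list(
  character = c("sketchCountMissing", "sketchUniqueValues"),
  integer   = c("sketchCountMissing", "sketchMinMax")
)

# Pick the names registered for the class of v and call each function by name.
applyByClass <- function(v, summaries) {
  fnNames <- summaries[[class(v)[1]]]
  stats::setNames(
    lapply(fnNames, function(nm) get(nm, mode = "function")(v)),
    fnNames
  )
}

resChar <- applyByClass(c("a", NA, "a"), sketchSummaries)
resInt  <- applyByClass(0:10, sketchSummaries)
```

Changing the `character` entry has no effect on `resInt`, because an integer vector only ever looks up the `integer` entry.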
The default choices of summaryFunctions are available in data class specific functions, e.g. defaultCharacterSummaries() and defaultNumericSummaries(). A complete overview of all default options can be obtained by calling setSummaries().
A user-defined summary function can be supplied using its function name. Note, however, that it should take a vector as argument and return a list of the form list(feature="Feature name", result="The result"). More details on how to construct valid summary functions are found in summaryFunction.
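A minimal sketch of a function meeting this contract is shown below. `countEmpty` is a hypothetical example; it returns a plain list so that it runs without dataMaid, whereas the example on this page additionally wraps the list in summaryResult() and supplies a value entry.

```r
# Hypothetical user-defined summary function: counts empty strings.
# Takes a vector and returns a list with entries `feature` and `result`.
countEmpty <- function(v, ...) {
  n <- sum(!is.na(v) & v == "")  # missing values are not counted as empty
  list(feature = "No. empty strings", result = n)
}

out <- countEmpty(c("a", "", NA, ""))
out$feature
out$result
```

Following the pattern used in the examples on this page, such a function would then be referred to by its name as a character string, for instance via defaultCharacterSummaries(add = "countEmpty") inside setSummaries().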
setSummaries, summaryFunction, allSummaryFunctions, summaryResult, defaultCharacterSummaries, defaultFactorSummaries, defaultLabelledSummaries, defaultNumericSummaries, defaultIntegerSummaries, defaultLogicalSummaries
#Default summary for a character vector:
charV <- c("a", "b", "c", "a", "a", NA, "b", "0")
summarize(charV)
#> $variableType
#> Variable type: character
#> $countMissing
#> Number of missing obs.: 1 (12.5 %)
#> $uniqueValues
#> Number of unique values: 4
#> $centralValue
#> Mode: "a"

#Inspect default character summary functions:
defaultCharacterSummaries()
#> [1] "variableType" "countMissing" "uniqueValues" "centralValue"

#Define a new summary function and add it to the summary for character vectors:
countZeros <- function(v, ...) {
  res <- length(which(v == 0))
  summaryResult(list(feature = "No. zeros", result = res, value = res))
}
summarize(charV, summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))
#> Error in countZeros(v = c("a", "b", "c", "a", "a", NA, "b", "0")): could not find function "countZeros"

#Does nothing, as intV is not affected by characterSummaries
intV <- c(0:10)
summarize(intV, summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))
#> $variableType
#> Variable type: integer
#> $countMissing
#> Number of missing obs.: 0 (0 %)
#> $uniqueValues
#> Number of unique values: 11
#> $centralValue
#> Median: 5
#> $quartiles
#> 1st and 3rd quartiles: 2.5; 7.5
#> $minMax
#> Min. and max.: 0; 10

#But supplying the argument for integer variables changes the summary:
summarize(intV, summaries = setSummaries(integer = "countZeros"))
#> Error in countZeros(v = 0:10): could not find function "countZeros"

#Summarize a full dataset:
data(cars)
summarize(cars)
#> $speed
#> $speed$variableType
#> Variable type: numeric
#> $speed$countMissing
#> Number of missing obs.: 0 (0 %)
#> $speed$uniqueValues
#> Number of unique values: 19
#> $speed$centralValue
#> Median: 15
#> $speed$quartiles
#> 1st and 3rd quartiles: 12; 19
#> $speed$minMax
#> Min. and max.: 4; 25
#>
#> $dist
#> $dist$variableType
#> Variable type: numeric
#> $dist$countMissing
#> Number of missing obs.: 0 (0 %)
#> $dist$uniqueValues
#> Number of unique values: 35
#> $dist$centralValue
#> Median: 36
#> $dist$quartiles
#> 1st and 3rd quartiles: 26; 56
#> $dist$minMax
#> Min. and max.: 2; 120
#>

#Summarize a variable and obtain report-style output (formatted for markdown)
summarize(charV, reportstyleOutput = TRUE)
#>      Feature                   Result
#> [1,] "Variable type"           "character"
#> [2,] "Number of missing obs."  "1 (12.5 %)"
#> [3,] "Number of unique values" "4"
#> [4,] "Mode"                    "\"a\""