Run a set of validation checks on a variable vector or a full dataset to detect potential errors. Which checks are performed depends on the class of the variable and on user input.

check(v, nMax = 10, checks = setChecks(), ...)

Arguments

v

The vector or dataset (data.frame) to be checked.

nMax

If a check is supposed to identify problematic values, this argument controls whether all of them are pasted into the output message or only the first nMax. If set to Inf, all problematic values are printed.

checks

A list of checks to use on each supported variable type. We recommend creating this list with setChecks; see the documentation of that function for more details.

...

Other arguments that are passed on to the checking functions. These include general parameters controlling how the check results are formatted (e.g. maxDecimals, which controls the number of decimals printed for problematic numeric values).
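The snippet below is a minimal sketch of the nMax argument, assuming dataMaid is installed. It rebuilds the y vector from the examples with an arbitrary seed (the seed is not part of the original examples) and compares a truncated and a complete listing of the flagged values; the value 3 is an illustrative choice.

```r
library(dataMaid)

set.seed(1)  # arbitrary seed so the example is reproducible
y <- c(rnorm(100), rep(99, 10))

# Only the first 3 flagged values are pasted into each message
short <- check(y, nMax = 3)

# nMax = Inf pastes every flagged value into each message
full <- check(y, nMax = Inf)

short$identifyOutliers
full$identifyOutliers
```

When a check flags more values than nMax allows, the message should end with a note on how many values were omitted, as in the examples below; with nMax = Inf no values are left out.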

Value

If v is a variable, a list of objects of class checkResult, each of which summarizes the result of one checkFunction call performed on v. See checkResult for more details. If v is a data.frame, a list of such lists is returned instead, with one entry for each variable in v.
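As a rough sketch of the two return shapes, assuming dataMaid is installed, the code below checks a single column of the built-in cars data and then the whole data.frame, and inspects the names and classes of the results. The object names resVec and resDf are illustrative only.

```r
library(dataMaid)

data(cars)

# Single vector: a named list with one checkResult per check function
resVec <- check(cars$speed)
names(resVec)

# data.frame: one such list per variable, named after the columns
resDf <- check(cars)
names(resDf)
class(resDf$dist$identifyOutliers)
```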

Details

The default checks for each variable type are returned by calling e.g. defaultCharacterChecks(), defaultFactorChecks(), defaultNumericChecks(), etc. A complete overview of all default options can be obtained by calling setChecks(). Moreover, all available checkFunctions (including both locally defined functions and functions imported from dataMaid or other packages) can be viewed by calling allCheckFunctions().
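A short sketch of how these defaults can be inspected interactively, assuming dataMaid is installed and attached; the exact printed output depends on the installed version and on any check functions defined in the current session.

```r
library(dataMaid)

# Default checks applied to numeric variables
defaultNumericChecks()

# Overview of the default checks for every supported variable class
setChecks()

# All check functions currently available, including locally defined ones
allCheckFunctions()
```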

Examples

x <- 1:5
check(x)
#> $identifyMissing
#> No problems found.
#> $identifyOutliers
#> No problems found.

# Missing values annoyingly coded as 99
y <- c(rnorm(100), rep(99, 10))
check(y)
#> $identifyMissing
#> The following suspected missing value codes enter as regular values: 99.
#> $identifyOutliers
#> Note that the following possible outlier values were detected: -2.61, -2.44, -2.27, -1.91, -1.91, -1.86, -1.82, -1.7, -1.63, -1.51 (4 additional values omitted).

# Check y for outliers and print 4 decimals for problematic values
check(y, checks = setChecks(numeric = "identifyOutliers"), maxDecimals = 4)
#> $identifyOutliers
#> Note that the following possible outlier values were detected: -2.6123, -2.4373, -2.2741, -1.9117, -1.9101, -1.863, -1.8218, -1.6995, -1.631, -1.5124 (4 additional values omitted).

# Change which checks are performed: now only identifyMissing is called
# for numeric variables
check(y, checks = setChecks(numeric = "identifyMissing"))
#> $identifyMissing
#> The following suspected missing value codes enter as regular values: 99.

# Check a full data.frame at once
data(cars)
check(cars)
#> $speed
#> $speed$identifyMissing
#> No problems found.
#> $speed$identifyOutliers
#> Note that the following possible outlier values were detected: 4.
#>
#> $dist
#> $dist$identifyMissing
#> No problems found.
#> $dist$identifyOutliers
#> Note that the following possible outlier values were detected: 2, 4, 10, 14.
#>

# Check a full data.frame at once, while changing the standard settings for
# several data classes at once. Here, we omit the check for miscoded missing
# values for factors, and for numeric variables we run only that check:
check(cars, checks = setChecks(factor = defaultFactorChecks(remove = "identifyMissing"),
                               numeric = "identifyMissing"))
#> $speed
#> $speed$identifyMissing
#> No problems found.
#>
#> $dist
#> $dist$identifyMissing
#> No problems found.
#>