Run a set of validation checks on a variable (vector) or a full dataset to detect potential errors. Which checks are performed depends on the class of each variable and on the user's inputs.
```r
check(v, nMax = 10, checks = setChecks(), ...)
```
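To make the class-dependent behaviour concrete, the sketch below runs check() on a factor and on a character vector with identical content and compares the names of the results with the class defaults. It assumes, as the Examples output suggests, that each result entry is named after the checkFunction that produced it; the vectors f and s are purely illustrative.

```r
library(dataMaid)

# The same values stored as a factor and as a character vector
f <- factor(c("a", "b", "b", "c"))
s <- c("a", "b", "b", "c")

# Assumption: result entries are named after the checks that were run,
# so comparing them with the class defaults reveals which checks were used
resFactor <- check(f)
resCharacter <- check(s)

names(resFactor)         # checks run on the factor
defaultFactorChecks()    # default checks for factors

names(resCharacter)      # checks run on the character vector
defaultCharacterChecks() # default checks for character variables
```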
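Argument | Description
---|---
v | The vector or the dataset (a data.frame) to be checked.
nMax | If a check identifies problematic values, this argument controls whether all of them are pasted into the output message or only the first nMax; when values are left out, the message reports how many were omitted.
checks | A list of checks to use on each supported variable type. We recommend creating this list with setChecks(), which is also the default.
... | Other arguments that are passed on to the checking functions. These include general parameters controlling how the check results are formatted (e.g. maxDecimals, the number of decimals printed for problematic numeric values).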
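A short sketch of how nMax and an argument passed through ... interact: only identifyOutliers is run, at most two problematic values are pasted into the message, and maxDecimals is forwarded to the checking function as in the Examples. The vector z is made up for illustration, and which of its values end up flagged depends on the outlier rule implemented by identifyOutliers, so no particular output is claimed here.

```r
library(dataMaid)

# Illustrative numeric vector: a regular sequence plus a few large values
z <- c(seq(1, 10, by = 0.5), 100.12345, 150.6789, 200.98765)

# Default: up to nMax = 10 problematic values are pasted into the message
check(z, checks = setChecks(numeric = "identifyOutliers"))

# Paste at most 2 values and print them with 2 decimals;
# maxDecimals is not an argument of check() itself but is passed on via ...
res <- check(z, nMax = 2,
             checks = setChecks(numeric = "identifyOutliers"),
             maxDecimals = 2)
res$identifyOutliers
```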
If v is a variable, check returns a list of objects of class checkResult, each of which summarizes the result of one checkFunction call performed on v. See checkResult for more details. If v is a data.frame, a list of such lists is returned instead, with one entry for each variable in v.
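The nested return value for a data.frame can be navigated as an ordinary list: first by variable name, then by check name. The sketch below also collects, for each variable, only the checks that flagged something. It assumes that a checkResult stores a logical $problem entry (see checkResult to confirm); flagged is a hypothetical, illustrative object, not part of dataMaid.

```r
library(dataMaid)
data(cars)

res <- check(cars)

# Outer level: one entry per variable in the data.frame
names(res)

# Inner level: one checkResult per check performed on that variable
class(res$dist$identifyOutliers)

# Hypothetical post-processing (not a dataMaid function): keep only the
# checks whose result reports a problem, assuming a logical $problem field
flagged <- lapply(res, function(varRes) {
  Filter(function(r) isTRUE(r$problem), varRes)
})

# Variables with at least one flagged check
names(Filter(length, flagged))
```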
The default checks for each variable type are returned by calling, e.g., defaultCharacterChecks(), defaultFactorChecks(), defaultNumericChecks(), etc. A complete overview of all default options can be obtained by calling setChecks(). Moreover, all available checkFunctions (including both locally defined functions and functions imported from dataMaid or other packages) can be viewed by calling allCheckFunctions().
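The helper functions mentioned above can be combined to inspect and adjust the defaults before calling check(). The sketch assumes that defaultNumericChecks() returns a character vector of check names and accepts a remove argument in the same way that defaultFactorChecks() does in the Examples; noOutliers is an illustrative name.

```r
library(dataMaid)

# Default checks for numeric variables (names of checkFunctions)
defaultNumericChecks()

# Same defaults without the outlier check; the remove argument is assumed
# to behave as it does for defaultFactorChecks() in the Examples
noOutliers <- defaultNumericChecks(remove = "identifyOutliers")
noOutliers

# Overview of the default checks for every supported variable class
setChecks()

# Overview of every checkFunction currently available
allCheckFunctions()

# Use the reduced set of numeric checks on a data.frame
data(cars)
check(cars, checks = setChecks(numeric = noOutliers))
```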
See also: setChecks, allCheckFunctions, checkResult, checkFunction, defaultCharacterChecks, defaultFactorChecks, defaultLabelledChecks, defaultNumericChecks, defaultIntegerChecks, defaultLogicalChecks, defaultDateChecks
```r
x <- 1:5
check(x)
#> $identifyMissing
#> No problems found.
#> $identifyOutliers
#> No problems found.

# Missing values annoyingly coded as 99
y <- c(rnorm(100), rep(99, 10))
check(y)
#> $identifyMissing
#> The following suspected missing value codes enter as regular values: 99.
#> $identifyOutliers
#> Note that the following possible outlier values were detected: -2.61, -2.44, -2.27, -1.91, -1.91, -1.86, -1.82, -1.7, -1.63, -1.51 (4 additional values omitted).

# Check y for outliers and print 4 decimals for problematic values
check(y, checks = setChecks(numeric = "identifyOutliers"), maxDecimals = 4)
#> $identifyOutliers
#> Note that the following possible outlier values were detected: -2.6123, -2.4373, -2.2741, -1.9117, -1.9101, -1.863, -1.8218, -1.6995, -1.631, -1.5124 (4 additional values omitted).

# Change which checks are performed: now only identifyMissing is called
# for numeric variables
check(y, checks = setChecks(numeric = "identifyMissing"))
#> $identifyMissing
#> The following suspected missing value codes enter as regular values: 99.

# Check a full data.frame at once
data(cars)
check(cars)
#> $speed
#> $speed$identifyMissing
#> No problems found.
#> $speed$identifyOutliers
#> Note that the following possible outlier values were detected: 4.
#>
#> $dist
#> $dist$identifyMissing
#> No problems found.
#> $dist$identifyOutliers
#> Note that the following possible outlier values were detected: 2, 4, 10, 14.
#>

# Check a full data.frame at once while changing the standard settings for
# several data classes. Here, we omit the check for miscoded missing values
# for factors and perform only this check for numeric variables:
check(cars, checks = setChecks(factor = defaultFactorChecks(remove = "identifyMissing"),
                               numeric = "identifyMissing"))
#> $speed
#> $speed$identifyMissing
#> No problems found.
#>
#> $dist
#> $dist$identifyMissing
#> No problems found.
#>
```