Convert a function, f, into an S3 checkFunction object. This adds f to the overview list returned by an allCheckFunctions() call.
checkFunction(f, description = NULL, classes = NULL)
f | A function. See details and examples below for the exact requirements of this function. |
---|---|
description | A character string describing the check performed by f. |
classes | The classes for which f is intended. |
A function of class checkFunction which has two attributes, namely classes and description.
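To make the returned object easier to picture, here is a minimal base R sketch of a function that carries a class and the two attributes named above. The helper asCheckLike() is hypothetical and written only for illustration; it is not dataMaid's implementation, and the real conversion should always be done with checkFunction().

```r
# Hypothetical stand-in for illustration only (NOT dataMaid's code):
# attach a class and the two attributes described above to a function.
asCheckLike <- function(f, description, classes) {
  structure(f,
            class = "checkFunction",
            description = description,
            classes = classes)
}

# A trivial check that never reports a problem
g <- asCheckLike(function(v, ...) list(problem = FALSE, message = ""),
                 description = "Always passes",
                 classes = c("numeric", "integer"))

attr(g, "description")        # the stored description string
attr(g, "classes")            # the stored vector of class names
inherits(g, "checkFunction")  # the class tag is attached to the function
g(1:3)$problem                # the object can still be called as a function
```

The point of the sketch is only that the result remains an ordinary callable function; the attributes are metadata that sit alongside it.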
checkFunction represents the functions used in check() and makeDataReport() for performing error checks and quality control on variables in a dataset.
An example of defining a new checkFunction is given below.
Note that the minimal requirement for such a function (in order for it to be compatible with check() and makeDataReport()) is the following input/output structure: it must accept at least two arguments, namely v (a vector variable) and ... . Additional implemented arguments from check() and makeDataReport() include nMax and maxDecimals; see e.g. the pre-defined checkFunction identifyMissing for more details about how these arguments should be used.
The output must be a list with at least the two entries $problem (a logical indicating whether a problem was found) and $message (a character string message describing the problem). However, if the result of a checkFunction is furthermore appended with a $problemValues entry (including the values from the variable that caused the problem, if relevant) and converted to a checkResult object, a print() method also becomes available for consistent formatting of checkFunction results.
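The input/output structure described above can be sketched in base R alone. The function identifyZeros() below is a hypothetical example written for this illustration and is not part of dataMaid; it accepts v and ..., and returns a plain list with the entries $problem, $message and $problemValues. It is not converted to a checkResult object here, so no special print() method applies.

```r
# Hypothetical example check (not part of dataMaid): flag variables that
# contain zeros. Accepts v and ..., as required; nMax is accepted but unused.
identifyZeros <- function(v, nMax = NULL, ...) {
  zeros <- unique(v[!is.na(v) & v == 0])
  problem <- length(zeros) > 0
  list(
    problem = problem,
    message = if (problem) "Note: The variable contains zeros." else "",
    problemValues = if (problem) zeros else NULL
  )
}

res <- identifyZeros(c(3, 0, NA, 7))
res$problem        # logical flag
res$message        # character string describing the problem
res$problemValues  # the values that caused the problem
```

Returning the same three entries whether or not a problem is found keeps the result easy to consume downstream; only their values change.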
Note that all available checkFunctions are listed by the call allCheckFunctions(), and we recommend looking into these functions if more knowledge about checkFunctions is required.
    # Define a minimal-requirement checkFunction that can be called
    # from check() and makeDataReport(). This function checks whether all
    # values in a variable are of equal length and that this
    # length is then also larger than 10:
    isID <- function(v, nMax = NULL, ...) {
      out <- list(problem = FALSE, message = "")
      # any() guards against objects that have more than one class
      if (any(class(v) %in% c("character", "factor", "labelled", "numeric", "integer"))) {
        v <- as.character(v)
        lengths <- nchar(v)
        if (all(lengths > 10) && length(unique(lengths)) == 1) {
          out$problem <- TRUE
          out$message <- "Warning: This variable seems to contain ID codes!"
        }
      }
      out
    }

    # Convert it into a checkFunction
    isID <- checkFunction(isID,
      description = "Identify ID variables (long, equal length values)",
      classes = allClasses())
    #> Error in get(fName): object 'isID' not found

    # Call isID
    isID(c("12345678901", "23456789012", "34567890123", "45678901234"))
    #> $problem
    #> [1] TRUE
    #>
    #> $message
    #> [1] "Warning: This variable seems to contain ID codes!"
    #>

    # isID now appears in an allCheckFunctions() call:
    allCheckFunctions()
    #>
    #> ----------------------------------------------------------
    #> name                      description
    #> ------------------------- --------------------------------
    #> identifyCaseIssues        Identify case issues
    #>
    #> identifyLoners            Identify levels with < 6 obs.
    #>
    #> identifyMissing           Identify miscoded missing
    #>                           values
    #>
    #> identifyNums              Identify misclassified numeric
    #>                           or integer variables
    #>
    #> identifyOutliers          Identify outliers
    #>
    #> identifyOutliersTBStyle   Identify outliers (Turkish
    #>                           Boxplot style)
    #>
    #> identifyWhitespace        Identify prefixed and suffixed
    #>                           whitespace
    #>
    #> isCPR                     Identify Danish CPR numbers
    #>
    #> isEmpty                   Check if the variable contains
    #>                           only a single value
    #>
    #> isKey                     Check if the variable is a key
    #>
    #> isSingular                Check if the variable contains
    #>                           only a single value
    #>
    #> isSupported               Check if the variable class is
    #>                           supported by dataMaid.
    #> ----------------------------------------------------------
    #>
    #> Table: Table continues below
    #>
    #>
    #> -----------------------------
    #> classes
    #> -----------------------------
    #> character, factor
    #>
    #> character, factor
    #>
    #> character, Date, factor,
    #> integer, labelled, logical,
    #> numeric
    #>
    #> character, factor, labelled
    #>
    #> Date, integer, numeric
    #>
    #> Date, integer, numeric
    #>
    #> character, factor, labelled
    #>
    #> character, Date, factor,
    #> integer, labelled, logical,
    #> numeric
    #>
    #> character, Date, factor,
    #> integer, labelled, logical,
    #> numeric
    #>
    #> character, Date, factor,
    #> integer, labelled, logical,
    #> numeric
    #>
    #> character, Date, factor,
    #> integer, labelled, logical,
    #> numeric
    #>
    #> character, Date, factor,
    #> integer, labelled, logical,
    #> numeric
    #> -----------------------------
    #>

    # Define a new checkFunction using messageGenerator() for generating
    # the message and checkResult() for getting a printing method
    # for its output. This function identifies values in a variable
    # that include a colon, surrounded by alphanumeric characters. If
    # at least one such value is found, the variable is flagged as
    # having a problem:
    identifyColons <- function(v, nMax = Inf, ...) {
      v <- unique(na.omit(v))
      problemMessage <- "Note: The following values include colons:"
      problem <- FALSE
      problemValues <- NULL

      # Keep the values where every match position is valid (no -1)
      problemValues <- v[sapply(gregexpr("[[:xdigit:]]:[[:xdigit:]]", v),
                                function(x) all(x != -1))]

      if (length(problemValues) > 0) {
        problem <- TRUE
      }

      problemStatus <- list(problem = problem, problemValues = problemValues)
      outMessage <- messageGenerator(problemStatus, problemMessage, nMax)

      checkResult(list(problem = problem,
                       message = outMessage,
                       problemValues = problemValues))
    }

    # Make it a checkFunction:
    identifyColons <- checkFunction(identifyColons,
      description = "Identify non-suffixed nor -prefixed colons",
      classes = c("character", "factor", "labelled"))
    #> Error in get(fName): object 'identifyColons' not found

    # Call it:
    identifyColons(1:100)
    #> No problems found.
    identifyColons(c("a:b", 1:10, ":b", "a:b:c:d"))
    #> Note: The following values include colons: a:b, a:b:c:d.

    # identifyColons now appears in an allCheckFunctions() call:
    allCheckFunctions()
    #> (same listing as shown for the first allCheckFunctions() call above)

    # Define a checkFunction that looks for negative values in numeric
    # or integer variables:
    identifyNeg <- function(v, nMax = Inf, maxDecimals = 2, ...) {
      problem <- FALSE
      problemValues <- printProblemValues <- NULL
      problemMessage <- "Note: The following negative values were found:"

      negOcc <- unique(v[v < 0])
      # Test the number of negative values found, not the length of a comparison
      if (length(negOcc) > 0) {
        problemValues <- negOcc
        printProblemValues <- round(negOcc, maxDecimals)
        problem <- TRUE
      }

      outMessage <- messageGenerator(list(problem = problem,
                                          problemValues = printProblemValues),
                                     problemMessage, nMax)

      checkResult(list(problem = problem,
                       message = outMessage,
                       problemValues = problemValues))
    }

    # Make it a checkFunction
    identifyNeg <- checkFunction(identifyNeg, "Identify negative values",
      classes = c("integer", "numeric"))
    #> Error in get(fName): object 'identifyNeg' not found

    # Call it:
    identifyNeg(c(0:100))
    #> No problems found.
    identifyNeg(c(-20.1232323:20), nMax = 3, maxDecimals = 4)
    #> Note: The following negative values were found: -20.1232, -19.1232, -18.1232 (18 additional values omitted).

    # identifyNeg now appears in an allCheckFunctions() call:
    allCheckFunctions()
    #> (same listing as shown for the first allCheckFunctions() call above)