Convert a function, f, into an S3 checkFunction object. This adds f to the overview list returned by an allCheckFunctions() call.

checkFunction(f, description = NULL, classes = NULL)

Arguments

f

A function. See details and examples below for the exact requirements of this function.

description

A character string describing the check performed by f. If NULL (the default), the name of f will be used instead.

classes

The classes for which f is intended to be called. If NULL (the default), one of two things happens. If f is not an S3 generic function, the classes attribute of f will be an empty character string. If f is an S3 generic function, its methods are looked up automatically and the classes attribute is filled in accordingly. Note that the function allClasses (listing all classes used in dataMaid) might be useful.

Value

A function of class checkFunction which has two attributes, namely classes and description.
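
What such an object looks like can be sketched in base R. The constructor asCheckSketch below is a hypothetical stand-in, not dataMaid's implementation, and hasNegatives is likewise an invented name; the sketch only illustrates a function that carries an S3 class and the two attributes.

```r
# Hypothetical stand-in for checkFunction(); for illustration only.
asCheckSketch <- function(f, description, classes = character(0)) {
  stopifnot(is.function(f))
  attr(f, "description") <- description
  attr(f, "classes") <- classes
  # Add the S3 class while keeping "function", so f stays callable
  class(f) <- c("checkFunction", "function")
  f
}

# Hypothetical check with the v and ... arguments
hasNegatives <- asCheckSketch(
  function(v, ...) list(problem = any(v < 0, na.rm = TRUE), message = ""),
  description = "Identify negative values",
  classes = c("integer", "numeric")
)

attr(hasNegatives, "description")
attr(hasNegatives, "classes")
inherits(hasNegatives, "checkFunction")
hasNegatives(c(1, -2))$problem
```

The attributes travel with the function, so a listing function can read the description and classes from each object without calling it.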

Details

checkFunction represents the functions used in check and makeDataReport for performing error checks and quality control on variables in a dataset.

An example of defining a new checkFunction is given below. To be compatible with check() and makeDataReport(), such a function must at least have the following input/output structure: It must accept at least two arguments, namely v (a vector variable) and .... Additional arguments passed on from check() and makeDataReport() include nMax and maxDecimals; see e.g. the pre-defined checkFunction identifyMissing for more details about how these arguments should be used. The output must be a list with at least the two entries $problem (a logical indicating whether a problem was found) and $message (a character string describing the problem). If the result of a checkFunction is furthermore given a $problemValues entry (containing the values from the variable that caused the problem, if relevant) and converted to a checkResult object, a print() method also becomes available for consistent formatting of checkFunction results.
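
The input/output structure described above can be sketched in base R without loading dataMaid. Both identifyAllMissing and hasValidCheckOutput are hypothetical names introduced here for illustration; neither is part of the package.

```r
# Hypothetical check: flags a variable in which every value is NA.
# It accepts v and ..., and returns the entries problem and message.
identifyAllMissing <- function(v, ...) {
  problem <- length(v) > 0 && all(is.na(v))
  message <- if (problem) "Note: All values are missing." else ""
  list(problem = problem, message = message)
}

# Hypothetical helper (not a dataMaid function): tests whether a result
# has a single logical problem entry and a single character message entry.
hasValidCheckOutput <- function(res) {
  is.list(res) &&
    is.logical(res$problem) && length(res$problem) == 1 &&
    is.character(res$message) && length(res$message) == 1
}

res <- identifyAllMissing(c(NA, NA, NA))
res$problem
hasValidCheckOutput(res)
```

A helper of this kind can be handy while developing a new check, since it can be run on a few test vectors before the function is passed to checkFunction().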

Note that all available checkFunctions are listed by the call allCheckFunctions(), and we recommend looking into these functions if more knowledge about checkFunctions is required.

Examples

#Define a minimal requirement checkFunction that can be called
#from check() and makeDataReport(). This function checks whether all
#values in a variable are of equal length and that this
#length is then also larger than 10:
isID <- function(v, nMax = NULL, ...) {
  out <- list(problem = FALSE, message = "")
  #any() guards against objects that have more than one class
  if (any(class(v) %in% c("character", "factor", "labelled", "numeric", "integer"))) {
    v <- as.character(v)
    lengths <- nchar(v)
    if (all(lengths > 10) && length(unique(lengths)) == 1) {
      out$problem <- TRUE
      out$message <- "Warning: This variable seems to contain ID codes!"
    }
  }
  out
}

#Convert it into a checkFunction
isID <- checkFunction(isID,
                      description = "Identify ID variables (long, equal length values)",
                      classes = allClasses())
#> Error in get(fName): object 'isID' not found
#Call isID
isID(c("12345678901", "23456789012", "34567890123", "45678901234"))
#> $problem
#> [1] TRUE
#>
#> $message
#> [1] "Warning: This variable seems to contain ID codes!"
#>
#isID now appears in an allCheckFunctions() call:
allCheckFunctions()
#>
#> ----------------------------------------------------------
#> name                      description
#> ------------------------- --------------------------------
#> identifyCaseIssues        Identify case issues
#>
#> identifyLoners            Identify levels with < 6 obs.
#>
#> identifyMissing           Identify miscoded missing
#>                           values
#>
#> identifyNums              Identify misclassified numeric
#>                           or integer variables
#>
#> identifyOutliers          Identify outliers
#>
#> identifyOutliersTBStyle   Identify outliers (Turkish
#>                           Boxplot style)
#>
#> identifyWhitespace        Identify prefixed and suffixed
#>                           whitespace
#>
#> isCPR                     Identify Danish CPR numbers
#>
#> isEmpty                   Check if the variable contains
#>                           only a single value
#>
#> isKey                     Check if the variable is a key
#>
#> isSingular                Check if the variable contains
#>                           only a single value
#>
#> isSupported               Check if the variable class is
#>                           supported by dataMaid.
#> ----------------------------------------------------------
#>
#> Table: Table continues below
#>
#>
#> -----------------------------
#> classes
#> -----------------------------
#> character, factor
#>
#> character, factor
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, factor, labelled
#>
#> Date, integer, numeric
#>
#> Date, integer, numeric
#>
#> character, factor, labelled
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#> -----------------------------
#>
#Define a new checkFunction using messageGenerator() for generating
#the message and checkResult() for getting a printing method
#for its output. This function identifies values in a variable
#that include a colon, surrounded by alphanumeric characters. If
#at least one such value is found, the variable is flagged as
#having a problem:
identifyColons <- function(v, nMax = Inf, ...) {
  v <- unique(na.omit(v))
  problemMessage <- "Note: The following values include colons:"
  problem <- FALSE
  #[[:alnum:]] matches the alphanumeric characters described above;
  #vapply() returns a logical vector even when v is empty
  problemValues <- v[vapply(gregexpr("[[:alnum:]]:[[:alnum:]]", v),
                            function(x) all(x != -1), logical(1))]
  if (length(problemValues) > 0) {
    problem <- TRUE
  }
  problemStatus <- list(problem = problem,
                        problemValues = problemValues)
  outMessage <- messageGenerator(problemStatus, problemMessage, nMax)
  checkResult(list(problem = problem,
                   message = outMessage,
                   problemValues = problemValues))
}

#Make it a checkFunction:
identifyColons <- checkFunction(identifyColons,
                                description = "Identify non-suffixed nor -prefixed colons",
                                classes = c("character", "factor", "labelled"))
#> Error in get(fName): object 'identifyColons' not found
#Call it:
identifyColons(1:100)
#> No problems found.
identifyColons(c("a:b", 1:10, ":b", "a:b:c:d"))
#> Note: The following values include colons: a:b, a:b:c:d.
#identifyColons now appears in an allCheckFunctions() call:
allCheckFunctions()
#Define a checkFunction that looks for negative values in numeric
#or integer variables:
identifyNeg <- function(v, nMax = Inf, maxDecimals = 2, ...) {
  problem <- FALSE
  problemValues <- printProblemValues <- NULL
  problemMessage <- "Note: The following negative values were found:"
  negOcc <- unique(v[v < 0])
  #Test the number of negative values, not the length of a comparison
  if (length(negOcc) > 0) {
    problemValues <- negOcc
    printProblemValues <- round(negOcc, maxDecimals)
    problem <- TRUE
  }
  outMessage <- messageGenerator(list(problem = problem,
                                      problemValues = printProblemValues),
                                 problemMessage, nMax)
  checkResult(list(problem = problem,
                   message = outMessage,
                   problemValues = problemValues))
}

#Make it a checkFunction
identifyNeg <- checkFunction(identifyNeg, "Identify negative values",
                             classes = c("integer", "numeric"))
#> Error in get(fName): object 'identifyNeg' not found
#Call it:
identifyNeg(c(0:100))
#> No problems found.
identifyNeg(c(-20.1232323:20), nMax = 3, maxDecimals = 4)
#> Note: The following negative values were found: -20.1232, -19.1232, -18.1232 (18 additional values omitted).
#identifyNeg now appears in an allCheckFunctions() call:
allCheckFunctions()