Convert a function, f, into an S3
checkFunction object. This adds f to the
overview list returned by an allCheckFunctions()
call.
checkFunction(f, description = NULL, classes = NULL)
| Argument | Description |
|---|---|
| f | A function. See details and examples below for the exact requirements of this function. |
| description | A character string describing the check performed by f. |
| classes | The classes for which f is intended to be called. |
A function of class checkFunction which has two attributes,
namely classes and description.
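To make the shape of this return value concrete, the following base R sketch attaches a class and the two attributes to an ordinary function. The helper name asCheckFunctionSketch and the example check hasNegatives are hypothetical, and this is not dataMaid's implementation; in particular, it does not register anything with allCheckFunctions(), and the exact class vector used by the package is an assumption.

```r
# Hypothetical helper (not part of dataMaid): attach the class and the two
# attributes described above to an ordinary function, using base R only.
asCheckFunctionSketch <- function(f, description = NULL, classes = NULL) {
  stopifnot(is.function(f))
  attr(f, "description") <- description
  attr(f, "classes") <- classes
  # Assumption: the class vector; the package may set it differently.
  class(f) <- c("checkFunction", "function")
  f
}

# Hypothetical check used only to illustrate the resulting object.
hasNegatives <- asCheckFunctionSketch(
  function(v, ...) list(problem = any(v < 0, na.rm = TRUE), message = ""),
  description = "Flag negative values",
  classes = c("integer", "numeric")
)

inherits(hasNegatives, "checkFunction")  # TRUE
attr(hasNegatives, "classes")            # "integer" "numeric"
hasNegatives(c(1, -2, 3))$problem        # TRUE: the object is still callable
```

The point of the sketch is that the object remains an ordinary callable function; the class and attributes only add metadata that other code can inspect.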
checkFunctions are the functions used in
check() and makeDataReport() to perform
error checks and quality control on the variables of a dataset.
An example of defining a new checkFunction is given below.
For such a function to be compatible with check() and makeDataReport(),
it must at minimum have the following input/output structure.
It must accept at least two arguments, namely
v (a vector variable) and the dots argument (...). Additional
arguments passed on from check() and makeDataReport() include nMax and
maxDecimals; see e.g. the pre-defined checkFunction
identifyMissing for more details on how these arguments should
be used.
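The input requirements can be illustrated with base R alone. The function countLong below is hypothetical: it accepts v and ..., and it also declares nMax, which is assumed here to cap how many offending values are listed in the message. Any other argument a caller passes along (here maxDecimals) is absorbed by ... instead of causing an error. How check() and makeDataReport() actually forward these arguments is not shown.

```r
# Hypothetical check: flag values with more than 5 characters.
# Assumption: nMax limits the number of values listed in the message.
countLong <- function(v, nMax = Inf, ...) {
  chars <- as.character(v)
  long <- unique(chars[nchar(chars) > 5])
  long <- long[!is.na(long)]
  shown <- long[seq_len(min(nMax, length(long)))]
  list(
    problem = length(long) > 0,
    message = if (length(long) > 0) {
      paste("Long values:", paste(shown, collapse = ", "))
    } else {
      ""
    }
  )
}

# maxDecimals is not a named argument of countLong, so it is absorbed by ...
res <- countLong(c("a", "abcdefg", "hijklmnop"), nMax = 1, maxDecimals = 2)
res$problem  # TRUE
res$message  # "Long values: abcdefg"
```

Without the ... argument, the same call would stop with an "unused argument" error, which is why the dots are part of the minimal signature.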
The output must be a list with at least the two entries $problem
(a logical indicating whether a problem was found) and $message
(a character string describing the problem). If the
result of a checkFunction additionally includes a
$problemValues entry (the values in the variable
that caused the problem, if relevant) and is converted to a
checkResult object, a print() method also becomes
available, which formats checkFunction results consistently.
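As a quick way to test a result against the minimal output requirements described above, a small validator can be written in base R. The name hasMinimalOutput is hypothetical and the helper is not part of dataMaid; it only inspects the two required entries and ignores the optional $problemValues.

```r
# Hypothetical validator: TRUE if res is a list holding a single non-missing
# logical `problem` and a single character string `message`.
hasMinimalOutput <- function(res) {
  is.list(res) &&
    all(c("problem", "message") %in% names(res)) &&
    is.logical(res[["problem"]]) && length(res[["problem"]]) == 1L &&
    !is.na(res[["problem"]]) &&
    is.character(res[["message"]]) && length(res[["message"]]) == 1L
}

hasMinimalOutput(list(problem = TRUE, message = "Found an issue"))   # TRUE
hasMinimalOutput(list(problem = "yes", message = "Found an issue"))  # FALSE
hasMinimalOutput(list(problem = FALSE))                              # FALSE
```

The helper uses [[ ]] rather than $ so that a $problemValues entry can never be picked up by partial name matching.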
Note that all available checkFunctions are listed by the call
allCheckFunctions(), and we recommend looking into these functions
if more knowledge about checkFunctions is required.
# Define a minimal checkFunction that can be called from check() and
# makeDataReport(). This function checks whether all values in a variable
# are of equal length and whether that length is larger than 10:
isID <- function(v, nMax = NULL, ...) {
  out <- list(problem = FALSE, message = "")
  # any() guards against objects that carry more than one class
  if (any(class(v) %in% c("character", "factor", "labelled", "numeric", "integer"))) {
    v <- as.character(v)
    lengths <- nchar(v)
    if (all(lengths > 10) && length(unique(lengths)) == 1) {
      out$problem <- TRUE
      out$message <- "Warning: This variable seems to contain ID codes!"
    }
  }
  out
}

# Convert it into a checkFunction
isID <- checkFunction(isID,
  description = "Identify ID variables (long, equal length values)",
  classes = allClasses())
#> Error in get(fName): object 'isID' not found

# Call isID
isID(c("12345678901", "23456789012", "34567890123", "45678901234"))
#> $problem
#> [1] TRUE
#>
#> $message
#> [1] "Warning: This variable seems to contain ID codes!"
#>

# isID now appears in an allCheckFunctions() call:
allCheckFunctions()
#>
#> ----------------------------------------------------------
#> name                      description
#> ------------------------- --------------------------------
#> identifyCaseIssues        Identify case issues
#>
#> identifyLoners            Identify levels with < 6 obs.
#>
#> identifyMissing           Identify miscoded missing
#>                           values
#>
#> identifyNums              Identify misclassified numeric
#>                           or integer variables
#>
#> identifyOutliers          Identify outliers
#>
#> identifyOutliersTBStyle   Identify outliers (Turkish
#>                           Boxplot style)
#>
#> identifyWhitespace        Identify prefixed and suffixed
#>                           whitespace
#>
#> isCPR                     Identify Danish CPR numbers
#>
#> isEmpty                   Check if the variable contains
#>                           only a single value
#>
#> isKey                     Check if the variable is a key
#>
#> isSingular                Check if the variable contains
#>                           only a single value
#>
#> isSupported               Check if the variable class is
#>                           supported by dataMaid.
#> ----------------------------------------------------------
#>
#> Table: Table continues below
#>
#>
#> -----------------------------
#> classes
#> -----------------------------
#> character, factor
#>
#> character, factor
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, factor, labelled
#>
#> Date, integer, numeric
#>
#> Date, integer, numeric
#>
#> character, factor, labelled
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#>
#> character, Date, factor,
#> integer, labelled, logical,
#> numeric
#> -----------------------------
#>

# Define a new checkFunction using messageGenerator() for generating
# the message and checkResult() for getting a printing method
# for its output. This function identifies values in a variable
# that include a colon surrounded by alphanumeric characters. If
# at least one such value is found, the variable is flagged as
# having a problem:
identifyColons <- function(v, nMax = Inf, ...) {
  v <- unique(na.omit(v))
  problemMessage <- "Note: The following values include colons:"
  problem <- FALSE
  # [[:alnum:]] matches any letter or digit on both sides of the colon
  problemValues <- v[grepl("[[:alnum:]]:[[:alnum:]]", v)]
  if (length(problemValues) > 0) {
    problem <- TRUE
  }
  problemStatus <- list(problem = problem, problemValues = problemValues)
  outMessage <- messageGenerator(problemStatus, problemMessage, nMax)
  checkResult(list(problem = problem, message = outMessage,
                   problemValues = problemValues))
}

# Make it a checkFunction:
identifyColons <- checkFunction(identifyColons,
  description = "Identify non-suffixed nor -prefixed colons",
  classes = c("character", "factor", "labelled"))
#> Error in get(fName): object 'identifyColons' not found

# Call it:
identifyColons(1:100)
#> No problems found.
identifyColons(c("a:b", 1:10, ":b", "a:b:c:d"))
#> Note: The following values include colons: a:b, a:b:c:d.

# identifyColons now appears in an allCheckFunctions() call:
allCheckFunctions()

# Define a checkFunction that looks for negative values in numeric
# or integer variables:
identifyNeg <- function(v, nMax = Inf, maxDecimals = 2, ...) {
  problem <- FALSE
  problemValues <- printProblemValues <- NULL
  problemMessage <- "Note: The following negative values were found:"
  # Drop missing values so that NA is never reported as a negative value
  negOcc <- unique(v[!is.na(v) & v < 0])
  if (length(negOcc) > 0) {
    problemValues <- negOcc
    printProblemValues <- round(negOcc, maxDecimals)
    problem <- TRUE
  }
  outMessage <- messageGenerator(list(problem = problem,
                                      problemValues = printProblemValues),
                                 problemMessage, nMax)
  checkResult(list(problem = problem, message = outMessage,
                   problemValues = problemValues))
}

# Make it a checkFunction
identifyNeg <- checkFunction(identifyNeg, "Identify negative values",
  classes = c("integer", "numeric"))
#> Error in get(fName): object 'identifyNeg' not found

# Call it:
identifyNeg(c(0:100))
#> No problems found.
identifyNeg(c(-20.1232323:20), nMax = 3, maxDecimals = 4)
#> Note: The following negative values were found: -20.1232, -19.1232, -18.1232 (18 additional values omitted).

# identifyNeg now appears in an allCheckFunctions() call:
allCheckFunctions()