R/identifyMissing.R
identifyMissing.Rd
A checkFunction to be called from check
that identifies values that
appear to be miscoded missing values.
identifyMissing(v, nMax = 10, ...)
v | A variable to check. |
---|---|
nMax | The maximum number of problematic values to report. Default is 10. |
... | Not in use. |
A checkResult with three entries:
$problem (a logical indicating whether miscoded missing values were found),
$message (a message describing which values in v were suspected to be
miscoded missing values), and
$problemValues (the problematic values in their original format).
Note that only unique problematic values are listed and that they are
presented in alphabetical order.
identifyMissing tries to identify common choices of missing value codes outside of the
R standard (NA). These include special words (NaN and Inf, regardless of case),
one or more -9/9's (e.g. 999, "99", -9, "-99"), one or more -8/8's (e.g. -8, 888, -8888),
Stata-style missing values (commencing with ".") and other character strings
("", " ", "-", "NA" miscoded as character). If the variable is numeric/integer or a
character/factor variable consisting only of numbers and with more than 11 different values,
the numeric miscoded missing values (999, 888, -99, -8 etc.) are
only recognized as miscoded missing if they are the maximum or minimum, respectively, and the distance
between the second largest/smallest value and this maximum/minimum value is greater than one.
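The maximum/minimum rule described above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper name `isolated_extreme_codes` and its `codes` argument are hypothetical, the list of candidate codes is only a sample, and the sketch reads "respectively" as meaning that positive codes are checked against the maximum and negative codes against the minimum. It also ignores the condition on the number of different values.

```r
# Hypothetical helper (not part of the package): returns the candidate codes
# that are isolated extremes of v, following the rule described above.
isolated_extreme_codes <- function(v,
                                   codes = c(-9999, -999, -99, -9,
                                             -8888, -888, -88, -8,
                                             9, 99, 999, 9999,
                                             8, 88, 888, 8888)) {
  u <- sort(unique(v[!is.na(v)]))  # distinct observed values, ascending
  n <- length(u)
  if (n < 2) return(numeric(0))    # no second largest/smallest value to compare with

  flagged <- numeric(0)

  # Positive code: must be the maximum and lie more than 1 above the runner-up
  if (u[n] > 0 && u[n] %in% codes && (u[n] - u[n - 1]) > 1) {
    flagged <- c(flagged, u[n])
  }

  # Negative code: must be the minimum and lie more than 1 below the runner-up
  if (u[1] < 0 && u[1] %in% codes && (u[2] - u[1]) > 1) {
    flagged <- c(flagged, u[1])
  }

  sort(flagged)
}

v1 <- c(1:15, 99)
v2 <- c(v1, 98)
v3 <- c(-999, v2, 9999)

isolated_extreme_codes(v1)  # 99
isolated_extreme_codes(v2)  # numeric(0): 99 is only 1 above 98
isolated_extreme_codes(v3)  # -999 9999
```

Under these assumptions, 99 is not flagged in `v2` because the second largest value, 98, lies exactly one below it, so the distance is not greater than one.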
##data(testData)
##testData$miscodedMissingVar
##identifyMissing(testData$miscodedMissingVar)

#Identify miscoded numeric missing values
v1 <- c(1:15, 99)
v2 <- c(v1, 98)
v3 <- c(-999, v2, 9999)
identifyMissing(v1)
#> The following suspected missing value codes enter as regular values: 99.
identifyMissing(v2)
#> No problems found.
identifyMissing(v3)
#> The following suspected missing value codes enter as regular values: -999, 9999.
identifyMissing(factor(v3))
#> The following suspected missing value codes enter as regular values: -999, 9999.