An artificial dataset, intended for presenting the key features of dataMaid, which is a toolset for identifying potential errors in a dataset.
toyData
A data.frame with 15 rows and 6 variables.
A factor variable with two levels ("red" and "blue") and a few (correctly coded) missing observations. This represents the colour of a pill.
A numeric variable with one obvious outlier value (82), two miscoded missing values (999 and NaN) and a few correctly coded missing values. Represents the number of previous events.
A factor variable where two of the levels ("other" and "OTHER") are the same word in different letter case. Moreover, the variable includes a Stata-style miscoded missing value ("."). Used to represent geographical regions or treatment centers.
A numeric variable (random draws from a standard normal distribution). Represents a change in a measured variable.
A factor variable with unique codes for each observation (a character string with a number between 1 and 15), i.e. a key variable.
A factor variable that has the same level ("Irrelevant") for all observations, i.e. an empty variable. Represents the latest song played on Spotify.
Artificial data
data(toyData)
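
The kinds of problems described above can be illustrated with a minimal base R sketch. It does not use dataMaid's own checks, and it does not load the real dataset: `standIn` and its column names (`events`, `region`, `song`) are hypothetical, with made-up values that only mimic the miscoded missing values, case-clashing levels and empty variable listed in the variable descriptions.

```r
# Hypothetical stand-in data; these values are invented for illustration
# and are NOT the contents of toyData.
standIn <- data.frame(
  events = c(1, 4, NA, 82, 999, NaN, 2, 3),
  region = factor(c("a", "b", "other", "OTHER", ".", "a", "b", "a")),
  song   = factor(rep("Irrelevant", 8))
)

# Miscoded missing values in a numeric variable: NaN and the sentinel 999.
# A proper NA is treated as correctly coded and is not flagged.
miscodedNumeric <- is.nan(standIn$events) |
  (!is.na(standIn$events) & standIn$events == 999)

# Factor levels that become identical once case is ignored.
lev <- levels(standIn$region)
lowered <- tolower(lev)
caseClash <- lev[duplicated(lowered) | duplicated(lowered, fromLast = TRUE)]

# Stata-style missing code "." stored as an ordinary factor level.
hasDotLevel <- "." %in% lev

# Variables with at most one distinct non-missing value ("empty" variables).
isEmpty <- vapply(
  standIn,
  function(x) length(unique(x[!is.na(x)])) <= 1,
  logical(1)
)
```

With dataMaid installed, the same expressions could be pointed at the real data after `data(toyData)`, with the hypothetical column names replaced by the actual variable names of the dataset.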