An artificial dataset intended for presenting the key features of dataMaid, a toolset for identifying potential errors in a dataset.

toyData

Format

A data.frame with 15 rows and 6 variables.

pill

A factor variable with two levels ("red" and "blue") and a few (correctly coded) missing observations. This represents the colour of a pill.

events

A numeric variable with one obvious outlier (82), two miscoded missing values (999 and NaN) and a few correctly coded missing values. It represents the number of previous events.

region

A factor variable where two of the levels ("other" and "OTHER") are the same word with different case settings. Moreover, the variable includes a Stata-style miscoded missing value ("."). It is used to represent geographical regions or treatment centers.

change

A numeric variable (random draws from a standard normal distribution). It represents a change in a measured variable.

id

A factor variable with unique codes for each observation (a character string with a number between 1 and 15), i.e. a key variable.

spotifysong

A factor variable that has the same level ("Irrelevant") for all observations, i.e. an empty variable. It represents the latest song played on Spotify.

Source

Artificial data

Examples

data(toyData)
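
Several of the problems described under Format can be spotted with a few base R checks. The following is a minimal sketch that does not use dataMaid's own functions. The data frame `toy` is a hypothetical stand-in built to mimic the description above, so that the example runs without the package installed; its values are illustrative and are not the actual contents of toyData. The helper names `suspectNumeric`, `caseDuplicates` and `isEmpty`, and the list of suspect codes, are likewise assumptions made for this illustration.

```r
# Hypothetical stand-in mimicking some of the issues described for toyData;
# the values are illustrative, not the real data.
toy <- data.frame(
  events = c(1, 1, 999, 2, NaN, 82, NA, 3),
  region = factor(c("a", "b", "other", "OTHER", ".", "a", "b", "a")),
  spotifysong = factor(rep("Irrelevant", 8))
)

# Distinct values that look like miscoded missing values in a numeric
# variable: NaN, or one of the supplied codes (an assumed, non-exhaustive list).
suspectNumeric <- function(x, codes = c(999)) {
  unique(x[is.nan(x) | x %in% codes])
}

# Factor levels that coincide with another level once case is ignored.
caseDuplicates <- function(x) {
  lv <- levels(x)
  low <- tolower(lv)
  lv[low %in% low[duplicated(low)]]
}

# TRUE if a variable has at most one distinct non-missing value.
isEmpty <- function(x) length(unique(x[!is.na(x)])) <= 1

suspectNumeric(toy$events)                    # miscoded missing values
caseDuplicates(toy$region)                    # levels differing only by case
"." %in% levels(toy$region)                   # Stata-style missing code present?
names(toy)[vapply(toy, isEmpty, logical(1))]  # empty variables
```

With the package installed, the same checks could be tried on the real data by replacing `toy` with `toyData` after calling `data(toyData)`.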