Inference for features identified by the Lasso

Performs randomization tests of features identified by the Lasso

feature.test(x, y, B = 100, type.measure = "deviance", s = "lambda.min",
  keeplambda = FALSE, olsestimates = TRUE, penalty.factor = rep(1, nvars),
  alpha = 1, control = list(trace = FALSE, maxcores = 24), ...)

Arguments

x	input matrix, of dimension nobs x nvars; each row is an observation vector.
y	quantitative response variable of length nobs
B	The number of randomizations used in the computations
type.measure	loss to use for cross-validation. See `cv.glmnet` for more information
s	Value of the penalty parameter 'lambda' at which the coefficients are extracted. Defaults to `"lambda.min"`, the value of lambda that minimizes the cross-validated loss. See `coef.glmnet` for more information.
keeplambda	If set to `TRUE`, the lambda estimated by cross-validation on the original dataset is kept and reused for the subsequent randomization datasets. This reduces computation time substantially because cross-validation does not have to be repeated for each randomization. If set to a numeric value, that value is used as lambda. Defaults to `FALSE`.
olsestimates	Logical. Should the test statistic be based on OLS estimates from the model containing the variables selected by the lasso? Defaults to `TRUE`. If set to `FALSE`, the coefficients from the lasso are used as test statistics.
penalty.factor	a vector of weights used for adaptive lasso. See `glmnet` for more information.
alpha	The elasticnet mixing parameter. See `glmnet` for more information.
control	A list of options that control the algorithm. `trace` is a logical; if set to `TRUE`, the function produces more output. `maxcores` sets the maximum number of cores to use with the `parallel` package.
…	Other arguments passed to `glmnet`
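
The options above can be combined as in the following sketch, which assumes that the `MESS` package (which provides `feature.test`) and `glmnet` are installed. The simulated data mirror the Examples section; the seed, `B = 50` and `maxcores = 1` are arbitrary choices made here to keep the run short and are not recommendations from the package.

```r
library(MESS)  # assumed to provide feature.test(); glmnet must also be installed

set.seed(1)  # arbitrary seed, for reproducibility of the sketch only
x <- matrix(rnorm(30 * 100), nrow = 30)
y <- rnorm(30, mean = 1 * x[, 1])

# Reuse the cross-validated lambda from the original data in every
# randomization instead of repeating cross-validation each time
res_fast <- feature.test(x, y, B = 50, keeplambda = TRUE,
                         control = list(trace = FALSE, maxcores = 1))

# Base the test statistics on the lasso coefficients rather than OLS refits
res_lasso <- feature.test(x, y, B = 50, keeplambda = TRUE,
                          olsestimates = FALSE,
                          control = list(trace = FALSE, maxcores = 1))
```

Setting `keeplambda = TRUE` trades a fresh cross-validation per randomization for speed, as described for that argument above.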

Value

Returns a list of 7 variables:

p.full

The p-value for the test of the full set of variables selected by the lasso (based on the OLS estimates)

ols.selected

A vector of the indices of the variables with non-zero coefficients selected by glmnet, sorted from (numerically) highest to lowest based on their OLS test statistic.

p.maxols

The p-value for the maximum of the OLS test statistics

lasso.selected

A vector of the indices of the variables with non-zero coefficients selected by glmnet, sorted from (numerically) highest to lowest based on their absolute lasso coefficients.

p.maxlasso

The p-value for the maximum of the lasso test statistics

lambda.orig

The value of lambda used in the computations

The number of permutations used
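
As a minimal sketch of how the returned list might be inspected, the components named above can be extracted with `$`. It assumes `MESS` and `glmnet` are installed; the seed, `B = 50` and `maxcores = 1` are illustrative choices only.

```r
library(MESS)  # assumed to provide feature.test()

set.seed(1)  # arbitrary seed, for reproducibility of the sketch only
x <- matrix(rnorm(30 * 100), nrow = 30)
y <- rnorm(30, mean = 1 * x[, 1])

res <- feature.test(x, y, B = 50,
                    control = list(trace = FALSE, maxcores = 1))

res$p.full          # p-value for the full set of selected variables
res$ols.selected    # selected column indices, ordered by OLS test statistic
res$p.maxols        # p-value for the maximum OLS test statistic
res$lasso.selected  # selected column indices, ordered by absolute lasso coefficient
res$p.maxlasso      # p-value for the maximum lasso test statistic
res$lambda.orig     # lambda used in the computations

# Column index of the top-ranked feature according to the OLS statistic
top_feature <- res$ols.selected[1]
```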

References

Brink-Jensen, K and Ekstrom, CT 2014. Inference for feature selection using the Lasso with high-dimensional data. http://arxiv.org/abs/1403.4296

Examples



# Simulate some data
x <- matrix(rnorm(30*100), nrow=30)
y <- rnorm(30, mean=1*x[,1])

# Make inference for features
# NOT RUN {
feature.test(x, y)
# }
