Title: | Modelling Interactions in High-Dimensional Data with Backtracking |
---|---|
Description: | Implementation of the algorithm introduced in Shah, R. D. (2016) <https://www.jmlr.org/papers/volume17/13-515/13-515.pdf>. Data with thousands of predictors can be handled. The algorithm performs sequential Lasso fits on design matrices containing increasing sets of candidate interactions. Previous fits are used to greatly speed up subsequent fits, so the algorithm is very efficient. |
Authors: | Rajen Shah [aut, cre] |
Maintainer: | Rajen Shah <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2024-11-26 04:52:03 UTC |
Source: | https://github.com/cran/LassoBacktracking |
LassoBT
Perform k-fold cross-validation, potentially multiple times, on permuted versions of the data.
cvLassoBT(x, y, lambda = NULL, nlambda = 100L,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  nfolds = 5L, nperms = 1L, mc.cores = 1L, ...)
x |
input matrix of dimension nobs by nvars; each row is an observation vector. |
y |
response variable; should be a numeric vector. |
lambda |
user supplied |
nlambda |
the number of lambda values. Must be at least 3. |
lambda.min.ratio |
smallest value in |
nfolds |
number of folds. Default is 5. |
nperms |
the number of permuted datasets to apply k-fold cross-validation to. Default is 1, in which case plain cross-validation is carried out. |
mc.cores |
the number of cores to use. Only applicable on non-Windows systems, as it uses the parallel package to parallelise the computations. |
... |
other arguments that can be passed to |
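Because mc.cores only has an effect outside Windows, a calling script may want to choose its value defensively. The following is a minimal sketch and not part of the package: choose_cores is a hypothetical helper, and leaving one core free is an arbitrary assumption.

```r
# Hypothetical helper (not part of LassoBacktracking): pick a value for
# mc.cores, falling back to 1 on Windows or when detection fails.
choose_cores <- function(reserve = 1L) {
  if (.Platform$OS.type == "windows") {
    return(1L)
  }
  n <- parallel::detectCores()
  if (is.na(n)) {
    return(1L)
  }
  # Keep `reserve` cores free, but never go below one core.
  max(1L, as.integer(n) - as.integer(reserve))
}

cores <- choose_cores()
# The result could then be passed on, e.g. cvLassoBT(x, y, mc.cores = cores)
```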
A list with components as below.
lambda
the sequence of lambda values used
cvm
a matrix of error estimates (with squared error loss). The rows correspond to different lambda values whilst the columns correspond to different iterations
BT_fit
a "BT" object from a fit to the full data.
cv_opt
a two component vector giving the cross-validation optimal lambda index and iteration
cv_opt_err
the minimal cross-validation error.
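To illustrate how cv_opt and cv_opt_err relate to cvm, here is a base-R sketch on a made-up error matrix. The numbers are invented, and taking the first minimiser when there are ties is an assumption; the package may resolve ties differently.

```r
# Toy error matrix: rows = lambda values, columns = iterations (invented numbers).
cvm <- matrix(c(2.0, 1.5, 1.2,
                1.8, 0.9, 1.1,
                1.7, 1.0, 1.3),
              nrow = 3, byrow = TRUE)

# Position of the smallest entry as (lambda index, iteration).
pos <- which(cvm == min(cvm), arr.ind = TRUE)[1, ]
cv_opt_toy <- unname(pos)

# The error attained at that position.
cv_opt_err_toy <- cvm[cv_opt_toy[1], cv_opt_toy[2]]
```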
x <- matrix(rnorm(100*250), 100, 250)
y <- x[, 1] + x[, 2] - x[, 1]*x[, 2] + x[, 3] + rnorm(100)
out <- cvLassoBT(x, y, iter_max=10, nperms=2)
Computes a number of Lasso solution paths with increasing numbers of interactions present in the design matrices corresponding to each path. Previous paths are used to speed up computation of subsequent paths so the process is very fast.
LassoBT(x, y, nlambda = 100L, iter_max = 1L,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  lambda = NULL, thresh = 1e-07, verbose = FALSE, inter_orig)
x |
input matrix of dimension nobs by nvars; each row is an observation vector. |
y |
response variable; should be a numeric vector. |
nlambda |
the number of lambda values. Must be at least 3. |
iter_max |
the number of iterations of the Backtracking algorithm to run. |
lambda.min.ratio |
smallest value in |
lambda |
user supplied |
thresh |
convergence threshold for coordinate descent. Each inner coordinate descent loop continues until either the maximum change in the objective after any coefficient update is less than |
verbose |
if |
inter_orig |
an optional 2-row matrix with each column giving interactions that are to be added to the design matrix before the algorithm begins. |
The Lasso optimisations are performed using coordinate descent similarly to the glmnet package. An intercept term is always included. Variables are centred and scaled to have equal empirical variance. Interactions are constructed from these centred and scaled variables, and the interactions themselves are also centred and scaled. Note the coefficients are returned on the original scale of the variables. Coefficients returned for interactions are for simple pointwise products of the original variables with no scaling.
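The preprocessing described above can be sketched in base R. This is only an illustration under assumptions: standardise is a hypothetical helper, and since the variance convention used by the package (divisor n or n - 1) is not stated here, the sketch simply uses sd().

```r
set.seed(1)
x <- matrix(rnorm(20 * 3), 20, 3)

# Hypothetical helper: centre a vector and scale it to unit empirical variance.
standardise <- function(v) (v - mean(v)) / sd(v)

# Centre and scale each original variable.
z <- apply(x, 2, standardise)

# Build the interaction between variables 1 and 2 from the scaled columns,
# then centre and scale the interaction itself.
inter_12 <- standardise(z[, 1] * z[, 2])

# Returned interaction coefficients refer to the plain pointwise product
# of the original, unscaled variables.
inter_12_orig <- x[, 1] * x[, 2]
```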
An object with S3 class "BT".
call
the call that produced the object
a0
list of intercept vectors
beta
list of matrices of coefficients stored in sparse column format (CsparseMatrix)
fitted
list of fitted values
lambda
the sequence of lambda values used
nobs
the number of observations
nvars
the number of variables
var_indices
the indices of the non-constant columns of the design matrix
interactions
a 2-row matrix with columns giving the interactions that were added to the design matrix
path_lookup
a matrix with columns corresponding to iterations and rows to lambda values. Entry ij gives the component of the a0 and beta lists that gives the coefficients for the ith lambda value and jth iteration
l_start
a vector with component entries giving the minimum lambda index in the corresponding components of beta and a0
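As a reading aid for path_lookup, the lookup can be mimicked with a toy matrix in base R. All values below are invented for illustration; the sketch only shows which list component is selected and does not reproduce how the package indexes within a component (the role of l_start is deliberately left out).

```r
# Toy path_lookup: 4 lambda values (rows) by 2 iterations (columns).
# Invented entries: the second iteration reuses component 1 for the first
# two lambda values and switches to component 2 afterwards.
path_lookup <- matrix(c(1L, 1L,
                        1L, 1L,
                        1L, 2L,
                        1L, 2L),
                      nrow = 4, byrow = TRUE)

i <- 3L  # lambda index
j <- 2L  # iteration

# Component of the a0 and beta lists holding the coefficients
# for the i-th lambda value and j-th iteration.
component <- path_lookup[i, j]
```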
Shah, R. D. (2016) Modelling interactions in high-dimensional data with Backtracking. JMLR, 17, 1-31. https://www.jmlr.org/papers/volume17/13-515/13-515.pdf
predict.BT, coef.BT methods and the cvLassoBT function.
x <- matrix(rnorm(100*250), 100, 250)
y <- x[, 1] + x[, 2] - x[, 1]*x[, 2] + x[, 3] + rnorm(100)
out <- LassoBT(x, y, iter_max=10)
Similar to other predict methods, this function predicts fitted values and computes coefficients from a fitted "BT" object.
## S3 method for class 'BT'
predict(object, newx, s = NULL, iter = NULL,
  type = c("response", "coefficients"), ...)

## S3 method for class 'BT'
coef(object, s = NULL, iter = NULL, ...)
object |
fitted " |
newx |
matrix of new values of the design matrix at which predictions are to be made. Ignored when |
s |
value of the penalty parameter at which predictions are required. If the value is not one of the |
iter |
iteration at which predictions are required. Default is the entire sequence of iterations in |
type |
of prediction required. Type " |
... |
not used. Other arguments to |
Either a vector of predictions or, if either s or iter is NULL, a three-dimensional array with the last two dimensions indexing different lambda values and iterations.
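The two return shapes can be compared with a short call sequence. This is a hedged sketch: it assumes the LassoBacktracking package is installed (the code is skipped otherwise), and the choice of the fifth value of out$lambda for s and of iter = 1 is arbitrary.

```r
if (requireNamespace("LassoBacktracking", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(100 * 250), 100, 250)
  y <- x[, 1] + x[, 2] - x[, 1] * x[, 2] + x[, 3] + rnorm(100)
  out <- LassoBacktracking::LassoBT(x, y, iter_max = 10)

  # With s and iter left as NULL, a three-dimensional array is returned.
  pred_all <- predict(out, newx = x[1:2, ])
  print(dim(pred_all))

  # Fixing both s and iter should instead give a vector of predictions.
  pred_one <- predict(out, newx = x[1:2, ], s = out$lambda[5], iter = 1)
  print(pred_one)
}
```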
x <- matrix(rnorm(100*250), 100, 250)
y <- x[, 1] + x[, 2] - x[, 1]*x[, 2] + x[, 3] + rnorm(100)
out <- LassoBT(x, y, iter_max=10)
predict(out, newx=x[1:2, ])