Python – Vector autoregressive model fitting with scikit-learn

linear-regression, machine-learning, model-fitting, python, scikit-learn

I am trying to fit vector autoregressive (VAR) models using the generalized linear model fitting methods included in scikit-learn. The linear model has the form y = X w, but the system matrix X has a very peculiar structure: it is block-diagonal, and all blocks are identical. To optimize performance and memory consumption, the model can be expressed as Y = B W, where B is a single block from X, and Y and W are now matrices instead of vectors (a minimal construction is sketched below).
The classes LinearRegression, Ridge, RidgeCV, Lasso, and ElasticNet readily accept the latter model structure. However, fitting LassoCV or ElasticNetCV fails because Y is two-dimensional.
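
For concreteness, here is a minimal sketch of the formulation I am using. The series x, the lag order p, and the helper build_var_design are my own names, not part of scikit-learn:

    import numpy as np
    from sklearn.linear_model import Ridge

    def build_var_design(x, p):
        """Build the shared block B and the target matrix Y for a VAR(p) model.

        Row i of B stacks the p observations preceding x[p + i], so every
        output channel shares the same design matrix -- this is the block
        that repeats along the diagonal of the full system matrix X.
        """
        T, d = x.shape
        B = np.hstack([x[p - k - 1:T - k - 1] for k in range(p)])  # (T - p, d * p)
        Y = x[p:]                                                  # (T - p, d)
        return B, Y

    # Ridge (like LinearRegression, Lasso, ElasticNet) accepts a 2-D target,
    # so all channels are fitted in a single call against the shared block B.
    x = np.random.randn(500, 3)
    B, Y = build_var_design(x, p=2)
    model = Ridge(alpha=1.0).fit(B, Y)
    print(model.coef_.shape)  # (3, 6): one coefficient row per output channel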

I found https://github.com/scikit-learn/scikit-learn/issues/2402; from that discussion I assume the behavior of LassoCV/ElasticNetCV is intended.
Is there a way to optimize the alpha/rho parameters other than manually implementing cross-validation?
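
To illustrate, the manual route I would rather avoid looks roughly like this (a sketch of the bookkeeping only, using the 0.14 module layout; the alpha grid and the helper are mine):

    import numpy as np
    from sklearn.cross_validation import KFold  # 0.14 module layout
    from sklearn.linear_model import Lasso

    def manual_alpha_cv(B, Y, alphas, n_folds=5):
        """Hand-rolled CV over alpha for the multi-output model Y = B W.

        Note: plain K-fold ignores the temporal ordering of a VAR model;
        this only illustrates the machinery I would have to maintain myself.
        """
        cv_error = []
        for alpha in alphas:
            fold_mse = []
            for train, test in KFold(len(B), n_folds=n_folds):
                fit = Lasso(alpha=alpha).fit(B[train], Y[train])
                fold_mse.append(np.mean((fit.predict(B[test]) - Y[test]) ** 2))
            cv_error.append(np.mean(fold_mse))
        return alphas[int(np.argmin(cv_error))]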

Furthermore, Bayesian regression techniques in scikit-learn also expect y to be one-dimensional. Is there any way around this?
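
The only workaround I see there is fitting one model per output column, e.g. (a sketch, reusing B and Y from the formulation above):

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    # Fit one independent Bayesian model per output column of Y. This gives
    # up the shared-design speedup but sidesteps the 1-D target restriction.
    models = [BayesianRidge().fit(B, Y[:, j]) for j in range(Y.shape[1])]
    W = np.column_stack([m.coef_ for m in models])  # (d * p, d) coefficient matrix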

Note: I am using scikit-learn 0.14 (stable).

Best Answer

How crucial are the performance and memory savings gained by this formulation of the regression? Given that your reformulation isn't supported by the estimators you need, I wouldn't really call it an optimization... I would suggest:

  1. Running the unoptimized version and waiting, if that is feasible (see the sketch after this list for what that looks like).

  2. Pulling the branch referenced in the GitHub conversation you posted, which supposedly solves your problem. The scikit-learn documentation has instructions on building scikit-learn from a git checkout; you can then add the branched scikit-learn location to your Python path and run your regression using the modified library code. Be sure to post your experiences and any issues you encounter; I'm sure the scikit-learn developers would appreciate it.
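
If you go with option 1, the "unoptimized version" can be spelled out explicitly. A sketch, assuming the block B and target matrix Y from the question (the dense Kronecker product is exactly the memory cost your reformulation was meant to avoid):

    import numpy as np
    from sklearn.linear_model import LassoCV

    # Expand Y = B W back into the vectorized system y = X w, where
    # X = kron(I_d, B) is block-diagonal with d identical copies of B.
    d = Y.shape[1]
    X_full = np.kron(np.eye(d), B)   # ((T - p) * d, d * p * d) -- large!
    y_full = Y.ravel(order='F')      # stack the columns of Y into one vector

    # With a 1-D target, LassoCV tunes alpha without complaint.
    reg = LassoCV(cv=5).fit(X_full, y_full)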