Stacking Optimization
A meta-learning ensemble that combines several base portfolio optimisers into a single allocation through a cross-validated meta-stage. Diversifies model risk across optimisers with different inductive biases — mean-variance, hierarchical, risk-parity, factor — rather than committing the entire portfolio to one structural assumption.
Overview
Stacked generalisation was introduced by Wolpert (1992) and refined by Breiman (1996) as a strategy for combining the predictions of multiple base learners. A held-out fold is used to score each base learner, and a meta-learner then maps the base predictions to a final prediction. The ensemble tends to outperform the best single base learner when the base learners make different mistakes, which is exactly the situation in portfolio optimisation, where different optimisers are sensitive to different aspects of the data (means, covariances, correlations, regime structure).
FolioLab implements stacking via skfolio's StackingOptimization. Each base estimator (e.g. Mean Variance, HRP, Risk Parity, Inverse Volatility, Maximum Diversification) is fit on cross-validated folds of the training data; the meta-stage then learns the convex combination of base portfolios that minimises out-of-fold portfolio variance (or another configurable risk measure). The final allocation is the meta-weighted blend of the base allocations.
The meta-stage in skfolio is itself an optimisation problem — commonly a mean-variance optimisation over the panel of base-portfolio returns — so the entire stack is a two-level convex programme. Cross-validation is essential: it prevents the meta-stage from overfitting to the same in-sample data the base estimators saw.
Mathematical Formulation
Notation
- $\mathcal{B} = \{1, \dots, B\}$: set of base optimisers
- $w_b^{(k)}$: weights produced by base $b$ on training fold $k$
- $r_b^{(k)}$: out-of-fold returns of base $b$ on the held-out fold $k$
- $\alpha \in \mathbb{R}^B$: meta-weights, one per base optimiser
Stage 1: cross-validated base outputs
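For each fold $k = 1, \dots, K$ and each base optimiser $b \in \mathcal{B}$, fit $b$ on the training portion of fold $k$, then evaluate the resulting weights $w_b^{(k)}$ on the held-out portion to get the out-of-fold return series $r_b^{(k)}$. Stack these returns into an $N \times B$ matrix $R$ of out-of-fold returns, where $N$ is the total number of held-out observations.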
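Stage 1 can be sketched in plain Python as follows. This is a minimal illustration, not skfolio's implementation: the two base rules (`equal_weight`, `inverse_volatility`), the `walk_forward_splits` helper, and the simulated returns are all assumptions introduced here for the example.

```python
import random
import statistics


def equal_weight(train):
    """Hypothetical base 1: 1/n weights, ignoring the data."""
    n = len(train[0])
    return [1.0 / n] * n


def inverse_volatility(train):
    """Hypothetical base 2: weights proportional to 1 / sample volatility."""
    n = len(train[0])
    inv = [1.0 / statistics.stdev([row[j] for row in train]) for j in range(n)]
    total = sum(inv)
    return [v / total for v in inv]


def walk_forward_splits(n_obs, n_folds, min_train):
    """Yield (train_idx, test_idx); each test block lies strictly after its train block."""
    fold = (n_obs - min_train) // n_folds
    for k in range(n_folds):
        start = min_train + k * fold
        stop = n_obs if k == n_folds - 1 else start + fold
        yield range(0, start), range(start, stop)


def out_of_fold_returns(returns, bases, n_folds=3, min_train=20):
    """Build R: one row per held-out observation, one column per base optimiser."""
    R = []
    for train_idx, test_idx in walk_forward_splits(len(returns), n_folds, min_train):
        train = [returns[t] for t in train_idx]
        fold_weights = [fit(train) for fit in bases]  # one weight vector per base
        for t in test_idx:
            R.append([sum(w_j * r_j for w_j, r_j in zip(w, returns[t]))
                      for w in fold_weights])
    return R


# Simulated daily returns for three assets with different volatilities.
random.seed(0)
vols = [0.01, 0.02, 0.04]
returns = [[random.gauss(0.0, v) for v in vols] for _ in range(80)]

R = out_of_fold_returns(returns, [equal_weight, inverse_volatility])
print(len(R), len(R[0]))  # rows = held-out observations, columns = bases
```

Each row of `R` holds base-portfolio returns computed with weights that never saw that observation, which is what makes the Stage 2 covariance estimate out-of-sample.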
Stage 2: meta-optimisation
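$$\alpha^* = \arg\min_{\alpha} \; \alpha^\top \Sigma_R \, \alpha \quad \text{s.t.} \quad \mathbf{1}^\top \alpha = 1, \quad \alpha \ge 0$$

$\Sigma_R$ is the covariance of the out-of-fold base-portfolio returns. The meta-stage chooses a long-only convex combination of base portfolios that minimises out-of-fold variance (skfolio also supports CVaR and other risk measures at the meta-stage). This is a low-dimensional QP ($B$ is typically 3 to 6) and is fast to solve.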
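A minimal sketch of the meta-stage, assuming the covariance of the out-of-fold base returns has already been estimated. It solves the long-only, fully invested minimum-variance problem by projected gradient descent rather than a QP solver; that choice, the function names, and the 3×3 covariance matrix are illustrative assumptions, not what skfolio does internally.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def portfolio_variance(alpha, sigma):
    """Quadratic form alpha' Sigma alpha."""
    m = len(alpha)
    return sum(alpha[i] * sigma[i][j] * alpha[j] for i in range(m) for j in range(m))


def min_variance_meta_weights(sigma, steps=2000):
    """Projected gradient descent on alpha' Sigma alpha over the simplex."""
    m = len(sigma)
    alpha = [1.0 / m] * m
    # The gradient 2 * Sigma * alpha has Lipschitz constant 2 * lambda_max,
    # and trace(Sigma) >= lambda_max, so this step size is safe.
    lr = 1.0 / (2.0 * sum(sigma[i][i] for i in range(m)))
    for _ in range(steps):
        grad = [2.0 * sum(sigma[i][j] * alpha[j] for j in range(m)) for i in range(m)]
        alpha = project_to_simplex([a - lr * g for a, g in zip(alpha, grad)])
    return alpha


# Hypothetical covariance of three base portfolios' out-of-fold returns.
sigma_R = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]
alpha = min_variance_meta_weights(sigma_R)
print([round(a, 3) for a in alpha], round(portfolio_variance(alpha, sigma_R), 5))
```

With only a handful of bases, a first-order method like this converges quickly; a production setup would more likely hand the same problem to a convex solver.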
Final allocation
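$$w^{\text{stack}} = \sum_{b \in \mathcal{B}} \alpha_b^* \, w_b$$

$w_b$ is the base-$b$ portfolio fit on the full training history. The final stacked portfolio $w^{\text{stack}}$ is the meta-weighted combination. Long-only and budget constraints on each $w_b$ propagate to $w^{\text{stack}}$ by convex combination, because $\alpha^* \ge 0$ and $\mathbf{1}^\top \alpha^* = 1$.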
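The blending step is a weighted sum of weight vectors. The sketch below uses made-up meta-weights and base allocations over four assets (all numbers are illustrative assumptions) to show that a convex combination of long-only, fully invested portfolios is itself long-only and fully invested.

```python
def blend(meta_weights, base_weights):
    """Meta-weighted combination of base allocations (one list of asset weights per base)."""
    n_assets = len(base_weights[0])
    return [sum(a * w[j] for a, w in zip(meta_weights, base_weights))
            for j in range(n_assets)]


# Hypothetical meta-weights for three bases (non-negative, summing to 1).
alpha = [0.5, 0.3, 0.2]

# Hypothetical full-history base allocations over four assets.
base_weights = [
    [0.70, 0.20, 0.10, 0.00],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.20, 0.30, 0.40],
]

w_stack = blend(alpha, base_weights)
print([round(x, 3) for x in w_stack])  # [0.445, 0.215, 0.185, 0.155]
print(round(sum(w_stack), 12))         # 1.0
```

Constraints that are not preserved under averaging, such as a cardinality limit on the number of holdings, would not carry over in the same way.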
Why stacking works
Each base optimiser encodes a structural assumption: MVO assumes accurate estimates of the expected returns $\mu$ and the covariance $\Sigma$; HRP assumes hierarchical clustering captures the dependence structure; Risk Parity assumes equal-risk allocation is desirable; Inverse Volatility assumes correlations can be ignored. When one assumption fails, the corresponding base portfolio degrades. As long as the failures are partly idiosyncratic, the meta-stage can downweight the failing base each time the stack is refit.
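The benefit of partly idiosyncratic failures can be made concrete in an illustrative two-base case. The symbols here are introduced only for this example: $\sigma_1 \le \sigma_2$ are the out-of-fold volatilities of the two base portfolios, $\rho$ is their correlation, and $a$ is the meta-weight on base 1.

$$\sigma^2(a) = a^2 \sigma_1^2 + (1-a)^2 \sigma_2^2 + 2a(1-a)\,\rho\,\sigma_1 \sigma_2$$

$$a^* = \frac{\sigma_2^2 - \rho\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}, \qquad \sigma^2(a^*) = \frac{\sigma_1^2 \sigma_2^2 (1 - \rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}$$

The blend is strictly less risky than the better base, $\sigma^2(a^*) < \sigma_1^2$, exactly when $\rho < \sigma_1 / \sigma_2$; otherwise the long-only optimum sits at $a^* = 1$ and stacking collapses to the single best base. For instance, with $\sigma_1 = 10\%$, $\sigma_2 = 12\%$ and $\rho = 0.5$, the optimum is $a^* \approx 0.68$ and the blended volatility is about $9.3\%$.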
On Indian equity universes the typical stacking blend produces meta-weights of roughly 30-40% on HRP, 20-30% on Risk Parity, 15-20% on Min-Variance, and the rest spread across the remaining bases — with the exact split depending on the regime in the training window.
Advantages & Limitations
Advantages
- Model-risk diversification: No single inductive bias dominates.
- Adaptive blending: Meta-weights adjust to which base is currently working.
- Convex combination: Constraints on the bases propagate to the stack.
- Out-of-fold scoring: Reduces the in-sample bias of the meta-stage.
Limitations
- Computational cost: $K \times B$ base fits per training pass, for $K$ folds and $B$ base optimisers.
- Cross-validation choice matters: Walk-forward folds are more honest than random folds for time-series data.
- Black-box weights: Final weights are harder to attribute to a single rationale.
- Meta-stage is itself an optimiser: Inherits its own estimation noise.
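The point about fold choice can be checked mechanically. The sketch below, using hypothetical helper names and plain index lists, compares a shuffled K-fold split with a walk-forward split and tests whether any fold trains on observations dated after the start of its own test block.

```python
import random


def shuffled_kfold(n_obs, n_folds, seed=0):
    """Random K-fold: test indices are scattered across the whole sample."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    for k in range(n_folds):
        test = sorted(idx[k::n_folds])
        held_out = set(test)
        train = [i for i in range(n_obs) if i not in held_out]
        yield train, test


def walk_forward(n_obs, n_folds, min_train):
    """Expanding-window split: each test block follows its training window."""
    fold = (n_obs - min_train) // n_folds
    for k in range(n_folds):
        start = min_train + k * fold
        yield list(range(start)), list(range(start, start + fold))


def uses_future_data(splits):
    """True if any fold trains on an observation after its first test observation."""
    return any(max(train) > min(test) for train, test in splits)


random_folds_leak = uses_future_data(shuffled_kfold(100, 5))
walk_forward_leaks = uses_future_data(walk_forward(100, 5, min_train=50))
print(random_folds_leak, walk_forward_leaks)  # True False
```

The price of the walk-forward scheme is that the first `min_train` observations are never held out, so the meta-stage sees fewer out-of-fold returns than it would under a full partition.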
References
- Wolpert, D. H. (1992). "Stacked Generalization." Neural Networks, 5(2), 241-259.
- Breiman, L. (1996). "Stacked Regressions." Machine Learning, 24(1), 49-64.
- van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). "Super Learner." Statistical Applications in Genetics and Molecular Biology, 6(1).
- skfolio documentation: skfolio.optimization.StackingOptimization.