Nested Clustered Optimization

A machine-learning-informed portfolio construction method that uses hierarchical clustering to reduce the dimensionality of the optimization problem and mitigate the instability of traditional mean-variance approaches.

Overview

Nested Clustered Optimization (NCO) addresses the well-known numerical instability of mean-variance optimization by decomposing the full optimization problem into smaller, better-conditioned sub-problems. The method proceeds in three stages: (1) cluster the assets into groups based on their correlation structure, (2) optimize within each cluster independently, and (3) optimize across clusters using the reduced representation. This two-level "nesting" dramatically reduces the effective dimensionality and the condition number of each sub-problem, leading to more stable and diversified portfolios.

NCO builds on the Hierarchical Risk Parity (HRP) framework but replaces the recursive bisection allocation step with a proper optimization at each level, allowing the use of any convex objective function (such as minimum variance, maximum Sharpe ratio, or risk parity) at both the intra-cluster and inter-cluster stages.

Step 1: Correlation-Based Distance and Clustering

Distance Metric

The first step converts the correlation matrix into a proper distance metric. The standard angular distance transforms correlations into distances that satisfy the metric axioms (non-negativity, symmetry, triangle inequality):

$$d_{ij} = \sqrt{\tfrac{1}{2}\,(1 - \rho_{ij})}$$

where $\rho_{ij}$ is the correlation between assets $i$ and $j$. Perfectly correlated assets ($\rho_{ij} = 1$) have distance zero, uncorrelated assets ($\rho_{ij} = 0$) have distance $1/\sqrt{2} \approx 0.707$, and perfectly anti-correlated assets ($\rho_{ij} = -1$) have distance one.
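As an illustration, the mapping from correlation to distance, taken here as the square root of (1 - correlation) / 2, can be sketched in a few lines of NumPy. The function name `correlation_to_distance` and the toy matrix are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def correlation_to_distance(corr):
    """Map a correlation matrix to the distance sqrt((1 - rho) / 2)."""
    corr = np.asarray(corr, dtype=float)
    # Clipping guards against tiny negative values caused by rounding when rho is near 1.
    dist = np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, 1.0))
    np.fill_diagonal(dist, 0.0)  # each asset is at distance zero from itself
    return dist

# Toy matrix: asset 0 is uncorrelated with asset 1 and perfectly
# anti-correlated with asset 2 (illustrative values only).
corr = np.array([
    [1.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
])
dist = correlation_to_distance(corr)
```

In this toy case the three reference values from the text appear directly in the first row of `dist`: zero on the diagonal, one over the square root of two for the uncorrelated pair, and one for the anti-correlated pair.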

Hierarchical Clustering

Using the distance matrix, agglomerative hierarchical clustering groups assets into clusters. Common linkage methods include single linkage, complete linkage, and Ward's method. The number of clusters can be determined by the gap statistic, silhouette scores, or a fixed threshold on the dendrogram. Each cluster contains a subset of assets that are more correlated with each other than with assets in other clusters.
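The clustering step might look like the following sketch, which assumes SciPy is available and uses complete linkage with a fixed number of clusters. The toy correlation matrix and the choice of two clusters are illustrative assumptions; in practice the cluster count would come from one of the selection criteria mentioned above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy correlation matrix: assets 0-1 move together, as do assets 2-3.
corr = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
dist = np.sqrt((1.0 - corr) / 2.0)

# linkage expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(dist, checks=False)
tree = linkage(condensed, method="complete")

# Cut the dendrogram so that at most two clusters remain.
labels = fcluster(tree, t=2, criterion="maxclust")
```

Complete linkage is used here rather than Ward's method because SciPy's Ward implementation assumes Euclidean input distances, which is a separate modelling decision.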

Step 2: Intra-Cluster Optimization

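For each cluster $k$, a standard portfolio optimization is performed using only the assets in that cluster. Let $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ denote the expected return vector and covariance matrix restricted to the assets in cluster $k$. The intra-cluster optimal weights are:

$$\mathbf{w}_k^{\text{intra}} = \arg\max_{\mathbf{w}} \; f(\mathbf{w};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

where $f$ is the chosen objective function (e.g., maximum Sharpe ratio, minimum variance, or risk parity), written here as a maximization so that a minimization objective such as variance enters with a negative sign. The key advantage is that each cluster contains far fewer assets than the full universe, so the covariance sub-matrix is much smaller and typically better conditioned.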
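A minimal sketch of the intra-cluster stage, assuming the minimum-variance objective with a full-investment constraint and no long-only restriction, for which the solution has a closed form proportional to the inverse covariance matrix times a vector of ones. The helper names and the cluster assignment below are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested minimum-variance weights: inv(cov) @ 1, rescaled to sum to one."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # solves cov @ raw = 1 without forming the inverse
    return raw / raw.sum()

def intra_cluster_weights(cov, labels):
    """Return {cluster label: (asset indices, weights within that cluster)}."""
    cov = np.asarray(cov, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        sub_cov = cov[np.ix_(idx, idx)]  # covariance restricted to cluster k
        result[int(k)] = (idx, min_variance_weights(sub_cov))
    return result

cov = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.00],
    [0.00, 0.00, 0.16],
])
labels = [1, 1, 2]  # hypothetical cluster assignment
weights = intra_cluster_weights(cov, labels)
```

Other objectives, such as maximum Sharpe ratio or risk parity, would replace `min_variance_weights` with a different solver while the per-cluster loop stays the same.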

Cluster-Level Statistics

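Once the intra-cluster weights are determined, each cluster is reduced to a single "meta-asset" with the following return and variance:

$$\tilde{\mu}_k = (\mathbf{w}_k^{\text{intra}})^\top \boldsymbol{\mu}_k, \qquad \tilde{\sigma}_k^2 = (\mathbf{w}_k^{\text{intra}})^\top \boldsymbol{\Sigma}_k \, \mathbf{w}_k^{\text{intra}}$$

Here $\mathbf{w}_k^{\text{intra}}$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ are the intra-cluster weights, expected returns, and covariance matrix of cluster $k$. These meta-asset statistics form the inputs for the inter-cluster optimization. The covariance between meta-assets $k$ and $l$ is computed as:

$$\tilde{\sigma}_{kl} = (\mathbf{w}_k^{\text{intra}})^\top \boldsymbol{\Sigma}_{kl} \, \mathbf{w}_l^{\text{intra}}$$

where $\boldsymbol{\Sigma}_{kl}$ is the cross-covariance block between clusters $k$ and $l$.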
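All of the meta-asset statistics can be computed at once by stacking the intra-cluster weights into an N-by-K matrix with one column per cluster. The sketch below assumes the intra-cluster weights are already available as a single length-N vector; the function name `reduce_to_meta_assets` and the numbers are illustrative.

```python
import numpy as np

def reduce_to_meta_assets(mu, cov, labels, intra):
    """Collapse N assets into K meta-assets.

    intra is a length-N vector holding each asset's weight inside its own cluster.
    """
    mu, cov, intra = (np.asarray(a, dtype=float) for a in (mu, cov, intra))
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Column k of W holds cluster k's intra-cluster weights and zeros elsewhere.
    W = np.zeros((len(labels), len(clusters)))
    for col, k in enumerate(clusters):
        members = labels == k
        W[members, col] = intra[members]
    mu_meta = W.T @ mu        # entry k: weighted return of cluster k
    cov_meta = W.T @ cov @ W  # entry (k, l): weights of k, cross-covariance block, weights of l
    return mu_meta, cov_meta, W

mu = np.array([0.05, 0.07, 0.10])
cov = np.array([
    [0.04, 0.01, 0.02],
    [0.01, 0.09, 0.00],
    [0.02, 0.00, 0.16],
])
labels = [1, 1, 2]                  # hypothetical cluster assignment
intra = np.array([0.5, 0.5, 1.0])   # hypothetical intra-cluster weights
mu_meta, cov_meta, W = reduce_to_meta_assets(mu, cov, labels, intra)
```

The diagonal of `cov_meta` holds the meta-asset variances and the off-diagonal entries hold the covariances between meta-assets, so no separate loop over cluster pairs is needed.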

Step 3: Inter-Cluster Optimization

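The inter-cluster optimization determines the allocation across the meta-assets. Let $\tilde{\boldsymbol{\mu}}$ and $\tilde{\boldsymbol{\Sigma}}$ denote the $K$-dimensional return vector and $K \times K$ covariance matrix of the meta-assets. The inter-cluster weights are:

$$\mathbf{w}^{\text{inter}} = \arg\max_{\mathbf{w}} \; f(\mathbf{w};\, \tilde{\boldsymbol{\mu}}, \tilde{\boldsymbol{\Sigma}})$$

This is a much smaller optimization problem (dimension $K$, the number of clusters, instead of $N$, the number of assets), which is faster to solve and far less prone to numerical instability. The same objective function $f$ can be used at both levels, or different objectives can be chosen for each stage.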
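The stability claim can be illustrated on a toy covariance matrix by comparing condition numbers before and after the reduction. The sketch assumes two clusters of three unit-variance assets, with correlation 0.9 inside a cluster and 0.2 across clusters, and equal intra-cluster weights; all of these values are chosen purely for illustration.

```python
import numpy as np

# Toy covariance: two clusters of three unit-variance assets, correlation 0.9
# inside a cluster and 0.2 across clusters (illustrative values).
n_per, intra_rho, inter_rho = 3, 0.9, 0.2
block = np.full((n_per, n_per), intra_rho)
cross = np.full((n_per, n_per), inter_rho)
cov = np.block([[block, cross], [cross, block]])
np.fill_diagonal(cov, 1.0)

# Equal intra-cluster weights, stacked as the columns of an N x K matrix.
W = np.kron(np.eye(2), np.full((n_per, 1), 1.0 / n_per))
cov_meta = W.T @ cov @ W

cond_full = np.linalg.cond(cov)       # condition number of the 6 x 6 problem
cond_meta = np.linalg.cond(cov_meta)  # condition number of the 2 x 2 problem
```

For this particular matrix the full problem has a condition number of 34, while the reduced problem has one of roughly 1.5. The size of the gap depends on how strongly assets are correlated within clusters, so the figures should not be read as typical.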

Final Portfolio Weights

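The final weight for each individual asset is obtained by combining the inter-cluster allocation with the intra-cluster weights. For asset $i$ belonging to cluster $k$:

$$w_i = w_k^{\text{inter}} \cdot w_{k,i}^{\text{intra}}$$

where $w_k^{\text{inter}}$ is the inter-cluster weight assigned to cluster $k$, and $w_{k,i}^{\text{intra}}$ is the intra-cluster weight of asset $i$ within cluster $k$. The weights automatically sum to one if both the intra-cluster and inter-cluster weights are fully invested, since summing over the assets of each cluster gives $w_k^{\text{inter}}$ and summing those over clusters gives one.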
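Putting the stages together, the following sketch computes final weights for a given cluster assignment, assuming the minimum-variance objective at both levels and no long-only constraint. The function name `nco_min_variance`, the covariance values, and the labels are hypothetical and stand in for whatever objective and clustering are actually chosen.

```python
import numpy as np

def min_variance_weights(cov):
    """Fully invested minimum-variance weights via a linear solve."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

def nco_min_variance(cov, labels):
    """Combine intra- and inter-cluster minimum-variance weights into final weights."""
    cov = np.asarray(cov, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Column k of W holds the intra-cluster weights of cluster k.
    W = np.zeros((len(labels), len(clusters)))
    for col, k in enumerate(clusters):
        idx = np.flatnonzero(labels == k)
        W[idx, col] = min_variance_weights(cov[np.ix_(idx, idx)])
    # Inter-cluster weights are computed on the reduced K x K covariance matrix.
    inter = min_variance_weights(W.T @ cov @ W)
    # Each asset's final weight is its cluster's weight times its intra-cluster weight.
    return W @ inter

# Block-diagonal toy covariance: the two clusters are uncorrelated with each other.
cov = np.array([
    [0.04, 0.01, 0.00, 0.00],
    [0.01, 0.09, 0.00, 0.00],
    [0.00, 0.00, 0.16, 0.02],
    [0.00, 0.00, 0.02, 0.25],
])
labels = [1, 1, 2, 2]  # hypothetical cluster assignment
w = nco_min_variance(cov, labels)
```

Because the toy covariance matrix is block-diagonal, the nested solution coincides with the unconstrained global minimum-variance portfolio, which makes a convenient sanity check. With non-zero cross-cluster covariances the two solutions generally differ.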

Advantages

Numerical stability: By decomposing into smaller sub-problems, the condition number of each covariance sub-matrix is dramatically reduced, leading to more stable solutions.
Flexibility: Any convex objective function can be used at each level, enabling combinations such as risk parity within clusters and maximum Sharpe across clusters.
Reduced overfitting: The clustering step acts as a regularizer, grouping similar assets and reducing the effective number of free parameters in the optimization.
Interpretability: The two-level structure provides a natural decomposition of portfolio decisions into "within-sector" and "across-sector" choices.

Limitations

Cluster sensitivity: Results depend on the choice of clustering algorithm, linkage method, and number of clusters.
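Non-convexity of clustering: The clustering step introduces a discrete, non-convex component. Small changes in the estimated correlations can change cluster membership, and randomly initialized algorithms such as k-means may produce different results with different random seeds.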
Information loss: Cross-cluster correlations are summarized through meta-assets, potentially losing information about individual asset interactions across clusters.

References

Lopez de Prado, M. & Lewis, M.J. (2019). "Detection of False Investment Strategies Using Unsupervised Learning Methods." Quantitative Finance, 19(9), 1555-1565.
Lopez de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press, Elements in Quantitative Finance.
Lopez de Prado, M. (2016). "Building Diversified Portfolios that Outperform Out of Sample." The Journal of Portfolio Management, 42(4), 59-69.