Deflated Sharpe Ratio (DSR)

The probability that a strategy's Sharpe ratio is real once you correct for the fact that it was selected as the best (or one of several) among many optimization methods tried on the same data. It is the Probabilistic Sharpe Ratio measured against a deflated benchmark that accounts for multiple testing.

Introduced by Bailey and López de Prado (2014) as the selection-bias complement to the Sharpe Ratio (SR), the Probabilistic Sharpe Ratio (PSR), and the Sharpe Stability Ratio (SSR).

The Problem It Solves

When you run several portfolio optimization methods on the same universe and the same history, then pick the one with the highest Sharpe ratio, that maximum is biased upward. Even if none of the methods has any genuine skill, the best of $N$ independent draws from a zero-mean Sharpe distribution will, in expectation, be strictly positive and grow with $N$. Reporting that winning Sharpe as if it were the only thing ever tried overstates its significance. This is the classic multiple-testing (selection bias) problem, also known in this context as backtest overfitting.
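The upward bias of the maximum can be checked with a small simulation. This is a minimal sketch, not part of the DSR method itself: it assumes each zero-skill Sharpe estimate is a standard-Normal draw, and the function name `expected_max_of_null_sharpes` and its parameters are hypothetical.

```python
import random
import statistics


def expected_max_of_null_sharpes(n_trials, n_sims=20_000, seed=0):
    """Monte Carlo estimate of the expected best Sharpe among zero-skill trials.

    Assumption: each trial's Sharpe estimate is an independent
    standard-Normal draw, i.e. true Sharpe zero with unit dispersion.
    """
    rng = random.Random(seed)
    maxima = [
        max(rng.gauss(0.0, 1.0) for _ in range(n_trials))
        for _ in range(n_sims)
    ]
    return statistics.fmean(maxima)


if __name__ == "__main__":
    # The average winning Sharpe rises with the number of trials,
    # even though every trial has zero true skill.
    for n in (1, 2, 5, 10, 20):
        print(n, round(expected_max_of_null_sharpes(n), 3))
```

With a single trial the average is close to zero; the gap that opens up as the trial count grows is the selection bias that the deflated benchmark is meant to absorb.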

The Probabilistic Sharpe Ratio (PSR) already corrects the raw Sharpe for finite-sample uncertainty, non-Normal returns (skewness and kurtosis), and serial correlation. But PSR compares your Sharpe against a single fixed benchmark (usually zero). It has no notion that you tried many methods and kept the best. The Deflated Sharpe Ratio fixes exactly this gap: it raises the benchmark from zero to the expected maximum Sharpe you would have obtained under zero skill across trials, then asks the PSR question against that higher, deflated bar.

When It Appears

DSR is only meaningful when more than one optimization method is compared in a run. The deflated benchmark is defined in terms of the number of trials and the dispersion of their Sharpe ratios, so a single-method run has nothing to deflate against.

N > 1 (two or more methods): the DSR card is shown, alongside the deflated benchmark and the trial count.
N = 1 (single method): no selection took place, so DSR is not reported. In this degenerate case DSR would coincide exactly with PSR, since the deflated benchmark collapses to the ordinary benchmark.

Mathematical Formulation

The deflated benchmark

Let $N$ be the number of optimization methods compared, and let $V$ be the (annualized) variance of their Sharpe ratios. The expected maximum Sharpe under the null of zero skill across $N$ independent trials is approximated by

$$SR_0 = \sqrt{V}\,\left[(1-\gamma)\,\Phi^{-1}\!\left(1-\frac{1}{N}\right) + \gamma\,\Phi^{-1}\!\left(1-\frac{1}{N e}\right)\right]$$

where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant, $\Phi^{-1}$ is the inverse standard-Normal CDF, and $e$ is Euler's number. This is the extreme-value expectation of the maximum of $N$ standard-Normal Sharpe draws, rescaled by the observed cross-trial dispersion $\sqrt{V}$. Note that $SR_0$ grows with both $N$ (more trials raise the bar) and $V$ (more disagreement between methods raises the bar).
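The expected-maximum approximation described above can be sketched in a few lines using only the Python standard library. The function name `deflated_benchmark` and its arguments are hypothetical, and returning zero for a single trial is an assumption that mirrors the single-method convention described earlier, since the formula itself is not defined there.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def deflated_benchmark(n_trials, sharpe_variance):
    """Approximate expected maximum Sharpe across zero-skill trials.

    n_trials        -- number of methods compared (N)
    sharpe_variance -- variance of the trials' Sharpe ratios (V)
    """
    if sharpe_variance < 0.0:
        raise ValueError("variance must be non-negative")
    if n_trials < 2:
        # Assumption: with one trial there is nothing to deflate
        # against, so fall back to the ordinary zero benchmark.
        return 0.0
    inv_cdf = NormalDist().inv_cdf  # inverse standard-Normal CDF
    g = EULER_MASCHERONI
    z = (1.0 - g) * inv_cdf(1.0 - 1.0 / n_trials) \
        + g * inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * z


if __name__ == "__main__":
    # Illustrative inputs only: 10 methods, cross-trial variance 0.25.
    print(deflated_benchmark(10, 0.25))
```

The result is in the same units as the square root of the variance passed in, so the trial Sharpe ratios and the selected Sharpe should use the same annualization.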

The DSR itself

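With the deflated benchmark $SR_0$ in hand, the Deflated Sharpe Ratio is simply the PSR evaluated against $SR_0$ instead of zero:

$$\mathrm{DSR} = \Phi\!\left(\frac{\widehat{SR} - SR_0}{\hat{\sigma}_{SR}}\right)$$

where $\widehat{SR}$ is the selected strategy's annualized Sharpe, $\Phi$ is the standard-Normal CDF, and $\hat{\sigma}_{SR}$ is the higher-moment-adjusted standard error of the Sharpe estimator, the same one used by the PSR:

$$\hat{\sigma}_{SR} = \sqrt{\frac{1 - \gamma_3\,\widehat{SR} + \frac{\gamma_4 - 1}{4}\,\widehat{SR}^2}{T - 1}}$$

with $\gamma_3$ the skewness, $\gamma_4$ the kurtosis (non-excess, so $\gamma_4 = 3$ for Normal returns), and $T$ the number of return observations. Setting $SR_0 = 0$ (which happens when $N = 1$) recovers the ordinary PSR exactly, which is why DSR is best understood as "PSR against a deflated benchmark."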
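The "PSR against a deflated benchmark" reading can be made concrete with a short sketch. All function names here are hypothetical. The code assumes the Sharpe ratio and the benchmark are expressed at the same frequency as the return observations counted in `n_obs`; how annualized inputs are rescaled is not specified in the text, so that step is left out.

```python
import math
from statistics import NormalDist


def sharpe_standard_error(sr_hat, skew, kurt, n_obs):
    """Higher-moment-adjusted standard error of a Sharpe estimate.

    kurt is the raw (non-excess) kurtosis, so Normal returns have kurt = 3.
    Assumption: sr_hat is measured per return observation.
    """
    if n_obs < 2:
        raise ValueError("need at least two return observations")
    numerator = 1.0 - skew * sr_hat + (kurt - 1.0) / 4.0 * sr_hat ** 2
    if numerator <= 0.0:
        raise ValueError("moment inputs give a non-positive variance")
    return math.sqrt(numerator / (n_obs - 1))


def probabilistic_sharpe(sr_hat, benchmark, skew, kurt, n_obs):
    """Probability that the true Sharpe exceeds the given benchmark."""
    se = sharpe_standard_error(sr_hat, skew, kurt, n_obs)
    return NormalDist().cdf((sr_hat - benchmark) / se)


def deflated_sharpe(sr_hat, sr0, skew, kurt, n_obs):
    """DSR: the PSR evaluated against the deflated benchmark sr0."""
    return probabilistic_sharpe(sr_hat, sr0, skew, kurt, n_obs)


if __name__ == "__main__":
    # Illustrative inputs only, not taken from any real backtest.
    args = dict(sr_hat=0.10, skew=-0.5, kurt=5.0, n_obs=1000)
    print("PSR vs 0   :", probabilistic_sharpe(benchmark=0.0, **args))
    print("DSR vs 0.05:", deflated_sharpe(sr0=0.05, **args))
```

Keeping `deflated_sharpe` as a thin wrapper around `probabilistic_sharpe` is a deliberate choice: the only difference between the two metrics is the benchmark passed in, so a benchmark of zero reproduces the PSR by construction.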

How To Read It

DSR is a probability in $[0, 1]$: the probability that the selected method's Sharpe is genuinely above the deflated benchmark rather than a product of selection luck across the $N$ trials. Higher is better.

DSR	Verdict	Interpretation
Close to $1$ (at or above a conventional confidence level)	Survives the selection bar	Strong evidence the Sharpe is real even after accounting for $N$ methods being compared.
Above $0.5$ but short of that level	Borderline	The selected Sharpe clears the deflated bar on balance, but not at a conventional confidence level.
$\le 0.5$	Likely selection luck	The Sharpe is at or below what you would expect from the best of $N$ skill-free trials; treat it as probably overfit.

The card also surfaces the deflated benchmark $SR_0$ (the annualized Sharpe you must beat to be non-spurious), the number of trials $N$, and the cross-trial variance $V$. Comparing your reported Sharpe directly against $SR_0$ gives a quick, model-free read: a Sharpe far above the benchmark drives DSR toward one, a Sharpe equal to it gives a DSR of exactly 0.5, and a Sharpe far below it drives DSR toward zero.

How DSR Relates to PSR and SSR

DSR, PSR, and SSR are complementary Sharpe-ratio diagnostics. They share the same higher-moment-adjusted standard error but answer different questions.

Metric	Question	Benchmark	Adjustment
SR	Ex-post risk-adjusted return?	None	Mean over volatility
PSR	Credible at this sample?	$0$ (fixed)	Higher moments + sample size
DSR	Did I cherry-pick from N methods?	$SR_0$ (deflated)	PSR plus number of trials + dispersion
SSR	Stable across time?	on rolling SR	HAC-robust temporal dispersion

The key relationship is that DSR is PSR against a deflated benchmark. PSR asks whether your Sharpe beats zero given sampling noise; DSR raises that bar to $SR_0$, the expected best Sharpe under zero skill across $N$ trials. When $N = 1$ the bar is zero and DSR equals PSR. SSR is orthogonal to both: a strategy can clear PSR and DSR (credible and not cherry-picked) yet still have a lumpy, episodic time series that SSR flags. Read the three together for a complete picture.

Advantages & Limitations

Advantages

Corrects selection bias directly: penalizes the act of comparing many methods and keeping the best, which PSR ignores.
Closed-form and cheap: requires only the trial count, the cross-trial Sharpe variance, and the same moments PSR already computes.
Nests PSR: reduces exactly to PSR when $N = 1$, so it is a strict, interpretable generalization.
Accounts for non-Normality: inherits the skewness- and kurtosis-adjusted Sharpe standard error from the PSR framework.

Limitations

Trial count is approximate: here $N$ is the number of optimization methods in the run, which understates the true number of configurations explored across many runs.
Assumes a Normal extreme-value bound: the approximation is derived under Gaussian Sharpe draws across trials.
Independence assumption: the expected-maximum formula treats trials as independent; strongly correlated methods make the bar conservative.
Not a temporal-stability check: DSR says nothing about whether performance is consistent over time. Pair it with SSR.

References

Bailey, D. H., & López de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality." The Journal of Portfolio Management, 40(5), 94-107. doi:10.3905/jpm.2014.40.5.094 (ssrn:2460551).
Bailey, D. H., & López de Prado, M. (2012). "The Sharpe Ratio Efficient Frontier." Journal of Risk, 15(2), 3-44. doi:10.21314/JOR.2012.255.
López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley, Chapter 12 (Backtesting through cross-validation) and Chapter 14 (Backtest statistics).