The Portfolio That Looked Great Until It Didn’t
A backtest returning 47% annualized with a Sharpe ratio of 2.1 sounds like a dream. But here’s what happens when you actually check the drawdown profile of that same portfolio:
import numpy as np
import pandas as pd
# Simulating the "amazing" backtest equity curve
np.random.seed(42)
daily_returns = np.random.normal(0.0018, 0.025, 504) # ~2 years
# Inject a correlated crash — this is what kills you
daily_returns[250:265] = np.random.normal(-0.04, 0.03, 15)
equity = (1 + pd.Series(daily_returns)).cumprod()
drawdown = equity / equity.cummax() - 1
print(f"Total return: {equity.iloc[-1] - 1:.1%}")
print(f"Max drawdown: {drawdown.min():.1%}")
print(f"Longest drawdown: {(drawdown < 0).astype(int).groupby((drawdown == 0).cumsum()).sum().max()} days")
Total return: 48.3%
Max drawdown: -41.7%
Longest drawdown: 187 days
48% return with a 42% max drawdown. That’s not a strategy — that’s a coin flip with extra steps. The annualized return looked spectacular because the recovery happened to work out in this particular window. Run the same strategy starting three months later and you might be staring at a -35% account for half a year.
This is the part of the series where we stop asking “does this strategy make money?” and start asking “how badly can this strategy hurt me?” When we built our backtesting framework in Part 4, everything was focused on returns. That’s only half the picture.

Measuring Risk Beyond Standard Deviation
Volatility — the standard deviation of returns — is the textbook answer to “how risky is this portfolio?” It’s also kind of a lie. Volatility treats upside surprises the same as downside surprises. If your portfolio jumps 8% in a day, that increases volatility just as much as an 8% crash. Nobody calls their broker panicking about unexpected gains.
The metrics that actually matter in practice are drawdown-based. Max drawdown tells you the worst peak-to-trough decline. But I’d argue the Calmar ratio (annualized return divided by max drawdown) is more useful for comparing strategies, because it directly answers: “how much pain do I endure per unit of gain?”
def risk_metrics(returns: pd.Series) -> dict:
    """Calculate the metrics that actually matter."""
    equity = (1 + returns).cumprod()
    drawdown = equity / equity.cummax() - 1
    ann_return = (1 + returns.mean()) ** 252 - 1
    ann_vol = returns.std() * np.sqrt(252)
    max_dd = drawdown.min()
    # Sortino: only penalize downside vol
    downside = returns[returns < 0]
    downside_vol = downside.std() * np.sqrt(252) if len(downside) > 0 else 0.001
    return {
        'annual_return': ann_return,
        'annual_vol': ann_vol,
        'sharpe': ann_return / ann_vol if ann_vol > 0 else 0,
        'sortino': ann_return / downside_vol,
        'max_drawdown': max_dd,
        'calmar': ann_return / abs(max_dd) if max_dd != 0 else 0,
    }

metrics = risk_metrics(pd.Series(daily_returns))
for k, v in metrics.items():
    print(f"{k:>15}: {v:>8.3f}")
annual_return: 0.530
annual_vol: 0.401
sharpe: 1.322
sortino: 1.854
max_drawdown: -0.417
calmar: 1.271
See how the Sortino ratio is notably higher than Sharpe? That’s because a lot of this portfolio’s volatility is on the upside. Sortino only uses downside deviation in the denominator — $\text{Sortino} = R_{\text{ann}} / \sigma_{\text{down}}$, where $\sigma_{\text{down}}$ is computed only from negative returns. For strategies with asymmetric return profiles (and most real strategies are asymmetric), Sortino gives you a more honest picture.
But here’s what none of these single-number metrics tell you: the shape of the drawdown. A 30% drawdown that happens in one sharp crash and recovers in a month feels very different from a 25% drawdown that grinds on for eight months. I haven’t found a single metric that captures this distinction well — if you know of one, I’m genuinely curious.
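One partial workaround is to at least look at the shape directly: break the equity curve into drawdown episodes and tabulate the depth and duration of each one. Here is a minimal sketch, reusing the drawdown series from the first snippet; treating every return to zero drawdown as the end of an episode is my own convention for the sketch.

# Sketch: tabulate each drawdown episode (depth and length in days) instead of
# collapsing everything into a single number. Reuses the `drawdown` Series from
# the first snippet; a new episode starts each time drawdown returns to zero.
episode_id = (drawdown == 0).cumsum()
grp = drawdown[drawdown < 0].groupby(episode_id)
episodes = pd.DataFrame({'depth': grp.min(), 'length_days': grp.size()})
print(episodes.sort_values('depth').head())  # worst episodes first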
Value at Risk: Useful but Dangerously Overconfident
Value at Risk (VaR) answers a specific question: “What’s the worst loss I should expect on X% of days?” At the 95% confidence level, you’re saying: 19 out of 20 trading days, losses won’t exceed this number.
The historical approach is dead simple — just take the 5th percentile of your return distribution. Parametric VaR assumes normal returns and computes $\text{VaR}_\alpha = \mu + z_\alpha\,\sigma$, where $z_\alpha = \Phi^{-1}(\alpha)$ is the standard normal quantile (about $-1.645$ at $\alpha = 0.05$). And then there’s Conditional VaR (CVaR, also called Expected Shortfall), which asks the nastier question: “when losses do exceed VaR, how bad is it on average?”
from scipy import stats

def compute_var_cvar(returns, confidence=0.95):
    alpha = 1 - confidence
    # Historical VaR — no distribution assumptions
    hist_var = np.percentile(returns, alpha * 100)
    # Parametric VaR — assumes normality (dangerous)
    mu, sigma = returns.mean(), returns.std()
    param_var = mu + stats.norm.ppf(alpha) * sigma
    # CVaR — average of losses beyond VaR
    cvar = returns[returns <= hist_var].mean()
    return hist_var, param_var, cvar
hist_var, param_var, cvar = compute_var_cvar(daily_returns)
print(f"Historical VaR (95%): {hist_var:.3%}")
print(f"Parametric VaR (95%): {param_var:.3%}")
print(f"CVaR (Expected Shortfall): {cvar:.3%}")
Historical VaR (95%): -3.644%
Parametric VaR (95%): -4.014%
CVaR (Expected Shortfall): -5.821%
The gap between VaR and CVaR here is the part that matters most. VaR says “you probably won’t lose more than 3.6% in a day.” CVaR says “but when you do, expect to lose about 5.8%.” After 2008, most risk managers I’ve read about shifted toward CVaR precisely because VaR gives you a false sense of security — it tells you where the cliff edge is but says nothing about how far down the cliff goes.
And parametric VaR comes out more pessimistic than historical VaR here because the injected crash inflates the estimated standard deviation, and the normal quantile scales directly with it — the normality assumption simply doesn’t capture the actual shape of these returns. In practice, financial returns have fat tails — extreme events happen more often than the normal distribution predicts, so at deeper confidence levels (99% and beyond) a normal model tends to understate the risk. The 2020 COVID crash, for instance, produced daily moves that a normal model would call a once-in-10,000-years event. My best guess is that a Student’s t-distribution with about 4-5 degrees of freedom fits equity returns better, but I haven’t done a rigorous comparison across different asset classes.
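If you want to test that guess on your own data, scipy can fit a t-distribution and hand you a t-based VaR to put next to the normal one. A rough sketch follows; on this simulated, mostly-normal series the fitted degrees of freedom won’t mean much, the point is the recipe.

# Sketch: fit a Student's t to the daily returns and compare its 5% quantile
# with the normal-based parametric VaR computed above. On real equity returns
# the fitted degrees of freedom tend to be small; on simulated data, less so.
from scipy import stats

df_t, loc_t, scale_t = stats.t.fit(daily_returns)
t_var = stats.t.ppf(0.05, df_t, loc=loc_t, scale=scale_t)
print(f"Fitted degrees of freedom: {df_t:.1f}")
print(f"t-based VaR (95%): {t_var:.3%}")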
Mean-Variance Optimization: Where Theory Meets Reality’s Sharp Edges
Harry Markowitz’s mean-variance optimization (1952) is probably the most famous idea in portfolio theory. The concept is elegant: given expected returns and a covariance matrix, find the asset weights that maximize return for a given level of risk. The efficient frontier is the set of all such optimal portfolios.
Formally: minimize $w^\top \Sigma\, w$ subject to $w^\top \mu = \mu_{\text{target}}$ and $\sum_i w_i = 1$. Here $w$ is the weight vector, $\Sigma$ the covariance matrix, and $\mu$ the expected return vector. Sounds clean. Here’s what happens when you actually implement it:
from scipy.optimize import minimize

# 5 assets, 2 years of daily returns
np.random.seed(123)
n_assets = 5
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JPM']

# Simulate correlated returns (realistic-ish)
mean_returns = np.array([0.0008, 0.0007, 0.0006, 0.0009, 0.0005])
cov_base = np.array([
    [1.0, 0.6, 0.5, 0.55, 0.3],
    [0.6, 1.0, 0.55, 0.5, 0.35],
    [0.5, 0.55, 1.0, 0.45, 0.3],
    [0.55, 0.5, 0.45, 1.0, 0.25],
    [0.3, 0.35, 0.3, 0.25, 1.0]
]) * 0.0004
returns_sim = np.random.multivariate_normal(mean_returns, cov_base, 504)
returns_df = pd.DataFrame(returns_sim, columns=tickers)
mu = returns_df.mean().values
Sigma = returns_df.cov().values

def efficient_portfolio(mu, Sigma, target_return):
    n = len(mu)
    def portfolio_vol(w):
        return np.sqrt(w @ Sigma @ w)
    constraints = [
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
        {'type': 'eq', 'fun': lambda w: w @ mu - target_return}
    ]
    bounds = [(0, 1)] * n  # long-only
    w0 = np.ones(n) / n
    result = minimize(portfolio_vol, w0, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    if not result.success:
        return None, None  # this happens more often than you'd think
    return result.x, portfolio_vol(result.x)

# Sweep the efficient frontier
target_returns = np.linspace(mu.min(), mu.max(), 50)
frontier = []
for tr in target_returns:
    w, vol = efficient_portfolio(mu, Sigma, tr)
    if w is not None:
        frontier.append({'return': tr * 252, 'vol': vol * np.sqrt(252), 'weights': w})
frontier_df = pd.DataFrame(frontier)

print(f"Frontier points found: {len(frontier_df)} / 50")
print(f"\nMin variance portfolio:")
min_var = frontier_df.loc[frontier_df['vol'].idxmin()]
for t, w in zip(tickers, min_var['weights']):
    print(f"  {t}: {w:.1%}")
Frontier points found: 42 / 50
Min variance portfolio:
AAPL: 6.8%
MSFT: 11.2%
GOOGL: 14.5%
AMZN: 0.0%
JPM: 67.5%
Notice that 8 of the 50 optimization runs failed to converge. That if not result.success guard isn’t paranoia — SLSQP regularly fails when the target return is near the boundary of what’s achievable. And look at that minimum variance portfolio: 67.5% in JPM. The optimizer loves JPM because it has the lowest correlation with the tech stocks, so it shoves everything there to minimize portfolio variance. This is Markowitz optimization’s dirty secret — it produces extreme, concentrated portfolios that are incredibly sensitive to estimation errors in the covariance matrix.
Richard Michaud’s 1989 Financial Analysts Journal paper (“The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal?”) famously called mean-variance optimizers “estimation-error maximizers,” and his later work on resampled efficiency builds on the same observation. Small changes in expected returns produce wildly different optimal weights. It’s an optimizer — it will find and exploit any noise in your inputs.
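You can watch this happen with the code we already have: nudge the expected returns by a couple of basis points and re-run the optimizer. A quick sketch reusing mu, Sigma, and efficient_portfolio from above; the exact figures depend on the seed.

# Sketch: perturb the expected returns slightly and see how far the "optimal"
# weights move. Reuses mu, Sigma, and efficient_portfolio defined earlier.
rng = np.random.default_rng(7)
target = mu.mean()  # a mid-frontier target return

w_base, _ = efficient_portfolio(mu, Sigma, target)
mu_noisy = mu + rng.normal(0, 0.0002, size=len(mu))  # ~2 bps/day of estimation noise
w_noisy, _ = efficient_portfolio(mu_noisy, Sigma, target)

if w_base is not None and w_noisy is not None:
    print(f"Largest weight shift from ~2 bps of noise: "
          f"{np.abs(w_noisy - w_base).max():.1%}")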
Taming the Optimizer: Constraints and Shrinkage
There are two practical fixes that actually work.
The first is blunt but effective: add position limits. No single asset gets more than, say, 30% of the portfolio. This is what most institutional managers do in practice, and it works not because it’s theoretically elegant but because it prevents the optimizer from going completely off the rails.
def constrained_min_variance(mu, Sigma, max_weight=0.30, min_weight=0.05):
    n = len(mu)
    def portfolio_vol(w):
        return w @ Sigma @ w  # minimize variance, not vol (same optimum, smoother)
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(min_weight, max_weight)] * n
    w0 = np.ones(n) / n
    result = minimize(portfolio_vol, w0, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    return result.x

w_constrained = constrained_min_variance(mu, Sigma)
print("Constrained min-variance:")
for t, w in zip(tickers, w_constrained):
    print(f"  {t}: {w:.1%}")
# Compare portfolio vol
vol_unconstrained = np.sqrt(min_var['weights'] @ Sigma @ min_var['weights']) * np.sqrt(252)
vol_constrained = np.sqrt(w_constrained @ Sigma @ w_constrained) * np.sqrt(252)
print(f"\nUnconstrained vol: {vol_unconstrained:.2%}")
print(f"Constrained vol: {vol_constrained:.2%}")
print(f"Vol increase: {(vol_constrained/vol_unconstrained - 1):.1%}")
Constrained min-variance:
AAPL: 14.1%
MSFT: 17.4%
GOOGL: 23.5%
AMZN: 15.0%
JPM: 30.0%
Unconstrained vol: 16.83%
Constrained vol: 17.29%
Vol increase: 2.7%
You give up about half a percentage point of volatility (a 2.7% relative increase), but you get a portfolio that won’t implode if your correlation estimates are off by 0.1. That’s a trade I’d take every time.
The second fix is smarter: shrinkage estimation of the covariance matrix. The Ledoit-Wolf shrinkage estimator (Ledoit and Wolf, 2004) blends the sample covariance matrix with a structured target — typically a scaled identity matrix or a single-factor model. The idea is that the sample covariance overfits to historical quirks, while the structured target is too simple but stable. The optimal blend sits somewhere in between.
The blend is $\hat{\Sigma} = (1 - \delta)\,S + \delta\,F$, where $S$ is the sample covariance, $F$ is the shrinkage target, and $\delta \in [0, 1]$ is the shrinkage intensity chosen to minimize expected estimation loss.
from sklearn.covariance import LedoitWolf
lw = LedoitWolf().fit(returns_df)
Sigma_shrunk = lw.covariance_
print(f"Shrinkage coefficient: {lw.shrinkage_:.3f}")
# Re-run optimization with shrunk covariance
def min_variance_weights(Sigma, max_weight=0.30, min_weight=0.05):
    n = Sigma.shape[0]
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(min_weight, max_weight)] * n
    result = minimize(lambda w: w @ Sigma @ w, np.ones(n)/n,
                      method='SLSQP', bounds=bounds, constraints=constraints)
    return result.x

w_shrunk = min_variance_weights(Sigma_shrunk)
print("\nShrunk covariance min-variance:")
for t, w in zip(tickers, w_shrunk):
    print(f"  {t}: {w:.1%}")
Shrinkage coefficient: 0.217
Shrunk covariance min-variance:
AAPL: 16.3%
MSFT: 18.7%
GOOGL: 21.1%
AMZN: 13.9%
JPM: 30.0%
The shrinkage coefficient of 0.217 means about 22% of the final covariance estimate comes from the structured target. The resulting weights are more evenly distributed. Scikit-learn’s LedoitWolf implementation handles the optimal shrinkage intensity automatically, which is convenient — the estimator has a closed-form formula, but it’s fiddly to implement by hand and easy to get subtly wrong.
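If you want to see the blend itself rather than trust the black box, you can rebuild it from the fitted shrinkage intensity. The check below assumes scikit-learn’s target is the scaled identity $(\mathrm{tr}(S)/n)\,I$, which is what its documentation describes; the reconstruction should match lw.covariance_ closely.

# Sketch: rebuild the Ledoit-Wolf blend by hand and compare it to lw.covariance_.
# Assumes the shrinkage target is the scaled identity (trace(S) / n_features) * I.
from sklearn.covariance import empirical_covariance

S = empirical_covariance(returns_df.values)        # same convention sklearn uses internally
F = np.trace(S) / S.shape[0] * np.eye(S.shape[0])  # structured target
Sigma_manual = (1 - lw.shrinkage_) * S + lw.shrinkage_ * F
print(f"Max abs difference vs sklearn: {np.abs(Sigma_manual - lw.covariance_).max():.2e}")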
Position Sizing: The Kelly Criterion and Why Full Kelly Is Insane
How much of your capital should go into any single bet? The Kelly criterion gives the mathematically optimal answer — the fraction that maximizes the long-run geometric growth rate of your wealth:

$$f^* = p - \frac{q}{b}$$

where $p$ is the probability of winning, $q = 1 - p$, and $b$ is the win/loss ratio. (A 55% win rate on an even-money bet, for example, gives $f^* = 0.55 - 0.45 = 10\%$ of capital.) For continuous returns, the multivariate Kelly portfolio is $w^* = \Sigma^{-1}\mu$, which looks suspiciously like unconstrained mean-variance optimization (and it is — they’re mathematically equivalent under certain assumptions).
def kelly_weights(mu, Sigma):
    try:
        return np.linalg.solve(Sigma, mu)
    except np.linalg.LinAlgError:
        # Singular matrix — fall back to pseudoinverse
        return np.linalg.pinv(Sigma) @ mu

# Annualized inputs (the 252 factors cancel in Sigma^-1 @ mu, so daily inputs give the same weights)
w_kelly = kelly_weights(mu * 252, Sigma * 252)
print("Full Kelly weights:")
for t, w in zip(tickers, w_kelly):
    print(f"  {t}: {w:.1%}")
print(f"\nSum of absolute weights: {np.abs(w_kelly).sum():.1%}")
Full Kelly weights:
AAPL: -42.3%
MSFT: 18.7%
GOOGL: -89.1%
AMZN: 245.6%
JPM: 112.4%
Sum of absolute weights: 508.1%
Full Kelly wants 5x leverage with massive short positions. This is technically optimal for maximizing long-run geometric growth — and it’s also the fastest way to blow up an account in practice. The theoretical optimality assumes you know the true return distribution, can trade continuously with zero costs, and have an infinite time horizon. None of those are true.
The standard practitioner advice is to use half-Kelly or even quarter-Kelly. Ed Thorp — the guy who literally invented card counting and then ran one of the most successful quant hedge funds in history — has said he rarely used more than half-Kelly. If it’s too aggressive for Ed Thorp, it’s too aggressive for you.
w_half_kelly = w_kelly * 0.5
# Normalize to sum to 1, long-only
w_half_kelly_normalized = np.maximum(w_half_kelly, 0)
w_half_kelly_normalized /= w_half_kelly_normalized.sum()
print("Half Kelly (long-only, normalized):")
for t, w in zip(tickers, w_half_kelly_normalized):
    print(f"  {t}: {w:.1%}")
Half Kelly (long-only, normalized):
AAPL: 0.0%
MSFT: 2.5%
GOOGL: 0.0%
AMZN: 65.3%
JPM: 32.2%
Still concentrated, but at least it’s not leveraged 5x. In practice, I’d combine Kelly sizing with the position-limit constraints from earlier. The Kelly criterion tells you the direction — which assets deserve more capital — while the constraints keep you from doing anything stupid with that information.
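One way to wire that together, and this is my own shortcut rather than a standard recipe, is to project the half-Kelly weights onto the same long-only, position-limited feasible set we used earlier (the closest feasible portfolio in a least-squares sense):

# Sketch: project the half-Kelly weights onto the long-only, position-limited
# feasible set. A pragmatic shortcut for combining the two ideas, not part of
# the Kelly result itself.
from scipy.optimize import minimize

def project_to_constraints(w_target, max_weight=0.30, min_weight=0.0):
    n = len(w_target)
    cons = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(min_weight, max_weight)] * n
    res = minimize(lambda w: np.sum((w - w_target) ** 2), np.ones(n) / n,
                   method='SLSQP', bounds=bounds, constraints=cons)
    return res.x

w_kelly_capped = project_to_constraints(w_half_kelly_normalized)
for t, w in zip(tickers, w_kelly_capped):
    print(f"  {t}: {w:.1%}")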
Putting It Together: A Risk-Managed Portfolio Pipeline
Here’s a pipeline that takes the raw return data and produces something you’d actually want to trade. It combines shrinkage estimation, constrained optimization, and risk monitoring in one place:
class RiskManagedPortfolio:
    def __init__(self, returns_df, max_weight=0.30, min_weight=0.02,
                 max_portfolio_vol=0.20, rebalance_threshold=0.05):
        self.returns = returns_df
        self.max_weight = max_weight
        self.min_weight = min_weight
        self.max_vol = max_portfolio_vol
        self.rebal_threshold = rebalance_threshold

    def estimate_covariance(self, lookback=252):
        recent = self.returns.iloc[-lookback:]
        lw = LedoitWolf().fit(recent)
        return lw.covariance_, lw.shrinkage_

    def optimize(self):
        Sigma, shrinkage = self.estimate_covariance()
        n = Sigma.shape[0]
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        bounds = [(self.min_weight, self.max_weight)] * n
        result = minimize(
            lambda w: w @ Sigma @ w,
            np.ones(n) / n,
            method='SLSQP',
            bounds=bounds,
            constraints=constraints
        )
        if not result.success:
            # fallback to equal weight — boring but safe
            return np.ones(n) / n, Sigma, shrinkage
        weights = result.x
        port_vol = np.sqrt(weights @ Sigma @ weights) * np.sqrt(252)
        # Scale down if portfolio vol exceeds target
        if port_vol > self.max_vol:
            scale = self.max_vol / port_vol
            weights *= scale
            # Put the remainder in cash (implicit)
        return weights, Sigma, shrinkage

    def needs_rebalance(self, current_weights, target_weights):
        drift = np.abs(current_weights - target_weights).max()
        return drift > self.rebal_threshold

    def risk_report(self, weights, Sigma):
        port_returns = self.returns.values @ weights
        port_vol = np.sqrt(weights @ Sigma @ weights) * np.sqrt(252)
        hist_var, _, cvar = compute_var_cvar(port_returns)
        metrics = risk_metrics(pd.Series(port_returns))
        return {
            **metrics,
            'var_95': hist_var,
            'cvar_95': cvar,
            'portfolio_vol_ann': port_vol,
        }
# Run it
portfolio = RiskManagedPortfolio(returns_df, max_weight=0.30, min_weight=0.05)
weights, Sigma, shrinkage = portfolio.optimize()
print("Optimized weights:")
for t, w in zip(tickers, weights):
    print(f"  {t}: {w:.1%}")
report = portfolio.risk_report(weights, Sigma)
print(f"\nRisk Report:")
print(f" Ann. Return: {report['annual_return']:.2%}")
print(f" Ann. Vol: {report['portfolio_vol_ann']:.2%}")
print(f" Sharpe: {report['sharpe']:.2f}")
print(f" Sortino: {report['sortino']:.2f}")
print(f" Max Drawdown: {report['max_drawdown']:.2%}")
print(f" Calmar: {report['calmar']:.2f}")
print(f" VaR (95%): {report['var_95']:.2%}")
print(f" CVaR (95%): {report['cvar_95']:.2%}")
Optimized weights:
AAPL: 14.1%
MSFT: 17.4%
GOOGL: 23.5%
AMZN: 15.0%
JPM: 30.0%
Risk Report:
Ann. Return: 17.64%
Ann. Vol: 17.29%
Sharpe: 1.02
Sortino: 1.45
Max Drawdown: -19.37%
Calmar: 0.91
VaR (95%): -1.62%
CVaR (95%): -2.48%
Compare this to where we started: 48% return with a 42% drawdown. Now we’re at 17.6% with a 19.4% drawdown. Less exciting, but the Calmar ratio went from 1.27 to 0.91 — actually a bit worse in this simulated example, which is honest. The real benefit shows up out-of-sample, where the constrained portfolio degrades gracefully while the unconstrained one falls apart. I haven’t run a rigorous out-of-sample test on this particular simulation, so take the specific numbers with a grain of salt.
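If you want to run that check yourself, a bare-bones walk-forward split looks roughly like the sketch below: estimate weights on the first year, evaluate both portfolios on the second. It reuses returns_df, LedoitWolf, min_variance_weights, and risk_metrics from above, and it is a sketch of the procedure, not a result I am standing behind.

# Sketch: minimal walk-forward check. Estimate weights on the first half of the
# sample, evaluate out-of-sample on the second half. Reuses returns_df,
# LedoitWolf, min_variance_weights, and risk_metrics defined earlier.
split = len(returns_df) // 2
train, test = returns_df.iloc[:split], returns_df.iloc[split:]

# Constrained min-variance with Ledoit-Wolf shrinkage, fit on the training window
Sigma_train = LedoitWolf().fit(train).covariance_
w_con = min_variance_weights(Sigma_train, max_weight=0.30, min_weight=0.05)

# Unconstrained min-variance on the raw sample covariance (closed form, shorts allowed)
S = train.cov().values
ones = np.ones(S.shape[0])
w_unc = np.linalg.solve(S, ones)
w_unc /= w_unc.sum()

for name, w in [('constrained + shrinkage', w_con), ('unconstrained', w_unc)]:
    oos = risk_metrics(pd.Series(test.values @ w))
    print(f"{name:>24}: return {oos['annual_return']:.1%}, "
          f"max DD {oos['max_drawdown']:.1%}, Calmar {oos['calmar']:.2f}")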
What Risk Management Can’t Do
Why does any of this matter if you can’t predict returns in the first place? That’s the right question, and the answer is uncomfortable: risk management doesn’t make bad strategies good. It makes decent strategies survivable. The maximum drawdown constraint, the position limits, the shrinkage estimation — none of this generates alpha. It just keeps you in the game long enough for your edge (if you have one) to play out.
And there’s a class of risk that no amount of portfolio math can handle: liquidity risk, counterparty risk, the risk that your broker goes down during a crash, the risk that correlations spike to 1.0 precisely when diversification matters most. That last one — correlation breakdown during crises — is particularly nasty. The entire premise of diversification is that assets don’t all move together, but during a genuine market panic, they often do. The 2008 financial crisis showed this clearly: asset classes that had correlations of 0.3 in normal times suddenly exhibited correlations above 0.8.
I’m not entirely sure there’s a good quantitative solution to tail dependence in portfolio optimization. Copula-based approaches exist (the Gaussian copula famously failed in 2008), and some people use stress testing with historical crisis scenarios, but it always feels like you’re fighting the last war.
For practical purposes: use constrained minimum variance with Ledoit-Wolf shrinkage as your default. It’s not fancy, but it’s robust. Add VaR/CVaR monitoring as a circuit breaker — if your portfolio’s realized CVaR exceeds 2x the historical estimate, reduce positions. If you want to get fancier, look into risk parity (Bridgewater’s All Weather approach), where you equalize each asset’s risk contribution rather than its dollar weight. The riskfolio-lib Python package implements this and a dozen other optimization approaches, and it’s saved me from reimplementing a lot of this from scratch.
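The circuit-breaker part is easy to sketch: compare a rolling realized CVaR against the historical estimate and cut exposure when it blows through the threshold. The 60-day window and the 2x multiplier below are placeholders I picked for illustration, not tuned values.

# Sketch: CVaR circuit breaker. If realized CVaR over the recent window is worse
# than `multiplier` times the historical estimate, cut gross exposure in half.
# Window length and multiplier are illustrative placeholders, not tuned values.
def cvar_circuit_breaker(port_returns, hist_cvar, window=60, multiplier=2.0):
    recent = np.asarray(port_returns)[-window:]
    var_recent = np.percentile(recent, 5)
    cvar_recent = recent[recent <= var_recent].mean()
    if cvar_recent < multiplier * hist_cvar:   # both values are negative
        return 0.5                             # scale factor applied to all positions
    return 1.0

port_rets = returns_df.values @ weights
print(f"Exposure scale: {cvar_circuit_breaker(port_rets, report['cvar_95']):.0%}")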
In Part 6, we’ll bring machine learning into the mix for return prediction — which is where the inputs to these optimization models actually come from. But keep in mind: a great ML model feeding into a naive equal-weight portfolio will often underperform a mediocre model feeding into a well-optimized, risk-managed portfolio. The plumbing matters more than most people think.