Backtesting Frameworks: Building Your First Trading Strategy

Updated Feb 6, 2026

Why Most Backtests Lie

Run a simple moving average crossover strategy on Apple stock from 2010 to 2020, and you’ll probably see 200%+ returns. Wonderful. Now run it on 2021-2023 data and watch it lose 40%. The problem isn’t the strategy—it’s that you built it to fit the past, not predict the future.

Backtesting frameworks exist to catch this before you lose real money. A good framework forces you to separate in-sample (where you tune parameters) from out-of-sample (where you test if it actually works). It tracks every trade, every commission, every slippage assumption. And if you’re doing it right, it makes you uncomfortable—because seeing your brilliant idea fail on held-out data is supposed to hurt a little.

This part walks through building a complete backtesting system from scratch, then compares it to vectorbt and backtrader. We’ll implement a dual moving average strategy (the simplest thing that could possibly work) and discover why the obvious approach—vectorized operations on the full dataset—produces gorgeous but meaningless results.


The Obvious Approach (That Breaks Everything)

Here’s what everyone tries first. You have a DataFrame with OHLCV data from Part 2, you calculate two moving averages, and you trade when they cross:

import pandas as pd
import numpy as np

# Assume df has columns: ['open', 'high', 'low', 'close', 'volume']
df['sma_fast'] = df['close'].rolling(window=20).mean()
df['sma_slow'] = df['close'].rolling(window=50).mean()

# Buy when fast crosses above slow, sell when it crosses below
df['signal'] = 0
df.loc[df['sma_fast'] > df['sma_slow'], 'signal'] = 1
df.loc[df['sma_fast'] <= df['sma_slow'], 'signal'] = -1

# Calculate returns
df['position'] = df['signal'].shift(1)  # Trade next day
df['strategy_return'] = df['position'] * df['close'].pct_change()
df['cumulative_return'] = (1 + df['strategy_return']).cumprod()

print(f"Final return: {df['cumulative_return'].iloc[-1]:.2%}")

This runs in milliseconds. It’s clean, vectorized, leverages pandas. And it’s completely wrong.

Why? Because df['sma_fast'] at row 100 uses data from rows 81-100. When you’re sitting at row 100 in real time, you don’t have access to the final close price for that day until market close. But this code calculates the moving average using that close, then assumes you can trade at that same close. You’ve leaked future information into your backtest. The technical term is look-ahead bias, and it’s behind the vast majority of backtests that look amazing in Jupyter but explode in production.

The fix is painful: you need to iterate row-by-row, making decisions based only on data available before that row. And that means giving up vectorization.

Building a Point-in-Time Backtest Engine

Here’s a minimal event-driven backtester that doesn’t cheat:

class SimpleBacktester:
    def __init__(self, initial_capital=100000, commission=0.001):
        self.initial_capital = initial_capital
        self.commission = commission  # 0.1% per trade
        self.reset()

    def reset(self):
        self.cash = self.initial_capital
        self.position = 0  # shares held
        self.equity_curve = []
        self.trades = []

    def run(self, df, strategy_func):
        """Run backtest row-by-row. strategy_func returns 1 (buy), -1 (sell), 0 (hold)"""
        self.reset()

        for i in range(len(df)):
            # Only use data up to (and including) current row
            history = df.iloc[:i+1].copy()

            if i < 50:  # Need 50 days for slow MA
                self.equity_curve.append(self.cash)
                continue

            current_price = history['close'].iloc[-1]
            signal = strategy_func(history)

            # Execute trades
            if signal == 1 and self.position == 0:  # Buy
                shares_to_buy = int(self.cash / (current_price * (1 + self.commission)))
                if shares_to_buy > 0:
                    cost = shares_to_buy * current_price * (1 + self.commission)
                    self.cash -= cost
                    self.position = shares_to_buy
                    self.trades.append({
                        'date': history.index[-1],
                        'type': 'BUY',
                        'price': current_price,
                        'shares': shares_to_buy,
                        'value': cost
                    })

            elif signal == -1 and self.position > 0:  # Sell
                proceeds = self.position * current_price * (1 - self.commission)
                self.cash += proceeds
                self.trades.append({
                    'date': history.index[-1],
                    'type': 'SELL',
                    'price': current_price,
                    'shares': self.position,
                    'value': proceeds
                })
                self.position = 0

            # Track portfolio value
            portfolio_value = self.cash + (self.position * current_price)
            self.equity_curve.append(portfolio_value)

        # One equity point is recorded per bar, so the curve aligns with the full index
        return pd.Series(self.equity_curve, index=df.index)

def sma_crossover_strategy(history, fast=20, slow=50):
    """Returns 1 (buy), -1 (sell), 0 (hold) based on SMA crossover"""
    if len(history) < slow:
        return 0

    # Compute each rolling mean once, then compare the last two values
    # to see whether a cross just happened
    fast_series = history['close'].rolling(window=fast).mean()
    slow_series = history['close'].rolling(window=slow).mean()

    sma_fast, prev_fast = fast_series.iloc[-1], fast_series.iloc[-2]
    sma_slow, prev_slow = slow_series.iloc[-1], slow_series.iloc[-2]

    if prev_fast <= prev_slow and sma_fast > sma_slow:
        return 1  # Golden cross
    elif prev_fast >= prev_slow and sma_fast < sma_slow:
        return -1  # Death cross

    return 0  # Hold current position

# Usage
bt = SimpleBacktester(initial_capital=100000, commission=0.001)
equity = bt.run(df, lambda h: sma_crossover_strategy(h, fast=20, slow=50))

final_return = (equity.iloc[-1] - 100000) / 100000
print(f"Total return: {final_return:.2%}")
print(f"Number of trades: {len(bt.trades)}")

This takes about 2 seconds to run on 5 years of daily data (roughly 1250 rows). The vectorized version took 10ms. That’s a 200x slowdown, and it’s the price of correctness.

Notice the guard in sma_crossover_strategy: we only trade when the moving averages just crossed, not when one is simply above the other. Without this, you’d generate buy signals on every single day the fast MA is above the slow MA, which would attempt to buy repeatedly while already holding a position. (In a real system, you’d track position state more carefully, but this works for a demo.)

What Actually Matters: Performance Metrics

Returns alone are useless. A strategy that makes 50% but crashes 80% along the way is worse than one that makes 30% with a 15% max drawdown. Here’s what you actually need to measure:

def calculate_metrics(equity_curve, trades_df, risk_free_rate=0.02):
    """Calculate standard backtest metrics"""
    returns = equity_curve.pct_change().dropna()

    # Total return
    total_return = (equity_curve.iloc[-1] / equity_curve.iloc[0]) - 1

    # Annualized return (assumes daily data)
    days = len(equity_curve)
    years = days / 252  # Trading days per year
    annualized_return = (1 + total_return) ** (1 / years) - 1

    # Volatility (annualized)
    volatility = returns.std() * np.sqrt(252)

    # Sharpe ratio
    excess_return = annualized_return - risk_free_rate
    sharpe = excess_return / volatility if volatility > 0 else 0

    # Maximum drawdown
    cummax = equity_curve.cummax()
    drawdown = (equity_curve - cummax) / cummax
    max_drawdown = drawdown.min()

    # Win rate (requires trade log)
    if len(trades_df) > 1:
        # Match buys with sells to calculate P&L per trade
        buy_trades = trades_df[trades_df['type'] == 'BUY'].copy()
        sell_trades = trades_df[trades_df['type'] == 'SELL'].copy()

        trade_returns = []
        for i in range(min(len(buy_trades), len(sell_trades))):
            buy_price = buy_trades.iloc[i]['price']
            sell_price = sell_trades.iloc[i]['price']
            ret = (sell_price - buy_price) / buy_price
            trade_returns.append(ret)

        win_rate = sum(1 for r in trade_returns if r > 0) / len(trade_returns) if trade_returns else 0
    else:
        win_rate = 0

    return {
        'Total Return': f"{total_return:.2%}",
        'Annualized Return': f"{annualized_return:.2%}",
        'Volatility': f"{volatility:.2%}",
        'Sharpe Ratio': f"{sharpe:.2f}",
        'Max Drawdown': f"{max_drawdown:.2%}",
        'Win Rate': f"{win_rate:.2%}",
        'Total Trades': len(trades_df)
    }

trades_df = pd.DataFrame(bt.trades)
metrics = calculate_metrics(equity, trades_df, risk_free_rate=0.02)
for k, v in metrics.items():
    print(f"{k}: {v}")

The Sharpe ratio formula is:

\text{Sharpe} = \frac{R_p - R_f}{\sigma_p}

where $R_p$ is the portfolio return, $R_f$ is the risk-free rate (usually T-bills, around 2-4%), and $\sigma_p$ is the portfolio volatility. A Sharpe above 1.0 is decent, above 2.0 is excellent, above 3.0 is suspiciously good (check for bugs).
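
As a quick sanity check with made-up numbers: a strategy returning 12% annualized with 15% annualized volatility, against a 2% risk-free rate, works out to

\text{Sharpe} = \frac{0.12 - 0.02}{0.15} \approx 0.67

which is solidly mediocre on the scale above.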

Maximum drawdown is the largest peak-to-trough decline:

\text{MDD} = \max_{t} \left( \frac{\text{Peak}_t - \text{Trough}_t}{\text{Peak}_t} \right)

If your equity curve hit $150k then dropped to $100k, that’s a 33% drawdown. Professional funds often have risk limits like “shut down the strategy if MDD exceeds 20%.”

Win rate is deceptive. A strategy with 90% win rate can still lose money if the 10% of losing trades are catastrophic. You’d rather have 40% win rate with a 3:1 reward-to-risk ratio.
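
To make that concrete, here’s a quick expectancy calculation with hypothetical numbers (the win rates and payoff sizes below are invented for illustration, not taken from the backtest above):

# Expected return per trade = P(win) * avg_win - P(loss) * avg_loss
# All numbers are hypothetical, purely to illustrate the point
high_win_rate = 0.90 * 0.01 - 0.10 * 0.12   # 90% winners at +1%, 10% losers at -12%
low_win_rate  = 0.40 * 0.03 - 0.60 * 0.01   # 40% winners at +3%, 60% losers at -1%

print(f"90% win rate, ugly losers: {high_win_rate:+.4f} per trade")  # -0.0030
print(f"40% win rate, 3:1 payoff:  {low_win_rate:+.4f} per trade")   # +0.0060

The 90% winner loses money on average; the 40% winner makes it. Expectancy, not win rate, is what compounds.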

Using vectorbt: When You Need Speed

Writing your own backtester is educational, but for production work you want something battle-tested. vectorbt brings back vectorization while avoiding look-ahead bias through careful API design:

import vectorbt as vbt
import yfinance as yf

# Download data (vectorbt plays nicely with yfinance)
data = yf.download('AAPL', start='2020-01-01', end='2024-01-01')['Close']

# Calculate indicators
fast_ma = vbt.MA.run(data, window=20)
slow_ma = vbt.MA.run(data, window=50)

# Generate entry/exit signals
entries = fast_ma.ma_crossed_above(slow_ma)
exits = fast_ma.ma_crossed_below(slow_ma)

# Run backtest
pf = vbt.Portfolio.from_signals(
    data,
    entries,
    exits,
    init_cash=100000,
    fees=0.001,  # 0.1% commission
    freq='1D'
)

print(f"Total return: {pf.total_return():.2%}")
print(f"Sharpe ratio: {pf.sharpe_ratio():.2f}")
print(f"Max drawdown: {pf.max_drawdown():.2%}")
print(f"Win rate: {pf.trades.win_rate:.2%}")

# Plot equity curve
pf.plot().show()

This runs in under 100ms on the same dataset that took our custom backtester 2 seconds. How? vectorbt pre-computes the entire signal array using ma_crossed_above(), which compares $\text{MA}_{t-1}$ and $\text{MA}_t$ to detect crosses, so no future data leaks in. Then it simulates trades in a compiled loop (Numba under the hood).
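
For intuition, here’s roughly what that cross detection computes, sketched in plain pandas (my approximation of the logic, not vectorbt’s actual implementation):

# Approximate pandas equivalent of "crossed above" / "crossed below":
# above slow today AND not above slow on the previous bar
fast = data.rolling(20).mean()
slow = data.rolling(50).mean()

crossed_above = (fast > slow) & (fast.shift(1) <= slow.shift(1))
crossed_below = (fast < slow) & (fast.shift(1) >= slow.shift(1))

Everything on the right-hand side is known by the close of bar t, which is what keeps the vectorized version honest.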

The Portfolio object gives you everything: Sharpe, Sortino, Calmar ratio, trade analysis, drawdown periods. And because everything is precomputed and compiled, parameter sweeps are cheap:

# Test multiple parameter combinations
windows = np.arange(10, 100, 10)  # Fast MA from 10 to 90
results = []

for fast_window in windows:
    fast_ma = vbt.MA.run(data, window=fast_window)
    slow_ma = vbt.MA.run(data, window=50)

    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)

    pf = vbt.Portfolio.from_signals(data, entries, exits, init_cash=100000, fees=0.001, freq='1D')
    results.append({
        'fast_window': fast_window,
        'sharpe': pf.sharpe_ratio(),
        'return': pf.total_return()
    })

results_df = pd.DataFrame(results)
print(results_df.sort_values('sharpe', ascending=False).head())

This kind of parameter sweep would take minutes with our custom backtester. vectorbt does it in seconds. But—and this is critical—don’t optimize on your entire dataset. Split into train (e.g., 2020-2022) and test (2023-2024). Find the best parameters on train, then verify they still work on test. If the Sharpe drops from 2.5 to 0.3, you overfit.
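
A minimal sketch of that split, reusing the sweep above (the date boundaries come from the text; run_sma is a hypothetical helper, not part of vectorbt):

# Train/test split: tune on the first chunk, judge on the last
train = data.loc['2020-01-01':'2022-12-31']
test = data.loc['2023-01-01':]

def run_sma(series, fast_window, slow_window=50):
    """Backtest one SMA-crossover parameter set on a price series."""
    fast_ma = vbt.MA.run(series, window=fast_window)
    slow_ma = vbt.MA.run(series, window=slow_window)
    return vbt.Portfolio.from_signals(
        series,
        fast_ma.ma_crossed_above(slow_ma),
        fast_ma.ma_crossed_below(slow_ma),
        init_cash=100000, fees=0.001, freq='1D'
    )

# Pick the fast window with the best in-sample Sharpe...
train_sharpes = {w: run_sma(train, w).sharpe_ratio() for w in windows}
best_w = max(train_sharpes, key=train_sharpes.get)

# ...then see whether it survives out of sample
print(f"Train Sharpe: {train_sharpes[best_w]:.2f}")
print(f"Test Sharpe:  {run_sma(test, best_w).sharpe_ratio():.2f}")

If the test Sharpe collapses relative to the train Sharpe, that’s the overfitting warning described above.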

When to Use backtrader Instead

If you need more control—custom order types, stop losses, multi-asset portfolios—backtrader is the go-to. It’s event-driven like our custom engine, but with a mature API:

import backtrader as bt

class SMAStrategy(bt.Strategy):
    params = (
        ('fast', 20),
        ('slow', 50),
    )

    def __init__(self):
        self.sma_fast = bt.indicators.SimpleMovingAverage(
            self.data.close, period=self.params.fast
        )
        self.sma_slow = bt.indicators.SimpleMovingAverage(
            self.data.close, period=self.params.slow
        )
        self.crossover = bt.indicators.CrossOver(self.sma_fast, self.sma_slow)

    def next(self):
        if not self.position:  # Not in the market
            if self.crossover > 0:  # Fast crossed above slow
                self.buy()
        elif self.crossover < 0:  # Fast crossed below slow
            self.close()

# Set up cerebro engine
cerebro = bt.Cerebro()
cerebro.addstrategy(SMAStrategy, fast=20, slow=50)

# Load data (backtrader expects specific format)
data = bt.feeds.PandasData(dataname=df)  # df from Part 2
cerebro.adddata(data)

cerebro.broker.setcash(100000)
cerebro.broker.setcommission(commission=0.001)

print(f"Starting portfolio value: {cerebro.broker.getvalue():.2f}")
cerebro.run()
print(f"Final portfolio value: {cerebro.broker.getvalue():.2f}")

cerebro.plot()  # Generates matplotlib chart

The next() method gets called on every bar (daily candle in this case). You have access to self.data.close[0] (today’s close), self.data.close[-1] (yesterday’s close), etc. No risk of look-ahead bias because the framework only passes historical data to each next() call.

What makes backtrader powerful is the ecosystem of built-in indicators (100+), analyzers (Sharpe, drawdown, trade stats), and order types. Want a trailing stop loss?

def next(self):
    if not self.position:
        if self.crossover > 0:
            self.buy()
            self.sell(exectype=bt.Order.StopTrail, trailpercent=0.05)  # 5% trailing stop
    elif self.crossover < 0:
        self.close()

That one line adds a trailing stop that automatically adjusts as price moves in your favor. Implementing this in our custom backtester would take 50+ lines and probably have bugs.
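
For comparison, here’s the bare core of a trailing stop bolted onto the SimpleBacktester loop: a sketch that skips the parts that actually make it hard (intraday highs and lows, gaps, partial exits, interaction with other orders):

# Inside SimpleBacktester.run(), after a BUY you would also track the high-water
# mark (self.highest_close is a hypothetical attribute, not in the class above)
trail_pct = 0.05

if self.position > 0:
    self.highest_close = max(self.highest_close, current_price)  # set to entry price on BUY
    if current_price < self.highest_close * (1 - trail_pct):
        # Stop hit: liquidate at the close, pay commission on the way out
        self.cash += self.position * current_price * (1 - self.commission)
        self.position = 0

Even this toy version only checks the stop against closes, which is part of why a faithful implementation takes so much more code.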

But backtrader is verbose. For simple strategies, the boilerplate overwhelms the logic. And it’s slower than vectorbt—iterating bar-by-bar in Python has overhead that vectorized NumPy avoids.

The Walk-Forward Gotcha Nobody Mentions

Here’s a mistake I see constantly: someone optimizes parameters on 2020-2023 data, gets a Sharpe of 2.8, then tests on 2024 and gets 0.4. They conclude the strategy doesn’t work. But what actually happened is they overfit the specific volatility regime of 2020-2023 (COVID crash, recovery, rate hikes).

The fix is walk-forward analysis: optimize on a rolling window and test on the period immediately after.

# Pseudo-code for walk-forward: optimize() and backtest() stand in for your
# own parameter-search and evaluation functions
train_window = 252 * 2   # 2 years
test_window = 252 // 4   # ~3 months (63 trading days)
step = 252 // 4          # Re-optimize every 3 months

walk_forward_results = []

for start in range(0, len(data) - train_window - test_window, step):
    train_data = data[start:start + train_window]
    test_data = data[start + train_window:start + train_window + test_window]

    # Optimize parameters on train_data
    best_params = optimize(train_data)  # Returns (fast_window, slow_window)

    # Test on out-of-sample data
    test_sharpe = backtest(test_data, best_params)

    # Record result
    walk_forward_results.append(test_sharpe)

avg_oos_sharpe = np.mean(walk_forward_results)
print(f"Average out-of-sample Sharpe: {avg_oos_sharpe:.2f}")

If the average out-of-sample Sharpe is above 1.0, the strategy might be legit. Below 0.5, it’s probably curve-fit noise.

But even this isn’t bulletproof. If you test 100 different strategies and only publish the one with the best walk-forward Sharpe, you’ve just done selection bias at a higher level. The only real test is live trading with small capital—and that’s terrifying because it means risking actual money to validate your code.

Slippage and Other Lies We Tell Ourselves

Our backtest assumes you can always buy/sell at the close price with a fixed 0.1% commission. In reality:

  • Slippage: On a market order, you get filled at the ask (buying) or bid (selling), not the midpoint. For liquid stocks like AAPL, maybe 1-2 basis points. For small caps, could be 20+ bps.
  • Partial fills: If you’re trading $1M of a low-volume stock, your order moves the market. You won’t get filled at a single price.
  • After-hours gaps: Your backtest sees a signal at market close and assumes you trade at that price. In reality, you submit an order after hours and it executes at tomorrow’s open, which could gap 2% against you.

vectorbt and backtrader both let you model slippage:

# vectorbt
pf = vbt.Portfolio.from_signals(
    data, entries, exits,
    init_cash=100000,
    fees=0.001,
    slippage=0.0005  # 5 bps per trade
)

# backtrader
cerebro.broker.set_slippage_perc(0.0005)  # 0.05%

Add 5 bps slippage and 10 bps commission, and your “profitable” strategy might turn breakeven. This is why high-frequency strategies need Sharpe ratios above 5—tiny execution costs destroy the edge.
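
The back-of-the-envelope version, with a hypothetical 30 round trips a year (the turnover number is an assumption, not something measured above):

# Rough annual cost drag; assumes every trade pays the full fee and slippage
fees = 0.001          # 10 bps commission per side
slippage = 0.0005     # 5 bps slippage per side
round_trip_cost = 2 * (fees + slippage)    # ~0.30% per round trip

trades_per_year = 30  # hypothetical turnover
print(f"Cost drag: ~{round_trip_cost * trades_per_year:.1%} per year")  # ~9.0%

That’s roughly a 9% annual hurdle before the strategy makes a dime, which is how modest-looking costs quietly kill modest-looking edges.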

What I’d Actually Use

For quick experiments: vectorbt. It’s fast, the API is intuitive, and it has excellent visualization (heatmaps, tear sheets, trade plots). If I’m testing whether a simple idea has any merit, I can go from data to backtest in 10 lines.

For serious strategies: backtrader. The verbosity is annoying, but you get precise control over order execution, position sizing, and multi-timeframe logic. When you’re about to risk real money, you want the framework to handle edge cases (halted stocks, dividends, stock splits) without you having to think about it.

For learning: build your own. The exercise of implementing commission, slippage, and drawdown tracking forces you to understand what’s actually happening. Every professional quant I know has written at least one backtester from scratch, not because it’s better than existing tools, but because it clarifies the assumptions you’re making.

Next up: risk management and portfolio optimization. Because even if your backtest looks amazing, putting 100% of your capital into a single strategy is a great way to blow up spectacularly. We’ll cover position sizing, Kelly criterion, and why diversification actually matters when the correlations go to 1.0 during a crash (which is exactly when you need diversification to work).

Quant Investment with Python Series (4/8)
