Introduction
In the previous episode, we explored how to conduct effective EDA on stock price data, identifying patterns, trends, and anomalies that inform our modeling approach. Now we move into one of the most critical phases of any financial machine learning project: feature engineering.
Feature engineering is where domain knowledge meets data science. Raw stock prices alone rarely capture the complex dynamics of financial markets. By constructing technical indicators, statistical transformations, and temporal features, we create a richer representation that helps our models understand market behavior.
This episode focuses on practical feature engineering techniques specifically designed for financial time-series data, with hands-on examples using real Kaggle datasets.
Setting Up the Environment
Let’s start by loading a financial dataset from Kaggle. We’ll use the S&P 500 Stock Data dataset as our primary example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# Load sample S&P 500 data
df = pd.read_csv('sp500_stocks.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')
df.set_index('Date', inplace=True)
# Focus on a single stock for demonstration
stock_data = df[df['Symbol'] == 'AAPL'].copy()
print(stock_data.head())
Our baseline dataframe typically contains:
– Date: Trading date
– Open: Opening price
– High: Highest price during the day
– Low: Lowest price during the day
– Close: Closing price
– Volume: Number of shares traded
Understanding Returns: The Foundation
Simple Returns vs. Log Returns
Before diving into technical indicators, we must understand how to properly calculate returns. Financial analysts use two primary types:
Simple Return:

R_t = (P_t - P_{t-1}) / P_{t-1}

Where:
– R_t = simple return at time t
– P_t = price at time t
– P_{t-1} = price at time t-1

Log Return:

r_t = ln(P_t / P_{t-1})

Where:
– r_t = logarithmic return at time t
– ln = natural logarithm
Why prefer log returns?
- Time additivity: Log returns can be summed over time periods
- Statistical properties: More closely approximate normal distribution
- Symmetry: A log return of +x followed by one of -x brings the price back to its starting level; simple returns lack this property (a 50% gain followed by a 50% loss leaves you at 75% of the original: 1.5 × 0.5 = 0.75)
- Mathematical convenience: Easier to work with in statistical models
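The additivity property is easy to verify numerically. This short sketch, using made-up prices, checks that daily log returns sum to the log of the total holding-period return, while simple returns do not add up:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices over five days
prices = pd.Series([100.0, 110.0, 99.0, 104.0, 120.0])

simple = prices.pct_change().dropna()
log_ret = np.log(prices / prices.shift(1)).dropna()

# Log returns are time-additive: their sum equals log(P_end / P_start)
total_log = np.log(prices.iloc[-1] / prices.iloc[0])
print(np.isclose(log_ret.sum(), total_log))   # True

# Simple returns are not additive: their sum differs from the total return
total_simple = prices.iloc[-1] / prices.iloc[0] - 1
print(np.isclose(simple.sum(), total_simple))  # False
```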
def calculate_returns(df, price_col='Close'):
"""
Calculate both simple and log returns.
Parameters:
-----------
df : pd.DataFrame
DataFrame with price data
price_col : str
Column name containing prices
Returns:
--------
pd.DataFrame
DataFrame with added return columns
"""
# Simple returns
df['simple_return'] = df[price_col].pct_change()
# Log returns
df['log_return'] = np.log(df[price_col] / df[price_col].shift(1))
return df
stock_data = calculate_returns(stock_data)
# Compare distributions
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].hist(stock_data['simple_return'].dropna(), bins=50, alpha=0.7, color='blue')
axes[0].set_title('Simple Returns Distribution')
axes[0].set_xlabel('Return')
axes[1].hist(stock_data['log_return'].dropna(), bins=50, alpha=0.7, color='green')
axes[1].set_title('Log Returns Distribution')
axes[1].set_xlabel('Log Return')
plt.tight_layout()
plt.show()
Technical Indicators: Capturing Market Psychology
Relative Strength Index (RSI)
RSI measures the magnitude of recent price changes to evaluate overbought or oversold conditions. It ranges from 0 to 100.
RSI = 100 - 100 / (1 + RS)

Where:
– RS = Average Gain / Average Loss over a lookback period (typically 14 days)
Interpretation:
– RSI > 70: Potentially overbought (sell signal)
– RSI < 30: Potentially oversold (buy signal)
def calculate_rsi(df, column='Close', period=14):
"""
Calculate Relative Strength Index (RSI).
Parameters:
-----------
df : pd.DataFrame
DataFrame with price data
column : str
Column to calculate RSI on
period : int
Lookback period (default 14)
Returns:
--------
pd.Series
RSI values
"""
# Calculate price changes
delta = df[column].diff()
# Separate gains and losses
gain = delta.where(delta > 0, 0)
loss = -delta.where(delta < 0, 0)
# Calculate exponential moving averages
avg_gain = gain.ewm(com=period-1, min_periods=period).mean()
avg_loss = loss.ewm(com=period-1, min_periods=period).mean()
# Calculate RS and RSI
rs = avg_gain / avg_loss
rsi = 100 - (100 / (1 + rs))
return rsi
stock_data['RSI_14'] = calculate_rsi(stock_data)
# Visualize RSI with price
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
ax1.plot(stock_data.index, stock_data['Close'], label='Close Price', color='black')
ax1.set_ylabel('Price ($)')
ax1.set_title('Stock Price and RSI Indicator')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax2.plot(stock_data.index, stock_data['RSI_14'], label='RSI (14)', color='purple')
ax2.axhline(y=70, color='r', linestyle='--', label='Overbought (70)')
ax2.axhline(y=30, color='g', linestyle='--', label='Oversold (30)')
ax2.fill_between(stock_data.index, 30, 70, alpha=0.1, color='gray')
ax2.set_ylabel('RSI')
ax2.set_xlabel('Date')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Moving Average Convergence Divergence (MACD)
MACD is a trend-following momentum indicator that shows the relationship between two moving averages.
MACD = EMA_12 - EMA_26
Signal = EMA_9(MACD)
Histogram = MACD - Signal

Where:
– EMA_12 = 12-period exponential moving average of price
– EMA_26 = 26-period exponential moving average of price
– EMA_9(MACD) = 9-period EMA of the MACD line (the signal line)
Trading signals:
– MACD crosses above signal line: Bullish signal
– MACD crosses below signal line: Bearish signal
def calculate_macd(df, column='Close', fast=12, slow=26, signal=9):
"""
Calculate MACD, Signal line, and Histogram.
Parameters:
-----------
df : pd.DataFrame
DataFrame with price data
column : str
Column to calculate MACD on
fast : int
Fast EMA period (default 12)
slow : int
Slow EMA period (default 26)
signal : int
Signal line period (default 9)
Returns:
--------
tuple
(MACD, Signal, Histogram)
"""
# Calculate EMAs
ema_fast = df[column].ewm(span=fast, adjust=False).mean()
ema_slow = df[column].ewm(span=slow, adjust=False).mean()
# MACD line
macd_line = ema_fast - ema_slow
# Signal line
signal_line = macd_line.ewm(span=signal, adjust=False).mean()
# Histogram
histogram = macd_line - signal_line
return macd_line, signal_line, histogram
stock_data['MACD'], stock_data['MACD_signal'], stock_data['MACD_hist'] = calculate_macd(stock_data)
# Visualize MACD
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
ax1.plot(stock_data.index, stock_data['Close'], label='Close Price', color='black')
ax1.set_ylabel('Price ($)')
ax1.set_title('Stock Price and MACD Indicator')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax2.plot(stock_data.index, stock_data['MACD'], label='MACD', color='blue')
ax2.plot(stock_data.index, stock_data['MACD_signal'], label='Signal', color='red')
ax2.bar(stock_data.index, stock_data['MACD_hist'], label='Histogram', color='gray', alpha=0.3)
ax2.set_ylabel('MACD')
ax2.set_xlabel('Date')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
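The crossover rules above translate directly into a feature column. Here is a minimal, self-contained sketch on a synthetic price series (the `cross` column name is my own choice, not from the dataset):

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices for illustration
rng = np.random.default_rng(0)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 300)))

ema_fast = close.ewm(span=12, adjust=False).mean()
ema_slow = close.ewm(span=26, adjust=False).mean()
macd = ema_fast - ema_slow
signal = macd.ewm(span=9, adjust=False).mean()

# +1 on the bar where MACD crosses above the signal line (bullish),
# -1 where it crosses below (bearish), 0 elsewhere
above = (macd > signal).astype(int)
cross = above.diff().fillna(0)

print(int((cross == 1).sum()), "bullish crossovers,",
      int((cross == -1).sum()), "bearish crossovers")
```

Because crossovers alternate, the two counts can differ by at most one.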
Bollinger Bands
Bollinger Bands measure market volatility and provide relative price levels.
Middle Band = SMA_20
Upper Band = SMA_20 + 2σ_20
Lower Band = SMA_20 - 2σ_20

Where:
– SMA_20 = 20-period simple moving average
– σ_20 = 20-period rolling standard deviation
Interpretation:
– Price near upper band: Potentially overbought
– Price near lower band: Potentially oversold
– Band width indicates volatility
def calculate_bollinger_bands(df, column='Close', period=20, num_std=2):
"""
Calculate Bollinger Bands.
Parameters:
-----------
df : pd.DataFrame
DataFrame with price data
column : str
Column to calculate bands on
period : int
Moving average period (default 20)
num_std : float
Number of standard deviations (default 2)
Returns:
--------
tuple
(Middle band, Upper band, Lower band)
"""
# Middle band (SMA)
middle_band = df[column].rolling(window=period).mean()
# Standard deviation
std = df[column].rolling(window=period).std()
# Upper and lower bands
upper_band = middle_band + (num_std * std)
lower_band = middle_band - (num_std * std)
return middle_band, upper_band, lower_band
stock_data['BB_middle'], stock_data['BB_upper'], stock_data['BB_lower'] = calculate_bollinger_bands(stock_data)
# Calculate bandwidth as a feature
stock_data['BB_bandwidth'] = (stock_data['BB_upper'] - stock_data['BB_lower']) / stock_data['BB_middle']
# Visualize Bollinger Bands
plt.figure(figsize=(14, 7))
plt.plot(stock_data.index, stock_data['Close'], label='Close Price', color='black', linewidth=1.5)
plt.plot(stock_data.index, stock_data['BB_upper'], label='Upper Band', color='red', linestyle='--', alpha=0.7)
plt.plot(stock_data.index, stock_data['BB_middle'], label='Middle Band (SMA 20)', color='blue', linestyle='--', alpha=0.7)
plt.plot(stock_data.index, stock_data['BB_lower'], label='Lower Band', color='green', linestyle='--', alpha=0.7)
plt.fill_between(stock_data.index, stock_data['BB_upper'], stock_data['BB_lower'], alpha=0.1, color='gray')
plt.title('Bollinger Bands')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
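Beyond bandwidth, a popular derived feature is %B, which expresses where price sits inside the bands (0 = at the lower band, 1 = at the upper band, outside [0, 1] = beyond a band). A minimal sketch on synthetic data; the `percent_b` name is my own, not from the dataset:

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices for illustration
rng = np.random.default_rng(1)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 200)))

middle = close.rolling(20).mean()
std = close.rolling(20).std()
upper = middle + 2 * std
lower = middle - 2 * std

# %B: relative position of price within the bands
percent_b = (close - lower) / (upper - lower)

# With 2-sigma bands, most values fall inside [0, 1];
# excursions outside flag potential extremes
print(percent_b.dropna().between(0, 1).mean())
```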
Rolling Statistics: Capturing Temporal Dynamics
Rolling statistics help capture changing market conditions over time windows.
Rolling Mean and Volatility
def calculate_rolling_features(df, column='Close', windows=[5, 10, 20, 50, 200]):
"""
Calculate rolling statistics for multiple windows.
Parameters:
-----------
df : pd.DataFrame
DataFrame with price data
column : str
Column to calculate statistics on
windows : list
List of window sizes
Returns:
--------
pd.DataFrame
DataFrame with added rolling features
"""
for window in windows:
# Rolling mean
df[f'MA_{window}'] = df[column].rolling(window=window).mean()
# Rolling standard deviation (volatility)
df[f'STD_{window}'] = df[column].rolling(window=window).std()
# Rolling min and max
df[f'MIN_{window}'] = df[column].rolling(window=window).min()
df[f'MAX_{window}'] = df[column].rolling(window=window).max()
# Price position within range
df[f'RANGE_POS_{window}'] = (
(df[column] - df[f'MIN_{window}']) /
(df[f'MAX_{window}'] - df[f'MIN_{window}'])
)
return df
stock_data = calculate_rolling_features(stock_data, windows=[10, 20, 50])
# Visualize multiple moving averages
plt.figure(figsize=(14, 7))
plt.plot(stock_data.index, stock_data['Close'], label='Close Price', color='black', linewidth=1.5)
plt.plot(stock_data.index, stock_data['MA_10'], label='MA 10', alpha=0.7)
plt.plot(stock_data.index, stock_data['MA_20'], label='MA 20', alpha=0.7)
plt.plot(stock_data.index, stock_data['MA_50'], label='MA 50', alpha=0.7)
plt.title('Price with Multiple Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Rolling Returns and Momentum
def calculate_momentum_features(df, price_col='Close', windows=[5, 10, 20, 60]):
"""
Calculate momentum-based features.
Parameters:
-----------
df : pd.DataFrame
DataFrame with price data
price_col : str
Column containing prices
windows : list
List of lookback periods
Returns:
--------
pd.DataFrame
DataFrame with momentum features
"""
for window in windows:
# Momentum (price change over period)
df[f'momentum_{window}'] = df[price_col].diff(window)
        # Rate of change; note this equals the cumulative simple return over
        # the window (P_t / P_{t-window} - 1), so no separate feature is needed
        df[f'ROC_{window}'] = df[price_col].pct_change(window)
return df
stock_data = calculate_momentum_features(stock_data)
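The two momentum variants are directly related: rate of change is just absolute momentum scaled by the price at the start of the window. A quick check on toy data:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 105.0, 103.0, 108.0])
window = 2

momentum = prices.diff(window)        # absolute change: P_t - P_{t-window}
roc = prices.pct_change(window)       # relative change over the same window

# ROC is momentum normalized by the window's starting price
ratio = (momentum / prices.shift(window)).dropna()
print(np.allclose(ratio, roc.dropna()))   # True
```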
Lag Features: Using Historical Information
Lag features allow models to learn from past values. This is crucial for time-series predictions.
def create_lag_features(df, columns, lags=[1, 2, 3, 5, 10]):
"""
Create lagged features for specified columns.
Parameters:
-----------
df : pd.DataFrame
DataFrame with time-series data
columns : list
List of column names to create lags for
lags : list
List of lag periods
Returns:
--------
pd.DataFrame
DataFrame with lagged features
"""
for col in columns:
for lag in lags:
df[f'{col}_lag_{lag}'] = df[col].shift(lag)
return df
# Create lags for key features
lag_columns = ['Close', 'Volume', 'log_return', 'RSI_14']
stock_data = create_lag_features(stock_data, lag_columns, lags=[1, 2, 3, 5, 10])
print(f"Total features created: {len(stock_data.columns)}")
print("\nSample lag features:")
print(stock_data[['Close', 'Close_lag_1', 'Close_lag_2', 'Close_lag_3']].head(10))
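Lagging is just pandas `shift`, and it is worth confirming that it only ever exposes past values: the first `lag` rows become NaN rather than peeking into the future. A tiny sketch:

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0])
lag1 = s.shift(1)

# Row 0 has no history, so it is NaN; every other row sees the previous value
print(lag1.tolist())   # [nan, 10.0, 11.0, 12.0]
```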
Volume-Based Features
Trading volume provides insight into the strength of price movements.
def calculate_volume_features(df):
"""
Calculate volume-based features.
Parameters:
-----------
df : pd.DataFrame
DataFrame with OHLCV data
Returns:
--------
pd.DataFrame
DataFrame with volume features
"""
# Volume moving averages
df['volume_MA_5'] = df['Volume'].rolling(window=5).mean()
df['volume_MA_20'] = df['Volume'].rolling(window=20).mean()
# Volume ratio
df['volume_ratio'] = df['Volume'] / df['volume_MA_20']
    # On-Balance Volume (OBV); unchanged closes contribute zero volume
    df['price_change'] = df['Close'].diff()
    df['direction'] = np.sign(df['price_change']).fillna(0)
    df['volume_signed'] = df['Volume'] * df['direction']
    df['OBV'] = df['volume_signed'].cumsum()
    # Volume-Weighted Average Price (running VWAP; classic VWAP resets each
    # trading session, but with daily bars we accumulate over the full history)
    df['typical_price'] = (df['High'] + df['Low'] + df['Close']) / 3
    df['VWAP'] = (df['typical_price'] * df['Volume']).cumsum() / df['Volume'].cumsum()
# Cleanup temporary columns
df.drop(['price_change', 'direction', 'volume_signed', 'typical_price'], axis=1, inplace=True)
return df
stock_data = calculate_volume_features(stock_data)
Price Action Features
def calculate_price_action_features(df):
"""
Calculate price action and candlestick-based features.
Parameters:
-----------
df : pd.DataFrame
DataFrame with OHLC data
Returns:
--------
pd.DataFrame
DataFrame with price action features
"""
# Daily range
df['daily_range'] = df['High'] - df['Low']
df['daily_range_pct'] = (df['High'] - df['Low']) / df['Close']
# Body size (open to close)
df['body_size'] = abs(df['Close'] - df['Open'])
df['body_size_pct'] = df['body_size'] / df['Close']
# Upper and lower shadows
df['upper_shadow'] = df['High'] - df[['Open', 'Close']].max(axis=1)
df['lower_shadow'] = df[['Open', 'Close']].min(axis=1) - df['Low']
# Gap features
df['gap'] = df['Open'] - df['Close'].shift(1)
df['gap_pct'] = df['gap'] / df['Close'].shift(1)
# True Range (for ATR calculation)
df['high_low'] = df['High'] - df['Low']
df['high_close'] = abs(df['High'] - df['Close'].shift(1))
df['low_close'] = abs(df['Low'] - df['Close'].shift(1))
df['true_range'] = df[['high_low', 'high_close', 'low_close']].max(axis=1)
# Average True Range (ATR)
df['ATR_14'] = df['true_range'].rolling(window=14).mean()
# Cleanup
df.drop(['high_low', 'high_close', 'low_close'], axis=1, inplace=True)
return df
stock_data = calculate_price_action_features(stock_data)
Cyclical Time Features
Markets exhibit cyclical patterns based on time of day, day of week, and month.
def create_time_features(df):
"""
Create cyclical time-based features.
Parameters:
-----------
df : pd.DataFrame
DataFrame with datetime index
Returns:
--------
pd.DataFrame
DataFrame with time features
"""
# Extract time components
df['day_of_week'] = df.index.dayofweek
df['day_of_month'] = df.index.day
df['week_of_year'] = df.index.isocalendar().week
df['month'] = df.index.month
df['quarter'] = df.index.quarter
# Cyclical encoding (sine/cosine transformation)
df['day_of_week_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
df['day_of_week_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
# Is Monday/Friday (potential day-of-week effects)
df['is_monday'] = (df['day_of_week'] == 0).astype(int)
df['is_friday'] = (df['day_of_week'] == 4).astype(int)
# Is month end/start
df['is_month_start'] = df.index.is_month_start.astype(int)
df['is_month_end'] = df.index.is_month_end.astype(int)
return df
stock_data = create_time_features(stock_data)
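The point of the sine/cosine encoding is that adjacent periods stay adjacent: December (12) and January (1) are far apart as raw integers but neighbors on the unit circle. A quick numerical check:

```python
import numpy as np

def month_xy(month):
    # Map a month number onto the unit circle
    angle = 2 * np.pi * month / 12
    return np.array([np.sin(angle), np.cos(angle)])

# Raw integer distance makes Dec/Jan look maximally far apart
print(abs(12 - 1))                                   # 11

# Encoded distance treats Dec/Jan as neighbors, just like Jan/Feb
dec_jan = np.linalg.norm(month_xy(12) - month_xy(1))
jan_feb = np.linalg.norm(month_xy(1) - month_xy(2))
print(np.isclose(dec_jan, jan_feb))                  # True
```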
Putting It All Together: Feature Engineering Pipeline
class FinancialFeatureEngineer:
"""
Complete feature engineering pipeline for financial time-series.
"""
def __init__(self, rsi_period=14, macd_fast=12, macd_slow=26, macd_signal=9,
bb_period=20, bb_std=2):
self.rsi_period = rsi_period
self.macd_fast = macd_fast
self.macd_slow = macd_slow
self.macd_signal = macd_signal
self.bb_period = bb_period
self.bb_std = bb_std
def fit_transform(self, df):
"""
Apply all feature engineering steps.
Parameters:
-----------
df : pd.DataFrame
Raw OHLCV dataframe
Returns:
--------
pd.DataFrame
Dataframe with engineered features
"""
df = df.copy()
# Returns
df = calculate_returns(df)
# Technical indicators
df['RSI_14'] = calculate_rsi(df, period=self.rsi_period)
df['MACD'], df['MACD_signal'], df['MACD_hist'] = calculate_macd(
df, fast=self.macd_fast, slow=self.macd_slow, signal=self.macd_signal
)
df['BB_middle'], df['BB_upper'], df['BB_lower'] = calculate_bollinger_bands(
df, period=self.bb_period, num_std=self.bb_std
)
df['BB_bandwidth'] = (df['BB_upper'] - df['BB_lower']) / df['BB_middle']
# Rolling statistics
df = calculate_rolling_features(df, windows=[10, 20, 50])
# Momentum
df = calculate_momentum_features(df, windows=[5, 10, 20])
# Volume features
df = calculate_volume_features(df)
# Price action
df = calculate_price_action_features(df)
# Time features
df = create_time_features(df)
# Lag features
lag_cols = ['Close', 'log_return', 'RSI_14', 'Volume']
df = create_lag_features(df, lag_cols, lags=[1, 2, 3, 5])
return df
# Apply the pipeline to a fresh copy of the raw data
# (stock_data already carries the features we built step by step above,
# which would inflate the column count)
engine = FinancialFeatureEngineer()
stock_data_enriched = engine.fit_transform(df[df['Symbol'] == 'AAPL'].copy())
print(f"\nOriginal features: 6 (OHLCV + Date)")
print(f"Engineered features: {len(stock_data_enriched.columns)}")
print(f"\nFeature categories created:")
print("- Returns (simple, log)")
print("- Technical indicators (RSI, MACD, Bollinger Bands)")
print("- Rolling statistics (MA, STD, MIN, MAX)")
print("- Momentum features")
print("- Volume features (OBV, VWAP)")
print("- Price action (ranges, shadows, ATR)")
print("- Time features (cyclical encoding)")
print("- Lag features")
Feature Importance and Selection
Not all engineered features will be useful. Let’s identify the most predictive ones:
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
def calculate_feature_importance(df, target_col='log_return', top_n=20):
"""
Calculate feature importance using Random Forest.
Parameters:
-----------
df : pd.DataFrame
DataFrame with features
target_col : str
Target variable column
top_n : int
Number of top features to display
Returns:
--------
pd.DataFrame
Feature importance rankings
"""
# Prepare data
df_clean = df.dropna()
# Create target (next day's return)
df_clean['target'] = df_clean[target_col].shift(-1)
df_clean = df_clean.dropna()
# Separate features and target
feature_cols = [col for col in df_clean.columns if col not in
['target', 'Open', 'High', 'Low', 'Close', 'Volume']]
X = df_clean[feature_cols]
y = df_clean['target']
# Train Random Forest
rf = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
rf.fit(X, y)
# Get feature importance
importance_df = pd.DataFrame({
'feature': feature_cols,
'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)
# Visualize top features
plt.figure(figsize=(10, 8))
top_features = importance_df.head(top_n)
plt.barh(range(len(top_features)), top_features['importance'])
plt.yticks(range(len(top_features)), top_features['feature'])
plt.xlabel('Feature Importance')
plt.title(f'Top {top_n} Most Important Features')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
return importance_df
importance_df = calculate_feature_importance(stock_data_enriched, top_n=20)
print("\nTop 10 Most Important Features:")
print(importance_df.head(10))
Handling Missing Values and Data Quality
def clean_features(df, method='forward_fill', max_missing_pct=0.3):
"""
Clean engineered features and handle missing values.
Parameters:
-----------
df : pd.DataFrame
DataFrame with engineered features
method : str
Filling method ('forward_fill', 'interpolate', 'drop')
max_missing_pct : float
Maximum allowed missing percentage per column
Returns:
--------
pd.DataFrame
Cleaned dataframe
"""
df_clean = df.copy()
# Check missing values
missing_pct = df_clean.isnull().sum() / len(df_clean)
# Drop columns with too many missing values
cols_to_drop = missing_pct[missing_pct > max_missing_pct].index.tolist()
if cols_to_drop:
print(f"Dropping {len(cols_to_drop)} columns with >{max_missing_pct*100}% missing:")
print(cols_to_drop)
df_clean = df_clean.drop(columns=cols_to_drop)
# Handle remaining missing values
    # Note: fillna(method=...) is deprecated; use .ffill()/.bfill() instead
    if method == 'forward_fill':
        df_clean = df_clean.ffill().bfill()
    elif method == 'interpolate':
        df_clean = df_clean.interpolate(method='linear').bfill()
    elif method == 'drop':
        df_clean = df_clean.dropna()
    # Replace infinite values, then fill the gaps they leave
    df_clean = df_clean.replace([np.inf, -np.inf], np.nan)
    df_clean = df_clean.ffill().bfill()
return df_clean
stock_data_clean = clean_features(stock_data_enriched, method='forward_fill')
print(f"\nFinal dataset shape: {stock_data_clean.shape}")
print(f"Remaining missing values: {stock_data_clean.isnull().sum().sum()}")
Real-World Example: Multi-Stock Feature Engineering
# Load full S&P 500 dataset
df_full = pd.read_csv('sp500_stocks.csv')
df_full['Date'] = pd.to_datetime(df_full['Date'])
# Process multiple stocks
stocks = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA']
engineered_data = {}
for symbol in stocks:
print(f"Processing {symbol}...")
stock_df = df_full[df_full['Symbol'] == symbol].copy()
stock_df = stock_df.sort_values('Date').set_index('Date')
# Apply feature engineering
engine = FinancialFeatureEngineer()
stock_df_enriched = engine.fit_transform(stock_df)
stock_df_clean = clean_features(stock_df_enriched)
engineered_data[symbol] = stock_df_clean
print(f" Features: {stock_df_clean.shape[1]}, Rows: {stock_df_clean.shape[0]}")
# Save processed data
for symbol, data in engineered_data.items():
data.to_csv(f'features_{symbol}.csv')
print(f"Saved features_{symbol}.csv")
Summary Table: Key Features Created
| Category | Features | Formula/Method | Use Case |
|---|---|---|---|
| Returns | simple_return, log_return | pct_change / log price ratio | Target variable, momentum |
| RSI | RSI_14 | 100 - 100/(1 + RS) | Overbought/oversold detection |
| MACD | MACD, MACD_signal, MACD_hist | EMA_12 - EMA_26, 9-period EMA signal | Trend and momentum |
| Bollinger Bands | BB_upper, BB_middle, BB_lower, BB_bandwidth | SMA_20 ± 2σ | Volatility and price extremes |
| Rolling Stats | MA_10, MA_20, STD_20, etc. | Rolling windows | Trend and volatility |
| Momentum | ROC_10, momentum_20 | Price changes over periods | Trend strength |
| Volume | OBV, VWAP, volume_ratio | Cumulative signed volume | Confirmation of price moves |
| Price Action | ATR_14, daily_range, gaps | True range calculations | Volatility measurement |
| Lags | Close_lag_1, RSI_lag_5 | Shifted values | Historical context |
| Time | day_of_week_sin, month_cos | Cyclical encoding | Seasonal patterns |
Conclusion
In this episode, we’ve transformed raw OHLCV data into a rich feature set ready for machine learning models. We covered:
- Returns calculation: Understanding the difference between simple and log returns
- Technical indicators: RSI, MACD, and Bollinger Bands with full mathematical explanations
- Rolling statistics: Capturing temporal dynamics through moving averages and volatility
- Lag features: Providing historical context to models
- Volume analysis: Understanding the conviction behind price movements
- Feature importance: Identifying the most predictive features
These engineered features capture market psychology, momentum, volatility, and temporal patterns—the essential ingredients for predictive financial models.
Next Episode Preview: In episode 4, we’ll apply these features to build a Credit Risk Scoring Model. We’ll explore classification algorithms, handle imbalanced datasets, and create interpretable risk scores using techniques like logistic regression, gradient boosting, and neural networks. We’ll also dive into model evaluation metrics specific to credit risk, including precision-recall trade-offs and business impact analysis.