Feature Engineering for Predictive Maintenance: Extracting Health Indicators from Vibration and Temperature Data

Updated Feb 6, 2026

When Your Features Make the Model Worse

Here’s something nobody tells you about feature engineering for condition-based maintenance (CBM): more features can actively hurt your remaining-useful-life (RUL) predictions. I spent a week building 47 different vibration and temperature features from the sensor data we collected in Part 1, thinking I was being thorough. The random forest model’s RMSE went up by 22%. Turns out, half of those features were just different flavors of noise.

The problem wasn’t the features themselves — it was that I’d extracted them blindly without understanding what actually correlates with bearing degradation. You can’t just throw FFT bins and rolling statistics at a model and expect it to figure out which ones matter. (Well, you can, but your laptop fan will hate you and your predictions will be garbage.)

This post walks through the feature extraction process that actually worked: starting from domain knowledge about how machines fail, building features that capture those failure modes, and ruthlessly cutting the ones that don’t pull their weight.


What Features Actually Matter for Bearing Health

Bearings don’t fail randomly — they follow predictable degradation patterns. The outer race develops microcracks, spalling increases surface roughness, lubrication breaks down, friction rises, and you get measurable changes in vibration harmonics and heat dissipation. If your features don’t capture any of this physics, you’re just fitting to training set quirks.

The vibration signature is where most of the signal lives. Healthy bearings produce relatively clean sinusoidal vibration at the shaft rotation frequency. As wear progresses, you start seeing:

  • Higher amplitude at characteristic defect frequencies (BPFO, BPFI, BSF — functions of bearing geometry)
  • Increased spectral entropy (the frequency spectrum gets messier)
  • Broader peaks in the FFT (less concentrated energy)
  • Rising RMS and kurtosis in the time domain (sharper impacts from surface defects)

Temperature is simpler but still useful. Degradation → more friction → more heat. The absolute temperature matters, but the rate of change and deviation from baseline often matter more. A bearing running 5°C hotter than usual is a red flag even if it’s still within spec.

So the goal is to extract features that quantify these changes. Not just “apply every scipy.signal function and see what sticks.”

Time-Domain Features: RMS, Kurtosis, and Crest Factor

Start with the basics. These are cheap to compute and surprisingly informative.

import numpy as np
from scipy import stats

def extract_time_features(vibration_signal):
    """
    Extract time-domain health indicators from raw vibration.
    vibration_signal: 1D array, assumed sampled at 10kHz (from Part 1)
    """
    features = {}

    # RMS amplitude - monotonically increases with wear
    features['rms'] = np.sqrt(np.mean(vibration_signal**2))

    # Kurtosis - measures impulsiveness (spalling creates sharp spikes)
    # scipy.stats.kurtosis defaults to fisher=True (excess kurtosis)
    features['kurtosis'] = stats.kurtosis(vibration_signal)

    # Crest factor - ratio of peak to RMS
    # Jumps when you get intermittent impacts from defects
    features['crest_factor'] = np.max(np.abs(vibration_signal)) / features['rms']

    # Peak-to-peak - another amplitude metric
    features['peak_to_peak'] = np.ptp(vibration_signal)

    # Shape factor - RMS over mean absolute
    # Less common but captures waveform shape changes
    mean_abs = np.mean(np.abs(vibration_signal))
    features['shape_factor'] = features['rms'] / mean_abs if mean_abs > 0 else 0

    return features

RMS is the most reliable single feature in my tests. It correlates strongly with overall energy, which rises as bearings degrade. Kurtosis is more sensitive to early-stage defects — you’ll see it spike before RMS moves much. But kurtosis is also noisier, so I typically smooth it with a 10-sample rolling median before feeding it to a model.
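
For reference, the smoothing step is one line with pandas — a minimal sketch, with toy kurtosis values standing in for real per-window output:

import pandas as pd

# one kurtosis value per analysis window, in time order (toy values)
kurtosis_series = pd.Series([3.1, 3.0, 8.9, 3.2, 3.3, 3.1, 3.4, 9.5, 3.2, 3.5, 3.6, 3.4])

# 10-sample rolling median knocks out isolated spikes before modeling
# (min_periods=1 avoids NaNs at the start of the series)
kurtosis_smoothed = kurtosis_series.rolling(window=10, min_periods=1).median()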

Crest factor is hit-or-miss. In theory it should catch sudden impacts, but in practice it’s sensitive to outliers from electrical noise or sensor glitches. I’ve had better luck using the 99th percentile instead of the absolute max:

features['crest_factor_robust'] = np.percentile(np.abs(vibration_signal), 99) / features['rms']

One gotcha: if you’re working with accelerometers (which most CBM systems use), remember that RMS scales with sensor mounting and preload. You need to normalize by the baseline RMS from the healthy period, otherwise you’re just measuring how tight you screwed in the sensor. I learned this the hard way when my model predicted a bearing was 80% degraded on installation day.
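
The fix is cheap once you know about it. A minimal sketch of the normalization, assuming (like the temperature baseline later in this post) that the first 20% of each bearing’s run is known-healthy:

import numpy as np

def normalize_by_baseline(rms_series):
    """Divide per-window RMS by the healthy-period baseline."""
    n_baseline = max(1, int(0.2 * len(rms_series)))
    # median is robust to startup transients in the healthy period
    baseline = np.median(rms_series[:n_baseline])
    return np.asarray(rms_series) / baseline

A value near 1.0 means “same as installation day”; sustained drift upward is actual degradation rather than mounting torque.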

Frequency-Domain Features: FFT and Spectral Characteristics

Time-domain features tell you something changed. Frequency-domain features tell you what changed and often give you earlier warning.

The standard approach is to compute the FFT, extract magnitude spectrum, and pull out features from specific frequency bands:

from scipy.fft import rfft, rfftfreq
from scipy.signal import welch

def extract_frequency_features(vibration_signal, fs=10000):
    """
    Extract frequency-domain indicators.
    fs: sampling frequency in Hz
    """
    features = {}

    # Use Welch's method for better spectral estimate
    # (averages FFT over overlapping windows - reduces noise)
    freqs, psd = welch(vibration_signal, fs=fs, nperseg=2048)

    # Total power - PSD times bin width, summed over all bins
    # (approximates the time-domain mean square, i.e. RMS^2)
    features['total_power'] = np.sum(psd) * (freqs[1] - freqs[0])

    # Spectral entropy - disorder in frequency distribution
    # Healthy bearings have concentrated energy; worn ones spread out
    psd_norm = psd / np.sum(psd)
    features['spectral_entropy'] = -np.sum(psd_norm * np.log2(psd_norm + 1e-12))

    # Spectral centroid - weighted average frequency
    # Shifts higher as high-frequency content increases
    features['spectral_centroid'] = np.sum(freqs * psd) / np.sum(psd)

    # Peak frequency and its amplitude
    peak_idx = np.argmax(psd)
    features['peak_freq'] = freqs[peak_idx]
    features['peak_amplitude'] = psd[peak_idx]

    # Band power ratios (example: high-freq / low-freq)
    # Bearing defects generate high-frequency content
    low_band = (freqs < 1000)
    high_band = (freqs >= 1000) & (freqs < 5000)
    features['hf_lf_ratio'] = np.sum(psd[high_band]) / (np.sum(psd[low_band]) + 1e-12)

    return features

Spectral entropy is the MVP here. The formula is $H = -\sum_i p_i \log_2(p_i)$, where $p_i$ is the normalized power in bin $i$. It’s borrowed from information theory: more spread-out energy → higher entropy → more degradation. In my tests it starts rising around 60-70% of bearing life, well before RMS shows anything.

The high-freq to low-freq ratio is domain-specific. For rolling element bearings, the defect fundamentals are typically 5-20x the shaft speed, but the impacts are sharp enough that their harmonics and the structural resonances they excite extend well above that, so I split at 1 kHz for a motor running at ~30 Hz (1800 RPM). If you’re monitoring a different machine, adjust the bands accordingly. You can calculate the exact bearing defect frequencies from geometry:

$$f_{\text{BPFO}} = \frac{n}{2} f_r \left(1 - \frac{d}{D} \cos \phi\right)$$

where $n$ is the number of balls, $f_r$ is the shaft frequency, $d$ is the ball diameter, $D$ is the pitch diameter, and $\phi$ is the contact angle. But honestly, I’ve found that just monitoring “is there more energy in the 1-5 kHz range than usual” works almost as well without needing exact geometry specs.
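
If you do have the specs, the calculation is a few lines. The numbers below are hypothetical (loosely 6205-style deep-groove values — check your own datasheet), and BPFI is the same formula with a plus sign:

import numpy as np

# hypothetical geometry - pull real values from your bearing's datasheet
n_balls = 9          # number of rolling elements
f_r = 30.0           # shaft frequency in Hz (1800 RPM)
d = 7.94e-3          # ball diameter, m
D = 39.04e-3         # pitch diameter, m
phi = 0.0            # contact angle, radians (0 for deep-groove)

ratio = (d / D) * np.cos(phi)
bpfo = (n_balls / 2) * f_r * (1 - ratio)  # outer race defect frequency
bpfi = (n_balls / 2) * f_r * (1 + ratio)  # inner race defect frequency
print(f"BPFO ~ {bpfo:.1f} Hz, BPFI ~ {bpfi:.1f} Hz")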

Temperature Features: Rate of Change and Baseline Deviation

Temperature data is simpler but you can still extract useful features beyond the raw reading:

def extract_temperature_features(temp_signal, baseline_temp=None):
    """
    temp_signal: 1D array of temperature readings (°C)
    baseline_temp: healthy operating temperature (compute from first 20% of data)
    """
    features = {}

    # Current temperature stats
    features['temp_mean'] = np.mean(temp_signal)
    features['temp_std'] = np.std(temp_signal)
    features['temp_max'] = np.max(temp_signal)

    # Rate of change (derivative approximation)
    temp_diff = np.diff(temp_signal)
    features['temp_rate_mean'] = np.mean(temp_diff)
    features['temp_rate_std'] = np.std(temp_diff)

    # Deviation from baseline (if available)
    if baseline_temp is not None:
        features['temp_deviation'] = features['temp_mean'] - baseline_temp
        features['temp_deviation_max'] = features['temp_max'] - baseline_temp

    return features

The rate of change features capture sudden thermal events — if the temperature jumps 2°C in 10 minutes, something’s wrong even if the absolute value is still reasonable. I compute these over a sliding window (typically 100 samples, which is ~10 minutes in my setup from Part 1).
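
Concretely, the sliding-window version looks something like this — a sketch with synthetic readings standing in for real data:

import numpy as np
import pandas as pd

# hypothetical readings: one sample every ~6 s, so 100 samples ~ 10 minutes
temps = pd.Series(40 + 0.001 * np.arange(2000) + np.random.normal(0, 0.05, 2000))

window = 100  # ~10 minutes of history
temp_rate_mean = temps.diff().rolling(window).mean()  # sustained drift
temp_rate_std = temps.diff().rolling(window).std()    # thermal volatility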

One thing I’m still not entirely sure about: how to handle ambient temperature variation. My test rig is in a climate-controlled lab, but real industrial environments swing by 10-15°C across a shift. Subtracting a running ambient baseline helps, but you need a reference sensor that’s definitely not affected by the bearing. I’ve tried using the motor casing temperature as reference, but if the motor itself is degrading, you’re chasing your tail.

Combining Features: Dimensionality Reduction and Selection

At this point you’ve got ~15-20 features per data window. Some are highly correlated (RMS and total power are basically the same thing). Some add noise. How do you pick?
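
Before any of the options below, a two-minute sanity check is the pairwise correlation matrix — a sketch, assuming the extracted features live in a DataFrame features_df:

import numpy as np

corr = features_df.corr().abs()
# keep the upper triangle only, so each pair is counted once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [c for c in upper.columns if (upper[c] > 0.95).any()]
print("Near-duplicates of another feature:", redundant)

Anything flagged here is a candidate to drop before you even look at model-based importance.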

Option 1: Throw everything at a tree-based model and check feature importances:

from sklearn.ensemble import RandomForestRegressor
import pandas as pd

# X: feature DataFrame (N_samples x N_features)
# y: RUL labels (N_samples,)
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, y)

importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))

In my case, the top 5 were: RMS, spectral entropy, kurtosis, temp_deviation, and hf_lf_ratio. Everything else contributed <5% importance. So I dropped the rest and retrained — RMSE dropped by 18% and inference got 3x faster.
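
The retrain step is nothing fancy — keep the top of the importance ranking and refit:

# keep the heavy hitters, drop the rest
top_features = importances.head(5).index
rf_small = RandomForestRegressor(n_estimators=100, random_state=42)
rf_small.fit(X[top_features], y)

One caveat: impurity-based importances can be misleading when features are correlated, so it’s worth cross-validating the reduced model rather than trusting the ranking blindly.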

Option 2: Use PCA if you want uncorrelated features for something like a Gaussian process. But be warned: principal components are linear combinations of original features, so you lose interpretability. When your model predicts a bearing will fail in 2 days, you want to know why — “PC1 is high” doesn’t cut it.
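
If you go the PCA route anyway, standardize first — PCA is scale-sensitive, and the larger-magnitude features will otherwise dominate the components. A minimal sketch:

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# keep enough components to explain 95% of the variance
pca_pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_pca = pca_pipe.fit_transform(X)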

Option 3: Domain knowledge + manual selection. I kept RMS (amplitude), kurtosis (impulsiveness), spectral_entropy (frequency spread), temp_deviation (thermal), and hf_lf_ratio (defect signature). Five features, each measuring something physically distinct. This worked better than data-driven selection for me, probably because my training set was small (~200 run-to-failure cycles).

Feature Engineering for LSTM: Sequences, Not Snapshots

Everything above assumes you’re feeding features into a classical ML model (random forest, XGBoost, etc.). If you’re using an LSTM (which we’ll cover in Part 3), you need to think about sequences.

Instead of extracting features from a single window and predicting RUL, you extract features from the last $T$ windows and feed the whole sequence to the LSTM:

def create_sequences(features_df, window_size=50):
    """
    features_df: DataFrame with shape (N_windows, N_features)
    window_size: number of consecutive windows to use as input
    Returns: (N_samples, window_size, N_features) array
    """
    sequences = []
    for i in range(len(features_df) - window_size):
        seq = features_df.iloc[i:i+window_size].values
        sequences.append(seq)
    return np.array(sequences)

The LSTM can learn temporal patterns — “spectral entropy has been rising for 20 windows” — that a static model misses. But this also means you need more training data and careful tuning. I found window_size=50 worked well (5 hours of history with 6-minute windows), but YMMV.
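
One detail worth spelling out: the labels have to be realigned after sequencing. Each sequence’s target should be the RUL at its last window, not its first. A sketch, assuming rul is an array aligned with the rows of features_df:

window_size = 50
X_seq = create_sequences(features_df, window_size=window_size)
# sequence i covers windows i..i+window_size-1, so label it with
# the RUL at the final window it contains
y_seq = rul[window_size - 1 : len(features_df) - 1]
assert len(X_seq) == len(y_seq)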

One surprise: LSTMs are less sensitive to feature selection than random forests. Even if you include redundant features, the LSTM learns to ignore them. But training still takes longer, so I’d start with the important features and add more only if you plateau.

What I’d Do Differently Next Time

If I were starting over, I’d spend more time on envelope analysis. The basic FFT approach works, but bearing defects produce amplitude modulation — the defect impacts repeat at BPFO/BPFI, modulating the carrier frequency. You can extract this by bandpass filtering, taking the Hilbert transform to get the envelope, then FFT-ing the envelope. This isolates the defect frequencies even when they’re buried in noise.
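
Sketched out, that pipeline is only a few lines — the band edges below are placeholders, not tuned values; in practice you’d center the bandpass on a structural resonance:

import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from scipy.fft import rfft, rfftfreq

def envelope_spectrum(signal, fs=10000, band=(2000, 4000)):
    """Bandpass -> Hilbert envelope -> FFT of the envelope.
    band is a placeholder; tune it to your machine's resonances."""
    b, a = butter(4, band, btype='bandpass', fs=fs)
    filtered = filtfilt(b, a, signal)
    envelope = np.abs(hilbert(filtered))   # amplitude envelope of the carrier
    envelope -= envelope.mean()            # drop DC so it doesn't swamp the spectrum
    spectrum = np.abs(rfft(envelope))
    freqs = rfftfreq(len(envelope), d=1/fs)
    return freqs, spectrum                 # look for peaks at BPFO/BPFI and harmonics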

I haven’t tested this thoroughly because it’s fiddly to get the bandpass cutoffs right, but the papers (see Randall & Antoni, “Rolling element bearing diagnostics—A tutorial,” Mechanical Systems and Signal Processing, 2011, if I recall correctly) claim significant SNR improvement. Worth trying if your vibration data is noisy.

Another thing: I treated each bearing in isolation, but in a real system you’d have multiple sensors (motor, gearbox, pump, etc.). Cross-correlating features across sensors can catch coupling effects — maybe the bearing is fine but the motor is misaligned, causing abnormal vibration. I didn’t have multi-sensor data, but it’s something to consider.

Next up in Part 3: training and evaluating RUL models. We’ll take these features and build both classical ML baselines (random forest, XGBoost) and a deep learning model (LSTM). Spoiler: the LSTM wins on accuracy, but the random forest is way easier to debug when it mispredicts.

CBM Portfolio Project Series (2/4)
