Building RUL Prediction Models in Python: Step-by-Step from Linear Regression to LSTM

Updated Feb 6, 2026

Introduction

In the previous episode, we explored the fundamental concepts of Remaining Useful Life (RUL) prediction in Prognostics and Health Management (PHM) and prepared our dataset using the NASA C-MAPSS Turbofan Engine Degradation dataset. Now it’s time to roll up our sleeves and build actual prediction models.

This episode walks you through implementing RUL prediction models in Python, starting from simple linear regression and progressing to sophisticated deep learning architectures like LSTM. You’ll learn not just the code, but the reasoning behind each modeling choice and when to use which approach.

Setting Up the Environment

Before diving into modeling, let’s ensure our Python environment has all necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

Install required packages if needed:

pip install numpy pandas matplotlib seaborn scikit-learn tensorflow
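
Since the LSTM section later in this episode benefits from GPU acceleration, a quick environment sanity check can save debugging time. This is an optional sketch that simply prints the TensorFlow version and the GPUs it can see:

# Optional: confirm the TensorFlow build and GPU visibility
print(f"TensorFlow version: {tf.__version__}")
print(f"GPUs visible to TensorFlow: {tf.config.list_physical_devices('GPU')}")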

Data Preprocessing Revisited

Recall from Episode 1 that we loaded the C-MAPSS dataset and calculated true RUL values. Let’s quickly reconstruct that foundation:

# Load training data
column_names = ['unit_id', 'time_cycles', 'setting_1', 'setting_2', 'setting_3']
column_names += [f'sensor_{i}' for i in range(1, 22)]

train_df = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None, names=column_names)

# Calculate RUL for each sample
max_cycles = train_df.groupby('unit_id')['time_cycles'].max().reset_index()
max_cycles.columns = ['unit_id', 'max_cycle']
train_df = train_df.merge(max_cycles, on='unit_id', how='left')
train_df['RUL'] = train_df['max_cycle'] - train_df['time_cycles']

# Cap RUL at 125 cycles (common practice in PHM)
train_df['RUL'] = train_df['RUL'].clip(upper=125)

print(f"Dataset shape: {train_df.shape}")
print(f"RUL statistics:n{train_df['RUL'].describe()}")

Feature Selection

Not all sensors contribute equally to RUL prediction. Let’s identify the most informative features by calculating correlation with RUL and removing low-variance sensors:

# Remove sensors with near-zero variance
sensor_columns = [col for col in train_df.columns if col.startswith('sensor_')]
variance_threshold = 0.01

variances = train_df[sensor_columns].var()
low_variance_sensors = variances[variances < variance_threshold].index.tolist()
print(f"Removing low-variance sensors: {low_variance_sensors}")

# Calculate correlation with RUL
correlations = train_df[sensor_columns].corrwith(train_df['RUL']).abs().sort_values(ascending=False)
print(f"nTop 10 sensors by correlation with RUL:n{correlations.head(10)}")

# Select top features
top_sensors = correlations.head(14).index.tolist()
feature_columns = ['setting_1', 'setting_2', 'setting_3'] + top_sensors

print(f"nSelected {len(feature_columns)} features for modeling")

Baseline Model: Linear Regression

Let’s start with the simplest approach: linear regression. It assumes RUL is a linear function of the input features, which is rarely true for degradation processes, but it gives us a useful baseline.

Theory

Linear regression models RUL as a weighted sum of input features:

RUL = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon

Where:
  • \beta_0 is the intercept term
  • \beta_1, \beta_2, \ldots, \beta_n are the feature coefficients
  • x_1, x_2, \ldots, x_n are the input features (sensor readings, operational settings)
  • \epsilon is the error term

Implementation

# Prepare features and target
X = train_df[feature_columns].values
y = train_df['RUL'].values

# Train-test split (80-20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Normalize features (critical for many ML algorithms)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train linear regression
lr_model = LinearRegression()
lr_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_train = lr_model.predict(X_train_scaled)
y_pred_test = lr_model.predict(X_test_scaled)

# Evaluation metrics
def evaluate_model(y_true, y_pred, model_name):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)

    print(f"n{model_name} Performance:")
    print(f"RMSE: {rmse:.2f} cycles")
    print(f"MAE: {mae:.2f} cycles")
    print(f"R² Score: {r2:.4f}")

    return {'rmse': rmse, 'mae': mae, 'r2': r2}

lr_metrics = evaluate_model(y_test, y_pred_test, "Linear Regression")
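
RMSE and MAE treat early and late predictions symmetrically, but in maintenance practice an overestimated RUL (predicting failure later than it actually occurs) is usually more costly than an underestimate. The asymmetric scoring function popularized by the PHM08 data challenge penalizes late predictions more heavily; a minimal sketch, where d is predicted minus true RUL:

def phm_score(y_true, y_pred):
    """Asymmetric PHM08-style score: late predictions (d > 0) are penalized more."""
    d = y_pred - y_true
    return np.sum(np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1))

print(f"Linear Regression PHM score: {phm_score(y_test, y_pred_test):.1f}")

Lower scores are better; you can report this alongside RMSE for each model below if the asymmetric view matters for your application.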

Visualization

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(y_test, y_pred_test, alpha=0.5, s=10)
plt.plot([0, 125], [0, 125], 'r--', lw=2, label='Perfect Prediction')
plt.xlabel('True RUL (cycles)')
plt.ylabel('Predicted RUL (cycles)')
plt.title('Linear Regression: Predicted vs True RUL')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
residuals = y_test - y_pred_test
plt.hist(residuals, bins=50, edgecolor='black', alpha=0.7)
plt.xlabel('Prediction Error (cycles)')
plt.ylabel('Frequency')
plt.title('Residual Distribution')
plt.axvline(x=0, color='r', linestyle='--', lw=2)
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Tree-Based Models: Random Forest

Linear regression struggles with non-linear degradation patterns. Tree-based ensemble methods like Random Forest can capture complex relationships.

Why Random Forest?

  • Non-linearity: Handles non-linear relationships between sensors and RUL
  • Feature interactions: Automatically captures interactions between sensors
  • Robustness: Less sensitive to outliers and noisy sensor readings
  • Interpretability: Provides feature importance rankings

Implementation

# Train Random Forest with optimized hyperparameters
rf_model = RandomForestRegressor(
    n_estimators=200,
    max_depth=20,
    min_samples_split=10,
    min_samples_leaf=4,
    max_features='sqrt',
    random_state=42,
    n_jobs=-1,
    verbose=0
)

rf_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_rf = rf_model.predict(X_test_scaled)

# Evaluation
rf_metrics = evaluate_model(y_test, y_pred_rf, "Random Forest")

# Feature importance analysis
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print(f"nTop 10 Important Features:")
print(feature_importance.head(10))

# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'].head(15), 
         feature_importance['importance'].head(15))
plt.xlabel('Importance Score')
plt.title('Top 15 Feature Importances (Random Forest)')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
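
Impurity-based importances from Random Forest can be biased toward correlated or high-cardinality features. If you want a second opinion, scikit-learn's permutation importance measures how much the held-out score drops when each feature is shuffled; a sketch (no retraining involved, but the repeated scoring can take a while):

from sklearn.inspection import permutation_importance

# Permutation importance on the held-out set (10 shuffles per feature)
perm = permutation_importance(
    rf_model, X_test_scaled, y_test, n_repeats=10, random_state=42, n_jobs=-1
)
perm_importance = pd.Series(perm.importances_mean, index=feature_columns).sort_values(ascending=False)
print(perm_importance.head(10))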

Gradient Boosting: Pushing Performance Further

Gradient Boosting builds trees sequentially, each correcting errors from previous trees. This often yields better performance than Random Forest.

# Train Gradient Boosting model
gb_model = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=4,
    subsample=0.8,
    random_state=42,
    verbose=0
)

gb_model.fit(X_train_scaled, y_train)

# Predictions and evaluation
y_pred_gb = gb_model.predict(X_test_scaled)
gb_metrics = evaluate_model(y_test, y_pred_gb, "Gradient Boosting")
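
Because boosting adds trees sequentially, it is worth checking whether 200 estimators actually helps or whether performance has already plateaued. The sketch below uses staged_predict to replay the ensemble one stage at a time and track test RMSE as trees are added:

# Track test RMSE as trees are added, to see where boosting stops improving
staged_rmse = [
    np.sqrt(mean_squared_error(y_test, y_pred_stage))
    for y_pred_stage in gb_model.staged_predict(X_test_scaled)
]

plt.figure(figsize=(8, 4))
plt.plot(range(1, len(staged_rmse) + 1), staged_rmse)
plt.xlabel('Number of Trees')
plt.ylabel('Test RMSE (cycles)')
plt.title('Gradient Boosting: RMSE vs Number of Trees')
plt.grid(True, alpha=0.3)
plt.show()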

Deep Learning: LSTM for Temporal Patterns

All previous models treat each sample independently, ignoring the temporal sequence of sensor readings. Engine degradation is inherently a time-series problem—LSTM networks can capture these temporal dependencies.

Why LSTM for RUL Prediction?

Long Short-Term Memory (LSTM) networks excel at:
  • Temporal dependencies: Learning how sensor patterns evolve over time
  • Memory: Remembering early degradation signals that predict future failures
  • Non-linear dynamics: Modeling complex degradation physics

Preparing Sequential Data

LSTM requires input in the shape (samples, timesteps, features). We’ll create sliding windows of sensor readings:

def create_sequences(df, unit_col, feature_cols, target_col, sequence_length=30):
    """
    Create sequences for LSTM training.
    Each sequence contains 'sequence_length' consecutive timesteps.
    """
    sequences = []
    targets = []

    for unit_id in df[unit_col].unique():
        unit_data = df[df[unit_col] == unit_id].sort_values('time_cycles')

        # Extract features and target
        features = unit_data[feature_cols].values
        rul_values = unit_data[target_col].values

        # Create sliding windows
        for i in range(len(features) - sequence_length + 1):
            sequences.append(features[i:i+sequence_length])
            targets.append(rul_values[i+sequence_length-1])

    return np.array(sequences), np.array(targets)

# Create sequences
sequence_length = 30
X_seq, y_seq = create_sequences(
    train_df, 
    'unit_id', 
    feature_columns, 
    'RUL', 
    sequence_length
)

print(f"Sequence shape: {X_seq.shape}")
print(f"Target shape: {y_seq.shape}")

# Train-test split
X_seq_train, X_seq_test, y_seq_train, y_seq_test = train_test_split(
    X_seq, y_seq, test_size=0.2, random_state=42
)

# Normalize features
scaler_seq = StandardScaler()
X_seq_train_scaled = scaler_seq.fit_transform(
    X_seq_train.reshape(-1, X_seq_train.shape[-1])
).reshape(X_seq_train.shape)

X_seq_test_scaled = scaler_seq.transform(
    X_seq_test.reshape(-1, X_seq_test.shape[-1])
).reshape(X_seq_test.shape)
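
One caveat with the random split above: consecutive windows from the same engine overlap heavily, so windows from a single unit can land in both train and test, which tends to make the test score look optimistic. A stricter alternative is to hold out entire engines. The sketch below rebuilds the unit id of each window (relying on the fact that create_sequences iterates over units in train_df['unit_id'].unique() order) and splits with GroupShuffleSplit:

from sklearn.model_selection import GroupShuffleSplit

# Rebuild the unit id of each window, in the same order create_sequences produced them
window_units = []
for unit_id in train_df['unit_id'].unique():
    n_rows = (train_df['unit_id'] == unit_id).sum()
    window_units.extend([unit_id] * max(n_rows - sequence_length + 1, 0))
window_units = np.array(window_units)

# Hold out 20% of engines (rather than 20% of windows)
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(gss.split(X_seq, y_seq, groups=window_units))
print(f"Train windows: {len(train_idx)}, test windows: {len(test_idx)}")

For simplicity the rest of this episode keeps the random window split, but a unit-wise split gives a more honest estimate of performance on unseen engines.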

Building the LSTM Model

def build_lstm_model(sequence_length, n_features):
    """
    Build LSTM model for RUL prediction.
    Architecture: LSTM -> Dropout -> LSTM -> Dropout -> Dense -> Output
    """
    model = models.Sequential([
        layers.LSTM(
            units=128, 
            return_sequences=True, 
            input_shape=(sequence_length, n_features)
        ),
        layers.Dropout(0.2),

        layers.LSTM(units=64, return_sequences=False),
        layers.Dropout(0.2),

        layers.Dense(units=32, activation='relu'),
        layers.Dropout(0.1),

        layers.Dense(units=1)  # Output: RUL prediction
    ])

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='mse',
        metrics=['mae']
    )

    return model

# Build model
lstm_model = build_lstm_model(sequence_length, len(feature_columns))
lstm_model.summary()

Training with Callbacks

# Define callbacks for better training
early_stop = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=15,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1
)

# Train the model
history = lstm_model.fit(
    X_seq_train_scaled, y_seq_train,
    validation_split=0.2,
    epochs=100,
    batch_size=256,
    callbacks=[early_stop, reduce_lr],
    verbose=1
)

# Predictions
y_pred_lstm = lstm_model.predict(X_seq_test_scaled).flatten()

# Evaluation
lstm_metrics = evaluate_model(y_seq_test, y_pred_lstm, "LSTM")
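
Since the next episode covers deployment, it can be handy to persist the trained network and the fitted sequence scaler now so they can be reloaded without retraining. A minimal sketch, assuming a recent TensorFlow/Keras version that supports the native .keras format (joblib ships with scikit-learn):

import joblib

# Persist the trained model and the sequence scaler for later reuse
lstm_model.save('rul_lstm.keras')
joblib.dump(scaler_seq, 'scaler_seq.joblib')

# Later: reload without retraining
# restored_model = keras.models.load_model('rul_lstm.keras')
# restored_scaler = joblib.load('scaler_seq.joblib')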

Training History Visualization

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('LSTM Training History')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error')
plt.title('LSTM MAE History')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Model Comparison

Let’s compare all models side-by-side:

# Compile results
results = pd.DataFrame([
    {'Model': 'Linear Regression', **lr_metrics},
    {'Model': 'Random Forest', **rf_metrics},
    {'Model': 'Gradient Boosting', **gb_metrics},
    {'Model': 'LSTM', **lstm_metrics}
])

print("n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(results.to_string(index=False))
print("="*60)

# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
metrics = ['rmse', 'mae', 'r2']
titles = ['RMSE (Lower is Better)', 'MAE (Lower is Better)', 'R² Score (Higher is Better)']

for idx, (metric, title) in enumerate(zip(metrics, titles)):
    axes[idx].bar(results['Model'], results[metric], 
                  color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'])
    axes[idx].set_title(title)
    axes[idx].set_ylabel(metric.upper())
    axes[idx].tick_params(axis='x', rotation=45)
    axes[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

Understanding Model Predictions

Let’s visualize how each model predicts RUL over the lifetime of a single engine unit:

# Select one engine unit from the training data for illustration (unit 50)
test_unit = train_df[train_df['unit_id'] == 50].copy()
test_unit_features = test_unit[feature_columns].values
test_unit_rul = test_unit['RUL'].values

# Scale features
test_unit_scaled = scaler.transform(test_unit_features)

# Predictions from traditional models
pred_lr = lr_model.predict(test_unit_scaled)
pred_rf = rf_model.predict(test_unit_scaled)
pred_gb = gb_model.predict(test_unit_scaled)

# LSTM prediction (requires sequences)
pred_lstm = []
for i in range(len(test_unit_features) - sequence_length + 1):
    seq = test_unit_features[i:i+sequence_length]
    seq_scaled = scaler_seq.transform(seq).reshape(1, sequence_length, -1)
    pred_lstm.append(lstm_model.predict(seq_scaled, verbose=0)[0, 0])

# Align LSTM predictions
lstm_cycles = test_unit['time_cycles'].values[sequence_length-1:]

# Plot
plt.figure(figsize=(14, 6))
plt.plot(test_unit['time_cycles'], test_unit_rul, 
         'k-', lw=2, label='True RUL', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_lr, 
         '--', label='Linear Regression', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_rf, 
         '--', label='Random Forest', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_gb, 
         '--', label='Gradient Boosting', alpha=0.7)
plt.plot(lstm_cycles, pred_lstm, 
         '--', label='LSTM', alpha=0.7, lw=2)

plt.xlabel('Time Cycles')
plt.ylabel('Remaining Useful Life (cycles)')
plt.title('RUL Prediction Comparison for Engine Unit 50')
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Key Insights and Trade-offs

Model Selection Guidelines

  • Linear Regression: fast, interpretable, low compute; struggles with non-linearity. Best for quick baselines and simple systems.
  • Random Forest: good accuracy with feature importance rankings; memory-intensive and slower at inference. Best for medium datasets where interpretability is needed.
  • Gradient Boosting: typically the strongest tabular performance; longer training and hyperparameter-sensitive. Best for competition-grade accuracy.
  • LSTM: captures temporal patterns; requires more data, trains more slowly, and benefits from a GPU. Best when time-series dependencies are critical.

When to Use LSTM?

Use LSTM when:
  • Long degradation history matters: Early sensor patterns predict future failures
  • Sufficient data: At least 50-100 units with full degradation cycles
  • Temporal patterns exist: Sensor trends are more informative than point values
  • Computational resources available: A GPU is recommended for training

Stick with tree-based models when:
  • Limited data: Fewer than 50 engine units
  • Real-time constraints: Need sub-millisecond inference
  • Interpretability critical: Need to explain predictions to stakeholders

Hyperparameter Tuning Example

Brief example of tuning the LSTM learning rate:

# Grid search over learning rates
learning_rates = [0.0001, 0.001, 0.01]
best_val_loss = float('inf')
best_lr = None

for lr in learning_rates:
    print(f"nTesting learning rate: {lr}")

    model = build_lstm_model(sequence_length, len(feature_columns))
    # build_lstm_model already compiles with the default learning rate;
    # re-compiling here overrides the optimizer with the rate under test
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss='mse',
        metrics=['mae']
    )

    history = model.fit(
        X_seq_train_scaled, y_seq_train,
        validation_split=0.2,
        epochs=20,
        batch_size=256,
        verbose=0
    )

    val_loss = min(history.history['val_loss'])
    print(f"Best validation loss: {val_loss:.4f}")

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_lr = lr

print(f"nOptimal learning rate: {best_lr}")

Conclusion

In this episode, we progressed from simple linear regression to sophisticated LSTM networks for RUL prediction. Here’s what we covered:

  1. Linear Regression: Established baseline performance (~30-40 RMSE typical)
  2. Random Forest: Captured non-linear relationships with feature importance analysis
  3. Gradient Boosting: Pushed performance further with sequential error correction
  4. LSTM Networks: Leveraged temporal dependencies for superior predictions (~15-20 RMSE typical)

Key takeaways:
  • Start simple: Always establish a baseline with linear/tree models before deep learning
  • Feature engineering matters: Even LSTM benefits from good sensor selection
  • Temporal patterns are gold: For degradation prediction, sequence models often win
  • Trade-offs exist: Balance accuracy, interpretability, and computational cost

Typical performance on C-MAPSS FD001:
– Linear Regression: RMSE ~37-42 cycles
– Random Forest: RMSE ~25-30 cycles
– Gradient Boosting: RMSE ~22-27 cycles
– LSTM: RMSE ~16-21 cycles

In the next and final episode, we’ll take these models to production level: hyperparameter optimization, ensemble methods, uncertainty quantification, and real-world deployment strategies for industrial PHM systems. We’ll also explore how to handle concept drift, sensor failures, and other challenges you’ll face when deploying RUL prediction in actual factories.

Next up: Advancing the RUL Prediction Model: Performance Optimization and Real-World Industrial Deployment Strategies
