Building RUL Prediction Models in Python: Step-by-Step from Linear Regression to LSTM

Updated Feb 6, 2026

Introduction

In the previous episode, we explored the fundamental concepts of Remaining Useful Life (RUL) prediction in Prognostics and Health Management (PHM) and prepared our dataset using the NASA C-MAPSS Turbofan Engine Degradation dataset. Now it’s time to roll up our sleeves and build actual prediction models.

This episode walks you through implementing RUL prediction models in Python, starting from simple linear regression and progressing to sophisticated deep learning architectures like LSTM. You’ll learn not just the code, but the reasoning behind each modeling choice and when to use which approach.

Setting Up the Environment

Before diving into modeling, let’s ensure our Python environment has all necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

Install required packages if needed:

pip install numpy pandas matplotlib seaborn scikit-learn tensorflow
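
Since the LSTM section later in this episode benefits from GPU acceleration, a quick environment sanity check can save debugging time. This is an optional sketch that simply prints the TensorFlow version and the GPUs it can see:

# Optional: confirm the TensorFlow build and GPU visibility
print(f"TensorFlow version: {tf.__version__}")
print(f"GPUs visible to TensorFlow: {tf.config.list_physical_devices('GPU')}")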

Data Preprocessing Revisited

Recall from Episode 1 that we loaded the C-MAPSS dataset and calculated true RUL values. Let’s quickly reconstruct that foundation:

# Load training data
column_names = ['unit_id', 'time_cycles', 'setting_1', 'setting_2', 'setting_3']
column_names += [f'sensor_{i}' for i in range(1, 22)]

train_df = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None, names=column_names)

# Calculate RUL for each sample
max_cycles = train_df.groupby('unit_id')['time_cycles'].max().reset_index()
max_cycles.columns = ['unit_id', 'max_cycle']
train_df = train_df.merge(max_cycles, on='unit_id', how='left')
train_df['RUL'] = train_df['max_cycle'] - train_df['time_cycles']

# Cap RUL at 125 cycles (common practice in PHM)
train_df['RUL'] = train_df['RUL'].clip(upper=125)

print(f"Dataset shape: {train_df.shape}")
print(f"RUL statistics:n{train_df['RUL'].describe()}")

Feature Selection

Not all sensors contribute equally to RUL prediction. Let’s identify the most informative features by calculating correlation with RUL and removing low-variance sensors:

# Remove sensors with near-zero variance
sensor_columns = [col for col in train_df.columns if col.startswith('sensor_')]
variance_threshold = 0.01

variances = train_df[sensor_columns].var()
low_variance_sensors = variances[variances < variance_threshold].index.tolist()
print(f"Removing low-variance sensors: {low_variance_sensors}")

# Calculate correlation with RUL
correlations = train_df[sensor_columns].corrwith(train_df['RUL']).abs().sort_values(ascending=False)
print(f"nTop 10 sensors by correlation with RUL:n{correlations.head(10)}")

# Select top features
top_sensors = correlations.head(14).index.tolist()
feature_columns = ['setting_1', 'setting_2', 'setting_3'] + top_sensors

print(f"nSelected {len(feature_columns)} features for modeling")

Baseline Model: Linear Regression

Let’s start with the simplest approach: linear regression. It assumes RUL is a linear function of the input features, which is rarely true for degradation processes, but it gives us a useful baseline.

Theory

Linear regression models RUL as a weighted sum of input features:

RUL = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon

Where:
  • \beta_0 is the intercept term
  • \beta_1, \beta_2, \ldots, \beta_n are the feature coefficients
  • x_1, x_2, \ldots, x_n are the input features (sensor readings, operational settings)
  • \epsilon is the error term

Implementation

# Prepare features and target
X = train_df[feature_columns].values
y = train_df['RUL'].values

# Train-test split (80-20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Normalize features (critical for many ML algorithms)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train linear regression
lr_model = LinearRegression()
lr_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_train = lr_model.predict(X_train_scaled)
y_pred_test = lr_model.predict(X_test_scaled)

# Evaluation metrics
def evaluate_model(y_true, y_pred, model_name):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)

    print(f"n{model_name} Performance:")
    print(f"RMSE: {rmse:.2f} cycles")
    print(f"MAE: {mae:.2f} cycles")
    print(f"R² Score: {r2:.4f}")

    return {'rmse': rmse, 'mae': mae, 'r2': r2}

lr_metrics = evaluate_model(y_test, y_pred_test, "Linear Regression")
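
RMSE and MAE treat early and late predictions symmetrically, but in maintenance practice an overestimated RUL (predicting failure later than it actually occurs) is usually more costly than an underestimate. The asymmetric scoring function popularized by the PHM08 data challenge penalizes late predictions more heavily; a minimal sketch, where d is predicted minus true RUL:

def phm_score(y_true, y_pred):
    """Asymmetric PHM08-style score: late predictions (d > 0) are penalized more."""
    d = y_pred - y_true
    return np.sum(np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1))

print(f"Linear Regression PHM score: {phm_score(y_test, y_pred_test):.1f}")

Lower scores are better; you can report this alongside RMSE for each model below if the asymmetric view matters for your application.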

Visualization

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(y_test, y_pred_test, alpha=0.5, s=10)
plt.plot([0, 125], [0, 125], 'r--', lw=2, label='Perfect Prediction')
plt.xlabel('True RUL (cycles)')
plt.ylabel('Predicted RUL (cycles)')
plt.title('Linear Regression: Predicted vs True RUL')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
residuals = y_test - y_pred_test
plt.hist(residuals, bins=50, edgecolor='black', alpha=0.7)
plt.xlabel('Prediction Error (cycles)')
plt.ylabel('Frequency')
plt.title('Residual Distribution')
plt.axvline(x=0, color='r', linestyle='--', lw=2)
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Tree-Based Models: Random Forest

Linear regression struggles with non-linear degradation patterns. Tree-based ensemble methods like Random Forest can capture complex relationships.

Why Random Forest?

  • Non-linearity: Handles non-linear relationships between sensors and RUL
  • Feature interactions: Automatically captures interactions between sensors
  • Robustness: Less sensitive to outliers and noisy sensor readings
  • Interpretability: Provides feature importance rankings

Implementation

# Train Random Forest with optimized hyperparameters
rf_model = RandomForestRegressor(
    n_estimators=200,
    max_depth=20,
    min_samples_split=10,
    min_samples_leaf=4,
    max_features='sqrt',
    random_state=42,
    n_jobs=-1,
    verbose=0
)

rf_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_rf = rf_model.predict(X_test_scaled)

# Evaluation
rf_metrics = evaluate_model(y_test, y_pred_rf, "Random Forest")

# Feature importance analysis
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print(f"nTop 10 Important Features:")
print(feature_importance.head(10))

# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'].head(15), 
         feature_importance['importance'].head(15))
plt.xlabel('Importance Score')
plt.title('Top 15 Feature Importances (Random Forest)')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
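
Impurity-based importances from Random Forest can be biased toward correlated or high-cardinality features. If you want a second opinion, scikit-learn's permutation importance measures how much the held-out score drops when each feature is shuffled; a sketch (no retraining involved, but the repeated scoring can take a while):

from sklearn.inspection import permutation_importance

# Permutation importance on the held-out set (10 shuffles per feature)
perm = permutation_importance(
    rf_model, X_test_scaled, y_test, n_repeats=10, random_state=42, n_jobs=-1
)
perm_importance = pd.Series(perm.importances_mean, index=feature_columns).sort_values(ascending=False)
print(perm_importance.head(10))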

Gradient Boosting: Pushing Performance Further

Gradient Boosting builds trees sequentially, each correcting errors from previous trees. This often yields better performance than Random Forest.

# Train Gradient Boosting model
gb_model = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=4,
    subsample=0.8,
    random_state=42,
    verbose=0
)

gb_model.fit(X_train_scaled, y_train)

# Predictions and evaluation
y_pred_gb = gb_model.predict(X_test_scaled)
gb_metrics = evaluate_model(y_test, y_pred_gb, "Gradient Boosting")
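
Because boosting adds trees sequentially, it is worth checking whether 200 estimators actually helps or whether performance has already plateaued. The sketch below uses staged_predict to replay the ensemble one stage at a time and track test RMSE as trees are added:

# Track test RMSE as trees are added, to see where boosting stops improving
staged_rmse = [
    np.sqrt(mean_squared_error(y_test, y_pred_stage))
    for y_pred_stage in gb_model.staged_predict(X_test_scaled)
]

plt.figure(figsize=(8, 4))
plt.plot(range(1, len(staged_rmse) + 1), staged_rmse)
plt.xlabel('Number of Trees')
plt.ylabel('Test RMSE (cycles)')
plt.title('Gradient Boosting: RMSE vs Number of Trees')
plt.grid(True, alpha=0.3)
plt.show()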

Deep Learning: LSTM for Temporal Patterns

All previous models treat each sample independently, ignoring the temporal sequence of sensor readings. Engine degradation is inherently a time-series problem—LSTM networks can capture these temporal dependencies.

Why LSTM for RUL Prediction?

Long Short-Term Memory (LSTM) networks excel at:
  • Temporal dependencies: Learning how sensor patterns evolve over time
  • Memory: Remembering early degradation signals that predict future failures
  • Non-linear dynamics: Modeling complex degradation physics

Preparing Sequential Data

LSTM requires input in the shape (samples, timesteps, features). We’ll create sliding windows of sensor readings:

def create_sequences(df, unit_col, feature_cols, target_col, sequence_length=30):
    """
    Create sequences for LSTM training.
    Each sequence contains 'sequence_length' consecutive timesteps.
    """
    sequences = []
    targets = []

    for unit_id in df[unit_col].unique():
        unit_data = df[df[unit_col] == unit_id].sort_values('time_cycles')

        # Extract features and target
        features = unit_data[feature_cols].values
        rul_values = unit_data[target_col].values

        # Create sliding windows
        for i in range(len(features) - sequence_length + 1):
            sequences.append(features[i:i+sequence_length])
            targets.append(rul_values[i+sequence_length-1])

    return np.array(sequences), np.array(targets)

# Create sequences
sequence_length = 30
X_seq, y_seq = create_sequences(
    train_df, 
    'unit_id', 
    feature_columns, 
    'RUL', 
    sequence_length
)

print(f"Sequence shape: {X_seq.shape}")
print(f"Target shape: {y_seq.shape}")

# Train-test split
X_seq_train, X_seq_test, y_seq_train, y_seq_test = train_test_split(
    X_seq, y_seq, test_size=0.2, random_state=42
)

# Normalize features
scaler_seq = StandardScaler()
X_seq_train_scaled = scaler_seq.fit_transform(
    X_seq_train.reshape(-1, X_seq_train.shape[-1])
).reshape(X_seq_train.shape)

X_seq_test_scaled = scaler_seq.transform(
    X_seq_test.reshape(-1, X_seq_test.shape[-1])
).reshape(X_seq_test.shape)
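
One caveat with the random split above: consecutive windows from the same engine overlap heavily, so windows from a single unit can land in both train and test, which tends to make the test score look optimistic. A stricter alternative is to hold out entire engines. The sketch below rebuilds the unit id of each window (relying on the fact that create_sequences iterates over units in train_df['unit_id'].unique() order) and splits with GroupShuffleSplit:

from sklearn.model_selection import GroupShuffleSplit

# Rebuild the unit id of each window, in the same order create_sequences produced them
window_units = []
for unit_id in train_df['unit_id'].unique():
    n_rows = (train_df['unit_id'] == unit_id).sum()
    window_units.extend([unit_id] * max(n_rows - sequence_length + 1, 0))
window_units = np.array(window_units)

# Hold out 20% of engines (rather than 20% of windows)
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(gss.split(X_seq, y_seq, groups=window_units))
print(f"Train windows: {len(train_idx)}, test windows: {len(test_idx)}")

For simplicity the rest of this episode keeps the random window split, but a unit-wise split gives a more honest estimate of performance on unseen engines.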

Building the LSTM Model

def build_lstm_model(sequence_length, n_features):
    """
    Build LSTM model for RUL prediction.
    Architecture: LSTM -> Dropout -> LSTM -> Dropout -> Dense -> Output
    """
    model = models.Sequential([
        layers.LSTM(
            units=128, 
            return_sequences=True, 
            input_shape=(sequence_length, n_features)
        ),
        layers.Dropout(0.2),

        layers.LSTM(units=64, return_sequences=False),
        layers.Dropout(0.2),

        layers.Dense(units=32, activation='relu'),
        layers.Dropout(0.1),

        layers.Dense(units=1)  # Output: RUL prediction
    ])

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='mse',
        metrics=['mae']
    )

    return model

# Build model
lstm_model = build_lstm_model(sequence_length, len(feature_columns))
lstm_model.summary()

Training with Callbacks

# Define callbacks for better training
early_stop = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=15,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1
)

# Train the model
history = lstm_model.fit(
    X_seq_train_scaled, y_seq_train,
    validation_split=0.2,
    epochs=100,
    batch_size=256,
    callbacks=[early_stop, reduce_lr],
    verbose=1
)

# Predictions
y_pred_lstm = lstm_model.predict(X_seq_test_scaled).flatten()

# Evaluation
lstm_metrics = evaluate_model(y_seq_test, y_pred_lstm, "LSTM")
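
Since the next episode covers deployment, it can be handy to persist the trained network and the fitted sequence scaler now so they can be reloaded without retraining. A minimal sketch, assuming a recent TensorFlow/Keras version that supports the native .keras format (joblib ships with scikit-learn):

import joblib

# Persist the trained model and the sequence scaler for later reuse
lstm_model.save('rul_lstm.keras')
joblib.dump(scaler_seq, 'scaler_seq.joblib')

# Later: reload without retraining
# restored_model = keras.models.load_model('rul_lstm.keras')
# restored_scaler = joblib.load('scaler_seq.joblib')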

Training History Visualization

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('LSTM Training History')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error')
plt.title('LSTM MAE History')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Model Comparison

Let’s compare all models side-by-side:

# Compile results
results = pd.DataFrame([
    {'Model': 'Linear Regression', **lr_metrics},
    {'Model': 'Random Forest', **rf_metrics},
    {'Model': 'Gradient Boosting', **gb_metrics},
    {'Model': 'LSTM', **lstm_metrics}
])

print("n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(results.to_string(index=False))
print("="*60)

# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
metrics = ['rmse', 'mae', 'r2']
titles = ['RMSE (Lower is Better)', 'MAE (Lower is Better)', 'R² Score (Higher is Better)']

for idx, (metric, title) in enumerate(zip(metrics, titles)):
    axes[idx].bar(results['Model'], results[metric], 
                  color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'])
    axes[idx].set_title(title)
    axes[idx].set_ylabel(metric.upper())
    axes[idx].tick_params(axis='x', rotation=45)
    axes[idx].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

Understanding Model Predictions

Let’s visualize how each model predicts RUL over the lifetime of a single engine unit:

# Select one engine unit from the training data for illustration (unit 50)
test_unit = train_df[train_df['unit_id'] == 50].copy()
test_unit_features = test_unit[feature_columns].values
test_unit_rul = test_unit['RUL'].values

# Scale features
test_unit_scaled = scaler.transform(test_unit_features)

# Predictions from traditional models
pred_lr = lr_model.predict(test_unit_scaled)
pred_rf = rf_model.predict(test_unit_scaled)
pred_gb = gb_model.predict(test_unit_scaled)

# LSTM prediction (requires sequences)
pred_lstm = []
for i in range(len(test_unit_features) - sequence_length + 1):
    seq = test_unit_features[i:i+sequence_length]
    seq_scaled = scaler_seq.transform(seq).reshape(1, sequence_length, -1)
    pred_lstm.append(lstm_model.predict(seq_scaled, verbose=0)[0, 0])

# Align LSTM predictions
lstm_cycles = test_unit['time_cycles'].values[sequence_length-1:]

# Plot
plt.figure(figsize=(14, 6))
plt.plot(test_unit['time_cycles'], test_unit_rul, 
         'k-', lw=2, label='True RUL', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_lr, 
         '--', label='Linear Regression', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_rf, 
         '--', label='Random Forest', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_gb, 
         '--', label='Gradient Boosting', alpha=0.7)
plt.plot(lstm_cycles, pred_lstm, 
         '--', label='LSTM', alpha=0.7, lw=2)

plt.xlabel('Time Cycles')
plt.ylabel('Remaining Useful Life (cycles)')
plt.title('RUL Prediction Comparison for Engine Unit 50')
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Key Insights and Trade-offs

Model Selection Guidelines

  • Linear Regression: fast, interpretable, low compute; struggles with non-linearity. Best for quick baselines and simple systems.
  • Random Forest: good accuracy with feature importance rankings; memory-intensive and slower at inference. Best for medium datasets where interpretability is needed.
  • Gradient Boosting: typically the strongest tabular performance; longer training and hyperparameter-sensitive. Best for competition-grade accuracy.
  • LSTM: captures temporal patterns; requires more data, trains more slowly, and benefits from a GPU. Best when time-series dependencies are critical.

When to Use LSTM?

Use LSTM when:
  • Long degradation history matters: Early sensor patterns predict future failures
  • Sufficient data: At least 50-100 units with full degradation cycles
  • Temporal patterns exist: Sensor trends are more informative than point values
  • Computational resources available: A GPU is recommended for training

Stick with tree-based models when:
  • Limited data: Fewer than 50 engine units
  • Real-time constraints: Need sub-millisecond inference
  • Interpretability critical: Need to explain predictions to stakeholders

Hyperparameter Tuning Example

Brief example of tuning the LSTM learning rate:

# Grid search over learning rates
learning_rates = [0.0001, 0.001, 0.01]
best_val_loss = float('inf')
best_lr = None

for lr in learning_rates:
    print(f"nTesting learning rate: {lr}")

    model = build_lstm_model(sequence_length, len(feature_columns))
    # build_lstm_model already compiles with the default learning rate;
    # re-compiling here overrides the optimizer with the rate under test
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss='mse',
        metrics=['mae']
    )

    history = model.fit(
        X_seq_train_scaled, y_seq_train,
        validation_split=0.2,
        epochs=20,
        batch_size=256,
        verbose=0
    )

    val_loss = min(history.history['val_loss'])
    print(f"Best validation loss: {val_loss:.4f}")

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_lr = lr

print(f"nOptimal learning rate: {best_lr}")

Conclusion

In this episode, we progressed from simple linear regression to sophisticated LSTM networks for RUL prediction. Here’s what we covered:

  1. Linear Regression: Established baseline performance (~30-40 RMSE typical)
  2. Random Forest: Captured non-linear relationships with feature importance analysis
  3. Gradient Boosting: Pushed performance further with sequential error correction
  4. LSTM Networks: Leveraged temporal dependencies for superior predictions (~15-20 RMSE typical)

Key takeaways:
  • Start simple: Always establish a baseline with linear/tree models before deep learning
  • Feature engineering matters: Even LSTM benefits from good sensor selection
  • Temporal patterns are gold: For degradation prediction, sequence models often win
  • Trade-offs exist: Balance accuracy, interpretability, and computational cost

Typical performance on C-MAPSS FD001:
– Linear Regression: RMSE ~37-42 cycles
– Random Forest: RMSE ~25-30 cycles
– Gradient Boosting: RMSE ~22-27 cycles
– LSTM: RMSE ~16-21 cycles

In the next and final episode, we’ll take these models to production level: hyperparameter optimization, ensemble methods, uncertainty quantification, and real-world deployment strategies for industrial PHM systems. We’ll also explore how to handle concept drift, sensor failures, and other challenges you’ll face when deploying RUL prediction in actual factories.

Next up: Advancing the RUL Prediction Model: Performance Optimization and Real-World Industrial Deployment Strategies
