Introduction
In the previous episode, we explored the fundamental concepts of Remaining Useful Life (RUL) prediction in Prognostics and Health Management (PHM) and prepared our dataset using the NASA C-MAPSS Turbofan Engine Degradation dataset. Now it’s time to roll up our sleeves and build actual prediction models.
This episode walks you through implementing RUL prediction models in Python, starting from simple linear regression and progressing to sophisticated deep learning architectures like LSTM. You’ll learn not just the code, but the reasoning behind each modeling choice and when to use which approach.
Setting Up the Environment
Before diving into modeling, let’s ensure our Python environment has all necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks
import warnings
warnings.filterwarnings('ignore')
# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
Install required packages if needed:
pip install numpy pandas matplotlib seaborn scikit-learn tensorflow
Data Preprocessing Revisited
Recall from Episode 1 that we loaded the C-MAPSS dataset and calculated true RUL values. Let’s quickly reconstruct that foundation:
# Load training data
column_names = ['unit_id', 'time_cycles', 'setting_1', 'setting_2', 'setting_3']
column_names += [f'sensor_{i}' for i in range(1, 22)]
train_df = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None, names=column_names)
# Calculate RUL for each sample
max_cycles = train_df.groupby('unit_id')['time_cycles'].max().reset_index()
max_cycles.columns = ['unit_id', 'max_cycle']
train_df = train_df.merge(max_cycles, on='unit_id', how='left')
train_df['RUL'] = train_df['max_cycle'] - train_df['time_cycles']
# Cap RUL at 125 cycles (common practice in PHM)
train_df['RUL'] = train_df['RUL'].clip(upper=125)
print(f"Dataset shape: {train_df.shape}")
print(f"RUL statistics:\n{train_df['RUL'].describe()}")
Feature Selection
Not all sensors contribute equally to RUL prediction. Let’s identify the most informative features by calculating correlation with RUL and removing low-variance sensors:
# Remove sensors with near-zero variance
sensor_columns = [col for col in train_df.columns if col.startswith('sensor_')]
variance_threshold = 0.01
variances = train_df[sensor_columns].var()
low_variance_sensors = variances[variances < variance_threshold].index.tolist()
print(f"Removing low-variance sensors: {low_variance_sensors}")
sensor_columns = [col for col in sensor_columns if col not in low_variance_sensors]
# Calculate correlation with RUL (low-variance sensors excluded)
correlations = train_df[sensor_columns].corrwith(train_df['RUL']).abs().sort_values(ascending=False)
print(f"\nTop 10 sensors by correlation with RUL:\n{correlations.head(10)}")
# Select top features
top_sensors = correlations.head(14).index.tolist()
feature_columns = ['setting_1', 'setting_2', 'setting_3'] + top_sensors
print(f"\nSelected {len(feature_columns)} features for modeling")
Baseline Model: Linear Regression
Let’s start with the simplest approach: linear regression. This assumes RUL degradation follows a linear pattern, which is often not true but provides a useful baseline.
Theory
Linear regression models RUL as a weighted sum of input features:

RUL = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε

Where:
– β₀ is the intercept term
– β₁, …, βₙ are the feature coefficients
– x₁, …, xₙ are the input features (sensor readings, operational settings)
– ε is the error term
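As a sanity check on the equation above, the coefficients can also be computed in closed form from the normal equations, β = (XᵀX)⁻¹Xᵀy. A minimal sketch on made-up synthetic data (the features and true coefficients here are illustrative, not C-MAPSS values), using a least-squares solver rather than an explicit matrix inverse for numerical stability:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))            # 3 synthetic "sensor" features
beta_true = np.array([2.0, -1.5, 0.5])
y = 10.0 + X @ beta_true + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the first solved coefficient is the
# intercept beta_0; then solve the least-squares problem directly.
X_design = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# scikit-learn's LinearRegression solves the same problem
sk = LinearRegression().fit(X, y)
print(np.allclose(beta_hat[0], sk.intercept_))   # intercepts agree
print(np.allclose(beta_hat[1:], sk.coef_))       # coefficients agree
```

Both routes recover the same solution; scikit-learn is preferred in practice for its scaling and API consistency.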
Implementation
# Prepare features and target
X = train_df[feature_columns].values
y = train_df['RUL'].values
# Train-test split (80-20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Normalize features (critical for many ML algorithms)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train linear regression
lr_model = LinearRegression()
lr_model.fit(X_train_scaled, y_train)
# Predictions
y_pred_train = lr_model.predict(X_train_scaled)
y_pred_test = lr_model.predict(X_test_scaled)
# Evaluation metrics
def evaluate_model(y_true, y_pred, model_name):
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    print(f"\n{model_name} Performance:")
    print(f"RMSE: {rmse:.2f} cycles")
    print(f"MAE: {mae:.2f} cycles")
    print(f"R² Score: {r2:.4f}")
    return {'rmse': rmse, 'mae': mae, 'r2': r2}
lr_metrics = evaluate_model(y_test, y_pred_test, "Linear Regression")
Visualization
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(y_test, y_pred_test, alpha=0.5, s=10)
plt.plot([0, 125], [0, 125], 'r--', lw=2, label='Perfect Prediction')
plt.xlabel('True RUL (cycles)')
plt.ylabel('Predicted RUL (cycles)')
plt.title('Linear Regression: Predicted vs True RUL')
plt.legend()
plt.grid(True, alpha=0.3)
plt.subplot(1, 2, 2)
residuals = y_test - y_pred_test
plt.hist(residuals, bins=50, edgecolor='black', alpha=0.7)
plt.xlabel('Prediction Error (cycles)')
plt.ylabel('Frequency')
plt.title('Residual Distribution')
plt.axvline(x=0, color='r', linestyle='--', lw=2)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Tree-Based Models: Random Forest
Linear regression struggles with non-linear degradation patterns. Tree-based ensemble methods like Random Forest can capture complex relationships.
Why Random Forest?
- Non-linearity: Handles non-linear relationships between sensors and RUL
- Feature interactions: Automatically captures interactions between sensors
- Robustness: Less sensitive to outliers and noisy sensor readings
- Interpretability: Provides feature importance rankings
Implementation
# Train Random Forest with optimized hyperparameters
rf_model = RandomForestRegressor(
    n_estimators=200,
    max_depth=20,
    min_samples_split=10,
    min_samples_leaf=4,
    max_features='sqrt',
    random_state=42,
    n_jobs=-1,
    verbose=0
)
rf_model.fit(X_train_scaled, y_train)
# Predictions
y_pred_rf = rf_model.predict(X_test_scaled)
# Evaluation
rf_metrics = evaluate_model(y_test, y_pred_rf, "Random Forest")
# Feature importance analysis
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nTop 10 Important Features:")
print(feature_importance.head(10))
# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'].head(15),
         feature_importance['importance'].head(15))
plt.xlabel('Importance Score')
plt.title('Top 15 Feature Importances (Random Forest)')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
Gradient Boosting: Pushing Performance Further
Gradient Boosting builds trees sequentially, each correcting errors from previous trees. This often yields better performance than Random Forest.
# Train Gradient Boosting model
gb_model = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=4,
    subsample=0.8,
    random_state=42,
    verbose=0
)
gb_model.fit(X_train_scaled, y_train)
# Predictions and evaluation
y_pred_gb = gb_model.predict(X_test_scaled)
gb_metrics = evaluate_model(y_test, y_pred_gb, "Gradient Boosting")
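Because boosting adds trees one at a time, we can inspect held-out error after every stage and keep only as many trees as actually help. A sketch using scikit-learn's `staged_predict` on synthetic data (a stand-in for the C-MAPSS features, not the dataset above):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1,
                               max_depth=3, random_state=0)
gb.fit(X_tr, y_tr)

# staged_predict yields test predictions after 1, 2, ..., n_estimators trees
rmse_per_stage = [np.sqrt(mean_squared_error(y_te, pred))
                  for pred in gb.staged_predict(X_te)]
best_stage = int(np.argmin(rmse_per_stage)) + 1
print(f"Best number of trees: {best_stage}")
```

If test RMSE bottoms out well before the final stage, refitting with that smaller `n_estimators` gives a faster model with no accuracy loss.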
Deep Learning: LSTM for Temporal Patterns
All previous models treat each sample independently, ignoring the temporal sequence of sensor readings. Engine degradation is inherently a time-series problem—LSTM networks can capture these temporal dependencies.
Why LSTM for RUL Prediction?
Long Short-Term Memory (LSTM) networks excel at:
– Temporal dependencies: Learning how sensor patterns evolve over time
– Memory: Remembering early degradation signals that predict future failures
– Non-linear dynamics: Modeling complex degradation physics
Preparing Sequential Data
LSTM requires input in the shape (samples, timesteps, features). We’ll create sliding windows of sensor readings:
def create_sequences(df, unit_col, feature_cols, target_col, sequence_length=30):
    """
    Create sequences for LSTM training.
    Each sequence contains 'sequence_length' consecutive timesteps.
    """
    sequences = []
    targets = []
    for unit_id in df[unit_col].unique():
        unit_data = df[df[unit_col] == unit_id].sort_values('time_cycles')
        # Extract features and target
        features = unit_data[feature_cols].values
        rul_values = unit_data[target_col].values
        # Create sliding windows
        for i in range(len(features) - sequence_length + 1):
            sequences.append(features[i:i+sequence_length])
            targets.append(rul_values[i+sequence_length-1])
    return np.array(sequences), np.array(targets)
# Create sequences
sequence_length = 30
X_seq, y_seq = create_sequences(
    train_df,
    'unit_id',
    feature_columns,
    'RUL',
    sequence_length
)
print(f"Sequence shape: {X_seq.shape}")
print(f"Target shape: {y_seq.shape}")
# Train-test split
X_seq_train, X_seq_test, y_seq_train, y_seq_test = train_test_split(
    X_seq, y_seq, test_size=0.2, random_state=42
)
# Normalize features
scaler_seq = StandardScaler()
X_seq_train_scaled = scaler_seq.fit_transform(
    X_seq_train.reshape(-1, X_seq_train.shape[-1])
).reshape(X_seq_train.shape)
X_seq_test_scaled = scaler_seq.transform(
    X_seq_test.reshape(-1, X_seq_test.shape[-1])
).reshape(X_seq_test.shape)
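One caveat with a purely random split: overlapping windows from the same engine can land in both train and test, which tends to inflate test scores. A unit-wise split avoids this. A sketch using scikit-learn's `GroupShuffleSplit` on synthetic arrays; the `seq_units` array recording each sequence's unit id is an assumption here (`create_sequences` above would need to return it alongside the sequences):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(42)
n_seq, seq_len, n_feat = 500, 30, 17
X_seq = rng.normal(size=(n_seq, seq_len, n_feat))   # synthetic sequences
y_seq = rng.uniform(0, 125, size=n_seq)             # synthetic RUL targets
seq_units = rng.integers(1, 101, size=n_seq)        # ASSUMED: unit id per sequence

# Split so that all windows from a given engine go to exactly one side
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(gss.split(X_seq, y_seq, groups=seq_units))

# No engine unit should appear in both sets
overlap = set(seq_units[train_idx]) & set(seq_units[test_idx])
print(f"Units shared between train and test: {len(overlap)}")
```

The random split is kept in the main walkthrough for simplicity, but for reporting honest generalization numbers a group-wise split like this is the safer choice.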
Building the LSTM Model
def build_lstm_model(sequence_length, n_features):
    """
    Build LSTM model for RUL prediction.
    Architecture: LSTM -> Dropout -> LSTM -> Dropout -> Dense -> Output
    """
    model = models.Sequential([
        layers.LSTM(
            units=128,
            return_sequences=True,
            input_shape=(sequence_length, n_features)
        ),
        layers.Dropout(0.2),
        layers.LSTM(units=64, return_sequences=False),
        layers.Dropout(0.2),
        layers.Dense(units=32, activation='relu'),
        layers.Dropout(0.1),
        layers.Dense(units=1)  # Output: RUL prediction
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss='mse',
        metrics=['mae']
    )
    return model
# Build model
lstm_model = build_lstm_model(sequence_length, len(feature_columns))
lstm_model.summary()
Training with Callbacks
# Define callbacks for better training
early_stop = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=15,
    restore_best_weights=True,
    verbose=1
)
reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1
)
# Train the model
history = lstm_model.fit(
    X_seq_train_scaled, y_seq_train,
    validation_split=0.2,
    epochs=100,
    batch_size=256,
    callbacks=[early_stop, reduce_lr],
    verbose=1
)
# Predictions
y_pred_lstm = lstm_model.predict(X_seq_test_scaled).flatten()
# Evaluation
lstm_metrics = evaluate_model(y_seq_test, y_pred_lstm, "LSTM")
Training History Visualization
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('LSTM Training History')
plt.legend()
plt.grid(True, alpha=0.3)
plt.subplot(1, 2, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error')
plt.title('LSTM MAE History')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Model Comparison
Let’s compare all models side-by-side:
# Compile results
results = pd.DataFrame([
    {'Model': 'Linear Regression', **lr_metrics},
    {'Model': 'Random Forest', **rf_metrics},
    {'Model': 'Gradient Boosting', **gb_metrics},
    {'Model': 'LSTM', **lstm_metrics}
])
print("\n" + "="*60)
print("MODEL COMPARISON")
print("="*60)
print(results.to_string(index=False))
print("="*60)
# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
metrics = ['rmse', 'mae', 'r2']
titles = ['RMSE (Lower is Better)', 'MAE (Lower is Better)', 'R² Score (Higher is Better)']
for idx, (metric, title) in enumerate(zip(metrics, titles)):
    axes[idx].bar(results['Model'], results[metric],
                  color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'])
    axes[idx].set_title(title)
    axes[idx].set_ylabel(metric.upper())
    axes[idx].tick_params(axis='x', rotation=45)
    axes[idx].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Understanding Model Predictions
Let’s visualize how each model predicts RUL over the lifetime of a single engine unit:
# Select a single engine unit (unit 50) from the training data for illustration
test_unit = train_df[train_df['unit_id'] == 50].copy()
test_unit_features = test_unit[feature_columns].values
test_unit_rul = test_unit['RUL'].values
# Scale features
test_unit_scaled = scaler.transform(test_unit_features)
# Predictions from traditional models
pred_lr = lr_model.predict(test_unit_scaled)
pred_rf = rf_model.predict(test_unit_scaled)
pred_gb = gb_model.predict(test_unit_scaled)
# LSTM prediction (requires sequences)
pred_lstm = []
for i in range(len(test_unit_features) - sequence_length + 1):
    seq = test_unit_features[i:i+sequence_length]
    seq_scaled = scaler_seq.transform(seq).reshape(1, sequence_length, -1)
    pred_lstm.append(lstm_model.predict(seq_scaled, verbose=0)[0, 0])
# Align LSTM predictions
lstm_cycles = test_unit['time_cycles'].values[sequence_length-1:]
# Plot
plt.figure(figsize=(14, 6))
plt.plot(test_unit['time_cycles'], test_unit_rul,
         'k-', lw=2, label='True RUL', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_lr,
         '--', label='Linear Regression', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_rf,
         '--', label='Random Forest', alpha=0.7)
plt.plot(test_unit['time_cycles'], pred_gb,
         '--', label='Gradient Boosting', alpha=0.7)
plt.plot(lstm_cycles, pred_lstm,
         '--', label='LSTM', alpha=0.7, lw=2)
plt.xlabel('Time Cycles')
plt.ylabel('Remaining Useful Life (cycles)')
plt.title('RUL Prediction Comparison for Engine Unit 50')
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Key Insights and Trade-offs
Model Selection Guidelines
| Model | Pros | Cons | Best For |
|---|---|---|---|
| Linear Regression | Fast, interpretable, low compute | Poor with non-linearity | Quick baseline, simple systems |
| Random Forest | Good accuracy, feature importance | Memory-intensive, slow inference | Medium datasets, interpretability needed |
| Gradient Boosting | Highest tabular performance | Longer training, hyperparameter-sensitive | Competition-grade accuracy |
| LSTM | Captures temporal patterns | Requires more data, slower, needs GPU | Time-series dependencies critical |
When to Use LSTM?
Use LSTM when:
– Long degradation history matters: Early sensor patterns predict future failures
– Sufficient data: At least 50-100 units with full degradation cycles
– Temporal patterns exist: Sensor trends more informative than point values
– Computational resources available: GPU recommended for training
Stick with tree-based models when:
– Limited data: Fewer than 50 engine units
– Real-time constraints: Need sub-millisecond inference
– Interpretability critical: Need to explain predictions to stakeholders
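The "real-time constraints" point above can be checked empirically rather than assumed. A rough sketch timing single-sample inference on synthetic data (absolute numbers vary widely with hardware and library versions; only the relative ordering is the point):

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 17))          # synthetic stand-in features
y = rng.uniform(0, 125, size=2000)       # synthetic RUL targets

lr = LinearRegression().fit(X, y)
rf = RandomForestRegressor(n_estimators=200, random_state=0, n_jobs=1).fit(X, y)

def mean_latency_ms(model, x, n=100):
    """Average wall-clock time of n single-sample predict calls."""
    start = time.perf_counter()
    for _ in range(n):
        model.predict(x)
    return (time.perf_counter() - start) / n * 1000

x_one = X[:1]
print(f"LinearRegression: {mean_latency_ms(lr, x_one):.3f} ms/sample")
print(f"RandomForest:     {mean_latency_ms(rf, x_one):.3f} ms/sample")
```

A 200-tree forest must traverse every tree per call, so its single-sample latency is typically orders of magnitude above a dot product; batching predictions or shrinking the ensemble are the usual mitigations.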
Hyperparameter Tuning Example
Brief example of tuning the LSTM learning rate:
# Grid search over learning rates
learning_rates = [0.0001, 0.001, 0.01]
best_val_loss = float('inf')
best_lr = None
for lr in learning_rates:
    print(f"\nTesting learning rate: {lr}")
    model = build_lstm_model(sequence_length, len(feature_columns))
    # Re-compile to override the default learning rate set inside build_lstm_model
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss='mse',
        metrics=['mae']
    )
    history = model.fit(
        X_seq_train_scaled, y_seq_train,
        validation_split=0.2,
        epochs=20,
        batch_size=256,
        verbose=0
    )
    val_loss = min(history.history['val_loss'])
    print(f"Best validation loss: {val_loss:.4f}")
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_lr = lr
print(f"\nOptimal learning rate: {best_lr}")
Conclusion
In this episode, we progressed from simple linear regression to sophisticated LSTM networks for RUL prediction. Here’s what we covered:
- Linear Regression: Established baseline performance (~30-40 RMSE typical)
- Random Forest: Captured non-linear relationships with feature importance analysis
- Gradient Boosting: Pushed performance further with sequential error correction
- LSTM Networks: Leveraged temporal dependencies for superior predictions (~15-20 RMSE typical)
Key takeaways:
– Start simple: Always establish baseline with linear/tree models before deep learning
– Feature engineering matters: Even LSTM benefits from good sensor selection
– Temporal patterns are gold: For degradation prediction, sequence models often win
– Trade-offs exist: Balance accuracy, interpretability, and computational cost
Typical performance on C-MAPSS FD001:
– Linear Regression: RMSE ~37-42 cycles
– Random Forest: RMSE ~25-30 cycles
– Gradient Boosting: RMSE ~22-27 cycles
– LSTM: RMSE ~16-21 cycles
In the next and final episode, we’ll take these models to production level: hyperparameter optimization, ensemble methods, uncertainty quantification, and real-world deployment strategies for industrial PHM systems. We’ll also explore how to handle concept drift, sensor failures, and other challenges you’ll face when deploying RUL prediction in actual factories.
Next up: Advancing RUL Prediction Models: Performance Optimization and Real-World Industrial Application Strategies