Add LSTM vs CNN timeseries classification example for FordA dataset #2120

257 changes: 257 additions & 0 deletions examples/timeseries/lstm_vs_cnn_timeseries_classification.py
"""
Title: Timeseries Classification with LSTM and CNN Models
Author: [Georgios Sklavounakos](https://github.com/gsklavounakos)
Date created: 2025/06/05
Last modified: 2025/06/05
Description: Comparing LSTM and 1D CNN models for timeseries classification on the FordA dataset from the UCR/UEA archive.
Accelerator: GPU
"""

"""
## Introduction

This example demonstrates how to perform timeseries classification using two deep learning models:
a Long Short-Term Memory (LSTM) model, which processes sequences using recurrent layers, and a 1D Convolutional
Neural Network (CNN) model, which uses convolutions to detect local temporal patterns. We use the FordA dataset
from the [UCR/UEA archive](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/), which contains univariate
timeseries of engine noise measurements for binary classification (fault detection). This example compares the
performance of both models under identical training conditions.

The workflow includes:
- Loading and preprocessing the FordA dataset
- Building and training LSTM and CNN models
- Evaluating and comparing their performance
- Visualizing training metrics
"""

"""
## Setup
"""

import numpy as np
import keras
from keras import layers
import matplotlib.pyplot as plt

"""
## Load the Data: FordA Dataset

### Dataset Description

The FordA dataset contains 3601 training instances and 1320 testing instances, each a univariate timeseries of
500 timesteps representing engine noise measurements. The task is to classify whether a fault is present (label 1)
or not (label -1). The data is z-normalized (mean=0, std=1) and sourced from the UCR/UEA archive. For details, see
[the dataset description](http://www.j-wichard.de/publications/FordPaper.pdf).
"""

def readucr(filename):
    """Read a UCR timeseries dataset from a TSV file."""
    data = np.loadtxt(filename, delimiter="\t")
    y = data[:, 0].astype(int)
    x = data[:, 1:].astype(np.float32)
    return x, y

root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"

x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")

"""
## Visualize the Data

We plot one example timeseries from each class to understand the data's characteristics.
"""

classes = np.unique(np.concatenate((y_train, y_test), axis=0))

plt.figure(figsize=(8, 4))
for c in classes:
    c_x_train = x_train[y_train == c]
    plt.plot(c_x_train[0], label=f"class {c}")
plt.title("Sample Time Series from Each Class")
plt.xlabel("Timestep")
plt.ylabel("Amplitude")
plt.legend(loc="best")
plt.show()
plt.close()

"""
## Preprocess the Data

The timeseries are already z-normalized and of fixed length (500 timesteps). We reshape the data to
`(samples, timesteps, 1)` to represent univariate timeseries as multivariate with one channel, enabling
compatibility with both LSTM and CNN models. We also standardize labels to 0 and 1 for binary classification.
"""

x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))

# Standardize labels: map {-1, 1} to {0, 1}
y_train = np.where(y_train == -1, 0, y_train)
y_test = np.where(y_test == -1, 0, y_test)

# Shuffle training data for validation split
idx = np.random.permutation(len(x_train))
x_train = x_train[idx]
y_train = y_train[idx]

num_classes = len(np.unique(y_train))

"""
## Build the Models

We define two models:
1. **LSTM Model**: A recurrent model with two LSTM layers for sequential processing, with dropout for regularization.
2. **CNN Model**: A convolutional model with 1D convolutions, batch normalization, max-pooling, and global average pooling,
inspired by [this paper](https://arxiv.org/abs/1611.06455).

Both models use similar hyperparameters for a fair comparison.
"""

def build_lstm_model(input_shape, num_classes):
    """Build an LSTM-based model for timeseries classification."""
    # Note: a nonzero `recurrent_dropout` disables the fast cuDNN LSTM kernel,
    # so these layers fall back to the slower generic implementation.
    model = keras.Sequential(
        [
            layers.Input(shape=input_shape),
            layers.LSTM(64, return_sequences=True, dropout=0.3, recurrent_dropout=0.3),
            layers.LSTM(32, dropout=0.3, recurrent_dropout=0.3),
            layers.Dense(16, activation="relu"),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    return model

def build_cnn_model(input_shape, num_classes):
    """Build a 1D CNN-based model for timeseries classification."""
    model = keras.Sequential(
        [
            layers.Input(shape=input_shape),
            layers.Conv1D(filters=64, kernel_size=7, padding="same", activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling1D(pool_size=2),
            layers.Dropout(0.3),
            layers.Conv1D(filters=128, kernel_size=5, padding="same", activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling1D(pool_size=2),
            layers.Dropout(0.3),
            layers.GlobalAveragePooling1D(),
            layers.Dense(16, activation="relu"),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    return model

input_shape = x_train.shape[1:]
lstm_model = build_lstm_model(input_shape, num_classes)
cnn_model = build_cnn_model(input_shape, num_classes)

"""
## Train the Models

We train both models with identical settings: Adam optimizer, sparse categorical crossentropy loss,
and early stopping to prevent overfitting. We save the best model weights based on validation loss
and reload them for evaluation.
"""

epochs = 100
batch_size = 32

def train_model(model, model_name, x_train, y_train, x_test, y_test):
    """Train a model, then evaluate the best checkpoint on the test set."""
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )

    history = model.fit(
        x_train,
        y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.2,
        callbacks=[
            keras.callbacks.ModelCheckpoint(
                f"best_{model_name}_model.keras",
                save_best_only=True,
                monitor="val_loss",
            ),
            keras.callbacks.EarlyStopping(
                monitor="val_loss",
                patience=20,
                restore_best_weights=True,
            ),
        ],
        verbose=1,
    )

    # Load the best checkpoint for evaluation
    best_model = keras.models.load_model(f"best_{model_name}_model.keras")
    test_loss, test_acc = best_model.evaluate(x_test, y_test, verbose=0)
    print(f"{model_name} Test Accuracy: {test_acc:.4f}, Loss: {test_loss:.4f}")

    return history, test_acc, test_loss

# Train LSTM model
print("Training LSTM model...")
lstm_history, lstm_test_acc, lstm_test_loss = train_model(lstm_model, "LSTM", x_train, y_train, x_test, y_test)

# Train CNN model
print("Training CNN model...")
cnn_history, cnn_test_acc, cnn_test_loss = train_model(cnn_model, "CNN", x_train, y_train, x_test, y_test)

"""
## Visualize Training Metrics

We plot the training and validation accuracy and loss for both models to compare their performance.
"""

def plot_training_metrics(histories, model_names):
    """Plot training and validation accuracy/loss for multiple models."""
    plt.figure(figsize=(12, 4))

    # Accuracy plot
    plt.subplot(1, 2, 1)
    for history, name in zip(histories, model_names):
        plt.plot(history.history["sparse_categorical_accuracy"], label=f"{name} Train")
        plt.plot(
            history.history["val_sparse_categorical_accuracy"],
            linestyle="--",
            label=f"{name} Val",
        )
    plt.title("Model Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()

    # Loss plot
    plt.subplot(1, 2, 2)
    for history, name in zip(histories, model_names):
        plt.plot(history.history["loss"], label=f"{name} Train")
        plt.plot(history.history["val_loss"], linestyle="--", label=f"{name} Val")
    plt.title("Model Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()

    plt.tight_layout()
    plt.show()

plot_training_metrics([lstm_history, cnn_history], ["LSTM", "CNN"])

"""
## Evaluate and Compare Models

We compare the test accuracy and loss of both models to assess their performance.
"""

print("\nModel Comparison:")
print(f"LSTM Test Accuracy: {lstm_test_acc:.4f}, Loss: {lstm_test_loss:.4f}")
print(f"CNN Test Accuracy: {cnn_test_acc:.4f}, Loss: {cnn_test_loss:.4f}")

"""
## Conclusions

This example compared an LSTM-based model and a 1D CNN-based model for timeseries classification
on the FordA dataset. The LSTM model leverages sequential dependencies, while the CNN model captures
local temporal patterns through convolutions. In practice, the CNN often converges faster, helped by
batch normalization and global average pooling, while the LSTM may better capture long-range dependencies.

To improve performance, consider:
- Tuning hyperparameters (e.g., number of layers, units, or kernel sizes) using Keras Tuner.
- Adding further regularization (e.g., L2 regularization) to prevent overfitting.
- Experimenting with hybrid architectures combining LSTM and CNN layers.
- Using data augmentation techniques for timeseries, such as jittering or scaling.
"""
54 changes: 54 additions & 0 deletions examples/timeseries/md/lstm_vs_cnn_timeseries_classification.md
# Timeseries Classification with LSTM and CNN Models

**Author**: [Georgios Sklavounakos](https://github.com/gsklavounakos)
**Date created**: 2025/06/05
**Last modified**: 2025/06/05
**Description**: Compare LSTM and 1D CNN models for timeseries classification using the FordA dataset.
**Accelerator**: GPU

---

## Introduction

This example shows how to classify univariate timeseries data using two types of deep learning models:

- A Long Short-Term Memory (LSTM) model, which processes sequences using recurrent layers.
- A 1D Convolutional Neural Network (CNN) model, which extracts local patterns via convolution.

We use the FordA dataset from the [UCR/UEA archive](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/), which consists of engine noise measurements recorded over time. The classification task is to determine whether a fault is present in the signal.
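The archive distributes each split as a TSV file whose first column is the label and whose remaining columns are the series values. A minimal parsing sketch, using an inline two-row sample in place of the real download:

```python
import io

import numpy as np


def readucr(f):
    """Parse a UCR-style TSV: first column is the label, the rest is the series."""
    data = np.loadtxt(f, delimiter="\t")
    y = data[:, 0].astype(int)
    x = data[:, 1:].astype(np.float32)
    return x, y


sample = io.StringIO("-1\t0.1\t0.2\t0.3\n1\t0.4\t0.5\t0.6\n")
x, y = readucr(sample)
print(x.shape, y.tolist())  # (2, 3) [-1, 1]
```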

---

## What the Example Demonstrates

- How to preprocess univariate timeseries data for model input
- How to build and train two deep learning models with different architectures
- How to compare performance in terms of accuracy and training dynamics
- How to visualize model learning curves

---

## Dataset Overview

- **Name**: FordA
- **Source**: [UCR/UEA Timeseries Archive](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/)
- **Samples**: 3601 train, 1320 test
- **Length**: 500 time steps
- **Channels**: 1 (univariate)
- **Labels**: -1 (no fault), 1 (fault) → mapped to 0 and 1

Each timeseries is z-normalized and represents a sensor reading from an automotive engine.
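The label mapping and channel-axis reshape described above amount to two NumPy one-liners, shown here on dummy data of the same shape as FordA:

```python
import numpy as np

y = np.array([-1, 1, 1, -1])  # raw FordA labels
y = np.where(y == -1, 0, y)   # map {-1, 1} -> {0, 1}

x = np.zeros((4, 500), dtype=np.float32)    # (samples, timesteps)
x = x.reshape((x.shape[0], x.shape[1], 1))  # add a channel axis
print(y.tolist(), x.shape)  # [0, 1, 1, 0] (4, 500, 1)
```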

---

## Results

The models are trained with the same optimizer, loss function, and validation strategy. After training, the test accuracy and loss are reported for both models. Training and validation curves are also plotted for visual comparison.
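The shared setup described above boils down to the following configuration, taken from the example script. `training_config` is a hypothetical helper name introduced here for illustration; the compile settings and callbacks themselves are the ones both models use.

```python
import keras


def training_config(model, model_name):
    """Compile settings and callbacks shared by the LSTM and CNN runs."""
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )
    return [
        keras.callbacks.ModelCheckpoint(
            f"best_{model_name}_model.keras", save_best_only=True, monitor="val_loss"
        ),
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=20, restore_best_weights=True
        ),
    ]


# Example with a tiny stand-in model:
model = keras.Sequential([keras.layers.Input(shape=(500, 1)), keras.layers.Dense(2)])
callbacks = training_config(model, "demo")
print(len(callbacks))  # 2
```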

---

## References

- [UCR/UEA Timeseries Archive](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/)
- Wang et al., “Time Series Classification from Scratch with Deep Neural Networks”
[[arXiv:1611.06455](https://arxiv.org/abs/1611.06455)]