This project focuses on generating realistic synthetic Electronic Nose (E-Nose) time-series data using a Transformer-based model. The goal is to improve data availability, generalization, and downstream classification performance for coffee quality assessment.
This work was carried out as part of an internship project.
Objectives:
- Generate realistic synthetic E-Nose sensor data
- Capture and distinguish the characteristic sensor response patterns of different coffee quality grades
- Evaluate the usefulness of synthetic data in downstream machine learning tasks
The dataset is based on E-Nose measurements for Colombian coffee quality control, aimed at detecting defects during cup tests.
Key details:
- 58 coffee samples
- 3 quality labels:
- High Quality (HQ)
- Average Quality (AQ)
- Low Quality (LQ)
- Time-series data sampled at 1 Hz for 300 seconds
- 8 gas sensors per sample:
- SP-12A, SP-31, TGS-813, TGS-842
- SP-AQ3, TGS-823, ST-31, TGS-800
- Sensor readings are resistance values (kΩ)
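Given the dataset description above, each sample is a 300-step, 8-channel sequence. A minimal sketch of how the data could be shaped and normalized for training follows; the tensor here is a random placeholder, and per-sensor min-max scaling is one plausible implementation of the normalization step (the project does not specify which scheme was used).

```python
import torch

# Shapes taken from the dataset description:
# 58 samples, 300 time steps (1 Hz for 300 s), 8 gas sensors.
n_samples, seq_len, n_sensors = 58, 300, 8

# Placeholder for real resistance readings (kΩ); replace with actual data.
data = torch.rand(n_samples, seq_len, n_sensors)

# Per-sensor min-max normalization to [0, 1]: one plausible
# implementation of the "data normalization" step.
mins = data.amin(dim=(0, 1), keepdim=True)   # shape (1, 1, 8)
maxs = data.amax(dim=(0, 1), keepdim=True)
normalized = (data - mins) / (maxs - mins + 1e-8)
```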
A Transformer Encoder–Decoder architecture was used for time-series reconstruction and synthetic data generation.
Key components:
- Self-Attention to capture long-range temporal dependencies
- Multi-Head Attention for diverse feature learning
- Positional Encoding to preserve time-step order
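Of the components above, positional encoding is the piece specific to preserving time-step order. A sketch of the standard sinusoidal variant in PyTorch (the dimensions are illustrative; the project's actual `d_model` is not stated):

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal encoding, added to the input embeddings so
    self-attention can distinguish time-step order."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# One encoding per time step of a 300-second recording.
pe = sinusoidal_positional_encoding(seq_len=300, d_model=64)
```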
- Framework: PyTorch
- Environment: Google Colab
- Version Control: GitHub
Preprocessing & Training:
- Data normalization
- Loss function: Mean Squared Error (MSE)
- Train/validation split for generalization testing
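The training setup described above can be sketched as a minimal Transformer autoencoder trained with MSE. This is an illustrative reconstruction of the pipeline, not the project's actual code: the model name, layer counts, `d_model`, batch size, and learning rate are all assumptions.

```python
import torch
from torch import nn

class TinyTransformerAE(nn.Module):
    """Toy Transformer encoder-decoder that reconstructs sensor windows."""
    def __init__(self, n_sensors: int = 8, d_model: int = 64):
        super().__init__()
        self.proj_in = nn.Linear(n_sensors, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.proj_out = nn.Linear(d_model, n_sensors)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.proj_in(x)
        out = self.transformer(h, h)   # reconstruct the same sequence
        return self.proj_out(out)

model = TinyTransformerAE()
x = torch.rand(4, 300, 8)              # a batch of normalized windows
loss_fn = nn.MSELoss()                 # the MSE objective from above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step.
recon = model(x)
loss = loss_fn(recon, x)
opt.zero_grad()
loss.backward()
opt.step()
```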
Results:
- Low MSE between real and synthetic data
- Strong sensor-wise similarity
- Real vs. Synthetic plots show close alignment
- Smooth decrease in training and validation loss
- No overfitting observed
- Stable and well-regularized training process
Synthetic data was evaluated using an LSTM-based classifier.
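A minimal sketch of such an LSTM-based classifier over (time, sensors) windows; the hidden size and class head are illustrative assumptions, with three outputs matching the HQ/AQ/LQ labels.

```python
import torch
from torch import nn

class ENoseLSTMClassifier(nn.Module):
    """Sketch of an LSTM classifier for E-Nose windows (illustrative sizes)."""
    def __init__(self, n_sensors: int = 8, hidden: int = 32, n_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)     # final hidden state summarizes the sequence
        return self.head(h_n[-1])      # logits for HQ / AQ / LQ

clf = ENoseLSTMClassifier()
logits = clf(torch.rand(4, 300, 8))    # batch of 4 windows -> (4, 3) logits
```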
Classification Accuracy:
- Real data only: 0.6667
- Synthetic data only: 0.7500
- Hybrid (real + synthetic): 0.7083
→ Models trained on synthetic or hybrid data outperformed the real-data-only baseline.
Challenges:
- Imbalanced dataset
- Overfitting risks
- Long training times
- Handling time-series data with Transformer models
Key learnings:
- Importance of synthetic data in machine learning
- Transformers are effective for time-series generation
- Visualization and downstream evaluation are critical
- Synthetic data enhances model robustness
Conclusion:
- Successfully generated realistic synthetic E-Nose data
- Demonstrated usefulness for coffee quality classification
- Future directions:
- Statistical comparisons (PCA, t-SNE) between real and synthetic data
- Exploring GANs and Informer models for improved performance
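The proposed PCA comparison could be sketched as follows: project flattened real and synthetic windows into a shared 2-D PCA space and inspect how well the two clouds overlap. The arrays here are random placeholders for the actual data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholders for real and generated data, flattened to one row per sample
# (58 samples x 300 time steps x 8 sensors in the real dataset).
real = np.random.rand(58, 300 * 8)
synthetic = np.random.rand(58, 300 * 8)

# Fit PCA on both sets jointly so they share one 2-D coordinate system,
# then project each set for a side-by-side scatter comparison.
pca = PCA(n_components=2).fit(np.vstack([real, synthetic]))
real_2d = pca.transform(real)
synth_2d = pca.transform(synthetic)
```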