update gdd dataset
kokikwbt committed Nov 3, 2021
1 parent 09b7172 commit 88253b7
Showing 82 changed files with 2,431 additions and 46 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -129,4 +129,7 @@ dmypy.json
.pyre/
raw
processed
out
out

*.pdf
*.png
40 changes: 27 additions & 13 deletions README.md
@@ -1,9 +1,9 @@
# Predictive Maintenance (PM)
# Predictive Maintenance

This repository is intended to enable quick access to datasets for predictive maintenance tasks.
The follwoing table summrizes the available features for the PM tasks,
This repository is intended to enable quick access to datasets for predictive maintenance (PM) tasks.
The following table summarizes the available features,
where the marks show:
- `x`: satisfying availablity
- `x`: satisfying availability
- `u`: univariate features
- `m`: multivariate features

@@ -27,6 +27,20 @@ the richness of attributes you may check them up with higher priority.

</center>

## Usage

Please put the `datasets` directory into your workspace and import it as follows:

```python
import datasets

# Example
datasets.ufd.load_data()
```
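
If copying `datasets` into the workspace is not convenient, the folder that contains it can be put on the import path instead. The following is a minimal sketch under assumptions: `add_to_import_path` is a hypothetical helper and the repository location is a placeholder, neither is part of this repository. Be aware that another installed package named `datasets` could shadow or be shadowed by this one.

```python
import sys
from pathlib import Path


def add_to_import_path(parent_dir):
    """Prepend the folder that contains `datasets/` to sys.path (idempotent)."""
    parent = str(Path(parent_dir).resolve())
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Placeholder location of the cloned repository; adjust to your setup.
add_to_import_path('path/to/predictive-maintenance')

# After this call, the import from the example above should resolve:
# import datasets
# df = datasets.ufd.load_data()
```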

## Notebooks

Jupyter notebooks are provided for all datasets to help with interactive processing and visualization of the data.

## References

@@ -40,28 +54,28 @@ the richness of attributes you may check them up with higher priority.
[https://square.github.io/pysurvival/index.html](https://square.github.io/pysurvival/index.html)
1. Types of proactive maintenance:
[https://solutions.borderstates.com/types-of-proactive-maintenance/](https://solutions.borderstates.com/types-of-proactive-maintenance/)
1. Common license types for datasets
1. Common license types for datasets:
[https://www.kaggle.com/general/116302](https://www.kaggle.com/general/116302)

### Dataset Sources

1. ALPI: Diego Tosato, Davide Dalle Pezze, Chiara Masiero, Gian Antonio Susto, Alessandro Beghi, 2020. Alarm Logs in Packaging Industry (ALPI).
[https://dx.doi.org/10.21227/nfv6-k750](https://dx.doi.org/10.21227/nfv6-k750)
1. UFD: Ultrasonic flowmeter diagnostics Data Set
1. UFD: Ultrasonic flowmeter diagnostics Data Set:
[https://archive.ics.uci.edu/ml/datasets/Ultrasonic+flowmeter+diagnostics](https://archive.ics.uci.edu/ml/datasets/Ultrasonic+flowmeter+diagnostics)
1. NASA Bearing Dataset
1. NASA Bearing Dataset:
[https://www.kaggle.com/vinayak123tyagi/bearing-dataset](https://www.kaggle.com/vinayak123tyagi/bearing-dataset)
1. CWRU Bearing Dataset
1. CWRU Bearing Dataset:
[https://www.kaggle.com/brjapon/cwru-bearing-datasets](https://www.kaggle.com/brjapon/cwru-bearing-datasets)
1. MAPM: Microsoft Azure Predictive Maintenance
1. MAPM: Microsoft Azure Predictive Maintenance:
[https://www.kaggle.com/arnabbiswas1/microsoft-azure-predictive-maintenance](https://www.kaggle.com/arnabbiswas1/microsoft-azure-predictive-maintenance)
1. HydSys: Predictive Maintenance Of Hydraulics System
1. HydSys: Predictive Maintenance Of Hydraulics System:
[https://www.kaggle.com/mayank1897/condition-monitoring-of-hydraulic-systems](https://www.kaggle.com/mayank1897/condition-monitoring-of-hydraulic-systems)
1. GFD: Gearbox Fault Diagnosis
1. GFD: Gearbox Fault Diagnosis:
[https://www.kaggle.com/brjapon/gearbox-fault-diagnosis](https://www.kaggle.com/brjapon/gearbox-fault-diagnosis)
1. PPD: Production Plant Data for Condition Monitoring
1. PPD: Production Plant Data for Condition Monitoring:
[https://www.kaggle.com/inIT-OWL/production-plant-data-for-condition-monitoring](https://www.kaggle.com/inIT-OWL/production-plant-data-for-condition-monitoring)
1. GDD: Genesis demonstrator data for machine learning
1. GDD: Genesis demonstrator data for machine learning:
[https://www.kaggle.com/inIT-OWL/genesis-demonstrator-data-for-machine-learning](https://www.kaggle.com/inIT-OWL/genesis-demonstrator-data-for-machine-learning)

<!-- 1. Condition Based Maintenance (CBM) of Naval Propulsion Plants Data Set
8 changes: 8 additions & 0 deletions datasets/__init__.py
@@ -0,0 +1,8 @@
from . import alpi
from . import cbm
from . import gdd
from . import gfd
from . import hydsys
from . import mapm
from . import ppd
from . import ufd
File renamed without changes.
176 changes: 176 additions & 0 deletions datasets/gdd/__init__.py
@@ -0,0 +1,176 @@
import datetime
import os
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd
import seaborn as sns


def load_data(index='state'):

    assert index in ['state', 'anomaly', 'normal', 'linear', 'pressure']
    fp = os.path.dirname(__file__)

    if index == 'state':
        df = pd.read_csv(fp + '/Genesis_StateMachineLabel.csv.gz')
    elif index == 'anomaly':
        df = pd.read_csv(fp + '/Genesis_AnomalyLabels.csv.gz')
    elif index == 'normal':
        df = pd.read_csv(fp + '/Genesis_normal.csv.gz')
        df.Timestamp = df.Timestamp / 1000
    elif index == 'linear':
        df = pd.read_csv(fp + '/Genesis_lineardrive.csv.gz')
        df.Timestamp = df.Timestamp / 1000
    elif index == 'pressure':
        df = pd.read_csv(fp + '/Genesis_pressure.csv.gz')
        df.Timestamp = df.Timestamp / 1000

    df.Timestamp = df.Timestamp.apply(datetime.datetime.fromtimestamp)

    return df


def plot_genesis_labels(df, figsize=(15, 20), cmap='tab10'):
    """ Call this for machine states and anomaly labels """

    fig, ax = plt.subplots(10, figsize=figsize)

    df['MotorData.ActCurrent'].plot(ax=ax[0], legend=True, cmap=cmap)
    df['MotorData.ActPosition'].plot(ax=ax[1], legend=True, cmap=cmap)
    df['MotorData.ActSpeed'].plot(ax=ax[2], legend=True, cmap=cmap)

    df['MotorData.IsAcceleration'].plot(ax=ax[3], legend=True, cmap=cmap)
    df['MotorData.IsForce'].plot(ax=ax[4], legend=True, cmap=cmap)

    df[['MotorData.Motor_Pos1reached',  # binary
        'MotorData.Motor_Pos2reached',  # binary
        'MotorData.Motor_Pos3reached',  # binary
        'MotorData.Motor_Pos4reached',  # binary
        ]].plot(ax=ax[5], legend=True, cmap=cmap)

    df[['NVL_Recv_Ind.GL_Metall',  # binary
        'NVL_Recv_Ind.GL_NonMetall',  # binary
        ]].plot(ax=ax[6], legend=True, cmap=cmap)

    df[['NVL_Recv_Storage.GL_I_ProcessStarted',  # binary
        'NVL_Recv_Storage.GL_I_Slider_IN',  # binary
        'NVL_Recv_Storage.GL_I_Slider_OUT',  # binary
        'NVL_Recv_Storage.GL_LightBarrier',  # binary
        'NVL_Send_Storage.ActivateStorage',  # binary
        ]].plot(ax=ax[7], legend=True, cmap=cmap)

    df[['PLC_PRG.Gripper',  # binary
        'PLC_PRG.MaterialIsMetal',  # binary
        ]].plot(ax=ax[8], legend=True, cmap=cmap)

    df['Label'].plot(ax=ax[9], legend=True, cmap=cmap)

    for axi in ax:
        axi.set_xlim(0, df.shape[0])
        axi.set_ylabel('Value')

    ax[0].set_title('Date: {} to {}'.format(
        df.Timestamp.min(), df.Timestamp.max()))
    ax[-1].set_xlabel('Time')
    fig.tight_layout()

    return fig, ax


def plot_genesis_nonlabels(df, figsize=(15, 20), cmap='tab10'):
    """ Call this for non-labeled data """

    fig, ax = plt.subplots(8, figsize=figsize)

    df[['MotorData.SetCurrent',
        'MotorData.ActCurrent',
        ]].plot(ax=ax[0], legend=True, cmap=cmap)

    df[['MotorData.SetSpeed',
        'MotorData.ActSpeed',
        ]].plot(ax=ax[1], legend=True, cmap=cmap)

    df[['MotorData.SetAcceleration',
        'MotorData.IsAcceleration',
        ]].plot(ax=ax[2], legend=True, cmap=cmap)

    df[['MotorData.SetForce',
        'MotorData.IsForce'
        ]].plot(ax=ax[3], legend=True, cmap=cmap)

    df[['MotorData.Motor_Pos1reached',  # binary
        'MotorData.Motor_Pos2reached',  # binary
        'MotorData.Motor_Pos3reached',  # binary
        'MotorData.Motor_Pos4reached',  # binary
        ]].plot(ax=ax[4], legend=True, cmap=cmap)

    df[['NVL_Recv_Ind.GL_Metall',  # binary
        'NVL_Recv_Ind.GL_NonMetall',  # binary
        ]].plot(ax=ax[5], legend=True, cmap=cmap)

    df[['NVL_Recv_Storage.GL_I_ProcessStarted',  # binary
        'NVL_Recv_Storage.GL_I_Slider_IN',  # binary
        'NVL_Recv_Storage.GL_I_Slider_OUT',  # binary
        'NVL_Recv_Storage.GL_LightBarrier',  # binary
        'NVL_Send_Storage.ActivateStorage',  # binary
        ]].plot(ax=ax[6], legend=True, cmap=cmap)

    df[['PLC_PRG.Gripper',  # binary
        'PLC_PRG.MaterialIsMetal',  # binary
        ]].plot(ax=ax[7], legend=True, cmap=cmap)

    for axi in ax:
        axi.set_xlim(0, df.shape[0])
        axi.set_ylabel('Value')

    ax[0].set_title('Date: {} to {}'.format(df.Timestamp.min(), df.Timestamp.max()))
    ax[-1].set_xlabel('Time')

    fig.tight_layout()
    return fig, ax


def gen_summary(outdir='../out'):

    os.makedirs(outdir, exist_ok=True)
    fp = os.path.dirname(__file__)
    sns.set(font_scale=1.1, style='whitegrid')

    with PdfPages(outdir + '/gdd_summary.pdf') as pp:

        print('Plotting Genesis_StateMachineLabel...')
        df = load_data(index='state')
        fig, _ = plot_genesis_labels(df)
        fig.savefig(pp, bbox_inches='tight', format='pdf')
        plt.clf()
        plt.close()

        print('Plotting Genesis_AnomalyLabels...')
        df = load_data(index='anomaly')
        fig, _ = plot_genesis_labels(df)
        fig.savefig(pp, bbox_inches='tight', format='pdf')
        plt.clf()
        plt.close()

        print('Plotting Genesis_normal...')
        df = load_data(index='normal')
        fig, _ = plot_genesis_nonlabels(df)
        fig.savefig(pp, bbox_inches='tight', format='pdf')
        plt.clf()
        plt.close()

        print('Plotting Genesis_lineardrive...')
        df = load_data(index='linear')
        fig, _ = plot_genesis_nonlabels(df)
        fig.savefig(pp, bbox_inches='tight', format='pdf')
        plt.clf()
        plt.close()

        print('Plotting Genesis_pressure...')
        df = load_data(index='pressure')
        fig, _ = plot_genesis_nonlabels(df)
        fig.savefig(pp, bbox_inches='tight', format='pdf')
        plt.clf()
        plt.close()

    print("done!")
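
In `load_data`, the three unlabeled files divide `Timestamp` by 1000 before conversion, which suggests epoch milliseconds, and `datetime.datetime.fromtimestamp` is then applied row by row. A vectorized alternative might look like the sketch below. It assumes epoch-millisecond input; `to_datetime_ms` is a hypothetical helper and the sample values are made up. Unlike `fromtimestamp`, which converts to the machine's local time, `pd.to_datetime` returns timezone-naive values based on UTC, so the two approaches can differ by the local UTC offset.

```python
import pandas as pd


def to_datetime_ms(series):
    """Convert epoch milliseconds to timezone-naive (UTC-based) timestamps."""
    return pd.to_datetime(series, unit='ms')


# Made-up values for illustration: 0 ms, 1 s and 60 s after the epoch.
sample = pd.Series([0, 1_000, 60_000], name='Timestamp')
converted = to_datetime_ms(sample)
```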
File renamed without changes.
72 changes: 72 additions & 0 deletions datasets/hydsys/__init__.py
@@ -0,0 +1,72 @@
import os
import numpy as np
import pandas as pd


def make_pressure_dataframe(fp, cycle_id=0):
    """ 100 Hz. 6000 samples in each cycle
    """
    data = pd.DataFrame(columns=[f'PS{i+1}' for i in range(6)])
    for i in range(6):
        data[f'PS{i+1}'] = np.loadtxt(fp + f'/PS{i+1}.txt.gz')[cycle_id]

    return data


def make_motor_power_dataframe(fp, cycle_id=0):
    """ 100 Hz. 6000 samples in each cycle
    """
    return pd.DataFrame(np.loadtxt(fp + '/EPS1.txt.gz')[cycle_id], columns=['EPS1'])


def make_volume_flow_dataframe(fp, cycle_id=0):
    """ 10 Hz. 600 samples in each cycle
    """
    data = pd.DataFrame(columns=['FS1', 'FS2'])
    data['FS1'] = np.loadtxt(fp + '/FS1.txt.gz')[cycle_id]
    data['FS2'] = np.loadtxt(fp + '/FS2.txt.gz')[cycle_id]
    return data


def make_temp_dataframe(cycle_id=0):
    """ 1 Hz. 60 samples in each cycle
    """
    fp = os.path.dirname(__file__)
    data = pd.DataFrame(columns=[f'TS{i+1}' for i in range(4)])
    data['TS1'] = np.loadtxt(fp + '/TS1.txt.gz')[cycle_id]
    data['TS2'] = np.loadtxt(fp + '/TS2.txt.gz')[cycle_id]
    data['TS3'] = np.loadtxt(fp + '/TS3.txt.gz')[cycle_id]
    data['TS4'] = np.loadtxt(fp + '/TS4.txt.gz')[cycle_id]
    return data


def make_vibration_dataframe(cycle_id=0):
    fp = os.path.dirname(__file__)
    return pd.DataFrame(np.loadtxt(fp + '/VS1.txt.gz')[cycle_id], columns=['VS1'])


def make_efficiency_dataframe(cycle_id=0):
    fp = os.path.dirname(__file__)
    return pd.DataFrame(np.loadtxt(fp + '/SE.txt.gz')[cycle_id], columns=['SE'])


def make_cooling_dataframe(cycle_id=0):
    fp = os.path.dirname(__file__)
    data = pd.DataFrame(columns=['CE', 'CP'])
    data['CE'] = np.loadtxt(fp + '/CE.txt.gz')[cycle_id]
    data['CP'] = np.loadtxt(fp + '/CP.txt.gz')[cycle_id]
    return data


def make_condition_dataframe():
    fp = os.path.dirname(__file__)
    return pd.DataFrame(np.loadtxt(fp + '/profile.txt'),
                        columns=[
                            'cooler_condition',
                            'valve_condition',
                            'internal_pump_leakage',
                            'hydraulic_accumulator',
                            'stable_flag']).reset_index().rename(
                                columns={'index': 'cycle'})
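
The docstrings above give three sampling rates for one cycle (100 Hz with 6000 samples, 10 Hz with 600, 1 Hz with 60), so frames from different sensor groups have different lengths and cannot be joined column-wise as they are. One possible way to line them up is to repeat each slower sample until the next one arrives. The sketch below assumes that forward-filling is acceptable for the analysis at hand; `upsample_to` is a hypothetical helper and the frames are synthetic stand-ins, not the real files.

```python
import numpy as np
import pandas as pd


def upsample_to(frame, rate_hz, base_hz=100):
    """Forward-fill `frame` from `rate_hz` onto a `base_hz` sample grid."""
    step = base_hz // rate_hz            # base-grid ticks per original sample
    out = frame.copy()
    out.index = np.arange(len(out)) * step
    return out.reindex(np.arange(len(out) * step)).ffill()


# Synthetic stand-ins with the per-cycle lengths stated in the docstrings.
pressure = pd.DataFrame({'PS1': np.zeros(6000)})   # 100 Hz
flow = pd.DataFrame({'FS1': np.ones(600)})         # 10 Hz
temp = pd.DataFrame({'TS1': np.full(60, 2.0)})     # 1 Hz

aligned = pd.concat(
    [upsample_to(pressure, 100), upsample_to(flow, 10), upsample_to(temp, 1)],
    axis=1)
```

Interpolation or downsampling to the slowest rate would be equally valid choices, depending on the downstream model.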


File renamed without changes.
42 changes: 42 additions & 0 deletions datasets/ufd/__init__.py
@@ -0,0 +1,42 @@
import os
import pandas as pd
import numpy as np


def load_data(meter_id='A'):
    fp = os.path.dirname(__file__)
    data = np.loadtxt(fp + '/Meter{}.txt'.format(meter_id))

    if meter_id == 'A':
        columns = ['flatness_ratio', 'symmetry', 'crossflow']
        columns += ['flow_velocity_{}'.format(i+1) for i in range(8)]
        columns += ['sound_speed_{}'.format(i+1) for i in range(8)]
        columns += ['average_speed']
        columns += ['gain_{}'.format(i+1) for i in range(16)]
        columns += ['health_state']

    if meter_id == 'B':
        columns = ['profile_factor', 'symmetry', 'crossflow', 'swirl_angle']
        columns += ['flow_velocity_{}'.format(i+1) for i in range(4)]
        columns += ['average_flow']
        columns += ['sound_speed_{}'.format(i+1) for i in range(4)]
        columns += ['average_speed']
        columns += ['signal_strength_{}'.format(i+1) for i in range(8)]
        columns += ['turbulence_{}'.format(i+1) for i in range(4)]
        columns += ['meter_performance']
        columns += ['signal_quality_{}'.format(i+1) for i in range(8)]
        columns += ['gain_{}'.format(i+1) for i in range(8)]
        columns += ['transit_time_{}'.format(i+1) for i in range(8)]
        columns += ['health_state']

    if meter_id == 'C' or meter_id == 'D':
        columns = ['profile_factor', 'symmetry', 'crossflow']
        columns += ['flow_velocity_{}'.format(i+1) for i in range(4)]
        columns += ['sound_speed_{}'.format(i+1) for i in range(4)]
        columns += ['signal_strength_{}'.format(i+1) for i in range(8)]
        columns += ['signal_quality_{}'.format(i+1) for i in range(8)]
        columns += ['gain_{}'.format(i+1) for i in range(8)]
        columns += ['transit_time_{}'.format(i+1) for i in range(8)]
        columns += ['health_state']

    return pd.DataFrame(data, columns=columns)
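
The column lists in `load_data` only work if their length matches the width of the corresponding `Meter*.txt` array; counting the entries gives 37 names for meter A, 52 for B, and 44 for C and D. A mismatch would surface as a shape error from `pd.DataFrame`. The sketch below shows one way to check the width up front with a more readable message. `numbered`, `meter_a_columns`, and `check_width` are hypothetical helpers, and the array is a zero-filled stand-in rather than real meter data.

```python
import numpy as np


def numbered(prefix, n):
    """Return ['<prefix>_1', ..., '<prefix>_n']."""
    return ['{}_{}'.format(prefix, i + 1) for i in range(n)]


def meter_a_columns():
    """Column names for meter A, in the same order as in load_data."""
    columns = ['flatness_ratio', 'symmetry', 'crossflow']
    columns += numbered('flow_velocity', 8)
    columns += numbered('sound_speed', 8)
    columns += ['average_speed']
    columns += numbered('gain', 16)
    columns += ['health_state']
    return columns


def check_width(data, columns):
    """Raise a readable error when array width and column count differ."""
    if data.shape[1] != len(columns):
        raise ValueError('expected {} columns, got {}'.format(
            len(columns), data.shape[1]))


# Zero-filled stand-in with the width implied by the meter A column list.
check_width(np.zeros((2, 37)), meter_a_columns())
```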