Data augmentation #7

Open · wants to merge 10 commits into base: main
30 changes: 15 additions & 15 deletions benchmark_utils/transformation.py
@@ -6,15 +6,14 @@
# - getting requirements info when all dependencies are not installed.
with safe_import_context() as import_ctx:
import numpy as np

from numpy import concatenate
Member:
Prefer using np.concatenate instead of importing the function; it makes the code easier to read.
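For illustration, the namespaced style being suggested (a minimal sketch; the arrays are placeholders standing in for the ones used later in this file):

import numpy as np

X_augm = np.zeros((2, 3))  # placeholder for the accumulated augmented data
X_tr = np.ones((2, 3))     # placeholder for one transformed batch

# Referencing concatenate through the np namespace makes its origin obvious
X_augm = np.concatenate((X_augm, X_tr))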

from torch import as_tensor
from skorch.helper import to_numpy
from braindecode.augmentation import ChannelsDropout, SmoothTimeMask


def channels_dropout(
X, y, n_augmentation, seed=0, probability=0.5, p_drop=0.2
X, y, n_augmentation, probability=0.5, p_drop=0.2
Member:
Suggested change
X, y, n_augmentation, probability=0.5, p_drop=0.2
X, y, n_augmentation, probability=0.5, p_drop=0.2, seed=None

):
"""
Function to apply channels dropout to X raw data
@@ -43,10 +42,11 @@ def channels_dropout(
The labels.

"""
transform = ChannelsDropout(probability=probability, random_state=seed)

X_augm = to_numpy(X)
y_augm = y
for i in range(n_augmentation):
transform = ChannelsDropout(probability=probability)
Member:
Why not instantiate this object outside the loop? Also, a seed should be given, otherwise the benchmark is not reproducible.
I am not sure why you made this change; can you comment?

Collaborator Author:
I put it inside the loop because, when it was outside, the augmentation was always giving the same X_tr.

Collaborator Author:
And I have the same issue when I fix the seed.

Collaborator Author:
Maybe we could predefine a sequence of seeds, so that each transformation gets a different seed. Do you have another idea?

Member:
Yes, having a sequence of seeds is indeed a nice idea.
But I would look at the Transform object and try to understand what is happening, because the standard for data augmentation is that repeated calls to the transform should give different augmented data.

Can you write a simple example and check the behavior of the object?
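E.g., something along these lines (a minimal sketch reusing only the SmoothTimeMask API already shown in this diff; the data shape is a placeholder):

import numpy as np
from torch import as_tensor
from braindecode.augmentation import SmoothTimeMask

# Placeholder batch: (n_trials, n_channels, n_times)
X_torch = as_tensor(np.random.randn(4, 8, 200)).float()
y_torch = as_tensor(np.zeros(4)).float()

transform = SmoothTimeMask(probability=1.0, mask_len_samples=20, random_state=0)

# Draw the augmentation parameters twice from the same instance: if the
# mask start positions repeat, repeated calls cannot yield new augmented data.
params_1 = transform.get_augmentation_params(X_torch, y_torch)
params_2 = transform.get_augmentation_params(X_torch, y_torch)
print(params_1["mask_start_per_sample"])
print(params_2["mask_start_per_sample"])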

Collaborator Author:
Okay, I think I know the exact issue. What defines the transformation is mask_len_samples and mask_start_per_sample. They are generated "randomly" in transform.get_augmentation_params; the issue is that, to choose randomly, the transformation uses rng.uniform with a fixed seed. So we always get the same parameters for the augmentation, and each iteration produces the same augmented data.
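If that is what is happening, the sequence-of-seeds idea would fix it: derive one child seed per repetition from a master seed, so each transform draws different parameters while the benchmark stays reproducible. A rough sketch (parameter names mirror this diff; the values are placeholders):

import numpy as np
from braindecode.augmentation import SmoothTimeMask

n_augmentation, sfreq, second, probability = 3, 100.0, 0.2, 0.8  # placeholders

# One reproducible child seed per repetition, derived from a master seed
child_seeds = np.random.SeedSequence(42).generate_state(n_augmentation)

for i in range(n_augmentation):
    transform = SmoothTimeMask(
        probability=probability,
        mask_len_samples=int(sfreq * second),
        random_state=int(child_seeds[i]),
    )
    # ...then get_augmentation_params / operation exactly as in the diff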

X_tr, _ = transform.operation(
as_tensor(X).float(), None, p_drop=p_drop
)
@@ -59,33 +59,33 @@


def smooth_timemask(
X, y, n_augmentation, sfreq, seed=0, probability=0.5, second=0.1
X, y, n_augmentation, sfreq, probability=0.8, second=0.2
Member:
Suggested change
X, y, n_augmentation, sfreq, probability=0.8, second=0.2
X, y, n_augmentation, sfreq, probability=0.8, second=0.2, seed=None

):
"""
Function to apply smooth time mask to X raw data
and concatenate it to the original data.
"""

transform = SmoothTimeMask(
probability=probability,
mask_len_samples=int(sfreq * second),
random_state=seed,
)

X_torch = as_tensor(np.array(X)).float()
y_torch = as_tensor(y).float()
param_augm = transform.get_augmentation_params(X_torch, y_torch)
mls = param_augm["mask_len_samples"]
msps = param_augm["mask_start_per_sample"]

X_augm = to_numpy(X)
y_augm = y

mask_len_samples = int(sfreq * second)
for i in range(n_augmentation):

transform = SmoothTimeMask(
probability=probability,
mask_len_samples=mask_len_samples,
)
Member:
Weird formatting here.
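One possible tidier version (identical calls to those in this diff, just without the stray blank lines):

for i in range(n_augmentation):
    transform = SmoothTimeMask(
        probability=probability,
        mask_len_samples=mask_len_samples,
    )
    param_augm = transform.get_augmentation_params(X_torch, y_torch)
    mls = param_augm["mask_len_samples"]
    msps = param_augm["mask_start_per_sample"]
    X_tr, _ = transform.operation(
        X_torch, None, mask_len_samples=mls, mask_start_per_sample=msps
    )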


param_augm = transform.get_augmentation_params(X_torch, y_torch)
mls = param_augm["mask_len_samples"]
msps = param_augm["mask_start_per_sample"]

X_tr, _ = transform.operation(
X_torch, None, mask_len_samples=mls, mask_start_per_sample=msps
)

X_tr = X_tr.numpy()
X_augm = concatenate((X_augm, X_tr))
y_augm = concatenate((y_augm, y))
10 changes: 7 additions & 3 deletions objective.py
@@ -9,6 +9,7 @@
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score as BAS
from sklearn.metrics import accuracy_score

from skorch.helper import SliceDataset, to_numpy
from benchmark_utils.dataset import split_windows_train_test
@@ -113,9 +114,12 @@ def compute(self, model):
self.X_train = to_numpy(self.X_train)
self.X_test = to_numpy(self.X_test)

score_train = model.score(self.X_train, self.y_train)
score_test = model.score(self.X_test, self.y_test)
bl_acc = BAS(self.y_test, model.predict(self.X_test))
y_pred_train = model.predict(self.X_train)
y_pred_test = model.predict(self.X_test)

score_train = accuracy_score(self.y_train, y_pred_train)
score_test = accuracy_score(self.y_test, y_pred_test)
bl_acc = BAS(self.y_test, y_pred_test)

# This method can return many metrics in a dictionary. One of these
# metrics needs to be `value` for convergence detection purposes.
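For context, model.score and accuracy_score agree for sklearn-style classifiers, but predicting once and reusing the predictions avoids running inference twice; balanced accuracy then reuses the same predictions. A minimal sketch with placeholder data:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import balanced_accuracy_score as BAS

X = np.random.randn(20, 3)
y = np.array([0] * 15 + [1] * 5)  # imbalanced toy labels

model = DummyClassifier(strategy="most_frequent").fit(X, y)

y_pred = model.predict(X)  # predict once, reuse for every metric
assert accuracy_score(y, y_pred) == model.score(X, y)

# Balanced accuracy averages per-class recall, so it penalizes the
# majority-class DummyClassifier on this imbalanced toy dataset.
print(accuracy_score(y, y_pred), BAS(y, y_pred))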