Data augmentation #7

Open · wants to merge 10 commits into base: main
Conversation

chris-mrn (Collaborator):

The augmentation process now works as we expected, and we improved the way we compute the scores.

tomMoral (Member) left a comment:

Some comments

@@ -6,15 +6,14 @@
# - getting requirements info when all dependencies are not installed.
with safe_import_context() as import_ctx:
    import numpy as np

    from numpy import concatenate
tomMoral (Member):

Prefer using np.concatenate instead of importing the function directly, as this makes the code easier to read.
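
For illustration, a minimal sketch of the preferred style (the array values are toy stand-ins, not the benchmark's data):

import numpy as np

X_augm = np.zeros((2, 3))  # toy stand-in for the accumulated batch
X_tr = np.ones((2, 3))     # toy stand-in for one augmented copy

# the np. prefix makes the function's origin obvious at the call site
X_augm = np.concatenate([X_augm, X_tr])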

X_augm = to_numpy(X)
y_augm = y
for i in range(n_augmentation):
    transform = ChannelsDropout(probability=probability)
tomMoral (Member):

Why not instantiate this object outside the loop?
Also, a seed should be given; otherwise the benchmark is not reproducible.
So I am not sure why you made this change, can you comment?

chris-mrn (Collaborator, author):

I put it inside the loop because, when it was outside, the augmentation always gave the same X_tr.

chris-mrn (Collaborator, author):

And I have the same issue when I fix the seed.

chris-mrn (Collaborator, author):

Maybe we could predefine a sequence of seeds, so that each transformation gets a different seed. Do you have another idea?
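
For instance, a sketch of that idea (the master generator and the value of n_augmentation are illustrative, not the benchmark's actual code):

import numpy as np

n_augmentation = 5  # illustrative value
master = np.random.RandomState(0)
# one reproducible seed per transformation
seeds = master.randint(2**31 - 1, size=n_augmentation)
# each transform would then receive its own seed, e.g.
# ChannelsDropout(probability=0.5, random_state=int(seeds[i]))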

tomMoral (Member):

Yes, having a sequence of seeds is indeed a nice idea.
But I would look at the Transform object and try to understand what is happening, because the standard for data augmentation is that repeated calls to the transform should give different augmented data.

Can you write a simple example and check the behavior of the object?
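
Something along these lines could serve as that check (a sketch assuming braindecode's ChannelsDropout can be applied directly to a torch tensor; shapes and parameters are toy values):

import torch
from braindecode.augmentation import ChannelsDropout

X = torch.randn(4, 8, 100)  # toy batch: (n_trials, n_channels, n_times)
transform = ChannelsDropout(probability=1.0, p_drop=0.5, random_state=42)

# True would mean repeated calls return identical augmented data,
# which is the issue discussed above
print(torch.equal(transform(X), transform(X)))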

chris-mrn (Collaborator, author):

Okay, I think I know the exact issue: what defines the transformation are mask_len_samples and mask_start_per_sample. They are generated "randomly" in transform.get_augmentation_params. The problem is that, to choose randomly, the transformation uses rng.uniform with a fixed seed, so we always get the same augmentation parameters and, at each iteration, the same augmented data.
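
The mechanism is easy to reproduce in plain NumPy (a toy illustration, not braindecode's actual internals):

import numpy as np

def params_reseeded(seed=0):
    # re-creating the generator from the same seed freezes the draws
    rng = np.random.RandomState(seed)
    return rng.uniform()

shared_rng = np.random.RandomState(0)

def params_shared():
    # reusing one generator advances its state, so draws differ
    return shared_rng.uniform()

print(params_reseeded(), params_reseeded())  # identical values
print(params_shared(), params_shared())      # different values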

Comment on lines 78 to 80
probability=probability,
mask_len_samples=mask_len_samples,
)
tomMoral (Member):

weird formatting

tomMoral (Member) left a comment:

Some comments about the PR :)

def channels_dropout(
-    X, y, n_augmentation, seed=0, probability=0.5, p_drop=0.2
+    X, y, n_augmentation, probability=0.5, p_drop=0.2
tomMoral (Member):

Suggested change
-    X, y, n_augmentation, probability=0.5, p_drop=0.2
+    X, y, n_augmentation, probability=0.5, p_drop=0.2, seed=None

@@ -43,51 +47,58 @@ def channels_dropout(
    The labels.

    """
-    transform = ChannelsDropout(probability=probability, random_state=seed)

+    seed = gen_seed()
tomMoral (Member):

Suggested change
-    seed = gen_seed()
+    rng = np.random.RandomState(seed)

tomMoral (Member):

For a reproducible sequence of random numbers, the best approach is to use a random number generator: it samples a different number on each draw, but in a reproducible order.
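
Concretely, the suggested pattern would look roughly like this (a sketch, assuming the transform accepts a RandomState instance for random_state, as scikit-learn-style estimators do; the helper mirrors the channels_dropout diff above but is not the PR's exact code):

import numpy as np
import torch
from braindecode.augmentation import ChannelsDropout

def channels_dropout(X, y, n_augmentation, probability=0.5, p_drop=0.2, seed=None):
    rng = np.random.RandomState(seed)  # one generator for the whole loop
    X_augm, y_augm = [torch.as_tensor(X)], [y]
    for _ in range(n_augmentation):
        # sharing rng: each transform draws fresh parameters,
        # but the overall sequence is reproducible from `seed`
        transform = ChannelsDropout(
            probability=probability, p_drop=p_drop, random_state=rng
        )
        X_augm.append(transform(torch.as_tensor(X)))
        y_augm.append(y)
    return X_augm, y_augm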

Comment on lines +55 to +58
    transform = ChannelsDropout(
        probability=probability,
        random_state=next(seed)
    )
tomMoral (Member):

Suggested change
-    transform = ChannelsDropout(
-        probability=probability,
-        random_state=next(seed)
-    )
+    transform = ChannelsDropout(
+        probability=probability,
+        random_state=rng
+    )


    return X_augm, y_augm


def smooth_timemask(
-    X, y, n_augmentation, sfreq, seed=0, probability=0.5, second=0.1
+    X, y, n_augmentation, sfreq, probability=0.8, second=0.2
tomMoral (Member):

Suggested change
-    X, y, n_augmentation, sfreq, probability=0.8, second=0.2
+    X, y, n_augmentation, sfreq, probability=0.8, second=0.2, seed=None

@@ -13,7 +13,7 @@
    SmoothTimeMask,
)
from braindecode.models import ShallowFBCSPNet
-from numpy import linspace, pi
+from numpy import linspace
tomMoral (Member):

Suggested change
-from numpy import linspace
+import numpy as np

@@ -96,25 +96,25 @@ def set_objective(self, X, y, sfreq):
            mask_len_samples=int(sfreq * second),
            random_state=seed,
        )
-        for second in linspace(0.1, 2, 10)
+        for second in linspace(0.1, 2, 3)
tomMoral (Member):

Suggested change
-        for second in linspace(0.1, 2, 3)
+        for second in np.linspace(0.1, 2, 3)

tomMoral (Member):

Why did you decrease the number of samples?

chris-mrn (Collaborator, author):

It was only to get results faster.

Comment on lines +22 to 25
"augmentation": [
"SmoothTimeMask",
],
"covariances_estimator": ["oas"],
tomMoral (Member):

Suggested change
"augmentation": [
"SmoothTimeMask",
],
"covariances_estimator": ["oas"],
"covariances_estimator": ["oas"],

    ]

elif self.augmentation == "ChannelDropout":
    transforms = [
        ChannelsDropout(
            probability=self.proba, p_drop=prob, random_state=seed
        )
-        for prob in linspace(0, 1, 10)
+        for prob in linspace(0, 1, 3)
tomMoral (Member):

Suggested change
-        for prob in linspace(0, 1, 3)
+        for prob in np.linspace(0, 1, 3)

            random_state=seed,
        )
-        for prob in linspace(0, 2 * pi, 10)
+        for phase_freq in linspace(0, 1, 3)
tomMoral (Member):

Suggested change
-        for phase_freq in linspace(0, 1, 3)
+        for phase_freq in np.linspace(0, 1, 3)
