Data augmentation #7
base: main
Conversation
Some comments
benchmark_utils/transformation.py
Outdated
```diff
@@ -6,15 +6,14 @@
 # - getting requirements info when all dependencies are not installed.
 with safe_import_context() as import_ctx:
     import numpy as np
+    from numpy import concatenate
```
Prefer using `np.concatenate` instead of importing the function, as this makes it easier to read the code.
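A minimal sketch of the suggested style, using hypothetical arrays (the variable names are illustrative, not from the PR):

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

# Module-qualified call: the NumPy origin is obvious at the call site
stacked = np.concatenate([a, b], axis=0)
print(stacked.shape)  # (4, 3)
```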
benchmark_utils/transformation.py (Outdated)
```python
X_augm = to_numpy(X)
y_augm = y
for i in range(n_augmentation):
    transform = ChannelsDropout(probability=probability)
```
Why not instantiate this object outside the loop?
Also, a seed should be given, otherwise the benchmark is not reproducible.
So I am not sure why you made this change, can you comment?
I put it inside the loop because, when it was outside, the augmentation was always giving the same `X_tr`.
And I have the same issue when I fix the seed.
Maybe we could predefine a sequence of seeds, so that each transformation gets a different seed. Do you have another idea?
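One way to sketch the "sequence of seeds" idea with plain NumPy, assuming NumPy >= 1.17 for `SeedSequence` (the variable names are hypothetical, not from the PR):

```python
import numpy as np

# Spawn one independent child seed per transformation
children = np.random.SeedSequence(0).spawn(3)
rngs = [np.random.default_rng(s) for s in children]

# Each transformation draws different values...
draws = [rng.uniform() for rng in rngs]
assert len(set(draws)) == 3

# ...and rebuilding from the same root seed reproduces them exactly
children2 = np.random.SeedSequence(0).spawn(3)
draws2 = [np.random.default_rng(s).uniform() for s in children2]
assert draws == draws2
```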
Yes, having a sequence of seeds is indeed a nice idea.
But I would look at the `Transform` object and try to understand what is happening, because the standard for data augmentation is that repeated calls to the transform should give different augmented data.
Can you write a simple example and check the behavior of the object?
Okay, I think I know the exact issue. What defines the transformation is `mask_len_samples` and `mask_start_per_sample`. They are generated "randomly" in `transform.get_augmentation_params`; the problem is that, to choose randomly, the transformation uses `rng.uniform` with a seed that is fixed. So we always get the same parameters for the augmentation, and at each iteration we get the same augmented data.
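The behavior described above can be reproduced with plain NumPy: re-creating a generator from the same integer seed on every call yields identical draws, while a single shared generator advances between calls (`draw_params` is a hypothetical stand-in for `get_augmentation_params`, not the actual braindecode code):

```python
import numpy as np

def draw_params(random_state):
    # Re-creating the RNG from a fixed int seed on every call:
    # the "random" parameter never changes
    rng = np.random.RandomState(random_state)
    return rng.uniform()

# Same parameters on every call -> same augmented data every iteration
assert draw_params(0) == draw_params(0)

# With one shared generator, repeated calls consume fresh draws
shared = np.random.RandomState(0)
assert shared.uniform() != shared.uniform()
```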
benchmark_utils/transformation.py (Outdated)
```python
    probability=probability,
    mask_len_samples=mask_len_samples,
)
```
weird formatting
Some comments about the PR :)
```diff
 def channels_dropout(
-    X, y, n_augmentation, seed=0, probability=0.5, p_drop=0.2
+    X, y, n_augmentation, probability=0.5, p_drop=0.2
```
Suggested change:
```diff
-    X, y, n_augmentation, probability=0.5, p_drop=0.2
+    X, y, n_augmentation, probability=0.5, p_drop=0.2, seed=None
```
```diff
@@ -43,51 +47,58 @@ def channels_dropout(
     The labels.

     """
-    transform = ChannelsDropout(probability=probability, random_state=seed)
+    seed = gen_seed()
```
Suggested change:
```diff
-    seed = gen_seed()
+    rng = np.random.RandomState(seed)
```
For a reproducible sequence of random numbers, the best approach is to use a random number generator, which will sample a different number for each draw, but in a reproducible order.
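A minimal illustration of the point above: two generators built from the same seed produce different numbers within a run, but the exact same sequence across runs.

```python
import numpy as np

rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

seq_a = [rng_a.uniform() for _ in range(3)]
seq_b = [rng_b.uniform() for _ in range(3)]

# Each draw differs from the previous one...
assert len(set(seq_a)) == 3
# ...but the whole sequence is reproducible from the seed
assert seq_a == seq_b
```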
```python
transform = ChannelsDropout(
    probability=probability,
    random_state=next(seed)
)
```
Suggested change:
```diff
 transform = ChannelsDropout(
     probability=probability,
-    random_state=next(seed)
+    random_state=rng
 )
```
```diff
     return X_augm, y_augm


 def smooth_timemask(
-    X, y, n_augmentation, sfreq, seed=0, probability=0.5, second=0.1
+    X, y, n_augmentation, sfreq, probability=0.8, second=0.2
```
Suggested change:
```diff
-    X, y, n_augmentation, sfreq, probability=0.8, second=0.2
+    X, y, n_augmentation, sfreq, probability=0.8, second=0.2, seed=None
```
```diff
@@ -13,7 +13,7 @@
     SmoothTimeMask,
 )
 from braindecode.models import ShallowFBCSPNet
-from numpy import linspace, pi
+from numpy import linspace
```
Suggested change:
```diff
-from numpy import linspace
+import numpy as np
```
```diff
@@ -96,25 +96,25 @@ def set_objective(self, X, y, sfreq):
         mask_len_samples=int(sfreq * second),
         random_state=seed,
     )
-    for second in linspace(0.1, 2, 10)
+    for second in linspace(0.1, 2, 3)
```
Suggested change:
```diff
-    for second in linspace(0.1, 2, 3)
+    for second in np.linspace(0.1, 2, 3)
```
Why did you decrease the number of samples?
It was only to have faster results
```python
"augmentation": [
    "SmoothTimeMask",
],
"covariances_estimator": ["oas"],
```
Suggested change:
```diff
-"augmentation": [
-    "SmoothTimeMask",
-],
-"covariances_estimator": ["oas"],
+"covariances_estimator": ["oas"],
```
```diff
         ]

     elif self.augmentation == "ChannelDropout":
         transforms = [
             ChannelsDropout(
                 probability=self.proba, p_drop=prob, random_state=seed
             )
-            for prob in linspace(0, 1, 10)
+            for prob in linspace(0, 1, 3)
```
Suggested change:
```diff
-            for prob in linspace(0, 1, 3)
+            for prob in np.linspace(0, 1, 3)
```
```diff
                 random_state=seed,
             )
-            for prob in linspace(0, 2 * pi, 10)
+            for phase_freq in linspace(0, 1, 3)
```
Suggested change:
```diff
-            for phase_freq in linspace(0, 1, 3)
+            for phase_freq in np.linspace(0, 1, 3)
```
Co-authored-by: Thomas Moreau <[email protected]>
The augmentation process is now working as we expected, and we improved the way we compute the scores.