#43 - Reproducibility #45

jtimko16 · 2024-07-25T21:04:15Z

Fix several reproducibility issues.

However, I didn't manage to fix it in the whole code; there is still some uncaptured randomness that I haven't been able to solve so far.

43 reprod issue

janezlapajne · 2024-07-26T07:38:07Z

Hello, I will go quickly through the changes and check whether everything seems OK (see the code comments). I cannot fully test the changes right now, but maybe in the following days. I will post new updates under issue #43.

janezlapajne

I've added some comments and suggestions. However, please note that I'm not familiar with the overall code structure, so some comments may not be relevant.

src/autofeat/featsel.py

janezlapajne · 2024-07-26T07:52:08Z

src/autofeat/featsel.py

@@ -223,36 +242,40 @@ def select_features(

    # select good features in k runs in parallel
    # by doing sort of a cross-validation (i.e., randomly subsample data points)
-    def run_select_features(i: int):
+    def run_select_features(i: int, seed:int):


Use a space after the colon.

Also, for consistency, maybe call it random_seed here as well. But actually I think that setting the same random seed here might be problematic, since then all runs could yield the same result, defeating the purpose of doing this multiple times.

Yes, I fixed formatting.

@cod3licious I fixed the issue like this; let me know whether it makes sense to you:

    np.random.seed(random_seed)
    loop_seed = np.random.randint(10**6)
    # Added to random_seed to make sure that the 1run seed is different for each run, but globally reproducible
    seed = random_seed + loop_seed if random_seed is not None else loop_seed
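One way to address the concern raised in this thread (every run sharing a single seed) without touching NumPy's global state is to derive one child seed per run from the user-supplied seed. The sketch below uses `np.random.SeedSequence.spawn` for that. The helper name `make_run_rngs` and the parameter `n_runs` are made up for illustration and are not part of autofeat, so treat this as one possible approach rather than what the PR implements.

```python
import numpy as np


def make_run_rngs(random_seed, n_runs):
    """Return one independent Generator per selection run.

    Hypothetical helper, not part of autofeat. With random_seed=None the
    generators are seeded from OS entropy, so results are not reproducible.
    """
    # SeedSequence.spawn derives independent child sequences, so every run
    # gets its own stream while the whole set stays reproducible.
    children = np.random.SeedSequence(random_seed).spawn(n_runs)
    return [np.random.default_rng(child) for child in children]


# Each run draws its own subsample of row indices; rerunning with the same
# seed repeats exactly the same subsamples.
first = [rng.permutation(10)[:5].tolist() for rng in make_run_rngs(42, 3)]
second = [rng.permutation(10)[:5].tolist() for rng in make_run_rngs(42, 3)]
```

Passing a `Generator` (or an integer drawn from it) into each run keeps the runs different from each other, which is the point of repeating the selection, while the overall result still depends only on `random_seed`.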

src/autofeat/featsel.py

cod3licious · 2024-07-26T11:35:11Z

Thanks for looking into this! I think you definitely found some places where setting random seeds should help, but I think there are also some places where it is not necessary. To avoid overcomplicating the implementation, it would be great if in the end you could also check whether some of your additions could be removed again while still keeping it reproducible. I'd be happy to later comment on the places where I think it might be possible to keep the existing implementation while still getting reproducible results.

jtimko16 · 2024-07-26T19:04:00Z

Hello @janezlapajne and @cod3licious, thanks a lot for your comments! I will try to continue with it and incorporate the changes next week.

jtimko16 · 2024-07-26T19:06:53Z

> Thanks for looking into this! I think you definitely found some places where setting random seeds should help, but I think there are also some places where it is not necessary. To avoid overcomplicating the implementation, it would be great if in the end you could also check whether some of your additions could be removed again while still keeping it reproducible. I'd be happy to later comment on the places where I think it might be possible to keep the existing implementation while still getting reproducible results.

Hello, thanks for your suggestions. As I mentioned, the implementation is unfortunately not fully reproducible yet. It has definitely improved since last time, but there is probably at least one more source of randomness that I haven't captured.

When I look at it again next week, I will do my best to remove the extra random seeds and track down the remaining randomness.

jtimko16 · 2024-08-05T14:28:21Z

@janezlapajne and @cod3licious, thanks again for your great comments. I did my best to implement all of them.

As I mentioned before, we have made some improvements, but the feature selector is not fully reproducible yet. And unfortunately, I don't have time to dive into this deeper again.

Once you re-review the PR, we can agree on the next steps. We can either keep the PR open, or merge at least the current changes. Whatever you think is better.

cod3licious

Thank you for your work so far! However, I'm against merging this before it is truly ready. This also needs some proper tests (beyond a manually run notebook) to ensure that future changes don't cause a regression.
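As a rough idea of what such a regression test could look like, here is a minimal sketch that calls a selector several times with the same seed and fails on the first disagreement. `toy_select_features` is only a stand-in written for this example (it is not autofeat's selection logic), and `assert_reproducible` is a hypothetical helper name. In the project's test suite the stand-in would be replaced by the real feature selector.

```python
import random


def toy_select_features(columns, random_seed=None):
    """Stand-in for a seeded feature selector (NOT autofeat's implementation).

    It keeps the columns that win a seeded random vote, which is enough to
    exercise the test helper below.
    """
    rng = random.Random(random_seed)
    # sorted() fixes the iteration order before any random draw happens.
    return [col for col in sorted(columns) if rng.random() < 0.5]


def assert_reproducible(select, columns, random_seed, repeats=3):
    """Fail if repeated calls with the same seed disagree."""
    baseline = select(columns, random_seed=random_seed)
    for _ in range(repeats):
        result = select(columns, random_seed=random_seed)
        assert result == baseline, f"{result} != {baseline}"
    return baseline


columns = {"x1", "x2", "x1*x2", "log(x1)", "x2**2"}
selected = assert_reproducible(toy_select_features, columns, random_seed=0)
```

A check like this only covers repeated calls within one process. Differences that show up only between separate interpreter runs need a test that launches fresh processes.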

jtimko16 · 2024-08-08T14:29:43Z

> Thank you for your work so far! However, I'm against merging this before it is truly ready. This also needs some proper tests (beyond a manually run notebook) to ensure that future changes don't cause a regression.

I agree, it will be better to finish this properly rather than merge it half done. As I said, I am quite busy these days, but I will try to find time for it as soon as possible :)

jtimko16 · 2024-08-17T15:57:34Z

Hello everyone,

I spent more time debugging today. However, I didn't manage to identify the remaining issue: there is still some randomness, even when running FeatureSelector with only featsel_runs=1.

Therefore, I am not sure whether I will be able to finish this. If anyone would like to take over, I can explain and summarize what I have done so far, or we can team up and look at it together.

All the best,
J.
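When hunting leftover nondeterminism of this kind, one candidate worth ruling out is iteration over sets of strings. Python randomizes string hashes per interpreter process unless `PYTHONHASHSEED` is fixed, so the order of a set can change between runs even when every random generator is seeded. The sketch below is only an assumption about where to look, not a diagnosis of this PR. The helper `run_with_hash_seed` and the feature names are invented for the example.

```python
import os
import subprocess
import sys

# Code executed in a fresh interpreter: print a set of feature names in
# sorted order, so the output cannot depend on the hash seed.
SNIPPET = "print(','.join(sorted({'x1', 'x2', 'x1*x2', 'log(x1)'})))"


def run_with_hash_seed(code, hash_seed):
    """Run `code` in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()


# Sorting before iterating makes the order independent of the hash seed.
# Dropping sorted() from SNIPPET may give a different order per hash seed.
orders = {run_with_hash_seed(SNIPPET, seed) for seed in (1, 2, 3)}
```

If the selected features differ between two fresh interpreter runs but not between two calls in the same process, an unsorted set or dict-of-sets iteration is a plausible place to check first.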

jtimko16 and others added 12 commits July 22, 2024 14:55

Git - added to gitignore folder for testing reproducibility

3ae7f24

Mod - modified gitignore

3fb56dc

Gitignore - added folder autofeat_reproducibility

11d388d

Add - Random seeds

24c2c20

Mod - change list to sorted (avoid randomness)

c1821d2

Mod - fix the Parallel function

ec9457a

Mod - fix reproducibility when sorting columns

3cae5d2

Mod - Random seed added to definition of run_select_features

5727461

Mod - make consistent another seed

bbbfa7e

Add - added random seed to _noise_filtering

dcdfec0

Clean - remove extra print statements

2a9ea60

Merge pull request #1 from jtimko16/43-reprod-issue

1b9b7da

43 reprod issue

jtimko16 mentioned this pull request Jul 25, 2024

Reproducibility issue #43

Open

janezlapajne suggested changes Jul 26, 2024

cod3licious reviewed Jul 26, 2024

src/autofeat/featsel.py (outdated, resolved)

cod3licious reviewed Jul 26, 2024

src/autofeat/featsel.py (resolved)

cod3licious reviewed Jul 26, 2024

src/autofeat/featsel.py (outdated, resolved)

jtimko16 and others added 8 commits August 5, 2024 16:19

Format - run RUFF formatting on featset

306eacf

Mod - added separate cross validation before fitting models

77336c5

Rem - removed extra random seed

1e8e69f

Mod - solve the seed within 1run of select features

b2f6c7a

Mod - solved the random seed generator

ea1f742

Typing - fixed typing hint of random_seed

73b8381

Mod - removed extra randomness in selecting columns

0a02690

Merge pull request #2 from jtimko16/43-reprod-issue

9812d86

43 reprod issue

janezlapajne suggested changes Aug 6, 2024

src/autofeat/featsel.py (outdated, resolved)

src/autofeat/featsel.py (outdated, resolved)

jtimko16 and others added 2 commits August 6, 2024 18:47

Mod - using KFold with all CV models; move random_seed_generator to utils.py

eff9428

Merge pull request #3 from jtimko16/43-reprod-issue

0c87f7d

Mod - using KFold with all CV models; move random_seed_generator to u…

cod3licious requested changes Aug 8, 2024

src/autofeat/utils.py (outdated, resolved)

src/autofeat/featsel.py (outdated, resolved)

jtimko16 and others added 2 commits August 8, 2024 17:24

Mod - replaced custom function by np.random.default_rng(); fixed the comment

de21a01

Merge pull request #4 from jtimko16/43-reprod-issue

8d0b566

Mod - replaced custom function by np.random.default_rng(); fixed the …

jtimko16 marked this pull request as draft August 27, 2024 14:24
