PyGrinder block and sequence missing algorithms are not reaching the correct percentage of missing values #542
Comments
This issue had no activity for 14 days. It will be closed in 1 week unless there is some new activity. Is this issue already resolved?
Dear Giacomo Guiduzzi,

Thank you for reaching out and sharing your observations about the sequence-missing and block-missing behaviour in PyGrinder.

The behaviour you've described could be due to an interaction between the missing data already present in your dataset and the additional missingness introduced. If your dataset already contains missing values, the newly introduced missing values will mix with the original ones: some of the positions selected for masking may already be missing, so they do not add to the count. This blending effect could result in the observed missing rate being lower than the specified value. The effect is particularly noticeable when the data contains few completely observed sequences or blocks to begin with.

Please let me know whether this explanation matches your situation, or feel free to share more details about your dataset or experimental setup, and I'd be happy to assist further.

Best regards,
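A minimal sketch of the overlap described in the reply above, assuming missing values are encoded as NaN and that the artificial missingness is available as a boolean mask of the same shape as the data. count_new_missing is a hypothetical helper for illustration, not part of PyGrinder: it splits the masked positions into those that were observed before masking and those that were already missing, and only the first group raises the missing values ratio.

```python
import numpy as np

def count_new_missing(X, new_mask):
    """Hypothetical helper: split an artificial missingness mask into
    (positions that were observed and are now newly missing,
     positions that were already NaN before masking)."""
    X = np.asarray(X, dtype=float)
    new_mask = np.asarray(new_mask, dtype=bool)
    already_missing = np.isnan(X)
    newly_missing = new_mask & ~already_missing  # these increase the MVR
    overlap = new_mask & already_missing         # these add nothing
    return int(newly_missing.sum()), int(overlap.sum())

# Usage: two of the three masked positions were already NaN,
# so only one new missing value is actually introduced.
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])
mask = np.array([[True, True, False],
                 [True, False, False]])
print(count_new_missing(X, mask))
```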
Hi @LinglongQian,

Thank you for your answer and for your support on this issue. I'm sorry for the late reply; it's been a busy period.

I tested the missing values ratio (MVR) in various scenarios, both with and without pre-existing missing values. My dataset has shape 7152x96x5 and dtype float32. The original dataset has an MVR of 0.041, and the MVR I'd like to reach is 0.5. If I use block missing with a factor of 0.2 I get 0.4562, while with a factor of 0.5 I get 0.4556. For some reason a higher factor slightly lowers the output MVR, and I don't understand why. I then tried filling all the NaN values in my dataset with
So apparently this happens even when the original MVR is 0.0 (I double-checked after filling in the missing values). For clarity, here is how I compute the MVR:
Let me know what you think about this situation and whether there is anything more I can do to help you identify the issue.
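For reference, one common way to compute the MVR discussed in this thread is sketched below, assuming missing values are encoded as NaN in a NumPy array of shape (samples, steps, features). This is an illustrative sketch, not the snippet from the comment above (which is not reproduced here); the function name missing_value_ratio is hypothetical.

```python
import numpy as np

def missing_value_ratio(X):
    """Hypothetical helper: fraction of NaN entries over all samples,
    time steps and features (0.0 = fully observed, 1.0 = all missing)."""
    X = np.asarray(X, dtype=float)
    return float(np.isnan(X).sum() / X.size)

# Usage on a small (samples, steps, features) array with one NaN out of four.
X = np.array([[[1.0, np.nan],
               [3.0, 4.0]]])
print(missing_value_ratio(X))
```

Because the ratio is taken over every entry, it counts pre-existing and artificially introduced NaNs together, which is why the starting MVR matters when comparing against a target value.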
Issue description
Greetings,
I'm working on a project on forecasting time series with deep learning methods. A quick question about sequence missing and block missing in PyGrinder: when I set a replace_pct value of 0.5, I don't actually get around 50% missing values, but 39%. If I raise this value to 0.75, I get around 50%. Is this normal? Am I missing something?
Let me know if there is any additional information I can give you regarding this behaviour.
Thanks in advance, I'm looking forward to your kind response.
Best Regards,
Giacomo Guiduzzi
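One possible mechanism for the gap, offered as a hypothesis rather than a description of PyGrinder's actual implementation: if missing segments are placed at independent random positions and are allowed to overlap, then placing segments whose nominal total length equals a fraction p of the series covers only about 1 - exp(-p) of it, because overlapping positions are counted once. For p = 0.5 this gives about 0.393, and for p = 0.75 about 0.528, which happens to be close to the 39% and roughly 50% reported above; that agreement is consistent with the hypothesis but does not confirm it. The sketch below estimates the covered fraction by simulation on a 1-D series; overlapping_segment_coverage and its parameters are hypothetical.

```python
import numpy as np

def overlapping_segment_coverage(n_steps, seq_len, p, n_trials=100, seed=0):
    """Hypothetical sketch, NOT PyGrinder's algorithm: place segments of
    length seq_len at independent random starts, with enough segments that
    their nominal total length is p * n_steps, and return the average
    fraction of steps actually covered (overlaps are counted only once)."""
    rng = np.random.default_rng(seed)
    n_segments = int(round(p * n_steps / seq_len))
    coverages = []
    for _ in range(n_trials):
        mask = np.zeros(n_steps, dtype=bool)
        # Starts are drawn so every segment fits inside the series.
        starts = rng.integers(0, n_steps - seq_len + 1, size=n_segments)
        for s in starts:
            mask[s:s + seq_len] = True
        coverages.append(mask.mean())
    return float(np.mean(coverages))

# Compare the simulated coverage with the 1 - exp(-p) approximation.
for p in (0.5, 0.75):
    sim = overlapping_segment_coverage(n_steps=10_000, seq_len=5, p=p)
    print(f"p={p}: simulated {sim:.3f}, analytic {1 - np.exp(-p):.3f}")
```

Under this assumption, a placement scheme that stops only once the union of masked positions reaches the target would hit p exactly; whether PyGrinder's seq_missing or block_missing should behave that way is a question for the maintainers.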