Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Digitize full WF #705

Closed
wants to merge 8 commits into from
Closed

Digitize full WF #705

wants to merge 8 commits into from

Conversation

JoranAngevaare
Copy link
Contributor

@JoranAngevaare JoranAngevaare commented Oct 26, 2022

What is the problem / what does the code in this PR do
See #704

The sum WF is always conservative by truncating the peak when downsampling. While probably some loss is acceptable, having O(3%) differences between p['data'].sum() and p['area'] might be a bit more confusing than what we'd

Can you briefly describe how it works?
Rather than being conservative with the clipping of the data-field, use the time if there is any after the peak. This is computed by checking if the next peak is far away enough that we can claim the bit of extra time that would be needed for the extra sample

There may still be a chance that there is another peak directly after another peak that would prevent us from filling the entire buffer. There are a couple of things we could do about that:

  • just ignore it - the chances that this happens is extremely small - especially for smaller peaks where this last sample might make any difference
  • add the data of that last sample that doesn't fit within the time interval to the second to last sample
  • rewrite tons of code to support overlapping peaks

@coveralls
Copy link

coveralls commented Oct 26, 2022

Coverage Status

Coverage: 92.763% (+0.6%) from 92.184% when pulling 457b4a3 on digitize_full_wf into 209735b on master.

@terliuk
Copy link
Contributor

terliuk commented Oct 26, 2022

I would vote for option 2 with adding remaining part of the waveform to the last sample is the most reasonable option. As typical difference is below few %, this should preserve all the timing quantities that we typically use (50% width, 90% width, 10-to-50% area time etc). But we have to be sure that we document it clearly somewhere (at least as a very clear comment in code).

On the long run, we can decouple the relation endtime=length*dt + time and to store dedicated end time or total duration of all samples, but this might require way more effort for testing, as it might "backfire" in some place.

@JoranAngevaare
Copy link
Contributor Author

JoranAngevaare commented Oct 26, 2022

I would vote for option 2 with adding remaining part of the waveform to the last sample is the most reasonable option. As typical difference is below few %, this should preserve all the timing quantities that we typically use (50% width, 90% width, 10-to-50% area time etc). But we have to be sure that we document it clearly somewhere (at least as a very clear comment in code).

Yes agree, it's all quite hypothetical but we got to be mindfull of future PHD students banging their heads against the wall why things don't sum up. I just added the functionality. I'll try and add some more tests too to illustrate the point.

On the long run, we can decouple the relation endtime=length*dt + time and to store dedicated end time or total duration of all samples, but this might require way more effort for testing, as it might "backfire" in some place.

Not sure if we would really gain much, We inherently loose information from downsampling, but that does not mean that the information lost is actually useful (of course depending on the application, for strax it's now mostly computing area deciles which relevant parameters aren't actually affected by any the changes made here).

@JoranAngevaare JoranAngevaare marked this pull request as draft October 26, 2022 11:33
@JoranAngevaare
Copy link
Contributor Author

FYI, if someone wants to pick this up, I did not figure out where but something is still a bit fishy about this PR as it creates very different Sum WFs (whereas it should be a very minor change). I'll close this for now, people might find it back in #704

@JoranAngevaare JoranAngevaare deleted the digitize_full_wf branch June 1, 2023 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants