
Use safe ratio helper in regularity #49

Closed
rexmhall09 wants to merge 3 commits into lingpy:master from rexmhall09:master

Conversation

@rexmhall09

Add _safe_round_ratio to avoid division-by-zero when computing proportion fields in regularity(). Replace direct round(divisions) with the helper for pattern, full_proportion and word ratios so they return 0.0 if the denominator is zero. Extend tests to cover cases that produce zeroed outputs (different min_refs and sound_classes), ensuring the function returns stable 0.0 proportions instead of raising errors or producing NaN.

@LinguList
Contributor

Hm. I do not know if I want this to be caught. The result would be that you have zero patterns in your data, but without patterns, you should not check for regularity either. So in which real-data scenario have you encountered the problem to suggest the change here, and don't you think it is better to ask people to be aware of the problem that they cannot measure regularity of their data in the absence of patterns in general?

Add runtime warnings in regularity() for edge cases where no patterns are found or no cognate sets meet min_refs, clarifying that regularity proportions are set to 0.0. Import warnings in the module and update tests to assert the appropriate RuntimeWarning messages (using pytest.warns).
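The warning-based variant described in this commit could look roughly like the following (a sketch only; `regularity_proportion` is a hypothetical stand-in for the relevant computation inside `regularity()`, not an actual function in the PR):

```python
import warnings


def regularity_proportion(regular, total):
    """Hypothetical illustration of the warn-and-continue approach:
    emit a RuntimeWarning and return 0.0 when there is nothing to
    measure, rather than failing on division by zero."""
    if total == 0:
        warnings.warn(
            "no patterns found; regularity proportions are set to 0.0",
            RuntimeWarning,
        )
        return 0.0
    return round(regular / total, 2)
```

In the test suite, such warnings can be asserted with `pytest.warns(RuntimeWarning)`.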
@rexmhall09
Author

rexmhall09 commented Mar 11, 2026

@LinguList I see this as a robustness issue for downstream workflows. If a lot of data is being run and the error occurs even once, it can stop the whole process. For that reason, it's best to let it run but emit a warning. What do you think?

@LinguList
Contributor

Not convinced. I rather prefer this to throw an error, so people realize they use data they should NOT use. I had too many of those cases in the past, where the real errors were not detected, since we were tolerant of input data with problems.

@LinguList
Contributor

The workflows in LingRex support active linguistic analyses, where users should also actively check the results and engage with the data. I am not against throwing errors earlier in the workflow, but I am against catching them. If a workflow fails on the zero division, the data in that workflow have a problem and should not be used for this process, so the people using the problematic data can also just check the data properly beforehand. In fact, I am working on a test in that regard, which tells users much earlier where their data has problems, so they do not use it. And we even offer data that is at least in part "good" to use here: https://codeberg.org/lexibank/lexibench

Stop issuing runtime warnings in regularity and instead raise ValueError for three no-data edge cases: when no patterns are detected, when no eligible alignment sites are found (full_proportion == 0), and when no words meet the min_refs threshold. Remove the _safe_round_ratio helper and the warnings import, and replace its uses with inline round calls. Update tests to expect exceptions (raises) and remove warns usage accordingly.
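The fail-fast variant this commit describes might look like the following sketch (again, `regularity_proportion` is a hypothetical stand-in for the computation inside `regularity()`; the exact error messages in the PR are not shown here):

```python
def regularity_proportion(regular, total):
    """Hypothetical sketch of the fail-fast approach preferred by the
    maintainer: raise a descriptive ValueError when no patterns exist,
    instead of silently returning 0.0 or raising ZeroDivisionError."""
    if total == 0:
        raise ValueError(
            "regularity cannot be computed: no patterns were detected"
        )
    return round(regular / total, 2)
```

The corresponding tests would then use `pytest.raises(ValueError)` rather than `pytest.warns`.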
@rexmhall09
Author

@LinguList Thanks for the clarification. I've pushed an update to the PR that removes the safe-ratio behavior and instead checks for the zero-denominator cases explicitly. If no patterns, sites, or eligible words are found, the function now raises a clear ValueError explaining that regularity cannot be computed, rather than producing a generic ZeroDivisionError. If you think it is better to leave things as they are, feel free to close this PR.

@LinguList LinguList closed this Mar 12, 2026