Skip to content

Conversation

benjaminking
Copy link
Collaborator

@benjaminking benjaminking commented Sep 23, 2025

This PR improves the accuracy of quote convention detection for projects that are not consistent with their quotation marks by ignoring quotation marks that occur infrequently. This is response to many in-progress translation projects have been having no quote convention detected.


This change is Reviewable

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ddaspit reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)


machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py line 14 at r1 (raw file):

class QuotationMarkCounter:

Can you just use Counter and move this threshold to the PreliminaryQuotationMarkAnalyzer?

Copy link
Collaborator Author

@benjaminking benjaminking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py line 14 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Can you just use Counter and move this threshold to the PreliminaryQuotationMarkAnalyzer?

Unfortunately, the total() method for Counter (which I would need to compute proportions) was only added in Python 3.10. I could keep track of the total separately, but it seems cleaner for now to use a separate class.

Plus there is something I like about having the proportion logic and threshold decoupled and encapsulated.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @benjaminking)


machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py line 14 at r1 (raw file):

Previously, benjaminking (Ben King) wrote…

Unfortunately, the total() method for Counter (which I would need to compute proportions) was only added in Python 3.10. I could keep track of the total separately, but it seems cleaner for now to use a separate class.

Plus there is something I like about having the proportion logic and threshold decoupled and encapsulated.

Sounds good. I wondered about total().

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @benjaminking)

@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.92%. Comparing base (d93dacd) to head (3b0f814).

Files with missing lines Patch % Lines
...on_analysis/preliminary_quotation_mark_analyzer.py 95.65% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #235   +/-   ##
=======================================
  Coverage   90.91%   90.92%           
=======================================
  Files         337      337           
  Lines       21519    21542   +23     
=======================================
+ Hits        19564    19586   +22     
- Misses       1955     1956    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@benjaminking benjaminking merged commit 7021586 into main Sep 27, 2025
14 checks passed
@benjaminking benjaminking deleted the improve_quote_convention_detection branch September 27, 2025 14:26
Enkidu93 added a commit to sillsdev/machine that referenced this pull request Sep 29, 2025
Enkidu93 added a commit to sillsdev/machine that referenced this pull request Sep 30, 2025
Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 1, 2025
Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants