Improve quote convention detection accuracy #235
@ddaspit reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
@Enkidu93 reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py
line 14 at r1 (raw file):
class QuotationMarkCounter:
Can you just use Counter and move this threshold to the PreliminaryQuotationMarkAnalyzer?
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py
line 14 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Can you just use Counter and move this threshold to the PreliminaryQuotationMarkAnalyzer?
Unfortunately, the total() method for Counter (which I would need to compute proportions) was only added in Python 3.10. I could keep track of the total separately, but it seems cleaner for now to use a separate class.
Plus there is something I like about having the proportion logic and threshold decoupled and encapsulated.
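For readers following the thread, here is a minimal sketch of the trade-off being discussed: a small class that tracks its own total so it does not need Counter.total() (Python 3.10+). The class name ProportionCounter, its method names, and the 0.05 default threshold are hypothetical illustrations, not the PR's actual QuotationMarkCounter.

```python
from collections import Counter


class ProportionCounter:
    """Hypothetical illustration; not the PR's QuotationMarkCounter."""

    def __init__(self, min_proportion: float = 0.05) -> None:
        # The 0.05 default is an assumed value chosen for this sketch.
        self._counts: Counter = Counter()
        self._total = 0  # tracked by hand; Counter.total() needs Python 3.10+
        self._min_proportion = min_proportion

    def count(self, mark: str) -> None:
        """Record one occurrence of a quotation mark."""
        self._counts[mark] += 1
        self._total += 1

    def proportion(self, mark: str) -> float:
        """Return the share of all recorded marks that `mark` accounts for."""
        if self._total == 0:
            return 0.0
        return self._counts[mark] / self._total

    def is_frequent(self, mark: str) -> bool:
        """True if `mark` meets the minimum proportion threshold."""
        return self.proportion(mark) >= self._min_proportion
```

Keeping the total and the threshold inside one class is what lets the caller ask a single question (is this mark frequent enough?) without knowing how proportions are computed.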
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @benjaminking)
machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py
line 14 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Unfortunately, the total() method for Counter (which I would need to compute proportions) was only added in Python 3.10. I could keep track of the total separately, but it seems cleaner for now to use a separate class.
Plus there is something I like about having the proportion logic and threshold decoupled and encapsulated.
Sounds good. I wondered about total().
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @benjaminking)
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #235   +/-   ##
=======================================
  Coverage   90.91%   90.92%
=======================================
  Files         337      337
  Lines       21519    21542    +23
=======================================
+ Hits        19564    19586    +22
- Misses       1955     1956     +1
This PR improves the accuracy of quote convention detection for projects that use quotation marks inconsistently by ignoring quotation marks that occur infrequently. This is in response to many in-progress translation projects having no quote convention detected.
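The idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name frequent_marks and the 0.05 threshold are hypothetical, and the PR's real logic lives in preliminary_quotation_mark_analyzer.py and may differ.

```python
from collections import Counter
from typing import Iterable, Set


def frequent_marks(marks: Iterable[str], min_proportion: float = 0.05) -> Set[str]:
    """Return the marks whose share of all observed marks meets the threshold.

    Hypothetical helper; the name and default threshold are assumptions.
    """
    counts = Counter(marks)
    total = sum(counts.values())  # portable alternative to Counter.total()
    if total == 0:
        return set()
    return {mark for mark, n in counts.items() if n / total >= min_proportion}


# A mostly consistent project with a few stray straight quotes.
observed = ["\u201c"] * 48 + ["\u201d"] * 48 + ['"'] * 4
print(frequent_marks(observed))
```

In this example the straight quote makes up 4% of the observed marks, so it falls below the assumed 5% threshold and is ignored, leaving only the curly pair for convention detection.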