Commit 68cce1a
fix(dict): Remove only corrections if a space could be inserted as well
The typo dictionary words.csv previously contained
a bunch of problematic entries such as:
abouta,about
algorithmi,algorithm
attachen,attach
shouldbe,should
anumber,number
These resulted in wrong automatic corrections whenever the following
spaces (indicated by ␣) were accidentally omitted:
about␣a
algorithm␣i developed
attach␣en masse
should␣be
a␣number
Many of these entries were introduced by taking entries from the
codespell-dict and removing corrections containing spaces (since typos
currently doesn't support them), e.g. the codespell dictionary contains:
abouta->about a, about,
shouldbe->should, should be,
This commit updates `tests/verify.rs` to automatically remove
corrections in the form of `{correction}{common_word},{correction}`
or `{common_word}{correction},{correction}`, where `{common_word}` is
one of the 1000 most frequent English words (except if `{correction}`
also ends/starts in `{common_word}`, since we still want to correct e.g.
"extrememe" to "extreme").
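The rule above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `tests/verify.rs`: the names `is_missed_space` and `retain_safe_entries` are hypothetical, and the exception is interpreted as "keep the entry if the correction itself ends (or starts) with the common word".

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns true when `typo` looks like `correction` with a
/// common word glued to one end, i.e. the user may only have missed a space.
fn is_missed_space(typo: &str, correction: &str, common_words: &HashSet<&str>) -> bool {
    // `{correction}{common_word}`, e.g. "abouta" = "about" + "a".
    if let Some(suffix) = typo.strip_prefix(correction) {
        // Assumed exception: keep e.g. "extrememe" -> "extreme", because the
        // correction itself already ends in the common word "me".
        if common_words.contains(suffix) && !correction.ends_with(suffix) {
            return true;
        }
    }
    // `{common_word}{correction}`, e.g. "anumber" = "a" + "number".
    if let Some(prefix) = typo.strip_suffix(correction) {
        if common_words.contains(prefix) && !correction.starts_with(prefix) {
            return true;
        }
    }
    false
}

/// Hypothetical helper: drops every `(typo, correction)` pair that could be
/// explained by a missing space and keeps the rest in their original order.
fn retain_safe_entries<'a>(
    entries: &[(&'a str, &'a str)],
    common_words: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|&(typo, correction)| !is_missed_space(typo, correction, common_words))
        .collect()
}
```

With a common-word set containing "a", "i", "en", "be" and "me", this sketch would drop all five problematic entries listed earlier while keeping `extrememe,extreme`.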
The top-1000-most-frequent-words.csv file was generated by running:
curl https://norvig.com/ngrams/count_1w.txt \
| head -n1024 \
| awk '{print $1;}' \
| grep -vE '^([^ia]|al|re)$' \
> top-1000-most-frequent-words.csv
1 parent d4258b1 commit 68cce1a
File tree
4 files changed: +1229 -162 lines changed
- crates/typos-dict
- assets
- src
- tests