rustrict is a sophisticated profanity filter for Rust.
- Multiple types (profane, offensive, sexual, mean, spam)
- Multiple levels (mild, moderate, severe)
- Resistant to evasion
  - Alternative spellings (like "fck")
  - Repeated characters (like "craaaap")
  - Confusable characters (like 'ᑭ' vs 'P')
  - Spacing (like "c r_a-p")
  - Accents (like "pÓöp")
  - Self-censoring (like "f*ck")
  - Battle-tested in Mk48.io
- Resistant to false positives
  - One word (like "assassin")
  - Two words (like "push it")
- Flexible
  - Censor and/or analyze
  - Input &str or Iterator<Item = char>
  - Plenty of options
- Performant
  - O(n) analysis and censoring
  - No regex (uses custom radix trie)
  - 4 MB/s in release mode
  - 150 KB/s in debug mode
Limitations:
- English only
- Censoring removes diacritics (accents)
- Doesn't understand context
- Cannot add words at runtime
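To make two of the evasion cases listed above concrete (repeated characters and spacing), here is a minimal sketch of the kind of normalization a filter could apply before matching. It is an illustration only, not rustrict's actual implementation; the `normalize` function is a hypothetical helper that uses nothing but the standard library.

```rust
/// Hypothetical helper (not part of rustrict): lowercases the input, drops
/// separator characters, and collapses runs of the same character into one.
fn normalize(input: &str) -> String {
    let mut out = String::new();
    let mut last: Option<char> = None;
    for c in input.chars().flat_map(char::to_lowercase) {
        // Skip spaces, underscores, hyphens, and other non-alphanumeric
        // characters that can be used to split a word apart.
        if !c.is_alphanumeric() {
            continue;
        }
        // Collapse repeated characters, so "craaaap" becomes "crap".
        if last == Some(c) {
            continue;
        }
        out.push(c);
        last = Some(c);
    }
    out
}
```

With this sketch, both `normalize("craaaap")` and `normalize("c r_a-p")` return "crap". The approach is deliberately lossy: legitimate double letters collapse as well (`normalize("hello")` returns "helo"), and accents and confusable characters are not handled at all, so a real filter needs considerably more care than this.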
use rustrict::CensorStr;
let censored: String = "hello crap".censor();
let inappropriate: bool = "f u c k".is_inappropriate();
assert_eq!(censored, "hello c***");
assert!(inappropriate);
use rustrict::CensorIter;
let censored: String = "hello crap".chars().censor().collect();
assert_eq!(censored, "hello c***");
By constructing a Censor, one can avoid scanning text multiple times to get a censored String and/or answer multiple is queries. This also opens up more customization options (defaults are below).
use rustrict::{Censor, Type};
let (censored, analysis) = Censor::from_str("123 Crap")
.with_censor_threshold(Type::INAPPROPRIATE)
.with_censor_first_character_threshold(Type::OFFENSIVE & Type::SEVERE)
.with_ignore_false_positives(false)
.with_ignore_self_censoring(false)
.with_censor_replacement('*')
.censor_and_analyze();
assert_eq!(censored, "123 C***");
assert!(analysis.is(Type::INAPPROPRIATE));
assert!(analysis.isnt(Type::PROFANE & Type::SEVERE | Type::SEXUAL));
To compare filters, the first 100,000 items of this list are used as a dataset. Positive accuracy is the percentage of profanity detected as profanity. Negative accuracy is the percentage of clean text detected as clean.
| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |
|---|---|---|---|---|
| rustrict | 90.56% | 91.41% | 90.35% | 7s |
| censor | 76.16% | 72.76% | 77.01% | 23s |
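The three accuracy columns follow directly from the definitions above. The sketch below shows one way to compute them from labeled samples; the `Tally` type and its methods are hypothetical names used for illustration and are not taken from rustrict's benchmark code.

```rust
/// Hypothetical tally of filter decisions against labeled samples.
#[derive(Debug, Default, Clone, Copy)]
struct Tally {
    true_positives: u32,  // profanity detected as profanity
    false_negatives: u32, // profanity detected as clean
    true_negatives: u32,  // clean text detected as clean
    false_positives: u32, // clean text detected as profanity
}

impl Tally {
    /// Records one sample: its label and what the filter decided.
    fn record(&mut self, is_profane: bool, flagged: bool) {
        match (is_profane, flagged) {
            (true, true) => self.true_positives += 1,
            (true, false) => self.false_negatives += 1,
            (false, false) => self.true_negatives += 1,
            (false, true) => self.false_positives += 1,
        }
    }

    /// Fraction of profane samples that were flagged.
    fn positive_accuracy(&self) -> f64 {
        ratio(self.true_positives, self.true_positives + self.false_negatives)
    }

    /// Fraction of clean samples that were left alone.
    fn negative_accuracy(&self) -> f64 {
        ratio(self.true_negatives, self.true_negatives + self.false_positives)
    }

    /// Fraction of all samples that were classified correctly.
    fn accuracy(&self) -> f64 {
        let correct = self.true_positives + self.true_negatives;
        let total = correct + self.false_negatives + self.false_positives;
        ratio(correct, total)
    }
}

/// Divides two counts as floating point; yields NaN when `den` is zero.
fn ratio(num: u32, den: u32) -> f64 {
    f64::from(num) / f64::from(den)
}
```

Overall accuracy is a weighted average of the positive and negative accuracy, weighted by how many profane and clean samples the dataset contains, which is why it always falls between the two.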
If you make an adjustment that would affect false positives, you will need to run false_positive_finder:
- Run ./download.sh to get the required word lists.
- Run cargo run --bin false_positive_finder --release --all-features
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.