rustrict

rustrict is a sophisticated profanity filter for Rust.

Features

Multiple types (profane, offensive, sexual, mean, spam)
Multiple levels (mild, moderate, severe)
Resistant to evasion
- Alternative spellings (like "fck")
- Repeated characters (like "craaaap")
- Confusable characters (like 'ᑭ' vs 'P')
- Spacing (like "c r_a-p")
- Accents (like "pÓöp")
- Self-censoring (like "f*ck")
- Battle-tested in Mk48.io
Resistant to false positives
- One word (like "assassin")
- Two words (like "push it")
Flexible
- Censor and/or analyze
- Input `&str` or `Iterator<Item = char>`
- Plenty of options
Performant
- O(n) analysis and censoring
- No regex (uses custom radix trie)
- 4 MB/s in release mode
- 150 KB/s in debug mode
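
To illustrate how trie-based matching can tolerate spacing and repeated characters without regex, here is a minimal sketch. It is not rustrict's actual implementation: `Node`, `Trie`, `normalize`, and `contains_match` are hypothetical names, a plain `HashMap` trie stands in for the custom radix trie, and false-positive handling is omitted entirely. Each start position walks at most one word length down the trie, so the scan is linear in the input for a bounded word length.

```rust
use std::collections::HashMap;

/// One node of a simple character trie (hypothetical; not rustrict's radix trie).
#[derive(Default)]
struct Node {
    children: HashMap<char, Node>,
    terminal: bool,
}

/// A word list stored as a trie.
#[derive(Default)]
struct Trie {
    root: Node,
}

/// Lowercases, drops separators (space, '_', '-'), and collapses runs of the
/// same character, so "c r_a-p" and "craaaap" both become "crap".
fn normalize(text: &str) -> Vec<char> {
    let mut out: Vec<char> = Vec::new();
    for c in text.chars().flat_map(char::to_lowercase) {
        if matches!(c, ' ' | '_' | '-') {
            continue;
        }
        if out.last() != Some(&c) {
            out.push(c);
        }
    }
    out
}

impl Trie {
    /// Inserts a word in normalized form so lookups compare like with like.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in normalize(word) {
            node = node.children.entry(c).or_default();
        }
        node.terminal = true;
    }

    /// Returns true if any inserted word occurs in the normalized text.
    fn contains_match(&self, text: &str) -> bool {
        let normalized = normalize(text);
        for start in 0..normalized.len() {
            let mut node = &self.root;
            for &c in &normalized[start..] {
                match node.children.get(&c) {
                    Some(next) => node = next,
                    None => break,
                }
                if node.terminal {
                    return true;
                }
            }
        }
        false
    }
}
```

After `trie.insert("crap")`, calling `trie.contains_match("c r_a-p")` returns `true`. Collapsing repeats also merges legitimate double letters, which is one reason a real filter needs separate false-positive handling.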

Limitations

English only
Censoring removes diacritics (accents)
Doesn't understand context
Cannot add words at runtime
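
The diacritics limitation is easier to see with a small sketch. Assuming the filter folds accented letters to their base letters before matching and then builds its output from the folded characters (an assumption about the mechanism, not confirmed by this README), the accents cannot be restored. `fold_diacritic` and `fold` are hypothetical helpers with a deliberately tiny table.

```rust
/// Maps a few precomposed accented Latin letters to their base letter.
/// Hypothetical table for illustration; a real filter would cover far more.
fn fold_diacritic(c: char) -> char {
    match c {
        'á' | 'à' | 'â' | 'ä' | 'ã' => 'a',
        'é' | 'è' | 'ê' | 'ë' => 'e',
        'í' | 'ì' | 'î' | 'ï' => 'i',
        'ó' | 'ò' | 'ô' | 'ö' | 'õ' => 'o',
        'ú' | 'ù' | 'û' | 'ü' => 'u',
        'Á' | 'À' | 'Â' | 'Ä' | 'Ã' => 'A',
        'É' | 'È' | 'Ê' | 'Ë' => 'E',
        'Í' | 'Ì' | 'Î' | 'Ï' => 'I',
        'Ó' | 'Ò' | 'Ô' | 'Ö' | 'Õ' => 'O',
        'Ú' | 'Ù' | 'Û' | 'Ü' => 'U',
        other => other,
    }
}

/// Folds every character of `text`; the original accents are lost.
fn fold(text: &str) -> String {
    text.chars().map(fold_diacritic).collect()
}
```

With this table, `fold("café")` yields `"cafe"`, so clean text that passes through such a step also comes out without its accents.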

Usage

Strings (`&str`)

use rustrict::CensorStr;

let censored: String = "hello crap".censor();
let inappropriate: bool = "f u c k".is_inappropriate();

assert_eq!(censored, "hello c***");
assert!(inappropriate);

Iterators (`Iterator<Item = char>`)

use rustrict::CensorIter;

let censored: String = "hello crap".chars().censor().collect();

assert_eq!(censored, "hello c***");

Advanced

By constructing a `Censor`, one can avoid scanning text multiple times to get a censored `String` and/or answer multiple `is` queries. This also opens up more customization options (the defaults are shown below).

use rustrict::{Censor, Type};

let (censored, analysis) = Censor::from_str("123 Crap")
    .with_censor_threshold(Type::INAPPROPRIATE)
    .with_censor_first_character_threshold(Type::OFFENSIVE & Type::SEVERE)
    .with_ignore_false_positives(false)
    .with_ignore_self_censoring(false)
    .with_censor_replacement('*')
    .censor_and_analyze();

assert_eq!(censored, "123 C***");
assert!(analysis.is(Type::INAPPROPRIATE));
assert!(analysis.isnt(Type::PROFANE & Type::SEVERE | Type::SEXUAL));
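
The `&` and `|` combinations above read more easily with a bitmask model in mind. The sketch below assumes a bitflag-style layout in which each category owns one bit per severity level. The constants and the free functions `is` and `isnt` are hypothetical stand-ins, not rustrict's definitions. Note that in Rust `&` binds tighter than `|`, so `PROFANE & SEVERE | SEXUAL` means "severe profanity, or anything sexual".

```rust
// Hypothetical layout: three bits per category (mild, moderate, severe).
const PROFANE: u32 = 0b000_000_111;
const OFFENSIVE: u32 = 0b000_111_000;
const SEXUAL: u32 = 0b111_000_000;

// Each severity mask selects the same bit position in every category.
const MILD: u32 = 0b001_001_001;
const MODERATE: u32 = 0b010_010_010;
const SEVERE: u32 = 0b100_100_100;

/// True if the analysis shares at least one bit with the threshold.
fn is(analysis: u32, threshold: u32) -> bool {
    (analysis & threshold) != 0
}

/// True if the analysis shares no bit with the threshold.
fn isnt(analysis: u32, threshold: u32) -> bool {
    !is(analysis, threshold)
}
```

Under this model, an analysis of `PROFANE & MODERATE` (an assumed result, chosen only for illustration) satisfies `is(analysis, PROFANE)` and `isnt(analysis, PROFANE & SEVERE | SEXUAL)`.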

Comparison

To compare filters, the first 100,000 items of this list are used as a dataset. Positive accuracy is the percentage of profanity detected as profanity. Negative accuracy is the percentage of clean text detected as clean.

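| Crate | Accuracy | Positive Accuracy | Negative Accuracy | Time |
|-------|----------|-------------------|-------------------|------|
| rustrict | 90.56% | 91.41% | 90.35% | 7s |
| censor | 76.16% | 72.76% | 77.01% | 23s |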
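
The two accuracy definitions above can be computed from labeled predictions as in the following sketch. The `Accuracy` struct and the `percent` and `accuracy` functions are hypothetical. That the overall figure is the share of all samples classified correctly is an assumption, since the README does not define it.

```rust
/// Accuracy figures, in percent, for a binary profanity classifier.
#[derive(Debug, PartialEq)]
struct Accuracy {
    overall: f64,
    positive: f64,
    negative: f64,
}

/// Percentage of `hits` out of `total`, or 0.0 when there are no samples.
fn percent(hits: u32, total: u32) -> f64 {
    if total == 0 {
        0.0
    } else {
        100.0 * hits as f64 / total as f64
    }
}

/// Each sample is `(labeled_profane, predicted_profane)`.
fn accuracy(samples: &[(bool, bool)]) -> Accuracy {
    let (mut true_pos, mut true_neg, mut pos, mut neg) = (0u32, 0u32, 0u32, 0u32);
    for &(label, predicted) in samples {
        if label {
            pos += 1;
            if predicted {
                true_pos += 1;
            }
        } else {
            neg += 1;
            if !predicted {
                true_neg += 1;
            }
        }
    }
    Accuracy {
        // Assumed definition: all correct predictions over all samples.
        overall: percent(true_pos + true_neg, pos + neg),
        // Profanity detected as profanity.
        positive: percent(true_pos, pos),
        // Clean text detected as clean.
        negative: percent(true_neg, neg),
    }
}
```

Under this definition the overall figure is a weighted average of the other two, so it always lies between the positive and negative accuracy.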

Development

If you make an adjustment that would affect false positives, you will need to run `false_positive_finder`:

1. Run `./download.sh` to get the required word lists.
2. Run `cargo run --bin false_positive_finder --release --all-features`

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
