blingfire pypi package v0.1.3
SergeiAlonichau released this 25 Jun 17:27
Four tokenization algorithms are now supported: patterns, word-piece, unigram LM, and BPE. Added a space normalization API, added a few more popular models, and added unigram LM tokenization models trained on ~84 uniformly represented languages from the WikiMatrix set. Bug fixes and parity fixes.
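For readers unfamiliar with the word-piece algorithm named above, here is a minimal sketch of the general greedy longest-match-first idea in plain Python. It is an illustration only, not BlingFire's implementation or API: the function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` marker, and the toy vocabulary are all assumptions chosen for the example.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into vocabulary pieces, greedy longest match first.

    Hypothetical helper for illustration; not part of the blingfire package.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, made up for this example.
toy_vocab = {"token", "##ization", "##ize", "un", "##known"}
print(wordpiece_tokenize("tokenization", toy_vocab))
```

With this toy vocabulary the word "tokenization" splits into "token" followed by "##ization", and a word with no matching pieces collapses to the single unknown marker.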