Releases: segment-any-text/wtpsplit
Release 2.2.1
What's Changed
- Add compatibility with hf-hub >1 by @markus583 in #176
- Add long description to pypi info
- [DEV] Update release.sh for toolchain compat
Full Changelog: 2.2.0...2.2.1
Release 2.2.0
What's Changed
New Features
- Length-constrained segmentation (#164 by @harikesavan): Control segment lengths with min_length and max_length parameters. Uses Viterbi (optimal) or greedy algorithms with configurable priors (uniform, gaussian, lognormal, clipped_polynomial) and language-aware defaults. Useful for embedding pipelines, storage limits, or any downstream task requiring fixed-size chunks. See docs/LENGTH_CONSTRAINTS.md.
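Length-constrained segmentation can be pictured as a dynamic program over candidate boundaries. Below is a minimal Viterbi-style sketch that assumes the model supplies an independent per-character boundary probability. Each chosen boundary scores its log-odds, the end of the text is a forced boundary, and a length prior adds a per-segment score. `viterbi_segment`, `uniform_log_prior` and `gaussian_log_prior` are hypothetical names; this is not wtpsplit's implementation, and its actual objective, prior shapes and parameters may differ (see docs/LENGTH_CONSTRAINTS.md).

```python
import math
from typing import Callable, List, Sequence


def uniform_log_prior(length: int) -> float:
    # Hypothetical uniform prior: every admissible length scores the same.
    return 0.0


def gaussian_log_prior(target: float, spread: float) -> Callable[[int], float]:
    # Hypothetical Gaussian prior centred on a target length (unnormalised log-density).
    return lambda length: -((length - target) ** 2) / (2 * spread ** 2)


def viterbi_segment(probs: Sequence[float], min_len: int, max_len: int,
                    log_prior: Callable[[int], float] = uniform_log_prior) -> List[int]:
    """Return exclusive end offsets of the best segmentation.

    probs[k] is the assumed probability that a segment ends after character k.
    Every segment length must lie in [min_len, max_len].
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("need 1 <= min_len <= max_len")
    n = len(probs)
    eps = 1e-9
    best = [-math.inf] * (n + 1)  # best[j]: best score for text[:j] ending on a boundary
    back = [-1] * (n + 1)         # back[j]: start offset of the last segment in that solution
    best[0] = 0.0
    for j in range(1, n + 1):
        p = min(max(probs[j - 1], eps), 1 - eps)
        # The final boundary is forced, so it carries no log-odds score.
        gain = 0.0 if j == n else math.log(p) - math.log(1 - p)
        for i in range(max(0, j - max_len), j - min_len + 1):
            if best[i] == -math.inf:
                continue
            score = best[i] + gain + log_prior(j - i)
            if score > best[j]:
                best[j], back[j] = score, i
    if best[n] == -math.inf:
        raise ValueError("no segmentation satisfies the length constraints")
    ends, j = [], n
    while j > 0:  # walk the back-pointers from the end of the text
        ends.append(j)
        j = back[j]
    return ends[::-1]


probs = [0.1, 0.1, 0.9, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]
print(viterbi_segment(probs, min_len=1, max_len=10))  # [3, 6, 10]
print(viterbi_segment(probs, min_len=4, max_len=10))  # [6, 10]
```

With the uniform prior, the optimum keeps both high-probability boundaries. Raising min_len to 4 rules out the boundary after the third character. Passing gaussian_log_prior(5, 1.0) instead pulls segment lengths toward 5 and also yields [6, 10] on this input. The search is exact and runs in O(n · max_len) time. A greedy pass would be cheaper but is not guaranteed to find the optimum.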
Bug Fixes & Improvements
- Transformers ≥5 compatibility (#172 by @markus583): Full support for transformers v5 while remaining backward-compatible with v4. Also removes the adapters library as a hard inference dependency; LoRA weights can now be merged without it installed.
- Auto-detect num_labels for LoRA on sm models (#170 by @markus583): Fixes loading LoRA adapters trained with num_labels > 1 onto -sm models, which previously caused a shape mismatch error (#168).
Other
- Minimum supported Python version: 3.9
- Python 3.13 added to CI matrix
- CI now runs new length-constrained segmentation tests
New Contributors
- @harikesavan made their first contribution in #164
Full Changelog: 2.1.7...2.2.0
Release 2.1.7
- Suppress noisy warnings from upstream dependencies in some Python versions
- Add an option to skip merging LoRA weights (merging remains the default for efficiency reasons)
Full Changelog: 2.1.6...2.1.7
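Merging is the more efficient default because, in the standard LoRA formulation, the low-rank update is folded into the base weight once (W' = W + (alpha / r) * B @ A). Inference then needs only one matrix product per layer. Leaving the weights unmerged keeps W and the adapter separate and adds the low-rank path on every forward pass. The sketch below checks that both paths agree on a toy example. It uses plain Python lists, its function names are hypothetical, and it is not wtpsplit's merging code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product a @ b.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m: Matrix, x: Vector) -> Vector:
    # Matrix-vector product m @ x.
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Fold the adapter into the base weight: W' = W + (alpha / rank) * B @ A.

    Shapes: W is (d_out, d_in), B is (d_out, rank), A is (rank, d_in).
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def forward_unmerged(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int,
                     x: Vector) -> Vector:
    # Unmerged path: y = W x + (alpha / rank) * B (A x), recomputed on every call.
    scale = alpha / rank
    low_rank = matvec(b, matvec(a, x))
    return [y + scale * z for y, z in zip(matvec(w, x), low_rank)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [1.0]]
x = [1.0, 1.0, 1.0]
print(matvec(merge_lora(W, A, B, alpha=2.0, rank=1), x))  # [7.0, 13.0]
print(forward_unmerged(W, A, B, 2.0, 1, x))               # [7.0, 13.0]
```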
Release 2.1.6
Release 2.1.5
Changelog
Release 2.1.4
- Introduce optional hat weighting by @lsorber
- Clarify LoRA adaptation
- Clarify treat_newline_as_space: renamed to split_on_input_newlines. The old name treat_newline_as_space will be deprecated in a future release.
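One way to read "hat weighting" is as a triangular ("hat") weighting of scores from overlapping sliding windows. Under this reading, positions near a window's centre, which see more context, count more than positions near its edges when overlapping predictions are averaged. That interpretation is an assumption. The sketch below is pure Python, `hat_weights` and `merge_windows` are hypothetical names rather than wtpsplit's API, and the exact weight profile wtpsplit uses may differ.

```python
from typing import List, Sequence


def hat_weights(size: int) -> List[float]:
    """Triangular ("hat") weights, largest at the window centre.

    Edge weights stay strictly positive so a position covered only by a
    window edge still gets a well-defined average.
    """
    centre = (size - 1) / 2
    return [1.0 - abs(k - centre) / (centre + 1) for k in range(size)]


def merge_windows(window_scores: Sequence[Sequence[float]], starts: Sequence[int],
                  total_len: int, weighting: str = "hat") -> List[float]:
    """Average per-position scores from overlapping windows into one sequence."""
    if weighting not in ("hat", "uniform"):
        raise ValueError(f"unknown weighting: {weighting}")
    acc = [0.0] * total_len   # weighted sum of scores per position
    norm = [0.0] * total_len  # sum of weights per position
    for scores, start in zip(window_scores, starts):
        weights = hat_weights(len(scores)) if weighting == "hat" else [1.0] * len(scores)
        for k, (value, w) in enumerate(zip(scores, weights)):
            acc[start + k] += w * value
            norm[start + k] += w
    # Positions that no window covers fall back to 0.0.
    return [a / n if n > 0 else 0.0 for a, n in zip(acc, norm)]


print(merge_windows([[1.0] * 4, [3.0] * 4], starts=[0, 2], total_len=6))
```

With uniform weights, the two overlapping positions in this example both average to 2.0. With hat weights, each overlapping position leans toward the window whose centre is closer to it.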
Release 2.1.2
- Fixes #142: AssertionError when a string consists only of newlines or whitespace, or is empty.
Release 2.1.1
- Change default behaviour for newlines in SaT.split.
- Newlines are still ignored by the model, but they are now used to split the text as a simple post-processing step.
- Small bugfixes for LoRA training
- Update README for advanced usage
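The newline behaviour introduced in 2.1.1 keeps the model blind to newlines and then applies them as a separate split step. A minimal sketch of such a post-processing step might look like the following. It assumes each model-produced segment is simply cut again at every newline and that empty or whitespace-only pieces are dropped. `postprocess_newlines` is a hypothetical name, and wtpsplit's handling of empty lines and trailing whitespace may differ.

```python
from typing import List


def postprocess_newlines(segments: List[str]) -> List[str]:
    """Split each model-produced segment again at the newlines of the input text."""
    result: List[str] = []
    for segment in segments:
        for piece in segment.split("\n"):
            # Assumption: pieces that are empty or whitespace-only are dropped.
            if piece.strip():
                result.append(piece)
    return result


print(postprocess_newlines(["Hello world.\nSecond line.", "Third sentence."]))
# ['Hello world.', 'Second line.', 'Third sentence.']
```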
Release 2.1.0
- Adds ONNX support for SaT models.
- Includes export scripts and an updated README.
- This improves inference time on GPU by 50%.
Release 2.0.8
- Fix splitting of short sequences into individual characters (#127)