Problem
tests/chunking_test.py has no coverage for several code paths in chunking.py:
- Guard clauses in SentenceIterator.__init__ (IndexError for negative or out-of-range curr_token_pos) are never exercised
- create_token_interval, get_token_interval_text, and get_char_interval all have ValueError / TokenUtilError error paths with zero tests
- ChunkIterator constructor edge cases (both text and document being None, text=None falling back to document.text, empty TokenizedText triggering re-tokenization) are untested
- TextChunk.chunk_text and TextChunk.char_interval raise ValueError when document is None — untested
- TextChunk.sanitized_chunk_text is entirely untested
- The lazy caching of _chunk_text and _char_interval is never verified
- make_batches_of_textchunk is only tested with one specific batch size
- The broken_sentence flag reset — which controls whether subsequent sentences are merged after a mid-sentence chunk break — has no dedicated test
Proposed fix
Add tests covering all of the above in tests/chunking_test.py, organized into focused test classes per concern.
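As a rough sketch of the "one focused test class per concern" layout, the example below covers two of the gaps listed above: the ValueError raised when document is None, and the lazy caching of _chunk_text. To stay self-contained it does not import chunking.py. FakeTextChunk and its _compute_chunk_text helper are hypothetical stand-ins invented for illustration; the real TextChunk constructor and internals may differ, so the actual tests would build real TextChunk objects instead.

```python
import unittest
from unittest import mock


class FakeTextChunk:
    """Hypothetical stand-in for TextChunk; NOT the real class in chunking.py."""

    def __init__(self, document_text=None):
        self._document_text = document_text
        self._chunk_text = None  # lazily populated cache

    def _compute_chunk_text(self):
        # Assumed helper; stands in for whatever the real class does to
        # derive the chunk text from its document.
        return self._document_text.strip()

    @property
    def chunk_text(self):
        if self._document_text is None:
            raise ValueError("document is None")
        if self._chunk_text is None:
            self._chunk_text = self._compute_chunk_text()
        return self._chunk_text


class TextChunkErrorPathTest(unittest.TestCase):
    """Concern: error paths when no document is attached."""

    def test_chunk_text_raises_without_document(self):
        with self.assertRaises(ValueError):
            _ = FakeTextChunk(None).chunk_text


class TextChunkCachingTest(unittest.TestCase):
    """Concern: the cached value is computed only once."""

    def test_chunk_text_computed_once(self):
        chunk = FakeTextChunk("  hello  ")
        with mock.patch.object(
            FakeTextChunk, "_compute_chunk_text", return_value="hello"
        ) as compute:
            self.assertEqual(chunk.chunk_text, "hello")
            self.assertEqual(chunk.chunk_text, "hello")
        # A second property access must hit the cache, not recompute.
        compute.assert_called_once()
```

Patching the compute step and asserting a single call is one way to verify caching without reaching into private attributes; an alternative is to check that two accesses return the same object with assertIs.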