test: Add missing test coverage for chunking module edge cases #430

Problem

  tests/chunking_test.py has no coverage for several code paths in chunking.py:

  - Guard clauses in SentenceIterator.__init__ (IndexError for negative or out-of-range curr_token_pos) are never exercised
  - create_token_interval, get_token_interval_text, and get_char_interval all have ValueError / TokenUtilError error paths with zero tests
  - ChunkIterator constructor edge cases (both text and document being None, text=None falling back to document.text, empty TokenizedText triggering re-tokenization) are untested
  - TextChunk.chunk_text and TextChunk.char_interval raise ValueError when document is None, but this path is untested
  - TextChunk.sanitized_chunk_text is entirely untested
  - The lazy caching of _chunk_text and _char_interval is never verified
  - make_batches_of_textchunk is only tested with one specific batch size
  - The broken_sentence flag reset, which controls whether subsequent sentences are merged after a mid-sentence chunk break, has no dedicated test

  Proposed fix

  Add tests covering all of the above in tests/chunking_test.py, organized into focused test classes per concern.
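  One possible layout for the "focused test classes per concern" idea, again written against hypothetical stand-ins (`_StubSentenceIterator`, `make_batches`) rather than the real chunking.py API: one class for constructor guard clauses, and one that runs a batching helper over several batch sizes with `subTest` so each size is reported separately on failure. The valid range assumed for `curr_token_pos` (0 up to the token count) is a guess; the test only uses values that are clearly out of range.

```python
import itertools
import unittest


class _StubSentenceIterator:
    """Hypothetical stand-in: rejects a starting token position outside
    [0, len(tokens)], mirroring the guard clauses described in the issue."""

    def __init__(self, tokens, curr_token_pos=0):
        if curr_token_pos < 0:
            raise IndexError(f"curr_token_pos {curr_token_pos} is negative")
        if curr_token_pos > len(tokens):
            raise IndexError(f"curr_token_pos {curr_token_pos} is past the end")
        self.tokens = tokens
        self.curr_token_pos = curr_token_pos


def make_batches(items, batch_length):
    """Hypothetical stand-in for a batching helper: yields lists of at most
    batch_length consecutive items, with a shorter final batch if needed."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, batch_length))
        if not batch:
            return
        yield batch


class SentenceIteratorGuardTest(unittest.TestCase):
    """Concern: constructor guard clauses."""

    def test_out_of_range_positions_raise_index_error(self):
        tokens = ["Hello", "world", "."]
        for pos in (-1, len(tokens) + 5):
            with self.subTest(curr_token_pos=pos):
                with self.assertRaises(IndexError):
                    _StubSentenceIterator(tokens, curr_token_pos=pos)

    def test_valid_position_is_accepted(self):
        it = _StubSentenceIterator(["Hello", "world", "."], curr_token_pos=1)
        self.assertEqual(it.curr_token_pos, 1)


class MakeBatchesTest(unittest.TestCase):
    """Concern: batch sizing."""

    def test_batch_sizes(self):
        items = list(range(5))
        # (batch_length, expected batch sizes): a size of 1, an uneven split,
        # an exact split, and a size larger than the input.
        cases = [
            (1, [1, 1, 1, 1, 1]),
            (2, [2, 2, 1]),
            (5, [5]),
            (10, [5]),
        ]
        for batch_length, expected_sizes in cases:
            with self.subTest(batch_length=batch_length):
                batches = list(make_batches(items, batch_length))
                self.assertEqual([len(b) for b in batches], expected_sizes)
                # Flattening the batches must reproduce the input in order.
                flat = list(itertools.chain.from_iterable(batches))
                self.assertEqual(flat, items)

    def test_empty_input_yields_no_batches(self):
        self.assertEqual(list(make_batches([], 3)), [])
```

  Run it with `python -m unittest` from the tests directory. Parametrizing through `subTest` keeps a single test method per concern while still naming the failing batch size in the report.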
