test: Add missing test coverage for chunking module edge cases #430

@vedssharma

Description

Problem

tests/chunking_test.py has no coverage for several code paths in chunking.py:

  • Guard clauses in SentenceIterator.__init__ (IndexError for negative or out-of-range curr_token_pos) are never exercised
  • create_token_interval, get_token_interval_text, and get_char_interval all have ValueError / TokenUtilError error paths with zero tests
  • ChunkIterator constructor edge cases (both text and document being None, text=None falling back to document.text, empty TokenizedText triggering re-tokenization) are untested
  • TextChunk.chunk_text and TextChunk.char_interval raise ValueError when document is None — untested
  • TextChunk.sanitized_chunk_text is entirely untested
  • The lazy caching of _chunk_text and _char_interval is never verified
  • make_batches_of_textchunk is only tested with one specific batch size
  • The broken_sentence flag reset — which controls whether subsequent sentences are merged after a mid-sentence chunk break — has no dedicated test
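As a rough illustration of the error-path and lazy-caching checks listed above, here is a minimal sketch. It does not import the real module: `FakeChunk` and its `compute_calls` counter are hypothetical stand-ins that only mimic the behavior described in this issue (raise ValueError when there is no document, compute the text once, then reuse the cached value). The real tests would target TextChunk directly.

```python
import unittest


class FakeChunk:
    """Hypothetical stand-in for TextChunk; not the real class from chunking.py."""

    def __init__(self, document_text=None):
        self.document_text = document_text
        self._chunk_text = None  # cache, filled on first successful access
        self.compute_calls = 0   # counts how often the text is actually computed

    @property
    def chunk_text(self):
        if self.document_text is None:
            raise ValueError("document is None")
        if self._chunk_text is None:
            self.compute_calls += 1
            self._chunk_text = self.document_text.strip()
        return self._chunk_text


class LazyCachingTest(unittest.TestCase):

    def test_raises_when_document_is_none(self):
        with self.assertRaises(ValueError):
            _ = FakeChunk().chunk_text

    def test_second_access_uses_cache(self):
        chunk = FakeChunk("  hello  ")
        first = chunk.chunk_text
        second = chunk.chunk_text
        self.assertEqual(first, "hello")
        # Same object on both reads, and the computation ran only once.
        self.assertIs(first, second)
        self.assertEqual(chunk.compute_calls, 1)
```

Against the real class, the same idea could be checked by asserting that the private cache attribute is unset before the first access and populated afterwards, assuming the attributes behave as the issue describes.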

Proposed fix

Add tests covering all of the above in tests/chunking_test.py, organized into focused test classes per concern.
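One possible shape for a focused test class, shown here for the batch-size concern only. This is a sketch under assumptions: `make_batches` is a hypothetical stand-in written for this example, not the real make_batches_of_textchunk, and the repository's tests may use a different test framework or parameterization helper than plain unittest.

```python
import unittest


def make_batches(items, batch_size):
    """Hypothetical stand-in for make_batches_of_textchunk; yields lists of items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch


class BatchingTest(unittest.TestCase):

    def test_various_batch_sizes(self):
        items = list(range(5))
        # batch_size -> expected batches, including sizes that divide
        # the input unevenly and sizes larger than the input.
        cases = {
            1: [[0], [1], [2], [3], [4]],
            2: [[0, 1], [2, 3], [4]],
            5: [[0, 1, 2, 3, 4]],
            10: [[0, 1, 2, 3, 4]],
        }
        for batch_size, expected in cases.items():
            with self.subTest(batch_size=batch_size):
                self.assertEqual(list(make_batches(items, batch_size)), expected)

    def test_empty_input_yields_no_batches(self):
        self.assertEqual(list(make_batches([], 3)), [])
```

Keeping one class per concern (guard clauses, constructor edge cases, caching, batching) means a failure points directly at the code path that regressed.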
