fix: alignment span slicing drops characters when gaps present by thatbudakguy · Pull Request #409 · direct-phonology/dphon

thatbudakguy · 2026-05-21T19:18:58Z

The span slicing in SmithWatermanAligner used len(cu)/len(cv), which
include gap characters, so the span extended to tokens beyond the
aligned region. This led to mismatched character display and phoneme
transcription.

Changes:

- align.py: count only non-gap elements when slicing spans
- align.py: skip gap chars in edge trimming to prevent over-trimming
- console.py: use alignment length for _mark_span loop range
- console.py: add bounds check before accessing span/other pointers
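
To illustrate the slicing problem described above, here is a minimal sketch and not the actual dphon code. It assumes the aligned row is a list of tokens in which a gap is marked by `"-"`. The helper `aligned_end`, the `GAP` constant, and the toy data are hypothetical names introduced only for this example.

```python
GAP = "-"  # assumed gap marker; the real aligner may use a different symbol


def aligned_end(start: int, aligned: list[str], gap: str = GAP) -> int:
    """Return the exclusive end index in the source sequence.

    Only non-gap elements consume a token from the source, so gaps
    must not be counted when computing how far the span extends.
    """
    consumed = sum(1 for tok in aligned if tok != gap)
    return start + consumed


source = list("abcdefgh")
start = 1
cu = ["b", "c", GAP, "d"]  # aligned row for `source`, containing one gap

# Counting gaps: the slice is one token too long and picks up "e".
buggy = source[start:start + len(cu)]

# Counting only non-gap elements: the slice matches the aligned region.
fixed = source[start:aligned_end(start, cu)]
```

With one gap in the aligned row, `buggy` runs one token past the aligned region while `fixed` covers exactly the tokens that took part in the alignment.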


GDRom and others added 3 commits May 21, 2026 12:18

Add a test to catch alignment issues

d66a3a9

Ensure global state isn't shared between tests

3477ee0

thatbudakguy merged commit 2d6dc52 into main May 30, 2026
3 checks passed

thatbudakguy deleted the alignment-fix branch May 30, 2026 19:06
