Problem
Two functions in langextract/io.py create tqdm progress bars but fail to close them when an exception occurs mid-iteration, causing terminal state corruption and resource leaks.
Location 1: save_annotated_documents (lines ~113-134)
progress_bar = progress.create_save_progress_bar(
    output_path=str(output_file), disable=not show_progress
)
with open(output_file, 'w', encoding='utf-8') as f:
    for adoc in annotated_documents:  # <-- generator can raise here
        ...
        progress_bar.update(1)
progress_bar.close()  # <-- unreachable if the for-loop raises
If annotated_documents is a generator that raises (e.g., InferenceOutputError from the LLM call), the with open(...) context manager correctly closes the file, but progress_bar.close() is never reached: it sits after the with block and is not protected by try/finally.
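The control flow above can be reproduced without tqdm or an LLM call. The sketch below is only an illustration: FakeBar, failing_documents, and save_without_finally are hypothetical names, not part of langextract, and the bar is a stand-in that merely records whether close() ran.

```python
class FakeBar:
    """Hypothetical stand-in for a tqdm bar; records whether close() ran."""

    def __init__(self):
        self.count = 0
        self.closed = False

    def update(self, n=1):
        self.count += n

    def close(self):
        self.closed = True


def failing_documents():
    """Yields one item, then raises, like a generator backed by a failing LLM call."""
    yield "doc-1"
    raise RuntimeError("simulated inference failure")


def save_without_finally(documents, bar):
    """Mirrors the reported control flow: close() sits after the loop."""
    for _ in documents:
        bar.update(1)
    bar.close()  # skipped when the loop raises


leaky = FakeBar()
try:
    save_without_finally(failing_documents(), leaky)
except RuntimeError:
    pass

# One update was recorded before the failure, but close() never ran.
print(leaky.count, leaky.closed)  # 1 False
```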
Location 2: download_text_from_url (lines ~295-345)
    if show_progress and total_size > 0:
        progress_bar = progress.create_download_progress_bar(...)
        for chunk in response.iter_content(chunk_size=chunk_size):  # <-- network error here
            if chunk:
                chunks.append(chunk)
                progress_bar.update(len(chunk))
        progress_bar.close()  # <-- unreachable on network exception
except requests.RequestException as e:
    raise requests.RequestException(...) from e  # <-- progress_bar still open
If response.iter_content() raises a requests.RequestException (connection reset mid-download), the except block re-raises without calling progress_bar.close().
Impact
- Terminal state corruption: tqdm modifies terminal cursor state and escape sequences; without .close(), the terminal may be left in a broken state (no newline, cursor hidden, partial progress line)
- Resource leak: tqdm holds file descriptor references and I/O streams that are not released until garbage collection
- Common scenario: Any interrupted LLM extraction pipeline or flaky network download will trigger this
Proposed Fix
Wrap both in try/finally:
# save_annotated_documents
progress_bar = progress.create_save_progress_bar(...)
try:
    with open(output_file, 'w', encoding='utf-8') as f:
        for adoc in annotated_documents:
            ...
            progress_bar.update(1)
finally:
    progress_bar.close()

# download_text_from_url
progress_bar = progress.create_download_progress_bar(...)
try:
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:
            chunks.append(chunk)
            progress_bar.update(len(chunk))
finally:
    progress_bar.close()
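An equivalent way to get the same guarantee, sketched here as an option rather than as the intended patch, is contextlib.closing, which calls close() on exit from the with block. It also pairs with contextlib.nullcontext for the branch where no bar is created (show_progress off or unknown total size). FakeBar, download_chunks, and flaky_chunks are hypothetical names used only for this illustration; the real code would pass the object returned by progress.create_download_progress_bar(...).

```python
import contextlib


class FakeBar:
    """Hypothetical stand-in for a tqdm bar; records whether close() ran."""

    def __init__(self):
        self.count = 0
        self.closed = False

    def update(self, n=1):
        self.count += n

    def close(self):
        self.closed = True


def download_chunks(chunks, bar=None):
    """Collects non-empty chunks; the bar, if any, is closed even on error."""
    collected = []
    # closing() calls bar.close() on exit; nullcontext() covers the no-bar branch.
    manager = contextlib.closing(bar) if bar is not None else contextlib.nullcontext()
    with manager:
        for chunk in chunks:
            if chunk:
                collected.append(chunk)
                if bar is not None:
                    bar.update(len(chunk))
    return collected


def flaky_chunks():
    """Yields one chunk, then fails, like a connection reset mid-download."""
    yield b"abc"
    raise ConnectionError("simulated connection reset")


bar = FakeBar()
try:
    download_chunks(flaky_chunks(), bar)
except ConnectionError:
    pass

print(bar.count, bar.closed)  # 3 True
```

Because the with block exits before any enclosing except clause runs, the bar is already closed by the time an outer handler wraps and re-raises the error.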
Happy to submit a PR for both fixes.