
Bug: progress_bar resource leak — not closed on exception in save_annotated_documents and download_text_from_url #401

@marlonbarreto-git

Description


Problem

Two functions in langextract/io.py create tqdm progress bars but never close them if an exception is raised mid-iteration, leaving the terminal in a corrupted state and leaking resources.

Location 1: save_annotated_documents (lines ~113-134)

progress_bar = progress.create_save_progress_bar(
    output_path=str(output_file), disable=not show_progress
)

with open(output_file, 'w', encoding='utf-8') as f:
    for adoc in annotated_documents:   # <-- generator can raise here
        ...
        progress_bar.update(1)

progress_bar.close()   # <-- unreachable if the for-loop raises

If annotated_documents is a generator that raises (e.g., InferenceOutputError from the LLM call), the with open(...) context manager correctly closes the file, but progress_bar.close() is never called because it sits unconditionally after the with block with no try/finally.
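The failure mode is easy to reproduce in isolation. The sketch below uses a hypothetical StubBar in place of the real tqdm bar (which comes from progress.create_save_progress_bar) and a generator that raises after its first item, mirroring an InferenceOutputError mid-save:

```python
# StubBar is a stand-in for the tqdm progress bar, so we can observe close().
class StubBar:
    def __init__(self):
        self.closed = False

    def update(self, n):
        pass

    def close(self):
        self.closed = True


def failing_docs():
    # Simulates a generator of annotated documents whose LLM call fails
    # partway through (e.g., InferenceOutputError).
    yield "doc-1"
    raise RuntimeError("simulated InferenceOutputError")


bar = StubBar()
try:
    for adoc in failing_docs():
        bar.update(1)
    bar.close()  # never reached: the loop raises first
except RuntimeError:
    pass

print(bar.closed)  # False — the bar leaks, exactly as in save_annotated_documents
```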

Location 2: download_text_from_url (lines ~295-345)

if show_progress and total_size > 0:
    progress_bar = progress.create_download_progress_bar(...)

    for chunk in response.iter_content(chunk_size=chunk_size):  # <-- network error here
        if chunk:
            chunks.append(chunk)
            progress_bar.update(len(chunk))

    progress_bar.close()   # <-- unreachable on network exception

except requests.RequestException as e:
    raise requests.RequestException(...) from e   # <-- progress_bar still open

If response.iter_content() raises a requests.RequestException (connection reset mid-download), the except block re-raises without calling progress_bar.close().

Impact

  • Terminal state corruption: tqdm modifies terminal cursor state and escape sequences; without .close(), the terminal may be left in a broken state (no newline, cursor hidden, partial progress line)
  • Resource leak: tqdm holds file descriptor references and I/O streams that are not released until garbage collection
  • Common scenario: Any interrupted LLM extraction pipeline or flaky network download will trigger this

Proposed Fix

Wrap both in try/finally:

# save_annotated_documents
progress_bar = progress.create_save_progress_bar(...)
try:
    with open(output_file, 'w', encoding='utf-8') as f:
        for adoc in annotated_documents:
            ...
            progress_bar.update(1)
finally:
    progress_bar.close()

# download_text_from_url
progress_bar = progress.create_download_progress_bar(...)
try:
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:
            chunks.append(chunk)
            progress_bar.update(len(chunk))
finally:
    progress_bar.close()
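An equivalent option is contextlib.closing, which expresses the same guarantee as a with block (any object with a close() method qualifies, including tqdm instances). A minimal sketch, again using a hypothetical StubBar and a chunk iterator that fails mid-download:

```python
import contextlib


# StubBar stands in for the tqdm download progress bar.
class StubBar:
    def __init__(self):
        self.closed = False

    def update(self, n):
        pass

    def close(self):
        self.closed = True


def failing_chunks():
    # Simulates response.iter_content() dying mid-download.
    yield b"data"
    raise OSError("connection reset mid-download")


bar = StubBar()
try:
    with contextlib.closing(bar):  # close() runs on every exit path
        for chunk in failing_chunks():
            bar.update(len(chunk))
except OSError:
    pass

print(bar.closed)  # True — closed despite the exception
```

Either form works; try/finally keeps the diff minimal, while contextlib.closing reads more declaratively.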

Happy to submit a PR for both fixes.
