
Fix review issues: async safety, schemas, model relations, and storage #2

Merged
Iventyk merged 1 commit into develop from codex/review-project-for-necessary-fixes on Apr 27, 2026

Iventyk commented on Apr 27, 2026

Motivation

  • Address several review findings: a missing bidirectional relationship between Document and ChunkEmbedding, duplicated schema types and fields, and fragile filename parsing.
  • Eliminate synchronous calls that block the event loop, and unsafe global state that can introduce race conditions in Celery workers.
  • Make file storage scalable for large uploads and ensure API serialization/validation behaves predictably for aliased fields.

Description

  • Added a bidirectional SQLAlchemy relationship between Document and ChunkEmbedding and removed usage of the duplicated source_document column, so chunk rows now read chunk.document.name instead of a separate string field.
  • Consolidated duplicate error response schemas into a single ErrorResponse and re-exported it for ask schema compatibility.
  • Enabled Pydantic population by name on DocumentListItem (populate_by_name=True), so the model accepts either the chunks alias or the chunks_count field name when it is constructed or validated.
  • Replaced manual per-endpoint service construction with FastAPI dependency providers, so routers declare Depends(get_document_service) and Depends(get_question_answer_service) instead of calling Service(db) directly.
  • Hardened document upload: the filename suffix is now parsed with Path(...).suffix, process_document_task.delay(...) is wrapped in enqueue error handling, and a document whose task fails to enqueue is marked failed with an error_message.
  • Removed synchronous calls that blocked the event loop by offloading embedding, LLM, loader, and splitter calls to asyncio.to_thread(...), and implemented an async _agenerate in the demo EchoChatModel.
  • Rewrote FileStorageService.save to stream uploads to disk in 1 MB chunks, with the writes offloaded to a thread, so entire uploads are no longer read into memory; remove() is now asynchronous as well.
  • Guarded the Celery worker's global event loop (_worker_loop) with a threading.Lock to prevent race conditions during its initialization and shutdown.
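The upload-hardening bullet above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: Document, ALLOWED_SUFFIXES, parse_suffix, submit_upload, and the enqueue callable (standing in for process_document_task.delay) are hypothetical names. It shows why Path(...).suffix is safer than splitting on dots, and how a failed enqueue can mark the document failed with an error_message.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

# Hypothetical set of accepted extensions; the PR does not list the real one.
ALLOWED_SUFFIXES = {".pdf", ".txt", ".md"}


@dataclass
class Document:
    # Hypothetical stand-in for the project's Document model.
    id: int
    name: str
    status: str = "pending"
    error_message: Optional[str] = None


def parse_suffix(filename: str) -> str:
    # Path(...).suffix returns only the last extension (".gz" for "a.tar.gz")
    # and "" when there is none, unlike a naive filename.split(".")[1].
    return Path(filename).suffix.lower()


def submit_upload(doc: Document, enqueue: Callable[[int], object]) -> Document:
    suffix = parse_suffix(doc.name)
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '<none>'}")
    try:
        # In the project this would be process_document_task.delay(doc.id).
        enqueue(doc.id)
    except Exception as exc:  # e.g. the message broker is unreachable
        doc.status = "failed"
        doc.error_message = f"failed to enqueue processing: {exc}"
    return doc


if __name__ == "__main__":
    def broker_down(doc_id: int) -> None:
        raise ConnectionError("broker unreachable")

    doc = submit_upload(Document(1, "report.v2.PDF"), broker_down)
    print(doc.status, doc.error_message)
    # failed failed to enqueue processing: broker unreachable
```

The sketch returns the failed document rather than re-raising; the PR does not say whether the real endpoint also returns an HTTP error in that case.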
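The event-loop bullet above relies on asyncio.to_thread (Python 3.9+). The sketch below assumes a hypothetical blocking function, embed_texts_blocking, in place of the real embedding, LLM, loader, or splitter call. It offloads that call to a worker thread and runs a heartbeat coroutine alongside it; the heartbeat keeps ticking roughly every 50 ms because the loop is never blocked. It does not reproduce the EchoChatModel._agenerate change.

```python
import asyncio
import time
from typing import List, Tuple


def embed_texts_blocking(texts: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a synchronous embedding/LLM/loader/splitter call.
    time.sleep(0.2)  # simulates blocking work
    return [[float(len(t))] for t in texts]


async def embed_texts(texts: List[str]) -> List[List[float]]:
    # asyncio.to_thread runs the call in the default thread pool,
    # so the event loop stays free to run other coroutines meanwhile.
    return await asyncio.to_thread(embed_texts_blocking, texts)


async def heartbeat(ticks: List[float], interval: float = 0.05) -> None:
    # Records a timestamp every `interval` seconds; a gap much larger than
    # `interval` would mean something blocked the loop.
    for _ in range(4):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def main() -> Tuple[List[List[float]], List[float]]:
    ticks: List[float] = []
    # The heartbeat is scheduled first so it is already ticking when the embedding starts.
    _, vectors = await asyncio.gather(heartbeat(ticks), embed_texts(["a", "bb"]))
    return vectors, ticks


if __name__ == "__main__":
    vectors, ticks = asyncio.run(main())
    print(vectors)  # [[1.0], [2.0]]
```

Calling embed_texts_blocking directly inside embed_texts instead would stall the heartbeat for the full 200 ms sleep.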
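The storage bullet above can be illustrated with a stripped-down class. FileStorageSketch and FakeUpload are hypothetical: FakeUpload mimics an upload object with an awaitable read(size), as Starlette's UploadFile provides, and only the 1 MB chunk size and the async remove() come from the PR. At most one chunk is held in memory at a time, and each blocking file operation runs via asyncio.to_thread.

```python
import asyncio
import os
import tempfile
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size named in the PR


class FakeUpload:
    """Hypothetical async upload with an awaitable read(size), like Starlette's UploadFile."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self.largest_read = 0  # tracks the biggest chunk handed out

    async def read(self, size: int = -1) -> bytes:
        if size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        self.largest_read = max(self.largest_read, len(chunk))
        return chunk


class FileStorageSketch:
    """Hypothetical streaming save/remove; not the project's FileStorageService."""

    def __init__(self, root: Path) -> None:
        self.root = root

    async def save(self, upload: FakeUpload, filename: str) -> Path:
        # Keep only the final path component so "../x" cannot escape root.
        target = self.root / Path(filename).name
        fh = await asyncio.to_thread(open, target, "wb")  # open() can block
        try:
            while True:
                chunk = await upload.read(CHUNK_SIZE)  # at most 1 MB in memory
                if not chunk:
                    break
                await asyncio.to_thread(fh.write, chunk)
        finally:
            await asyncio.to_thread(fh.close)
        return target

    async def remove(self, path: Path) -> None:
        # Async like the PR's remove(); missing_ok makes a repeat call a no-op.
        await asyncio.to_thread(path.unlink, missing_ok=True)


async def demo() -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        storage = FileStorageSketch(Path(tmp))
        data = os.urandom(2 * CHUNK_SIZE + 123)
        path = await storage.save(FakeUpload(data), "big.bin")
        intact = path.read_bytes() == data
        await storage.remove(path)
        return intact and not path.exists()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # True
```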
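The event-loop lock bullet above can be sketched as a lock-guarded lazy initializer. Only the _worker_loop name comes from the PR; _worker_loop_lock, get_worker_loop, shutdown_worker_loop, and run_async are hypothetical. The lock makes the check-and-create and close-and-reset steps atomic, so concurrent callers always share one loop.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional

# Shared loop for one worker process; only this name comes from the PR.
_worker_loop: Optional[asyncio.AbstractEventLoop] = None
_worker_loop_lock = threading.Lock()  # hypothetical name


def get_worker_loop() -> asyncio.AbstractEventLoop:
    global _worker_loop
    with _worker_loop_lock:
        # Check-and-create happens atomically, so two threads cannot both
        # see None and each create their own loop.
        if _worker_loop is None or _worker_loop.is_closed():
            _worker_loop = asyncio.new_event_loop()
        return _worker_loop


def shutdown_worker_loop() -> None:
    global _worker_loop
    with _worker_loop_lock:
        if _worker_loop is not None and not _worker_loop.is_closed():
            _worker_loop.close()
        _worker_loop = None


def run_async(coro: Coroutine[Any, Any, Any]) -> Any:
    # Hypothetical helper for synchronous task code; assumes one task at a
    # time per process (e.g. Celery's default prefork pool).
    return get_worker_loop().run_until_complete(coro)


if __name__ == "__main__":
    async def add(a: int, b: int) -> int:
        return a + b

    print(run_async(add(2, 3)))  # 5
    shutdown_worker_loop()
```

The lock only protects initialization and shutdown. run_until_complete cannot be entered from two threads at once on the same loop, so the helper assumes one task at a time per process.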

Testing

  • Ran black on the modified files; it reformatted src/services/storage.py and completed successfully.
  • Verified bytecode compilation with python -m compileall src, which completed successfully.
  • Linted the code with flake8 src, which returned clean results after fixes.


Iventyk merged commit fb918b1 into develop on Apr 27, 2026
1 check passed
