Feature/voice profile and refactors #58

Merged
ericksonlopes merged 5 commits into main from feature/voice-profile-and-refactors on Apr 7, 2026

@ericksonlopes

Owner

No description provided.

Copilot AI review requested due to automatic review settings April 7, 2026 18:23

Copilot AI left a comment

Contributor

Pull request overview

Adds UX + API support for “voice profile training in progress” while performing broad formatting/refactor cleanups across backend services, repositories, routers, and the test suite.

Changes:

  • Frontend: track per-speaker voice-training “processing” state and update UI labels/translations accordingly.
  • Backend: harden voice profile upload handling (filename sanitization) and minor API/service refactors.
  • Repo-wide refactor: extensive line-wrapping/formatting normalization across Python + tests + Alembic migrations.
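
To make the "filename sanitization" item concrete, here is a minimal Python sketch of one common approach. It is not the code from this PR (the actual change in voice_profile_management_router.py is not shown in this summary); the function name `sanitize_filename`, the allowed character set, and the `upload` fallback are assumptions made purely for illustration.

```python
import re
from pathlib import PureWindowsPath

# Assumption: only letters, digits, dot, underscore and hyphen are kept.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(raw: str, default: str = "upload") -> str:
    """Reduce a client-supplied filename to a safe basename (hypothetical helper)."""
    # PureWindowsPath treats both "/" and "\\" as separators, so taking .name
    # drops directory components written in either style.
    name = PureWindowsPath(raw).name
    # Replace every character outside the allowed set with an underscore.
    name = _UNSAFE_CHARS.sub("_", name)
    # Drop leading dots so the result is never "..", "." or a hidden file.
    name = name.lstrip(".")
    return name or default


if __name__ == "__main__":
    for candidate in ["../../etc/passwd", "C:\\Users\\a\\voice.wav", "my voice (1).wav", ".."]:
        print(repr(candidate), "->", sanitize_filename(candidate))
```

The idea is that the value joined onto a temp directory never contains path separators or parent-directory references, whatever the client sends.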

Reviewed changes

Copilot reviewed 156 out of 156 changed files in this pull request and generated 1 comment.

File Description
tests/presentation/api/test_dependencies.py Formatting-only refactor in tests (patch blocks).
tests/presentation/api/routes/test_subject_router.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_source_router.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_settings_router.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_ingest_router.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_ingest_router_file.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_chunk_router.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_auth_router.py Formatting-only refactor in tests.
tests/presentation/api/routes/test_audio_diarization_router.py Formatting-only refactor in tests.
tests/infrastructure/services/test_youtube_audio_downloader.py Formatting-only refactor in tests.
tests/infrastructure/services/test_whisperx_audio_diarizer.py Formatting-only refactor in tests.
tests/infrastructure/services/test_voice_profile_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_text_splitter_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_re_rank_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_pyannote_voice_recognizer.py Formatting-only refactor in tests.
tests/infrastructure/services/test_model_loader_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_knowledge_subject_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_ingestion_job_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_content_source_service.py Formatting-only refactor in tests.
tests/infrastructure/services/test_chunk_index_service.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_weaviate_vector.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_weaviate_chunk_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_chunk_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_chunk_repository_extended.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/qdrant/test_chunk_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/faiss/test_faiss_chunk_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/faiss/test_chunk_repository_extended.py Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/chroma/test_chroma_chunk_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_repos_services.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_knowledge_subject_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_ingestion_job_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_diarization_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_content_source_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_chunk_index_repository.py Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_additional_coverage.py Formatting-only refactor in tests.
tests/infrastructure/loggers/test_std_logger.py Formatting-only refactor in tests.
tests/infrastructure/extractors/test_youtube_extractor.py Formatting-only refactor in tests.
tests/infrastructure/extractors/test_docling_extractor.py Formatting-only refactor in tests.
tests/infrastructure/extractors/test_crawl4ai_extractor.py Formatting-only refactor in tests.
tests/config/test_settings.py Formatting-only refactor in tests.
tests/application/use_cases/test_youtube_throttling.py Formatting-only refactor in tests.
tests/application/use_cases/test_web_scraping_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_search_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_process_audio_diarization_pipeline.py Formatting-only refactor in tests.
tests/application/use_cases/test_knowledge_subject_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_diarization_ingestion_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_delete_diarization_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_content_source_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_auth_use_case.py Formatting-only refactor in tests.
tests/application/use_cases/test_audio_recognition_use_cases.py Formatting-only refactor in tests.
tests/application/test_workers.py Formatting-only refactor in tests.
tests/application/test_audio_diarization_workers.py Formatting-only refactor in tests.
src/presentation/api/routes/voice_profile_management_router.py Voice upload: filename sanitization + API signature change + event publish formatting.
src/presentation/api/routes/subject_router.py Logging call formatting only.
src/presentation/api/routes/settings_router.py Response construction formatting only.
src/presentation/api/routes/notification_router.py Executor call formatting only.
src/presentation/api/routes/job_router.py Job listing call/log formatting only.
src/presentation/api/routes/ingest_router.py Control-flow/log formatting only.
src/presentation/api/routes/chunk_router.py Chunk listing call formatting only.
src/presentation/api/routes/audio_diarization_and_recognition_router.py Dependency annotations + minor formatting.
src/infrastructure/utils/audio_utils.py Error string formatting only.
src/infrastructure/services/youtube_vector_service.py Signature/wrapping formatting only.
src/infrastructure/services/youtube_data_process_service.py Control-flow/list-comp formatting only.
src/infrastructure/services/whisperx_audio_diarizer.py Call-site formatting only.
src/infrastructure/services/voice_profile_service.py Formatting-only refactor.
src/infrastructure/services/text_splitter_service.py Formatting-only refactor.
src/infrastructure/services/task_queue_service.py Thread creation/log formatting only.
src/infrastructure/services/redis_task_queue_service.py Thread creation + cast formatting only.
src/infrastructure/services/re_rank_service.py List-comp formatting only.
src/infrastructure/services/pyannote_voice_recognizer.py Call formatting only.
src/infrastructure/services/model_loader_service.py Model load call formatting only.
src/infrastructure/services/knowledge_subject_service.py Formatting-only refactor.
src/infrastructure/services/ingestion_job_service.py Formatting-only refactor.
src/infrastructure/services/content_source_service.py Formatting-only refactor.
src/infrastructure/services/chunk_vector_service.py Log formatting only.
src/infrastructure/services/chunk_index_service.py Formatting-only refactor.
src/infrastructure/services/auth_service.py JWT decode formatting only.
src/infrastructure/repositories/vector/weaviate/weaviate_client.py Client creation/log formatting only.
src/infrastructure/repositories/vector/weaviate/chunk_repository.py Formatting-only refactor.
src/infrastructure/repositories/vector/qdrant/connector.py Warnings filter formatting only.
src/infrastructure/repositories/vector/qdrant/chunk_repository.py Formatting-only refactor.
src/infrastructure/repositories/vector/models/chunk_model.py Field declarations formatting only.
src/infrastructure/repositories/vector/chroma/chunk_repository.py Formatting-only refactor.
src/infrastructure/repositories/storage/storage.py Logging/delete formatting only.
src/infrastructure/repositories/sql/utils/utils.py Signature formatting only.
src/infrastructure/repositories/sql/models/voice_record.py Column declaration formatting only.
src/infrastructure/repositories/sql/models/user.py Column declaration formatting only.
src/infrastructure/repositories/sql/models/knowledge_subject.py Column/relationship formatting only.
src/infrastructure/repositories/sql/models/ingestion_job.py Column/relationship formatting only.
src/infrastructure/repositories/sql/models/diarization_record.py Column declaration formatting only.
src/infrastructure/repositories/sql/models/content_source.py Column declaration formatting only.
src/infrastructure/repositories/sql/models/chunk_index.py Column declaration formatting only.
src/infrastructure/repositories/sql/knowledge_subject_repository.py Logging formatting only.
src/infrastructure/repositories/sql/ingestion_job_repository.py Query/log formatting only.
src/infrastructure/repositories/sql/diarization_repository.py Query formatting only.
src/infrastructure/repositories/sql/content_source_repository.py Query/log formatting only.
src/infrastructure/repositories/sql/chunk_index_repository.py Query/log formatting only.
src/infrastructure/loggers/std_logger.py ContextVar + method signatures formatting only.
src/infrastructure/extractors/youtube_extractor.py Formatting-only refactor + proxy string wrap.
src/infrastructure/extractors/plain_text_extractor.py httpx client/log formatting only.
src/infrastructure/extractors/models/youtube_metadata_dto.py DTO field formatting only.
src/infrastructure/extractors/docling_extractor.py Stats-building formatting only.
src/infrastructure/extractors/crawl4ai_extractor.py Control-flow/list-comp formatting only.
src/domain/mappers/knowledge_subject_mapper.py List-comp formatting only.
src/domain/mappers/ingestion_job_mapper.py Cast/list-comp formatting only.
src/domain/mappers/content_source_mapper.py Cast formatting only.
src/domain/mappers/chunk_mapper.py Condition/wrapping formatting only.
src/domain/mappers/chunk_index_mapper.py Helper signature/wrapping formatting only.
src/domain/interfaces/repository/retriver_repository.py Abstract method signature formatting only.
src/domain/exception/youtube_exceptions.py String formatting only.
src/domain/entities/knowledge_subject_entity.py Field formatting only.
src/domain/entities/diarization.py Signature/list-comp formatting only.
src/domain/entities/content_source_entity.py Field formatting only.
src/domain/entities/chunk_entity.py Field formatting only.
src/config/validators.py Condition/signature formatting only.
src/application/use_cases/web_scraping_use_case.py UUID parse + call-site formatting only.
src/application/use_cases/search_use_case.py Logging/context formatting only.
src/application/use_cases/retrieve_processed_audio_history.py Signature formatting only.
src/application/use_cases/process_audio_diarization_pipeline.py Call-site formatting only.
src/application/use_cases/manage_voice_profiles.py Formatting-only refactor (includes voice train event publish).
src/application/use_cases/identify_speakers_in_processed_audio.py Path/build/log formatting only.
src/application/use_cases/file_ingestion_use_case.py Call-site/control-flow formatting only.
src/application/use_cases/diarization_ingestion_use_case.py Call-site formatting only.
src/application/use_cases/delete_diarization_use_case.py Logging formatting only.
src/application/use_cases/auth_use_case.py Signature formatting only.
main.py Router/task registration formatting only.
frontend/src/locales/pt-BR.json Add diarization.identification.processing_voice translation.
frontend/src/locales/en.json Add diarization.identification.processing_voice translation.
frontend/src/components/DiarizationView.tsx Handle voice-training completion events; avoid resetting UI step during identification/result; optimistic per-speaker “processing” state.
frontend/src/components/diarization/VoiceTrainingModal.tsx onTrained now passes trained name to parent.
frontend/src/components/diarization/types.ts Add optional trainingStatus to Speaker.
frontend/src/components/diarization/SpeakerIdentificationPanel.tsx Disable/label training button while a speaker is training; use new translation key.
alembic/versions/f120b614600a_add_diarizations_and_voices_tables.py Alembic formatting only.
alembic/versions/c48798b08031_add_voice_samples_table.py Alembic formatting only.
alembic/versions/c16fab000f02_add_user_table.py Alembic formatting only.
alembic/versions/bd01964d9b26_created_tables.py Alembic formatting only.
alembic/versions/946d88fe08b1_add_source_metadata_to_content_source.py Alembic formatting only.
alembic/versions/73f13c5ff10a_add_metadata_to_chunk_index.py Alembic formatting only.
alembic/versions/72f69987a221_rename_diarization_title_to_name.py Alembic formatting only.
alembic/versions/6e53bc32edfe_add_subject_id_to_diarization.py Alembic formatting only.
alembic/versions/5ff7984a3bcc_optimize_sql_models_indexes_and_audit.py Alembic formatting only.
alembic/versions/5736075a22d0_add_vector_store_type_to_job_and_chunk_.py Alembic formatting only.
alembic/versions/50420d500c2e_add_token_columns_to_content_source.py Alembic formatting only.
alembic/versions/4e8d4e04a288_add_external_source_to_ingestion_jobs.py Alembic formatting only.
alembic/versions/0ce7f69147eb_update_unique_constraint_on_content_.py Alembic formatting only.
alembic/versions/04e0f5f5f0af_add_status_message_and_error_message_to_.py Alembic formatting only.
Comments suppressed due to low confidence (2)

src/presentation/api/routes/voice_profile_management_router.py:89

  • The temp directory cleanup in the finally block can raise and mask the original error (e.g., os.rmdir(temp_dir) fails on Windows, or if the UploadFile handle is still open). Consider ensuring the upload file is closed and using a safer cleanup approach (e.g., shutil.rmtree(temp_dir, ignore_errors=True) and/or suppressing cleanup exceptions).

frontend/src/components/DiarizationView.tsx:111

  • The SSE handler calls loadJobs(true) twice for diarization events when activeJob matches (once unconditionally and again to find the updated job). This causes two API fetches per event; consider awaiting a single loadJobs(true) call and reusing the returned list for both refresh + active-job update.

        if (lastEvent.type === 'diarization') {
            loadJobs(true);

            // If the updated job is the active one, refresh its details
            if (activeJob && lastEvent.id === activeJob.id) {
                loadJobs(true).then(updatedJobs => {
                    const updated = updatedJobs.find(j => j.id === activeJob.id);
                    if (updated && updated.status !== activeJob.status) {
                        // If user is already in identification/result, don't reset the
                        // local speakers state on transient status changes (e.g. voice
                        // training flipping the job to TRAINING → COMPLETED). Only
                        // open/reset the job when we were still waiting on the initial
                        // diarization run.
                        const inIdentification = step === 'identification' || step === 'result';
                        if (!inIdentification && (updated.status === 'awaiting_verification' || updated.status === 'completed' || updated.status === 'failed')) {
                            handleOpenJob(updated);
                        } else {
                            setActiveJob(updated);
                        }
                    }
                });
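
For the first suppressed comment above (cleanup in the `finally` block of voice_profile_management_router.py), the suggested pattern can be sketched in Python as follows. This is an assumption-laden illustration, not the router's code: `run_with_temp_copy` and its `handler` callback are hypothetical names, and only the standard-library calls `tempfile.mkdtemp` and `shutil.rmtree(..., ignore_errors=True)` are taken as given.

```python
import os
import shutil
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_temp_copy(data: bytes, filename: str, handler: Callable[[str], T]) -> T:
    """Write `data` to a fresh temp directory, run `handler` on the path, always clean up.

    Hypothetical helper: `handler` stands in for whatever processes the upload.
    """
    temp_dir = tempfile.mkdtemp(prefix="voice_upload_")
    try:
        path = os.path.join(temp_dir, filename)
        # The `with` block closes the file handle before cleanup runs,
        # which matters on platforms that refuse to delete open files.
        with open(path, "wb") as fh:
            fh.write(data)
        return handler(path)
    finally:
        # ignore_errors=True means a failed cleanup cannot replace the
        # exception (if any) that is already propagating from the try block.
        shutil.rmtree(temp_dir, ignore_errors=True)


if __name__ == "__main__":
    size = run_with_temp_copy(b"abc", "sample.wav", os.path.getsize)
    print(size)
```

Compared with `os.rmdir`, `shutil.rmtree` also removes the directory when it still contains files, so the handler does not have to delete its own temp file first.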

Comment on lines 61 to 68
 async def upload_and_register_new_voice_profile(
     event_bus: Annotated[IEventBus, Depends(get_event_bus)],
-    use_case: Annotated[
-        RegisterNewVoiceProfileUseCase, Depends(get_register_voice_profile_use_case)
-    ],
+    use_case: Annotated[RegisterNewVoiceProfileUseCase, Depends(get_register_voice_profile_use_case)],
     name: str = Form(...),
-    file: UploadFile = File(...),
+    file: UploadFile | None = File(None),
 ):
-    if not file.filename:
+    if not file or not file.filename or not file.filename.strip():
         raise HTTPException(status_code=400, detail="No filename provided")

Copilot AI Apr 7, 2026

/upload declares file as optional (UploadFile | None = File(None)), but the endpoint’s behavior requires a file. This makes the OpenAPI schema and validation inconsistent (missing file becomes a custom 400 instead of FastAPI’s standard 422). Consider keeping it required (File(...)) and only validating filename emptiness explicitly if needed.
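
If the parameter stays required as suggested, the only check left for the handler is the empty-filename case. A minimal sketch of that check, written as a plain function so it runs without FastAPI installed; `require_filename` and `EmptyFilenameError` are hypothetical names, and the route wiring appears only as a comment.

```python
from typing import Optional


class EmptyFilenameError(ValueError):
    """Raised when an uploaded file arrives without a usable filename (hypothetical)."""


def require_filename(filename: Optional[str]) -> str:
    """Return the stripped filename, or raise if it is missing or blank."""
    cleaned = (filename or "").strip()
    if not cleaned:
        raise EmptyFilenameError("No filename provided")
    return cleaned


# Assumed wiring inside the route (not executed here):
#
#     file: UploadFile = File(...)          # required; a missing part yields 422
#     ...
#     try:
#         filename = require_filename(file.filename)
#     except EmptyFilenameError as exc:
#         raise HTTPException(status_code=400, detail=str(exc))

if __name__ == "__main__":
    print(require_filename("  sample.wav "))
```

With this split, a request with no file part is rejected by framework validation, and the custom 400 is reserved for a file part whose filename is empty or whitespace.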

@codecov

codecov Bot commented Apr 7, 2026


Codecov Report

❌ Patch coverage is 85.29412% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.86%. Comparing base (c0af155) to head (360a982).
⚠️ Report is 6 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/application/workers.py | 47.36% | 10 Missing ⚠️ |
| src/infrastructure/extractors/youtube_extractor.py | 53.84% | 3 Missing and 3 partials ⚠️ |
| ...c/infrastructure/services/voice_profile_service.py | 87.50% | 3 Missing and 3 partials ⚠️ |
| ...e/repositories/vector/weaviate/chunk_repository.py | 70.58% | 5 Missing ⚠️ |
| ...on/use_cases/process_audio_diarization_pipeline.py | 83.33% | 3 Missing ⚠️ |
| ...rc/infrastructure/extractors/crawl4ai_extractor.py | 57.14% | 2 Missing and 1 partial ⚠️ |
| ...ructure/repositories/sql/chunk_index_repository.py | 75.00% | 3 Missing ⚠️ |
| ...rc/infrastructure/services/model_loader_service.py | 0.00% | 3 Missing ⚠️ |
| .../use_cases/identify_speakers_in_processed_audio.py | 33.33% | 2 Missing ⚠️ |
| src/application/use_cases/manage_voice_profiles.py | 0.00% | 2 Missing ⚠️ |
| ... and 16 more | | |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
+ Coverage   80.41%   80.86%   +0.44%     
==========================================
  Files          86       86              
  Lines        6721     6737      +16     
  Branches      773      777       +4     
==========================================
+ Hits         5405     5448      +43     
+ Misses       1071     1041      -30     
- Partials      245      248       +3     
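
The percentages in the diff above can be reproduced from the Hits, Misses, and Partials rows. The sketch below assumes coverage is hits divided by total lines and that percentages are truncated (not rounded) to two decimals; that assumption matches the figures shown, but it is inferred from the numbers rather than stated in the report. `truncated_percent` is a hypothetical helper.

```python
from fractions import Fraction
from math import floor


def truncated_percent(ratio: Fraction, places: int = 2) -> str:
    """Format a non-negative ratio as a percentage truncated to `places` decimals."""
    scale = 10 ** places
    # Fraction keeps the arithmetic exact, so floor() truncates without float error.
    scaled = floor(ratio * 100 * scale)
    return f"{scaled // scale}.{scaled % scale:0{places}d}"


# Values copied from the coverage diff: hits / (hits + misses + partials).
base = Fraction(5405, 5405 + 1071 + 245)  # main: 6721 lines
head = Fraction(5448, 5448 + 1041 + 248)  # #58:  6737 lines

if __name__ == "__main__":
    print(truncated_percent(base))         # 80.41
    print(truncated_percent(head))         # 80.86
    print(truncated_percent(head - base))  # 0.44
```

Note that the delta is computed from the exact ratios before truncation, which is why it comes out as 0.44 rather than the 0.45 obtained by subtracting the two displayed percentages.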

@ericksonlopes merged commit b293786 into main on Apr 7, 2026
5 checks passed