Feature/voice profile and refactors by ericksonlopes · Pull Request #58 · ericksonlopes/WhatYouSaid

ericksonlopes · 2026-04-07T18:23:30Z

No description provided.

Copilot

Pull request overview

Adds UX + API support for “voice profile training in progress” while performing broad formatting/refactor cleanups across backend services, repositories, routers, and the test suite.

Changes:

Frontend: track per-speaker voice-training “processing” state and update UI labels/translations accordingly.
Backend: harden voice profile upload handling (filename sanitization) and minor API/service refactors.
Repo-wide refactor: extensive line-wrapping/formatting normalization across Python + tests + Alembic migrations.

Reviewed changes

Copilot reviewed 156 out of 156 changed files in this pull request and generated 1 comment.

Summary per file

File	Description
tests/presentation/api/test_dependencies.py	Formatting-only refactor in tests (patch blocks).
tests/presentation/api/routes/test_subject_router.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_source_router.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_settings_router.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_ingest_router.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_ingest_router_file.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_chunk_router.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_auth_router.py	Formatting-only refactor in tests.
tests/presentation/api/routes/test_audio_diarization_router.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_youtube_audio_downloader.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_whisperx_audio_diarizer.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_voice_profile_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_text_splitter_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_re_rank_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_pyannote_voice_recognizer.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_model_loader_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_knowledge_subject_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_ingestion_job_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_content_source_service.py	Formatting-only refactor in tests.
tests/infrastructure/services/test_chunk_index_service.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_weaviate_vector.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_weaviate_chunk_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_chunk_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/weaviate/test_chunk_repository_extended.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/qdrant/test_chunk_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/faiss/test_faiss_chunk_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/faiss/test_chunk_repository_extended.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/vector/chroma/test_chroma_chunk_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_repos_services.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_knowledge_subject_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_ingestion_job_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_diarization_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_content_source_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_chunk_index_repository.py	Formatting-only refactor in tests.
tests/infrastructure/repositories/sql/test_additional_coverage.py	Formatting-only refactor in tests.
tests/infrastructure/loggers/test_std_logger.py	Formatting-only refactor in tests.
tests/infrastructure/extractors/test_youtube_extractor.py	Formatting-only refactor in tests.
tests/infrastructure/extractors/test_docling_extractor.py	Formatting-only refactor in tests.
tests/infrastructure/extractors/test_crawl4ai_extractor.py	Formatting-only refactor in tests.
tests/config/test_settings.py	Formatting-only refactor in tests.
tests/application/use_cases/test_youtube_throttling.py	Formatting-only refactor in tests.
tests/application/use_cases/test_web_scraping_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_search_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_process_audio_diarization_pipeline.py	Formatting-only refactor in tests.
tests/application/use_cases/test_knowledge_subject_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_diarization_ingestion_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_delete_diarization_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_content_source_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_auth_use_case.py	Formatting-only refactor in tests.
tests/application/use_cases/test_audio_recognition_use_cases.py	Formatting-only refactor in tests.
tests/application/test_workers.py	Formatting-only refactor in tests.
tests/application/test_audio_diarization_workers.py	Formatting-only refactor in tests.
src/presentation/api/routes/voice_profile_management_router.py	Voice upload: filename sanitization + API signature change + event publish formatting.
src/presentation/api/routes/subject_router.py	Logging call formatting only.
src/presentation/api/routes/settings_router.py	Response construction formatting only.
src/presentation/api/routes/notification_router.py	Executor call formatting only.
src/presentation/api/routes/job_router.py	Job listing call/log formatting only.
src/presentation/api/routes/ingest_router.py	Control-flow/log formatting only.
src/presentation/api/routes/chunk_router.py	Chunk listing call formatting only.
src/presentation/api/routes/audio_diarization_and_recognition_router.py	Dependency annotations + minor formatting.
src/infrastructure/utils/audio_utils.py	Error string formatting only.
src/infrastructure/services/youtube_vector_service.py	Signature/wrapping formatting only.
src/infrastructure/services/youtube_data_process_service.py	Control-flow/list-comp formatting only.
src/infrastructure/services/whisperx_audio_diarizer.py	Call-site formatting only.
src/infrastructure/services/voice_profile_service.py	Formatting-only refactor.
src/infrastructure/services/text_splitter_service.py	Formatting-only refactor.
src/infrastructure/services/task_queue_service.py	Thread creation/log formatting only.
src/infrastructure/services/redis_task_queue_service.py	Thread creation + cast formatting only.
src/infrastructure/services/re_rank_service.py	List-comp formatting only.
src/infrastructure/services/pyannote_voice_recognizer.py	Call formatting only.
src/infrastructure/services/model_loader_service.py	Model load call formatting only.
src/infrastructure/services/knowledge_subject_service.py	Formatting-only refactor.
src/infrastructure/services/ingestion_job_service.py	Formatting-only refactor.
src/infrastructure/services/content_source_service.py	Formatting-only refactor.
src/infrastructure/services/chunk_vector_service.py	Log formatting only.
src/infrastructure/services/chunk_index_service.py	Formatting-only refactor.
src/infrastructure/services/auth_service.py	JWT decode formatting only.
src/infrastructure/repositories/vector/weaviate/weaviate_client.py	Client creation/log formatting only.
src/infrastructure/repositories/vector/weaviate/chunk_repository.py	Formatting-only refactor.
src/infrastructure/repositories/vector/qdrant/connector.py	Warnings filter formatting only.
src/infrastructure/repositories/vector/qdrant/chunk_repository.py	Formatting-only refactor.
src/infrastructure/repositories/vector/models/chunk_model.py	Field declarations formatting only.
src/infrastructure/repositories/vector/chroma/chunk_repository.py	Formatting-only refactor.
src/infrastructure/repositories/storage/storage.py	Logging/delete formatting only.
src/infrastructure/repositories/sql/utils/utils.py	Signature formatting only.
src/infrastructure/repositories/sql/models/voice_record.py	Column declaration formatting only.
src/infrastructure/repositories/sql/models/user.py	Column declaration formatting only.
src/infrastructure/repositories/sql/models/knowledge_subject.py	Column/relationship formatting only.
src/infrastructure/repositories/sql/models/ingestion_job.py	Column/relationship formatting only.
src/infrastructure/repositories/sql/models/diarization_record.py	Column declaration formatting only.
src/infrastructure/repositories/sql/models/content_source.py	Column declaration formatting only.
src/infrastructure/repositories/sql/models/chunk_index.py	Column declaration formatting only.
src/infrastructure/repositories/sql/knowledge_subject_repository.py	Logging formatting only.
src/infrastructure/repositories/sql/ingestion_job_repository.py	Query/log formatting only.
src/infrastructure/repositories/sql/diarization_repository.py	Query formatting only.
src/infrastructure/repositories/sql/content_source_repository.py	Query/log formatting only.
src/infrastructure/repositories/sql/chunk_index_repository.py	Query/log formatting only.
src/infrastructure/loggers/std_logger.py	ContextVar + method signatures formatting only.
src/infrastructure/extractors/youtube_extractor.py	Formatting-only refactor + proxy string wrap.
src/infrastructure/extractors/plain_text_extractor.py	httpx client/log formatting only.
src/infrastructure/extractors/models/youtube_metadata_dto.py	DTO field formatting only.
src/infrastructure/extractors/docling_extractor.py	Stats-building formatting only.
src/infrastructure/extractors/crawl4ai_extractor.py	Control-flow/list-comp formatting only.
src/domain/mappers/knowledge_subject_mapper.py	List-comp formatting only.
src/domain/mappers/ingestion_job_mapper.py	Cast/list-comp formatting only.
src/domain/mappers/content_source_mapper.py	Cast formatting only.
src/domain/mappers/chunk_mapper.py	Condition/wrapping formatting only.
src/domain/mappers/chunk_index_mapper.py	Helper signature/wrapping formatting only.
src/domain/interfaces/repository/retriver_repository.py	Abstract method signature formatting only.
src/domain/exception/youtube_exceptions.py	String formatting only.
src/domain/entities/knowledge_subject_entity.py	Field formatting only.
src/domain/entities/diarization.py	Signature/list-comp formatting only.
src/domain/entities/content_source_entity.py	Field formatting only.
src/domain/entities/chunk_entity.py	Field formatting only.
src/config/validators.py	Condition/signature formatting only.
src/application/use_cases/web_scraping_use_case.py	UUID parse + call-site formatting only.
src/application/use_cases/search_use_case.py	Logging/context formatting only.
src/application/use_cases/retrieve_processed_audio_history.py	Signature formatting only.
src/application/use_cases/process_audio_diarization_pipeline.py	Call-site formatting only.
src/application/use_cases/manage_voice_profiles.py	Formatting-only refactor (includes voice train event publish).
src/application/use_cases/identify_speakers_in_processed_audio.py	Path/build/log formatting only.
src/application/use_cases/file_ingestion_use_case.py	Call-site/control-flow formatting only.
src/application/use_cases/diarization_ingestion_use_case.py	Call-site formatting only.
src/application/use_cases/delete_diarization_use_case.py	Logging formatting only.
src/application/use_cases/auth_use_case.py	Signature formatting only.
main.py	Router/task registration formatting only.
frontend/src/locales/pt-BR.json	Add `diarization.identification.processing_voice` translation.
frontend/src/locales/en.json	Add `diarization.identification.processing_voice` translation.
frontend/src/components/DiarizationView.tsx	Handle voice-training completion events; avoid resetting UI step during identification/result; optimistic per-speaker “processing” state.
frontend/src/components/diarization/VoiceTrainingModal.tsx	`onTrained` now passes trained name to parent.
frontend/src/components/diarization/types.ts	Add optional `trainingStatus` to `Speaker`.
frontend/src/components/diarization/SpeakerIdentificationPanel.tsx	Disable/label training button while a speaker is training; use new translation key.
alembic/versions/f120b614600a_add_diarizations_and_voices_tables.py	Alembic formatting only.
alembic/versions/c48798b08031_add_voice_samples_table.py	Alembic formatting only.
alembic/versions/c16fab000f02_add_user_table.py	Alembic formatting only.
alembic/versions/bd01964d9b26_created_tables.py	Alembic formatting only.
alembic/versions/946d88fe08b1_add_source_metadata_to_content_source.py	Alembic formatting only.
alembic/versions/73f13c5ff10a_add_metadata_to_chunk_index.py	Alembic formatting only.
alembic/versions/72f69987a221_rename_diarization_title_to_name.py	Alembic formatting only.
alembic/versions/6e53bc32edfe_add_subject_id_to_diarization.py	Alembic formatting only.
alembic/versions/5ff7984a3bcc_optimize_sql_models_indexes_and_audit.py	Alembic formatting only.
alembic/versions/5736075a22d0_add_vector_store_type_to_job_and_chunk_.py	Alembic formatting only.
alembic/versions/50420d500c2e_add_token_columns_to_content_source.py	Alembic formatting only.
alembic/versions/4e8d4e04a288_add_external_source_to_ingestion_jobs.py	Alembic formatting only.
alembic/versions/0ce7f69147eb_update_unique_constraint_on_content_.py	Alembic formatting only.
alembic/versions/04e0f5f5f0af_add_status_message_and_error_message_to_.py	Alembic formatting only.

Comments suppressed due to low confidence (2)

src/presentation/api/routes/voice_profile_management_router.py:89

The temp directory cleanup in the finally block can raise and mask the original error (e.g., `os.rmdir(temp_dir)` can fail on Windows, or when the UploadFile handle is still open). Consider ensuring the upload file is closed and using a safer cleanup approach (e.g., `shutil.rmtree(temp_dir, ignore_errors=True)` and/or suppressing cleanup exceptions).
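A minimal sketch of the cleanup pattern suggested above, assuming the route copies the upload into a private temp directory and hands the path to a processing step. `process_upload_in_temp_dir` and its `handler` callback are hypothetical names, not the router's actual code; `upload` stands for a synchronous binary stream such as `UploadFile.file`. Cleanup still runs in `finally`, but each cleanup step suppresses its own errors, so an exception raised by `handler` is the one the caller sees.

```python
import contextlib
import os
import shutil
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def process_upload_in_temp_dir(
    upload: BinaryIO,
    safe_filename: str,
    handler: Callable[[str], T],
) -> T:
    """Copy an upload into a private temp dir, run `handler` on the path, always clean up.

    Hypothetical helper: cleanup failures are swallowed so they cannot mask
    an exception raised by `handler`.
    """
    temp_dir = tempfile.mkdtemp(prefix="voice_upload_")
    try:
        temp_path = os.path.join(temp_dir, safe_filename)
        # The copy is closed when this `with` block ends, before `handler` runs.
        with open(temp_path, "wb") as out:
            shutil.copyfileobj(upload, out)
        return handler(temp_path)
    finally:
        # Release the upload stream even on failure; errors from close() are ignored.
        with contextlib.suppress(Exception):
            upload.close()
        # rmtree deletes the directory even when it still contains files, and
        # ignore_errors=True keeps a cleanup failure from replacing the original error.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

On Python 3.10+, `tempfile.TemporaryDirectory(ignore_cleanup_errors=True)` provides similar best-effort cleanup as a context manager.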
frontend/src/components/DiarizationView.tsx:111

The SSE handler calls loadJobs(true) twice for diarization events when activeJob matches (once unconditionally and again to find the updated job). This causes two API fetches per event; consider awaiting a single loadJobs(true) call and reusing the returned list for both refresh + active-job update.

        if (lastEvent.type === 'diarization') {
            loadJobs(true);

            // If the updated job is the active one, refresh its details
            if (activeJob && lastEvent.id === activeJob.id) {
                loadJobs(true).then(updatedJobs => {
                    const updated = updatedJobs.find(j => j.id === activeJob.id);
                    if (updated && updated.status !== activeJob.status) {
                        // If user is already in identification/result, don't reset the
                        // local speakers state on transient status changes (e.g. voice
                        // training flipping the job to TRAINING → COMPLETED). Only
                        // open/reset the job when we were still waiting on the initial
                        // diarization run.
                        const inIdentification = step === 'identification' || step === 'result';
                        if (!inIdentification && (updated.status === 'awaiting_verification' || updated.status === 'completed' || updated.status === 'failed')) {
                            handleOpenJob(updated);
                        } else {
                            setActiveJob(updated);
                        }
                    }
                });
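Copilot's suggestion amounts to fetching once and reusing the result. The sketch below transliterates the handler's control flow into Python for illustration; `Job`, `on_diarization_event`, and the injected callbacks are hypothetical stand-ins for the component's job objects, `loadJobs`, `handleOpenJob`, and `setActiveJob`. The step and status strings are the ones visible in the excerpt above.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass(frozen=True)
class Job:
    # Hypothetical stand-in for the frontend job object (only the fields used here).
    id: str
    status: str


# Statuses that, in the excerpt above, trigger handleOpenJob when the user
# is not yet in the identification/result steps.
OPEN_ON_STATUSES = {"awaiting_verification", "completed", "failed"}


async def on_diarization_event(
    event_id: str,
    active_job: Optional[Job],
    step: str,
    load_jobs: Callable[[bool], Awaitable[List[Job]]],
    open_job: Callable[[Job], None],
    set_active_job: Callable[[Job], None],
) -> None:
    # One fetch: the same list serves the general refresh and the active-job lookup.
    jobs = await load_jobs(True)
    if active_job is None or event_id != active_job.id:
        return
    updated = next((job for job in jobs if job.id == active_job.id), None)
    if updated is None or updated.status == active_job.status:
        return
    in_identification = step in ("identification", "result")
    if not in_identification and updated.status in OPEN_ON_STATUSES:
        open_job(updated)
    else:
        # Keep the local speakers state; only swap in the refreshed job.
        set_active_job(updated)
```

In the TypeScript component, the equivalent change is to `await` a single `loadJobs(true)` call and run the active-job lookup on its return value instead of issuing a second request.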

Copilot · 2026-04-07T18:27:13Z

 async def upload_and_register_new_voice_profile(
     event_bus: Annotated[IEventBus, Depends(get_event_bus)],
-    use_case: Annotated[
-        RegisterNewVoiceProfileUseCase, Depends(get_register_voice_profile_use_case)
-    ],
+    use_case: Annotated[RegisterNewVoiceProfileUseCase, Depends(get_register_voice_profile_use_case)],
     name: str = Form(...),
-    file: UploadFile = File(...),
+    file: UploadFile | None = File(None),
 ):
-    if not file.filename:
+    if not file or not file.filename or not file.filename.strip():
         raise HTTPException(status_code=400, detail="No filename provided")


/upload declares file as optional (UploadFile | None = File(None)), but the endpoint’s behavior requires a file. This makes the OpenAPI schema and validation inconsistent (missing file becomes a custom 400 instead of FastAPI’s standard 422). Consider keeping it required (File(...)) and only validating filename emptiness explicitly if needed.
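A sketch of the reviewer's suggestion, assuming the route keeps `file: UploadFile = File(...)` so a missing part gets FastAPI's standard 422, and checks only the filename by hand. The PR's actual sanitization code is not shown in this review, so `sanitize_upload_filename` and its character allow-list are hypothetical; the FastAPI wiring appears only as comments so the helper runs on its own.

```python
from __future__ import annotations

import re
from pathlib import PureWindowsPath

# Hypothetical allow-list: anything outside it becomes "_".
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_upload_filename(filename: str | None) -> str:
    """Validate and sanitize a client-supplied filename (hypothetical helper).

    Raises ValueError for missing or blank names; the route would map it to a 400.
    """
    if filename is None or not filename.strip():
        raise ValueError("No filename provided")
    # PureWindowsPath splits on both "/" and "\", so directory parts written in
    # either OS style are dropped and only the final component is kept.
    base = PureWindowsPath(filename.strip()).name
    # Replace unsafe characters, then strip leading dots so ".", ".." and
    # hidden-file names cannot survive as-is.
    safe = _UNSAFE_CHARS.sub("_", base).lstrip(".")
    if not safe:
        raise ValueError("Invalid filename")
    return safe


# In the route, the parameter would stay required:
#     file: UploadFile = File(...),
# and only the name would be checked explicitly:
#     try:
#         safe_name = sanitize_upload_filename(file.filename)
#     except ValueError as exc:
#         raise HTTPException(status_code=400, detail=str(exc))
```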

codecov · 2026-04-07T18:29:37Z

Codecov Report

❌ Patch coverage is 85.29412% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.86%. Comparing base (c0af155) to head (360a982).
⚠️ Report is 6 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/application/workers.py	47.36%	10 Missing ⚠️
src/infrastructure/extractors/youtube_extractor.py	53.84%	3 Missing and 3 partials ⚠️
...c/infrastructure/services/voice_profile_service.py	87.50%	3 Missing and 3 partials ⚠️
...e/repositories/vector/weaviate/chunk_repository.py	70.58%	5 Missing ⚠️
...on/use_cases/process_audio_diarization_pipeline.py	83.33%	3 Missing ⚠️
...rc/infrastructure/extractors/crawl4ai_extractor.py	57.14%	2 Missing and 1 partial ⚠️
...ructure/repositories/sql/chunk_index_repository.py	75.00%	3 Missing ⚠️
...rc/infrastructure/services/model_loader_service.py	0.00%	3 Missing ⚠️
.../use_cases/identify_speakers_in_processed_audio.py	33.33%	2 Missing ⚠️
src/application/use_cases/manage_voice_profiles.py	0.00%	2 Missing ⚠️
... and 16 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
+ Coverage   80.41%   80.86%   +0.44%     
==========================================
  Files          86       86              
  Lines        6721     6737      +16     
  Branches      773      777       +4     
==========================================
+ Hits         5405     5448      +43     
+ Misses       1071     1041      -30     
- Partials      245      248       +3
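The figures above can be reproduced with Codecov's coverage ratio, hits / (hits + misses + partials), where partially covered lines count as not covered. The project percentages appear consistent with rounding down to two decimals (Codecov's default `round: down`), and the +0.44% delta is computed from the unrounded ratios; subtracting the displayed 80.41 from 80.86 would give 0.45. The helper names below are hypothetical.

```python
import math
from decimal import Decimal
from fractions import Fraction


def coverage_ratio(hits: int, misses: int, partials: int) -> Fraction:
    """Covered share of tracked lines; partially covered lines count as not covered."""
    return Fraction(hits, hits + misses + partials)


def as_percent_rounded_down(ratio: Fraction, precision: int = 2) -> Decimal:
    """Express `ratio` as a percentage, rounded down to `precision` decimals."""
    scale = 10 ** precision
    return Decimal(math.floor(ratio * 100 * scale)) / Decimal(scale)


base = coverage_ratio(hits=5405, misses=1071, partials=245)  # main (c0af155)
head = coverage_ratio(hits=5448, misses=1041, partials=248)  # PR head (360a982)

print(as_percent_rounded_down(base))         # 80.41
print(as_percent_rounded_down(head))         # 80.86
print(as_percent_rounded_down(head - base))  # 0.44
```

The same formula is consistent with the patch figure: 65 uncovered lines at 85.29412% implies 442 patch lines, 377 of them covered (377 / 442 ≈ 0.8529412). It also fits the per-file rows; for example, `src/application/workers.py` at 47.36% with 10 missing lines corresponds to 9 of 19 lines covered.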

ericksonlopes added 4 commits April 7, 2026 15:11

fix(api): improve voice profile upload validation and fix failing test case

80a0a29

feat(db): add status and error messages to content sources

a7cd10b

refactor(api): enhance subject and source routing logic and coverage

5330207

refactor: update dependencies, index services, and background worker tests

bb82cc7

Copilot AI review requested due to automatic review settings April 7, 2026 18:23

Copilot started reviewing on behalf of ericksonlopes April 7, 2026 18:24

Copilot AI reviewed Apr 7, 2026

feat(voice): add status management and improve test coverage for voice profiles

360a982

ericksonlopes merged commit b293786 into main Apr 7, 2026
5 checks passed

ericksonlopes merged 5 commits into main from feature/voice-profile-and-refactors
