Feature/voice profile and refactors #58
Pull request overview
Adds UX + API support for “voice profile training in progress” while performing broad formatting/refactor cleanups across backend services, repositories, routers, and the test suite.
Changes:
- Frontend: track per-speaker voice-training “processing” state and update UI labels/translations accordingly.
- Backend: harden voice profile upload handling (filename sanitization) and minor API/service refactors.
- Repo-wide refactor: extensive line-wrapping/formatting normalization across Python + tests + Alembic migrations.
Reviewed changes
Copilot reviewed 156 out of 156 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/presentation/api/test_dependencies.py | Formatting-only refactor in tests (patch blocks). |
| tests/presentation/api/routes/test_subject_router.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_source_router.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_settings_router.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_ingest_router.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_ingest_router_file.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_chunk_router.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_auth_router.py | Formatting-only refactor in tests. |
| tests/presentation/api/routes/test_audio_diarization_router.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_youtube_audio_downloader.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_whisperx_audio_diarizer.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_voice_profile_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_text_splitter_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_re_rank_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_pyannote_voice_recognizer.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_model_loader_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_knowledge_subject_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_ingestion_job_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_content_source_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/services/test_chunk_index_service.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/weaviate/test_weaviate_vector.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/weaviate/test_weaviate_chunk_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/weaviate/test_chunk_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/weaviate/test_chunk_repository_extended.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/qdrant/test_chunk_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/faiss/test_faiss_chunk_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/faiss/test_chunk_repository_extended.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/vector/chroma/test_chroma_chunk_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_repos_services.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_knowledge_subject_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_ingestion_job_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_diarization_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_content_source_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_chunk_index_repository.py | Formatting-only refactor in tests. |
| tests/infrastructure/repositories/sql/test_additional_coverage.py | Formatting-only refactor in tests. |
| tests/infrastructure/loggers/test_std_logger.py | Formatting-only refactor in tests. |
| tests/infrastructure/extractors/test_youtube_extractor.py | Formatting-only refactor in tests. |
| tests/infrastructure/extractors/test_docling_extractor.py | Formatting-only refactor in tests. |
| tests/infrastructure/extractors/test_crawl4ai_extractor.py | Formatting-only refactor in tests. |
| tests/config/test_settings.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_youtube_throttling.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_web_scraping_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_search_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_process_audio_diarization_pipeline.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_knowledge_subject_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_diarization_ingestion_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_delete_diarization_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_content_source_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_auth_use_case.py | Formatting-only refactor in tests. |
| tests/application/use_cases/test_audio_recognition_use_cases.py | Formatting-only refactor in tests. |
| tests/application/test_workers.py | Formatting-only refactor in tests. |
| tests/application/test_audio_diarization_workers.py | Formatting-only refactor in tests. |
| src/presentation/api/routes/voice_profile_management_router.py | Voice upload: filename sanitization + API signature change + event publish formatting. |
| src/presentation/api/routes/subject_router.py | Logging call formatting only. |
| src/presentation/api/routes/settings_router.py | Response construction formatting only. |
| src/presentation/api/routes/notification_router.py | Executor call formatting only. |
| src/presentation/api/routes/job_router.py | Job listing call/log formatting only. |
| src/presentation/api/routes/ingest_router.py | Control-flow/log formatting only. |
| src/presentation/api/routes/chunk_router.py | Chunk listing call formatting only. |
| src/presentation/api/routes/audio_diarization_and_recognition_router.py | Dependency annotations + minor formatting. |
| src/infrastructure/utils/audio_utils.py | Error string formatting only. |
| src/infrastructure/services/youtube_vector_service.py | Signature/wrapping formatting only. |
| src/infrastructure/services/youtube_data_process_service.py | Control-flow/list-comp formatting only. |
| src/infrastructure/services/whisperx_audio_diarizer.py | Call-site formatting only. |
| src/infrastructure/services/voice_profile_service.py | Formatting-only refactor. |
| src/infrastructure/services/text_splitter_service.py | Formatting-only refactor. |
| src/infrastructure/services/task_queue_service.py | Thread creation/log formatting only. |
| src/infrastructure/services/redis_task_queue_service.py | Thread creation + cast formatting only. |
| src/infrastructure/services/re_rank_service.py | List-comp formatting only. |
| src/infrastructure/services/pyannote_voice_recognizer.py | Call formatting only. |
| src/infrastructure/services/model_loader_service.py | Model load call formatting only. |
| src/infrastructure/services/knowledge_subject_service.py | Formatting-only refactor. |
| src/infrastructure/services/ingestion_job_service.py | Formatting-only refactor. |
| src/infrastructure/services/content_source_service.py | Formatting-only refactor. |
| src/infrastructure/services/chunk_vector_service.py | Log formatting only. |
| src/infrastructure/services/chunk_index_service.py | Formatting-only refactor. |
| src/infrastructure/services/auth_service.py | JWT decode formatting only. |
| src/infrastructure/repositories/vector/weaviate/weaviate_client.py | Client creation/log formatting only. |
| src/infrastructure/repositories/vector/weaviate/chunk_repository.py | Formatting-only refactor. |
| src/infrastructure/repositories/vector/qdrant/connector.py | Warnings filter formatting only. |
| src/infrastructure/repositories/vector/qdrant/chunk_repository.py | Formatting-only refactor. |
| src/infrastructure/repositories/vector/models/chunk_model.py | Field declarations formatting only. |
| src/infrastructure/repositories/vector/chroma/chunk_repository.py | Formatting-only refactor. |
| src/infrastructure/repositories/storage/storage.py | Logging/delete formatting only. |
| src/infrastructure/repositories/sql/utils/utils.py | Signature formatting only. |
| src/infrastructure/repositories/sql/models/voice_record.py | Column declaration formatting only. |
| src/infrastructure/repositories/sql/models/user.py | Column declaration formatting only. |
| src/infrastructure/repositories/sql/models/knowledge_subject.py | Column/relationship formatting only. |
| src/infrastructure/repositories/sql/models/ingestion_job.py | Column/relationship formatting only. |
| src/infrastructure/repositories/sql/models/diarization_record.py | Column declaration formatting only. |
| src/infrastructure/repositories/sql/models/content_source.py | Column declaration formatting only. |
| src/infrastructure/repositories/sql/models/chunk_index.py | Column declaration formatting only. |
| src/infrastructure/repositories/sql/knowledge_subject_repository.py | Logging formatting only. |
| src/infrastructure/repositories/sql/ingestion_job_repository.py | Query/log formatting only. |
| src/infrastructure/repositories/sql/diarization_repository.py | Query formatting only. |
| src/infrastructure/repositories/sql/content_source_repository.py | Query/log formatting only. |
| src/infrastructure/repositories/sql/chunk_index_repository.py | Query/log formatting only. |
| src/infrastructure/loggers/std_logger.py | ContextVar + method signatures formatting only. |
| src/infrastructure/extractors/youtube_extractor.py | Formatting-only refactor + proxy string wrap. |
| src/infrastructure/extractors/plain_text_extractor.py | httpx client/log formatting only. |
| src/infrastructure/extractors/models/youtube_metadata_dto.py | DTO field formatting only. |
| src/infrastructure/extractors/docling_extractor.py | Stats-building formatting only. |
| src/infrastructure/extractors/crawl4ai_extractor.py | Control-flow/list-comp formatting only. |
| src/domain/mappers/knowledge_subject_mapper.py | List-comp formatting only. |
| src/domain/mappers/ingestion_job_mapper.py | Cast/list-comp formatting only. |
| src/domain/mappers/content_source_mapper.py | Cast formatting only. |
| src/domain/mappers/chunk_mapper.py | Condition/wrapping formatting only. |
| src/domain/mappers/chunk_index_mapper.py | Helper signature/wrapping formatting only. |
| src/domain/interfaces/repository/retriver_repository.py | Abstract method signature formatting only. |
| src/domain/exception/youtube_exceptions.py | String formatting only. |
| src/domain/entities/knowledge_subject_entity.py | Field formatting only. |
| src/domain/entities/diarization.py | Signature/list-comp formatting only. |
| src/domain/entities/content_source_entity.py | Field formatting only. |
| src/domain/entities/chunk_entity.py | Field formatting only. |
| src/config/validators.py | Condition/signature formatting only. |
| src/application/use_cases/web_scraping_use_case.py | UUID parse + call-site formatting only. |
| src/application/use_cases/search_use_case.py | Logging/context formatting only. |
| src/application/use_cases/retrieve_processed_audio_history.py | Signature formatting only. |
| src/application/use_cases/process_audio_diarization_pipeline.py | Call-site formatting only. |
| src/application/use_cases/manage_voice_profiles.py | Formatting-only refactor (includes voice train event publish). |
| src/application/use_cases/identify_speakers_in_processed_audio.py | Path/build/log formatting only. |
| src/application/use_cases/file_ingestion_use_case.py | Call-site/control-flow formatting only. |
| src/application/use_cases/diarization_ingestion_use_case.py | Call-site formatting only. |
| src/application/use_cases/delete_diarization_use_case.py | Logging formatting only. |
| src/application/use_cases/auth_use_case.py | Signature formatting only. |
| main.py | Router/task registration formatting only. |
| frontend/src/locales/pt-BR.json | Add diarization.identification.processing_voice translation. |
| frontend/src/locales/en.json | Add diarization.identification.processing_voice translation. |
| frontend/src/components/DiarizationView.tsx | Handle voice-training completion events; avoid resetting UI step during identification/result; optimistic per-speaker “processing” state. |
| frontend/src/components/diarization/VoiceTrainingModal.tsx | onTrained now passes trained name to parent. |
| frontend/src/components/diarization/types.ts | Add optional trainingStatus to Speaker. |
| frontend/src/components/diarization/SpeakerIdentificationPanel.tsx | Disable/label training button while a speaker is training; use new translation key. |
| alembic/versions/f120b614600a_add_diarizations_and_voices_tables.py | Alembic formatting only. |
| alembic/versions/c48798b08031_add_voice_samples_table.py | Alembic formatting only. |
| alembic/versions/c16fab000f02_add_user_table.py | Alembic formatting only. |
| alembic/versions/bd01964d9b26_created_tables.py | Alembic formatting only. |
| alembic/versions/946d88fe08b1_add_source_metadata_to_content_source.py | Alembic formatting only. |
| alembic/versions/73f13c5ff10a_add_metadata_to_chunk_index.py | Alembic formatting only. |
| alembic/versions/72f69987a221_rename_diarization_title_to_name.py | Alembic formatting only. |
| alembic/versions/6e53bc32edfe_add_subject_id_to_diarization.py | Alembic formatting only. |
| alembic/versions/5ff7984a3bcc_optimize_sql_models_indexes_and_audit.py | Alembic formatting only. |
| alembic/versions/5736075a22d0_add_vector_store_type_to_job_and_chunk_.py | Alembic formatting only. |
| alembic/versions/50420d500c2e_add_token_columns_to_content_source.py | Alembic formatting only. |
| alembic/versions/4e8d4e04a288_add_external_source_to_ingestion_jobs.py | Alembic formatting only. |
| alembic/versions/0ce7f69147eb_update_unique_constraint_on_content_.py | Alembic formatting only. |
| alembic/versions/04e0f5f5f0af_add_status_message_and_error_message_to_.py | Alembic formatting only. |
Comments suppressed due to low confidence (2)
src/presentation/api/routes/voice_profile_management_router.py:89
- The temp directory cleanup in the `finally` block can raise and mask the original error (e.g., `os.rmdir(temp_dir)` fails on Windows, or if the `UploadFile` handle is still open). Consider ensuring the upload file is closed and using a safer cleanup approach (e.g., `shutil.rmtree(temp_dir, ignore_errors=True)` and/or suppressing cleanup exceptions).
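A minimal, framework-free sketch of the safer cleanup pattern suggested for the temp directory. The function name `process_upload_in_temp_dir`, the `process` callback, and the `voice_upload_` prefix are hypothetical and not taken from the PR; only the standard-library calls (`tempfile.mkdtemp`, `shutil.rmtree`) are real. The file handle is closed by the `with` block before cleanup runs, and `ignore_errors=True` keeps a cleanup failure from replacing the original exception.

```python
import os
import shutil
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload_in_temp_dir(
    data: bytes, filename: str, process: Callable[[str], T]
) -> T:
    """Write `data` to a fresh temp dir, run `process` on the file path,
    and remove the directory afterwards without masking errors."""
    temp_dir = tempfile.mkdtemp(prefix="voice_upload_")  # hypothetical prefix
    try:
        temp_path = os.path.join(temp_dir, filename)
        # The `with` block closes the handle before any cleanup happens.
        with open(temp_path, "wb") as handle:
            handle.write(data)
        return process(temp_path)
    finally:
        # Removes the file and the directory in one call; cleanup failures
        # are ignored so the original exception (if any) still propagates.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

In the actual route, `data` would come from the uploaded file and `process` would stand in for the use-case call; both are assumptions here.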
frontend/src/components/DiarizationView.tsx:111
- The SSE handler calls `loadJobs(true)` twice for diarization events when `activeJob` matches (once unconditionally and again to find the updated job). This causes two API fetches per event; consider awaiting a single `loadJobs(true)` call and reusing the returned list for both refresh + active-job update.
if (lastEvent.type === 'diarization') {
  loadJobs(true);
  // If the updated job is the active one, refresh its details
  if (activeJob && lastEvent.id === activeJob.id) {
    loadJobs(true).then(updatedJobs => {
      const updated = updatedJobs.find(j => j.id === activeJob.id);
      if (updated && updated.status !== activeJob.status) {
        // If user is already in identification/result, don't reset the
        // local speakers state on transient status changes (e.g. voice
        // training flipping the job to TRAINING → COMPLETED). Only
        // open/reset the job when we were still waiting on the initial
        // diarization run.
        const inIdentification = step === 'identification' || step === 'result';
        if (!inIdentification && (updated.status === 'awaiting_verification' || updated.status === 'completed' || updated.status === 'failed')) {
          handleOpenJob(updated);
        } else {
          setActiveJob(updated);
        }
      }
    });
 async def upload_and_register_new_voice_profile(
     event_bus: Annotated[IEventBus, Depends(get_event_bus)],
-    use_case: Annotated[
-        RegisterNewVoiceProfileUseCase, Depends(get_register_voice_profile_use_case)
-    ],
+    use_case: Annotated[RegisterNewVoiceProfileUseCase, Depends(get_register_voice_profile_use_case)],
     name: str = Form(...),
-    file: UploadFile = File(...),
+    file: UploadFile | None = File(None),
 ):
-    if not file.filename:
+    if not file or not file.filename or not file.filename.strip():
         raise HTTPException(status_code=400, detail="No filename provided")
`/upload` declares `file` as optional (`UploadFile | None = File(None)`), but the endpoint's behavior requires a file. This makes the OpenAPI schema and validation inconsistent (a missing file becomes a custom 400 instead of FastAPI's standard 422). Consider keeping it required (`File(...)`) and only validating filename emptiness explicitly if needed.
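One way to read this suggestion: with `file: UploadFile = File(...)` kept as required, FastAPI itself rejects a request with no file part, so the handler only needs to check the filename. The helper below is a hypothetical, framework-free sketch of that remaining check. The name `require_filename` is not from the PR, and the route would be responsible for mapping the `ValueError` to its existing `HTTPException(status_code=400, detail="No filename provided")`.

```python
from typing import Optional


def require_filename(filename: Optional[str]) -> str:
    """Return the filename with surrounding whitespace removed.

    Raises ValueError when the name is missing, empty, or only whitespace.
    (Hypothetical helper; not part of the PR.)
    """
    if filename is None or not filename.strip():
        raise ValueError("No filename provided")
    return filename.strip()
```

This keeps the missing-file case in the framework's validation layer and limits the custom error to the one condition the schema cannot express.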