[FEAT]: Implement Real-time Audio Transcription & Indic Language Auto-Captioning for Virtual Courtrooms #1322

### Is your feature request related to a problem? Please describe.

The Virtual Courtroom module currently supports native WebRTC-based video conferencing for remote hearings, but it has no real-time transcription or closed-captioning. For litigants from diverse linguistic backgrounds, or those with hearing impairments, taking part in a virtual hearing without live captions translated into a language they follow (Hinglish or a regional language) is a severe barrier to justice.

### Describe the solution you'd like

I want to implement a live-captioning system that extracts audio from the WebRTC stream, processes it via the backend, and displays synchronized text on the UI.

**Objective & Tasks:**
- Update the `nyaysetu-frontend` Virtual Court component to extract the microphone audio stream from the existing WebRTC client and chunk it via the `MediaRecorder` API.
- Transmit the chunked audio data to the backend via WebSockets for low-latency processing.
- Leverage the existing Bhashini / AI NLP integrations in the orchestrator to generate real-time transcripts.
- Stream the translated text back to the frontend.
- Build a UI overlay in the React video conferencing view that displays the text as synchronized closed captions on the video, with language toggle options.
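
The chunk-and-stream flow in the tasks above could be framed roughly as follows. This is only a sketch under stated assumptions: the `AudioChunk` and `Caption` schemas, their field names, and the `CaptionBuffer` helper are all hypothetical and not part of the existing codebase. It shows one way to tag each audio chunk with a sequence number and timestamp so that captions can be released in order, even if transcription results come back out of order.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class AudioChunk:
    """One audio chunk sent from the browser to the backend (hypothetical schema)."""
    session_id: str   # hearing / room identifier
    seq: int          # increases by one for every chunk in a session
    start_ms: int     # offset of the chunk start from the start of recording
    source_lang: str  # language code of the speaker, e.g. "hi"
    audio: bytes      # chunk payload exactly as produced by the recorder


def encode_chunk(chunk: AudioChunk) -> str:
    """Serialize a chunk as a JSON text frame; audio bytes are base64-encoded."""
    payload = asdict(chunk)
    payload["audio"] = base64.b64encode(chunk.audio).decode("ascii")
    return json.dumps(payload)


def decode_chunk(frame: str) -> AudioChunk:
    """Inverse of encode_chunk, as the backend would run it on receipt."""
    payload = json.loads(frame)
    payload["audio"] = base64.b64decode(payload["audio"])
    return AudioChunk(**payload)


@dataclass
class Caption:
    """One caption sent back to the frontend (hypothetical schema)."""
    seq: int          # matches the seq of the chunk it was produced from
    start_ms: int     # lets the overlay align the text with the video
    target_lang: str  # language selected in the caption toggle
    text: str


class CaptionBuffer:
    """Releases captions in sequence order even if results arrive out of order."""

    def __init__(self) -> None:
        self._pending: dict[int, Caption] = {}
        self._next_seq = 0

    def push(self, caption: Caption) -> list[Caption]:
        """Store a caption and return every caption that is now ready to show."""
        self._pending[caption.seq] = caption
        ready = []
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready
```

Base64 inside JSON adds roughly a third to the payload size, so binary WebSocket frames with a small header may be a better fit once the schema settles; the JSON form is used here only because it is easy to inspect while prototyping.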

### Describe alternatives you've considered

I considered using third-party browser extensions or calling external APIs directly from the client side. Routing the audio through the existing FastAPI (`nlp-orchestrator`) / Spring Boot backend via WebSockets should offer better security and lower latency, and it reuses the project's existing Bhashini/AI NLP integrations.

### Additional context

**Tech Stack Required:**
- React & WebRTC (Frontend)
- WebSockets (STOMP or standard WS)
- FastAPI / Python (NLP Orchestrator)
- Bhashini API / AI Transcription Models

**Notes:**
- To minimize latency, audio should be sent in short chunks, with the interval tuned during testing (e.g., 1-3 second intervals).
- We can fall back to basic local transcription models during local development to avoid API rate limits.
- **I would love to work on this issue for GSSoC'26. Please assign it to me with the `gssoc` and `advanced` / `level-3` labels!**
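
To make the chunking and local-fallback notes above concrete, here is a rough sketch. It assumes the backend works on raw 16-bit mono PCM at 16 kHz after decoding, which is an assumption: `MediaRecorder` normally emits an encoded container rather than PCM. The `CAPTIONS_BACKEND` environment variable and both transcriber classes are hypothetical placeholders, not existing project code or a real Bhashini client.

```python
import os
from typing import Iterator, Mapping, Optional, Protocol


def chunk_size_bytes(seconds: float, sample_rate: int = 16_000,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Number of raw PCM bytes that cover `seconds` of audio."""
    return int(seconds * sample_rate) * bytes_per_sample * channels


def split_pcm(pcm: bytes, seconds: float) -> Iterator[bytes]:
    """Yield consecutive chunks of at most `seconds` of audio each."""
    size = chunk_size_bytes(seconds)
    if size <= 0:
        raise ValueError("chunk duration must cover at least one sample")
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


class Transcriber(Protocol):
    def transcribe(self, pcm: bytes, lang: str) -> str: ...


class LocalStubTranscriber:
    """Stand-in for a local development model; returns a marker string only."""

    def transcribe(self, pcm: bytes, lang: str) -> str:
        return f"[local:{lang}] {len(pcm)} bytes"


class RemoteTranscriber:
    """Stand-in for the hosted service; the real API call is deliberately left out."""

    def transcribe(self, pcm: bytes, lang: str) -> str:
        raise NotImplementedError("wire up the hosted transcription API here")


def pick_transcriber(env: Optional[Mapping[str, str]] = None) -> Transcriber:
    """Default to the local stub so development never hits API rate limits."""
    env = os.environ if env is None else env
    if env.get("CAPTIONS_BACKEND", "local") == "remote":
        return RemoteTranscriber()
    return LocalStubTranscriber()
```

Under these assumptions a 1-second chunk is 32,000 bytes and a 3-second chunk is 96,000 bytes, so the interval trades caption delay against the number of requests sent to the transcription service.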
