[FEAT]: Implement Real-time Audio Transcription & Indic Language Auto-Captioning for Virtual Courtrooms #1322

@nandannikam

Is your feature request related to a problem? Please describe.

Currently, the Virtual Courtroom module supports native WebRTC-based video conferencing for remote hearings. However, it lacks real-time transcription and closed-captioning. For litigants with hearing impairments or from diverse linguistic backgrounds, the absence of live captions translated into Hinglish or regional languages is a severe barrier to justice.

Describe the solution you'd like

I want to implement a live-captioning system that extracts audio from the WebRTC stream, processes it via the backend, and displays synchronized text on the UI.

Objective & Tasks:

  • Extract audio streams from the existing WebRTC client in the React frontend.
  • Transmit chunked audio data to the backend via WebSockets for low-latency processing.
  • Leverage the existing Bhashini / AI NLP integrations in the orchestrator to generate real-time transcripts.
  • Stream the translated text back to the frontend and overlay it as synchronized closed-captions on the video UI.
  • Update the nyaysetu-frontend Virtual Court component to capture and chunk microphone audio streams via the MediaRecorder API.
  • Build a scalable UI overlay in the React video conferencing view to display live captions with language toggle options.
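A rough sketch of how the backend half of these tasks could fit together, kept transport-agnostic so it does not assume anything about the current nlp-orchestrator code. Everything named here is hypothetical: `relay_captions`, the three callables it takes, and the `seq`/`lang`/`text` message shape are illustrative only, and `transcribe` stands in for whatever Bhashini or local model call the orchestrator actually exposes.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional

# Hypothetical callable types; none of these names come from the NyaySetu codebase.
ReceiveChunk = Callable[[], Awaitable[Optional[bytes]]]  # returns None once the client is gone
Transcribe = Callable[[bytes, str], Awaitable[str]]      # (audio chunk, language code) -> text
SendCaption = Callable[[str], Awaitable[None]]           # sends one JSON text frame


async def relay_captions(
    receive_chunk: ReceiveChunk,
    transcribe: Transcribe,
    send_caption: SendCaption,
    language: str = "hi",
) -> int:
    """Read audio chunks, transcribe each one, and push captions back in order.

    Returns the number of captions that were sent.
    """
    seq = 0
    while True:
        chunk = await receive_chunk()
        if chunk is None:   # stream ended
            break
        if not chunk:       # skip empty chunks
            continue
        text = await transcribe(chunk, language)
        if not text.strip():  # nothing recognisable in this chunk
            continue
        # The sequence number lets the UI ignore late or duplicate captions.
        message = {"seq": seq, "lang": language, "text": text}
        # ensure_ascii=False keeps Indic scripts readable in the JSON payload.
        await send_caption(json.dumps(message, ensure_ascii=False))
        seq += 1
    return seq
```

Inside a FastAPI WebSocket route, `receive_chunk` could wrap `websocket.receive_bytes()` (returning `None` when the client disconnects) and `send_caption` could wrap `websocket.send_text()`. Keeping the loop independent of the socket also makes it easy to test with in-memory fakes.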

Describe alternatives you've considered

I considered using third-party browser extensions or calling external APIs directly from the client. Routing the audio over WebSockets through the existing FastAPI (nlp-orchestrator) / Spring Boot backend should give better security and low latency, and it reuses the project's existing Bhashini/AI NLP integrations.

Additional context

Tech Stack Required:

  • React & WebRTC (Frontend)
  • WebSockets (STOMP or standard WS)
  • FastAPI / Python (NLP Orchestrator)
  • Bhashini API / AI Transcription Models

Notes:

  • To minimize latency, audio should be sent in short chunks (e.g., 1-3 second intervals).
  • We can fall back to basic local transcription models during local development to avoid API rate limits.
  • I would love to work on this issue for GSSoC'26. Please assign it to me with the gssoc and advanced / level-3 labels!
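The first two notes could be expressed as a small piece of configuration logic. This is only a sketch under assumptions: the environment variable names `CAPTION_BACKEND` and `BHASHINI_API_KEY`, and the helper names `chunk_interval_ms` and `pick_backend`, are made up for illustration and do not exist in the project today.

```python
import os
from typing import Mapping

MIN_CHUNK_MS = 1000  # lower bound of the 1-3 second range from the notes
MAX_CHUNK_MS = 3000  # upper bound of the 1-3 second range from the notes


def chunk_interval_ms(requested_ms: int) -> int:
    """Clamp a requested chunk length to the 1-3 second range."""
    return max(MIN_CHUNK_MS, min(MAX_CHUNK_MS, requested_ms))


def pick_backend(env: Mapping[str, str] = os.environ) -> str:
    """Choose the transcription backend.

    Returns "bhashini" only when it is explicitly requested AND a key is set
    (both variable names are hypothetical); otherwise returns "local" so that
    local development never hits the rate-limited API by accident.
    """
    wants_remote = env.get("CAPTION_BACKEND", "local").lower() == "bhashini"
    has_key = bool(env.get("BHASHINI_API_KEY"))
    return "bhashini" if wants_remote and has_key else "local"
```

The clamped interval is the value the frontend would pass as the `timeslice` argument to `MediaRecorder.start()`, which takes milliseconds. Defaulting to the local model means a contributor without Bhashini credentials can still run the captioning flow end to end.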
