
GSoC 2026 Candidate Submission: End-to-End Narrative Audio Pipeline #39

Open
meganho456 wants to merge 7 commits into humanai-foundation:master from meganho456:gsoc2026-narrative-audio

Conversation

@meganho456

This PR contains my GSoC 2026 test submission for a complete narrative-audio workflow, including all required tasks and a bonus storytelling analysis component.

What’s included
Task 1: Audio Processing Pipeline
- Loads .wav recordings, normalizes audio, segments clips when needed, and extracts ML-ready features.
- Features include MFCCs, pitch, spectral centroid, RMS energy, and duration.
- Produces a structured feature dataset and normalized audio outputs.
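To illustrate two of the features listed above, here is a minimal NumPy-only sketch of RMS energy and spectral centroid extraction. The function name, return fields, and the synthetic test clip are illustrative assumptions, not the PR's actual API (which presumably uses a library such as librosa):

```python
import numpy as np

def extract_features(signal: np.ndarray, sr: int) -> dict:
    """Compute a few of the listed features (RMS energy, spectral
    centroid, duration) with plain NumPy. Illustrative sketch only."""
    # RMS energy over the whole clip
    rms = float(np.sqrt(np.mean(signal ** 2)))
    # Magnitude spectrum via a real FFT
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    # Spectral centroid: magnitude-weighted mean frequency
    centroid = float(np.sum(freqs * mags) / np.sum(mags))
    return {"rms": rms, "centroid_hz": centroid,
            "duration_s": len(signal) / sr}

# Synthetic test clip: 1 s of a 440 Hz sine at 16 kHz
sr = 16000
t = np.arange(sr) / sr
feats = extract_features(np.sin(2 * np.pi * 440 * t), sr)
```

For a pure 440 Hz tone the centroid lands at the tone's frequency and the RMS of a unit-amplitude sine is about 0.707, which makes this easy to sanity-check.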
Task 2: Narrative Tone Classification
- Trains a neural-network classifier on labeled emotional-tone data.
- Uses a train/test split and reports evaluation metrics (accuracy, weighted F1, per-class report).
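A compact stand-in for this training-and-evaluation loop, using scikit-learn's MLPClassifier on synthetic two-class data. The real submission trains on the Task 1 feature dataset; the architecture, labels, and data here are illustrative guesses:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for labeled emotional-tone features:
# two well-separated Gaussian clusters in 8 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(4, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)           # overall accuracy
f1 = f1_score(y_test, pred, average="weighted")  # weighted F1, as reported
```

On separable synthetic data both metrics approach 1.0; the interesting numbers are of course the ones on the real tone labels.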
Task 3: AI-Based Transcription
- Implements batch transcription with Whisper.
- Exports transcripts to plain text.
- Measures transcription quality on a subset using word error rate (WER).
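The WER metric used here is standard: the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A small self-contained sketch of the metric (not the PR's actual implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with the standard Levenshtein DP over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[-1][-1] / len(ref)
```

One substituted word in a four-word reference gives a WER of 0.25, which is a useful spot-check when wiring this up against Whisper output.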
Task 4: Narrative Audio Retrieval
- Implements a retrieval prototype for narrative-style queries (e.g., calm narration, high-energy speech, dramatic dialogue).
- Combines structured filtering with semantic ranking to return relevant recordings.
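The two-stage scheme described above (hard metadata filter, then semantic ranking) can be sketched on a toy index. The clip IDs, metadata fields, embedding vectors, and function names are all illustrative assumptions:

```python
import numpy as np

# Toy index: per-clip metadata plus an embedding vector.
clips = [
    {"id": "c1", "energy": 0.2, "vec": np.array([1.0, 0.1, 0.0])},
    {"id": "c2", "energy": 0.8, "vec": np.array([0.0, 1.0, 0.2])},
    {"id": "c3", "energy": 0.3, "vec": np.array([0.9, 0.2, 0.1])},
]

def retrieve(query_vec, max_energy, top_k=2):
    # Stage 1: structured filtering on metadata
    pool = [c for c in clips if c["energy"] <= max_energy]
    # Stage 2: semantic ranking by cosine similarity to the query
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(pool, key=lambda c: cos(query_vec, c["vec"]),
                  reverse=True)[:top_k]

# A "calm narration" query: low-energy filter plus an embedding
# that sits near the calm clips c1 and c3.
results = retrieve(np.array([1.0, 0.0, 0.0]), max_energy=0.5)
```

The filter removes the high-energy clip before ranking ever sees it, which keeps the semantic stage cheap and the results consistent with the structured query.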
Bonus: Storytelling Audio Analysis
- Analyzes storytelling-oriented cues: pacing/pauses, pitch variation, energy dynamics, and sentence-length characteristics.
- Adds a heuristic storytelling score and ranks clips by storytelling-like expressiveness.
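A heuristic score like this is typically a weighted blend of normalized cues. The sketch below combines three of the cues named above; the weights and normalizing constants are illustrative guesses, not the PR's values:

```python
import statistics

def storytelling_score(pitches_hz, pause_ratio, energies):
    """Heuristic expressiveness score in [0, 1]: rewards pitch variation,
    a healthy amount of pausing, and energy dynamics. Weights and the
    saturation constants below are illustrative assumptions."""
    pitch_var = min(statistics.pstdev(pitches_hz) / 50.0, 1.0)  # ~50 Hz std saturates
    pause = min(pause_ratio / 0.3, 1.0)                         # ~30% pauses saturates
    dynamics = min((max(energies) - min(energies)) / 0.5, 1.0)  # energy range
    return 0.4 * pitch_var + 0.3 * pause + 0.3 * dynamics

# Monotone read vs. a lively, paused, dynamic delivery
flat = storytelling_score([120, 121, 120], 0.02, [0.30, 0.31, 0.30])
lively = storytelling_score([110, 180, 140], 0.20, [0.10, 0.55, 0.25])
```

Clipping each cue into [0, 1] before weighting keeps any single feature from dominating the ranking, and makes the score comparable across clips of different lengths.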
Deliverables in this submission
- Full source code for Tasks 1–4 and the bonus task, plus a run_pipeline script that chains all the tasks together
- Technical report (PDF)
- README with setup and run instructions
- Example output artifacts (feature CSVs, transcripts, analysis outputs)

meganho456 and others added 7 commits March 29, 2026 06:27
- New task0_audio_capture/audio_capture.py with RollingBuffer (thread-safe
  circular deque), AudioCaptureStream (sounddevice callback at 64 ms chunks),
  and record_for_duration() convenience helper
- run_pipeline.py now captures 5 s from the microphone at startup and feeds
  the recording into the downstream tasks; falls back to pre-recorded file
  if no mic is available
- requirements.txt: add sounddevice>=0.4.6
- README.md: document Step 1 architecture, parameters, standalone usage,
  library API, and PortAudio install instructions

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
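The commit above describes a thread-safe circular deque for rolling audio capture. A minimal sketch of how such a buffer can look — the class name matches the commit, but the method names and internals here are guesses, not the actual task0_audio_capture code:

```python
import threading
from collections import deque

class RollingBuffer:
    """Keeps only the most recent `capacity` audio chunks.
    Illustrative sketch of a thread-safe circular deque."""

    def __init__(self, capacity: int):
        self._chunks = deque(maxlen=capacity)  # oldest chunks drop off automatically
        self._lock = threading.Lock()          # sounddevice callbacks run on an audio thread

    def push(self, chunk):
        with self._lock:
            self._chunks.append(chunk)

    def snapshot(self):
        # Copy under the lock so readers never observe a half-updated buffer
        with self._lock:
            return list(self._chunks)

buf = RollingBuffer(capacity=3)
for i in range(5):
    buf.push(i)   # after 5 pushes only the last 3 chunks remain
```

Using `deque(maxlen=...)` gives the circular behavior for free; the lock is what makes it safe to call `push` from the capture callback while another thread reads snapshots.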
New modules:
- emotion_classifier/: production EmotionClassifier with mfcc-mlp,
  wav2vec2, and hubert backends; parallel inference via ThreadPoolExecutor
- transcriber/: StreamingTranscriber with faster-whisper and
  openai-whisper backends; pause-triggered utterance segmentation
- utterance_buffer/: pause-triggered and fixed-window segmentation strategies
- vad_engine/: webrtcvad and silero-vad backends for speech detection
- output_generator/: CaptionFormatter, SRTWriter, CaptionBroadcaster
  (WebSocket), AtmosphereMapper, CrossfadeScheduler
- output_generator/overlay.html: browser caption overlay for OBS/streaming
- tests/: full test suite with fixtures for all pipeline steps

Updated:
- README.md: expanded with real-time pipeline architecture and usage
- requirements.txt: added faster-whisper, websockets, transformers
- run_pipeline.py: wired into new module structure
- .gitignore: exclude .claude/, __pycache__, *.pt checkpoints, *_cache.npz

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
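The pause-triggered utterance segmentation mentioned for transcriber/ and utterance_buffer/ can be sketched as follows: given per-frame VAD decisions, an utterance ends once a run of silent frames reaches a threshold. This is an illustrative sketch, not the PR's module:

```python
def segment_utterances(speech_flags, min_pause_frames=3):
    """Split a per-frame VAD sequence (True = speech) into utterances,
    closing one whenever `min_pause_frames` silent frames accumulate.
    Returns (first_frame, last_frame) index pairs. Illustrative only."""
    utterances, current, silence = [], [], 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            current.append(i)
            silence = 0
        else:
            silence += 1
            if silence >= min_pause_frames and current:
                utterances.append((current[0], current[-1]))
                current = []
    if current:  # flush a trailing utterance at end of stream
        utterances.append((current[0], current[-1]))
    return utterances

# Two bursts of speech separated by a 3-frame pause
flags = [True, True, False, False, False, True, True, True]
segs = segment_utterances(flags)
```

The pause threshold is the key latency/quality knob: shorter pauses cut utterances sooner (lower caption latency), longer ones avoid splitting mid-sentence.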
