
Feature: Multimodal Input Modules for Text Sentiment and Audio Emotion Recognition - GSOC-2026 #1990

@HITESH-S-P

Description


Labels: enhancement gsoc-2026 multimodal
Depends on: #1988


Text module

What it should do

Take a string of text (a participant's transcribed comment or their answer to a task question) and return a SentimentOutput with modality: "text".
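A minimal sketch of what the text module's interface could look like. The fields of SentimentOutput are an assumption here (its real definition comes from #1988), and the tiny word lists stand in for whatever model or lexicon the module ends up using:

```python
from dataclasses import dataclass

# Hypothetical shape of SentimentOutput; the real definition comes from #1988.
@dataclass
class SentimentOutput:
    label: str      # e.g. "positive", "negative", "neutral"
    score: float    # confidence in [0, 1]
    modality: str   # "text" or "audio"

# Tiny illustrative lexicons; a real module would use a trained model or a
# proper sentiment lexicon instead.
_POSITIVE = {"great", "easy", "love", "clear", "helpful"}
_NEGATIVE = {"confusing", "hard", "hate", "broken", "frustrating"}

def analyze_text(text: str) -> SentimentOutput:
    """Return a SentimentOutput for a transcribed participant comment."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & _POSITIVE)
    neg = len(words & _NEGATIVE)
    total = pos + neg
    if total == 0:
        return SentimentOutput("neutral", 0.0, "text")
    label = "positive" if pos >= neg else "negative"
    return SentimentOutput(label, max(pos, neg) / total, "text")
```

Usage: `analyze_text("The menu was confusing and hard to find")` yields a `SentimentOutput` with `label="negative"` and `modality="text"`.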

Intent detection (optional but valuable)

If time allows, it would be useful to also detect intent from the text: not just what emotion the participant expressed, but whether they're making a complaint, offering a suggestion, asking a question, or describing confusion. This doesn't need a separate model; a simple keyword/pattern approach is fine for now. The intent label can feed into the usability mapping the same way an emotion label does.
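A keyword/pattern pass of the kind described above could be as simple as an ordered list of regexes. The intent labels and patterns here are placeholders, not a spec:

```python
import re

# Hypothetical intent labels and patterns; the issue only asks for a simple
# keyword/pattern pass, so these would be tuned against real session transcripts.
_INTENT_PATTERNS = [
    ("question", re.compile(r"\?\s*$|^\s*(how|why|what|where|can)\b", re.I)),
    ("complaint", re.compile(r"\b(doesn't work|broken|annoying|hate)\b", re.I)),
    ("suggestion", re.compile(r"\b(should|could|it would be (nice|better))\b", re.I)),
    ("confusion", re.compile(r"\b(confus\w+|lost|don't understand|unclear)\b", re.I)),
]

def detect_intent(text: str) -> str:
    """Return the first matching intent label, or "other"."""
    for label, pattern in _INTENT_PATTERNS:
        if pattern.search(text):
            return label
    return "other"
```

The first matching rule wins, so ordering encodes priority (e.g. an explicit question mark outranks a confusion keyword).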


Audio module

What it should do

Take a path to an audio file (WAV or MP3, short clip from a usability session recording) and return a SentimentOutput with modality: "audio".
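A rough placeholder for the audio side, again assuming the SentimentOutput shape from #1988. This sketch only reads 16-bit PCM mono WAV via the stdlib and uses RMS energy as a crude arousal proxy; a real module would extract prosodic/spectral features or use a pretrained speech-emotion model, and MP3 support would need an external decoder (e.g. pydub/ffmpeg):

```python
import math
import struct
import wave
from dataclasses import dataclass

# Hypothetical shape of SentimentOutput; the real definition comes from #1988.
@dataclass
class SentimentOutput:
    label: str
    score: float
    modality: str

def analyze_audio(path: str) -> SentimentOutput:
    """Crude stand-in: label a WAV clip by its RMS energy (loud = aroused)."""
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    # Assumes 16-bit PCM mono; a real implementation would check sampwidth
    # and channel count and resample/decode as needed.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    if not samples:
        return SentimentOutput("calm", 0.0, "audio")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    label = "aroused" if rms > 0.3 else "calm"
    return SentimentOutput(label, rms, "audio")
```

Whatever features end up being used, the contract stays the same: path in, `SentimentOutput` with `modality="audio"` out.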

