
Feature: Multimodal Input Modules for Text Sentiment and Audio Emotion Recognition - GSOC-2026 #1990

@HITESH-S-P

Description


Labels: enhancement gsoc-2026 multimodal
Depends on: #1988


Text module

What it should do

Take a string of text (a participant's transcribed comment or their answer to a task question) and return a SentimentOutput with modality: "text".
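A minimal sketch of what the text module's interface could look like. The fields of SentimentOutput are an assumption here (its real definition comes from #1988), and the tiny word lists stand in for whatever model or lexicon the module ends up using:

```python
from dataclasses import dataclass

# Hypothetical shape of SentimentOutput; the real definition comes from #1988.
@dataclass
class SentimentOutput:
    label: str      # e.g. "positive", "negative", "neutral"
    score: float    # confidence in [0, 1]
    modality: str   # "text" or "audio"

# Tiny illustrative lexicons; a real module would use a trained model or a
# proper sentiment lexicon instead.
_POSITIVE = {"great", "easy", "love", "clear", "helpful"}
_NEGATIVE = {"confusing", "hard", "hate", "broken", "frustrating"}

def analyze_text(text: str) -> SentimentOutput:
    """Return a SentimentOutput for a transcribed participant comment."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & _POSITIVE)
    neg = len(words & _NEGATIVE)
    total = pos + neg
    if total == 0:
        return SentimentOutput("neutral", 0.0, "text")
    label = "positive" if pos >= neg else "negative"
    return SentimentOutput(label, max(pos, neg) / total, "text")
```

Usage: `analyze_text("The menu was confusing and hard to find")` yields a `SentimentOutput` with `label="negative"` and `modality="text"`.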

Intent detection (optional but valuable)

If time allows, it would be useful to also detect intent from the text: not just what emotion the participant expressed, but whether they're making a complaint, offering a suggestion, asking a question, or describing confusion. This doesn't need a separate model; a simple keyword/pattern approach is fine for now. The intent label can feed into the usability mapping the same way an emotion label does.
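A keyword/pattern pass of the kind described above could be as simple as an ordered list of regexes. The intent labels and patterns here are placeholders, not a spec:

```python
import re

# Hypothetical intent labels and patterns; the issue only asks for a simple
# keyword/pattern pass, so these would be tuned against real session transcripts.
_INTENT_PATTERNS = [
    ("question", re.compile(r"\?\s*$|^\s*(how|why|what|where|can)\b", re.I)),
    ("complaint", re.compile(r"\b(doesn't work|broken|annoying|hate)\b", re.I)),
    ("suggestion", re.compile(r"\b(should|could|it would be (nice|better))\b", re.I)),
    ("confusion", re.compile(r"\b(confus\w+|lost|don't understand|unclear)\b", re.I)),
]

def detect_intent(text: str) -> str:
    """Return the first matching intent label, or "other"."""
    for label, pattern in _INTENT_PATTERNS:
        if pattern.search(text):
            return label
    return "other"
```

The first matching rule wins, so ordering encodes priority (e.g. an explicit question mark outranks a confusion keyword).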


Audio module

What it should do

Take a path to an audio file (WAV or MP3, short clip from a usability session recording) and return a SentimentOutput with modality: "audio".
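A rough placeholder for the audio side, again assuming the SentimentOutput shape from #1988. This sketch only reads 16-bit PCM mono WAV via the stdlib and uses RMS energy as a crude arousal proxy; a real module would extract prosodic/spectral features or use a pretrained speech-emotion model, and MP3 support would need an external decoder (e.g. pydub/ffmpeg):

```python
import math
import struct
import wave
from dataclasses import dataclass

# Hypothetical shape of SentimentOutput; the real definition comes from #1988.
@dataclass
class SentimentOutput:
    label: str
    score: float
    modality: str

def analyze_audio(path: str) -> SentimentOutput:
    """Crude stand-in: label a WAV clip by its RMS energy (loud = aroused)."""
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    # Assumes 16-bit PCM mono; a real implementation would check sampwidth
    # and channel count and resample/decode as needed.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    if not samples:
        return SentimentOutput("calm", 0.0, "audio")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    label = "aroused" if rms > 0.3 else "calm"
    return SentimentOutput(label, rms, "audio")
```

Whatever features end up being used, the contract stays the same: path in, `SentimentOutput` with `modality="audio"` out.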

