tito commented Oct 22, 2025

Summary

Adds a transcript_format query parameter to the GET /v1/transcripts/{id} endpoint, enabling multiple output formats for transcript data. Response models use Pydantic discriminated unions so that each format has a type-safe API response.

Formats Supported

text (default)

Plain dialogue with speaker names.

Query:

GET /v1/transcripts/{id}?transcript_format=text

Response:

{
  "id": "transcript_123",
  "title": "Product Meeting",
  "duration": 1847.5,
  "transcript_format": "text",
  "transcript": "John Smith: Hello everyone, welcome to the meeting.\nJane Doe: Thanks for having me.\nJohn Smith: Let's get started.",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
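The conversion code quoted later in this conversation calls a get_speaker_name helper that is not shown. As a rough sketch only, the text format could be assembled as below. The Participant dataclass, the to_text function, and the "Speaker N" fallback are assumptions made for illustration, not the PR's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical stand-in for the participant objects in the response.
    id: str
    speaker: int
    name: str


def get_speaker_name(speaker: int, participants: list[Participant] | None) -> str:
    """Return the participant name for a speaker index, or a generic label."""
    for participant in participants or []:
        if participant.speaker == speaker:
            return participant.name
    # Assumed fallback; the real helper may behave differently.
    return f"Speaker {speaker}"


def to_text(
    segments: list[tuple[int, str]], participants: list[Participant] | None
) -> str:
    """Join (speaker, text) pairs into 'Name: text' lines, one per segment."""
    return "\n".join(
        f"{get_speaker_name(speaker, participants)}: {text.strip()}"
        for speaker, text in segments
    )
```

Passing None for participants exercises the fallback path, which is worth covering in tests because the participants field is declared as nullable in the response models.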

text-timestamped

Dialogue with [MM:SS] timestamp prefixes.

Query:

GET /v1/transcripts/{id}?transcript_format=text-timestamped

Response:

{
  "id": "transcript_123",
  "title": "Product Meeting",
  "duration": 1847.5,
  "transcript_format": "text-timestamped",
  "transcript": "[00:00] John Smith: Hello everyone, welcome to the meeting.\n[00:03] Jane Doe: Thanks for having me.\n[00:07] John Smith: Let's get started.",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
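The [MM:SS] prefix comes from a format_timestamp_mmss helper whose body is not included in this conversation. One plausible version is sketched here, assuming fractional seconds are truncated and minutes are not wrapped at one hour. Both behaviors are assumptions.

```python
def format_timestamp_mmss(seconds: float) -> str:
    """Format a second offset as MM:SS, truncating fractional seconds."""
    total = int(seconds)
    minutes, secs = divmod(total, 60)
    # Assumption: minutes are not wrapped at 60, so 3700 s renders as 61:40.
    return f"{minutes:02d}:{secs:02d}"
```

With this sketch, the 1847.5 second duration from the example response would render as 30:47.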

webvtt-named

WebVTT subtitle format with participant names.

Query:

GET /v1/transcripts/{id}?transcript_format=webvtt-named

Response:

{
  "id": "transcript_123",
  "title": "Product Meeting",
  "duration": 1847.5,
  "transcript_format": "webvtt-named",
  "transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:03.000\n<v John Smith>Hello everyone, welcome to the meeting.\n\n00:00:03.000 --> 00:00:05.500\n<v Jane Doe>Thanks for having me.",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
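The PR's converter builds cues through a webvtt module and a private _seconds_to_timestamp helper. The sketch below reproduces the same output shape with plain string formatting, to show how the cue timings and voice tags fit together. The function names and the tuple layout are illustrative assumptions.

```python
def seconds_to_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, the cue timing form shown above."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[tuple[float, float, str, str]]) -> str:
    """Build a WebVTT document from (start, end, speaker_name, text) tuples."""
    blocks = ["WEBVTT"]
    for start, end, name, text in cues:
        timing = f"{seconds_to_timestamp(start)} --> {seconds_to_timestamp(end)}"
        # The <v Name> voice tag carries the participant name.
        blocks.append(f"{timing}\n<v {name}>{text.strip()}")
    # Cues are separated from the header and from each other by a blank line.
    return "\n\n".join(blocks)
```

This sketch does not escape characters such as ">" in participant names, which a production converter would need to consider.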

json

Structured segments with full metadata.

Query:

GET /v1/transcripts/{id}?transcript_format=json

Response:

{
  "id": "transcript_123",
  "title": "Product Meeting",
  "duration": 1847.5,
  "transcript_format": "json",
  "transcript": [
    {
      "speaker": 0,
      "speaker_name": "John Smith",
      "text": "Hello everyone, welcome to the meeting.",
      "start": 0.0,
      "end": 3.0
    },
    {
      "speaker": 1,
      "speaker_name": "Jane Doe",
      "text": "Thanks for having me.",
      "start": 3.0,
      "end": 5.5
    },
    ...
  ],
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
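Because the json format exposes start and end per segment, clients can aggregate over it directly. As an illustration, a consumer might total the speaking time per participant. The speaking_time function is hypothetical and relies only on the fields shown in the response above.

```python
import json
from collections import defaultdict


def speaking_time(payload: str) -> dict[str, float]:
    """Sum segment durations per speaker_name from a json-format response body."""
    data = json.loads(payload)
    if data.get("transcript_format") != "json":
        raise ValueError("expected a response with transcript_format=json")
    totals: dict[str, float] = defaultdict(float)
    for segment in data["transcript"]:
        totals[segment["speaker_name"]] += segment["end"] - segment["start"]
    return dict(totals)
```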

Technical Details

  • Uses Pydantic discriminated unions with transcript_format as discriminator
  • Default format is text for backward compatibility
  • POST/PATCH endpoints return GetTranscriptWithParticipants (minimal response)
  • GET endpoint returns format-specific models based on query parameter
  • 15 new tests added, all existing tests pass
  • Documentation added in docs/transcript.md
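On the client side, the transcript_format field can serve the same discriminating role it plays in the Pydantic models. The following is a minimal sketch using only the standard library, with a hypothetical extract_transcript function. It assumes the payload has already been decoded from JSON into a dict.

```python
from __future__ import annotations

from typing import Any

# The three formats whose transcript field is a single string.
STRING_FORMATS = {"text", "text-timestamped", "webvtt-named"}


def extract_transcript(payload: dict[str, Any]) -> str | list[dict[str, Any]]:
    """Return the transcript body, checking it against the declared format."""
    fmt = payload.get("transcript_format")
    transcript = payload.get("transcript")
    if fmt in STRING_FORMATS:
        if not isinstance(transcript, str):
            raise TypeError(f"{fmt} transcript must be a string")
        return transcript
    if fmt == "json":
        if not isinstance(transcript, list):
            raise TypeError("json transcript must be a list of segments")
        return transcript
    raise ValueError(f"unknown transcript_format: {fmt!r}")
```

Branching on the discriminator first, then checking the shape of transcript, mirrors how a discriminated union selects one model before validating its fields.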

@pr-agent-monadical
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Type Consistency

The PR changes the return type of transcript_get but doesn't update all callers. The GetTranscriptWithParticipants class is used in some places where GetTranscript was previously used, which could cause type compatibility issues.

class GetTranscriptWithParticipants(GetTranscriptMinimal):
    participants: list[TranscriptParticipant] | None


class GetTranscriptWithText(GetTranscriptWithParticipants):
    """
    Transcript response with plain text format.

    Format: Speaker names followed by their dialogue, one line per segment.
    Example:
        John Smith: Hello everyone
        Jane Doe: Hi there
    """

    transcript_format: Literal["text"] = "text"
    transcript: str


class GetTranscriptWithTextTimestamped(GetTranscriptWithParticipants):
    """
    Transcript response with timestamped text format.

    Format: [MM:SS] timestamp prefix before each speaker and dialogue.
    Example:
        [00:00] John Smith: Hello everyone
        [00:05] Jane Doe: Hi there
    """

    transcript_format: Literal["text-timestamped"] = "text-timestamped"
    transcript: str


class GetTranscriptWithWebVTTNamed(GetTranscriptWithParticipants):
    """
    Transcript response in WebVTT subtitle format with participant names.

    Format: Standard WebVTT with voice tags using participant names.
    Example:
        WEBVTT

        00:00:00.000 --> 00:00:05.000
        <v John Smith>Hello everyone
    """

    transcript_format: Literal["webvtt-named"] = "webvtt-named"
    transcript: str


class GetTranscriptWithJSON(GetTranscriptWithParticipants):
    """
    Transcript response as structured JSON segments.

    Format: Array of segment objects with speaker info, text, and timing.
    Example:
        [
            {
                "speaker": 0,
                "speaker_name": "John Smith",
                "text": "Hello everyone",
                "start": 0.0,
                "end": 5.0
            }
        ]
    """

    transcript_format: Literal["json"] = "json"
    transcript: list[TranscriptSegment]


GetTranscript = Annotated[
    GetTranscriptWithText
    | GetTranscriptWithTextTimestamped
    | GetTranscriptWithWebVTTNamed
    | GetTranscriptWithJSON,
    Discriminator("transcript_format"),
]

Error Handling

The transcript format conversion functions don't have explicit error handling for malformed input data. Consider adding try/except blocks to handle potential exceptions from processing invalid transcript data.

def transcript_to_text(
    topics: list[TranscriptTopic], participants: list[TranscriptParticipant] | None
) -> str:
    """Convert transcript topics to plain text with speaker names."""
    lines = []
    for topic in topics:
        if not topic.words:
            continue

        transcript = ProcessorTranscript(words=topic.words)
        segments = transcript.as_segments()

        for segment in segments:
            speaker_name = get_speaker_name(segment.speaker, participants)
            text = segment.text.strip()
            lines.append(f"{speaker_name}: {text}")

    return "\n".join(lines)


def transcript_to_text_timestamped(
    topics: list[TranscriptTopic], participants: list[TranscriptParticipant] | None
) -> str:
    """Convert transcript topics to timestamped text with speaker names."""
    lines = []
    for topic in topics:
        if not topic.words:
            continue

        transcript = ProcessorTranscript(words=topic.words)
        segments = transcript.as_segments()

        for segment in segments:
            speaker_name = get_speaker_name(segment.speaker, participants)
            timestamp = format_timestamp_mmss(segment.start)
            text = segment.text.strip()
            lines.append(f"[{timestamp}] {speaker_name}: {text}")

    return "\n".join(lines)


def topics_to_webvtt_named(
    topics: list[TranscriptTopic], participants: list[TranscriptParticipant] | None
) -> str:
    """Convert transcript topics to WebVTT format with participant names."""
    vtt = webvtt.WebVTT()

    for topic in topics:
        if not topic.words:
            continue

        segments = words_to_segments(topic.words)

        for segment in segments:
            speaker_name = get_speaker_name(segment.speaker, participants)
            text = segment.text.strip()
            text = f"<v {speaker_name}>{text}"

            caption = webvtt.Caption(
                start=_seconds_to_timestamp(segment.start),
                end=_seconds_to_timestamp(segment.end),
                text=text,
            )
            vtt.captions.append(caption)

    return vtt.content


def transcript_to_json_segments(
    topics: list[TranscriptTopic], participants: list[TranscriptParticipant] | None
) -> list[TranscriptSegment]:
    """Convert transcript topics to a flat list of JSON segments."""
    segments = []

    for topic in topics:
        if not topic.words:
            continue

        transcript = ProcessorTranscript(words=topic.words)
        for segment in transcript.as_segments():
            speaker_name = get_speaker_name(segment.speaker, participants)
            segments.append(
                TranscriptSegment(
                    speaker=segment.speaker,
                    speaker_name=speaker_name,
                    text=segment.text.strip(),
                    start=segment.start,
                    end=segment.end,
                )
            )

    return segments
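One way to act on the error handling suggestion without repeating try/except in every converter is a small wrapper. This is a sketch only: TranscriptFormatError and convert_safely are hypothetical names, and the set of caught exceptions is an assumption about what malformed transcript data might raise.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TranscriptFormatError(Exception):
    """Hypothetical error raised when a transcript cannot be rendered."""


def convert_safely(converter: Callable[..., T], *args: object, format_name: str) -> T:
    """Run a converter and re-raise data errors with the format name attached."""
    try:
        return converter(*args)
    except (AttributeError, KeyError, TypeError, ValueError) as exc:
        # Chain the original exception so the root cause stays in the traceback.
        raise TranscriptFormatError(
            f"could not render transcript as {format_name!r}: {exc}"
        ) from exc
```

An endpoint could then call, for example, convert_safely(transcript_to_text, topics, participants, format_name="text") and map TranscriptFormatError to an appropriate HTTP error response.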

@pr-agent-monadical
Contributor

PR Code Suggestions ✨

No code suggestions found for the PR.

Add transcript_format query parameter to /v1/transcripts/{id} endpoint
with support for multiple output formats using discriminated unions.

Formats supported:
- text: Plain speaker dialogue (default)
- text-timestamped: Dialogue with [MM:SS] timestamps
- webvtt-named: WebVTT subtitles with participant names
- json: Structured segments with full metadata

Response models use Pydantic discriminated unions with transcript_format
as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants
for minimal responses. GET endpoint returns format-specific models.