File upload support: PDF, docx, markdown for admin and participant inputs #5

## Problem

Both the admin flow (hackathon setup) and the participant flow (idea submission) currently accept plain text only. Real-world usage requires document uploads: hackathon rules arrive as PDFs, and ideas are written in docx or markdown.

## Current state

- Admin: text chat with DeepSeek via `/init` endpoint
- Participant: `idea_text: str` field in submission JSON
- No file upload endpoint exists
- Frontend will have a file input prop ready (non-functional until backend support lands)

## Proposed solution

### Participant side (ingestion/normalization node)
- Accept file uploads (PDF, docx, markdown) as base64 in submission payload
- Ingestion node extracts text based on file type → conditionally summarizes if too long
- Normalized text feeds into embeddings + scoring
- Tools: `extract_pdf`, `extract_docx`, `parse_markdown` — agent picks the right one based on input
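
A minimal sketch of the participant-side flow above, under stated assumptions. The issue has the agent choose the tool; this sketch swaps in a plain lookup table as a deterministic stand-in so the control flow is easy to follow. `normalize_submission`, `MAX_CHARS`, and the `extractors`/`summarize` callables are hypothetical names, and the issue does not define the length at which text counts as "too long".

```python
import base64
from typing import Callable, Dict, Optional

# Hypothetical threshold: the issue does not say what "too long" means.
MAX_CHARS = 8000


def normalize_submission(
    idea_text: Optional[str],
    idea_file: Optional[str],
    idea_file_type: Optional[str],
    extractors: Dict[str, Callable[[bytes], str]],
    summarize: Callable[[str], str],
    max_chars: int = MAX_CHARS,
) -> str:
    """Return the normalized text that would feed embeddings and scoring."""
    if idea_file is None:
        # No upload: fall back to the existing plain-text field.
        text = idea_text or ""
    else:
        # Deterministic stand-in for the agent's tool choice.
        extractor = extractors.get((idea_file_type or "").lower())
        if extractor is None:
            raise ValueError(f"unsupported file type: {idea_file_type!r}")
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(idea_file, validate=True)
        text = extractor(raw)

    text = text.strip()
    if not text:
        raise ValueError("submission contains no extractable text")
    # Summarize only when the extracted text exceeds the threshold.
    if len(text) > max_chars:
        text = summarize(text)
    return text
```

In the real node, `extractors` would map file types to `extract_pdf`, `extract_docx`, and `parse_markdown`, and `summarize` would wrap an LLM call.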

### Admin side (init handler)
- Accept file upload (hackathon rules/guidelines document) in `/init` payload
- Quick text extraction → keyword scan for criteria/guidelines/theme sections
- Send relevant sections to LLM for structured config extraction
- If document is missing required info, ask admin for clarification in follow-up turn
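
The keyword scan could look roughly like the sketch below. It assumes that the extracted text can be split into blank-line-separated blocks and that a block is relevant when it mentions one of the three topics named above. `scan_sections` and `KEYWORDS` are hypothetical names. The returned `missing` list is one possible trigger for the clarification turn.

```python
import re
from typing import List, Sequence, Tuple

# Topics named in the issue; the exact keyword list is an assumption.
KEYWORDS = ("criteria", "guidelines", "theme")


def scan_sections(
    text: str, keywords: Sequence[str] = KEYWORDS
) -> Tuple[List[str], List[str]]:
    """Return (relevant blocks, keywords that never appeared)."""
    # Treat each blank-line-separated block as one candidate section.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    relevant: List[str] = []
    found = set()
    for block in blocks:
        lowered = block.lower()
        # Match the keyword at a word start, case-insensitively.
        hits = {k for k in keywords if re.search(rf"\b{re.escape(k)}", lowered)}
        if hits:
            relevant.append(block)
            found |= hits
    missing = [k for k in keywords if k not in found]
    return relevant, missing
```

Only `relevant` would be sent to the LLM for structured config extraction. A non-empty `missing` list suggests the handler should ask the admin about those topics in the next turn.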

### Libraries
- PDF: `pdfplumber`
- Docx: `python-docx`
- Markdown: regex strip or `mistune`
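
A hedged sketch of the three extraction tools using the libraries listed above. The tool names come from this issue; the `bytes -> str` signature is an assumption. The PDF and docx functions import their library lazily so the module still loads where only markdown support is needed. The markdown function takes the regex-strip option rather than `mistune`, and handles common syntax only.

```python
import io
import re


def extract_pdf(data: bytes) -> str:
    """Extract text from every page of a PDF with pdfplumber."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(io.BytesIO(data)) as pdf:
        # extract_text() can yield nothing for image-only pages.
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n\n".join(pages).strip()


def extract_docx(data: bytes) -> str:
    """Extract paragraph text from a docx file with python-docx."""
    from docx import Document  # third-party: pip install python-docx

    document = Document(io.BytesIO(data))
    # Note: text inside tables is not part of document.paragraphs.
    return "\n".join(p.text for p in document.paragraphs).strip()


def parse_markdown(data: bytes) -> str:
    """Strip common markdown syntax with regexes, keeping the text."""
    text = data.decode("utf-8", errors="replace")
    text = re.sub(r"^```.*$", "", text, flags=re.MULTILINE)       # fence lines
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)         # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)          # links -> label
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"^[ \t]{0,3}>[ \t]?", "", text, flags=re.MULTILINE)       # blockquotes
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)       # bullets
    text = re.sub(r"\*\*|\*|`", "", text)                         # emphasis, inline code
    return text.strip()


# Lookup table a caller (or the agent's tool registry) could use.
EXTRACTORS = {"pdf": extract_pdf, "docx": extract_docx, "markdown": parse_markdown}
```

Underscore emphasis is left alone on purpose so identifiers such as `idea_text` survive the strip.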

## Scope
- Add `idea_file` + `idea_file_type` optional fields to submission model
- Add `file` + `file_type` optional fields to init request
- Extraction tools as agent-callable functions inside TEE
- Conditional summarization for long extracted text
- Error handling for malformed/empty files
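
One possible shape for the new optional fields and the malformed/empty-file checks, sketched with stdlib dataclasses because the issue does not name the model framework. The field names come from the scope list. `FileInputError`, `decode_upload`, and `ALLOWED_FILE_TYPES` are hypothetical, as is making `idea_text` optional once a file can replace it. Other fields of the init request are omitted.

```python
import base64
import binascii
from dataclasses import dataclass
from typing import Optional

# Assumed type labels; the issue lists the formats but not the exact strings.
ALLOWED_FILE_TYPES = {"pdf", "docx", "markdown"}


class FileInputError(ValueError):
    """Raised when an upload is of an unknown type, malformed, or empty."""


@dataclass
class Submission:
    idea_text: Optional[str] = None
    idea_file: Optional[str] = None       # base64-encoded file contents
    idea_file_type: Optional[str] = None  # one of ALLOWED_FILE_TYPES


@dataclass
class InitRequest:
    file: Optional[str] = None            # base64-encoded rules document
    file_type: Optional[str] = None       # one of ALLOWED_FILE_TYPES


def decode_upload(file_b64: Optional[str], file_type: Optional[str]) -> Optional[bytes]:
    """Validate and decode an optional upload; return None when absent."""
    if file_b64 is None:
        return None
    if file_type not in ALLOWED_FILE_TYPES:
        raise FileInputError(f"unsupported file type: {file_type!r}")
    try:
        data = base64.b64decode(file_b64, validate=True)
    except binascii.Error as exc:
        raise FileInputError("file is not valid base64") from exc
    if not data:
        raise FileInputError("file is empty")
    return data
```

The same helper serves both endpoints, for example `decode_upload(sub.idea_file, sub.idea_file_type)` for a submission and `decode_upload(req.file, req.file_type)` for `/init`.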

## Context

Part of making the pipeline genuinely agentic: the ingestion node makes non-deterministic tool-call decisions based on input format and content length. It also unblocks real-world usage, where participants have their ideas in documents rather than raw text.
