Problem
Currently only plain text input is supported for both admin (hackathon setup) and participant (idea submission). Real-world usage requires document uploads — hackathon rules as PDFs, ideas written in docx/markdown, etc.
Current state
- Admin: text chat with DeepSeek via /init endpoint
- Participant: idea_text: str field in submission JSON
- No file upload endpoint exists
- Frontend will have a file input prop ready (non-functional until backend support lands)
Proposed solution
Participant side (ingestion/normalization node)
- Accept file uploads (PDF, docx, markdown) as base64 in submission payload
- Ingestion node extracts text based on file type → conditionally summarizes if too long
- Normalized text feeds into embeddings + scoring
- Tools: extract_pdf, extract_docx, parse_markdown; the agent picks the right one based on input
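A minimal sketch of the normalization step described above, assuming the upload has already been base64-decoded to bytes. The function name normalize_idea, the MAX_CHARS threshold, and the summarize callable are hypothetical, and the dictionary lookup is only a deterministic stand-in for the agent's tool choice:

```python
from typing import Callable, Dict

# Assumed threshold; the issue only says "too long" without giving a number.
MAX_CHARS = 4000


def normalize_idea(
    raw: bytes,
    file_type: str,
    extractors: Dict[str, Callable[[bytes], str]],
    summarize: Callable[[str], str],
    max_chars: int = MAX_CHARS,
) -> str:
    """Extract text for the given file type and summarize only if it is long."""
    extract = extractors.get(file_type)
    if extract is None:
        raise ValueError(f"unsupported file type: {file_type!r}")

    text = extract(raw).strip()
    if not text:
        raise ValueError("no extractable text in upload")

    # Conditional summarization: short ideas pass through unchanged.
    return summarize(text) if len(text) > max_chars else text
```

The returned string is what would feed into embeddings and scoring. Passing extractors and summarize as arguments keeps the sketch testable without a PDF library or an LLM call.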
Admin side (init handler)
- Accept file upload (hackathon rules/guidelines document) in /init payload
- Quick text extraction → keyword scan for criteria/guidelines/theme sections
- Send relevant sections to LLM for structured config extraction
- If document is missing required info, ask admin for clarification in follow-up turn
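The keyword scan and the "missing info" check could look roughly like the sketch below. The keyword list, the paragraph-level split, and both function names are assumptions for illustration; the real handler may segment the document differently:

```python
import re
from typing import Dict, List, Sequence

# Assumed keywords, taken from the "criteria/guidelines/theme" wording above.
SECTION_KEYWORDS = ("criteria", "guideline", "theme")


def find_relevant_sections(
    text: str, keywords: Sequence[str] = SECTION_KEYWORDS
) -> Dict[str, List[str]]:
    """Split extracted text on blank lines and keep paragraphs mentioning a keyword."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        kw: [p for p in paragraphs if kw in p.lower()]
        for kw in keywords
    }


def missing_info(hits: Dict[str, List[str]]) -> List[str]:
    """Keywords with no matching paragraph; candidates for a follow-up question."""
    return [kw for kw, paras in hits.items() if not paras]
```

Only the matched paragraphs would be sent to the LLM for structured config extraction, and a non-empty missing_info result would trigger the clarification turn with the admin.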
Libraries
- PDF: pdfplumber
- Docx: python-docx
- Markdown: regex strip or mistune
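For reference, the three extraction tools might be sketched as follows. The function names come from the tool list above; taking bytes as input and the specific regexes are assumptions, and the Markdown variant shows only the "regex strip" option, not mistune. The third-party imports are done lazily so the Markdown path works without them installed:

```python
import io
import re


def extract_pdf(data: bytes) -> str:
    """Concatenate the text of every page using pdfplumber."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(io.BytesIO(data)) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join((page.extract_text() or "") for page in pdf.pages)


def extract_docx(data: bytes) -> str:
    """Join paragraph text using python-docx (tables and headers are ignored)."""
    from docx import Document  # third-party: pip install python-docx

    document = Document(io.BytesIO(data))
    return "\n".join(p.text for p in document.paragraphs)


def parse_markdown(data: bytes) -> str:
    """Rough regex strip of common Markdown syntax; not a full parser."""
    text = data.decode("utf-8", errors="replace")
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)  # images -> alt text
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # links -> link text
    text = re.sub(r"^[ ]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"\*\*|\*|`", "", text)  # bold, italic, and code markers
    return text.strip()
```

The regex strip is lossy (it also removes "*" list markers and literal asterisks), which is usually acceptable for embedding input but is one reason to consider mistune instead.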
Scope
- Add idea_file + idea_file_type optional fields to submission model
- Add file + file_type optional fields to init request
- Extraction tools as agent-callable functions inside TEE
- Conditional summarization for long extracted text
- Error handling for malformed/empty files
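A rough sketch of the new optional submission fields and the malformed/empty-file checks, using a plain dataclass as a stand-in for the real submission model. The ALLOWED_FILE_TYPES values, the InvalidUploadError class, and decode_upload are hypothetical names; the init request would follow the same shape with file and file_type:

```python
import base64
import binascii
from dataclasses import dataclass
from typing import Optional

# Assumed type labels; the issue lists PDF, docx, and markdown as formats.
ALLOWED_FILE_TYPES = {"pdf", "docx", "markdown"}


class InvalidUploadError(ValueError):
    """Raised when an uploaded file cannot be used (hypothetical error type)."""


@dataclass
class Submission:
    idea_text: str
    idea_file: Optional[str] = None       # base64-encoded file contents
    idea_file_type: Optional[str] = None  # one of ALLOWED_FILE_TYPES


def decode_upload(submission: Submission) -> Optional[bytes]:
    """Return the decoded file bytes, or None when no file was attached."""
    if submission.idea_file is None:
        return None
    if submission.idea_file_type not in ALLOWED_FILE_TYPES:
        raise InvalidUploadError(
            f"unsupported idea_file_type: {submission.idea_file_type!r}"
        )
    try:
        data = base64.b64decode(submission.idea_file, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise InvalidUploadError("idea_file is not valid base64") from exc
    if not data:
        raise InvalidUploadError("idea_file is empty")
    return data
```

Keeping both new fields optional means existing plain-text submissions remain valid without changes.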
Context
Part of making the pipeline genuinely agentic — the ingestion node makes non-deterministic tool call decisions based on input format and content length. Also unblocks real-world usage where participants have their ideas in documents rather than raw text.