Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions DEVELOPMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,3 +177,33 @@ docker run --rm \
--net=host \
localhost:5001/kebab:latest
```

### File uploads (artifacts)

The chat UI supports uploading files to agents running on the **Go ADK runtime**.
The artifact round trip (persistence, the `save_artifact`/`load_artifacts` tools,
and surfacing agent-produced files back to the UI) is Go ADK only; the Python
runtime mirrors just the text-extraction path so uploaded documents still reach
non-multimodal models. Files travel inline (base64) over the existing A2A message
channel as `FilePart`s — there is no separate artifact HTTP API or CRD field.

- **Storage:** Uploaded files are persisted via the ADK in-memory artifact
service (`artifact.InMemoryService()`), so artifacts live for the lifetime of
the agent process and are not durable across restarts.
- **Access from agents:** The `load_artifacts` tool is registered for every
agent, letting the LLM list/load uploaded files. Inbound uploads are saved
automatically (`SaveInputBlobsAsArtifacts`).
- **Reaching the model:** Images are forwarded inline to the model. Rich
documents (PDF, DOCX, XLSX, PPTX, HTML, EPUB) are extracted to text/markdown
so the model can read them; text-like files (txt/markdown/CSV/JSON) are
passed through as-is. The Go runtime uses
[tabula](https://github.com/tsawler/tabula) (pure Go, no CGO for native
text); the Python runtime uses
[markitdown](https://github.com/microsoft/markitdown). The supported document
set is kept common across both runtimes and the UI. Scanned/image-only PDFs
yield no text without OCR.
- **Allowlist (enforced in the UI):** images, PDF, plain text, Markdown, CSV,
JSON, XML, YAML, HTML, and DOCX, XLSX, PPTX, EPUB documents.
- **Size limit:** 10 MB per file by default, enforced both client-side and on
the server. Override the server limit with the `KAGENT_MAX_ARTIFACT_BYTES`
environment variable (value in bytes); oversized inbound files fail the task.
174 changes: 174 additions & 0 deletions design/EP-89405-file-upload-artifacts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
# EP-89405: File Upload & Artifact Support in Chat

* Issue: [MSFN-89405] (internal) — feat: file upload feature
* Status: `implemented` (branch `feature/file-upload-artifacts`, commit `ccfada50`)
* Related: EP-2046 (Chat UI for MCP UI widgets) — shares the chat files but
explicitly excludes the file-upload backend, which this EP owns.

## Background

kagent's chat could only exchange text. Users frequently need to hand an agent a
file (a log, a PDF, a CSV, a screenshot) and to receive files an agent/tool
produces (a generated report, an extracted table). The Go ADK runtime ships an
artifact subsystem (`artifact.Service`, `loadartifactstool`, the
`SaveInputBlobsAsArtifacts` run option, and the per-event `ArtifactDelta`
signal), but kagent serves agents purely over A2A (`adka2a`) and never wired the
`ArtifactService` into the runner — so none of it was reachable.

This EP enables an **end-to-end file upload / download round trip** in chat:

1. Users attach files in the chat UI; they travel inline (base64) over the
existing A2A message/SSE channel — **no new HTTP API, no CRD field**.
2. The Go ADK executor persists inbound uploads as artifacts and surfaces
agent-produced artifacts back to the UI as downloadable A2A file parts.
3. Agents get a `save_artifact` tool (produce files) plus the built-in
`load_artifacts` tool (reference uploaded/produced files across turns).
4. Both runtimes extract text from uploaded files (notably PDF) so models that
cannot natively read a document still receive its content.

## Motivation

- Let users give agents real working material instead of pasting text.
- Let agents return generated files (reports, transformed data) as first-class,
downloadable chat attachments rather than dumping content into the message.
- Reuse the ADK's existing, battle-tested artifact contract rather than inventing
a parallel storage/transport.

### Goals

- Wire the ADK in-memory `ArtifactService` into the Go runner (process-lifetime
persistence, versioned, user/session scoped).
- Accept inbound A2A file parts, persist them via `SaveInputBlobsAsArtifacts`,
and pass them inline to the model.
- Emit agent-saved artifacts back to the UI as A2A `FilePart` events, driven by
the `ArtifactDelta` event signal (event-driven, no store diffing).
- Register `loadartifactstool` (load) and a new `save_artifact` tool (produce)
for agents.
- Extract text from uploaded documents (PDF first) in both the Go ADK
(`fileextract`) and Python (`_file_extract`) model paths so non-multimodal
models still receive document content.
- Chat UI: attach multiple files with type/size validation; render image
thumbnails and downloadable file chips for both user and agent bubbles.
- Raise the nginx/proxy request-body limit so uploads aren't rejected at the edge.

### Non-Goals

- Durable / shared artifact storage (GCS, DB) and cross-replica access. Artifacts
live in process memory and are lost on pod restart.
- A standalone artifact browser UI (list / delete / version history) beyond the
per-message chips.
- A dedicated artifact HTTP/storage API for the UI (everything rides A2A).
- The MCP-app / minimap chat features that share the same chat files (EP-2046).

## Implementation Details

### Transport & data model

- **Upload:** `@a2a-js/sdk` `FilePart` `{ kind: "file", file: { name, mimeType,
bytes /*base64*/ } }` appended alongside the text part on `message/stream`.
- **Download:** A2A artifact-update events carrying a `FilePart`.
- **ADK artifact value:** `*genai.Part` with `InlineData {Data, MIMEType,
DisplayName}`; key `(AppName, UserID, SessionID, FileName, Version)`; versions
auto-increment; `ArtifactDelta map[filename]version` set automatically on save.

### Go ADK runtime

- **`go/adk/pkg/runner/adapter.go`** — set
`ArtifactService: adkartifact.InMemoryService()` on the `runner.Config` in
`CreateRunnerConfig` (single instance, reused for process-lifetime persistence).
- **`go/adk/pkg/agent/agent.go`** — register `loadartifactstool.New()` and the
new `save_artifact` tool in the agent's local tool set.
- **`go/adk/pkg/tools/save_artifact_tool.go`** — new tool letting the LLM produce
a downloadable file from chat; the executor surfaces it as an A2A file part.
- **`go/adk/pkg/a2a/executor.go`** — enable
`runConfig.SaveInputBlobsAsArtifacts = true`; on each event with
`Actions.ArtifactDelta`, `Load` each `(name, version)` from the store, set
`InlineData.DisplayName`, convert via `ToA2APart`, and emit an A2A
artifact-update `FilePart` event (`LastChunk=true`). Load/convert errors are
logged and skipped so the turn continues.
- **`go/adk/pkg/a2a/artifacts.go`** — helpers for building/emitting artifact
events from saved ADK parts.
- **`go/adk/pkg/fileextract/`** (`fileextract.go`, `pdf.go`) — extract text from
uploaded documents (PDF and other supported types) so the content is injected
for models that can't read the raw file.
- **`go/adk/pkg/models/openai_adk.go`** — inject extracted file text into the
OpenAI request path.
- New Go deps in `go/go.mod` / `go/go.sum` for PDF extraction.

### Python runtime

- **`python/packages/kagent-adk/src/kagent/adk/models/_file_extract.py`** — text
extraction (PDF, etc.) mirroring the Go path.
- **`python/packages/kagent-adk/src/kagent/adk/models/_openai.py`** — inject
extracted file content into the OpenAI request.
- **`python/packages/kagent-adk/pyproject.toml`** — add the extraction dependency.

> Note: the original design scoped Python as a follow-up; the shipped
> implementation includes the Python extraction path as well.

### UI (Next.js, `ui/src`)

- **`lib/fileUpload.ts`** — `FILE_ACCEPT`, `MAX_FILE_BYTES` (10 MB), `isAllowedFile`,
`fileToFilePart` (read file → base64 `FilePart`). Allowlist: images, PDF,
text/markdown, CSV, JSON.
- **`components/chat/ChatInterface.tsx`** — attach button + hidden multi-file
`<input>`; `pendingFiles` state with removable chips; type/size validation with
toasts; build `FilePart`s and append to the outgoing message; session naming
falls back to the first file name for file-only messages.
- **`components/chat/FileAttachment.tsx`** (new) — image thumbnail (object URL)
or a download chip (icon, filename, human size, download button).
- **`components/chat/ChatMessage.tsx`** — render file parts in user and agent
bubbles.
- **`lib/messageHandlers.ts`** — preserve `file` parts from messages and from
`artifact-update` events (`extractMessagesFromTasks`).

### Edge / deployment

- **`helm/kagent/files/nginx.conf`** — add
`client_max_body_size {{ .Values.ui.nginx.clientMaxBodySize }};`.
- **`helm/kagent/values.yaml`** — `ui.nginx.clientMaxBodySize: 50m` (default).
- **`helm/kagent/tests/ui-nginx-configmap_test.yaml`** — assert the default and
an override render into the config.

### Acceptance criteria (AC1–AC8)

AC1 artifact service wired · AC2 upload passed inline · AC3 upload persisted ·
AC4 agent-saved artifact emitted as A2A `FilePart` · AC5 `loadartifactstool`
registered · AC6 UI multi-file attach with validation · AC7 thumbnail/chip
rendering in both bubbles · AC8 E2E upload round trip on kind.

### Test Plan

- **Go unit:** `adapter_test.go` (non-nil `ArtifactService`), `agent_test.go`
(tool list includes load/save tools), `executor_test.go` (ArtifactDelta →
emitted `FilePart`; oversized inbound → failed status), `artifacts_test.go`,
`save_artifact_tool_test.go`, `fileextract` tests (`fileextract_test.go`,
`fixture_test.go`), `models/openai_adk_test.go`.
- **Go e2e:** `go/core/test/e2e/file_upload_test.go` — upload an inline A2A file
part to a Go ADK agent and assert it is processed (uses the current
`a2aproject/a2a-go/v2` API).
- **Python unit:** `tests/unittests/models/test_file_extract.py`,
`test_openai.py`.
- **UI unit:** `lib/__tests__/fileUpload.test.ts`,
`lib/__tests__/messageHandlers.test.ts`,
`chat/__tests__/FileAttachment.test.tsx`, `chat/__tests__/ChatMessage.test.tsx`.
- **Helm:** `helm/kagent/tests/ui-nginx-configmap_test.yaml`.

## Alternatives

- **Core-server artifact endpoints / mounting `adkrest`:** more moving parts, a
second transport, and `adkrest` has no upload route — rejected in favor of
A2A-only.
- **Diff the artifact store per turn:** racy and fragile versus the idiomatic
`ArtifactDelta` event signal.
- **Inline-only download (no store surfacing):** wouldn't connect the artifact
store to the UI.
- **Send raw files to every model:** token bloat and many models can't read
PDFs — hence server-side text extraction with the 10 MB cap.

## Open Questions

- Durable/shared artifact storage for multi-replica deployments and restarts.
- Per-file size limit configurability beyond the current 10 MB UI cap +
`client_max_body_size`.
- Whether to expose an artifact browser (list/version/delete) in the UI.
154 changes: 154 additions & 0 deletions go/adk/pkg/a2a/artifacts.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
package a2a

import (
"context"
"encoding/base64"
"fmt"
"maps"
"os"
"strconv"
"strings"

a2atype "github.com/a2aproject/a2a-go/a2a"
"github.com/a2aproject/a2a-go/a2asrv"
"github.com/a2aproject/a2a-go/a2asrv/eventqueue"
adkartifact "google.golang.org/adk/artifact"
"google.golang.org/adk/server/adka2a" //nolint:staticcheck // kagent still uses a2a-go v1; this ADK package is the compatibility adapter.
)

const (
// defaultMaxArtifactBytes is the default per-file size limit for inbound
// uploads (10 MB).
defaultMaxArtifactBytes = 10 * 1024 * 1024
// envMaxArtifactBytes overrides the inbound file size limit (in bytes).
envMaxArtifactBytes = "KAGENT_MAX_ARTIFACT_BYTES"
)

// MaxArtifactBytes returns the artifact size limit, honoring the
// KAGENT_MAX_ARTIFACT_BYTES env var and falling back to the default. It bounds
// both inbound uploads and agent-saved artifacts.
func MaxArtifactBytes() int {
if v := os.Getenv(envMaxArtifactBytes); v != "" {
if n, err := strconv.Atoi(v); err == nil && n > 0 {
return n
}
}
return defaultMaxArtifactBytes
}

// checkInboundFileSizes returns an error if any inbound FilePart's decoded
// content exceeds the limit. Only inline base64 (FileBytes) parts are checked;
// URI-referenced files are out of scope. The decoded size is derived from the
// base64 length so a ~10 MB upload is not fully decoded onto the heap just to be
// measured; base64 validity is enforced downstream when the payload is decoded
// for persistence.
func checkInboundFileSizes(msg *a2atype.Message, limit int) error {
if msg == nil {
return nil
}
for _, part := range msg.Parts {
fp := asFilePart(part)
if fp == nil {
continue
}
fb, ok := fp.File.(a2atype.FileBytes)
if !ok {
continue
}
if n := base64DecodedLen(fb.Bytes); n > limit {
return fmt.Errorf("file %q exceeds maximum allowed size: %d bytes > %d bytes", fb.Name, n, limit)
}
}
return nil
}

// base64DecodedLen returns the number of bytes that standard padded base64 input
// decodes to, derived from its length and trailing padding without allocating
// the payload.
//
// ponytail: assumes clean, padded StdEncoding base64 (what the UI emits via
// FileReader); embedded whitespace/newlines would inflate the estimate. The
// upgrade path is a streaming decode counter if MIME-wrapped input ever shows up.
func base64DecodedLen(s string) int {
n := base64.StdEncoding.DecodedLen(len(s))
switch {
case strings.HasSuffix(s, "=="):
return n - 2
case strings.HasSuffix(s, "="):
return n - 1
}
return n
}

// asFilePart extracts a *FilePart from an A2A Part, handling both value and
// pointer types.
func asFilePart(part a2atype.Part) *a2atype.FilePart {
switch p := part.(type) {
case *a2atype.FilePart:
return p
case a2atype.FilePart:
return &p
}
return nil
}

// emitArtifacts loads each artifact named in delta from the artifact service
// and emits it as an A2A artifact event carrying a FilePart. Load/convert
// failures are logged and skipped so the turn continues (AC4).
func (e *KAgentExecutor) emitArtifacts(
ctx context.Context,
reqCtx *a2asrv.RequestContext,
queue eventqueue.Queue,
userID string,
sessionID string,
delta map[string]int64,
eventMeta map[string]any,
) {
svc := e.runnerConfig.ArtifactService
if svc == nil {
return
}

for name, version := range delta {
resp, err := svc.Load(ctx, &adkartifact.LoadRequest{
AppName: e.appName,
UserID: userID,
SessionID: sessionID,
FileName: name,
Version: version,
})
if err != nil {
e.logger.Error(err, "failed to load saved artifact", "name", name, "version", version)
continue
}
if resp == nil || resp.Part == nil {
e.logger.V(1).Info("artifact load returned no part", "name", name, "version", version)
continue
}

part := resp.Part
// Carry the filename so the converted FilePart has a Name.
if part.InlineData != nil && part.InlineData.DisplayName == "" {
part.InlineData.DisplayName = name
}

a2aPart, err := adka2a.ToA2APart(part, nil)
if err != nil {
e.logger.Error(err, "failed to convert artifact to A2A part", "name", name, "version", version)
continue
}

artifactEvent := a2atype.NewArtifactEvent(reqCtx, a2aPart)
artifactEvent.LastChunk = true
artifactEvent.Metadata = maps.Clone(eventMeta)
artifactEvent.Metadata[adka2a.ToA2AMetaKey("artifact_name")] = name
artifactEvent.Metadata[adka2a.ToA2AMetaKey("artifact_version")] = version
if part.InlineData != nil {
artifactEvent.Metadata[adka2a.ToA2AMetaKey("mime_type")] = part.InlineData.MIMEType
}

if err := queue.Write(ctx, artifactEvent); err != nil {
e.logger.Error(err, "failed to write artifact event", "name", name, "version", version)
}
}
}
Loading
Loading