feat(KnowledgeBase): add document viewer, download, metadata, and sorting #721

Open
bravosierra99 wants to merge 1 commit into Crosstalk-Solutions:dev from bravosierra99:feat/kb-document-viewer

@bravosierra99 bravosierra99 commented Apr 14, 2026

Contributor

Adds view, download, and metadata improvements to the Knowledge Base stored files table.

Changes

Backend (rag_controller.ts, rag_service.ts, rag.ts, routes.ts, types/rag.ts)

  • getStoredFiles now returns StoredFile[] with fileName, size, uploadedAt, and isUserUpload instead of string[]
  • New GET /api/rag/files/content — serves text file contents for inline viewing; restricted to text extensions, path-traversal safe (app directory only)
  • New GET /api/rag/files/download — serves file as attachment; same path guard
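As a rough sketch of the shape described above (not the PR's actual code): the field names come from the description, while the field types, the `VIEWABLE_EXTENSIONS` allow-list, and the helper names `getExtension` and `isViewableTextFile` are assumptions made for illustration.

```typescript
// Field names are taken from the PR description; the types are assumptions.
interface StoredFile {
  fileName: string
  size: number | null // bytes; null if the file could not be stat'ed (assumed)
  uploadedAt: string | null // ISO 8601 timestamp (assumed representation)
  isUserUpload: boolean
}

// Hypothetical allow-list: the description names md, txt, csv, json, yaml "etc.",
// so the real list is probably longer.
const VIEWABLE_EXTENSIONS = new Set(['md', 'txt', 'csv', 'json', 'yaml'])

// Returns the lowercased extension, or '' for names without one
// (including dotfiles such as ".env").
function getExtension(fileName: string): string {
  const dot = fileName.lastIndexOf('.')
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : ''
}

// True when the file may be served by the inline content endpoint.
function isViewableTextFile(fileName: string): boolean {
  return VIEWABLE_EXTENSIONS.has(getExtension(fileName))
}

// Example record in the assumed shape.
const example: StoredFile = {
  fileName: 'field-notes.md',
  size: 2048,
  uploadedAt: '2026-04-14T12:00:00Z',
  isUserUpload: true,
}
```

An allow-list (rather than a block-list of binary types) means an unknown extension is refused by default, which is the safer failure mode for an inline viewer.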

Frontend (KnowledgeBaseModal.tsx, api.ts)

  • Upload date and file size columns in the stored files table, both sortable by clicking the column header
  • Eye icon opens an inline content viewer for text-based files (md, txt, csv, json, yaml, etc.)
  • Download button on all files
  • System docs (README, /docs/*) get view + download but no delete button — they're read-only in the UI
  • User uploads get view + download + delete as before
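The column sorting could look something like the sketch below. It is only an illustration under assumptions: `FileRow`, `SortKey`, `sortValue`, and `sortRows` are hypothetical names, `uploadedAt` is assumed to be an ISO string, and placing rows with missing metadata last is a design choice of this sketch rather than something the PR states.

```typescript
type SortKey = 'size' | 'uploadedAt'
type SortDirection = 'asc' | 'desc'

// Hypothetical row shape; only the fields needed for sorting.
interface FileRow {
  fileName: string
  size: number | null
  uploadedAt: string | null // assumed ISO 8601 string
}

// Maps a row to a comparable number, or null when the value is missing or unparsable.
function sortValue(row: FileRow, key: SortKey): number | null {
  if (key === 'size') return row.size
  if (row.uploadedAt === null) return null
  const timestamp = Date.parse(row.uploadedAt)
  return Number.isNaN(timestamp) ? null : timestamp
}

// Returns a sorted copy. Rows without a value go last in both directions,
// and ties fall back to file name so the order is stable across clicks.
function sortRows(rows: FileRow[], key: SortKey, direction: SortDirection): FileRow[] {
  const sign = direction === 'asc' ? 1 : -1
  return [...rows].sort((a, b) => {
    const av = sortValue(a, key)
    const bv = sortValue(b, key)
    if (av === null && bv === null) return a.fileName.localeCompare(b.fileName)
    if (av === null) return 1
    if (bv === null) return -1
    return sign * (av - bv) || a.fileName.localeCompare(b.fileName)
  })
}
```

Clicking a header would then toggle `direction` and re-render from `sortRows(files, key, direction)`; the input array is never mutated, which keeps it safe to hold in React state.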

Files changed

  • admin/app/controllers/rag_controller.ts
  • admin/app/services/rag_service.ts
  • admin/app/validators/rag.ts
  • admin/start/routes.ts
  • admin/types/rag.ts
  • admin/inertia/components/chat/KnowledgeBaseModal.tsx
  • admin/inertia/lib/api.ts

@chriscrosstalk

Collaborator

Thanks @bravosierra99 — the feature itself (view + download + metadata columns for KB files) is solid and genuinely useful. Good security posture too (path-traversal guards, restricting viewable types to text, no actions on system-owned docs).

One problem though: your branch is significantly behind current dev, and the PR is pulling forward ~75 files worth of changes that have already shipped in v1.31.0 and v1.31.1. The real feature is probably 300–500 lines across ~7 files, but the diff currently reads as 3,318 additions across 85 files including the PNG-to-WEBP migration (#575), Kiwix library-mode migration (#622), Ollama overhaul (#645/#649/#744), the RAG pipeline fixes (#745), .tmp download staging (#448), the Community Add-Ons docs page (#753), workflow updates, a CONTRIBUTING rewrite — all of which are already on dev.

GitHub is marking it DIRTY/CONFLICTING as a result, and there's no practical way to review this until the noise is separated from the signal.

Could you rebase onto current dev and narrow the PR to just the feature? The files I'd expect to see in a clean version:

  • admin/app/controllers/rag_controller.ts (the two new endpoints)
  • admin/app/services/rag_service.ts (getStoredFiles() returning StoredFile[])
  • admin/app/validators/rag.ts (new validators)
  • admin/start/routes.ts (route registration)
  • admin/types/rag.ts (StoredFile type)
  • admin/inertia/components/chat/KnowledgeBaseModal.tsx (UI)
  • admin/inertia/lib/api.ts (client helpers)

Cleanest path is usually to create a new branch off current origin/dev and cherry-pick just your feature commits onto it, then force-push over this branch. If you hit a wall, say the word and I'll walk through it with you.

Appreciate the work — want to get this one landed, just need the diff manageable first.

@bravosierra99 bravosierra99 force-pushed the feat/kb-document-viewer branch from f2bc3cb to 226c130 Compare April 24, 2026 16:26
@bravosierra99 bravosierra99 changed the title from "feat(KnowledgeBase): add document view and download with file metadata" to "feat(KnowledgeBase): add document viewer, download, metadata, and sorting" Apr 24, 2026
@bravosierra99 bravosierra99 force-pushed the feat/kb-document-viewer branch from 226c130 to 146af92 Compare April 28, 2026 20:40
@bravosierra99

Contributor Author

I rebased again and pushed some updates. I think my earlier testing wasn't run against exactly what was in the PR. It should be up to date now, and I did some quick sanity checking of the functionality. There should be no remaining conflicts. Let me know if this works for you, @chriscrosstalk.

@chriscrosstalk chriscrosstalk added the "Next Release" label (This PR is staged for our next release.) Apr 30, 2026

@jakeaturner jakeaturner left a comment

Collaborator

@bravosierra99 great stuff here - just a couple quick changes and this will be ready to merge. Thanks for the contribution!

return fileName.split('.').at(-1)?.toLowerCase() ?? ''
}

function formatBytes(bytes: number | null): string {

Collaborator

Existing formatBytes utility available from ~/lib/util -- import from there for DRY

Comment thread admin/app/services/rag_service.ts Outdated
* Only serves files within the uploads directory (path-traversal safe).
* Returns null if the file is outside the uploads dir, doesn't exist, or is binary.
*/
public async readFileContent(source: string): Promise<{ content: string; extension: string; fileName: string } | null> {

Collaborator

Good baseline for path-traversal protection, but it technically still allows reading from anywhere in the app directory (process.cwd()). We should follow the pattern from docs_service, or the similar check already in rag_service, for tighter scoping and consistency.
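To make the scoping concern concrete, here is a minimal sketch of a guard that confines reads to a single base directory. It is not the docs_service implementation, which isn't shown in this thread; `resolveInside` and the example directory are hypothetical, and only Node's built-in `path` module is used.

```typescript
import * as path from 'node:path'

// Resolves `requested` against `baseDir` and returns the absolute path only if it
// stays strictly inside `baseDir`; otherwise returns null.
// Hypothetical helper: the name and signature are assumptions for illustration.
function resolveInside(baseDir: string, requested: string): string | null {
  const base = path.resolve(baseDir)
  const target = path.resolve(base, requested)
  const relative = path.relative(base, target)

  // '' means the base directory itself; '..' or '../…' means the target escaped it;
  // an absolute result can occur on Windows when the drives differ.
  const escapes =
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative)

  return escapes ? null : target
}

// Example: scoping to an uploads directory rather than process.cwd().
const uploadsDir = '/srv/app/storage/uploads' // assumed location, for illustration only
const allowed = resolveInside(uploadsDir, 'notes.md')
const blocked = resolveInside(uploadsDir, '../../.env')
```

Comparing via `path.relative` avoids the prefix pitfall of a plain `startsWith(base)` check, where `/srv/app/storage/uploads-old` would pass for a base of `/srv/app/storage/uploads`. This sketch is purely lexical and does not resolve symlinks; a stricter version would compare `fs.realpath` results as well.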

@bravosierra99 bravosierra99 force-pushed the feat/kb-document-viewer branch from 146af92 to a84fd7e Compare May 5, 2026 00:15
@bravosierra99

Contributor Author

updates made.

…ting

Rebuilt on top of dev's RFC Crosstalk-Solutions#883 state-machine UI rather than the now-defunct
StoredFile shape:

- Extend StoredFileInfo with fileName/size/uploadedAt/isUserUpload
- Populate metadata from on-disk stats in RagService.getStoredFiles
- Add fileSourceSchema validator + getFileContent/downloadFile endpoints
  scoped to the uploads directory only (tighter than the original PR — matches
  docs_service traversal pattern)
- KnowledgeBaseModal: sortable Size and Uploaded columns; View/Download
  buttons on upload-bucket rows; new FileViewerModal for in-browser text
  preview. Bucket grouping preserved — sort applies within each bucket.
- Use formatBytes from ~/lib/util rather than redefining
@bravosierra99 bravosierra99 force-pushed the feat/kb-document-viewer branch from a84fd7e to 51ea98e Compare May 25, 2026 12:06
@bravosierra99

Contributor Author

@jakeaturner heads up: I force-pushed this branch with a full rebuild on current dev rather than a fix-and-rebase. RFC #883 reworked the KB ingest tracking around StoredFileInfo + state machine, so the old commits didn't apply cleanly; rebuilding was simpler than reconciling line-by-line.

Both of your prior comments are addressed in the rebuild:

- formatBytes now imported from ~/lib/util instead of a local copy.
- Path-traversal guard rewritten to match the docs_service pattern. Scope is also tighter than before: only files under RagService.UPLOADS_STORAGE_PATH are viewable/downloadable. ZIM files and admin docs are read-only in the UI.

Other changes vs. the version you reviewed:

- View/Download buttons gated on isUserUpload && size != null.
- Sortable columns sort within each bucket so the zim/upload/admin_docs grouping from dev is preserved.
- 7 new unit tests in kb_file_grouping.spec.ts covering the sort + bucket-order invariants.

Built and tested locally against current dev; CI green. Ready for another look when you have a minute.
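For readers following along, the gating and sort-within-bucket behaviour described above might be sketched as follows. The bucket names come from the comment; the `bucket` field, the bucket display order, and the names `KbFile`, `canViewOrDownload`, and `sortWithinBuckets` are assumptions, not the code in this branch.

```typescript
type Bucket = 'zim' | 'upload' | 'admin_docs'

// Hypothetical row shape; the real StoredFileInfo carries more (e.g. ingest state).
interface KbFile {
  fileName: string
  bucket: Bucket
  size: number | null
  isUserUpload: boolean
}

// Assumed display order of the buckets.
const BUCKET_ORDER: Bucket[] = ['zim', 'upload', 'admin_docs']

// Mirrors the stated gate: only user uploads with known on-disk size get the buttons.
function canViewOrDownload(file: KbFile): boolean {
  return file.isUserUpload && file.size != null
}

// Sorts inside each bucket while keeping the buckets themselves in a fixed order,
// so a column sort never interleaves ZIM files, uploads, and admin docs.
function sortWithinBuckets(
  files: KbFile[],
  compare: (a: KbFile, b: KbFile) => number
): KbFile[] {
  const result: KbFile[] = []
  for (const bucket of BUCKET_ORDER) {
    // filter() returns a fresh array, so sorting it leaves `files` untouched.
    const group = files.filter((file) => file.bucket === bucket).sort(compare)
    result.push(...group)
  }
  return result
}
```

The bucket-order invariant is then easy to unit test: for any comparator, the sequence of `bucket` values in the result should never go backwards relative to `BUCKET_ORDER`.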

@chriscrosstalk

Collaborator

Thanks @bravosierra99 — the file viewer/download piece here is still very much wanted (it's the core of #952). Heads up though: this branch predates the Knowledge Base ingestion rework that shipped in v1.32.0, and there's significant overlap. On current dev, getStoredFiles() already returns a rich StoredFileInfo[] (with per-file ingest state) instead of string[], and KnowledgeBaseModal.tsx was substantially reworked, so this no longer merges cleanly.

Could you rebase onto current dev? The valuable, non-overlapping part is the new /files/content and /files/download endpoints and the inline viewer — those slot in nicely on top of the new StoredFileInfo[] shape. Happy to review once it's rebased. Thanks for your patience on this one.

3 participants