scan htmls and others without sending any data outside

kuba-04/knowledge-scanner

About

In short

Scan HTML and other local files without sending any data outside.

Overview

This is a simple LLM-powered document search system. It uses Nomic embeddings and the Qdrant vector database to store and search documents, then uses Ollama to generate answers. You can search through HTML documents.

Environment Variables

QDRANT_URL=http://localhost:6334
EMBEDDINGS_API_URL=http://localhost:11434/api/embeddings
COUCHDB_URL=http://admin:password@localhost:5984

If you run these services in Docker, the same values may work as-is. Put them in a .env file.
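As an illustration of how these three variables might be consumed, here is a minimal sketch that uses only the Rust standard library. The `Config` struct, `from_lookup`, and `from_env` are hypothetical names and not taken from the project's source. Falling back to the example values above is also an assumption. Note that `std::env` does not read a .env file by itself, so the variables must be exported by the shell or loaded by a helper crate first.

```rust
use std::env;

/// Connection settings for the three services (hypothetical struct,
/// not taken from the project's source).
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub qdrant_url: String,
    pub embeddings_api_url: String,
    pub couchdb_url: String,
}

impl Config {
    /// Builds a Config from any key lookup function.
    /// Missing keys fall back to the example values shown in this README
    /// (an assumption; the real program may instead fail on a missing key).
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());
        Config {
            qdrant_url: get("QDRANT_URL", "http://localhost:6334"),
            embeddings_api_url: get(
                "EMBEDDINGS_API_URL",
                "http://localhost:11434/api/embeddings",
            ),
            couchdb_url: get("COUCHDB_URL", "http://admin:password@localhost:5984"),
        }
    }

    /// Reads the settings from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling `Config::from_env()` once at startup and passing the result to the rest of the program keeps the variable names in a single place. Taking a lookup closure in `from_lookup` makes the fallback logic easy to exercise without touching the real environment.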

Architecture

The system works in several stages:

  1. Document Processing: Documents are processed and stored in CouchDB
  2. Embedding Generation: Nomic API generates embeddings for documents
  3. Vector Storage: Embeddings are stored in Qdrant with metadata
  4. Question Answering:
    • Generates embedding for the question
    • Finds similar documents using vector search
    • Extracts relevant context
    • Uses LLM to generate precise answers
    • Falls back to full document search in blocks if needed
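The retrieval and fallback steps above can be sketched in plain Rust. This is only an in-memory illustration: in the actual system Qdrant performs the similarity search, and the real code may differ. The function names `cosine_similarity`, `top_k`, and `split_into_blocks` are hypothetical, and both the cosine metric and the character-based block size are assumptions.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 for empty, mismatched, or zero-length vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns (document index, score) for the `k` documents most similar
/// to the question embedding, best match first.
pub fn top_k(question: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(question, d)))
        .collect();
    // Sort by descending score; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

/// Splits a document into blocks of at most `max_chars` characters,
/// as a stand-in for the "full document search in blocks" fallback.
/// Splitting on chars (not bytes) keeps multi-byte characters intact.
pub fn split_into_blocks(text: &str, max_chars: usize) -> Vec<String> {
    if max_chars == 0 {
        return Vec::new();
    }
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|chunk| chunk.iter().collect())
        .collect()
}
```

In this sketch, the context passed to the LLM would come from the documents returned by `top_k`. If that context does not yield an answer, each block from `split_into_blocks` could be sent to the LLM in turn.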

Running

RUST_LOG="info" cargo run --release

License

MIT License
