kuba-04/knowledge-scanner: scan HTML and other local files without sending any data outside

About

In short

Scan HTML and other local files without sending any data outside your machine.

Overview

This is a simple LLM-powered document search system. It uses Nomic embeddings and the Qdrant vector database to store and search documents, then uses Ollama to generate answers. Currently you can search through HTML documents.

Environment Variables

QDRANT_URL=http://localhost:6334
EMBEDDINGS_API_URL=http://localhost:11434/api/embeddings
COUCHDB_URL=http://admin:password@localhost:5984

If you run these services in Docker, you can probably use the same values. Put them in a .env file.
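As an illustration of how these three variables might be consumed, here is a minimal sketch that reads them from the process environment and falls back to the example values above. The `Config` struct and the `config_from` and `config_from_env` functions are hypothetical names, not taken from this repository. The sketch does not parse the .env file itself; it assumes the variables are already exported, or loaded by a helper crate such as dotenvy (whether the project uses one is not stated here).

```rust
use std::env;

/// Connection settings for the three services listed above.
/// Hypothetical struct for illustration; not the project's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub qdrant_url: String,
    pub embeddings_api_url: String,
    pub couchdb_url: String,
}

/// Builds a `Config` from any key lookup, using the README's example
/// values as defaults when a key is missing.
pub fn config_from(lookup: impl Fn(&str) -> Option<String>) -> Config {
    let get = |key: &str, default: &str| -> String {
        lookup(key).unwrap_or_else(|| default.to_string())
    };
    Config {
        qdrant_url: get("QDRANT_URL", "http://localhost:6334"),
        embeddings_api_url: get(
            "EMBEDDINGS_API_URL",
            "http://localhost:11434/api/embeddings",
        ),
        couchdb_url: get("COUCHDB_URL", "http://admin:password@localhost:5984"),
    }
}

/// Reads the same settings from the process environment.
pub fn config_from_env() -> Config {
    config_from(|key| env::var(key).ok())
}
```

Taking the lookup as a parameter keeps the defaults testable without touching the real environment; `config_from_env()` is the variant an application would call at startup.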

Architecture

The system works in several stages:

Document Processing: Documents are processed and stored in CouchDB
Embedding Generation: Nomic API generates embeddings for documents
Vector Storage: Embeddings are stored in Qdrant with metadata
Question Answering:
- Generates an embedding for the question
- Finds similar documents using vector search
- Extracts the relevant context
- Uses the LLM to generate a precise answer
- Falls back to searching the full document block by block if needed
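The question-answering stage can be sketched in plain Rust. This is an in-memory stand-in under stated assumptions, not the project's code: in the real system Qdrant performs the similarity search, and the metric it is configured with is not given here (cosine similarity is assumed). The names `cosine_similarity`, `top_k`, and `split_into_blocks` are hypothetical, and the fixed-size character blocks are only one possible reading of the "search in blocks" fallback.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 when either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Returns (index, score) for the `k` stored embeddings most similar
/// to the question embedding, best match first.
pub fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending score; the sort is stable, so ties keep index order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

/// Splits a document into blocks of at most `block_size` characters,
/// so each block can be passed to the LLM separately in the fallback pass.
pub fn split_into_blocks(text: &str, block_size: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(block_size.max(1))
        .map(|chunk| chunk.iter().collect())
        .collect()
}
```

In this sketch, the indices returned by `top_k` would be used to look up the matching document text as context for the LLM prompt, and `split_into_blocks` would only come into play when that context does not yield an answer.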

Running

RUST_LOG="info" cargo run --release

License

MIT License
