A Python library for cloning, indexing, and semantically searching Git repositories using embeddings (OpenAI or SBERT) and Chroma, plus a high-level `agent_query` for building autonomous code agents.
## Features

- Clone or update any Git repository with a single call
- Extract semantic code chunks via Tree-sitter grammars (functions, classes, methods, etc.)
- Fall back to line-based chunking for unsupported languages or large files
- Embed code or text with your choice of:
  - OpenAI's `text-embedding-ada-002` via `OpenAIEmbeddings`
  - A local SBERT model (e.g. `microsoft/graphcodebert-base`) via `SBERTEmbeddings`
- Persist the vector store in a `.kno/` folder using Chroma
- Auto-commit & push the embedding database back to your repo
- Fast similarity search over indexed code chunks
- Autonomous agent for code analysis via `agent_query()`
## Installation

```bash
pip install kno-sdk
```

## Quick Start

```python
from kno_sdk import clone_and_index, search, EmbeddingMethod

# 1. Clone (or pull) and index a repository
repo_index = clone_and_index(
    repo_url="https://github.com/SyedGhazanferAnwar/NestJs-MovieApp",
    branch="master",
    embedding=EmbeddingMethod.SBERT,  # or EmbeddingMethod.OPENAI
    cloned_repo_base_dir="repos"      # where to clone locally
)
print("Indexed at:", repo_index.path)
print("Directory snapshot:\n", repo_index.digest)

# 2. Perform semantic search
results = search(
    repo_url="https://github.com/SyedGhazanferAnwar/NestJs-MovieApp",
    branch="master",
    embedding=EmbeddingMethod.SBERT,
    cloned_repo_base_dir="repos",
    query="NestFactory",
    k=5
)
for i, chunk in enumerate(results, 1):
    print(f"--- Result #{i} ---\n{chunk}\n")
```
```python
# 3. Autonomous code-analysis agent
from kno_sdk import agent_query, EmbeddingMethod, LLMProvider

# First create a repo index
repo_index = clone_and_index(
    repo_url="https://github.com/WebGoat/WebGoat",
    branch="main",
    embedding=EmbeddingMethod.SBERT,
    cloned_repo_base_dir="repos"
)

# Then use the index with agent_query
result = agent_query(
    repo_index=repo_index,
    llm_provider=LLMProvider.ANTHROPIC,
    llm_model="claude-3-haiku-20240307",
    llm_temperature=0.0,
    llm_max_tokens=4096,
    llm_system_prompt="You are a senior code-analysis agent.",
    prompt="Find issues, bugs and vulnerabilities in this repo, and explain each with exact code locations.",
    MODEL_API_KEY="your_api_key_here"
)
print(result)
```

## API Reference

### `clone_and_index`

Clone (or pull) a repository, embed its files, and persist a Chroma database in a `.kno/` folder. Finally, commit & push the `.kno/` folder back to the original repo.
```python
def clone_and_index(
    repo_url: str,
    branch: str = "main",
    embedding: EmbeddingMethod = EmbeddingMethod.SBERT,
    cloned_repo_base_dir: str = "."
) -> RepoIndex
```

- `repo_url` — Git HTTPS/SSH URL
- `branch` — branch to clone or update (default: `main`)
- `embedding` — `EmbeddingMethod.OPENAI` or `EmbeddingMethod.SBERT`
- `cloned_repo_base_dir` — local directory to clone into (default: current working dir)

Returns a `RepoIndex` object with:

- `path: pathlib.Path` — local clone directory
- `digest: str` — textual snapshot of the directory tree
- `vector_store: Chroma` — the Chroma collection instance
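For reference, the returned object can be pictured as a simple dataclass. This is only a sketch for illustration: the field names come from the list above, but the exact class definition in the library is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any

@dataclass
class RepoIndex:
    """Sketch of the object returned by clone_and_index (illustrative)."""
    path: Path        # local clone directory
    digest: str       # textual snapshot of the directory tree
    vector_store: Any  # the Chroma collection instance
```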
### `search`

Run a similarity search on an existing `.kno/` Chroma database.
```python
def search(
    repo_url: str,
    branch: str = "main",
    embedding: EmbeddingMethod = EmbeddingMethod.SBERT,
    query: str = "",
    k: int = 8,
    cloned_repo_base_dir: str = "."
) -> List[str]
```

- `query` — your natural-language or code search prompt
- `k` — number of top results to return

Returns a list of the top-`k` matching code/text chunks.
### `agent_query`

High-level agent that clones, indexes, and then iteratively uses tools (`search_code`, `read_file`, etc.) plus an LLM to fulfill your prompt.
```python
def agent_query(
    repo_url: str,
    branch: str = "main",
    embedding: EmbeddingMethod = EmbeddingMethod.SBERT,
    cloned_repo_base_dir: str = str(Path.cwd()),
    llm_provider: LLMProvider = LLMProvider.ANTHROPIC,
    llm_model: str = "claude-3-haiku-20240307",
    llm_temperature: float = 0.0,
    llm_max_tokens: int = 4096,
    llm_system_prompt: str = "",
    prompt: str = "",
    MODEL_API_KEY: str = "",
) -> str
```

- `repo_url`, `branch`, `embedding`, `cloned_repo_base_dir` — same as above
- `llm_provider` — `LLMProvider.OPENAI` or `LLMProvider.ANTHROPIC`
- `llm_model` — model name (e.g. `"gpt-4"` or `"claude-3-haiku-20240307"`)
- `llm_temperature`, `llm_max_tokens` — sampling parameters
- `llm_system_prompt` — initial system message for the agent
- `prompt` — your user query/task description
- `MODEL_API_KEY` — sets `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` accordingly

Returns the agent's Final Answer as a string.
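The way a "Final Answer" is typically pulled out of the model's last message can be sketched as follows. This helper is purely illustrative; the library's actual parsing logic is not shown in this document and may differ.

```python
from typing import Optional

def extract_final_answer(llm_output: str) -> Optional[str]:
    """Return the text after a 'Final Answer:' marker, or None if absent.

    Illustrative sketch: agent loops of this style keep iterating until
    the model emits the marker, then return everything after it.
    """
    marker = "Final Answer:"
    idx = llm_output.find(marker)
    if idx == -1:
        return None
    return llm_output[idx + len(marker):].strip()
```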
### `EmbeddingMethod`

```python
class EmbeddingMethod(str, Enum):
    OPENAI = "OpenAIEmbeddings"
    SBERT = "SBERTEmbeddings"
```

Choose between OpenAI's hosted embeddings or a local SBERT model.
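Because the enum subclasses `str`, members compare equal to their string values, which is convenient when the choice comes from a config file or environment variable. A small self-contained sketch (the enum is repeated here so the snippet runs on its own; `embedding_from_config` is a hypothetical helper, not part of the library):

```python
from enum import Enum

class EmbeddingMethod(str, Enum):
    OPENAI = "OpenAIEmbeddings"
    SBERT = "SBERTEmbeddings"

def embedding_from_config(name: str) -> EmbeddingMethod:
    """Map a config string (e.g. from an env var) to an EmbeddingMethod.

    Hypothetical helper; raises ValueError for unknown names.
    """
    return EmbeddingMethod(name)
```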
## How It Works

1. **Clone or Pull** — uses GitPython to clone depth-1 or pull the latest changes.
2. **Directory Snapshot** — builds a small "digest" of files/folders (up to ~1K tokens).
3. **Chunk Extraction**
   - Tree-sitter for language-aware extraction of functions, classes, etc.
   - Fallback to fixed-size line chunks for unknown languages or large files.
4. **Embedding**
   - Streams each chunk into your chosen embedding backend.
   - Respects a 16,000-token cap per chunk.
5. **Vector Store**
   - Persists embeddings in a namespaced Chroma collection under `.kno/`.
   - Only indexes files once (skips already-populated collections).
6. **Commit & Push** — automatically stages, commits, and pushes `.kno/` back to your remote.
7. **Autonomous Agent**
   - RAG prompt
   - Tool calls (`search_code`, `read_file`, …)
   - Iterative LLM planning & execution
   - Stops on `"Final Answer:"` or max iterations
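The line-based fallback in the chunk-extraction step can be pictured as a simple fixed-size splitter. This is a sketch under an assumed chunk size of 40 lines; the library's actual chunk sizes and overlap handling are not documented here.

```python
from typing import List

def chunk_by_lines(text: str, lines_per_chunk: int = 40) -> List[str]:
    """Split source text into fixed-size line chunks (fallback path).

    Illustrative only: real chunkers often add overlap between chunks
    and enforce a token cap rather than a pure line count.
    """
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]
```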
## Default Exclusions

- Skip directories: `.git`, `node_modules`, `build`, `dist`, `target`, `.vscode`, `.kno`
- Skip files: `package-lock.json`, `yarn.lock`, `.prettierignore`
- Binary extensions: common image, audio, video, archive, font, and binary file types

All of the above can be modified by forking the source and adjusting the `skip_dirs`, `skip_files`, and `BINARY_EXTS` sets.
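A sketch of how such skip sets typically drive the directory walk. The directory and file names below mirror the lists above; the binary extensions shown are representative examples (the library's full set is larger), and the walk itself is illustrative rather than the library's actual implementation.

```python
import os
from pathlib import Path
from typing import Iterator

SKIP_DIRS = {".git", "node_modules", "build", "dist", "target", ".vscode", ".kno"}
SKIP_FILES = {"package-lock.json", "yarn.lock", ".prettierignore"}
BINARY_EXTS = {".png", ".jpg", ".mp3", ".mp4", ".zip", ".woff", ".exe"}  # sample subset

def iter_indexable_files(root: str) -> Iterator[Path]:
    """Yield files worth indexing, pruning skipped directories in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name in SKIP_FILES:
                continue
            if Path(name).suffix.lower() in BINARY_EXTS:
                continue
            yield Path(dirpath) / name
```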
## Contributing

1. Fork this repo
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

Please run `pytest` before submitting and follow the existing code style.