Introspect is a service that performs data-focused deep research. It understands your structured data (databases or CSV/Excel files) and unstructured data (PDFs), and can query the web for additional context.
- Set up environment variables:

  ```
  # Create a .env file in your root folder
  # You need all 3 - not just one
  OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
  ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY"
  GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
  ```
- Start all services using Docker Compose: `docker compose up --build`
- Access the application in your browser (a quick smoke check is sketched after this list):
  - Main application: http://localhost:80
  - Standalone backend API: http://localhost:1235
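Once the containers are running, a quick way to confirm that both services answer is to request the two URLs above. The sketch below is not part of the repo; it uses only the Python standard library and the default ports, and any HTTP response (even a 404) just means the container is up.

```python
# smoke_check.py - rough check that the frontend and backend containers answer.
# Uses the default ports from the steps above; adjust if you have remapped them.
import urllib.request
import urllib.error

SERVICES = {
    "frontend": "http://localhost:80",
    "backend": "http://localhost:1235",
}

for name, url in SERVICES.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{name}: reachable (HTTP {resp.status})")
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status - still counts as up.
        print(f"{name}: reachable (HTTP {err.code})")
    except (urllib.error.URLError, OSError) as err:
        print(f"{name}: NOT reachable ({err})")
```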
We use a simple AI agent with tool use. An LLM attempts to answer a user question with three tools: `text_to_sql`, `web_search`, and `pdf_with_citations`. The model then recursively asks questions using these tools until it is satisfied that it has enough context to answer the user's question. By default, we use `o3-mini` for text-to-SQL, `gemini-2.0-flash` for web search, and `claude-4-sonnet` for both PDF analysis and orchestration.
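The loop can be pictured as a standard tool-calling agent. The sketch below is a simplified illustration of that pattern rather than the repo's actual orchestration code; the tool names match the three above, but the message format, function signatures, and stop condition are placeholders.

```python
# Simplified illustration of the orchestration loop - not the actual backend code.
# The orchestrating LLM repeatedly picks one of the three tools until it decides
# it has enough context, then returns a final answer.

def run_agent(question: str, llm, tools: dict, max_steps: int = 10) -> str:
    """`llm(messages)` is a placeholder for the orchestration model (claude-4-sonnet
    by default); it is assumed to return either a tool call or a final answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = llm(messages)
        if decision["type"] == "final_answer":
            return decision["content"]
        # Otherwise the model asked to call one of:
        # text_to_sql, web_search, pdf_with_citations
        tool_name = decision["tool"]
        tool_output = tools[tool_name](**decision["arguments"])
        messages.append({"role": "assistant", "content": f"called {tool_name}"})
        messages.append({"role": "tool", "name": tool_name, "content": tool_output})
    return "Stopped after max_steps without a final answer."
```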

For development workflows and more detailed instructions, see the README files in the `/backend` and `/frontend` directories.
Defog supports most database connectors including PostgreSQL, MySQL, SQLite, BigQuery, Redshift, Snowflake, and Databricks – and also includes support for CSV and Excel files.
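As a rough illustration of how a CSV or Excel file becomes SQL-queryable (the repo's actual ingestion pipeline may differ), the snippet below loads a file into an in-memory SQLite database with pandas; the file name is hypothetical.

```python
# Generic illustration only - not Introspect's ingestion code.
# Load a CSV into an in-memory SQLite database so it can be queried with SQL.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")

df = pd.read_csv("sales.csv")  # hypothetical file; pd.read_excel() works similarly (needs openpyxl)
df.to_sql("sales", conn, index=False, if_exists="replace")

print(conn.execute("SELECT COUNT(*) FROM sales").fetchall())
```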
- Run all tests: `docker exec introspect-backend pytest`
- Run single test: `docker exec introspect-backend pytest tests/test_file.py::test_function -v`
- Tests use the `agents-postgres` service for database operations (a minimal test sketch follows this list)
- Create admin user: `docker exec introspect-backend python create_admin_user.py`
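If you add backend tests of your own, they run inside the `introspect-backend` container against the `agents-postgres` service. The sketch below is a minimal example of that shape; the host name, credentials, and use of `psycopg2` are assumptions, so check `docker-compose.yml` and the existing tests for the real values.

```python
# tests/test_example.py - minimal sketch of a backend test (placeholder values).
# The real host, database name, and credentials come from docker-compose.yml /
# the backend's settings; psycopg2 is assumed to be available in the image.
import os

import psycopg2
import pytest


@pytest.fixture
def db_conn():
    conn = psycopg2.connect(
        host=os.environ.get("POSTGRES_HOST", "agents-postgres"),  # service name used as host (assumption)
        user=os.environ.get("POSTGRES_USER", "postgres"),
        password=os.environ.get("POSTGRES_PASSWORD", "postgres"),
        dbname=os.environ.get("POSTGRES_DB", "postgres"),
    )
    yield conn
    conn.close()


def test_database_is_reachable(db_conn):
    with db_conn.cursor() as cur:
        cur.execute("SELECT 1")
        assert cur.fetchone() == (1,)
```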
- Development server: `cd frontend && npm run dev`
- Build production: `cd frontend && npm run build`
- Export static site: `cd frontend && npm run export`
- Run frontend tests: `cd frontend && npx playwright test`
- Lint (Prettier): `cd frontend && npm run lint`
- For security, it is highly recommended to run this project only as a Docker image.
- This repo does involve code paths where LLM-generated code (or custom human-written code) can be executed autonomously. While we have implemented some safeguards to prevent abuse, safety is not guaranteed outside a Docker environment.
This repo is maintained by Defog.ai
- Create docs
- Let users choose which model they want for each task from the `.env` file (a hypothetical example follows this list)
- Docs and examples for how to add custom tools
- Docs and examples for how to integrate with searchable unstructured data sources, like Google Drive and OneDrive
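For the per-task model selection item above, the configuration might eventually look something like the purely hypothetical `.env` entries below; none of these variable names exist yet, they only illustrate the idea.

```
# Hypothetical - not currently read by the application
TEXT_TO_SQL_MODEL="o3-mini"
WEB_SEARCH_MODEL="gemini-2.0-flash"
PDF_ANALYSIS_MODEL="claude-4-sonnet"
ORCHESTRATION_MODEL="claude-4-sonnet"
```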