An autonomous data analysis agent that leverages Google Gemini models to perform scraping, data processing, and visualization. This agent can process both uploaded datasets (CSV, Excel, Parquet, JSON, Images) and live web data.
- Robust LLM Fallback: Implements a custom
LLMWithFallbackclass that rotates through up to 10 Gemini API keys and multiple model versions (gemini-2.5-pro,flash, etc.) to handle rate limits and quotas. - Autonomous Agent: Uses LangChain's tool-calling agent to decide when to scrape data or process existing dataframes.
- Sandboxed Execution: Generates Python code dynamically and executes it in a controlled temporary environment to generate answers and visualizations.
- Multi-Format Scraper: Automatically detects and parses CSV, Excel, Parquet, JSON, and HTML tables from provided URLs.
- Image Processing: Capable of handling image uploads using PIL.
- System Diagnostics: Built-in
/summaryendpoint to monitor network health, API key validity, and system resources. - Optimized Visualizations: Includes a
plot_to_base64helper that ensures generated charts are optimized and compressed (under 100KB) for API responses.
- Backend: FastAPI, Uvicorn
- LLM Framework: LangChain, Google Generative AI
- Data Analysis: Pandas, NumPy, Matplotlib, Seaborn, NetworkX
- Diagnostics: Psutil, HTTPX
- Python 3.9+
- Google AI (Gemini) API Keys
-
Clone the repository:
git clone https://github.com/Shi-pra-19/agent.git cd agent -
Install dependencies:
pip install -r requirements.txt
-
Configure Environment Variables: Create a
.envfile in the root directory and add your Gemini API keys:gemini_api_1=your_api_key_1 gemini_api_2=your_api_key_2 # Supports up to gemini_api_10 GOOGLE_MODEL=gemini-2.5-pro PORT=8000
python app.pyThe server will start at http://0.0.0.0:8000.
| Endpoint | Method | Description |
|---|---|---|
/ |
GET |
Serves the web frontend (index.html). |
/api |
POST |
The primary analysis endpoint. Accepts questions_file (.txt) and an optional data_file. |
/summary |
GET |
Returns system diagnostics and LLM health checks. |
To use the API, send a multipart/form-data request:
questions_file: A.txtfile containing natural language questions.data_file: (Optional) A CSV, XLSX, or image file for analysis.