Financial Agent - Vietnamese Stock Market Assistant 🇻🇳📈

An AI agent for Vietnamese stock investment advisory - an intelligent stock market analysis system built with LangGraph, the VnStock API, and LLMs.


⚡ Quick Start

Minimum Setup (5 minutes)

# 1. Clone and navigate to project
git clone <repo-url>
cd financial_agent_fork

# 2. Create Python virtual environment
python -m venv venv
.\venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# 3. Install dependencies
pip install -r requirements.txt

# 4. Copy environment template
cp .env.example .env
# Edit .env with your settings

# 5. Setup database (PostgreSQL required)
# See Database Setup section below

# 6. Run the API server
python main.py

Visit http://localhost:8000/docs to test the API.


🖥️ System Requirements

Minimum Requirements

  • Python: 3.9 or higher
  • RAM: 8GB minimum (16GB recommended)
  • Disk Space: 5GB free space
  • OS: Windows 10+, macOS 10.14+, or Linux

External Services Required

  1. PostgreSQL Database (v12 or higher)

    • Local installation or cloud service (AWS RDS, Azure Database, etc.)
    • At least 2GB storage recommended
  2. LLM Provider (choose one)

    • Google Gemini: Free API key from Google AI Studio
    • Ollama: Local LLM server (free, no API key needed)
  3. Qdrant Vector Database (choose one)

    • Qdrant Cloud: Free tier available at cloud.qdrant.io
    • Qdrant Local: Docker container or local installation
  4. Optional: Tesseract OCR (for OCR of scanned PDFs and images)


📦 Installation Guide

Step 1: Clone Repository

git clone <repository-url>
cd financial_agent_fork

Step 2: Python Environment Setup

# Windows
python -m venv venv
.\venv\Scripts\activate

# macOS/Linux
python -m venv venv
source venv/bin/activate

Verify Python version:

python --version  # Should be 3.9 or higher

Step 3: Install Dependencies

# Upgrade pip
python -m pip install --upgrade pip

# Install all required packages
pip install -r requirements.txt

Installation may take 5-10 minutes due to native dependencies

Optional: Install Tesseract OCR

For processing scanned PDFs and images:

Windows:

# Download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki/Downloads
# Then run setup and add to your .env:
TESSERACT_PATH=C:\Program Files\Tesseract-OCR\tesseract.exe

macOS:

brew install tesseract

Linux (Ubuntu/Debian):

sudo apt-get install tesseract-ocr

Step 4: Verify Installation

python -c "import langchain; print('✓ LangChain installed')"
python -c "import fastapi; print('✓ FastAPI installed')"
python -c "import vnstock; print('✓ VnStock installed')"
python -c "import qdrant_client; print('✓ Qdrant client installed')"

🗄️ Database Setup

This project uses PostgreSQL as the primary relational database, with Qdrant as the vector database for RAG features.

PostgreSQL Setup

Option 1: Local Installation (Recommended for Development)

Windows:

  1. Download PostgreSQL from postgresql.org

  2. Run the installer and follow the installation wizard

  3. Remember the superuser password

  4. Verify installation:

    psql --version
  5. Connect to PostgreSQL:

    psql -U postgres

macOS:

# Using Homebrew
brew install postgresql@15

# Start PostgreSQL service
brew services start postgresql@15

# Connect to PostgreSQL
psql postgres

Linux (Ubuntu/Debian):

# Update package list
sudo apt-get update

# Install PostgreSQL
sudo apt-get install postgresql postgresql-contrib

# Start PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Connect to PostgreSQL
sudo -u postgres psql

Option 2: Docker Container (Recommended for Production)

# Run PostgreSQL container
docker run --name financial-db \
  -e POSTGRES_USER=financial_user \
  -e POSTGRES_PASSWORD=financial_password \
  -e POSTGRES_DB=financial_agent \
  -p 5432:5432 \
  -v postgres_data:/var/lib/postgresql/data \
  -d postgres:15

# Verify container is running
docker ps

Create Database and User

# Connect to PostgreSQL
psql -U postgres

# Inside psql shell:
CREATE USER financial_user WITH PASSWORD 'financial_password';
CREATE DATABASE financial_agent OWNER financial_user;

# Grant privileges
GRANT ALL PRIVILEGES ON DATABASE financial_agent TO financial_user;

# Connect to the new database
\c financial_agent

# Verify connection
\dt

Connection String:

postgresql://financial_user:financial_password@localhost:5432/financial_agent
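
To confirm the application can reach this database from Python as well, a minimal hedged check with SQLAlchemy (already part of the stack per the Architecture section) can be run; the URL below is the example value above and should match your .env:

# verify_db.py - minimal connectivity check (example credentials from above)
from sqlalchemy import create_engine, text

DATABASE_URL = "postgresql://financial_user:financial_password@localhost:5432/financial_agent"

engine = create_engine(DATABASE_URL)
with engine.connect() as conn:
    # SELECT version() round-trips to the server without touching any tables
    version = conn.execute(text("SELECT version()")).scalar()
    print("✓ PostgreSQL reachable:", version.split(",")[0])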

Option 3: Cloud PostgreSQL

AWS RDS:

  1. Go to AWS RDS Console
  2. Click "Create Database"
  3. Select PostgreSQL engine
  4. Configure settings and note the endpoint
  5. Add connection string to .env:
    DATABASE_URL=postgresql://username:password@endpoint:5432/financial_agent
    

Azure Database for PostgreSQL:

  1. Go to Azure Portal
  2. Create new "Azure Database for PostgreSQL"
  3. Configure and get connection details
  4. Add to .env

Supabase (PostgreSQL as a Service):

  1. Sign up at supabase.com
  2. Create new project
  3. Copy connection string from project settings
  4. Add to .env:
    DATABASE_URL=postgresql://[user]:[password]@[host]:[port]/[database]
    

Database Initialization

After PostgreSQL is ready, initialize the application database:

# Navigate to project root
cd financial_agent_fork

# Run migrations using Alembic
alembic upgrade head

Expected Output:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade -> xxxxx, Initial migration

Verify Database Setup

# Connect to database
psql -U financial_user -d financial_agent -h localhost

# List all tables
\dt

# Expected tables:
# - users
# - chat_sessions
# - chat_messages
# - audit_logs
# - document_uploads

# Exit psql
\q

Qdrant Vector Database Setup

Qdrant stores vector embeddings for RAG (Retrieval Augmented Generation) features.

Option 1: Qdrant Cloud (Recommended for Production)

  1. Sign Up: Go to cloud.qdrant.io

  2. Create Cluster:

    • Click "Create Cluster"
    • Select region (choose closest to your location)
    • Name: financial-agent or similar
    • Free tier available for testing
  3. Get Credentials:

    • Copy the API Key and Cluster URL
    • Add to .env:
      QDRANT_MODE=cloud
      QDRANT_CLOUD_URL=https://your-cluster.qdrant.io
      QDRANT_CLOUD_API_KEY=your-api-key
      
  4. Verify Connection:

    python -c "from qdrant_client import QdrantClient; c = QdrantClient(url='YOUR_URL', api_key='YOUR_KEY'); print('✓ Qdrant connected')"

Option 2: Docker Container (Development)

# Run Qdrant container
docker run --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v qdrant_storage:/qdrant/storage \
  -d qdrant/qdrant

# Verify container
docker ps

# Check web interface
# Visit http://localhost:6333/dashboard

Add to .env:

QDRANT_MODE=local
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

Option 3: Local Installation (Development)

# Download and run Qdrant locally
# Visit https://qdrant.tech/documentation/quick-start/ for platform-specific instructions

# macOS:
brew install qdrant

# Linux:
docker run -p 6333:6333 qdrant/qdrant
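
Once Qdrant is reachable (cloud or local), a short hedged sketch with qdrant_client exercises the round trip the RAG layer relies on: create a collection, upsert a vector, and search it. The collection name and 384-dimension size below are illustrative (384 matches the sentence-transformers/all-MiniLM-L6-v2 embedding model listed later in the .env), not the project's actual settings:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")  # or url=..., api_key=... for Qdrant Cloud

# Illustrative collection; 384 dims matches all-MiniLM-L6-v2 embeddings
client.create_collection(
    collection_name="demo_docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

client.upsert(
    collection_name="demo_docs",
    points=[PointStruct(id=1, vector=[0.01] * 384, payload={"source": "example"})],
)

hits = client.search(collection_name="demo_docs", query_vector=[0.01] * 384, limit=1)
print("✓ Qdrant round trip OK:", hits[0].payload)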

🔧 Environment Configuration

Create .env File

# Copy the template
cp .env.example .env

Complete Configuration

Edit .env with all required values:

# ==========================================
# DATABASE CONFIGURATION
# ==========================================
DATABASE_URL=postgresql://financial_user:financial_password@localhost:5432/financial_agent
JWT_SECRET_KEY=your-super-secret-key-change-this-in-production
ADMIN_USERNAME=admin
ADMIN_PASSWORD=your_secure_password_here

# ==========================================
# LLM PROVIDER CONFIGURATION
# ==========================================
LLM_PROVIDER=gemini          # Options: 'gemini' or 'ollama'
GOOGLE_API_KEY=your_api_key  # Required if using Gemini
LLM_MODEL=gemini-2.5-flash   # Google Gemini model

# OR for Ollama:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3:8b

# LLM Settings
LLM_TEMPERATURE=0.3
LLM_MAX_TOKENS=2048

# ==========================================
# QDRANT VECTOR DATABASE
# ==========================================
QDRANT_MODE=cloud              # 'cloud' or 'local'

# Cloud Settings:
QDRANT_CLOUD_URL=https://your-instance.qdrant.io
QDRANT_CLOUD_API_KEY=your-qdrant-api-key

# OR Local Settings:
# QDRANT_URL=http://localhost:6333
# QDRANT_API_KEY=

# Timeout settings
QDRANT_TIMEOUT_SECONDS=120
QDRANT_RETRY_ATTEMPTS=3
QDRANT_RETRY_DELAY_SECONDS=2.0

# ==========================================
# EMBEDDING CONFIGURATION
# ==========================================
EMBEDDING_MODEL_FINANCIAL=fin-e5-small
EMBEDDING_MODEL_GENERAL=sentence-transformers/all-MiniLM-L6-v2
CHUNK_SIZE_TOKENS=512
CHUNK_OVERLAP_TOKENS=50

# ==========================================
# RAG CONFIGURATION
# ==========================================
ENABLE_RAG=True
RAG_PRIORITY_MODE=personal-first
RAG_SIMILARITY_THRESHOLD=0.1
RAG_TOP_K_RESULTS=20
RAG_MIN_RELEVANCE=0.3
RAG_MAX_DOCUMENTS=5

# ==========================================
# FEATURE FLAGS
# ==========================================
DEBUG=False
ENABLE_TOOLS=True
ENABLE_SUMMARIZATION=True
ENABLE_QUERY_REWRITING=True

# ==========================================
# API CONFIGURATION
# ==========================================
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:5173,http://localhost:3000,http://localhost:8000

# ==========================================
# RATE LIMITING
# ==========================================
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_PERIOD_MINUTES=60

Validate Configuration

python -c "from src.core.config import settings; print('✓ Configuration loaded'); print(f'DB: {settings.DATABASE_URL}'); print(f'LLM: {settings.LLM_PROVIDER}')"

✨ Features

📊 Company Information

  • Company profile: company name, industry, charter capital, history
  • Major shareholders: top shareholders with detailed ownership percentages
  • Management: list of executives and their shareholdings
  • Subsidiaries: subsidiaries/affiliates with ownership percentages
  • Corporate events: dividends, AGMs, capital increases, and more

📈 Market Data

  • Historical prices (OHLCV): Open, High, Low, Close, Volume
    • By specific dates: start_date and end_date
    • By period: 3M, 6M, 1Y
    • Detailed output as tables

📉 Technical Analysis

  • SMA (Simple Moving Average): price trend analysis
    • Compute SMA with a custom window (SMA-9, SMA-20, SMA-50, ...)
    • Compare price against the SMA to determine the trend
    • Detailed day-by-day tables
  • RSI (Relative Strength Index): overbought/oversold assessment
    • RSI > 70: overbought (downside warning)
    • RSI < 30: oversold (potential upside)
    • Detailed tables with status (a computation sketch follows this list)
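
The sketch below shows, with pandas only, how the two indicators above are conventionally computed from a series of closing prices. It is a formula illustration (RSI here uses a simple moving-average variant rather than Wilder smoothing), not the project's technical_tools.py implementation:

import pandas as pd

def sma(close: pd.Series, window: int = 20) -> pd.Series:
    # Simple Moving Average: mean of the last `window` closes
    return close.rolling(window).mean()

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    # RSI = 100 - 100 / (1 + average gain / average loss)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = -delta.clip(upper=0).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)

closes = pd.Series([100, 102, 101, 105, 107, 106, 108, 110, 109, 111, 113, 112, 115, 117, 116], dtype=float)
print("SMA-5:", round(sma(closes, 5).iloc[-1], 2), "| RSI-14:", round(rsi(closes, 14).iloc[-1], 2))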

📄 Document Processing

  • Financial report analysis (images):

    • OCR from PDF/PNG/JPG images
    • Report classification: BCDN, KQKD (income statement), cash flow, financial ratios
    • Data extraction + Markdown table generation
    • Detailed analysis with Gemini AI
  • PDF file processing:

    • Text extraction from native PDFs
    • Automatic OCR for scanned PDFs
    • Tables and structured data
    • Intelligent analysis with Gemini
  • Excel file analysis:

    • Conversion to Markdown tables
    • Multi-sheet support
    • Vietnamese number formatting
    • Detailed financial analysis

🎯 Response Format

  • 📋 Markdown tables with detailed, readable data
  • 📊 Summary statistics after each table
  • 💡 Professional analysis and conclusions

🏗️ System Architecture

Tech Stack:

  • Backend: FastAPI (REST API)
  • Agent Framework: LangChain + LangGraph (ReAct Pattern)
  • LLM Providers:
    • ☁️ Google Gemini (Cloud) - for financial analysis & OCR
    • 🖥️ Ollama (Local) - for chat & analysis
  • Data Source: VnStock3 API (Free)
  • Technical Analysis: TA-Lib
  • Document Processing:
    • pytesseract + OpenCV (OCR)
    • pdfplumber (PDF text extraction)
    • pdf2image (PDF to image conversion)
  • Excel Processing: openpyxl + pandas
  • Frontend: React + Vite + TailwindCSS

Directory structure:

financial_agent/
├── src/
│   ├── agent/          # LangGraph Agent
│   │   ├── financial_agent.py
│   │   ├── state.py
│   │   └── prompts/
│   │       ├── system_prompt.txt
│   │       ├── financial_report_prompt.txt
│   │       └── excel_analysis_prompt.txt
│   ├── tools/          # 11+ Tools
│   │   ├── vnstock_tools.py        # 5 VnStock tools
│   │   ├── technical_tools.py      # 2 Technical analysis tools
│   │   ├── financial_report_tools.py  # Financial report analysis (OCR + Gemini)
│   │   ├── pdf_tools.py            # PDF document processing
│   │   └── excel_tools.py          # Excel analysis tools
│   ├── llm/            # LLM Factory
│   │   ├── llm_factory.py
│   │   └── config.py
│   └── api/            # FastAPI
│       └── app.py
├── frontend/           # React UI
│   ├── src/
│   │   ├── components/
│   │   └── App.jsx
│   └── package.json
├── tests/              # Unit Tests
├── test_auto.py        # Automated Test Script
└── requirements.txt

🚀 Installation and Running

Step 1: Clone and Install Dependencies

# Clone or cd into the project directory
cd financial_agent

# Create a virtual environment
python -m venv venv

# Activate the venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Step 2: Configure the LLM Provider

You can choose one of two providers:

Option 1: Google Gemini (Recommended) ☁️

Advantages: fast, powerful, no GPU required

  1. Get a free API key at: https://aistudio.google.com/apikey
  2. Update the .env file:
# Google Gemini
GOOGLE_API_KEY=your_api_key_here
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash

Option 2: Ollama (Local) 🖥️

Advantages: runs offline, private, completely free

Requirements: RAM >= 8GB (16GB recommended), GPU with >= 4GB VRAM (optional)

Step 1: Download and install Ollama

  • Windows:

    1. Download from: https://ollama.com/download/windows
    2. Run OllamaSetup.exe
    3. Follow the installer (Next → Next → Install)
  • macOS:

    brew install ollama
  • Linux:

    curl -fsSL https://ollama.com/install.sh | sh

Step 2: Start Ollama

# Run the Ollama server (starts automatically in the background on Windows)
ollama serve

Step 3: Pull a model

Choose one of the following models (based on your hardware):

# Small model (4-8GB RAM) - fast
ollama pull qwen3:8b

# Medium model (8-16GB RAM) - balanced
ollama pull llama3.1:8b
ollama pull qwen2.5:7b

# Large model (16GB+ RAM, 8GB+ GPU) - high quality
ollama pull qwen3:14b
ollama pull llama3.1:70b

Step 4: Verify installed models

ollama list

Step 5: Update .env

# Ollama Local
LLM_PROVIDER=ollama
OLLAMA_MODEL=qwen3:8b       # Replace with the model you pulled
OLLAMA_BASE_URL=http://localhost:11434

Ollama notes:

  • The qwen3:8b model (8B parameters) needs ~8GB RAM
  • The llama3.1:8b model (8B parameters) needs ~8GB RAM
  • If you hit an "out of memory" error, try a smaller model or switch to Gemini
  • Check that Ollama is running: ollama list

Step 6: Configure Tesseract OCR (for financial report analysis)

Tesseract is used to OCR financial report images. You can skip this step if you only use Gemini Vision or native-text PDFs.

Windows:

  1. Download the installer: https://github.com/UB-Mannheim/tesseract/wiki
  2. Run tesseract-ocr-w64-setup-v5.x.exe
  3. Follow the installer (default: C:\Program Files\Tesseract-OCR)
  4. Update .env:
# Optional: only needed for a custom install location
TESSERACT_PATH=C:\Program Files\Tesseract-OCR\tesseract.exe

Linux (Ubuntu/Debian):

sudo apt-get install tesseract-ocr libtesseract-dev

macOS:

brew install tesseract

Verify the installation:

tesseract --version

🎮 Running the Application

Backend API

# Activate the venv (if not already active)
venv\Scripts\activate  # Windows
source venv/bin/activate  # Linux/Mac

# Run the FastAPI server (main.py loads the config automatically)
python main.py

# Server runs at: http://localhost:8000

Frontend (React)

# In a new terminal, cd into frontend
cd frontend

# Install dependencies (first time only)
npm install

# Run the dev server
npm run dev

# Frontend runs at: http://localhost:5173

Test the API with curl

# Test endpoint
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d "{\"question\": \"Thông tin về VNM\"}"

🧪 Testing

Automated testing with an Excel file

# Create a sample Excel file with test questions
python create_sample_test.py

# Run the automated tests (make sure the backend is running)
python test_auto.py test_questions_sample.xlsx

# Results are saved to test_results_[timestamp].xlsx

See tests/ps_test/ for details.


📚 Usage

Sample questions:

Company information:

- "Thông tin về công ty VNM"
- "VCB thuộc ngành gì?"

Shareholders & Management:

- "Cổ đông lớn của VCB là ai?"
- "Ban lãnh đạo HPG gồm những ai?"
- "VNM có công ty con nào?"

Events:

- "Sự kiện gần đây của FPT"
- "VCB có chia cổ tức không?"

Price data:

- "Giá VCB 3 tháng gần nhất"
- "OHLCV của HPG từ đầu năm 2024"

Technical analysis:

- "Tính SMA-20 cho HPG"
- "Tính SMA-9 và SMA-20 của TCB từ đầu tháng 11"
- "RSI của VIC hiện tại"
- "HPG có quá mua không?"

Comprehensive analysis:

- "Phân tích toàn diện về VNM"
- "So sánh giá VCB và TCB trong 6 tháng"

Uploading and analyzing documents:

Financial reports (images):

Send an image of a financial report (BCDN, KQKD, cash flow statement):

- Upload a PNG/JPG file of the report
- The agent will OCR, analyze, and generate Markdown tables

PDF files:

Send a financial report PDF:

- Upload a PDF file (native text or scanned)
- The agent will extract text + tables
- Detailed analysis with AI

Excel files:

Send an Excel file with financial data:

- Upload a .xlsx/.xls file
- The agent will convert it to Markdown
- Financial data analysis

🚀 Running the Application

1. Start Backend API Server

# Activate virtual environment
.\venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

# Start the FastAPI server
python main.py

Expected Output:

╔══════════════════════════════════════════════════════════════╗
║              Financial Agent API                             ║
║       Vietnamese Stock Market Investment Assistant           ║
╚══════════════════════════════════════════════════════════════╝

🚀 Starting server...
📍 API Server: http://0.0.0.0:8000
📚 API Documentation (Swagger UI): http://0.0.0.0:8000/docs
...
Press CTRL+C to quit

2. Test API Server

In a new terminal:

# Test health check
curl http://localhost:8000/health

# Test chat endpoint
curl -X POST "http://localhost:8000/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"question": "Thông tin về VNM"}'

3. Access Swagger UI

Open browser and visit: http://localhost:8000/docs

You can test all API endpoints interactively here.

4. (Optional) Start Frontend

# In a new terminal
cd frontend

# Install dependencies if not already done
npm install

# Start development server
npm run dev

Frontend will be available at: http://localhost:5173

5. (Optional) Start Desktop App

# In a new terminal
cd desktop_app

# Setup (only first time)
npm install

# Start Electron app
npm start

🔧 Advanced Configuration

Switching LLM Providers

Edit .env to change which LLM is used:

# Google Gemini (Cloud)
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GOOGLE_API_KEY=your_api_key_here

# Ollama (Local)
LLM_PROVIDER=ollama
OLLAMA_MODEL=qwen3:8b
OLLAMA_BASE_URL=http://localhost:11434

Important: Restart the server after changing .env
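
The project's src/llm/llm_factory.py is not shown here, but the provider switch can be pictured with a minimal hedged sketch using LangChain's Gemini and Ollama chat classes (the class names are LangChain's; the factory function itself is illustrative):

import os

def get_llm():
    # Illustrative factory - the real LLMFactory in src/llm/llm_factory.py may differ
    provider = os.getenv("LLM_PROVIDER", "gemini").lower()
    if provider == "gemini":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(
            model=os.getenv("LLM_MODEL", "gemini-2.5-flash"),
            temperature=float(os.getenv("LLM_TEMPERATURE", "0.3")),
        )
    from langchain_ollama import ChatOllama
    return ChatOllama(
        model=os.getenv("OLLAMA_MODEL", "qwen3:8b"),
        base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
    )

# Requires GOOGLE_API_KEY to be set, or a running Ollama server
print(get_llm().invoke("Hello").content)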

Setting Up Ollama (Local LLM)

# Download and install from https://ollama.com/

# Start Ollama server
ollama serve

# In another terminal, pull a model
ollama pull qwen2.5:7b

# Verify installation
ollama list

Troubleshooting Ollama

Error: "Connection refused"

# Check if Ollama is running
ollama list

# If not running, start it
ollama serve

Error: "Out of memory"

  • Use a smaller model: ollama pull qwen3:4b
  • Switch to Gemini (cloud-based)

Error: "Model not found"

# List available models
ollama list

# Pull a new model
ollama pull qwen2.5:7b

Recommended Models for Financial Analysis:

  • qwen2.5:7b - Best balance of quality and speed
  • llama2:13b - High quality but slower
  • qwen3:4b - Fast but lower quality (~4GB RAM)

Getting Google Gemini API Key

  1. Go to Google AI Studio
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key and add to .env:
    GOOGLE_API_KEY=your_key_here
    LLM_PROVIDER=gemini
    

Custom System Prompts

Edit these files to customize agent behavior:

  • src/agent/prompts/system_prompt.txt - Main agent prompt
  • src/agent/prompts/financial_report_prompt.txt - Financial report analysis
  • src/agent/prompts/excel_analysis_prompt.txt - Excel data analysis

Restart server to apply changes.

Fine-tuning LLM Parameters

# Temperature (0.0-1.0): Higher = more creative, Lower = more focused
LLM_TEMPERATURE=0.3

# Maximum length of response
LLM_MAX_TOKENS=2048

# RAG Threshold (0.0-1.0): How relevant documents must be
RAG_SIMILARITY_THRESHOLD=0.1

# Number of documents to retrieve
RAG_TOP_K_RESULTS=20

Installing Additional Tools

Install Tesseract OCR (Optional)

Only needed for processing scanned PDFs:

Windows:

# Download from: https://github.com/UB-Mannheim/tesseract/wiki/Downloads
# Run installer

# Add to .env:
TESSERACT_PATH=C:\Program Files\Tesseract-OCR\tesseract.exe

macOS:

brew install tesseract

Linux:

sudo apt-get install tesseract-ocr

Using TA-Lib for Advanced Technical Analysis

# Already installed via requirements.txt
# Verify installation
python -c "import talib; print('✓ TA-Lib installed')"

📡 API Endpoints

Health Check Endpoint

GET /health

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-01-11T10:30:00Z"
}

Chat Endpoint

POST /api/chat

Ask the financial agent any question about Vietnamese stocks.

curl -X POST "http://localhost:8000/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the latest price of VNM stock?"
  }'

Request Body:

{
  "question": "Your question here",
  "use_rag": true,  // Optional: use RAG for document analysis
  "session_id": "optional_session_id"
}

Response:

{
  "answer": "VNM (Vinamilk) stock information...\n\n| Date | Close | Volume |\n...",
  "sources": ["VnStock API", "Company data"],
  "processing_time_seconds": 2.5
}
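
The same call can be made from Python with requests; a short sketch whose fields mirror the request and response bodies above:

import requests

resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"question": "What is the latest price of VNM stock?", "use_rag": True},
    timeout=120,  # LLM calls can take a while
)
resp.raise_for_status()
data = resp.json()
print(data["answer"])
print("Sources:", data.get("sources"))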

Upload Financial Report

POST /api/upload/financial-report

Analyze financial reports from images (PNG, JPG, PDF).

curl -X POST "http://localhost:8000/api/upload/financial-report" \
  -F "file=@financial_report.jpg"

Response:

{
  "success": true,
  "report_type": "Balance Sheet",
  "company": "ABC Corporation",
  "period": "Q3/2024",
  "extracted_text": "...",
  "markdown_table": "| Item | Value |\n...",
  "analysis": "Financial analysis from AI..."
}

Upload PDF Document

POST /api/upload/pdf

Analyze PDF financial documents.

curl -X POST "http://localhost:8000/api/upload/pdf" \
  -F "[email protected]"

Response:

{
  "success": true,
  "file_name": "report.pdf",
  "total_pages": 5,
  "extracted_text": "...",
  "tables_markdown": "| Table | Data |\n...",
  "analysis": "Detailed financial analysis...",
  "processing_method": "native"
}

Upload Excel File

POST /api/upload/excel

Analyze Excel financial data files.

curl -X POST "http://localhost:8000/api/upload/excel" \
  -F "file=@financial_data.xlsx"

Response:

{
  "success": true,
  "file_name": "financial_data.xlsx",
  "sheet_count": 3,
  "markdown": "# Financial Data Analysis\n\n## Sheet 1: Revenue\n| Month | Amount |\n...",
  "message": "Excel file analysis successful"
}
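
All three upload endpoints take a single multipart field named file (as in the curl examples above), so they can be called the same way from Python; a sketch against the Excel endpoint, swapping the URL and filename for the PDF or image endpoints:

import requests

with open("financial_data.xlsx", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/upload/excel",
        files={"file": ("financial_data.xlsx", f)},
        timeout=300,  # OCR/LLM analysis can be slow
    )
resp.raise_for_status()
print(resp.json()["markdown"][:500])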

Interactive API Documentation

Visit http://localhost:8000/docs (Swagger UI) to:

  • View all available endpoints
  • Test endpoints with example data
  • See response schemas
  • Download API specification

🐛 Troubleshooting Guide

Installation Issues

Error: "Python version too old"

# Check your Python version
python --version

# Should be 3.9 or higher. If not, download from python.org

Error: "pip install failed"

# Clear pip cache
pip cache purge

# Upgrade pip
python -m pip install --upgrade pip

# Try installing again
pip install -r requirements.txt

Error: "ModuleNotFoundError: No module named 'xxx'"

# Reinstall with force-reinstall
pip install -r requirements.txt --force-reinstall

# Or reinstall specific package
pip install langchain --upgrade

Database Connection Issues

Error: "Connection refused" for PostgreSQL

# Check if PostgreSQL is running
# Windows: Services app → PostgreSQL → Should show "Running"
# macOS: brew services list | grep postgres
# Linux: sudo systemctl status postgresql

# If not running, start it:
# Windows: Services app → PostgreSQL → Start
# macOS: brew services start postgresql@15
# Linux: sudo systemctl start postgresql

Error: "database does not exist"

# Recreate the database
psql -U postgres
CREATE DATABASE financial_agent;
GRANT ALL PRIVILEGES ON DATABASE financial_agent TO financial_user;
\q

# Run migrations
alembic upgrade head

Error: "Database URL is empty"

# Check .env file has DATABASE_URL
cat .env | grep DATABASE_URL

# Should see something like:
# DATABASE_URL=postgresql://financial_user:financial_password@localhost:5432/financial_agent

LLM Provider Issues

Error: "GOOGLE_API_KEY not configured"

# 1. Get API key from: https://aistudio.google.com/apikey
# 2. Add to .env:
GOOGLE_API_KEY=your_actual_key_here
LLM_PROVIDER=gemini

# 3. Restart server

Error: "Ollama connection failed"

# Check if Ollama is running
ollama list

# Start Ollama if not running
ollama serve

# Update .env to point to correct URL
OLLAMA_BASE_URL=http://localhost:11434
LLM_PROVIDER=ollama

Error: "Model not found"

# List available models
ollama list

# Pull a model
ollama pull qwen2.5:7b

# Set model in .env
OLLAMA_MODEL=qwen2.5:7b

API Server Issues

Error: "Port 8000 already in use"

# Use different port
API_PORT=8001 python main.py

# Or find process using port 8000 and kill it
# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# macOS/Linux:
lsof -i :8000
kill -9 <PID>

Error: "CORS error" from frontend

# Update .env with correct origins
CORS_ORIGINS=http://localhost:5173,http://localhost:3000,http://localhost:8000

# Restart server

Error: "No module named 'src'"

# Make sure running from project root directory
cd financial_agent_fork

# Verify directory structure
ls -la src/  # Should show src/ folder exists

# Run server from root
python main.py

Qdrant Vector Database Issues

Error: "Qdrant connection failed"

# Check Qdrant is running
curl http://localhost:6333/health

# If not running, start with Docker
docker run --name qdrant -p 6333:6333 qdrant/qdrant

# Or for Qdrant Cloud, update .env
QDRANT_MODE=cloud
QDRANT_CLOUD_URL=https://your-instance.qdrant.io
QDRANT_CLOUD_API_KEY=your-api-key

Error: "Collection not found"

This is normal on first run. Collections are created automatically when the first document is uploaded.

Error: "Timeout connecting to Qdrant"

# Increase timeout in .env
QDRANT_TIMEOUT_SECONDS=300
QDRANT_RETRY_ATTEMPTS=5

# Restart server

File Upload Issues

Error: "File size too large"

  • Default limit: 50MB per file
  • For larger files, split into multiple smaller files
  • Or adjust FastAPI settings

Error: "Unsupported file type"

  • Financial Reports: PNG, JPG, PDF
  • Data Files: XLSX, XLS
  • PDF: PDF only

Document Processing Issues

Error: "OCR failed" or "Tesseract not found"

# Option 1: Install Tesseract (see Installation Guide above)
# Option 2: Use Google Gemini Vision API instead (recommended)
# Set in .env:
LLM_PROVIDER=gemini
GOOGLE_API_KEY=your_key_here

Error: "PDF extraction failed"

  • Try with a different PDF file
  • Ensure PDF is not password-protected
  • Scanned PDFs may need OCR (slower)

Error: "Excel file cannot be read"

  • Verify file is not corrupted
  • Save file in .xlsx format (not .xls)
  • Check file has proper Excel structure
  • Remove unusual blank rows/columns

Performance Issues

API is slow to respond

# 1. Check if it's LLM latency
# - Switch to a faster model (qwen3:4b)
# - Or use Gemini instead

# 2. Check if it's database query
# - Add database indexes
# - Check database server is running properly

# 3. Check RAM usage
# - Monitor memory with: Task Manager (Windows), Activity Monitor (macOS), htop (Linux)
# - If low on RAM, reduce model size

# 4. Enable debug mode to see timings
DEBUG=True

High memory usage

# Use smaller LLM model
OLLAMA_MODEL=qwen3:4b  # Instead of larger models

# Or switch to API-based (cloud) providers
LLM_PROVIDER=gemini

Getting Help

Check logs for detailed error messages:

# Windows: Logs are printed in terminal
# Look for error messages starting with [ERROR]

# Enable verbose logging
DEBUG=True

Test individual components:

# Test VnStock API
python -c "from vnstock3 import Vnstock; v = Vnstock(); print(v.listing_companies())"

# Test PostgreSQL
python -c "from src.database.database import SessionLocal; db = SessionLocal(); print('✓ Database connected')"

# Test Qdrant
python -c "from qdrant_client import QdrantClient; c = QdrantClient(':memory:'); print('✓ Qdrant OK')"

# Test LLM
python -c "from src.llm.llm_factory import LLMFactory; llm = LLMFactory.get_llm(); print(llm.invoke('Hello'))"

🛠️ Available Tools Reference

The financial agent has access to these tools for stock market analysis:

Stock Information Tools

1. get_company_info

Get company overview and profile information

  • Input: ticker (e.g., VNM, VCB, HPG)
  • Output: Company name, industry, charter capital, history

2. get_shareholders

Retrieve major shareholders information

  • Input: ticker
  • Output: Top 10 shareholders with ownership percentages

3. get_officers

Get company leadership and management team

  • Input: ticker
  • Output: Executives, positions, shareholding percentage

4. get_subsidiaries

Find subsidiary and affiliated companies

  • Input: ticker
  • Output: List of subsidiaries with ownership percentage

5. get_company_events

Get company events and announcements

  • Input: ticker
  • Output: Recent corporate events (dividends, AGM, capital increases)

Market Data Tools

6. get_historical_data

Retrieve historical price data (OHLCV)

  • Input: ticker, start_date, end_date or period (3M, 6M, 1Y)
  • Output: Detailed OHLCV table with statistics
  • Example: get_historical_data("VNM", period="3M")

Technical Analysis Tools

7. calculate_sma

Calculate Simple Moving Average

  • Input: ticker, window (default: 20)
  • Output: SMA values with trend analysis
  • Example: calculate_sma("VNM", window=20)

8. calculate_rsi

Calculate Relative Strength Index

  • Input: ticker, window (default: 14)
  • Output: RSI values with overbought/oversold signals
  • Example: calculate_rsi("HPG", window=14)

How to Use Tools in Chat

Simply ask the agent questions, and it will automatically use the appropriate tools:

Q: "What is the latest price of VNM?"
→ Uses get_historical_data

Q: "Who are the major shareholders of VCB?"
→ Uses get_shareholders

Q: "Calculate SMA-20 for HPG"
→ Uses calculate_sma with window=20

Q: "Is FPT stock overbought right now?"
→ Uses calculate_rsi to check signal
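
Under the hood these tools are plain Python callables registered with the agent. As a hedged sketch only (not the project's technical_tools.py), a tool such as calculate_sma could be declared with LangChain's @tool decorator; the closing prices here are hard-coded for illustration, where the real tool would fetch them from the VnStock API:

from langchain_core.tools import tool
import pandas as pd

@tool
def calculate_sma(ticker: str, window: int = 20) -> str:
    """Calculate the Simple Moving Average for a Vietnamese stock ticker."""
    # Illustrative only: in the real tool, closing prices come from VnStock
    closes = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 108.0, 110.0,
                        109.0, 111.0, 113.0, 112.0, 115.0, 117.0, 116.0, 118.0,
                        120.0, 119.0, 121.0, 123.0])
    value = closes.rolling(window).mean().iloc[-1]
    return f"SMA-{window} for {ticker}: {value:.2f}"

# The agent calls tools itself; invoking one directly looks like:
print(calculate_sma.invoke({"ticker": "VNM", "window": 20}))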

🏗️ Architecture

Tech Stack

  • Backend: FastAPI (REST API)
  • Agent Framework: LangChain + LangGraph (ReAct Pattern)
  • LLM Providers:
    • ☁️ Google Gemini (Cloud) - AI analysis & OCR
    • 🖥️ Ollama (Local) - for chat & analysis
  • Data Source: VnStock3 API (Free)
  • Vector Database: Qdrant (RAG)
  • Relational Database: PostgreSQL
  • Technical Analysis: TA-Lib
  • Document Processing:
    • pytesseract + OpenCV (OCR for scanned documents)
    • pdfplumber (PDF text extraction)
    • pdf2image (PDF to image conversion)
  • Excel Processing: openpyxl + pandas
  • Frontend: React + Vite + TailwindCSS
  • Desktop App: Electron

Project Structure

financial_agent/
├── src/
│   ├── agent/              # LangGraph Agent
│   │   ├── financial_agent.py
│   │   ├── state.py
│   │   └── prompts/
│   │       ├── system_prompt.txt
│   │       ├── financial_report_prompt.txt
│   │       └── excel_analysis_prompt.txt
│   ├── tools/              # 8+ Analysis Tools
│   │   ├── vnstock_tools.py         # Company & stock data
│   │   ├── technical_tools.py       # SMA, RSI indicators
│   │   ├── financial_report_tools.py # OCR + Gemini analysis
│   │   ├── pdf_tools.py             # PDF processing
│   │   └── excel_tools.py           # Excel analysis
│   ├── llm/                # LLM Factory
│   │   ├── llm_factory.py
│   │   └── config.py
│   ├── database/           # Database Models
│   │   ├── database.py
│   │   └── models.py
│   ├── api/                # REST API Endpoints
│   │   └── app.py          # Main FastAPI application
│   ├── services/           # Business Logic
│   │   ├── message_generation_service.py
│   │   ├── document_service.py
│   │   ├── admin_service.py
│   │   └── rag_service.py
│   ├── core/               # Configuration & Workflow
│   │   ├── config.py
│   │   ├── langgraph_workflow.py  # Main workflow
│   │   └── tool_selector.py
│   └── utils/              # Utilities
│       ├── validators.py
│       └── helpers.py
├── migrations/             # Alembic DB migrations
├── frontend/               # React Frontend
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   └── services/
│   └── package.json
├── desktop_app/            # Electron Desktop App
│   ├── main.js
│   ├── preload.js
│   └── package.json
├── requirements.txt        # Python dependencies
├── alembic.ini            # Database migration config
├── main.py                # Application entry point
└── README.md              # This file

Data Flow Diagram

User Input
    ↓
FastAPI Endpoint
    ↓
LangGraph Agent
    ├→ Tool Router
    │   ├→ VnStock Tools (Stock data)
    │   ├→ Technical Tools (SMA, RSI)
    │   ├→ Financial Report Tools (OCR + AI)
    │   ├→ PDF Tools (Document parsing)
    │   └→ Excel Tools (Data analysis)
    ├→ LLM Provider
    │   ├→ Google Gemini (Cloud)
    │   └→ Ollama (Local)
    └→ Qdrant Vector DB (RAG retrieval)
    ↓
Markdown Response
    ↓
Frontend Display
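
The workflow in src/core/langgraph_workflow.py is not reproduced in this README, but the ReAct loop sketched above can be shown in miniature with LangGraph's prebuilt helper (a hedged illustration; the tool is stubbed and the model wiring is simplified):

from langgraph.prebuilt import create_react_agent
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.tools import tool

@tool
def get_company_info(ticker: str) -> str:
    """Return a short company profile for a Vietnamese ticker (stubbed here)."""
    return f"{ticker}: illustrative company profile"

# Requires GOOGLE_API_KEY; swap in ChatOllama for a local setup
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
agent = create_react_agent(llm, [get_company_info])

result = agent.invoke({"messages": [("user", "Thông tin về VNM")]})
print(result["messages"][-1].content)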

Database Schema

-- Users table
CREATE TABLE users (
  id UUID PRIMARY KEY,
  username VARCHAR UNIQUE,
  email VARCHAR UNIQUE,
  hashed_password VARCHAR,
  is_admin BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMP
);

-- Chat sessions
CREATE TABLE chat_sessions (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  title VARCHAR,
  use_rag BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMP
);

-- Chat messages
CREATE TABLE chat_messages (
  id UUID PRIMARY KEY,
  session_id UUID REFERENCES chat_sessions(id),
  role VARCHAR,
  content TEXT,
  created_at TIMESTAMP
);

-- Document uploads
CREATE TABLE document_uploads (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  file_name VARCHAR,
  file_type VARCHAR,
  file_size INTEGER,
  created_at TIMESTAMP
);

-- Audit logs
CREATE TABLE audit_logs (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  action VARCHAR,
  timestamp TIMESTAMP
);
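
The ORM side of this schema lives in src/database/models.py, which is not reproduced here. As a hedged sketch, the users table above maps to a SQLAlchemy model along these lines:

# Illustrative mapping of the users table - the real models.py may differ
import uuid
from datetime import datetime
from sqlalchemy import Boolean, DateTime, String
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"

    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    username: Mapped[str] = mapped_column(String, unique=True)
    email: Mapped[str] = mapped_column(String, unique=True)
    hashed_password: Mapped[str] = mapped_column(String)
    is_admin: Mapped[bool] = mapped_column(Boolean, default=False)
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)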

Integration Points

PostgreSQL ↔ FastAPI

  • SQLAlchemy ORM for data modeling
  • Alembic for schema migrations
  • Connection pooling for performance

VnStock API ↔ Tools

  • Real-time stock prices
  • Historical OHLCV data
  • Company fundamentals
  • Shareholder information

LLM Providers ↔ Agent

  • Gemini: For analysis and OCR
  • Ollama: For local chat
  • Tool calling and function execution

Qdrant ↔ RAG System

  • Vector embeddings storage
  • Semantic document retrieval
  • Collection management

📚 Learning Resources

Official Documentation

Technical Analysis

Local LLM

AI/ML Frameworks


🚀 Deployment Guide

Deploy Backend to Railway

  1. Push code to GitHub
  2. Connect GitHub repository to Railway
  3. Set environment variables in Railway dashboard
  4. Railway automatically detects Python and deploys

Deploy Frontend to Vercel

  1. Push frontend code to GitHub
  2. Connect GitHub to Vercel
  3. Configure build settings
  4. Vercel auto-deploys on push

Docker Deployment

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "main.py"]

Build and run:

docker build -t financial-agent .
docker run -p 8000:8000 --env-file .env financial-agent

🔄 Workflow Examples

Example 1: Research a Stock

User: "Tell me everything about VNM stock"

Agent:
1. Uses get_company_info("VNM")
2. Uses get_shareholders("VNM")
3. Uses get_company_events("VNM")
4. Uses get_historical_data("VNM", period="6M")
5. Uses calculate_sma("VNM", window=20)
6. Uses calculate_rsi("VNM")
7. LLM synthesizes all data
8. Returns comprehensive analysis with tables

Example 2: Financial Report Analysis

User: Upload financial report image

Agent:
1. OCR image → Extract text
2. Classify report type (Balance Sheet, Income Statement, etc.)
3. Extract financial tables → Markdown
4. Use Gemini to analyze data
5. Return formatted analysis with insights

Example 3: Portfolio Analysis

User: "I own VNM, VCB, and HPG. How are they doing?"

Agent:
1. Gets latest data for each stock
2. Calculates technical indicators
3. Analyzes trends and momentum
4. Compares to market benchmarks
5. Provides investment insights

🌟 Features Roadmap

  • Real-time price updates (WebSocket)
  • Financial ratio calculations (P/E, ROE, ROA)
  • News scraping and sentiment analysis
  • Portfolio tracking and alerts
  • Mobile app (React Native)
  • Advanced charting and visualization
  • Machine learning price predictions
  • Multi-language support

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Create virtual environment
python -m venv venv_dev
source venv_dev/bin/activate

# Install dev dependencies
pip install -r requirements.txt
pip install pytest pytest-asyncio black flake8

# Run tests
pytest

# Format code
black src/

# Lint code
flake8 src/

📄 License

MIT License


👨‍💻 Author & Support

Financial Agent - AI Stock Market Assistant for Vietnam
Built with ❤️ using modern AI and financial technologies

Maintained by: Cleans3
Project Status: Active Development
Last Updated: January 2025

Acknowledgments

Special thanks to:

  • VnStock team for the amazing free API
  • LangChain team for the powerful framework
  • Ollama team for local LLM support
  • Google for Gemini API
  • Open-source community

Happy Trading! 📈🚀

If you find this project helpful, please ⭐ star it on GitHub!
