🎯 BGE FAISS Database Usage Guide

📊 Database Overview

Your database contains:

  • 83,837 financial/trading documents
  • 1024-dimensional BGE embeddings
  • Primary language: Chinese (70.5%), with some English (4.1%)
  • Time range: 2024-2025 financial reports and market analysis
  • Main topics: Corporate analysis, financial forecasting, trading commentary
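
This guide does not show how chat_app.py searches the index. As a rough illustration of what similarity search over embeddings does in general, the sketch below ranks toy vectors by cosine similarity in pure Python. The function names and the 4-dimensional vectors are hypothetical stand-ins; a FAISS index performs this kind of nearest-neighbour lookup far more efficiently, and the metric used by this database is an assumption here.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, doc_vectors, k=3):
    """Return (doc_index, score) pairs for the k most similar documents."""
    q = normalize(query)
    scores = []
    for idx, vec in enumerate(doc_vectors):
        d = normalize(vec)
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    # Highest similarity first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 4-dimensional vectors stand in for the real 1024-dimensional embeddings.
docs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
results = top_k([1.0, 0.0, 0.0, 0.0], docs, k=2)
for idx, score in results:
    print(idx, round(score, 4))
```

A query embedding that points in nearly the same direction as a document embedding scores close to 1, which is why topic-aligned keywords tend to retrieve more results than unrelated ones.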

🏆 Best-Performing Query Types

✅ High-Success Queries (46.7% success rate)

  1. Technology & AI Focus

    python chat_app.py -q "OLED RISC AI"
    python chat_app.py -q "AI 人工智能 市场"
  2. Trading & Market Commentary

    python chat_app.py -q "Trading Desk"
  3. Corporate Analysis

    python chat_app.py -q "公司 市场 增长"
    python chat_app.py -q "预计 亿元 同比"
  4. Market Outlook

    python chat_app.py -q "有望 预期 目前"
  5. Semiconductor Industry

    python chat_app.py -q "半导体 芯片 TSMC"

🔑 Top Keywords Found in Your Database

Chinese Keywords (Most Frequent)

  • 公司 (company) - 707 mentions
  • 市场 (market) - 358 mentions
  • 增长 (growth) - 291 mentions
  • 预计 (forecast) - 281 mentions
  • 亿元 (hundred million yuan) - 261 mentions
  • 提升 (improvement) - 240 mentions
  • 同比 (year-over-year) - 238 mentions
  • 产品 (product) - 234 mentions

English Keywords

  • OLED, RISC, AI (technology terms)
  • EBITDA, CAGR (financial metrics)
  • Trading Desk (market commentary)
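
Keyword counts like the ones above can be approximated with a simple substring count. The sketch below is one possible approach, not necessarily what analyze_db.py does; the function name and the two sample sentences are made up for illustration, and a real analysis might tokenize the text first.

```python
from collections import Counter


def count_keywords(documents, keywords):
    """Count non-overlapping occurrences of each keyword across all documents."""
    counts = Counter()
    for text in documents:
        for kw in keywords:
            counts[kw] += text.count(kw)
    return counts


# Hypothetical sample sentences, not taken from the database.
sample_docs = [
    "公司预计全年营收增长，市场份额提升。",
    "该公司产品在海外市场同比增长。",
]
keywords = ["公司", "市场", "增长", "预计"]
counts = count_keywords(sample_docs, keywords)
for kw, n in counts.most_common():
    print(kw, n)
```

Substring matching works reasonably for Chinese because the text has no word separators, but it can over-count keywords that appear inside longer terms.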

📅 Temporal Coverage

Your database covers recent financial data (2024-2025) with peak document volumes on:

  • 2025-06-09: 316 documents
  • 2025-02-20: 292 documents
  • 2024-01-29: 278 documents
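
Peak days such as these can be found by counting documents per date. The sketch below assumes each document carries a date; how dates are stored in this database is not described in this guide, so the input list and the function name are hypothetical.

```python
from collections import Counter
from datetime import date


def peak_days(doc_dates, n=3):
    """Return the n dates with the most documents as (date, count) pairs."""
    return Counter(doc_dates).most_common(n)


# Toy input: one entry per document (illustrative counts, not the real ones).
sample_dates = (
    [date(2025, 6, 9)] * 3
    + [date(2025, 2, 20)] * 2
    + [date(2024, 1, 29)]
)
peaks = peak_days(sample_dates)
for day, count in peaks:
    print(day.isoformat(), count)
```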

🎯 Optimal Query Strategies

1. Financial Analysis Queries

# Corporate performance
python chat_app.py -q "公司 业绩 增长"

# Financial forecasting  
python chat_app.py -q "预计 营收 利润"

# Market expectations
python chat_app.py -q "有望 预期 提升"

2. Technology Sector Queries

# AI and technology
python chat_app.py -q "AI 人工智能 技术"

# Semiconductor industry
python chat_app.py -q "半导体 芯片 制造"

# Display technology
python chat_app.py -q "OLED 显示 技术"

3. Trading & Market Queries

# Trading commentary
python chat_app.py -q "Trading Desk"

# Market analysis
python chat_app.py -q "股市 行情 分析"

4. Industry Analysis Queries

# Sector analysis
python chat_app.py -q "行业 分析 趋势"

# Company comparisons
python chat_app.py -q "公司 对比 竞争"

🚫 Query Types to Avoid

In testing, the following query types performed poorly against this database:

  • Automotive industry queries (汽车)
  • General trading system queries (交易系统)
  • Risk management queries (风险管理)
  • Investment strategy queries (投资策略)

💡 Best Practices

✅ Do:

  • Use 2-3 specific keywords rather than full sentences
  • Combine Chinese business terms with relevant English acronyms
  • Focus on financial, technology, and corporate topics
  • Use high-frequency keywords from the analysis
  • Try company names, financial metrics, and tech terms

❌ Don't:

  • Use automotive industry terms
  • Query about general trading concepts
  • Ask about risk management theory
  • Query for specific company names that are not in the database
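
The Do and Don't lists can be turned into a small pre-flight check. The sketch below is a hypothetical helper, not part of the project: it enforces the 2-3 keyword guideline, rejects the low-yield topics named in this guide, and builds the argument list for the chat_app.py -q command without running it.

```python
# Low-yield topics listed under "Query Types to Avoid"
AVOID_TERMS = ("汽车", "交易系统", "风险管理", "投资策略")


def build_command(keywords):
    """Return an argv list for chat_app.py, enforcing the 2-3 keyword guideline."""
    if not 2 <= len(keywords) <= 3:
        raise ValueError("use 2-3 keywords per query")
    flagged = [kw for kw in keywords if any(term in kw for term in AVOID_TERMS)]
    if flagged:
        raise ValueError(f"low-yield topic for this database: {flagged}")
    # Keywords are joined with spaces, matching the query style used in this guide
    return ["python", "chat_app.py", "-q", " ".join(keywords)]


cmd = build_command(["半导体", "芯片", "TSMC"])
print(cmd)
```

The returned list can be passed to subprocess.run if the command should actually be executed.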

🧪 Quick Test Commands

# Test the database analyzer
python analyze_db.py

# Test optimized queries
python optimized_test_queries.py --test-all

# Get recommendations
python optimized_test_queries.py --recommendations

# Interactive exploration
python chat_app.py -i

# Best performing query
python chat_app.py -q "OLED RISC AI"

📈 Success Metrics

Your database performs best with:

  • Technology sector queries: 3+ results typically
  • Financial corporate analysis: 1-2 results typically
  • Trading desk commentary: 2+ results typically
  • AI/semiconductor topics: 1-3 results typically
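
This guide does not define how a query counts as a success. One plausible reading, assumed here, is that a query succeeds when it returns at least one result. The sketch below computes a success rate under that assumption; the function name is hypothetical and the result counts are toy values, not measurements from this database.

```python
def success_rate(result_counts, min_results=1):
    """Fraction of queries that returned at least min_results results."""
    if not result_counts:
        raise ValueError("no queries to evaluate")
    hits = sum(1 for n in result_counts.values() if n >= min_results)
    return hits / len(result_counts)


# Toy data: query -> number of results (illustrative only).
sample = {
    "OLED RISC AI": 3,
    "Trading Desk": 2,
    "汽车": 0,
    "风险管理": 0,
}
rate = success_rate(sample)
print(f"{rate:.1%}")
```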

🎯 Recommended Workflow

  1. Start with high-success queries to understand your data
  2. Use interactive mode for exploration: python chat_app.py -i
  3. Try technology and financial keywords first
  4. Combine Chinese business terms with English tech acronyms
  5. Focus on the 2024-2025 timeframe for best results

Your database is optimized for Chinese financial markets, technology sector analysis, and corporate performance research from recent time periods.