Skip to content

InkiChang/ga-search-enhanced

Repository files navigation

GA WebSearch Plugin / GA Search Enhanced

English | 中文

A lightweight, single-file web-search enhancement module for GenericAgent-style agents. It provides Tavily search, OpenAI-compatible deep search, parallel search, web content extraction, and site map discovery.

This project was ported and simplified from an internal GenericAgent integration inspired by openclaw-websearch-plugin, with secrets and local-only configuration removed for public release.

Features

  • Tavily Search: tavily_search
    • Basic / advanced search depth
    • General / news topic support
    • Sticky multi-key failover for 401 / 429 style authorization or rate-limit failures
  • Grok / OpenAI-compatible deep search: grok_search
    • Works with an OpenAI-compatible /v1/chat/completions endpoint
    • Optional model override and platform focus
    • Adds current time context for time-sensitive queries
  • Parallel search: dual_search
    • Runs Tavily and Grok-compatible search concurrently
    • Separates Tavily-only and Grok-only keyword arguments to avoid parameter pollution
  • Web content extraction: ws_fetch
    • Tavily Extract first
    • Firecrawl v2 /scrape fallback when configured
  • Site map discovery: ws_map
    • Tavily Map first
    • Optional local same-domain BFS fallback when beautifulsoup4 is installed

Install

For local development:

pip install -e .

Or copy ga_search_enhanced.py into your agent/tool directory.

Python 3.10+ is required.

Optional dependency for local site-map fallback:

pip install -e ".[map-fallback]"

Configuration

Configuration is intentionally lazy-loaded. The module works in plain Python and can also be used inside a GenericAgent-style environment.

Configuration precedence:

  1. Environment variables
  2. Local web-search.env beside ga_search_enhanced.py
  3. Optional lazy keychain adapter, if a compatible module exists

Example web-search.env:

TAVILY_API_KEYS=tvly-key1,tvly-key2
GROK_API_KEY=your-grok-or-openai-compatible-key
GROK_API_URL=http://127.0.0.1:8000
GROK_MODEL=grok-3-mini
FIRECRAWL_API_KEY=fc-your-key
FIRECRAWL_API_URL=https://api.firecrawl.dev/v2

Supported key aliases include:

  • Tavily: TAVILY_API_KEYS, TAVILY_API_KEY, tavilyApiKeys
  • Grok/OpenAI compatible: GROK_API_KEY, GROK_API_URL, GROK_MODEL, grokApiKey, grokApiUrl, grokModel
  • Firecrawl: FIRECRAWL_API_KEY, FIRECRAWL_API_URL, firecrawlApiKey, firecrawlApiUrl

web-search.env is ignored by git. Never commit real API keys.

Quick Use

from ga_search_enhanced import tavily_search, grok_search, dual_search, ws_fetch, ws_map

print(tavily_search("latest Python release", depth="advanced", max_results=5))
print(grok_search("Analyze the latest Python release impact", platform="GitHub and official docs"))
print(dual_search("Compare current LLM web search APIs", max_results=3, model="grok-3-mini"))
print(ws_fetch(["https://example.com"]))
print(ws_map(url="https://example.com", depth=2, limit=10))

API Overview

tavily_search(query, depth="basic", max_results=5, topic="general", days=None, include_answer=True, include_raw=False)

Searches with Tavily Search API.

Behavior:

  • Empty query returns {"error": "query is required"}.
  • Invalid depth is normalized to "basic".
  • Invalid topic is normalized to "general".
  • max_results is clamped to 1..20.
  • Multiple Tavily keys are sticky: the current key is reused until a 401 or 429 style failure, then the pool fails over to the next key.

Typical success shape:

{
    "result": "## Answer\n...\n\n## Sources\n...",
    "details": {...},  # Tavily response
    "raw": {...},      # same raw Tavily response for compatibility
}

Typical error shape:

{"error": "No Tavily API keys configured"}

grok_search(query, model=None, platform=None)

Calls an OpenAI-compatible chat completion endpoint at:

{GROK_API_URL}/v1/chat/completions

Behavior:

  • Empty query returns {"error": "query is required"}.
  • Missing key returns {"error": "Grok API key not configured"}.
  • model overrides configured GROK_MODEL.
  • platform appends a focus instruction to the user message.
  • Time-sensitive queries receive a current date/time prefix.

Typical success shape:

{
    "result": "...model answer...",
    "model": "...",
    "usage": {...},
    "details": {...},
}

dual_search(query, tavily=None, grok=None, **kwargs)

Runs tavily_search and grok_search concurrently.

Keyword routing:

  • Tavily kwargs: depth, max_results, topic, days, include_answer, include_raw
  • Grok kwargs: model, platform
  • tavily={...} and grok={...} override routed kwargs for each side.

Return shape:

{
    "tavily": {...},
    "grok": {...},
    "combined": "## Tavily Results\n...\n\n## Grok Results\n...",
}

Each side may independently contain either result or error.

ws_fetch(urls, depth="basic", format="markdown")

Extracts web page content.

Fallback decision:

  1. Try Tavily Extract with configured Tavily keys.
  2. If Tavily extraction fails or returns no content, try Firecrawl v2 /scrape when FIRECRAWL_API_KEY is configured.
  3. If all methods fail, return an error.

urls may be a string or a list of strings.

Typical success shape:

{
    "result": "# https://example.com\n\n...",
    "source": "tavily",      # or "firecrawl"
    "failed": [],
}

Typical error shape:

{"error": "All extraction methods failed (Tavily Extract and FireCrawl)"}

ws_map(url=None, depth=1, breadth=20, limit=50, instructions=None, start_url=None, max_pages=None, same_domain=True)

Discovers URLs for a site.

Behavior:

  • start_url is accepted as an alias for url.
  • max_pages is accepted as an alias affecting limit and breadth.
  • depth is clamped to 1..5.
  • breadth and limit are clamped to 1..100.

Fallback decision:

  1. Try Tavily Map with configured Tavily keys.
  2. If Tavily Map fails and beautifulsoup4 is installed, run a simple local BFS crawler.
  3. If beautifulsoup4 is not installed, return the Tavily error.

Typical Tavily success shape:

{
    "result": "# Site Map: ...",
    "url": "https://example.com",
    "urls": ["https://example.com", "..."],
    "details": {...},
    "source": "tavily",
}

Typical local fallback shape:

{
    "result": "# Site Map: ...",
    "url": "https://example.com",
    "urls": ["https://example.com", "..."],
    "source": "local_bfs",
    "tavily_error": "...",
}

GenericAgent vs Plain Python

  • In plain Python, use environment variables or a local web-search.env.
  • In GenericAgent-style runtimes, a compatible lazy keychain module may provide secrets.
  • The package never imports the keychain at module import time, so ordinary import ga_search_enhanced remains safe outside GA.
  • Secrets are not printed by the module.

Test

The included tests are offline and use monkeypatched fake HTTP responses. They do not call real external APIs.

python3 -m pytest -q

The current test matrix covers:

  • Empty argument validation
  • Tavily sticky key failover
  • Grok/OpenAI-compatible response parsing
  • Firecrawl fallback for ws_fetch
  • Local BFS fallback for ws_map
  • Public exports and package metadata expectations

Security Notes

  • No real API keys are included in this repository.
  • web-search.env, .env, and local secret files are ignored.
  • Do not paste tokens into issues, commits, logs, or test output.
  • The module returns structured errors instead of raising for normal user/configuration failures.

Reuse Notice

This repository currently does not ship a standalone license file. Review the code and upstream constraints before reuse or redistribution.


中文说明

GA WebSearch Plugin / GA Search Enhanced 是一个面向 GenericAgent 风格智能体的轻量级联网搜索增强模块。它以单文件形式提供搜索、深度检索、网页内容抓取和站点结构映射能力,便于复制、集成和二次开发。

该项目来自内部 GenericAgent 集成实践,并参考了 openclaw-websearch-plugin 的使用场景;公开版本已移除本地私有配置、密钥和个人环境依赖。

功能特性

  • Tavily 搜索tavily_search
    • 支持 basic / advanced 搜索深度
    • 支持 general / news 主题
    • 支持多 Tavily Key 粘滞复用,并在 401 / 429 类授权或限流失败时切换
  • Grok / OpenAI 兼容深度搜索grok_search
    • 支持 OpenAI 兼容的 /v1/chat/completions 接口
    • 支持模型覆盖和平台聚焦参数
    • 对时间敏感问题自动注入当前时间上下文
  • 并行搜索dual_search
    • 同时调用 Tavily 与 Grok/OpenAI 兼容接口
    • 自动拆分 Tavily/Grok 专属参数,避免参数污染
  • 网页内容抓取ws_fetch
    • 优先使用 Tavily Extract
    • 配置 Firecrawl 后可降级到 Firecrawl v2 /scrape
  • 站点地图发现ws_map
    • 优先使用 Tavily Map
    • 安装 beautifulsoup4 后可使用同域本地 BFS 降级方案

安装

开发模式安装:

pip install -e .

也可以直接把 ga_search_enhanced.py 复制到你的 Agent 工具目录中使用。

要求 Python 3.10 或更高版本。

本地站点地图降级能力的可选依赖:

pip install -e ".[map-fallback]"

配置方式

配置采用懒加载方式。模块既可以在普通 Python 环境中使用,也可以在 GenericAgent 风格环境中使用。

配置优先级:

  1. 环境变量
  2. ga_search_enhanced.py 同级目录下的 web-search.env
  3. 如果存在兼容模块,则懒加载可选 keychain 适配器

示例 web-search.env

TAVILY_API_KEYS=tvly-key1,tvly-key2
GROK_API_KEY=your-grok-or-openai-compatible-key
GROK_API_URL=http://127.0.0.1:8000
GROK_MODEL=grok-3-mini
FIRECRAWL_API_KEY=fc-your-key
FIRECRAWL_API_URL=https://api.firecrawl.dev/v2

支持的配置别名包括:

  • Tavily:TAVILY_API_KEYSTAVILY_API_KEYtavilyApiKeys
  • Grok/OpenAI 兼容接口:GROK_API_KEYGROK_API_URLGROK_MODELgrokApiKeygrokApiUrlgrokModel
  • Firecrawl:FIRECRAWL_API_KEYFIRECRAWL_API_URLfirecrawlApiKeyfirecrawlApiUrl

web-search.env 已加入 .gitignore,请不要提交真实 API Key。

快速使用

from ga_search_enhanced import tavily_search, grok_search, dual_search, ws_fetch, ws_map

print(tavily_search("最新 Python 版本", depth="advanced", max_results=5))
print(grok_search("分析最新 Python 版本的影响", platform="GitHub 和官方文档"))
print(dual_search("对比当前主流 LLM 联网搜索 API", max_results=3, model="grok-3-mini"))
print(ws_fetch(["https://example.com"]))
print(ws_map(url="https://example.com", depth=2, limit=10))

API 概览

tavily_search(query, depth="basic", max_results=5, topic="general", days=None, include_answer=True, include_raw=False)

通过 Tavily Search API 搜索。

行为契约:

  • query 返回 {"error": "query is required"}
  • 非法 depth 会被归一化为 "basic"
  • 非法 topic 会被归一化为 "general"
  • max_results 会被限制在 1..20
  • 多 Tavily Key 采用粘滞复用:当前 key 会一直使用到出现 401429 类失败,再切换到下一个 key。

典型成功结构:

{
    "result": "## Answer\n...\n\n## Sources\n...",
    "details": {...},
    "raw": {...},
}

grok_search(query, model=None, platform=None)

调用 OpenAI 兼容聊天补全接口:

{GROK_API_URL}/v1/chat/completions

行为契约:

  • query 返回 {"error": "query is required"}
  • 未配置 key 返回 {"error": "Grok API key not configured"}
  • model 会覆盖配置中的 GROK_MODEL
  • platform 会追加平台聚焦指令。
  • 时间敏感问题会自动添加当前日期时间。

典型成功结构:

{
    "result": "...模型回答...",
    "model": "...",
    "usage": {...},
    "details": {...},
}

dual_search(query, tavily=None, grok=None, **kwargs)

并行运行 tavily_searchgrok_search

参数路由:

  • Tavily 参数:depthmax_resultstopicdaysinclude_answerinclude_raw
  • Grok 参数:modelplatform
  • tavily={...}grok={...} 可分别覆盖两侧参数

返回结构:

{
    "tavily": {...},
    "grok": {...},
    "combined": "## Tavily Results\n...\n\n## Grok Results\n...",
}

两侧结果可能分别包含 resulterror

ws_fetch(urls, depth="basic", format="markdown")

抓取网页内容。

降级决策:

  1. 先使用 Tavily Extract。
  2. 如果 Tavily 抓取失败或没有内容,并且已配置 FIRECRAWL_API_KEY,则调用 Firecrawl v2 /scrape
  3. 如果全部失败,返回错误结构。

urls 可以是字符串或字符串列表。

典型成功结构:

{
    "result": "# https://example.com\n\n...",
    "source": "tavily",      # 或 "firecrawl"
    "failed": [],
}

ws_map(url=None, depth=1, breadth=20, limit=50, instructions=None, start_url=None, max_pages=None, same_domain=True)

发现站点 URL。

行为契约:

  • start_urlurl 的兼容别名。
  • max_pages 是影响 limitbreadth 的兼容别名。
  • depth 限制在 1..5
  • breadthlimit 限制在 1..100

降级决策:

  1. 先使用 Tavily Map。
  2. 如果 Tavily Map 失败且已安装 beautifulsoup4,则运行简单同域 BFS 爬取。
  3. 如果未安装 beautifulsoup4,返回 Tavily 错误。

典型返回结构:

{
    "result": "# Site Map: ...",
    "url": "https://example.com",
    "urls": ["https://example.com", "..."],
    "source": "tavily",      # 或 "local_bfs"
}

GenericAgent 与普通 Python 边界

  • 普通 Python 环境建议使用环境变量或 web-search.env
  • GenericAgent 风格环境可以通过兼容 keychain 模块提供密钥。
  • 模块不会在 import 阶段导入 keychain,因此脱离 GA 也能安全导入。
  • 模块不会打印密钥。

测试

仓库内置离线测试,通过 monkeypatch 模拟 HTTP 响应,不会调用真实外部 API:

python3 -m pytest -q

当前测试矩阵覆盖:

  • 空参数校验
  • Tavily 粘滞 key 失败切换
  • Grok/OpenAI 兼容响应解析
  • ws_fetch 的 Firecrawl 降级
  • ws_map 的本地 BFS 降级
  • 公开导出与包元数据预期

安全说明

  • 仓库不包含真实 API 密钥。
  • web-search.env.env 和本地密钥文件已加入忽略规则。
  • 不要在 issue、commit 或日志中粘贴 token。
  • 常规用户输入或配置错误会以结构化 error 返回,而不是直接抛出异常。

复用说明

本仓库当前不随附独立许可文件。复用或再分发前,请自行审查代码及上游约束。

About

GenericAgent-friendly Python search enhancement tools using Tavily, Grok and Firecrawl-style fetch/map helpers.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages