Commit 633c12f

chore(model gallery): add websailor-32b (#6299)
Signed-off-by: Ettore Di Giacinto <[email protected]>
1 parent 6f24135 commit 633c12f

1 file changed

+17
-0
lines changed

gallery/index.yaml

Lines changed: 17 additions & 0 deletions
@@ -7579,6 +7579,23 @@
     - filename: Qwentile2.5-32B-Instruct-Q4_K_M.gguf
       sha256: e476d6e3c15c78fc3f986d7ae8fa35c16116843827f2e6243c05767cef2f3615
       uri: huggingface://bartowski/Qwentile2.5-32B-Instruct-GGUF/Qwentile2.5-32B-Instruct-Q4_K_M.gguf
+- !!merge <<: *qwen25
+  name: "websailor-32b"
+  urls:
+    - https://huggingface.co/Alibaba-NLP/WebSailor-32B
+    - https://huggingface.co/mradermacher/WebSailor-32B-GGUF
+  description: |
+    WebSailor is a complete post-training methodology designed to teach LLM agents sophisticated reasoning for complex web navigation and information-seeking tasks. It addresses the challenge of extreme uncertainty in vast information landscapes, a capability where previous open-source models lagged behind proprietary systems.
+    We classify information-seeking tasks into three difficulty levels, where Level 3 represents problems with both high uncertainty and a complex, non-linear path to a solution. To generate these challenging tasks, we introduce SailorFog-QA, a novel data synthesis pipeline that constructs intricate knowledge graphs and then applies information obfuscation. This process creates questions with high initial uncertainty that demand creative exploration and transcend simple, structured reasoning patterns.
+    Our training process begins by generating expert trajectories and then reconstructing the reasoning to create concise, action-oriented supervision signals, avoiding the stylistic and verbosity issues of teacher models. The agent is first given a "cold start" using rejection sampling fine-tuning (RFT) on a small set of high-quality examples to establish a baseline capability. This is followed by an efficient agentic reinforcement learning stage using our Duplicating Sampling Policy Optimization (DUPO) algorithm, which refines the agent's exploratory strategies.
+    WebSailor establishes a new state-of-the-art for open-source agents, achieving outstanding results on difficult benchmarks like BrowseComp-en and BrowseComp-zh. Notably, our smaller models like WebSailor-7B outperform agents built on much larger backbones, highlighting the efficacy of our training paradigm. Ultimately, WebSailor closes the performance gap to proprietary systems, achieving results on par with agents like Doubao-Search.
+  overrides:
+    parameters:
+      model: WebSailor-32B.Q4_K_M.gguf
+  files:
+    - filename: WebSailor-32B.Q4_K_M.gguf
+      sha256: 60cea732b8314cedf1807530857b4ebd9f6c41431b3223384eb7f94fbff7b5bc
+      uri: huggingface://mradermacher/WebSailor-32B-GGUF/WebSailor-32B.Q4_K_M.gguf
 - &archfunct
   license: apache-2.0
   tags:
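
The new entry pins the GGUF file to a `sha256` digest. As a minimal sketch of how a reviewer might check a manually downloaded copy against that value (independent of whatever the gallery tooling itself does), the following Python reads the file in chunks and compares digests. The function names and the local file path in the usage note are hypothetical and not part of the commit.

```python
import hashlib

# Digest copied from the `sha256` field of the websailor-32b entry in this diff.
EXPECTED_SHA256 = "60cea732b8314cedf1807530857b4ebd9f6c41431b3223384eb7f94fbff7b5bc"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte GGUF file is never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path, expected=EXPECTED_SHA256):
    """Compare a local file's digest with the expected hex digest.
    The expected value is lowercased so the comparison ignores case."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the file was saved locally as `WebSailor-32B.Q4_K_M.gguf` (a hypothetical path), `matches_gallery_checksum("WebSailor-32B.Q4_K_M.gguf")` returns `True` only if the download matches the digest recorded in `gallery/index.yaml`.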
