diff --git a/openseek/competition/LongContext-ICL-Annotation/README.md b/openseek/competition/LongContext-ICL-Annotation/README.md index 25d1082..664bb9b 100644 --- a/openseek/competition/LongContext-ICL-Annotation/README.md +++ b/openseek/competition/LongContext-ICL-Annotation/README.md @@ -9,36 +9,6 @@ Large Language Models Automatic Data Annotation under Long-Context Scenarios. [FlagOS开放计算全球挑战赛- AI赛事通 | 数据算法赛](https://www.competehub.dev/zh/competitions/modelscope180) -## Introduction - -The LongContext-ICL-Annotation Challenge focuses on automatic data annotation under long-context settings using Large Language Models (LLMs). The competition is built upon the Qwen3-4B model and adopts the In-context Learning (ICL) paradigm to investigate scalable and high-quality automated annotation methods. - -Participating teams are required to use the officially provided datasets and design effective ICL-based annotation solutions tailored for ultra-long context scenarios. All submissions will be evaluated on a unified benchmark dataset. The Organizing Committee will conduct standardized evaluations and determine the final rankings based on the official evaluation results. - -## Objectives - -This challenge takes Large Language Models (LLMs) as the core technical foundation and targets automated data annotation under ultra-long context constraints, aiming to explore novel paradigms that balance annotation efficiency and annotation accuracy. The competition focuses on the following key scientific and engineering challenges: - -- 1. Instruction and Prompt Design: - - How can effective model instructions and prompt strategies be designed in ultra-long context scenarios to guide LLMs toward stable and high-quality data annotation? -- 2. 
Ultra-Long Context Construction: - - When the number of available annotation examples significantly exceeds the model’s context capacity, how can information-dense and structurally coherent ultra-long context inputs be constructed for target data annotation? -- 3. Multi-Turn and Continuous Annotation: - - In automated multi-round dialogue or continuous interaction settings, how can ultra-long contexts be efficiently leveraged to achieve both consistency and scalability in data annotation? - -## Challenge Details - -- Participating teams are expected to independently design a complete LLM-based automatic data annotation pipeline and validate their approach under a unified dataset and evaluation protocol. Evaluation scores and rankings will be published on a standardized leaderboard. - -- In addition to prediction results, teams must submit a technical report and fully reproducible source code in accordance with the competition requirements. The Organizing Committee will reproduce submitted solutions and review the technical design. The final score will be calculated as a weighted combination of prediction performance and technical solution evaluation, with detailed rules specified by the competition. - -- Teams are required to submit their technical reports and complete source code to the official OpenSeek GitHub repository designated by the competition. - -- For additional details, please refer to [FlagOS platform](https://flagos.io/RaceDetail?id=296fmsd8&lang=en). All competition-related information is subject to the announcements published on the official platform. - ## Quick Start ### 1. Environment Setup @@ -49,8 +19,6 @@ torch flagScale ``` -On NVIDIA platforms, it is recommended to create the environment using: `cd src && bash create_env_nvidia.sh` - ### 2. 
Download Model Weights

```bash
diff --git a/openseek/competition/LongContext-ICL-Annotation/README_cn.md b/openseek/competition/LongContext-ICL-Annotation/README_cn.md
new file mode 100644
index 0000000..dda97fa
--- /dev/null
+++ b/openseek/competition/LongContext-ICL-Annotation/README_cn.md
@@ -0,0 +1,71 @@
+# LLM Automatic Data Annotation Challenge under Ultra-Long Context Scenarios
+
+---
+
+## News
+
+- **[2026-01-20] `Release`:** The competition is now live on **Kaggle**. Details: [FlagOS Open Computing Global Challenge](https://www.kaggle.com/competitions/flag-os-open-computing-global-challenge).
+- **[2026-01-06] `Release`:** The **FlagOS Open Computing Global Challenge**, jointly organized by the **Zhongzhi FlagOS Community**, the **Beijing Academy of Artificial Intelligence (BAAI)**, and **CCF ODTC**, has been officially launched. Details:
+  [FlagOS开放计算全球挑战赛- AI赛事通 | 数据算法赛](https://www.competehub.dev/zh/competitions/modelscope180)
+
+
+---
+
+## Quick Start
+### 1. Environment
+
+```bash
+openai
+torch
+flagScale
+```
+
+
+### 2. Download Model Weights
+```bash
+hf download Qwen/Qwen3-4B --local-dir Qwen3-4B
+# or
+modelscope download --model Qwen/Qwen3-4B
+```
+### 3. Long-Context Configuration
+In `Qwen3-4B/config.json`, replace the original setting with:
+```json
+"rope_scaling": {
+    "rope_type": "yarn",
+    "factor": 4.0,
+    "original_max_position_embeddings": 32768
+}
+```
+### 4. Model Deployment
+
+Configure the `llm_config.yaml` file according to your actual needs, then launch the service:
+
+```bash
+cd FlagScale
+python run.py --config-path .. --config-name llm_config action=run
+```
+
+After the model service has started, you can test the local API with:
+
+```bash
+python api_test.py
+```
+
+To stop the service, run:
+
+```bash
+python run.py --config-path .. --config-name llm_config action=stop
+```
+
+### 5. Run / Improve the Baseline Method
+
+Run the following command to start model annotation:
+```bash
+python main.py
+```
+
+To implement a new annotation method, modify `method.py`. In this file you can:
+* define new instruction templates
+* define new in-context example selection strategies
+* define new model inference and annotation schemes
+* add custom post-processing logic
diff --git a/openseek/competition/LongContext-ICL-Annotation/src/create_env_nvidia.sh b/openseek/competition/LongContext-ICL-Annotation/src/create_env_nvidia.sh
deleted file mode 100644
index 8861624..0000000
--- a/openseek/competition/LongContext-ICL-Annotation/src/create_env_nvidia.sh
+++ /dev/null
@@ -1,46 +0,0 @@
-
-git clone https://github.com/FlagOpen/FlagScale.git
-cd FlagScale
-
-source ~/miniconda3/etc/profile.d/conda.sh
-conda create -n flagscale python=3.11.11 -y
-conda activate flagscale
-
-pip install --upgrade setuptools
-
-pip --trusted-host pypi.tuna.tsinghua.edu.cn install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
-
-pip install -r ./requirements/requirements-base.txt
-pip install -r ./requirements/requirements-common.txt
-
-pip install deepspeed
-pip3 install --no-build-isolation transformer_engine[pytorch]==2.6.0.post1
-pip install nvidia-cudnn-frontend
-
-cu=$(nvcc --version | grep "Cuda compilation tools" | awk '{print $5}' | cut -d '.' -f 1)
-torch=$(pip show torch | grep Version | awk '{print $2}' | cut -d '+' -f 1 | cut -d '.' -f 1,2)
-cp=$(python3 --version | awk '{print $2}' | awk -F. '{print $1$2}')
-flash_attn_version="2.8.3"
-echo "https://github.com/Dao-AILab/flash-attention/releases/download/v${flash_attn_version}/flash_attn-${flash_attn_version}+cu${cu}torch${torch}-cp${cp}-cp${cp}-linux_x86_64.whl"
-wget --continue --timeout=60 --no-check-certificate --tries=5 --waitretry=10 https://github.com/Dao-AILab/flash-attention/releases/download/v${flash_attn_version}/flash_attn-${flash_attn_version}+cu${cu}torch${torch}-cp${cp}-cp${cp}-linux_x86_64.whl
-flash_attn-${flash_attn_version}+cu${cu}torch${torch}-cp${cp}-cp${cp}-linux_x86_64.whl
-# Recommend to download the wheel handly, for example flash_attn-2.8.3+cu12torch2.6cxx11abiFALSE-cp311-cp311-linux_x86_64
-pip install flash_attn-2.8.3+cu124torch2.6-cp311-cp311-linux_x86_64.whl
-
-# maybe slow, be patient
-pip install --no-build-isolation "git+https://github.com/Dao-AILab/flash-attention.git@v2.7.2#egg=flashattn-hopper&subdirectory=hopper"
-
-
-# Maybe slow too, be patient
-pip install -r ./requirements/inference/requirements.txt
-pip install vllm==0.8.5
-python tools/patch/unpatch.py --backend llama.cpp
-python tools/patch/unpatch.py --backend omniinfer
-python tools/patch/unpatch.py --backend Megatron-LM
-
-pip install build
-pip install setuptools-scm
-pip install "git+https://github.com/state-spaces/mamba.git@v2.2.4"
-
-pip install -r ./requirements/serving/requirements.txt
-pip install --no-build-isolation git+https://github.com/FlagOpen/FlagGems.git@release_v1.0.0
\ No newline at end of file
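
The deployment step in the README above checks the locally served model with `api_test.py`. As a rough sketch of what such a check can look like, the snippet below posts a single chat request to an OpenAI-compatible `/v1/chat/completions` endpoint using only the standard library. The base URL, model name, and system prompt are placeholders, not taken from the competition code; adjust them to match your `llm_config.yaml`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed serving address; match llm_config.yaml
MODEL = "Qwen3-4B"                     # assumed served model name

def build_payload(text):
    """Build an OpenAI-style chat completion request body for one query."""
    return {
        "model": MODEL,
        "temperature": 0.0,
        "messages": [
            {"role": "system", "content": "You are a data annotation assistant."},
            {"role": "user", "content": text},
        ],
    }

def query(text):
    """POST one chat request to the local endpoint and return the reply text."""
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the service to be running):
# print(query("Hello, are you ready for annotation?"))
```

The same request shape works through the `openai` client listed in the environment requirements by pointing its `base_url` at the local service.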