Getting Started

Overview

FlagScale leverages Hydra for configuration management. The configurations are organized into two levels: an outer experiment-level YAML file and an inner task-level YAML file.

  • The experiment-level YAML file defines the experiment directory, backend engine, task type, and other related environmental configurations.

  • The task-level YAML file specifies the model, dataset, and parameters for specific tasks such as training or inference.

All valid configurations in the task-level YAML file correspond to the arguments used in backend engines such as Megatron-LM and vLLM, with hyphens (-) replaced by underscores (_). For a complete list of available configurations, please refer to the documentation of the corresponding backend engine. To get started, you can simply copy and modify an existing YAML file from the examples folder.
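
For example, Megatron-LM's --micro-batch-size argument becomes micro_batch_size in the task-level YAML. The sketch below is only an illustration of the two levels: the names, values, and key groupings are placeholders, so use the files in the examples folder as the actual reference.

    # Experiment-level YAML (illustrative): where to run and what kind of task
    experiment:
      exp_name: my-experiment
      exp_dir: outputs/${experiment.exp_name}
      task:
        type: train
        backend: megatron

    # Task-level YAML (illustrative): arguments mirror the backend engine's flags,
    # with hyphens replaced by underscores
    model:
      micro_batch_size: 1      # Megatron-LM: --micro-batch-size
      global_batch_size: 16    # Megatron-LM: --global-batch-size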

🔧 Setup

We recommend using the latest release of NGC's PyTorch container for setup.

  1. Clone the repository:

    git clone https://github.com/flagos-ai/FlagScale.git
  2. Install FlagScale requirements:

    cd FlagScale
    pip install . --verbose
  3. Install backends

  • Inference/Serving backend

    vLLM-FL:

    git clone https://github.com/flagos-ai/vllm-FL
    cd vllm-FL
    pip install -e .

    See more details in vLLM-FL

    If you need vLLM-plugin-FL, see details in vLLM-plugin-FL

  • Training backend Megatron-LM-FL:

    git clone https://github.com/flagos-ai/Megatron-LM-FL
    cd Megatron-LM-FL
    pip install --no-build-isolation ".[mlm,dev]"
    
    git clone https://github.com/NVIDIA/apex
    cd apex
    APEX_CPP_EXT=1 APEX_CUDA_EXT=1 pip install -v --no-build-isolation .

    See more details in Megatron-LM-FL
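
As an optional sanity check after installation (not part of the official setup steps), you can confirm that the core dependencies import cleanly before launching any job:

    # Check PyTorch and CUDA availability
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    # If you installed the training backend, check that Apex built correctly
    python -c "import apex; print('apex OK')"
    # If you installed the inference/serving backend, check that vLLM imports
    python -c "import vllm; print(vllm.__version__)"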

Run a Task

FlagScale provides a unified runner for various tasks, including training, inference, and serving. Simply specify the configuration file, and the runner will load it and execute the task with a single command. The following sections demonstrate how to run training, inference, and serving tasks.

Train

Requires the Megatron-LM-FL environment (see Setup above).

  1. Prepare dataset demo and tokenizer:

    • dataset

      We provide a small preprocessed dataset (.bin and .idx files) taken from the Pile dataset.

      mkdir -p ./data && cd ./data
      wget https://model.ks3-cn-beijing.ksyuncs.com/nlpdata/pile_wikipedia_demo.idx
      wget https://model.ks3-cn-beijing.ksyuncs.com/nlpdata/pile_wikipedia_demo.bin
    • tokenizer

      mkdir -p ./qwentokenizer && cd ./qwentokenizer
      wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/tokenizer_config.json" -O tokenizer_config.json
      wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/qwen.tiktoken" -O qwen.tiktoken    
      wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/qwen_generation_utils.py" -O qwen_generation_utils.py
      wget "https://baai-flagscale.ks3-cn-beijing.ksyuncs.com/tokenizers/qwentokenizer/tokenization_qwen.py" -O tokenization_qwen.py
  2. Edit config:

    Modify data_path and tokenizer_path in ./examples/qwen3/conf/train/0_6b.yaml at lines 81 and 87:

    data:
        data_path: ./data/pile_wikipedia_demo    # modify data_path here
        split: 1
        no_mmap_bin_files: true
        tokenizer:
            legacy_tokenizer: true
            tokenizer_type: QwenTokenizerFS
            tokenizer_path: ./qwentokenizer   # modify tokenizer_path here
            vocab_size: 151936
            make_vocab_size_divisible_by: 64

    Modify the config in ./examples/qwen3/conf/train.yaml at line 3:

    defaults:
      - _self_
      - train: 0_6b  # modify: train value must match its corresponding config file name
  3. Start the distributed training job:

    python run.py --config-path ./examples/qwen3/conf --config-name train action=run
  4. Stop the distributed training job:

    python run.py --config-path ./examples/qwen3/conf --config-name train action=stop
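
Because run.py is a Hydra entry point, other configuration values can be overridden on the command line in the same key=value form as action. The override key below is only illustrative; use keys that are actually defined in your configuration files.

    # Run the same job under a different experiment name (illustrative override)
    python run.py --config-path ./examples/qwen3/conf --config-name train \
        action=run experiment.exp_name=qwen3-0_6b-demo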

Inference

Requires the vLLM-FL environment (see Setup above).

  1. Prepare the model:

    modelscope download --model Qwen/Qwen3-0.6B --local_dir ./Qwen3-0.6B
  2. Edit config:

    Modify the model path in ./examples/qwen3/conf/inference/0_6b.yaml at line 2:

    llm:
        model: ./Qwen3-0.6B         # modify: Set model directory
        trust_remote_code: true
        tensor_parallel_size: 1
        pipeline_parallel_size: 1
        gpu_memory_utilization: 0.9
        seed: 1234

    Modify the config in ./examples/qwen3/conf/inference_fl.yaml at line 3:

    defaults:
      - _self_
      - inference: 0_6b    # modify: Inference value must match its corresponding config file name
  3. Start inference:

    python run.py --config-path ./examples/qwen3/conf --config-name inference_fl action=run

Serve

  1. Prepare the model:

    modelscope download --model Qwen/Qwen3-0.6B --local_dir ./Qwen3-0.6B
  2. Edit config:

    Modify the model path in ./examples/qwen3/conf/serve/0_6b.yaml at line 3:

    - serve_id: vllm_model
      engine_args:
        model: ./Qwen3-0.6B          # modify: Set model directory 
        host: 0.0.0.0
        max_model_len: 4096
        max_num_seqs: 4
        uvicorn_log_level: warning
        port: 30000                  # A port available in your env, for example: 30000

    Modify the config in ./examples/qwen3/conf/serve.yaml at line 3:

    defaults:
      - _self_
      - serve: 0_6b         # modify: Serve value must match its corresponding config file name
    experiment:
      exp_name: qwen3-0.6b  # modify as needed for test clarity
      exp_dir: outputs/${experiment.exp_name}
      task:
        type: serve
        backend: vllm
      runner:
        hostfile: null
        deploy:
          use_fs_serve: false
      envs:
        CUDA_VISIBLE_DEVICES: 0
        CUDA_DEVICE_MAX_CONNECTIONS: 1
  3. Start the server:

    python run.py --config-path ./examples/qwen3/conf --config-name serve action=run
  4. Stop the server:

    python run.py --config-path ./examples/qwen3/conf --config-name serve action=stop
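
Once the server is up, you can send it a test request. This assumes the deployment exposes vLLM's standard OpenAI-compatible API on the port configured above (30000 in this example) and that the model is registered under its path:

    curl http://localhost:30000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "./Qwen3-0.6B",
            "messages": [{"role": "user", "content": "Hello!"}],
            "max_tokens": 64
          }'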

Serving DeepSeek-R1

We support serving the DeepSeek-R1 model and provide the flagscale serve command for one-click deployment: configure just two YAML files and the model can be served with a single command.

  1. Configure the YAML files:

    FlagScale/
     ├─ examples/
     │   └─ deepseek_r1/
     │        └─ conf/
     │            ├─ serve.yaml
     │            ├─ hostfile.txt  # Set hostfile (optional)
     │            └─ serve/
     │                └─ 671b.yaml # Set model parameters and server port

    [!Note] When a task spans more than one node, a hostfile.txt is required; its path should be set in the serve.yaml configuration file (see the sketch at the end of this section).

  2. Install FlagScale CLI:

    cd FlagScale
    pip install . --verbose --no-build-isolation
  3. Start serving:

    flagscale serve deepseek_r1

Note that the flagscale command line supports customization of the service parameters:

flagscale serve <MODEL_NAME> <MODEL_CONFIG_YAML>

The configuration files allow you to specify the necessary parameters and settings for your deployment, ensuring a smooth and efficient serving process.
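
For multi-node deployments such as DeepSeek-R1, the hostfile mentioned in the note above is referenced from the runner section of serve.yaml. The following is only a hedged sketch reusing the serve.yaml structure shown earlier in this guide; the hostfile path and its contents depend on your cluster and are not prescribed here.

    # examples/deepseek_r1/conf/serve.yaml (illustrative excerpt)
    experiment:
      task:
        type: serve
        backend: vllm
      runner:
        hostfile: examples/deepseek_r1/conf/hostfile.txt  # list of nodes in your cluster
        deploy:
          use_fs_serve: false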