[Feature] Running multi-node offline engine inference ( via SLURM) #2561

Open
2 tasks done
aflah02 opened this issue Dec 23, 2024 · 36 comments
@aflah02

aflah02 commented Dec 23, 2024

Checklist

Motivation

A lot of academic institutions only allow access to larger node clusters via SLURM, and it is not immediately clear how I would reuse the code that runs Llama 405B BF16 on 2 nodes (by starting a server) to perform offline inference.

Related resources

No response

@zhaochenyang20
Collaborator

@aflah02 Thanks for pointing this out. We are looking for contributors, since we have not used SLURM for a long time. 😂

zhaochenyang20 added the "help wanted" label on Dec 23, 2024
@aflah02
Author

aflah02 commented Dec 23, 2024

@zhaochenyang20 If you have any pointers on how you might approach the problem, I can take a stab at this. The issue right now is that I have no clue how to get started with either the runtime API or the engine API for multi-node. They don't seem to support pipeline parallelism, so the only option seems to be tensor parallelism across all GPUs. But if I ask for, say, 16 GPUs, it can't do that directly, as it only sees 8 GPUs per node.
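
To make the 16-versus-8 GPU point concrete: when tensor parallelism spans nodes, the total TP size is the number of nodes times the GPUs per node, and each node's process drives only its own local GPUs. The sketch below is purely illustrative arithmetic, not SGLang code. The function names `total_tp_size` and `locate_rank` are hypothetical, and it assumes every node has the same GPU count and that ranks are numbered node by node.

```python
from typing import Tuple


def total_tp_size(nnodes: int, gpus_per_node: int) -> int:
    """Tensor-parallel size when every GPU on every node holds one shard."""
    if nnodes < 1 or gpus_per_node < 1:
        raise ValueError("nnodes and gpus_per_node must be positive")
    return nnodes * gpus_per_node


def locate_rank(tp_rank: int, nnodes: int, gpus_per_node: int) -> Tuple[int, int]:
    """Map a global TP rank to (node_rank, local_gpu_index).

    Assumption: ranks are assigned node by node, so ranks 0..7 live on
    node 0 and ranks 8..15 on node 1 for a 2 x 8 GPU allocation.
    """
    tp_size = total_tp_size(nnodes, gpus_per_node)
    if not 0 <= tp_rank < tp_size:
        raise ValueError(f"tp_rank must be in [0, {tp_size})")
    return divmod(tp_rank, gpus_per_node)


if __name__ == "__main__":
    print(total_tp_size(2, 8))
    print(locate_rank(9, 2, 8))
```

Under these assumptions, a 2-node job with 8 GPUs each is launched with a TP size of 16 on both nodes, even though each process only sees 8 devices.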

@aflah02
Author

aflah02 commented Dec 23, 2024

I was thinking of using the Engine API and just converting all the server args from the CLI commands. But in the CLI version you run 2 commands, one per node; how would you do that via the Engine API? Do you run 2 engine calls (one on each node)?

@aflah02
Author

aflah02 commented Dec 23, 2024

Btw, this is one of my attempts to just load the model, but nothing seems to run on the worker node, as I only see logs for the head node -

#!/bin/bash -l

#SBATCH -o SLURM_Logs/%x_%j_node%t.out
#SBATCH -e SLURM_Logs/%x_%j_node%t.err
#SBATCH -D ./
#SBATCH -J 405B-FP8-Online-TP16-Sglang

#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=18
#SBATCH --mem=224GB

#SBATCH --partition="h100"
#SBATCH --gres=gpu:h100:8

#SBATCH --time=12:00:00

# Load required modules or set environment variables if needed
echo "[INFO] Activating environment on node $SLURM_NODEID"
source /NS/venvs/work/afkhan/slurm_sglang_env/bin/activate || { echo "[ERROR] Failed to activate environment"; exit 1; }

# Define parameters
model="/scratch/sws0/user/afkhan/Models/Llama-3.1-405B-Instruct-FP8"
tp_size=16

echo "[INFO] Running inference"
echo "[INFO] Model: $model"
echo "[INFO] TP Size: $tp_size"

# Define the NCCL init address using the hostname of the head node
HEAD_NODE=$(scontrol show hostname "$SLURM_NODELIST" | head -n 1)
NCCL_INIT_ADDR="${HEAD_NODE}:8000"
echo "[INFO] NCCL_INIT_ADDR for node $SLURM_NODEID: $NCCL_INIT_ADDR"

# Launch processes
if [ "$SLURM_NODEID" -eq 0 ]; then
    echo "[INFO] Launching head node process on $HOSTNAME"
    python3 -m sglang.launch_server \
        --model-path "$model" \
        --tp "$tp_size" \
        --nccl-init-addr "$NCCL_INIT_ADDR" \
        --nnodes 2 \
        --node-rank 0 &

    echo "[INFO] Head node process launched"
elif [ "$SLURM_NODEID" -eq 1 ]; then
    echo "[INFO] Launching worker node process on $HOSTNAME"
    python3 -m sglang.launch_server \
        --model-path "$model" \
        --tp "$tp_size" \
        --nccl-init-addr "$NCCL_INIT_ADDR" \
        --nnodes 2 \
        --node-rank 1 &

    echo "[INFO] Worker node process launched"
else
    echo "[ERROR] Unexpected SLURM_NODEID: $SLURM_NODEID"
fi

echo "[INFO] Waiting for all processes to complete on node $SLURM_NODEID"
wait
echo "[INFO] Processes completed on node $SLURM_NODEID"
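
A likely reason only head-node logs appear with the script above: `sbatch` normally executes the batch script once, on the first allocated node, so `$SLURM_NODEID` would be 0 there and the rank-1 branch is never reached unless each rank is started through `srun`. The sketch below only builds the one command per node rank that would have to be launched that way. The helper name `build_launch_commands` and the example host name are hypothetical; the flags are the ones used in the script.

```python
import shlex
from typing import List


def build_launch_commands(
    model_path: str,
    tp_size: int,
    head_node: str,
    nnodes: int,
    port: int = 8000,
) -> List[str]:
    """Return one `sglang.launch_server` command line per node rank."""
    init_addr = f"{head_node}:{port}"  # same address on every node
    commands = []
    for node_rank in range(nnodes):
        argv = [
            "python3", "-m", "sglang.launch_server",
            "--model-path", model_path,
            "--tp", str(tp_size),
            "--nccl-init-addr", init_addr,
            "--nnodes", str(nnodes),
            "--node-rank", str(node_rank),
        ]
        commands.append(shlex.join(argv))  # shell-safe quoting
    return commands


if __name__ == "__main__":
    # Hypothetical model path and host name, for illustration only.
    for cmd in build_launch_commands("/path/to/model", 16, "head-node", 2):
        print(cmd)
```

Each returned command differs only in `--node-rank`, which is why every node needs its own launch rather than a branch inside a script that runs on a single node.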

@zhaochenyang20
Collaborator

Good points. We do not support pipeline parallelism, but I do not think this would block running on SLURM. Our team will discuss your issue this Friday. Before that, could you try out a quantization method for Llama 405B? Or you can use Llama 3.3 70B, which is pretty good.

@zhaochenyang20
Collaborator

BTW, would you like to join our bi-weekly meeting this Saturday?

@aflah02
Author

aflah02 commented Dec 24, 2024

Thanks for the invite
Unfortunately I'll be traveling over the weekend and it would be hard to join the meeting

I've already had success with running the FP8 version as well as the 70B one on a single node for offline inference. So the only thing left is to go multinode for the BF16 version + to get bigger context length for the FP8 version

@zhaochenyang20
Collaborator

Great! How do you run the FP8 version of the 70B model? I think the best way is to first quantize it and then load it, rather than quantizing it online. @aflah02

@aflah02
Author

aflah02 commented Dec 24, 2024

> Great! How do you run the FP8 version of the 70B model? I think the best way is to first quantize it and then load it, rather than quantizing it online. @aflah02

Sorry for not being clear. I've run 2 models - 70B in BF16 and 405B in FP8. I'm not running 70B in FP8.

My goal now is to somehow run 405B in FP16, so I'm trying out SLURM configs with the server API. That isn't looking good, so I'm thinking of using the engine or runtime API instead.

@zhaochenyang20
Collaborator

zhaochenyang20 commented Dec 24, 2024

Yeah, I see. We will discuss this in our weekly meeting. BTW, how did you quantize the 405B model to FP8? @aflah02

@aflah02
Author

aflah02 commented Dec 24, 2024

> Yeah, I see. We will discuss this in our weekly meeting. BTW, how did you quantize the 405B model to FP8? @aflah02

Thanks that would be awesome!
For FP8 I'm directly using the official meta weights for FP8 and using that as model path when loading the model. I'm not doing any quantization myself but I do remember seeing some logs from SGLang about it doing some quantization related stuff

@zhaochenyang20
Collaborator

zhaochenyang20 commented Dec 24, 2024

Cool. Thanks for pointing this out. @JamesSand and I are working on quantization documentation. We will record that "use official repo first" 😂

@zhaochenyang20
Collaborator

For the SLURM issue, I will post an update this week. If I haven't replied by next week, please reply to this issue to remind me. Thanks! @aflah02

@aflah02
Author

aflah02 commented Dec 25, 2024

Just for reference this is the current script which works well on 1 node -

Python file -

import sglang as sgl
import argparse
import pandas as pd
import json

def main(args):
    print("Arguments: ", args)
    print("Loading model...")

    runtime = sgl.Runtime(
        model_path=args.model,
        tokenizer_path=args.model,
        tp_size=args.tp_size,  # tensor parallel size: number of GPUs to split the model over
        log_level="error",
        random_seed=20242
    )

    sgl.set_default_backend(runtime)

    print("Model loaded.")

    temperature = 1
    top_p = 1
    top_k = -1
    min_p = 0
    max_tokens = 20000
    message_role = 'assistant'

    @sgl.function
    def conv_generate(s, system_prompt, old_messages, new_message, max_tokens, message_role):
        s += sgl.system(system_prompt)
        for om in old_messages:
            content = om['content']
            role = om['role']
            if role == 'user':
                s += sgl.user(content)
            elif role == 'assistant':
                s += sgl.assistant(content)
        if message_role == 'user':
            s += sgl.user(new_message + sgl.gen("response", max_tokens=max_tokens))
        elif message_role == 'assistant':
            s += sgl.assistant(new_message + sgl.gen("response", max_tokens=max_tokens))
    

    # Read Queries 
    df = pd.read_csv(args.queries_path)

    queries = df["Question"].tolist()

    # Read Prompt
    with open(args.prompt_path, "r") as f:
        system_prompt = f.read()

    completion_texts = []

    for query in queries:
        state = conv_generate.run(
            system_prompt = system_prompt,
            old_messages = [
                {'content': query, 'role': 'user'}
            ],
            new_message = '', # Start with empty message
            max_tokens=max_tokens,
            message_role=message_role,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            min_p=min_p,
        )
        model_generation  = state['response']
        completion_texts.append(model_generation)
        print("Ran query: ", query)
        print("Model generation: ", model_generation)

    # Save completions
    with open(args.save_path, "w") as f:
        json.dump(completion_texts, f)


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument("--model", type=str)
    argparser.add_argument("--tp_size", type=int)
    argparser.add_argument("--save_path", type=str)
    argparser.add_argument("--queries_path", type=str)
    argparser.add_argument("--prompt_path", type=str)
    args = argparser.parse_args()
    main(args)

SLURM bash file -

#!/bin/bash -l

#SBATCH -o SLURM_Logs/%x_%j_%A-%T.out
#SBATCH -e SLURM_Logs/%x_%j_%A-%T.err
#SBATCH -D ./
#SBATCH -J 405B-FP8-Off-Sglang

#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=18
#SBATCH --mem=224GB

#SBATCH --partition="h100"
#SBATCH --gres=gpu:h100:8

# Wall clock limit (max. is 24 hours):
#SBATCH --time=12:00:00

# Load required modules or set environment variables if needed
source /NS/venvs/work/afkhan/slurm_sglang_env/bin/activate

model="/scratch/sws0/user/afkhan/Models/Llama-3.1-405B-Instruct-FP8"
tp_size=8
prompt_path="/NS/llm-pretraining/work/afkhan/vLLM-Serving-POC/Data/Prompt_OHB_Chat_Alpha.txt"
save_path="/NS/llm-pretraining/work/afkhan/vLLM-Serving-POC/Data/Outputs_Llama-3.1-405B-Instruct-FP8_OHB_Chat_Alpha_tp_8_sgl_fixed.json"
queries_path="/NS/llm-pretraining/work/afkhan/vLLM-Serving-POC/Data/FAQ_en.csv"

# Run Inference -

echo "Running Inference"
echo "Model: $model"
echo "TP Size: $tp_size"
echo "Save Path: $save_path"
echo "Queries Path: $queries_path"

python sglang_offline_runner.py --model "$model" --tp_size "$tp_size" --save_path "$save_path" --queries_path "$queries_path" --prompt_path "$prompt_path"

@zhaochenyang20
Collaborator

Thanks

@aflah02
Author

aflah02 commented Dec 26, 2024

Some more updates. I tried running the OpenAI-compatible server on SLURM on 2 nodes. For the 8B model it works across 2 nodes (tp=16) -

#!/bin/bash -l

#SBATCH -o SLURM_Logs/%x_%j_master.out
#SBATCH -e SLURM_Logs/%x_%j_master.err
#SBATCH -D ./
#SBATCH -J 8B-Online-TP16-Sglang

#SBATCH --nodes=2
#SBATCH --ntasks=2  # Total tasks across all nodes
#SBATCH --cpus-per-task=18
#SBATCH --mem=224GB

#SBATCH --partition="a100"
#SBATCH --gres=gpu:a100:8

#SBATCH --time=12:00:00

# Load required modules or set environment variables if needed
echo "[INFO] Activating environment on node $SLURM_PROCID"
if ! source /NS/venvs/work/afkhan/slurm_sglang_env/bin/activate; then
    echo "[ERROR] Failed to activate environment" >&2
    exit 1
fi

# Define parameters
model="/scratch/sws0/user/afkhan/Models/Llama-3.1-8B-Instruct"
tp_size=16

echo "[INFO] Running inference"
echo "[INFO] Model: $model"
echo "[INFO] TP Size: $tp_size"

# Define the NCCL init address using the hostname of the head node
HEAD_NODE=$(scontrol show hostname "$SLURM_NODELIST" | head -n 1)
NCCL_INIT_ADDR="${HEAD_NODE}:8000"
echo "[INFO] NCCL_INIT_ADDR: $NCCL_INIT_ADDR"

# Set OUTLINES_CACHE_DIR to /tmp/node_0_cache

export OUTLINES_CACHE_DIR="/tmp/node_0_cache"

# Launch processes with srun
srun --ntasks=1 --nodes=1 --exclusive --output="SLURM_Logs/8b_%x_%j_node0.out" \
    --error="SLURM_Logs/%x_%j_node0.err" \
    python3 -m sglang.launch_server \
    --model-path "$model" \
    --tp "$tp_size" \
    --nccl-init-addr "$NCCL_INIT_ADDR" \
    --nnodes 2 \
    --node-rank 0 &

# Set OUTLINES_CACHE_DIR to /tmp/node_1_cache

export OUTLINES_CACHE_DIR="/tmp/node_1_cache"

srun --ntasks=1 --nodes=1 --exclusive --output="SLURM_Logs/8b_%x_%j_node1.out" \
    --error="SLURM_Logs/%x_%j_node1.err" \
    python3 -m sglang.launch_server \
    --model-path "$model" \
    --tp "$tp_size" \
    --nccl-init-addr "$NCCL_INIT_ADDR" \
    --nnodes 2 \
    --node-rank 1 &

# Wait for localhost:30000 to accept connections

while ! nc -z localhost 30000; do
    sleep 1
    echo "[INFO] Waiting for localhost:30000 to accept connections"
done

echo "[INFO] localhost:30000 is ready to accept connections"

# Run the client and echo the output
response=$(curl -s -X POST http://127.0.0.1:30000/v1/chat/completions \
-H "Authorization: Bearer None" \
-H "Content-Type: application/json" \
-d '{
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "List 3 countries and their capitals."
    }
  ],
  "temperature": 0,
  "max_tokens": 64
}')

echo "[INFO] Response from server:"
echo "$response"

However for 405B-FP8 I get a timeout (I use the same script with model path changed to 405B-FP8)

Logs for timeout from one of the nodes (both have identical logs) - https://gist.github.com/aflah02/70150ed8f73f90d351cd8fe9ac049342
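
The readiness check and test request at the end of the script can also be written in Python, which gives a single-script way to push a list of queries through the multi-node server. The sketch below uses only the standard library. The helper names (`wait_for_port`, `build_chat_payload`, `chat`) are hypothetical, port 30000 and the `/v1/chat/completions` route are taken from the script above, and the response is assumed to follow the OpenAI chat-completions shape.

```python
import json
import socket
import time
import urllib.request


def wait_for_port(host: str, port: int, timeout_s: float = 600.0,
                  poll_s: float = 1.0) -> bool:
    """Poll until a TCP connection succeeds; False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            time.sleep(poll_s)  # server not up yet, try again
    return False


def build_chat_payload(model: str, question: str, max_tokens: int = 64,
                       temperature: float = 0.0) -> dict:
    """Request body mirroring the curl call in the script."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def chat(base_url: str, payload: dict, timeout_s: float = 600.0) -> str:
    """POST one chat request and return the generated text.

    Assumption: the reply has the OpenAI-style `choices[0].message.content`.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer None"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Unlike the `nc` loop, `wait_for_port` gives up after `timeout_s`, so a server that never comes up does not hold the allocation until the wall-clock limit. Once it returns `True` for `("127.0.0.1", 30000)`, calling `chat("http://127.0.0.1:30000", build_chat_payload(model, question))` per question would mirror the `curl` call.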

@aflah02
Author

aflah02 commented Dec 26, 2024

Update: The same code worked on 2 A100 nodes with 8 GPUs each. I am now trying the BF16 version on 2 nodes (both H100 and A100). The original issue (running offline inference) still stands. What I have been able to run so far is online inference, i.e. setting up an OpenAI-compatible server and hitting it with requests.

Update: The 405B model in BF16 worked on H100 and gave the timeout error in the A100 run (when setting up a server for online inference). The code is the same as the one above for 8B, with the model changed to 405B.

Update: It seems certain node pairs of mine give errors, so I just picked the pairs that work and enforce their selection in the SLURM config.
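
Pinning a job to a node pair that is known to work can be done with SLURM's `--nodelist` option. A small sketch of generating that directive follows; the helper name `nodelist_directive` and the host names are hypothetical placeholders, not the nodes used in this thread.

```python
from typing import List


def nodelist_directive(nodes: List[str]) -> str:
    """Render an `#SBATCH --nodelist=...` line for the given host names."""
    if not nodes:
        raise ValueError("at least one node name is required")
    return "#SBATCH --nodelist=" + ",".join(nodes)


if __name__ == "__main__":
    # Hypothetical host names for a pair that has been verified to work.
    print(nodelist_directive(["gpu-node-01", "gpu-node-02"]))
```

The resulting line would sit next to `#SBATCH --nodes=2` in the batch script header.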

@zhaochenyang20
Collaborator

Thanks! @aflah02. I don't know if things have worked out. If not, could you come to our meeting on this?

https://x.com/lmsysorg/status/1872797103107522932

@aflah02
Author

aflah02 commented Dec 28, 2024

> Thanks! @aflah02. I don't know if things have worked out. If not, could you come to our meeting on this?
>
> https://x.com/lmsysorg/status/1872797103107522932

Thanks @zhaochenyang20 however I'm travelling later today and unfortunately will not be able to make it to the meeting.

Just to update: I was able to run online inference (starting a server) across 2 nodes in SLURM. However, I haven't been able to figure out how to run multi-node via the SGLang engine or runtime for offline inference (the original question in this issue).

@zhaochenyang20
Collaborator

Yeah. I remember even without slurm, we can't serve llama 405B across multiple nodes with offline engine. @aflah02

Also, would you like to contribute a docs markdown file or a .py script demonstrating how to use SRT on SLURM? This would be a great contribution and we would sincerely appreciate it. We will work on getting the engine to run Llama 405B (perhaps also DeepSeek V3).

zhaochenyang20 changed the title from "[Feature] Running multi-node offline inference via SLURM" to "[Feature] Running multi-node offline engine inference ( via SLURM)" on Dec 28, 2024
@aflah02
Author

aflah02 commented Dec 28, 2024

> Yeah. I remember even without slurm, we can't serve llama 405B across multiple nodes with offline engine. @aflah02
>
> Also, would you like to contribute a docs markdown file or a .py script demonstrating how to use SRT on SLURM? This would be a great contribution and we would sincerely appreciate it. We will work on getting the engine to run Llama 405B (perhaps also DeepSeek V3).

I can share one on running the server via SLURM. Is that what you call SRT? I guess you can technically connect to the endpoint by setting the runtime backend, so it makes sense. It doesn't run via Python, though; it just carefully recreates, via SLURM, what you would do if you had complete access to both nodes.

@zhaochenyang20
Collaborator

SRT is the HTTP server @aflah02

@aflah02
Author

aflah02 commented Dec 28, 2024

> SRT is the HTTP server @aflah02

Ah okay nice
Yeah, I'll do that sometime next week once I'm back from my travels

It would also be great to have a way to do this via the python api though instead of via running commands on the terminal. Is it currently possible to do that? Running multinode by just setting backend model path and tp-size in the python api?

@zhaochenyang20
Collaborator

Sorry. I don't think that's feasible right now. We support running 405B llama in this way:

https://sgl-project.github.io/backend/backend.html#example-run-llama-3-1-405b

Also, my advisor told me that Llama 405B is rarely used since its performance is worse than Qwen 2.5 72B Instruct and Llama 3.3 70B Instruct. Maybe you can bypass this 😂

@aflah02

@aflah02
Author

aflah02 commented Dec 29, 2024

> Sorry. I don't think that's feasible right now. We support running 405B llama in this way:
>
> https://sgl-project.github.io/backend/backend.html#example-run-llama-3-1-405b
>
> Also, my advisor told me that Llama 405B is rarely used since its performance is worse than Qwen 2.5 72B Instruct and Llama 3.3 70B Instruct. Maybe you can bypass this 😂
>
> @aflah02

Yeah, but it's not just about the performance. If I want to, say, benchmark the model, I still need to run it, and running it via the CLI is much less convenient than having one Python script. It would be really useful if this could be added in a future release.

@aflah02
Author

aflah02 commented Dec 29, 2024

Also, sorry I couldn't join the meeting yesterday. Any updates on whether this is on the roadmap?

@aflah02
Author

aflah02 commented Dec 30, 2024

Hi @zhaochenyang20
I wrote a blog post on running LLMs across multiple SLURM nodes via SGLang - https://aflah02.substack.com/p/multi-node-llm-inference-with-sglang

@zhyncs
Member

zhyncs commented Dec 30, 2024

@aflah02 Nice blog!

@aflah02
Author

aflah02 commented Dec 30, 2024

> @aflah02 Nice blog!

Thanks :)

@zhyncs
Member

zhyncs commented Dec 31, 2024

> Hi @zhaochenyang20 I wrote a blog post on running LLMs across multiple SLURM nodes via SGLang - https://aflah02.substack.com/p/multi-node-llm-inference-with-sglang

@aflah02 This blog looks good. Can you submit a PR that puts some of the SLURM commands into a script?

@aflah02
Author

aflah02 commented Dec 31, 2024

> > Hi @zhaochenyang20 I wrote a blog post on running LLMs across multiple SLURM nodes via SGLang - https://aflah02.substack.com/p/multi-node-llm-inference-with-sglang
>
> @aflah02 This blog looks good. Can you submit a PR that puts some of the SLURM commands into a script?

Sure
Is there any reference I should follow on where to place this and what kind of file it should be (bash vs. markdown), etc.?

@zhaochenyang20
Collaborator

https://github.com/sgl-project/sglang/tree/main/examples/runtime

I think here is the right place. Markdown is pretty okay, and you can refer to your blog at:

https://aflah02.substack.com/p/multi-node-llm-inference-with-sglang

Great job, thanks! @aflah02

@aflah02
Author

aflah02 commented Jan 1, 2025

Thanks! I'll raise a PR shortly

@zhaochenyang20
Collaborator

Looking forward to it @aflah02

@zhaochenyang20
Collaborator

@aflah02 Hey, how is the PR going?

@aflah02
Author

aflah02 commented Jan 11, 2025

Sorry for the delay @zhaochenyang20
I'm caught up in travelling. I'll try to raise a PR sometime next week but if that doesn't happen then I'll try in first week of Feb. I am also trying to run this model on more than 2 nodes and debugging that. I'll also add that into the tutorial once that starts to run.
