Name	Name	Last commit message	Last commit date
parent directory ..
README.MD	README.MD
llama2-70b.sh	llama2-70b.sh
llama2-7b.sh	llama2-7b.sh
llama3-70b.sh	llama3-70b.sh
llama3-8b.sh	llama3-8b.sh
mistral-7b.sh	mistral-7b.sh
mixtral7x8b.sh	mixtral7x8b.sh
qwen2-72b.sh	qwen2-72b.sh
qwen2-7b.sh	qwen2-7b.sh

Name

Last commit message

Last commit date

llama.cpp on Intel PVC

First time Setup

module load frameworks/2023.12.15.001
module load spack cmake

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Cmake based build
mkdir -p build
cd build
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL_F16=ON
cmake --build . --config Release -v

Running Experiments

cd llama.cpp/build/bin

#Sample Run for 2 GPUs and input,output length 1024 with batch size 32
ZE_AFFINITY_MASK=0 ./llama-bench -m /gila/Aurora_deployment/sraskar/llm_research/model_weights/GGUF_weights/llama_2_7b_f16.gguf -p 1024 -n 1024 -pg 1024,1024 -b 32 -r 1 -o csv

ZE_AFFINITY_MASK is used to set number of GPUs.

Verify Execution

> module load pti-gpu
> sysmon
=====================================================================================
GPU 0: Intel(R) Data Center GPU Max 1550    PCI Bus: 0000:18:00.0
Vendor: Intel(R) Corporation    Driver Version: 1.3.28202    Subdevices: 2
EU Count: 896    Threads Per EU: 8    EU SIMD Width: 16    Total Memory(MB): 124488.0
Core Frequency(MHz): 1600.0 of 1600.0    Core Temperature(C): 38.0
=====================================================================================
Running Processes: 2
     PID,  Device Memory Used(MB),  Shared Memory Used(MB),  GPU Engines, Executable
  182858,                 13188.9,                     0.0,  COMPUTE;DMA, ./llama-bench
  182943,                     5.5,                     0.0,      UNKNOWN, sysmon
=====================================================================================

This show ./llama-bench is executing on single GPU (both tiles)

Run Benchmakrs

Use provided shell scripts in this directory to run llama-bench for various configurations of input, output lengths and batch sizes. e.g. for running llama2-7b benchmakr. Use

source llama2-7b.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.MD

llama.cpp on Intel PVC

First time Setup

Running Experiments

Verify Execution

Run Benchmakrs

FilesExpand file tree

Max1550

Directory actions

More options

Directory actions

More options

Latest commit

History

Max1550

Folders and files

parent directory

README.MD

llama.cpp on Intel PVC

First time Setup

Running Experiments

Verify Execution

Run Benchmakrs