Releases: ai-dock/llama.cpp-cuda

llama.cpp b9585 with CUDA

10 Jun 05:08
20acfc9

llama.cpp b9585 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9585
Commit: d73cd076740db9c111d0e58ddd4486904469e75e

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
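To check a GPU against the capability list above, a small membership test is enough. The following is a minimal sketch, not part of the release: the variable `SUPPORTED_CC` and the function `cc_supported` are hypothetical names, and the `compute_cap` query field of `nvidia-smi` is assumed to be available, which is only the case on reasonably recent drivers. A capability that is not listed is only reported as "not listed"; the sketch makes no claim about whether the binaries would still run.

```shell
#!/bin/sh
# Compute capabilities listed for the CUDA 12.8 build in these release notes.
SUPPORTED_CC="7.5 8.0 8.6 8.9 9.0 10.0 12.0"

# cc_supported CAP: succeed if CAP (e.g. "8.9") appears in SUPPORTED_CC.
cc_supported() {
  for cc in $SUPPORTED_CC; do
    [ "$cc" = "$1" ] && return 0
  done
  return 1
}

# Assumption: nvidia-smi is installed and understands the compute_cap field.
# If it is missing or the query fails, nothing is printed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null |
  while read -r cap; do
    [ -n "$cap" ] || continue
    if cc_supported "$cap"; then
      echo "$cap: listed"
    else
      echo "$cap: not listed"
    fi
  done
fi
```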

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)
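The suffix can be derived from the machine name that `uname -m` prints. This is a rough sketch under the naming scheme shown in the usage commands of these notes; `arch_suffix` and `tarball_name` are hypothetical helper names, not something shipped in the tarballs.

```shell
#!/bin/sh
# arch_suffix MACHINE: map a machine name (as printed by `uname -m`)
# to the tarball suffix used in these releases.
arch_suffix() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             return 1 ;;   # no tarball is published for other hosts
  esac
}

# tarball_name TAG CUDA_VERSION MACHINE: build the expected file name,
# e.g. tarball_name b9585 12.8 x86_64
tarball_name() {
  suffix=$(arch_suffix "$3") || return 1
  echo "llama.cpp-$1-cuda-$2-${suffix}.tar.gz"
}

# Print the tarball name for the current host, or a note on stderr.
tarball_name b9585 12.8 "$(uname -m)" ||
  echo "no tarball published for $(uname -m)" >&2
```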

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9585-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9585-cuda-12.8-arm64.tar.gz
./llama-cli --help
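The notes do not state the directory layout inside the tarball, so `./llama-cli` may not sit in the current directory after extraction. A cautious sketch is to extract into a dedicated directory and search for the binary; `extract_and_locate` is a hypothetical helper, and the destination directory name in the usage comment is arbitrary.

```shell
#!/bin/sh
# extract_and_locate TARBALL DEST: unpack TARBALL into DEST and print
# the path of the first file named llama-cli found below DEST.
extract_and_locate() {
  mkdir -p "$2" || return 1
  tar -xzf "$1" -C "$2" || return 1
  find "$2" -type f -name llama-cli | head -n 1
}

# Usage (assumes the tarball has already been downloaded):
#   cli=$(extract_and_locate llama.cpp-b9585-cuda-12.8-amd64.tar.gz ./llama.cpp-b9585)
#   "$cli" --help
```

Extracting into its own directory also keeps several release tags side by side without overwriting each other.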

llama.cpp b9568 with CUDA

09 Jun 05:00
20acfc9

llama.cpp b9568 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9568
Commit: 7d2b45b4f7b663cda74f23fbc3ce6dc3bd4f6545

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9568-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9568-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9553 with CUDA

08 Jun 05:26
20acfc9

llama.cpp b9553 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9553
Commit: 9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9553-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9553-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9544 with CUDA

07 Jun 05:17
20acfc9

llama.cpp b9544 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9544
Commit: 98d5e8ba8a2642710c9871d05ac1033a3328b884

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9544-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9544-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9536 with CUDA

06 Jun 04:56
20acfc9

llama.cpp b9536 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9536
Commit: 308f61c31f083251ce8150f10b9ef97679b500b5

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9536-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9536-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9518 with CUDA

05 Jun 05:08
20acfc9

llama.cpp b9518 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9518
Commit: 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9518-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9518-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9496 with CUDA

04 Jun 05:30
20acfc9

llama.cpp b9496 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9496
Commit: 94a220cd6745e6e3f8de62870b66fd5b9bc92700

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9496-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9496-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9484 with CUDA

03 Jun 05:18
20acfc9

llama.cpp b9484 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9484
Commit: 63e66fdd23eda3a2659a7af9ff6ef15d71efbff1

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9484-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9484-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9467 with CUDA

02 Jun 05:19
20acfc9

llama.cpp b9467 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9467
Commit: 1fd5f4803713ea3e1eda326483c9cc71a572cf02

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9467-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9467-cuda-12.8-arm64.tar.gz
./llama-cli --help

llama.cpp b9444 with CUDA

01 Jun 05:27
20acfc9

llama.cpp b9444 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b9444
Commit: 6f165c1c64f77024686dc969c3de6f030f274add

CUDA Versions

  • CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Host architectures

Tarballs are published per host CPU architecture (Linux):

  • -amd64.tar.gz — x86_64 (most desktops, servers, cloud VMs)
  • -arm64.tar.gz — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)

GPU compute capability reference

  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200, GH200
  • 10.0: B200, GB200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your host CPU architecture and CUDA version, extract the one for your host, then run the binary:

# amd64 host
tar -xzf llama.cpp-b9444-cuda-12.8-amd64.tar.gz
# arm64 host (e.g. Grace Blackwell)
tar -xzf llama.cpp-b9444-cuda-12.8-arm64.tar.gz
./llama-cli --help