ZLUDA v3.8.8 #71

Merged: 39 commits merged on Feb 12, 2025
39 commits
0fd8c94
Update MIOpen.
lshqqytiger Mar 29, 2024
a60a8cf
bindgen
lshqqytiger Mar 30, 2024
c98ca55
wip
lshqqytiger Mar 31, 2024
5b8e627
wip
lshqqytiger Apr 3, 2024
01433fc
Remove unused functions.
lshqqytiger Apr 10, 2024
4f79acf
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger Apr 17, 2024
3027bee
Update MIOpen. (graph api)
lshqqytiger Apr 17, 2024
b567623
[WIP] Graph API.
lshqqytiger Apr 20, 2024
15dd55a
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger Apr 21, 2024
7137146
WIP
lshqqytiger Apr 27, 2024
277c3a5
WIP
lshqqytiger Apr 27, 2024
d8fbbbd
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger Apr 28, 2024
7565ad1
Implement cudnnGetProperty.
lshqqytiger Apr 28, 2024
a086ad7
WIP
lshqqytiger Apr 29, 2024
33aa37a
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger Apr 30, 2024
e20b03d
WIP
lshqqytiger Apr 30, 2024
6464bda
wip
lshqqytiger May 17, 2024
b8893b7
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger May 17, 2024
47c0e72
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger May 21, 2024
796ca8d
[Fix] Handle stream correctly.
lshqqytiger May 21, 2024
485e336
Merge branch 'module/zluda_runtime' into future/module/zluda_dnn
lshqqytiger May 21, 2024
66b3d22
[Fix] Handle stream correctly.
lshqqytiger May 21, 2024
7dfd642
WIP
lshqqytiger May 29, 2024
75d332d
Merge branch 'module/zluda_runtime' into future/module/zluda_dnn
lshqqytiger May 31, 2024
40d46b3
wip
lshqqytiger Jul 13, 2024
3caec25
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger Jul 13, 2024
3717a96
Add cudaDeviceSynchronize.
lshqqytiger Feb 3, 2025
8c2b6ff
Merge branch 'master' into future/module/zluda_dnn
lshqqytiger Feb 3, 2025
98e058f
Add zluda_get_nightly_flag to recognize nightly build.
lshqqytiger Feb 5, 2025
afd4139
Merge branch 'dev' into future/module/zluda_dnn
lshqqytiger Feb 5, 2025
f359494
wip cudnn 9
lshqqytiger Feb 5, 2025
bc26a22
backend api wip
lshqqytiger Feb 11, 2025
f3706c4
Clean up.
lshqqytiger Feb 12, 2025
ea6c588
Clean up.
lshqqytiger Feb 12, 2025
d7a2d72
Fix invalid pointer dereferencing issue.
lshqqytiger Feb 12, 2025
44ca04c
Add other compute types for cublasLt matmul descriptors.
lshqqytiger Feb 12, 2025
c7ad267
Update README.md.
lshqqytiger Feb 12, 2025
a5ae6b4
Disable cuDNN build by default.
lshqqytiger Feb 12, 2025
903eb76
Fix cuDNN release build.
lshqqytiger Feb 12, 2025
902 changes: 593 additions & 309 deletions Cargo.lock

Large diffs are not rendered by default.

48 changes: 44 additions & 4 deletions README.md
@@ -2,12 +2,10 @@

ZLUDA lets you run unmodified CUDA applications with near-native performance on ~~Intel~~ AMD GPUs.

-ZLUDA is currently alpha quality, but it has been confirmed to work with a variety of native CUDA applications: Geekbench, 3DF Zephyr, Blender, Reality Capture, LAMMPS, NAMD, waifu2x, OpenFOAM, Arnold (proof of concept) and more.
+ZLUDA is currently alpha quality, but it has been confirmed to work with a variety of native CUDA applications: Geekbench, 3DF Zephyr, Blender, PyTorch on Windows, Reality Capture, LAMMPS, NAMD, waifu2x, OpenFOAM, Arnold (proof of concept) and more.

If you want to give it a try, download it from the Releases page to the right and read the [Usage](#usage) and [Known Issues](#known-issues) sections below. If you are interested in its history and future, read the [FAQ](#faq) section further below.

-![geekbench.svg](geekbench.svg)

## Usage

### Windows
@@ -41,7 +39,7 @@ Make sure you have the following installed:
- Git
- CMake
- Python 3
-- Rust (1.7x or newer)
+- Rust (1.81 or newer)
- C++ compiler
- [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/install_overview.html) 6.0+ (or [HIP SDK](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/) on Windows)
- (Windows only) Recent [AMD Radeon Software Adrenalin](https://www.amd.com/en/technologies/software)
@@ -65,6 +63,38 @@ Build by running:
cargo xtask --release
```

### Nightly Build (Windows-only)

You can enable unstable features by passing the `--nightly` flag.

```
cargo xtask --nightly
```

This enables the modules described in the subsections below.

The `--nightly` flag can be combined with `--release`.

※ Nightly builds receive very limited testing. If possible, prefer disabling the unsupported features over using a nightly build.

#### cuBLASLt

On Windows, cuBLASLt support is disabled by default because AMD has not yet released hipBLASLt for Windows.

Even so, nightly ZLUDA supports cuBLASLt through an unofficial build of hipBLASLt.

#### cuDNN

On Windows, cuDNN support is disabled by default because AMD has not yet released MIOpen for Windows.

Even so, nightly ZLUDA supports cuDNN through an unofficial build of MIOpen.

However, because MIOpen itself is very unstable and incomplete, the following limitations apply.

- Only a custom build of MIOpen without rocMLIR and Composable Kernel has been tested.
- Only FP32 is supported for Conv2d on gfx1100.
- There is a small memory leak due to technical difficulties.

## Unknown issues

If an application fails to start under ZLUDA or crashes please check [Known Issues](#known-issues) section below. If nothing there applies, then please read [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
@@ -112,12 +142,16 @@ If an application fails to start under ZLUDA or crashes please check [Known Issu

Firstly, ZLUDA ignores some of the floating point denormal and rounding mode information present in the kernels. Secondly, for certain approximate (not IEEE 754) NVIDIA floating point operations in CUDA, ZLUDA blindly uses approximate AMD floating point operations. The two might have a different precision.

- PyTorch: `torch.stft` does not always return the correct result.

#### CUDA 12+

- Applications built with CUDA 12 that use Thrust crash with `LLVM ERROR: unsupported libcall legalization`.

This is a ROCm/HIP bug. Currently, CUDA applications built with CUDA versions pre-12 work the best. Building with CUDA 12 and a pre-CUDA 12 Thrust might also work.

- PyTorch built for CUDA 12+ will not work.

#### OptiX

- ZLUDA has a bare-minimum OptiX implementation for Arnold. See details in [Arnold](#arnold) section.
@@ -239,6 +273,12 @@ Performance is currently much lower than the native HIP backend, see the discuss
torch.backends.cuda.enable_mem_efficient_sdp(False)
```

If you use a PyTorch version newer than 2.4 and are not using a nightly build, set the following environment variable.

```
DISABLE_ADDMM_CUDA_LT=1
```
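
As a rough illustration, `DISABLE_ADDMM_CUDA_LT` can also be set from Python before `torch` is imported. The sketch below is not part of ZLUDA or PyTorch: the helper names `needs_addmm_workaround` and `apply_workaround` are hypothetical, ">2.4" is assumed to mean a major.minor pair above 2.4 (so 2.4.1 is excluded), and setting the variable before the first `import torch` is assumed to be early enough.

```python
import os
import re
from importlib import metadata


def needs_addmm_workaround(torch_version: str, nightly_zluda: bool) -> bool:
    """Return True when DISABLE_ADDMM_CUDA_LT=1 should be set (hypothetical helper)."""
    # Read only the leading "major.minor"; suffixes such as ".1+cu118" are ignored.
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: ">2.4" is read as major.minor strictly above 2.4.
    return (major, minor) > (2, 4) and not nightly_zluda


def apply_workaround(nightly_zluda: bool = False) -> bool:
    """Set the variable if the installed PyTorch needs it; report whether it was set."""
    try:
        # Query the installed version without importing torch itself.
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return False
    if needs_addmm_workaround(version, nightly_zluda):
        os.environ["DISABLE_ADDMM_CUDA_LT"] = "1"
        return True
    return False
```

Call `apply_workaround()` at the top of the entry script, before `import torch`. Setting the variable in the shell that launches Python has the same effect and avoids the import-order concern.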

If you have an issue while running `torch.topk`, insert the code below

```py
1 change: 0 additions & 1 deletion geekbench.svg

This file was deleted.

2 changes: 1 addition & 1 deletion miopen-sys/README
@@ -1 +1 @@
-bindgen /opt/rocm/include/miopen/miopen.h -o src/miopen.rs --no-layout-tests --size_t-is-usize --default-enum-style=newtype --no-derive-debug --allowlist-function "miopen.*" --allowlist-var "MIOPEN_*" --must-use-type miopenStatus_t -- -D__HIP_PLATFORM_AMD__ -DMIOPEN_BACKEND_HIP=1 -I/opt/rocm/include -x c++
+bindgen $Env:HIP_PATH/include/miopen/miopen.h -o src/miopen.rs --no-layout-tests --default-enum-style=newtype --no-derive-debug --allowlist-function "miopen.*" --allowlist-var "MIOPEN_*" --must-use-type miopenStatus_t -- -D__HIP_PLATFORM_AMD__ -DMIOPEN_BACKEND_HIP=1 -DMIOPEN_BETA_API=1 -I"$Env:HIP_PATH/include" -x c++
19 changes: 17 additions & 2 deletions miopen-sys/build.rs
@@ -1,4 +1,19 @@
-fn main() {
+use std::env::VarError;
+use std::{env, path::PathBuf};
+
+fn main() -> Result<(), VarError> {
     println!("cargo:rustc-link-lib=dylib=MIOpen");
-    println!("cargo:rustc-link-search=native=/opt/rocm/lib/");
+    if cfg!(windows) {
+        let env = env::var("CARGO_CFG_TARGET_ENV")?;
+        if env == "msvc" {
+            let mut path = PathBuf::from(env::var("HIP_PATH")?);
+            path.push("lib");
+            println!("cargo:rustc-link-search=native={}", path.display());
+        } else {
+            println!("cargo:rustc-link-search=native=C:\\Windows\\System32");
+        };
+    } else {
+        println!("cargo:rustc-link-search=native=/opt/rocm/lib/");
+    }
+    Ok(())
 }