ypapadop-amd · Dec 4, 2024 · Dec 5, 2024
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,206 @@
+# GitHub Copilot Instructions for GGML
+
+## Project Overview
+
+GGML is a tensor library for machine learning with a focus on:
+- Low-level cross-platform implementation
+- Integer quantization support for efficient model inference
+- Broad hardware support (CPU, CUDA, Metal, HIP/HSA, SYCL, Vulkan, WebGPU, OpenCL)
+- Automatic differentiation
+- Zero memory allocations during runtime
+- No third-party dependencies for core functionality
+
+**Note:** This project is under active development. Core library development primarily happens in the [llama.cpp](https://github.com/ggerganov/llama.cpp) and [whisper.cpp](https://github.com/ggerganov/whisper.cpp) repositories.
+
+## Build System
+
+### CMake Configuration
+
+- **Minimum CMake version:** 3.14
+- **Languages:** C (C11), C++ (C++17), Assembly
+- **Default build type:** Release (if not specified)
+- **Shared libraries:** Default ON (except MINGW/Emscripten/WASM)
+
+### Building the Project
+
+```bash
+mkdir build && cd build
+cmake ..
+cmake --build . --config Release -j 8
+```
+
+### Key CMake Options
+
+- `BUILD_SHARED_LIBS` - Build shared libraries (default: ON except MINGW/Emscripten/WASM)
+- `GGML_BUILD_TESTS` - Build test suite (default: ON when standalone)
+- `GGML_BUILD_EXAMPLES` - Build example programs (default: ON when standalone)
+- `GGML_CUDA` - Enable CUDA backend
+- `GGML_METAL` - Enable Metal backend (default: ON for Apple platforms)
+- `GGML_HIP` - Enable HIP backend
+- `GGML_HSA` - Enable HSA backend
+- `GGML_SYCL` - Enable SYCL backend
+- `GGML_VULKAN` - Enable Vulkan backend
+- `GGML_BLAS` - Enable BLAS support
+
+## Coding Standards
+
+### Code Style
+
+- **Indentation:** 4 spaces (see `.editorconfig`)
+- **Line endings:** LF (Unix-style)
+- **Charset:** UTF-8
+- **Final newline:** Required
+- **Trailing whitespace:** Remove
+
+### Formatting Tools
+
+- A `.clang-format` file exists in `src/ggml-hsa/` based on LLVM style
+- **Column limit:** 100 characters
+- **Pointer alignment:** Middle (e.g., `int * ptr`)
+- **Brace style:** Attach
+
+### Naming Conventions
+
+- Public API functions: `ggml_*` prefix
+- Backend-specific functions: `ggml_<backend>_*` (e.g., `ggml_cuda_*`, `ggml_metal_*`)
+- Types: `struct ggml_*`
+- Enums: `GGML_*` (uppercase with underscores)
+
+## Architecture
+
+### Directory Structure
+
+```
+├── include/          # Public headers (ggml.h, ggml-*.h, gguf.h)
+├── src/              # Core implementation and backend implementations
+│   ├── ggml.c       # Core tensor library
+│   ├── ggml-cpu/    # CPU-specific optimizations
+│   ├── ggml-cuda/   # CUDA backend
+│   ├── ggml-metal/  # Metal backend
+│   ├── ggml-hip/    # HIP backend
+│   ├── ggml-hsa/    # HSA backend
+│   └── ...          # Other backends
+├── examples/         # Example applications (GPT-2, GPT-J, MNIST, SAM, etc.)
+├── tests/            # Test suite
+├── cmake/            # CMake modules
+├── scripts/          # Utility scripts
+└── docs/             # Documentation (GGUF format spec)
+```
+
+### Key Components
+
+- **ggml.h/ggml.c** - Core tensor operations and compute graph
+- **ggml-backend.h** - Backend abstraction layer
+- **ggml-alloc.h** - Memory allocation utilities
+- **gguf.h** - GGUF file format for model serialization
+- **Backend implementations** - Hardware-specific optimizations
+
+## Testing
+
+### Running Tests
+
+```bash
+cd build
+ctest --output-on-failure
+```
+
+### Test Organization
+
+- Unit tests in `tests/` directory
+- Backend-specific tests in `tests/ggml-<backend>/`
+- Test naming: `test-*.c` or `test-*.cpp`
+- Use CTest for test execution
+
+### Writing Tests
+
+- Follow existing test patterns in `tests/` directory
+- Test both correctness and performance where applicable
+- Include edge cases and boundary conditions
+- Backend tests should verify backend-specific functionality
+
+## Contributing Guidelines
+
+⚠️ **Important:** For changes to the core `ggml` library (including CMake build system):
+- Open a PR in https://github.com/ggml-org/llama.cpp first
+- This ensures better visibility, testing, and review
+- See [CONTRIBUTING.md](../CONTRIBUTING.md) for details
+
+### Pull Request Process
+
+1. Ensure code follows the established style
+2. Add or update tests as needed
+3. Verify all tests pass locally
+4. Update documentation if changing public APIs
+5. Keep changes focused and minimal
+
+## Common Tasks
+
+### Adding a New Backend
+
+1. Create `src/ggml-<backend>/` directory
+2. Implement backend interface defined in `ggml-backend.h`
+3. Add CMakeLists.txt with appropriate options
+4. Create public header `include/ggml-<backend>.h`
+5. Add tests in `tests/ggml-<backend>/`
+6. Update main CMakeLists.txt with new options
+
+### Adding New Tensor Operations
+
+1. Add operation to `enum ggml_op` in `include/ggml.h`
+2. Implement forward pass in `src/ggml.c`
+3. Implement backward pass (gradient) if needed
+4. Add operation to backend implementations
+5. Add comprehensive tests
+6. Update documentation
+
+### Optimizing Existing Operations
+
+1. Profile to identify bottlenecks
+2. Consider SIMD/vectorization opportunities (see `src/ggml-cpu/`)
+3. Implement backend-specific optimizations
+4. Add performance tests
+5. Verify correctness with existing tests
+
+## Backend-Specific Notes
+
+### CUDA Backend
+- Use `ggml-cuda.h` for CUDA-specific APIs
+- CUDA kernels in `src/ggml-cuda/`
+
+### Metal Backend
+- macOS/iOS GPU acceleration
+- Shaders in Metal Shading Language
+- Default ON for Apple platforms
+
+### HIP/HSA Backends
+- AMD GPU support
+- Use appropriate compiler flags for ROCm
+
+### CPU Backend
+- SIMD optimizations in `src/ggml-cpu/`
+- Multiple implementations for different architectures
+- llamafile integration for optimized matrix multiplication
+
+## Python Bindings
+
+Python bindings are available in `examples/python/`:
+- Auto-generated using CFFI
+- Support for quantized tensors with automatic conversion
+- See `examples/python/README.md` for usage
+
+## Resources
+
+- [Introduction to ggml](https://huggingface.co/blog/introduction-to-ggml)
+- [GGUF file format specification](../docs/gguf.md)
+- [llama.cpp project](https://github.com/ggerganov/llama.cpp) - Primary development hub
+- [whisper.cpp project](https://github.com/ggerganov/whisper.cpp) - Speech recognition with ggml
+
+## Important Reminders
+
+1. **Minimal changes**: Make surgical, focused changes
+2. **Test early and often**: Run tests after each significant change
+3. **Follow existing patterns**: Match the style and structure of existing code
+4. **Consider performance**: GGML is performance-critical; profile changes
+5. **Cross-platform**: Ensure changes work on Linux, macOS, and Windows
+6. **Documentation**: Update comments and docs for public API changes
+7. **Upstream first**: Core changes should go to llama.cpp repository first
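
The naming and formatting rules in the instructions file above can be illustrated with a small self-contained C++ sketch. Every identifier here is hypothetical: the `toy_` / `TOY_` prefix stands in for `ggml_` / `GGML_`, and none of these types or functions exist in ggml. The layout follows the stated settings (4-space indentation, attached braces, `Middle` pointer alignment).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names only: the toy_ prefix stands in for ggml_ so that the
// sketch cannot be mistaken for the real API.

// Enums: uppercase with underscores, sharing the library prefix.
enum toy_op {
    TOY_OP_NONE,
    TOY_OP_ADD,
    TOY_OP_SCALE,
};

// Types: `struct <prefix>_*`.
struct toy_tensor {
    float * data; // "Middle" pointer alignment: a space on both sides of `*`
    std::size_t n;
};

// Public API function: `<prefix>_*`.
static std::size_t toy_nelements(const struct toy_tensor * t) {
    assert(t != nullptr);
    return t->n;
}

// Backend-specific function: `<prefix>_<backend>_*`.
static void toy_cpu_scale(struct toy_tensor * t, float s) {
    for (std::size_t i = 0; i < toy_nelements(t); ++i) {
        t->data[i] *= s;
    }
}
```

Replacing `toy` with `ggml` and `cpu` with another backend name gives the patterns listed under "Naming Conventions".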
diff --git a/.github/workflows/format.yml b/.github/workflows/format.yml
@@ -0,0 +1,74 @@
+name: Format Code
+
+on:
+  push:
+    branches: [ hsa-backend ]
+    paths:
+      - 'src/ggml-hsa/**'
+  pull_request:
+    branches: [ hsa-backend ]
+    paths:
+      - 'src/ggml-hsa/**'
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  format:
+    runs-on: ubuntu-latest
+
+    steps:
+    - name: Clone
+      uses: actions/checkout@v6
+      with:
+        ref: ${{ github.head_ref || github.ref_name }}
+        fetch-depth: 0
+        token: ${{ secrets.GITHUB_TOKEN }}
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: '3.x'
+
+    - name: Install formatters
+      run: |
+        sudo mkdir -p /etc/apt/keyrings
+        curl -fsSL https://apt.llvm.org/llvm-snapshot.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/llvm-snapshot.gpg
+        echo "deb [signed-by=/etc/apt/keyrings/llvm-snapshot.gpg] http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-22 main" | sudo tee /etc/apt/sources.list.d/llvm-toolchain-22.list
+        sudo apt-get update
+        sudo apt-get install -y clang-format-22
+        pip install black
+
+    - name: Format C++ code with clang-format
+      run: |
+        find src/ggml-hsa -type f \( -name '*.cpp' -o -name '*.cc' -o -name '*.hpp' -o -name '*.h' \) \
+          -exec clang-format-22 -i --style=file:src/ggml-hsa/.clang-format {} +
+
+    - name: Format Python code with black
+      run: |
+        black src/ggml-hsa
+
+    - name: Check for changes
+      id: verify
+      run: |
+        if ! git diff --exit-code; then
+          echo "changes=true" >> "$GITHUB_OUTPUT"
+        else
+          echo "changes=false" >> "$GITHUB_OUTPUT"
+        fi
+
+    - name: Commit and push formatting changes
+      if: steps.verify.outputs.changes == 'true'
+      run: |
+        git config user.name "github-actions[bot]"
+        git config user.email "github-actions[bot]@users.noreply.github.com"
+        git add src/ggml-hsa
+        git commit -m "Auto-format code in src/ggml-hsa
+
+        - Format C++ code with clang-format
+        - Format Python code with black
+
+        Co-Authored-By: github-actions[bot] <github-actions[bot]@users.noreply.github.com>"
+        git push
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -213,6 +213,7 @@ option(GGML_HIP_NO_VMM                      "ggml: do not try to use HIP VMM"
 option(GGML_HIP_ROCWMMA_FATTN               "ggml: enable rocWMMA for FlashAttention"         OFF)
 option(GGML_HIP_MMQ_MFMA                    "ggml: enable MFMA MMA for CDNA in MMQ"           ON)
 option(GGML_HIP_EXPORT_METRICS              "ggml: enable kernel perf metrics output"         OFF)
+option(GGML_HSA                             "ggml: use HSA"                                   OFF)
 option(GGML_MUSA_GRAPHS                     "ggml: use MUSA graph, experimental, unstable"    OFF)
 option(GGML_MUSA_MUDNN_COPY                 "ggml: enable muDNN for accelerated copy"         OFF)
 option(GGML_VULKAN                          "ggml: use Vulkan"                                OFF)
@@ -319,6 +320,7 @@ set(GGML_PUBLIC_HEADERS
     include/ggml-cann.h
     include/ggml-cpp.h
     include/ggml-cuda.h
+    include/ggml-hsa.h
     include/ggml-opt.h
     include/ggml-metal.h
     include/ggml-rpc.h

diff --git a/README.md b/README.md
@@ -57,6 +57,12 @@ cmake -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc ..
 cmake -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++" -DGGML_HIP=ON
 ```
 
+## Using HSA
+
+```bash
+cmake -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++" -DGGML_HSA=ON ..
+```
+
 ## Using SYCL
 
 ```bash

diff --git a/cmake/ggml-config.cmake.in b/cmake/ggml-config.cmake.in
@@ -83,6 +83,11 @@ if (NOT GGML_SHARED_LIB)
         set(GGML_HIP_INTERFACE_LINK_LIBRARIES hip::host roc::rocblas roc::hipblas)
     endif()
 
+    if (GGML_HSA)
+        find_package(hsa-runtime64 1.0 REQUIRED)
+        set(GGML_HSA_INTERFACE_LINK_LIBRARIES hsa-runtime64::hsa-runtime64)
+    endif()
+
     if (GGML_SYCL)
         set(GGML_SYCL_INTERFACE_LINK_LIBRARIES "")
         find_package(DNNL)

diff --git a/examples/gpt-2/main-backend.cpp b/examples/gpt-2/main-backend.cpp
@@ -11,6 +11,10 @@
 #include "ggml-metal.h"
 #endif
 
+#ifdef GGML_USE_HSA
+#include "ggml-hsa.h"
+#endif
+
 #include "common.h"
 #include "common-ggml.h"
 
@@ -220,6 +224,16 @@ bool gpt2_model_load(const std::string & fname, gpt2_model & model, gpt_vocab &
     }
 #endif
 
+#ifdef GGML_USE_HSA
+    if (n_gpu_layers > 0) {
+        fprintf(stderr, "%s: using HSA backend\n", __func__);
+        model.backend = ggml_backend_hsa_init(0);
+        if (!model.backend) {
+            fprintf(stderr, "%s: ggml_backend_hsa_init() failed\n", __func__);
+        }
+    }
+#endif
+
     if (!model.backend) {
         // fallback to CPU backend
         fprintf(stderr, "%s: using CPU backend\n", __func__);
@@ -231,6 +245,12 @@ bool gpt2_model_load(const std::string & fname, gpt2_model & model, gpt_vocab &
         return false;
     }
 
+    ggml_backend_dev_t device = ggml_backend_get_device(model.backend);
+    size_t total_memory = 0;
+    size_t free_memory = 0;
+    ggml_backend_dev_memory(device, &free_memory, &total_memory);
+    fprintf(stderr, "%s: free memory %zu, total memory %zu\n", __func__, free_memory, total_memory);
+
     // create the tensors for the model
     {
         const auto & hparams = model.hparams;
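
The control flow added to `gpt2_model_load()` above (try the accelerator only when GPU layers are requested, treat a null handle as failure, then fall through to the CPU) can be sketched in isolation. This is a minimal sketch under stated assumptions: `demo_backend`, `demo_accel_init`, `demo_cpu_init` and `demo_select_backend` are hypothetical stand-ins and not part of ggml, and the `device_present` flag simulates whether the accelerator could be initialized.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>

// Hypothetical stand-in for a backend handle; not the ggml_backend_t type.
struct demo_backend {
    std::string name;
};

// Hypothetical accelerator init: yields a null handle on failure, which is
// the condition the diff checks after calling ggml_backend_hsa_init().
static std::unique_ptr<demo_backend> demo_accel_init(bool device_present) {
    if (!device_present) {
        return nullptr;
    }
    return std::make_unique<demo_backend>(demo_backend{"accel"});
}

// Hypothetical CPU init: assumed to always succeed in this sketch.
static std::unique_ptr<demo_backend> demo_cpu_init() {
    return std::make_unique<demo_backend>(demo_backend{"cpu"});
}

// Try the accelerator only when GPU layers are requested; if it was not
// requested or its init failed, fall back to the CPU.
static std::unique_ptr<demo_backend> demo_select_backend(int n_gpu_layers, bool device_present) {
    std::unique_ptr<demo_backend> backend;
    if (n_gpu_layers > 0) {
        backend = demo_accel_init(device_present);
        if (!backend) {
            std::fprintf(stderr, "%s: accelerator init failed\n", __func__);
        }
    }
    if (!backend) {
        backend = demo_cpu_init();
    }
    assert(backend != nullptr);
    return backend;
}
```

With these stand-ins, `demo_select_backend(1, false)` logs the failure and still returns a usable CPU handle, which matches the behaviour the hunk appears to aim for: a failed accelerator init is reported but is not fatal.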

diff --git a/examples/gpt-2/main-sched.cpp b/examples/gpt-2/main-sched.cpp
@@ -15,6 +15,11 @@
 #include "ggml-blas.h"
 #endif
 
+#ifdef GGML_USE_HSA
+#include "ggml-hsa.h"
+#endif
+
+
 #include "common.h"
 #include "common-ggml.h"
 
@@ -145,6 +150,18 @@ void init_backends(gpt2_model & model, const gpt_params & params) {
     }
 #endif
 
+#ifdef GGML_USE_HSA
+    if (params.n_gpu_layers > 0) {
+        fprintf(stderr, "%s: using HSA backend\n", __func__);
+        ggml_backend_t hsa_backend = ggml_backend_hsa_init(0);
+        if (!hsa_backend) {
+            fprintf(stderr, "%s: ggml_backend_hsa_init() failed\n", __func__);
+        } else {
+            model.backends.push_back(hsa_backend);
+        }
+    }
+#endif
+
     // always add the CPU backend as a fallback
     ggml_backend_t cpu_backend = ggml_backend_cpu_init();
     ggml_backend_cpu_set_n_threads(cpu_backend, params.n_threads);
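
In `init_backends()` the HSA backend is pushed before the CPU backend, so the order of `model.backends` matters. The sketch below illustrates why, assuming a simplified selection rule in which the first backend in list order that supports an operation is chosen. The `demo_dev` type, the `demo_*` functions and the operation names are hypothetical, and the real scheduler is considerably more involved.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend entry; not a ggml type.
struct demo_dev {
    std::string name;
    std::set<std::string> ops; // operations this backend can run
};

// Mirrors the shape of init_backends() in the diff: the accelerator is added
// first when its init succeeded, and the CPU is appended unconditionally.
static std::vector<demo_dev> demo_init_backends(bool accel_ok) {
    std::vector<demo_dev> backends;
    if (accel_ok) {
        backends.push_back({"accel", {"mul_mat"}});
    }
    backends.push_back({"cpu", {"mul_mat", "soft_max"}});
    return backends;
}

// Assumed selection rule: the first backend in list order that supports the
// operation wins; the last entry acts as the catch-all.
static const std::string & demo_pick(const std::vector<demo_dev> & backends,
                                     const std::string & op) {
    assert(!backends.empty());
    for (const demo_dev & dev : backends) {
        if (dev.ops.count(op) > 0) {
            return dev.name;
        }
    }
    return backends.back().name;
}
```

Under this rule an operation the accelerator does not support lands on the CPU entry, which is why the CPU backend is appended last and without any condition.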