
Commit

Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.
manyoso committed Aug 31, 2023
1 parent d55cbbe commit 987546c
Showing 13 changed files with 512 additions and 5 deletions.
30 changes: 30 additions & 0 deletions LICENSE_SOM.txt
@@ -0,0 +1,30 @@
Software for Open Models License (SOM)
Version 1.0 dated August 30th, 2023

This license governs use of the accompanying Software. If you use the Software, you accept this license. If you do not accept the license, do not use the Software.

This license is intended to encourage open release of models created, modified, processed, or otherwise used via the Software under open licensing terms, and should be interpreted in light of that intent.

1. Definitions
The “Licensor” is the person or entity who is making the Software available under this license. “Software” is the software made available by Licensor under this license.
A “Model” is the output of a machine learning algorithm, and excludes the Software.
“Model Source Materials” must include the Model and model weights, and may include any input data, input data descriptions, documentation or training descriptions for the Model.
“Open Licensing Terms” means: (a) any open source license approved by the Open Source Initiative, or (b) any other terms that make the Model Source Materials publicly available free of charge, and allow recipients to use, modify and distribute the Model Source Materials. Terms described in (b) may include reasonable restrictions such as non-commercial or non-production limitations, or require use in compliance with law.

2. Grant of Rights. Subject to the conditions and limitations in section 3:
(A) Copyright Grant. Licensor grants you a non-exclusive, worldwide, royalty-free copyright license to copy, modify, and distribute the Software and any modifications of the Software you create under this license. The foregoing license includes without limitation the right to create, modify, and use Models using this Software.

(B) Patent Grant. Licensor grants you a non-exclusive, worldwide, royalty-free license, under any patents owned or controlled by Licensor, to make, have made, use, sell, offer for sale, import, or otherwise exploit the Software. No license is granted to patent rights that are not embodied in the operation of the Software in the form provided by Licensor.

3. Conditions and Limitations
(A) Model Licensing and Access. If you use the Software to create, modify, process, or otherwise use any Model, including usage to create inferences with a Model, whether or not you make the Model available to others, you must make those Model Source Materials publicly available under Open Licensing Terms.

(B) No Re-Licensing. If you redistribute the Software, or modifications to the Software made under the license granted above, you must make it available only under the terms of this license. You may offer additional terms such as warranties, maintenance and support, but You, and not Licensor, are responsible for performing such terms.

(C) No Trademark License. This license does not grant you rights to use the Licensor’s name, logo, or trademarks.

(D) If you assert in writing a claim against any person or entity alleging that the use of the Software infringes any patent, all of your licenses to the Software under Section 2 end automatically as of the date you asserted the claim.

(E) If you distribute any portion of the Software, you must retain all copyright, patent, trademark, and attribution notices that are present in the Software, and you must include a copy of this license.

(F) The Software is licensed “as-is.” You bear the entire risk of using it. Licensor gives You no express warranties, guarantees or conditions. You may have additional consumer rights under your local laws that this license cannot change. To the extent permitted under your local laws, the Licensor disclaims and excludes the implied warranties of merchantability, fitness for a particular purpose and non-infringement. To the extent this disclaimer is unlawful, you, and not Licensor, are responsible for any liability.
4 changes: 3 additions & 1 deletion gpt4all-backend/CMakeLists.txt
@@ -20,7 +20,7 @@ endif()
include_directories("${CMAKE_CURRENT_BINARY_DIR}")

set(LLMODEL_VERSION_MAJOR 0)
set(LLMODEL_VERSION_MINOR 3)
set(LLMODEL_VERSION_MINOR 4)
set(LLMODEL_VERSION_PATCH 0)
set(LLMODEL_VERSION "${LLMODEL_VERSION_MAJOR}.${LLMODEL_VERSION_MINOR}.${LLMODEL_VERSION_PATCH}")
project(llmodel VERSION ${LLMODEL_VERSION} LANGUAGES CXX C)
@@ -39,6 +39,8 @@ else()
message(STATUS "Interprocedural optimization support detected")
endif()

set(LLAMA_KOMPUTE YES)

include(llama.cpp.cmake)

set(BUILD_VARIANTS default avxonly)
2 changes: 1 addition & 1 deletion gpt4all-backend/llama.cpp-mainline
Submodule llama.cpp-mainline updated 97 files
+0 −1 .gitignore
+0 −0 .gitmodules
+122 −0 CMakeLists.txt
+30 −0 LICENSE_SOM.txt
+8 −0 examples/main/main.cpp
+1,313 −0 ggml-vulkan.cpp
+61 −0 ggml-vulkan.h
+16 −16 ggml.c
+27 −0 kompute/.ccls
+5 −0 kompute/.clang-format
+4 −0 kompute/.dockerignore
+58 −0 kompute/.github/workflows/cpp_examples.yml
+104 −0 kompute/.github/workflows/cpp_tests.yml
+28 −0 kompute/.github/workflows/python_tests.yml
+187 −0 kompute/CMakeLists.txt
+203 −0 kompute/LICENSE
+210 −0 kompute/Makefile
+513 −0 kompute/README.md
+106 −0 kompute/cmake/bin2h.cmake
+19 −0 kompute/cmake/bin_file_to_header.cmake
+139 −0 kompute/cmake/check_vulkan_version.cmake
+35 −0 kompute/cmake/code_coverage.cmake
+15 −0 kompute/cmake/deprecation_warnings.cmake
+8 −0 kompute/cmake/komputeConfig.cmake.in
+43 −0 kompute/cmake/vulkan_shader_compiler.cmake
+16 −0 kompute/config/FindSphinx.cmake
+819 −0 kompute/external/bin/xxd.c
+28 −0 kompute/kompute-config.cmake
+145 −0 kompute/op_add.comp
+145 −0 kompute/op_addrow.comp
+176 −0 kompute/op_cpy_f16_f16.comp
+176 −0 kompute/op_cpy_f16_f32.comp
+176 −0 kompute/op_cpy_f32_f16.comp
+168 −0 kompute/op_cpy_f32_f32.comp
+153 −0 kompute/op_diagmask.comp
+142 −0 kompute/op_gelu.comp
+150 −0 kompute/op_getrows_f16.comp
+179 −0 kompute/op_getrows_q4_0.comp
+181 −0 kompute/op_getrows_q4_1.comp
+145 −0 kompute/op_mul.comp
+177 −0 kompute/op_mul_mat_f16.comp
+195 −0 kompute/op_mul_mat_q4_0.comp
+218 −0 kompute/op_mul_mat_q4_1.comp
+145 −0 kompute/op_mulrow.comp
+209 −0 kompute/op_norm.comp
+141 −0 kompute/op_relu.comp
+178 −0 kompute/op_rmsnorm.comp
+183 −0 kompute/op_rope.comp
+142 −0 kompute/op_scale.comp
+141 −0 kompute/op_silu.comp
+197 −0 kompute/op_softmax.comp
+148 −0 kompute/scripts/convert_shaders.py
+11 −0 kompute/scripts/requirements.txt
+93 −0 kompute/setup.py
+450 −0 kompute/src/Algorithm.cpp
+82 −0 kompute/src/CMakeLists.txt
+27 −0 kompute/src/Core.cpp
+493 −0 kompute/src/Manager.cpp
+65 −0 kompute/src/OpAlgoDispatch.cpp
+51 −0 kompute/src/OpBufferSyncDevice.cpp
+51 −0 kompute/src/OpBufferSyncLocal.cpp
+74 −0 kompute/src/OpMemoryBarrier.cpp
+90 −0 kompute/src/OpTensorCopy.cpp
+61 −0 kompute/src/OpTensorSyncDevice.cpp
+76 −0 kompute/src/OpTensorSyncLocal.cpp
+396 −0 kompute/src/Sequence.cpp
+451 −0 kompute/src/Tensor.cpp
+46 −0 kompute/src/include/CMakeLists.txt
+338 −0 kompute/src/include/kompute/Algorithm.hpp
+39 −0 kompute/src/include/kompute/Core.hpp
+21 −0 kompute/src/include/kompute/Kompute.hpp
+267 −0 kompute/src/include/kompute/Manager.hpp
+313 −0 kompute/src/include/kompute/Sequence.hpp
+306 −0 kompute/src/include/kompute/Tensor.hpp
+197 −0 kompute/src/include/kompute/logger/Logger.hpp
+86 −0 kompute/src/include/kompute/operations/OpAlgoDispatch.hpp
+62 −0 kompute/src/include/kompute/operations/OpBase.hpp
+50 −0 kompute/src/include/kompute/operations/OpBufferSyncDevice.hpp
+50 −0 kompute/src/include/kompute/operations/OpBufferSyncLocal.hpp
+81 −0 kompute/src/include/kompute/operations/OpMemoryBarrier.hpp
+58 −0 kompute/src/include/kompute/operations/OpMult.hpp
+63 −0 kompute/src/include/kompute/operations/OpTensorCopy.hpp
+66 −0 kompute/src/include/kompute/operations/OpTensorSyncDevice.hpp
+66 −0 kompute/src/include/kompute/operations/OpTensorSyncLocal.hpp
+69 −0 kompute/src/logger/CMakeLists.txt
+101 −0 kompute/src/logger/Logger.cpp
+5 −0 kompute/src/shaders/CMakeLists.txt
+26 −0 kompute/src/shaders/glsl/CMakeLists.txt
+52 −0 kompute/src/shaders/glsl/ShaderLogisticRegression.comp
+310 −0 kompute/src/shaders/glsl/ShaderLogisticRegression.hpp.in
+28 −0 kompute/src/shaders/glsl/ShaderOpMult.comp
+101 −0 kompute/src/shaders/glsl/ShaderOpMult.hpp.in
+29 −0 kompute/src/shaders/hlsl/computeheadless.comp
+42 −0 llama-util.h
+61 −8 llama.cpp
+1 −1 llama.h
+18 −0 undump.py
134 changes: 133 additions & 1 deletion gpt4all-backend/llama.cpp.cmake
@@ -1,3 +1,11 @@
#
# Copyright (c) 2023 Nomic, Inc. All rights reserved.
#
# This software is licensed under the terms of the Software for Open Models License (SOM),
# version 1.0, as detailed in the LICENSE_SOM.txt file. A copy of this license should accompany
# this software. Except as expressly granted in the SOM license, all rights are reserved by Nomic, Inc.
#

cmake_minimum_required(VERSION 3.12) # Don't bump this version for no reason

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
@@ -145,6 +153,129 @@ if (LLAMA_OPENBLAS)
endif()
endif()

if (LLAMA_KOMPUTE)
find_package(Vulkan COMPONENTS glslc REQUIRED)
find_program(glslc_executable NAMES glslc HINTS Vulkan::glslc)
if (NOT glslc_executable)
message(FATAL_ERROR "glslc not found")
endif()

set(LLAMA_DIR ${CMAKE_CURRENT_SOURCE_DIR}/llama.cpp-mainline)

function(compile_shader)
set(options)
set(oneValueArgs)
set(multiValueArgs SOURCES)
cmake_parse_arguments(compile_shader "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
foreach(source ${compile_shader_SOURCES})
get_filename_component(OP_FILE ${source} NAME)
set(spv_file ${CMAKE_CURRENT_BINARY_DIR}/${OP_FILE}.spv)
add_custom_command(
OUTPUT ${spv_file}
DEPENDS ${LLAMA_DIR}/${source}
COMMAND ${glslc_executable} --target-env=vulkan1.2 -o ${spv_file} ${LLAMA_DIR}/${source}
COMMENT "Compiling ${source} to ${source}.spv"
)

get_filename_component(RAW_FILE_NAME ${spv_file} NAME)
set(FILE_NAME "shader${RAW_FILE_NAME}")
string(REPLACE ".comp.spv" ".h" HEADER_FILE ${FILE_NAME})
string(TOUPPER ${HEADER_FILE} HEADER_FILE_DEFINE)
string(REPLACE "." "_" HEADER_FILE_DEFINE "${HEADER_FILE_DEFINE}")
set(OUTPUT_HEADER_FILE "${HEADER_FILE}")
message(STATUS "${HEADER_FILE} generating ${HEADER_FILE_DEFINE}")
add_custom_command(
OUTPUT ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "/*THIS FILE HAS BEEN AUTOMATICALLY GENERATED - DO NOT EDIT*/" > ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#ifndef ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#define ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "namespace kp {" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "namespace shader_data {" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_BINARY_DIR}/bin/xxd -i ${spv_file} >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo "}}" >> ${OUTPUT_HEADER_FILE}
COMMAND ${CMAKE_COMMAND} -E echo \"\#endif // define ${HEADER_FILE_DEFINE}\" >> ${OUTPUT_HEADER_FILE}
DEPENDS ${spv_file} xxd
COMMENT "Converting to hpp: ${FILE_NAME} ${CMAKE_BINARY_DIR}/bin/xxd"
)
endforeach()
endfunction()

if (EXISTS "${LLAMA_DIR}/kompute/CMakeLists.txt")
message(STATUS "Kompute found")
add_subdirectory(${LLAMA_DIR}/kompute)

# Compile our shaders
compile_shader(SOURCES
kompute/op_scale.comp
kompute/op_add.comp
kompute/op_addrow.comp
kompute/op_mul.comp
kompute/op_mulrow.comp
kompute/op_silu.comp
kompute/op_relu.comp
kompute/op_gelu.comp
kompute/op_softmax.comp
kompute/op_norm.comp
kompute/op_rmsnorm.comp
kompute/op_diagmask.comp
kompute/op_mul_mat_f16.comp
kompute/op_mul_mat_q4_0.comp
kompute/op_mul_mat_q4_1.comp
kompute/op_getrows_f16.comp
kompute/op_getrows_q4_0.comp
kompute/op_getrows_q4_1.comp
kompute/op_rope.comp
kompute/op_cpy_f16_f16.comp
kompute/op_cpy_f16_f32.comp
kompute/op_cpy_f32_f16.comp
kompute/op_cpy_f32_f32.comp
)

# Create a custom target for our generated shaders
add_custom_target(generated_shaders DEPENDS
shaderop_scale.h
shaderop_add.h
shaderop_addrow.h
shaderop_mul.h
shaderop_mulrow.h
shaderop_silu.h
shaderop_relu.h
shaderop_gelu.h
shaderop_softmax.h
shaderop_norm.h
shaderop_rmsnorm.h
shaderop_diagmask.h
shaderop_mul_mat_f16.h
shaderop_mul_mat_q4_0.h
shaderop_mul_mat_q4_1.h
shaderop_getrows_f16.h
shaderop_getrows_q4_0.h
shaderop_getrows_q4_1.h
shaderop_rope.h
shaderop_cpy_f16_f16.h
shaderop_cpy_f16_f32.h
shaderop_cpy_f32_f16.h
shaderop_cpy_f32_f32.h
)

# Create a custom command that depends on the generated_shaders
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/ggml-vulkan.stamp
COMMAND ${CMAKE_COMMAND} -E touch ${CMAKE_CURRENT_BINARY_DIR}/ggml-vulkan.stamp
DEPENDS generated_shaders
COMMENT "Ensuring shaders are generated before compiling ggml-vulkan.cpp"
)

# Add the stamp to the main sources to ensure dependency tracking
set(GGML_SOURCES_KOMPUTE ${LLAMA_DIR}/ggml-vulkan.cpp ${LLAMA_DIR}/ggml-vulkan.h ${CMAKE_CURRENT_BINARY_DIR}/ggml-vulkan.stamp)
add_compile_definitions(GGML_USE_KOMPUTE)
set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} kompute)
set(LLAMA_EXTRA_INCLUDES ${LLAMA_EXTRA_INCLUDES} ${CMAKE_BINARY_DIR})
else()
message(WARNING "Kompute not found")
endif()
endif()

if (LLAMA_ALL_WARNINGS)
if (NOT MSVC)
set(c_flags
@@ -301,7 +432,8 @@ function(include_ggml DIRECTORY SUFFIX WITH_LLAMA)
${GGML_SOURCES_QUANT_K}
${GGML_SOURCES_CUDA}
${GGML_METAL_SOURCES}
${GGML_OPENCL_SOURCES})
${GGML_OPENCL_SOURCES}
${GGML_SOURCES_KOMPUTE})

if (LLAMA_K_QUANTS)
target_compile_definitions(ggml${SUFFIX} PUBLIC GGML_USE_K_QUANTS)
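
As an aside, the compile_shader() custom command above stitches each compiled SPIR-V blob into a C++ header via xxd -i. A minimal sketch of what a generated header such as shaderop_add.h plausibly contains, reconstructed from the echo commands in the diff; the array symbol and length are assumptions, since xxd -i derives the name from the file path it is handed:

/*THIS FILE HAS BEEN AUTOMATICALLY GENERATED - DO NOT EDIT*/
#ifndef SHADEROP_ADD_H
#define SHADEROP_ADD_H
namespace kp {
namespace shader_data {
// Output of `xxd -i op_add.comp.spv`; the first four bytes shown are the
// little-endian encoding of the SPIR-V magic number 0x07230203.
unsigned char op_add_comp_spv[] = {
    0x03, 0x02, 0x23, 0x07, /* ... remaining SPIR-V words ... */
};
unsigned int op_add_comp_spv_len = 4; // illustrative length only
}}
#endif // define SHADEROP_ADD_H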
85 changes: 85 additions & 0 deletions gpt4all-backend/llamamodel.cpp
@@ -28,6 +28,9 @@
#include <llama.h>
#include <ggml.h>

#ifdef GGML_USE_KOMPUTE
#include "ggml-vulkan.h"
#endif

namespace {
const char *modelType_ = "LLaMA";
@@ -155,13 +158,26 @@ bool LLamaModel::loadModel(const std::string &modelPath)
// currently
d_ptr->params.n_gpu_layers = 1;
#endif
#ifdef GGML_USE_KOMPUTE
if (ggml_vk_has_device()) {
// vulkan always runs the whole model if n_gpu_layers is not 0, at least
// currently
d_ptr->params.n_gpu_layers = 1;
}
#endif

d_ptr->ctx = llama_init_from_file(modelPath.c_str(), d_ptr->params);
if (!d_ptr->ctx) {
std::cerr << "LLAMA ERROR: failed to load model from " << modelPath << std::endl;
return false;
}

#ifdef GGML_USE_KOMPUTE
if (ggml_vk_has_device()) {
std::cerr << "llama.cpp: using Vulkan on " << ggml_vk_current_device().name << std::endl;
}
#endif

d_ptr->n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());
d_ptr->modelLoaded = true;
fflush(stderr);
@@ -252,6 +268,75 @@ const std::vector<LLModel::Token> &LLamaModel::endTokens() const
return fres;
}

#if defined(GGML_USE_KOMPUTE)
#include "ggml-vulkan.h"
#endif

std::vector<LLModel::GPUDevice> LLamaModel::availableGPUDevices(size_t memoryRequired)
{
#if defined(GGML_USE_KOMPUTE)
std::vector<ggml_vk_device> vkDevices = ggml_vk_available_devices(memoryRequired);

std::vector<LLModel::GPUDevice> devices;
for(const auto& vkDevice : vkDevices) {
LLModel::GPUDevice device;
device.index = vkDevice.index;
device.type = vkDevice.type;
device.heapSize = vkDevice.heapSize;
device.name = vkDevice.name;
device.vendor = vkDevice.vendor;

devices.push_back(device);
}

return devices;
#else
return std::vector<LLModel::GPUDevice>();
#endif
}

bool LLamaModel::initializeGPUDevice(size_t memoryRequired, const std::string& device)
{
#if defined(GGML_USE_KOMPUTE)
return ggml_vk_init_device(memoryRequired, device);
#else
return false;
#endif
}

bool LLamaModel::initializeGPUDevice(const LLModel::GPUDevice &device)
{
#if defined(GGML_USE_KOMPUTE)
ggml_vk_device vkDevice;
vkDevice.index = device.index;
vkDevice.type = device.type;
vkDevice.heapSize = device.heapSize;
vkDevice.name = device.name;
vkDevice.vendor = device.vendor;
return ggml_vk_init_device(vkDevice);
#else
return false;
#endif
}

bool LLamaModel::initializeGPUDevice(int device)
{
#if defined(GGML_USE_KOMPUTE)
return ggml_vk_init_device(device);
#else
return false;
#endif
}

bool LLamaModel::hasGPUDevice()
{
#if defined(GGML_USE_KOMPUTE)
return ggml_vk_has_device();
#else
return false;
#endif
}

#if defined(_WIN32)
#define DLL_EXPORT __declspec(dllexport)
#else
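
For orientation, here is a hedged reconstruction of the ggml-vulkan.h surface these call sites rely on. It is inferred purely from the calls above, not copied from the submodule; field and parameter types are assumptions where the diff leaves them open, and the authoritative declarations live in llama.cpp-mainline/ggml-vulkan.h:

#include <cstddef>
#include <string>
#include <vector>

// Inferred from the call sites in llamamodel.cpp above.
struct ggml_vk_device {
    int index;
    int type;            // device category; exact semantics are an assumption
    size_t heapSize;
    std::string name;
    std::string vendor;
};

std::vector<ggml_vk_device> ggml_vk_available_devices(size_t memoryRequired);
bool ggml_vk_init_device(size_t memoryRequired, const std::string &device);
bool ggml_vk_init_device(const ggml_vk_device &device);
bool ggml_vk_init_device(int device);
bool ggml_vk_has_device();
ggml_vk_device ggml_vk_current_device();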
5 changes: 5 additions & 0 deletions gpt4all-backend/llamamodel_impl.h
@@ -25,6 +25,11 @@ class LLamaModel : public LLModel {
size_t restoreState(const uint8_t *src) override;
void setThreadCount(int32_t n_threads) override;
int32_t threadCount() const override;
std::vector<GPUDevice> availableGPUDevices(size_t memoryRequired) override;
bool initializeGPUDevice(size_t memoryRequired, const std::string& device) override;
bool initializeGPUDevice(const GPUDevice &device) override;
bool initializeGPUDevice(int device) override;
bool hasGPUDevice() override;

private:
LLamaPrivate *d_ptr;
14 changes: 14 additions & 0 deletions gpt4all-backend/llmodel.h
@@ -58,6 +58,14 @@ class LLModel {
// window
};

struct GPUDevice {
int index = 0;
int type = 0;
size_t heapSize = 0;
std::string name;
std::string vendor;
};

explicit LLModel() {}
virtual ~LLModel() {}

@@ -87,6 +95,12 @@ class LLModel {
return *m_implementation;
}

virtual std::vector<GPUDevice> availableGPUDevices(size_t /*memoryRequired*/) { return std::vector<GPUDevice>(); }
virtual bool initializeGPUDevice(size_t /*memoryRequired*/, const std::string& /*device*/) { return false; }
virtual bool initializeGPUDevice(const GPUDevice &/*device*/) { return false; }
virtual bool initializeGPUDevice(int /*device*/) { return false; }
virtual bool hasGPUDevice() { return false; }

protected:
// These are pure virtual because subclasses need to implement as the default implementation of
// 'prompt' above calls these functions
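
To close the loop, a minimal caller-side sketch (not part of this commit) of how a frontend might drive the new virtuals, assuming it already holds an LLModel pointer; the function name and memory threshold are illustrative:

#include "llmodel.h"
#include <iostream>
#include <vector>

// Sketch only: list Vulkan devices offering at least ~4 GiB of heap and
// initialize the first one before loading a model; on failure the default
// implementations above leave the model on CPU.
void pickGpu(LLModel *model) {
    std::vector<LLModel::GPUDevice> devices =
        model->availableGPUDevices(4ull * 1024 * 1024 * 1024);
    for (const LLModel::GPUDevice &d : devices)
        std::cerr << d.index << ": " << d.vendor << ' ' << d.name
                  << " (" << d.heapSize << " bytes)\n";
    if (devices.empty() || !model->initializeGPUDevice(devices.front()))
        std::cerr << "no usable Vulkan device; staying on CPU\n";
}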