
Dynamic LLVM-IR shading code can be achieved by using AdaptiveCpp #33

Open
101001000 opened this issue Apr 7, 2024 · 6 comments

@101001000
Owner

I had a conversation with @illuhad about how this could be achieved with a couple tweaks

@illuhad

illuhad commented May 29, 2024

FYI, this could be interesting in this context: AdaptiveCpp/AdaptiveCpp#1467

@101001000
Owner Author

101001000 commented May 29, 2024

FYI, this could be interesting in this context: AdaptiveCpp/AdaptiveCpp#1467

Very interesting, thank you! The only concern I have right now is that AdaptiveCpp dynamic functions require the function body to be defined at compile time, so I don't know how I could fit in a runtime-loaded function.

I managed to bypass the function pointer limitation (`// Not allowed: auto* myfuncptr = &myfunc; dynamic_function{myfuncptr};`) by manually registering the pointer with the mangled name of the function definition:

`hipsycl::glue::reflection::detail::symbol_information::get().register_function_symbol((const void *)&myfunction1, "_Z11myfunction1PiN7hipsycl4sycl4itemILi1ELb1EEE");`

so I can do: `ptr = &myfunction1; dyn_function_config.define(&execute_operations, ptr);`

Now I need to find a way to include my LLVM code to the module passed to LLVMToBackendTranslator.

Just to make it clear, I managed to make the function pointer work with this code:

#include <iostream>
#include <sycl/sycl.hpp>

SYCL_EXTERNAL void myfunction1(int *data, sycl::item<1> idx){
    data[idx] = 1;
}

SYCL_EXTERNAL void myfunction2(int *data, sycl::item<1> idx){
    data[idx] = 2;
}

__attribute__((noinline)) void execute_operations(int *data, sycl::item<1> idx){
    // This prevents the compiler from removing calls to execute_operations if it
    // sees that the function cannot actually have any side-effects.
    sycl::jit::arguments_are_used(data, idx);
}

int main()
{
    sycl::queue q;
    int *data = sycl::malloc_shared<int>(32, q);

    sycl::jit::dynamic_function_config dyn_function_config;

    int i{-1};
    std::cin >> i;

    auto* ptr = &myfunction1;

    if(i == 1){
        ptr = &myfunction1;
    } else if(i == 2){
        ptr = &myfunction2;
    }

    hipsycl::glue::reflection::detail::symbol_information::get().register_function_symbol((const void *)&myfunction1, "_Z11myfunction1PiN7hipsycl4sycl4itemILi1ELb1EEE");
    hipsycl::glue::reflection::detail::symbol_information::get().register_function_symbol((const void *)&myfunction2, "_Z11myfunction2PiN7hipsycl4sycl4itemILi1ELb1EEE");
    
    dyn_function_config.define(&execute_operations, ptr);

    q.parallel_for(sycl::range{32}, dyn_function_config.apply([=](sycl::item<1> idx)
                                                              { execute_operations(data, idx); }));

    q.wait();

    for (int i = 0; i < 32; ++i) {
        std::cout << data[i] << ",";
    }
}

@illuhad

illuhad commented May 29, 2024

Very interesting, thank you! The only concern I have right now is that AdaptiveCpp dynamic functions require the function body to be defined at compile time, so I don't know how I could fit in a runtime-loaded function.

What do you mean by a runtime-loaded function? The set of instructions in an IR is finite, so if you define your dynamic functions appropriately (in an extreme case, at instruction level: add, subtract, etc.) you can have this feature operate at almost arbitrary granularity.
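To illustrate that granularity, here is a hedged sketch (assuming the `sycl::jit` API from AdaptiveCpp/AdaptiveCpp#1467; `op_add`, `op_sub`, `apply_op` and `run` are hypothetical names, not part of the extension) that defines one dynamic function per arithmetic instruction:

```cpp
#include <sycl/sycl.hpp>

// Hypothetical fine-grained dynamic function definitions: each one
// wraps a single arithmetic instruction.
SYCL_EXTERNAL void op_add(int *data, sycl::item<1> idx) { data[idx] += 1; }
SYCL_EXTERNAL void op_sub(int *data, sycl::item<1> idx) { data[idx] -= 1; }

// Placeholder that the JIT compiler replaces with whichever operation
// is selected at runtime.
__attribute__((noinline)) void apply_op(int *data, sycl::item<1> idx) {
    sycl::jit::arguments_are_used(data, idx);
}

void run(sycl::queue &q, int *data, bool add) {
    sycl::jit::dynamic_function_config cfg;
    // Direct &function expressions, so the frontend can resolve the symbols.
    if (add)
        cfg.define(&apply_op, &op_add);
    else
        cfg.define(&apply_op, &op_sub);
    q.parallel_for(sycl::range{32},
                   cfg.apply([=](sycl::item<1> idx) { apply_op(data, idx); }));
}
```

By composing several such placeholders, an arbitrary sequence of operations can be assembled at runtime from a fixed, compile-time set of building blocks.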

If you want even more control, I'm also looking into adding an optional user callback right before the IR optimization stage, so that users can inject their own LLVM transformations.

I managed to bypass the function pointer limitation (`// Not allowed: auto* myfuncptr = &myfunc; dynamic_function{myfuncptr};`) by manually registering the pointer with the mangled name of the function definition `hipsycl::glue::reflection::detail::symbol_information::get().register_function_symbol((const void *)&myfunction1, "_Z11myfunction1PiN7hipsycl4sycl4itemILi1ELb1EEE");` so I can do: `ptr = &myfunction1; dyn_function_config.define(&execute_operations, ptr);`

Don't do this; it's not exactly the same. You're only registering the function with the symbol resolution logic, but the compiler won't know that `execute_operations` is a dynamic function. This will cause problems in some cases.

If you need more control and need to pass functions around more freely, the way to do it is `dynamic_function_id`:

// Notice the difference in types between `dynamic_function` and `dynamic_function_definition`.
sycl::jit::dynamic_function df{&execute_operations};
sycl::jit::dynamic_function_definition mf1{&myfunction1};

// These objects can be passed around like normal variables, stored in data structures etc.
sycl::jit::dynamic_function_id df_id = df.id();
sycl::jit::dynamic_function_id mf1_id = mf1.id();

sycl::jit::dynamic_function_config cfg;
cfg.define(df_id, mf1_id);

The downside of using `dynamic_function_id` is that, since it does type erasure, it cannot be checked at compile time whether functions are compatible. It's the user's responsibility to ensure that.

If you still want type safety, you can also directly use the `dynamic_function{_definition}` objects:

// These objects can also be passed around like normal variables, however, they have template arguments
// which you just don't see because they get automatically deduced by C++ CTAD. So it might be difficult to store
// these objects in a common data structure, if they have different function signatures. In that case,
// you would need `dynamic_function_id`
sycl::jit::dynamic_function df{&execute_operations};
sycl::jit::dynamic_function_definition mf1{&myfunction1};

sycl::jit::dynamic_function_config cfg;
cfg.define(df, mf1);

See the API reference in that PR for a full list of all overloads.

@101001000
Owner Author

101001000 commented Jun 4, 2024

What do you mean by a runtime-loaded function?

My objective is to load a function from an LLVM module loaded at runtime; my definition of a runtime function is a function whose definition is not known until the program executes.

If you want even more control, I'm also looking into adding an optional user callback right before the IR optimization stage, so that users can inject their own LLVM transformations.

That would fit perfectly in my use case, as I could add a transformation pass that merges the main module with a loaded one. Right now I'm doing something similar as a proof of concept, but hardcoding the LLVM module load and merge directly in the AdaptiveCpp code.
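The load-and-merge step itself could look like the following hedged sketch, using the standard `llvm::parseIRFile` and `llvm::Linker::linkModules` APIs. It assumes the module on disk (`runtime_funcs.bc`, a hypothetical path) was compiled for the same target, and `mergeRuntimeModule` is an illustrative name, not an AdaptiveCpp hook:

```cpp
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/Linker/Linker.h>
#include <llvm/Support/SourceMgr.h>
#include <memory>

// Merge a module loaded at runtime into the module that is about to be
// handed to LLVMToBackendTranslator. Both modules must live in the same
// LLVMContext for linking to work.
bool mergeRuntimeModule(llvm::Module &MainModule, llvm::LLVMContext &Ctx) {
    llvm::SMDiagnostic Err;
    // "runtime_funcs.bc" is a hypothetical file containing the
    // runtime-loaded function definitions (bitcode or textual IR).
    std::unique_ptr<llvm::Module> Loaded =
        llvm::parseIRFile("runtime_funcs.bc", Err, Ctx);
    if (!Loaded)
        return false;
    // linkModules returns true on error and consumes the source module.
    return !llvm::Linker::linkModules(MainModule, std::move(Loaded));
}
```

Running such a merge before the IR optimization stage would let the optimizer inline and specialize the freshly linked functions together with the kernel code.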

I was thinking that maybe I can play around with specialization constants via sycl::specialized? If I understood its purpose correctly, the JIT compiler will read sycl::specialized values and generate device code with the runtime value of such a variable. I tried to set a sycl::specialized function pointer, but it only worked on the OMP backend; I guess that's because OMP runs on the host, and the device code is not defined for such a function? I know about the function pointer and virtual function limitations of SYCL, but maybe using a functor object could do the trick?

@illuhad

illuhad commented Jun 6, 2024

I tried to set a sycl::specialized function pointer, but it only worked on the OMP backend; I guess that's because OMP runs on the host, and the device code is not defined for such a function? I know about the function pointer and virtual function limitations of SYCL, but maybe using a functor object could do the trick?

The problem is that a) some devices may not support function pointers, and, even more fundamentally, b) the pointer address that you obtain outside the kernel code refers to an address in host code, but inside the kernel you are on the device - function pointer addresses, even if supported, will in general be different there.

A functor object will either boil down to function pointers as well, if it uses some form of type erasure, or it will rely on distinct data types, i.e. require templates. In the latter case, JIT techniques won't help you, because templates are evaluated in the frontend at compile time.

@101001000
Owner Author

As a reminder to myself: it's also possible to inject an LLVM-IR payload into DPC++ by slightly modifying their kernel fusion extension, as it is the only place where they do JIT compilation for real. I don't know whether they perform an optimization pass after the merge, since they work with SPIR-V after that, or whether I should provide pre-optimized code.
