TRNS CPU GPU #12
Comments
Hi Trinayan, As stated in the Chai paper, inter-worker synchronization between CPU and GPU workers is not possible without system-wide atomics. For this reason, in the paper we use the GPU-only versions of PAD, SC and TRNS when we compare the OpenCL-D and OpenCL-U versions (Figure 2). Juan
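To illustrate the kind of coordination that depends on such atomics, here is a minimal host-only sketch. It is not code from Chai: `run_two_workers` is a hypothetical helper, and the two `std::thread`s merely stand in for a CPU worker and a GPU worker claiming tiles from one shared counter. On a real CPU+GPU system the counter would have to live in memory visible to both devices and be updated with system-scope atomics (for example CUDA's `atomicAdd_system`, available on compute capability 6.x and later); without them the two sides cannot safely share the counter.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not part of Chai: two host threads stand in for a
// CPU worker and a GPU worker. Returns how often each tile was processed.
std::vector<int> run_two_workers(std::size_t num_tiles) {
    std::vector<int> visits(num_tiles, 0);
    // Shared work counter; std::atomic plays the role that a system-wide
    // atomic would play between real CPU and GPU workers.
    std::atomic<std::size_t> next_tile{0};

    auto worker = [&]() {
        for (;;) {
            // fetch_add hands out every tile index exactly once.
            std::size_t t = next_tile.fetch_add(1, std::memory_order_relaxed);
            if (t >= num_tiles) break;
            visits[t] += 1;  // no race: tile t belongs to this worker alone
        }
    };

    std::thread cpu_like(worker);
    std::thread gpu_like(worker);
    cpu_like.join();  // join() makes the workers' writes visible here
    gpu_like.join();
    return visits;
}
```

Calling `run_two_workers(1000)` should return a vector in which every entry is 1, whichever worker happened to claim each tile. If the counter were a plain integer, both workers could read the same index and process a tile twice, which is the failure mode that missing system-wide atomics would expose.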
Hi, Thanks a lot for this information. Now I understand clearly. Is it also possible to generate larger input sets for Bezier Surface in a manner similar to the other benchmarks? Best,
Yes, for BS you can use:
TRNS appears to be a good benchmark for testing implementations of system-wide atomics, whether in software (https://docs.nvidia.com/cuda/pascal-tuning-guide/index.html) or in hardware (TBD). I see you have the CUDA_8_0 flag in the code, but I haven't gotten it to compile yet, though I'm running CUDA 9.2. Can we revisit this together? I'm willing to contribute back.
Hi robers97, Could you be more specific about your question? Thanks,
Hi,
The Chai paper mentions that SC, PAD and TRNS support only GPU execution in the OpenCL-D benchmarks. I did not try the OpenCL-D ones, but the CUDA-D versions of SC and PAD run on the CPU and GPU together. I was wondering whether this is also possible for the CUDA-D version of TRNS, or whether it is not possible at all. Thanks
Best,
Trinayan