TRNS CPU GPU #12

trinayan · 2017-06-26T14:59:12Z

Hi,
The Chai paper mentions that SC, PAD and TRNS support only GPU execution in the OpenCL-D benchmarks. I did not try the OpenCL-D ones, but the CUDA-D versions of SC and PAD run on the CPU and GPU together. I was wondering whether this is also possible for the CUDA-D version of TRNS, or whether it is not possible at all. Thanks

Best,
Trinayan

el1goluj · 2017-06-27T12:37:56Z

Hi Trinayan,

As stated in the Chai paper, inter-worker synchronization between CPU and GPU workers is not possible without system-wide atomics. For this reason, in the paper we use the GPU-only versions of PAD, SC and TRNS when we compare the OpenCL-D and OpenCL-U versions (Figure 2).
The -D versions of PAD and SC in the repository support CPU+GPU out-of-place implementations that do not require CPU-GPU inter-worker synchronization. The input arrays are divided into two parts, one assigned to the CPU and the other to the GPU. These versions are not directly comparable to the -U versions of PAD and SC, which are in-place.
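As an illustration of why an out-of-place split needs no synchronization, here is a minimal C++ sketch of padding. It is not code from the Chai repository: `pad_rows`, `pad_split` and `split_row` are hypothetical names, and both shares run on the host purely for demonstration. Each share reads only the input and writes a disjoint set of output rows, so the two calls could run in either order or concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy rows [row_begin, row_end) of a rows x cols row-major matrix into an
// output matrix that has padded_cols columns per row. The extra columns of
// each output row are left untouched.
void pad_rows(const std::vector<int>& in, std::vector<int>& out,
              std::size_t cols, std::size_t padded_cols,
              std::size_t row_begin, std::size_t row_end) {
    assert(padded_cols >= cols);
    for (std::size_t r = row_begin; r < row_end; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[r * padded_cols + c] = in[r * cols + c];
}

// Hypothetical driver: split_row is the partition point between the two
// workers. Rows before it stand in for the CPU share, the rest for the GPU.
std::vector<int> pad_split(const std::vector<int>& in, std::size_t rows,
                           std::size_t cols, std::size_t padded_cols,
                           std::size_t split_row) {
    assert(split_row <= rows && in.size() == rows * cols);
    std::vector<int> out(rows * padded_cols, 0);
    pad_rows(in, out, cols, padded_cols, 0, split_row);     // "CPU" share
    pad_rows(in, out, cols, padded_cols, split_row, rows);  // "GPU" share
    return out;
}
```

The result does not depend on where `split_row` falls. In an in-place version, by contrast, the destination of one row overlaps the original storage of later rows, so workers would have to coordinate.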
TRNS uses multiple dependency cycles for concurrency (see Sung et al., Innovative Parallel Computing, 2012). The elements of a cycle are scattered across the whole matrix, which makes them hard to gather in the -D version. Thus, the input array cannot easily be split into two parts as in PAD and SC.
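To make the point about scattered cycles concrete, here is a minimal C++ sketch for plain element-wise in-place transposition of a row-major matrix. This is a simplification and not necessarily the exact scheme used by TRNS; `transposed_index` and `cycle_from` are hypothetical helper names. For a rows x cols matrix with n = rows * cols elements, the element at linear index i < n - 1 moves to (i * rows) mod (n - 1), and the last element stays in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Destination of linear index i when a rows x cols row-major matrix is
// transposed in place.
std::size_t transposed_index(std::size_t i, std::size_t rows, std::size_t cols) {
    const std::size_t n = rows * cols;
    assert(i < n);
    if (i == n - 1) return i;  // the last element never moves
    return (i * rows) % (n - 1);
}

// Follow the permutation from `start` until it returns to `start`,
// collecting every linear index visited along the way.
std::vector<std::size_t> cycle_from(std::size_t start, std::size_t rows,
                                    std::size_t cols) {
    std::vector<std::size_t> cycle;
    std::size_t i = start;
    do {
        cycle.push_back(i);
        i = transposed_index(i, rows, cols);
    } while (i != start);
    return cycle;
}
```

For a 3 x 5 matrix, `cycle_from(1, 3, 5)` visits indices 1, 3, 9, 13, 11, 5, which lie in all three rows. Any contiguous split of the array between two devices would cut this cycle, so the two halves could not be processed independently.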

Juan

trinayan · 2017-06-29T15:00:00Z

Hi,

Thanks a lot for this information. Now I understand clearly. Is it also possible to generate larger input sets for Bezier Surface, in a manner similar to the other benchmarks?

Best,
Trinayan

el1goluj · 2017-06-30T06:34:27Z

Yes, for BS you can use:
-m : input size in both dimensions (default=3)
-n : output resolution in both dimensions (default=300)

robers97 · 2018-05-27T13:33:27Z

As stated in the Chai paper, inter-worker synchronization between CPU and GPU workers is not possible without system-wide atomics. For this reason, in the paper we use GPU-only version for PAD, SC and TRNS when we compare OpenCL-D and OpenCL-U versions (Figure 2).

TRNS appears to be a good target for testing implementations of system-wide atomics, whether in software (https://docs.nvidia.com/cuda/pascal-tuning-guide/index.html) or in hardware (TBD). I see you have the CUDA_8_0 flag in the code, but I haven't gotten to the point where it compiles, though I'm running CUDA 9.2.

Can we revisit this together? I'm willing to contribute back.

el1goluj · 2018-05-27T13:47:51Z

Hi robers97,

Could you be more specific about your question?
The CUDA-U version uses system-wide atomics. It is tested with CUDA 8.0 (the first CUDA version with system-wide atomics), so it should work with CUDA 9.2. You will need a Pascal or Volta GPU.
Yes, TRNS is a good benchmark to test implementations of system-wide atomics. Actually all CUDA-U benchmarks use them, and might be useful for you.
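For readers unfamiliar with the idea, here is a host-only C++ analogy of one common pattern that atomics visible to all workers enable: claiming work items from a shared counter. It is a sketch under stated assumptions, not the CUDA-U implementation. `claim_next` and `drain` are hypothetical names, and `std::atomic` stands in for a counter that, in CUDA, a kernel would update with `atomicAdd_system` on memory visible to both CPU and GPU (compute capability 6.0 or higher).

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Claim the next unprocessed task id, or return -1 when none remain.
// fetch_add is atomic, so no two callers ever receive the same id.
int claim_next(std::atomic<int>& next_task, int num_tasks) {
    const int id = next_task.fetch_add(1, std::memory_order_relaxed);
    return id < num_tasks ? id : -1;
}

// A worker (hypothetically a CPU thread or a GPU thread block) claims at
// most max_claims tasks and records itself as the owner of each one.
// Returns the number of tasks it actually claimed.
int drain(std::atomic<int>& next_task, std::vector<char>& owner, char who,
          int max_claims) {
    int done = 0;
    while (done < max_claims) {
        const int id = claim_next(next_task, static_cast<int>(owner.size()));
        if (id < 0) break;  // all tasks have been handed out
        owner[id] = who;
        ++done;
    }
    return done;
}
```

Calling `drain` alternately for two workers simulates an interleaving: every task ends up with exactly one owner, without any up-front partitioning of the input.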

Thanks,
Juan
