Commit
added readme w/ discussion and makefiles
Showing 7 changed files with 171 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
all: fill_lanes.none.x-mic fill_lanes.none.x-host fill_lanes.simd.x-mic fill_lanes.simd.x-host fill_lanes.fill-temp.x-mic fill_lanes.fill-temp.x-host fill_lanes.fill-direct.x-mic fill_lanes.fill-direct.x-host fill_lanes.fill-intr.x-mic | ||
CC=icc -O3 -std=c99 -w2 -qopt-report=5 -wd10397 -wd10382 | ||
fill_lanes.none.x-mic: fill_lanes.c | ||
	$(CC) -o$@ -mmic fill_lanes.c | ||
fill_lanes.none.x-host: fill_lanes.c | ||
	$(CC) -o$@ -xHost fill_lanes.c | ||
fill_lanes.simd.x-mic: fill_lanes.c | ||
	$(CC) -o$@ -mmic fill_lanes.c -DSIMD -DSIMD_CORRECT | ||
fill_lanes.simd.x-host: fill_lanes.c | ||
	$(CC) -o$@ -xHost fill_lanes.c -DSIMD -DSIMD_CORRECT | ||
fill_lanes.fill-temp.x-mic: fill_lanes.c | ||
	$(CC) -o$@ -mmic fill_lanes.c -DSIMD -DSIMD_CORRECT \ | ||
	-DFILL -DFILL_TEMP | ||
fill_lanes.fill-temp.x-host: fill_lanes.c | ||
	$(CC) -o$@ -xHost fill_lanes.c -DSIMD -DSIMD_CORRECT \ | ||
	-DFILL -DFILL_TEMP | ||
#fill_lanes.fill-store.x-mic: fill_lanes.c | ||
#	$(CC) -o$@ -mmic fill_lanes.c -DSIMD -DSIMD_CORRECT \ | ||
#	-DFILL -DFILL_STORE | ||
#fill_lanes.fill-store.x-host: fill_lanes.c | ||
#	$(CC) -o$@ -xHost fill_lanes.c -DSIMD -DSIMD_CORRECT \ | ||
#	-DFILL -DFILL_STORE | ||
fill_lanes.fill-direct.x-mic: fill_lanes.c | ||
	$(CC) -o$@ -mmic fill_lanes.c -DSIMD -DSIMD_CORRECT \ | ||
	-DFILL -DFILL_DIRECT -DVECTOR_VARIANT | ||
fill_lanes.fill-direct.x-host: fill_lanes.c | ||
	$(CC) -o$@ -xHost fill_lanes.c -DSIMD -DSIMD_CORRECT \ | ||
	-DFILL -DFILL_DIRECT -DVECTOR_VARIANT | ||
fill_lanes.fill-intr.x-mic: fill_lanes.c | ||
	$(CC) -o$@ -mmic fill_lanes.c -DFILL_INTR_PHI |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
all: inner_loop_reduce.none.x-mic inner_loop_reduce.none.x-host inner_loop_reduce.simd.x-mic inner_loop_reduce.simd.x-host inner_loop_reduce.vector_variant.x-mic inner_loop_reduce.vector_variant.x-host | ||
CC=icc -O3 -std=c99 -w2 -qopt-report=5 -wd10397 -wd10382 | ||
inner_loop_reduce.none.x-mic: inner_loop_reduce.c | ||
	$(CC) -o$@ -mmic inner_loop_reduce.c | ||
inner_loop_reduce.none.x-host: inner_loop_reduce.c | ||
	$(CC) -o$@ -xHost inner_loop_reduce.c | ||
inner_loop_reduce.simd.x-mic: inner_loop_reduce.c | ||
	$(CC) -o$@ -mmic inner_loop_reduce.c -DSIMD -DSIMD_CORRECT | ||
inner_loop_reduce.simd.x-host: inner_loop_reduce.c | ||
	$(CC) -o$@ -xHost inner_loop_reduce.c -DSIMD -DSIMD_CORRECT | ||
inner_loop_reduce.vector_variant.x-mic: inner_loop_reduce.c | ||
	$(CC) -o$@ -mmic inner_loop_reduce.c -DSIMD -DSIMD_CORRECT -DVECTOR_VARIANT | ||
inner_loop_reduce.vector_variant.x-host: inner_loop_reduce.c | ||
	$(CC) -o$@ -xHost inner_loop_reduce.c -DSIMD -DSIMD_CORRECT -DVECTOR_VARIANT |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
Vectorization Example: Fill Lanes | ||
================================= | ||
|
||
Consider a program in which we have to vectorize a loop that contains another, inner loop. | ||
That inner loop iterates over different pieces of data and contains a "continue" statement halfway through. | ||
This "continue" statement guards the execution of a very expensive function: iterations that hit "continue" skip it. | ||
|
||
Vectorizing this loop conventionally leads to code that executes the expensive function frequently, and most of the time with incomplete masks, i.e. some lanes of the SIMD unit sit idle during the call. | ||
|
||
Instead, it is possible to execute the expensive function only once every lane of the vector register has work to do: | ||
We no longer execute the inner for loop in lock-step, but allow different lanes to proceed independently of each other. | ||
|
||
To implement this, we have to figure out whether any lane has executed "continue", and whether all lanes are ready to execute the expensive function. | ||
This means that we need reductions on masks among the lanes. | ||
The compiler used here (icc, see the Makefile) does not seem to support that. | ||
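
The two mask reductions mentioned above can be illustrated in plain scalar C. This is only a sketch of the semantics, not code from fill_lanes.c: the lane count of 16 and the names `make_mask`, `mask_any` and `mask_all` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16                          /* assumed vector width */
#define FULL_MASK ((1u << LANES) - 1u)    /* all LANES bits set */

/* Pack one boolean per lane into a bitmask (bit i corresponds to lane i). */
uint32_t make_mask(const bool flags[LANES])
{
    uint32_t mask = 0;
    for (int i = 0; i < LANES; i++) {
        if (flags[i]) {
            mask |= (uint32_t)1 << i;
        }
    }
    return mask;
}

/* "Any" reduction: true if at least one lane is set,
 * e.g. "has any lane executed continue?". */
bool mask_any(uint32_t mask)
{
    return (mask & FULL_MASK) != 0;
}

/* "All" reduction: true if every lane is set,
 * e.g. "are all lanes ready to call the expensive function?". */
bool mask_all(uint32_t mask)
{
    return (mask & FULL_MASK) == FULL_MASK;
}
```

In a vectorized build these two questions would have to be answered across the lanes of one vector iteration, which is exactly what the pragma-based code cannot express directly.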
|
||
We present a number of work-arounds (of varying effectiveness). | ||
One relies on non-vectorized functions and global variables (fill-temp), another on vector_variant functions (fill-direct), another on masked stores and global variables (fill-store, defunct and commented out in the Makefile), and the last one is explicitly vectorized code using intrinsics (fill-intr, Phi only). | ||
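
The lane-filling idea itself can be sketched in scalar C. This is a deliberately simplified illustration, not the code in fill_lanes.c: it flattens the nested loops into one stream of candidates, and the names `fill_and_run`, `keep_even` and `sum_batch` as well as the width of 16 are hypothetical.

```c
#include <stddef.h>

#define LANES 16   /* assumed SIMD width */

/* Hypothetical stand-in for the expensive function: handles one batch. */
typedef void (*batch_fn)(const int *items, size_t count, void *ctx);

/* Skip candidates that fail the cheap test (the "continue"), collect the
 * rest, and call the expensive function only when LANES items are ready.
 * Returns the number of calls to the expensive function. */
size_t fill_and_run(const int *candidates, size_t n,
                    int (*keep)(int), batch_fn expensive, void *ctx)
{
    int batch[LANES];
    size_t filled = 0;
    size_t calls = 0;

    for (size_t i = 0; i < n; i++) {
        if (!keep(candidates[i])) {
            continue;                     /* nothing expensive to do */
        }
        batch[filled++] = candidates[i];
        if (filled == LANES) {            /* all lanes have work */
            expensive(batch, filled, ctx);
            calls++;
            filled = 0;
        }
    }
    if (filled > 0) {                     /* remainder: one partial batch */
        expensive(batch, filled, ctx);
        calls++;
    }
    return calls;
}

/* Example cheap test and example "expensive" function, illustration only. */
int keep_even(int x) { return x % 2 == 0; }

void sum_batch(const int *items, size_t count, void *ctx)
{
    long *total = ctx;
    for (size_t i = 0; i < count; i++) {
        *total += items[i];
    }
}
```

With 40 candidates of which 20 pass the test, the expensive function runs twice (one full batch of 16 and one remainder of 4) instead of once per partially filled vector iteration.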
|
||
Measurements | ||
============ | ||
|
||
Note: none is the non-vectorized version, simd the trivially vectorized version, fill-* are attempts at filling the lanes. | ||
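Of the fill-* variants, only the intrinsics version (fill-intr, Phi only) achieves a speedup: it is roughly 4.5x faster than simd and roughly 10x faster than none, whereas fill-direct and fill-temp are slower than none on both the Phi and the host. | ||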
All versions are correct (i.e. the first correctness parameter is < 1e-5, the second correctness parameter approx -5.99130). | ||
|
||
none.x-mic: Time : 3.723747200000000e+08 | ||
none.x-mic: Correct: 3.066983936150791e-06 | ||
none.x-mic: Correct: -5.991305351257324e+00 | ||
simd.x-mic: Time : 1.661247360000000e+08 | ||
simd.x-mic: Correct: 3.066983936150791e-06 | ||
simd.x-mic: Correct: -5.991305351257324e+00 | ||
fill-direct.x-mic: Time : 4.607440000000000e+08 | ||
fill-direct.x-mic: Correct: 3.066983936150791e-06 | ||
fill-direct.x-mic: Correct: -5.991305351257324e+00 | ||
fill-temp.x-mic: Time : 6.023345280000000e+08 | ||
fill-temp.x-mic: Correct: 3.066983936150791e-06 | ||
fill-temp.x-mic: Correct: -5.991305351257324e+00 | ||
fill-intr.x-mic: Time : 3.659471200000000e+07 | ||
fill-intr.x-mic: Correct: 4.884550889983075e-06 | ||
fill-intr.x-mic: Correct: -5.991304397583008e+00 | ||
none.x-host: Time : 7.618968800000000e+07 | ||
none.x-host: Correct: 3.066983936150791e-06 | ||
none.x-host: Correct: -5.991305351257324e+00 | ||
simd.x-host: Time : 8.071620000000000e+07 | ||
simd.x-host: Correct: 3.066983936150791e-06 | ||
simd.x-host: Correct: -5.991305351257324e+00 | ||
fill-direct.x-host: Time : 1.553101760000000e+08 | ||
fill-direct.x-host: Correct: 3.066983936150791e-06 | ||
fill-direct.x-host: Correct: -5.991305351257324e+00 | ||
fill-temp.x-host: Time : 2.124863200000000e+08 | ||
fill-temp.x-host: Correct: 3.066983936150791e-06 | ||
fill-temp.x-host: Correct: -5.991305351257324e+00 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
Vectorization Example: Inner Loop Reduce | ||
======================================== | ||
|
||
Consider an example where we vectorize an outer loop that contains an inner loop. | ||
Within the inner loop, we need to accumulate a value into a memory location that depends on variables of the inner loop. | ||
The simd pragma can specify a reduction clause for arrays. | ||
However, this means that the array will be augmented with another dimension of size <vector length>, and the code below the pragma will just add into that. | ||
The reduction happens only at the end. | ||
This may actually be desirable behaviour if we often accumulate into relatively few distinct memory locations. | ||
It is, however, unsuited if we accumulate only a few times per memory location, if we are constrained in terms of memory usage, or if we need to allocate on the heap. | ||
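
What the array reduction clause amounts to can be emulated in scalar C. The following is a sketch of the behaviour described above, not compiler output: the lane count, the number of bins and the name `privatized_accumulate` are assumptions chosen for illustration.

```c
#include <stddef.h>

#define LANES 8   /* assumed vector length */
#define BINS  4   /* assumed number of distinct memory locations */

/* Each lane accumulates into its own private copy of the array; the
 * copies are combined into `out` only once, after the loop. */
void privatized_accumulate(const int *bin, const float *value, size_t n,
                           float out[BINS])
{
    /* The array gains an extra dimension of size <vector length>. */
    float priv[LANES][BINS] = {{0.0f}};

    for (size_t i = 0; i < n; i++) {
        size_t lane = i % LANES;          /* lane that handles iteration i */
        priv[lane][bin[i]] += value[i];
    }

    /* The actual reduction happens only at the end. */
    for (size_t lane = 0; lane < LANES; lane++) {
        for (size_t b = 0; b < BINS; b++) {
            out[b] += priv[lane][b];
        }
    }
}
```

The private copy costs LANES times the memory of the target array and the final combine touches every element of it, which is why this scheme pays off only when each location is accumulated into many times.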
|
||
The mitigation strategies herein revolve around the idea of "hiding" code from the compiler in function calls. | ||
These function calls have to be serialized by the compiler, and we get the desired behaviour. | ||
Note that it is crucial to annotate the function (memory_reduce_add) with __declspec(noinline). | ||
If the function is inlined, this does not work: the code will compute invalid results. | ||
As an additional step, we can make use of the vector_variant annotation and perform the reduction using compiler intrinsics. | ||
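
The call pattern can be sketched as follows. Only the name memory_reduce_add and the __declspec(noinline) annotation come from the text above; the signature, the `NOINLINE` macro with its GCC/Clang fallback, and the `accumulate` driver are assumptions, and the loop is shown without the simd pragma.

```c
#include <stddef.h>

/* __declspec(noinline) is the spelling used with icc; the attribute
 * form is an assumed fallback so the sketch compiles with GCC/Clang. */
#if defined(__INTEL_COMPILER) || defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

/* Assumed signature: add `value` into target[index]. Because the call is
 * not inlined, a vectorizing compiler has to issue it once per lane. */
NOINLINE void memory_reduce_add(float *target, int index, float value)
{
    target[index] += value;
}

/* Hypothetical driver: in the SIMD build the outer loop would carry the
 * simd pragma; here it is plain scalar C. */
void accumulate(float *target, const int *index, const float *value,
                size_t n_outer, size_t n_inner)
{
    for (size_t i = 0; i < n_outer; i++) {
        for (size_t j = 0; j < n_inner; j++) {
            size_t k = i * n_inner + j;
            memory_reduce_add(target, index[k], value[k]);
        }
    }
}
```

Serializing the call keeps two lanes that target the same index from overwriting each other's update, at the cost of one function call per lane.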
|
||
Measurement | ||
=========== | ||
|
||
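Note: none has no pragma-based vectorization, simd is explicitly vectorized, and vector_variant additionally uses vector_variant declarations. | ||
Performance improves with each step on both the Phi and the host: simd is about 2.8x faster than none on both, and vector_variant is about 9.7x (Phi) and 4.6x (host) faster than none. | ||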
|
||
none.x-mic: Time : 3.473042000000000e+06 | ||
none.x-mic: Correct: 4.835128784179688e-04 | ||
simd.x-mic: Time : 1.219468000000000e+06 | ||
simd.x-mic: Correct: 1.049041748046875e-04 | ||
vector_variant.x-mic: Time : 3.594990000000000e+05 | ||
vector_variant.x-mic: Correct: 3.814697265625000e-05 | ||
none.x-host: Time : 1.234876000000000e+06 | ||
none.x-host: Correct: 4.835128784179688e-04 | ||
simd.x-host: Time : 4.378800000000000e+05 | ||
simd.x-host: Correct: 3.433227539062500e-05 | ||
vector_variant.x-host: Time : 2.689560000000000e+05 | ||
vector_variant.x-host: Correct: -6.866455078125000e-05 | ||
|