forked from cms-sw/cmssw
Speed up in clusterizer and doubletFinder #238
Closed
VinInn wants to merge 119 commits into cms-patatrack:CMSSW_10_4_X_Patatrack from VinInn:GPUFastTracksNNClus
119 commits
d15d0dd use gpu vertices (VinInn)
fc8ffad add vertex spitting (VinInn)
9e3a3aa fix iterations (VinInn)
c9418f6 apply outlier rejection, tune error (VinInn)
5e9d7cf fix duplicate cleaning (VinInn)
2e7910f sort and clean (VinInn)
2d155f1 fishbone works (VinInn)
aad5235 fishbone works (VinInn)
3244703 fishbone works (VinInn)
d6508a8 add layerid (VinInn)
e2fd6d2 copy layer on gpu (VinInn)
c68f413 efficient (VinInn)
9dc2184 optimize parallelization (VinInn)
b3ed9d0 update notebook to include fishbone (VinInn)
cc973f6 silence it (VinInn)
06365df mark magic 2 (VinInn)
f2439af remove magic 256, reduce it to 128 (VinInn)
ae7fc3f reduce size (VinInn)
9030763 remove duplicate code lines (VinInn)
d79faf5 narrow cut to avoid inefficiency for realistic (VinInn)
376a0d4 Merged gpuVertexRedux from repository VinInn with cms-merge-topic (VinInn)
ad06e33 build pentuplets (VinInn)
7bea72e simplify (VinInn)
932fec9 align to offline (VinInn)
d0f3adf simplify histogrammer: no need of ws in fill (VinInn)
6a192fd test cuda_assert (VinInn)
fce72dc use more stable and gpu friendly version of circle (VinInn)
f7dbc25 assoc tested (VinInn)
483d591 check cosdir (VinInn)
384465e clean clode (VinInn)
ada49bd try to use template errors (VinInn)
6acfd9f retune but still use old params (VinInn)
45f65b9 add AtomicPairCounter and implement manyToOne (VinInn)
7ac1cf3 tuning cuts (VinInn)
2d8e41b few steps toward persistent gputracks, crashes (VinInn)
0adee78 Q productions works (VinInn)
0198e52 forward hits (VinInn)
fd8f49c compiles (VinInn)
fd49d01 runs (VinInn)
aef70eb use less memory (VinInn)
545a326 use even less memory (VinInn)
511aa99 add quality flag (VinInn)
de2333d factorize (VinInn)
2a2ab2b factorize (VinInn)
e028fd1 reading correcly tuples (VinInn)
3064dd9 read hits (VinInn)
e64286e Add B-hadron MTV variation to pixel track validation sequence (makortel)
5695203 fix errors on gpu (VinInn)
bca3120 tip/zip ok (VinInn)
9367675 use new version of Rinman fit (VinInn)
76cfce2 fix error^2 (VinInn)
433d1cd fix pixel errors (VinInn)
3bd5c2e use error from templates (VinInn)
6899cd0 dup remover written (VinInn)
ae195a2 filter duplicates (VinInn)
e135f21 mae sure algo is stable (VinInn)
a605664 Merged mtvBhadronPixel from repository makortel with cms-merge-topic (VinInn)
5763c56 fix for absent lape (VinInn)
7ea1bb1 drop quads if sharing cell with pents (VinInn)
dd6ad51 add region cuts (VinInn)
34989f3 merged, refactorize (VinInn)
8cc7562 merged, refactorize (VinInn)
5cafde2 Merged gpuTracks104 from repository VinInn with cms-merge-topic (VinInn)
5ed6161 back to previous status (VinInn)
572613b prepare vertex finder to read from gpu (VinInn)
2797693 produce vertices (VinInn)
aa0e4ad maka vertices on gpu only: not scheduled.. (VinInn)
989019e make profiling working (VinInn)
db66a14 minor cleanup (VinInn)
79fd0ae silenced (VinInn)
68f9162 solve conflict (VinInn)
ddad076 resize to avoid overflows (VinInn)
a6e3e7d protect and report cell overflow as well (VinInn)
0935c5e more cleanup (VinInn)
e695b29 remove all cpu stuff from CA on gpu (VinInn)
b2445f2 fix gpu only wf (VinInn)
fb73c7a Address code style and quality issues (#203) (fwyzard)
6110cf4 Fix MTV validation of initialStepPreSplitting tracks and add B-hadron… (makortel)
f11b911 Merge branch 'CMSSW_10_4_X_Patatrack' into gpuTracks104 (VinInn)
77bd114 Fix Free issues (VinInn)
ef869f5 Merge branch 'CMSSW_10_4_X_Patatrack' into gpuTracks104 (VinInn)
94b521e Remove stray empty lines for consistency with upstream (fwyzard)
54f759e Merged gpuTracks104 from repository VinInn with cms-merge-topic (VinInn)
76a4ae9 add test, adress first set of comments (VinInn)
ce143ca more comments addressed (VinInn)
13b7277 silenced (VinInn)
c455a96 now works (VinInn)
6d9379f test of fit on gpu works (VinInn)
d3cc0b4 Merge branch 'CMSSW_10_4_X_Patatrack' of https://github.com/cms-patat… (VinInn)
522cfdf late fishbone (VinInn)
c8623a5 make fishbone configurable (VinInn)
807d794 Merge branch 'CMSSW_10_4_X_Patatrack' into gpuTracks104 (VinInn)
1d517e8 Merged gpuTracks104 from repository VinInn with cms-merge-topic (VinInn)
909cfb0 fix long standing bug (minor effect) (VinInn)
b8b4299 compiles and run (VinInn)
0c55b56 lit fit is wrong (VinInn)
811cb63 fix stupid bug (VinInn)
47d28a7 remove default arg (VinInn)
ff31eaf debug occupancy (VinInn)
5ee726d faster clustering (VinInn)
31584e9 apply to vertex as well (VinInn)
002a576 clean assert (VinInn)
49df121 fix missing Free (VinInn)
2078b24 Merged gpuTracksFastRFit from repository VinInn with cms-merge-topic (VinInn)
5aeaba2 Fix setting the data pointer of error SimpleVector (makortel)
5be8fe9 Merged fixPixelErrors from repository makortel with cms-merge-topic (VinInn)
c46e716 silenced initchk (VinInn)
fa4a912 Merged GPUFastTracksOptFix from repository VinInn with cms-merge-topic (VinInn)
8af902a add NN to clustering (VinInn)
705d218 Merged GPUFastTracksNNClus from repository VinInn with cms-merge-topic (VinInn)
c190bd2 revert to use topology (VinInn)
048af11 parallelize inner loop (VinInn)
3e1a505 Merged GPUFastTracksNNClus from repository VinInn with cms-merge-topic (VinInn)
940fe95 use 2D grid instead of hand-made stride (VinInn)
8f5d217 use 2D grid instead of hand-made stride (VinInn)
1c2f268 widen cell_connect as well (VinInn)
7067416 Full workflow from raw data to pixel tracks and vertices on GPUs (#216) (VinInn)
e15a883 Merge branch 'CMSSW_10_4_X_Patatrack' into GPUFastTracksNNClus (fwyzard)
8fff3ec Merge branch 'CMSSW_10_4_X_Patatrack' into GPUFastTracksNNClus (fwyzard)
Here and later, and in the kernel code: do we expect any differences using [...] or [...], assuming the `.x` and `.y` are swapped accordingly inside the kernels?
In fact, do we expect any performance difference using [...] or [...]?
Thanks for splitting the PR.
Answer to first question:
According to the CUDA documentation and examples, "x" runs faster than "y", so swapping "x" with "y" will NOT achieve the desired result of having the inner loop run in contiguous CUDA threads.
The current implementation is intended to be equivalent to the hand-made one in terms of thread assignment.
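The thread-assignment argument above can be checked without a GPU. The following is a minimal host-side C++ sketch (C++14 or later), not code from this PR: it emulates the rule that CUDA numbers the threads of a 2D block as `x + y * blockDim.x`, and compares a hand-made split of a 1D index with a 2D block. The names `Dim`, `Work`, `handMade`, `grid2D` and `grid2DSwapped` are hypothetical, and the hand-made split (`tid / stride`, `tid % stride`) is an assumed form of the stride scheme.

```cpp
// Host-side emulation of CUDA thread numbering inside one 2D block.
// All names are illustrative assumptions, not taken from the PR.

struct Dim { unsigned x, y; };

// Work taken by one thread: an outer-loop item and a starting offset in the inner loop.
struct Work { unsigned outer, inner; };

constexpr bool same(Work a, Work b) { return a.outer == b.outer && a.inner == b.inner; }

// CUDA linearizes a 2D block as x + y * blockDim.x, so x varies fastest
// and threads with consecutive linear ids end up in the same warp.
constexpr unsigned linearThreadId(Dim threadIdx, Dim blockDim) {
  return threadIdx.x + threadIdx.y * blockDim.x;
}

// Hand-made stride (assumed form): a 1D thread id split by hand.
constexpr Work handMade(unsigned tid, unsigned stride) { return {tid / stride, tid % stride}; }

// 2D block with x on the inner loop and y on the outer loop.
constexpr Work grid2D(Dim threadIdx) { return {threadIdx.y, threadIdx.x}; }

// Same 2D block with the roles of x and y swapped.
constexpr Work grid2DSwapped(Dim threadIdx) { return {threadIdx.x, threadIdx.y}; }

// True if every thread of the block gets the same work in both schemes.
constexpr bool sameAssignment(Dim blockDim) {
  for (unsigned y = 0; y < blockDim.y; ++y) {
    for (unsigned x = 0; x < blockDim.x; ++x) {
      Dim t{x, y};
      if (!same(handMade(linearThreadId(t, blockDim), blockDim.x), grid2D(t))) return false;
    }
  }
  return true;
}

// With x on the inner loop, the 2D block reproduces the hand-made assignment.
static_assert(sameAssignment(Dim{4, 16}), "2D block (x inner) matches hand-made stride");

// Thread (1,0) is the warp neighbour of thread (0,0).
static_assert(linearThreadId({1, 0}, {4, 16}) == 1, "x is the fastest index");
// x on the inner loop: the neighbour takes the next inner offset of the same outer item.
static_assert(same(grid2D({1, 0}), Work{0, 1}), "contiguous threads share the inner loop");
// x and y swapped: the neighbour moves to the next outer item instead.
static_assert(same(grid2DSwapped({1, 0}), Work{1, 0}), "swapping breaks contiguity");
```

The checks run at compile time, so compiling the file is enough to confirm them. This only covers the thread-to-work mapping; it says nothing about any difference in launch or indexing overhead between the two schemes.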
Answer to second question:
IN PRINCIPLE the two approaches should be fully equivalent: the use of a 2D grid is clearly more CUDA-style, and does not require propagating the stride through the code.
I should have coded it using the 2D grid from the start.
IN PRACTICE: I cannot exclude a different overhead between the two implementations.
I have simple unit tests/examples:
https://github.com/VinInn/ctest/blob/master/cuda/combiHM.cu
https://github.com/VinInn/ctest/blob/master/cuda/combiXY.cu
The hand-made version seems a bit faster.
My opinion is that the 2D grid is the way to code it in CUDA: it is surely easier to understand and maintain. (It is like using 1D arrays in C and computing the offset by hand instead of using a 2D array...)
We could investigate with cuda/nvcc experts: not sure we would get anywhere.
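The C analogy above (1D array with a hand-computed offset versus a 2D array) can be sketched as follows. This is only an illustration in plain C++; the names `getFlat`, `get2D`, `flat` and `grid` are made up for the example. Both accessors read the same element, but the flat version has to be handed the row length, much as the hand-made kernel has to be handed the stride.

```cpp
#include <cstddef>

constexpr std::size_t kRows = 3;
constexpr std::size_t kCols = 4;

// Hand-made: flat storage, offset computed by hand; the row length must be passed in.
constexpr int getFlat(const int* data, std::size_t cols, std::size_t row, std::size_t col) {
  return data[row * cols + col];
}

// 2D array: the compiler computes the same offset from the array type.
constexpr int get2D(const int (&data)[kRows][kCols], std::size_t row, std::size_t col) {
  return data[row][col];
}

// The same twelve values stored both ways.
constexpr int flat[kRows * kCols] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
constexpr int grid[kRows][kCols] = {{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}};

// Row 2, column 1 is the element at flat offset 2 * 4 + 1 = 9 in both layouts.
static_assert(getFlat(flat, kCols, 2, 1) == 9, "hand-computed offset");
static_assert(get2D(grid, 2, 1) == 9, "compiler-computed offset");
```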