-
Notifications
You must be signed in to change notification settings - Fork 47
Limited-area transforms on GPU #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…identify meridional transform through negative kfields.
…essary for graphs).
|
This limited-area contribution required two small modifications in the global transforms:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this PARKIND1 to EC_PARKIND be automatically applied to a generated files by the sed transformation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of duplicated code in the cmake files between gpu, cpu, lam and global. We should at some point try to reduce this duplication to the bare minimum
|
Thanks for this @ddegrauwe. Some quick suggestions before I start a proper review:
|
Co-authored-by: Sam Hatfield <samuel.hatfield@ecmwf.int>
|
Hi @samhatfield Thanks for the OpenMP fix!
Sorry about those. Should be fine now.
Indeed, MAXVAL(G%NMEN) is not the same as NSMAX. But for a global model, G%NMEN is always smaller than NSMAX, so looping until NSMAX is fine . There's a condition For a LAM domain, however, we can have a rectangular domain which is larger in the zonal direction than in the meridional direction. In that case, we have NSMAX<G%NMEN. Looping only until NSMAX will give wrong results because some waves are omitted. |
No problem - looks like you spotted some preexisting tabs as well. They are very hard to spot...
Makes sense. Given the DO KGL=1,D_NDGL_FS
DO JM=0,G_NMEN(OFFSET_VAR+KGL-1)
DO JF=1,KF_FS
IOFF_LAT = KF_FS*D_NSTAGTF(KGL)+(JF-1)*(D_NSTAGTF(KGL+1)-D_NSTAGTF(KGL))
SCAL = 1._JPRBT/REAL(G_NLOEN(OFFSET_VAR+KGL-1),JPRBT)
ISTA = 2_JPIB*D_NPNTGTB0(JM,KGL)*KF_FS
FOUBUF_IN(ISTA+2*JF-1) = SCAL * PREEL_COMPLEX(IOFF_LAT+2*JM+1)
FOUBUF_IN(ISTA+2*JF ) = SCAL * PREEL_COMPLEX(IOFF_LAT+2*JM+2)
ENDDO
ENDDO
ENDDO? Then we don't need |
In that case it would no longer be possible to collapse the loops for OpenACC or OpenMP. |
Ah yes, good point. Then we should leave it. |
|
Many thanks for your suggestions so far @samhatfield; I included them in a new commit. |
Co-authored-by: Sam Hatfield <samuel.hatfield@ecmwf.int>
|
@samhatfield are you working on fixing the adjoint test to have a fixed seed? |
I thought I had already, but apparently I didn't include the direct transform test. Maybe that wasn't in develop at the time. Let me open another PR. |
|
Here you go: #284. |
|
What is the status here? |
|
Hi @samhatfield , @wdeconinck , I just updated this pull request to the current develop branch. As far as I can tell, all checks pass and no merge conflicts remain. |
Hey Daan, nothing major to block the PR if I remember right (I guess I would have said so 6 months ago if there was). I'd like to have another look again before approving. However I'll only look at the "top level" files, like the CMake files etc., and just confirm nothing you're adding here interferes with global transform functionality or anything we've introduced in the last 6 months. As for everything in src/etrans/gpu, if it's okay with you, I'll not look closely at the changes there and trust you that everything is working. |
| set(transi_PRIVATE_LIBRARIES trans_dp) | ||
| if( HAVE_ETRANS ) | ||
| list( APPEND transi_PRIVATE_LIBRARIES | ||
| etrans_dp | ||
| ) | ||
| endif() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this code actually do something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems unused, and probably a leftover of the merge.
Instead etrans_dp has been added directly in the list of PRIVATE_LIBS further via a generator expression
Checking this shows me that the GPU version transi_gpu_dp does not yet link with etrans_gpu_dp
If not straightforward, I would leave this for a follow-up PR.
samhatfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved subject to resolution of question above.
Adding capability of performing limited-area spectral transforms on GPU, based on work originally done by Meteo-France and NVIDIA.