Skip to content

Conversation

@dhaumont
Copy link
Contributor

@dhaumont dhaumont commented Sep 26, 2025

Summary

This PR introduces a field API interface to ectrans. It is based on top of dump-checksums (PR #287)

Description of changes

  • The new field API interface is implemented in src/field_api:

    • src/field_api/trans/inv_trans_field_api.F90: inverse transform (calling inv_trans internally)
    • src/field_api/trans/dir_trans_field_api.F90: direct transform (calling dir_trans internally)
    • src/field_api/field_api_ectrans_mod.F90: helper functions
  • Optional compilation triggered by a new USE_FIELD_API configuration option. Adds a new dependency to field API.

  • Single and double precision for CPU and GPU

  • Accessible in ectrans-benchmark via a new option --field-api

    • src/programs/util/ectrans_field_api_helper.F90 contains the field API implementation
  • In the GPU version of ectrans-benchmark , setting llacc = .true. will force the intermediate fields on GPU. This requires additional memory copies host <-> device before and after the regular ectrans calls.

  • New unit tests have been added to test --field-api (CPU and GPU)

  • compare_checksums.py updated to compare the checksums with field-api with the regular version

  • nvhpc22.11 on CPU: bit reproducibility between the field API and the regular version has been partially adressed by clamping small values in spectral space after dir_trans in ectrans benchmark (2a7aa87). This only applies to double precision (single precision is only bit reproducible during the first time step and slightly different after)

Testing and validation

  • ATOS:
    • Intel 2021.4.0 - CPU: bit reproducible
    • nvhpc 22.11 - CPU: dp : bit reproducible, sp: not reproducible
    • nvhpc 22.11 on GPU: bit reproducible
  • Ubuntu laptop
    *Gfortran 13.3.0 - CPU : bit reproducible

Known limitations

  • no LAM version
  • no openMP version ( OpenACC only)
  • The GPU version with llacc = .true. requires memory transfers between CPU and GPU, that should be avoided
  • for nvhpc 22.11, the CPU version in sp is not bit reproducible ( first time step is bit reproducible, tiny differences after first iteration)

Initial pull request

dhaumont#1
Contains some comments from @samhatfield that needs to be integrated

Prototype in ARPEGE

See for instance this prototype of transinv_mdl_field_api.F90: https://github.com/dhaumont/IAL/blob/50t1_field_api_ectrans/arpifs/transform/transinv_mdl_field_api.F90
This version can be compiled using this bundle:
50t1_ectrans_field_api_interface

Comment on lines 117 to 118
INTEGER(KIND=JPIM) :: IFLDXGUV
INTEGER(KIND=JPIM) :: IFLDXLUV
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
INTEGER(KIND=JPIM) :: IFLDXGUV
INTEGER(KIND=JPIM) :: IFLDXLUV

These aren't used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in ecbf2d2

Comment on lines 143 to 144
IFLDXGUV= 0
IFLDXLUV= 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. they will go away

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in ecbf2d2

! ------------------------------------------------------------------
IF (LHOOK) CALL DR_HOOK('INV_TRANS_FIELD_API',0,ZHOOK_HANDLE)

ISPUV = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISPUV also doesn't seem to be used - can you check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. I am removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 7cc7912

! Spectral field view
REAL(KIND=JPRB),POINTER :: P(:)
INTEGER :: IVSET
CHARACTER(LEN=12) :: NAME
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a case where NAME is used. Could you check this is still required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names are really useful for debugging, I would prefer to keep them

! Grid point field view
REAL(KIND=JPRB),POINTER :: P(:,:)
INTEGER :: IVSET
CHARACTER(LEN=12) :: NAME
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@wdeconinck
Copy link
Collaborator

I took the liberty to commit directly in your branch @dhaumont, I hope you don't mind.
I fixed the CI by disabling the tests that require field_api when the feature FIELD_API is not enabled.

@wdeconinck
Copy link
Collaborator

It is not possible that a downstream package would be linking to multiple versions of ectrans_field_api (i.e. ectrans_field_api_cpu_sp, ectrans_field_api_cpu_dp, ectrans_field_api_gpu_sp, ectrans_field_api_gpu_dp).
This is because you would have each ectrans_field_api_cpu_dp and ectrans_field_api_cpu_sp etc, each using FIELD_MODULE from field_api but this module and its types/symbols are not uniquely defined (once for field_api_sp and once for field_dp).
This is not a problem because its use is intended for IAL/IFS, which would link to only one of these versions.

However this means that we don't need to go through the same procedure as done with trans to define unique symbols, as the uniqueness is not important. For trans this was important because we want to be able to make use of trans through downstream libraries (like Atlas) that may make a runtime choice on precision and cpu/gpu execution.

If you don't mind I will refactor this a bit, reducing the complexity also with the includes which would not be required any longer.

@dhaumont
Copy link
Contributor Author

dhaumont commented Jan 8, 2026

I took the liberty to commit directly in your branch @dhaumont, I hope you don't mind. I fixed the CI by disabling the tests that require field_api when the feature FIELD_API is not enabled.

So the field api interface is not compiled nor executed in the CI ????
In my opinion, what is not tested is just dead code.

@dhaumont
Copy link
Contributor Author

dhaumont commented Jan 8, 2026

It is not possible that a downstream package would be linking to multiple versions of ectrans_field_api (i.e. ectrans_field_api_cpu_sp, ectrans_field_api_cpu_dp, ectrans_field_api_gpu_sp, ectrans_field_api_gpu_dp). This is because you would have each ectrans_field_api_cpu_dp and ectrans_field_api_cpu_sp etc, each using FIELD_MODULE from field_api but this module and its types/symbols are not uniquely defined (once for field_api_sp and once for field_dp). This is not a problem because its use is intended for IAL/IFS, which would link to only one of these versions.

However this means that we don't need to go through the same procedure as done with trans to define unique symbols, as the uniqueness is not important. For trans this was important because we want to be able to make use of trans through downstream libraries (like Atlas) that may make a runtime choice on precision and cpu/gpu execution.

If you don't mind I will refactor this a bit, reducing the complexity also with the includes which would not be required any longer.

One of the main requirement of this interface is the ability to dynamically choose betweeen CPU and GPU execution: you want to be able to run the spectral transform on CPU or on GPU with LDACC.
Is this compatible with what you suggest ?

I'm not sure what problem you are trying to solve exactly, is this the fact that you have to include an interface file ?

@samhatfield
Copy link
Collaborator

So the field api interface is not compiled nor executed in the CI ???? In my opinion, what is not tested is just dead code.

It's tested in the "HPC" suite (ECMWF HPC2020 and LUMI) but not the Github suite. So what we need is a commit like 88af96b but applied to .github/workflows/build.yml.

@wdeconinck
Copy link
Collaborator

So the field api interface is not compiled nor executed in the CI ???? In my opinion, what is not tested is just dead code.

The FIELD_API feature is optional, meaning that when FIELD_API is not enabled those particular tests should not be running, we can all agree on that. That is what I fixed.
Next, I definitely agree that the CI should be running also with FIELD_API feature enabled.

I forgot to mention that for the build-hpc workflow I also added the missing field_api dependency in commit ce5ae51 . It was not sufficient to set "-DENABLE_FIELD_API=ON", so hopefully the CI is going to test this at least on the HPCs now (lumi and ecmwf-atos).
Looking at the logs however it seems to ignore changes to the Github-Actions workflows for pull-requests with "secrets". I guess that is a "feature" of Github-Actions for security reasons. We would need to first change the workflow in the develop branch then to make sure this PR is testing FIELD_API changes.

@dhaumont
Copy link
Contributor Author

dhaumont commented Jan 8, 2026

So the field api interface is not compiled nor executed in the CI ???? In my opinion, what is not tested is just dead code.

The FIELD_API feature is optional, meaning that when FIELD_API is not enabled those particular tests should not be running, we can all agree on that. That is what I fixed. Next, I definitely agree that the CI should be running also with FIELD_API feature enabled.

I forgot to mention that for the build-hpc workflow I also added the missing field_api dependency in commit ce5ae51 . It was not sufficient to set "-DENABLE_FIELD_API=ON", so hopefully the CI is going to test this at least on the HPCs now (lumi and ecmwf-atos). Looking at the logs however it seems to ignore changes to the Github-Actions workflows for pull-requests with "secrets". I guess that is a "feature" of Github-Actions for security reasons. We would need to first change the workflow in the develop branch then to make sure this PR is testing FIELD_API changes.

OK, thanks a lot Willem.

@wdeconinck
Copy link
Collaborator

One of the main requirement of this interface is the ability to dynamically choose betweeen CPU and GPU execution: you want to be able to run the spectral transform on CPU or on GPU with LDACC. Is this compatible with what you suggest ?

I'm not sure what problem you are trying to solve exactly, is this the fact that you have to include an interface file ?

I think the simplicification will still allow us to develop later on the ability to choose between CPU and GPU backends, but not between single and double precision, due to linking to two field_api libraries that are incompatible with each other, i.e. the meanings of FIELD_1RB, FIELD_2RB etc.

What will be possible in the future is to have two libraries : ectrans_field_api_${prec} (${prec}=(sp,dp)) which will link with both ectrans_cpu_${prec} and ectrans_gpu_${prec}.
Based on the LDACC flag you would then need to call invtrans_cpu_${prec} or invtrans_gpu_${prec} within invtrans_field_api(KRESOL=NRESOL, ..., LDACC=...)

It is however not yet clear to me if NRESOL is in this case the same value for LDACC=.TRUE. or LDACC=.FALSE.
We are not there yet. Perhaps the GPU or CPU backend will be embedded in the KRESOL, so that we don't need the LDACC flag:

call invtrans_field_api(KRESOL=NRESOL_GPU, ...)
call invtrans_field_api(KRESOL=NRESOL_CPU, ...)

and then introspection on the KRESOL parameter will need to be used instead of the LDACC flag.

What is clear to me however is that you also did not intend to create these symbols: invtrans_field_api_(cpu,gpu)_(sp,dp).

@wdeconinck
Copy link
Collaborator

See dhaumont#4 for the proposed changes to remove the unique symbol handling with the sedrenames etc.

Remove unique symbols for ectrans_field_api
@dhaumont
Copy link
Contributor Author

dhaumont commented Jan 9, 2026

@samhatfield @wdeconinck After our discussion about different passes in LS and LG that were not needed anymore, I simplified the code by making the rguments optional. I also removed the passes.

c7af7ea

IMPORTANT: I just found a crash in the unit test due to this commit. I'm inverstigating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants