Skip to content

Conversation

@rem1776
Copy link
Contributor

@rem1776 rem1776 commented Sep 22, 2025

Description
draft for the io offloading changes

How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also note
any relevant details for your test configuration (e.g. compiler, OS). Include
enough information so someone can reproduce your tests.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included
  • make distcheck passes

@rem1776 rem1776 marked this pull request as ready for review December 1, 2025 17:52
Copy link
Contributor

@vithikashah001 vithikashah001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs documentation for variables, subroutines and functions.

@J-Lentz J-Lentz self-requested a review January 5, 2026 18:50
Copy link
Contributor

@uramirez8707 uramirez8707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes, but I think this is a good start

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs the apache license

test_fms/offloading/test_io_offloading.F90 and test_fms/offloading/test_metadata_transfer.F90 needs it too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The license for this file should be apache

test_fms/offloading/test_io_offloading.sh needs it too

integer, parameter :: non_domain_decomposed = 1 !< enumeration for type of domain
integer, parameter :: Unstructured_grid = 2 !< enumeration for type of domain

integer, parameter :: max_files = 10 !< amount of offloaded files to allocate space for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO Make this a namelist, runtime constant, etc

!! Registers an axis to a netcdf file on offloaded PEs. File must have been opened with open_file_offload.
interface register_axis_offload
procedure :: register_netcdf_axis_offload
procedure :: register_domain_axis_offload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO we need a register_unstructured_axis_offload for the unstructured grid


!< The number of tiles must be the same as the number of offloading pes
!if (size(pe_out) .ne. ntile ) &
! call mpp_error(FATAL, "The number of offloading PEs must be the same as the number of tiles of the domain")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added back, and it should crash if the ntiles is not a multiple of the ntiles

Comment on lines +121 to +123
call mpp_broadcast(filename_out, str_len, pe_in(1))
call mpp_broadcast(global_domain_size, size(global_domain_size), pe_in(1))
call mpp_broadcast(ntile, pe_in(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO maybe we can find a way to do this all in one mpp_brodcast call (or mpp_send/mpp_receive)

integer, allocatable :: all_current_pes(:)
integer, allocatable :: broadcasting_pes(:)
logical :: is_model_pe
character(len=255) :: var_info(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to use the max str length variable

Comment on lines +347 to +352
if (.not. is_model_pe) then
select type(wut=>this%fileobj)
type is(FmsNetcdfDomainFile_t)
call register_axis(wut, trim(var_info(1)), trim(var_info(2)))
end select
endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clean this up

type(domain2D) :: domain_out
type(domain2D) :: domain_in
integer :: isc, iec, jsc, jec, nz, redistribute_clock
character(len=255) :: varname_tmp(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use max str length

!> Structure to hold offloading file information
type :: offloading_obj_out
integer :: id
character(len=:), allocatable :: filename !< filename of the offloaded netcdf file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to store the filename here, as opposed to getting it from fileobj%path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No looks like it can be removed, I'll add that change.

character(len=:), allocatable :: filename !< filename of the offloaded netcdf file
class(FmsNetcdfFile_t), allocatable :: fileobj !< fms2_io file object
type(domain2D) :: domain_out !< domain on offloading PEs
integer :: type_of_domain !< type of domain (domain_decomposed, non_domain_decomposed, Unstructured_grid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem that type_of_domain is used. Is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to remove this as well.

integer :: type_of_domain !< type of domain (domain_decomposed, non_domain_decomposed, Unstructured_grid)
end type

!> Offload equivalent of open_file in fms2_io_mod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems like it belongs above open_file_offload.

type_of_domain = domain_decomposed
type is (FmsNetcdfUnstructuredDomainFile_t)
type_of_domain = Unstructured_grid
end select
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_of_domain is assigned, but never used.

endif

if (.not. is_model_pe) then
select type(wut=>this%fileobj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since FmsNetcdfDomainFile_t has no types derived from it, perhaps change fileobj from class(FmsNetcdfDomainFile_t) to type(FmsNetcdfDomainFile_t), and remove the select type.

endif

if (.not. is_model_pe) then
select type(wut=>this%fileobj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment, regarding class(FmsNetcdfDomainFile_t) to type(FmsNetcdfDomainFile_t).

select type(wut=>this%fileobj)
type is(FmsNetcdfDomainFile_t)
call register_field(wut, trim(var_info(1)), trim(var_info(2)), var_info(3:))
end select
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to handle the case where fileobj is FmsNetcdfFile_t as opposed to FmsNetcdfDomainFile_t - is this case supported?

!! This routine should be called from both the model PEs and the offload PEs, with the full list
!! of pes for each group being provided. The model PEs will broadcast the filename and domain.
subroutine open_file_offload(fileobj, filename, domain_in, pe_in, pe_out)
class(FmsNetcdfFile_t), intent(inout) :: fileobj !< fms2_io file object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other subroutines (e.g., write_data_offload) only seem to support FmsNetcdfDomainFile_t and not the FmsNetcdfFile_t base type. If non-domain-decomposed files are unsupported, type(FmsNetcdfDomainFile_t) should probably be used consistently for fileobj arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants