Skip to content

Conversation

@samhatfield
Copy link
Collaborator

In the CPU code path, we don't have "AD" versions for TRLTOG, TRLTOM etc. Instead we simply use the inverse routine (e.g. G->L instead of L->G) with the input / output arguments swapped. In the GPU code path though we have additional subroutines TRLTOMAD and TRMTOLAD. This PR is for investigating whether these routines can be eliminated. If this succeeds, then we can close PR #340 unmerged.

@samhatfield samhatfield added enhancement New feature or request tidying Tidying up code or removing unused features gpu approved-for-ci labels Dec 4, 2025
@samhatfield samhatfield changed the title Remove redundant adjoint GPU communication routines [DRAFT] Remove redundant adjoint GPU communication routines Dec 4, 2025
@samhatfield samhatfield marked this pull request as draft December 4, 2025 11:17
@samhatfield samhatfield force-pushed the remove_redundant_ad_gpu_comm_routines branch from ee9a97f to 8e42ca4 Compare December 4, 2025 11:42
@samhatfield samhatfield force-pushed the remove_redundant_ad_gpu_comm_routines branch from b68ad6b to 8a58a58 Compare December 4, 2025 13:36
@samhatfield samhatfield force-pushed the remove_redundant_ad_gpu_comm_routines branch from 268c21a to f995241 Compare December 4, 2025 14:06
@samhatfield
Copy link
Collaborator Author

Any thoughts on this @l90lpa? I was hoping I could also delete TRLTOMAD_PACK_UNPACK and TRMTOLAD_PACK_UNPACK and somehow reuse the non-adjoint versions, saving another ~700 lines of code, but the adjoint packing/unpacking routines do seem to be genuinely different. I might therefore undo commit f995241.

@samhatfield samhatfield added this to the 1.8.0 milestone Dec 22, 2025
@l90lpa
Copy link
Contributor

l90lpa commented Jan 8, 2026

Hi @samhatfield, sorry for the slow reply. Yes, I had originally wanted to take the approach you're discussing, however, the issue I ran into was that trltom_pack/trltom/trltom_unpack and trmtol_pack/trmtol/trmtol_unpack don't just do transposition but also have some over responsibilities built in that only apply to one transform direction. If I recall correct, trltom_pack isn't just packing a buffer, it is also applying quadrature weights. And again, if I recall correctly somewhere in trmtol_pack/unpack it handles frequency padding. There might be some other bits I forgot. At the time, I did look and is seemed possible to break up these routines to avoid needing additional adjoint code, but this would have required more work outside of the scope of the PRs than I had time for (and I wasn't sure if the GPU code had been written this way intentionally).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved-for-ci enhancement New feature or request gpu tidying Tidying up code or removing unused features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants