XPU and MPS take 3 #276

Merged
merged 43 commits into from
Feb 20, 2025

@jwallwork23 commented Feb 6, 2025

Closes #127.
Builds upon #125 and #209.
(Contains changes from #268 so that will need to be merged first.)

This PR adds support for XPU and MPS. Unfortunately, it ended up requiring an overhaul of the pt2ts scripts, too.

Notable changes:

  • Switching from ENABLE_CUDA to the more general and extensible GPU_DEVICE=<NONE/CUDA/XPU/MPS>.
  • Pre-processor directives for handling different GPU types.
  • Support for XPU under GPU device code 12.
  • Support for MPS under GPU device code 13.
  • Use of argparse for reading command line arguments into Python scripts, rather than sys.argv.
  • Updates to docs.
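
As a rough illustration of the last two device-code bullets and the argparse change, the sketch below parses a device choice from the command line and maps it to an integer code. It is not the actual pt2ts code: the flag names `--device_type` and `--filepath`, the `DEVICE_CODES` table and the `parse_device` helper are hypothetical. Only the XPU (12) and MPS (13) codes come from this PR; the CPU (0) and CUDA (1) values are assumed from PyTorch's device type enum.

```python
import argparse

# XPU = 12 and MPS = 13 are the codes listed in this PR description.
# CPU = 0 and CUDA = 1 are assumptions based on PyTorch's device type enum.
DEVICE_CODES = {"cpu": 0, "cuda": 1, "xpu": 12, "mps": 13}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical pt2ts-style script."""
    parser = argparse.ArgumentParser(
        description="Save a PyTorch model to TorchScript (illustrative only)."
    )
    # Hypothetical flag names; the real scripts may use different ones.
    parser.add_argument(
        "--device_type",
        choices=sorted(DEVICE_CODES),
        default="cpu",
        help="Device to target when saving the model.",
    )
    parser.add_argument(
        "--filepath",
        default=".",
        help="Directory in which to write the saved model.",
    )
    return parser


def parse_device(argv=None):
    """Return (device name, device code, output path) parsed from argv."""
    args = build_parser().parse_args(argv)
    return args.device_type, DEVICE_CODES[args.device_type], args.filepath
```

Calling `parse_device(["--device_type", "xpu"])` would select the XPU entry. Compared with indexing into `sys.argv`, argparse rejects unknown device names through `choices` and generates `--help` output without extra code.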

Checklist

  • Test on 2 Nvidia GPUs
  • Test on 2 XPUs
  • Test on 1 MPS device

@jwallwork23 added the labels enhancement (New feature or request) and gpu (Related to building and running on GPU) Feb 6, 2025
@jwallwork23 self-assigned this Feb 6, 2025
@jwallwork23 marked this pull request as ready for review February 10, 2025 16:21
@jwallwork23
Collaborator Author

Offline testing for CUDA version of MultiGPU example with 2 devices passed on Ampere. In the queue for XPU testing on PVC.

@jwallwork23 requested a review from ma595 February 10, 2025 16:22
This was referenced Feb 11, 2025
Member

@jatkinson1000 left a comment

Thanks @jwallwork23. I have only done a quick pass of this so far and will need to schedule time for a closer look, but I suspect you know my first comment: can you update the docs at utils/README etc. to reflect the new args and usage of pt2ts?

I would also like to see it documented somewhere how the device enums are managed from CMake, perhaps under the developer docs. Whilst it is a very nifty solution, it's slightly abstract if you are not the one who came up with it 😉

@jatkinson1000
Member

Also looks like you may want a rebase after #268

@jwallwork23
Collaborator Author

Re-tested on Dawn with latest version of branch - all good

Member

@jatkinson1000 left a comment

Thanks for driving all of this forward @jwallwork23. The devices make a fine addition to the collection.

I think we are on the edge of glory, but have a couple of suggestions/clarifications before approval.

More general ruminations on reviewing this:

  • I wonder if there is a better way to keep the pt2ts files in sync... (not to be resolved in this PR)

conda/README.md Outdated
Member

I'll need to update the mac conda to include this in #284

@jwallwork23
Collaborator Author

> More general ruminations on reviewing this:
>
> * I wonder if there is a better way to keep the pt2ts files in sync... (not to be resolved in this PR)

Yeah agreed. Perhaps the script could be set up such that it doesn't need to be modified, although that might be a tall order.

@jwallwork23
Collaborator Author

Latest version now passes tests on Ampere. Awaiting XPU job on PVC.

@jwallwork23
Collaborator Author

> Latest version now passes tests on Ampere. Awaiting XPU job on PVC.

Passed on PVC, too!

@jatkinson1000
Member

Tests run as ~~expected~~ hoped on MPS.

Member

@jatkinson1000 left a comment

I think I am now happy with everything here, @jwallwork23.
Everything ran OK on Mac (conda, with MPS to run the MultiGPU example), and you checked PVC, CUDA, and CPU.

Fantastic stuff.

You may merge when ready (I would lean towards a squash based on the commit history. The PR touches a lot of files, but generally fairly specific lines in each).

@jwallwork23
Collaborator Author

Amazing! Thanks for the thorough review @jatkinson1000 - will merge now.

@jwallwork23 merged commit 720b067 into main Feb 20, 2025
@jwallwork23 deleted the 127_xpu-take3 branch February 20, 2025 17:38
jatkinson1000 added a commit that referenced this pull request Feb 25, 2025
* Add MacOS GPU device option
* Add XPU device option
* Update C++ XPU interface to handle multiple device indices
* Update ftorch.F90 for XPU support
* Make device enums consistent with PyTorch
* Accept command line arguments in MultiGPU example
* Introduce GPU_DEVICE preprocessor option
* Update pt2ts scripts; use argparse over sys.argv
* Update GPU docs
* Update READMEs
* Add explanation of GPU device codes in dev docs

---------

Co-authored-by: ElliottKasoar
Co-authored-by: Jack Atkinson
Co-authored-by: Matt Archer