@mattkjames7 (Contributor) commented Sep 23, 2025

Description

Added Dockerfile.cuda specifically for CUDA build and updated packaging workflows to use it.

A manual build seems to show CUDA as available, e.g.:

matt@matt-MS-7B86:/media/raid/Work/github$ docker run --rm --name mage-cuda -p 7687:7687 --gpus all -d memgraph/memgraph-mage:3.5.1-cuda --telemetry-enabled=False --log-level=TRACE
de3bd647563738c8040dcab120068c67cf78c324e9530ce438fd01fcc9f02236
matt@matt-MS-7B86:/media/raid/Work/github$ docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED         STATUS         PORTS      NAMES
de3bd6475637   memgraph/memgraph-mage:3.5.1-cuda   "/usr/lib/memgraph/m…"   4 seconds ago   Up 3 seconds   7687/tcp   mage-cuda
0ab1328faca6   moby/buildkit:buildx-stable-1       "buildkitd --allow-i…"   3 months ago    Up 10 hours               buildx_buildkit_stupefied_jones0
matt@matt-MS-7B86:/media/raid/Work/github$ docker exec -it -u memgraph mage-cuda bash
memgraph@de3bd6475637:/$ python3
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4070 Ti SUPER'
>>> 
  • CUDA: 12.6
  • PyTorch: 2.6.0
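
The CUDA and PyTorch versions listed above can be double-checked from inside the container with a one-liner; torch.version.cuda reports the CUDA version the installed torch build was compiled against (output omitted here):

# Print the installed torch version and the CUDA version it was built with.
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"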

TODO:

  • Fine-tune the torch-* install to use prebuilt wheels for whichever torch version we end up building with; otherwise the build takes forever, as these packages are currently compiled from source (see the sketch after this list).
  • Select the torch version: MAGE currently uses 2.6.0, while the current stable release is 2.8.0. Recommendation: stay on 2.6.0 until dgl is removed.
  • Select which version(s) of CUDA we want to support. The current CUDA release is 13.0, but the latest supported by torch==2.8.0 is 12.9. Recommendation: use 12.6, as it is supported by every torch version from 2.6.0 to 2.8.0 and still covers older drivers (>=525, <580) and GPUs (Pascal, Maxwell).
  • ROCm
  • Test module to see if it can use the GPU
  • Test multiple GPUs
  • Revert changes to Dockerfile.release
  • Install memgraph-toolbox (https://pypi.org/project/memgraph-toolbox/).
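
A minimal sketch of what the prebuilt-wheel install in the first TODO item could look like, assuming we settle on torch 2.6.0 with CUDA 12.6. The wheel indexes and the torch-* package names below are assumptions for illustration and need to be matched against what MAGE actually installs:

# Assumed: install torch from the official CUDA 12.6 wheel index instead of building from source.
python3 -m pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Assumed: install the torch-* companion packages as prebuilt wheels from the index matching the
# chosen torch/CUDA combination, if wheels are published for it (otherwise pip falls back to source builds).
python3 -m pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.6.0+cu126.html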

Pull request type

  • Bugfix
  • Algorithm/Module
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Related issues

Delete if this PR doesn't resolve any issues. Link the issue if it does.

######################################

Reviewer checklist (the reviewer checks this part)

Module/Algorithm

  • Core algorithm/module implementation
  • Query module implementation
  • Tests provided (unit / e2e)
  • Code documentation
  • README short description

Documentation checklist

  • Add the documentation label tag
  • Add the bug / feature label tag
  • Add the milestone for which this feature is intended
    • If not known, set for a later milestone
  • Write a release note, including added/changed clauses
    • [Release note text]
  • Link the documentation PR here
    • [Documentation PR link]

- name: Set target dockerfile
  run: |
    DOCKERFILE="Dockerfile.release"
    if [[ "${{ inputs.cuda }}" == true ]]; then

Check failure (Code scanning / SonarCloud): GitHub Actions should not be vulnerable to script injections (High)

Change this workflow to not use user-controlled data directly in a run block. See more on SonarQube Cloud
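
A common remediation for this class of finding is to pass the input through an environment variable instead of expanding ${{ inputs.cuda }} directly inside the run script, so the shell never interprets user-controlled text as code. A hedged sketch, in which the Dockerfile.cuda assignment and the GITHUB_ENV export are assumptions about what the rest of the step does:

- name: Set target dockerfile
  env:
    # Assumed: expose the workflow input as an environment variable rather than
    # interpolating it into the script body.
    CUDA_BUILD: ${{ inputs.cuda }}
  run: |
    DOCKERFILE="Dockerfile.release"
    if [[ "$CUDA_BUILD" == "true" ]]; then
      DOCKERFILE="Dockerfile.cuda"   # assumed to match the Dockerfile added in this PR
    fi
    echo "DOCKERFILE=$DOCKERFILE" >> "$GITHUB_ENV"   # assumed export; adjust to the real step
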
@gitbuda mentioned this pull request Sep 26, 2025
@mattkjames7 added this to the mage-v3.6.0 milestone Sep 26, 2025

Quality Gate failed

Failed conditions
15 Security Hotspots
Security Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud


@mattkjames7 requested a review from gitbuda September 26, 2025 12:10
@mattkjames7 marked this pull request as ready for review September 26, 2025 12:11
@gitbuda (Member) commented Oct 5, 2025

This is working (I've built the image and run it on GPUs). Merging into my branch; I think we should polish the API a bit.

@gitbuda merged commit fa79c8c into add-torch-gpu-docker-support Oct 5, 2025
16 of 19 checks passed
@gitbuda deleted the add-torch-gpu-docker-support-matt branch October 5, 2025 05:46