
Checks for inconsistent device types #272

Merged
6 commits merged into main on Mar 3, 2025

Conversation

jwallwork23
Collaborator

@jwallwork23 jwallwork23 commented Jan 30, 2025

Closes #269.

This PR adds checks that overloaded operators are only applied to tensors that have the same device type.

It's sufficient to add the checks in the assignment operator and the binary operators, as the unary operators go through the assignment anyway.

The PR also adds checks that tensors have been constructed before they are interrogated. There is a related fix in the autograd example: the output tensor Q wasn't set up, so it didn't have a device type yet.
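For illustration, here is a minimal sketch of the kind of guard the overloaded binary operators now perform before delegating to the Torch backend. The routine name, the get_device_type helper and the error handling below are hypothetical; the real FTorch implementation differs in detail.

  !> Hypothetical sketch only: device-type consistency check in an overloaded
  !> addition. Not the actual FTorch source.
  function tensor_add(a, b) result(c)
    type(torch_tensor), intent(in) :: a, b
    type(torch_tensor) :: c

    ! Refuse to combine tensors that live on different device types
    if (a%get_device_type() /= b%get_device_type()) then
      write(*,*) "Error :: cannot add tensors with different device types"
      stop 1
    end if

    ! ... otherwise perform the addition via the Torch backend ...
  end function tensor_add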

Questions:

  • Do we also need to check the device indices are consistent?
    • -> future PR
  • And what about data types?
    • -> future PR
  • Can we add pFUnit unit tests that check these errors are raised as expected?
    • -> Doesn't seem so (see below).

@jwallwork23 jwallwork23 added the labels bug (Something isn't working), autograd (Tasks towards the online training / automatic differentiation feature), testing (Related to FTorch testing) and gpu (Related to building and running on GPU) on Jan 30, 2025
@jwallwork23 jwallwork23 self-assigned this Jan 30, 2025
@jwallwork23 jwallwork23 changed the title from "269 tensor op different device error" to "Checks for inconsistent device types" Jan 30, 2025
@jatkinson1000 jatkinson1000 added this to the Version v1.0 milestone Feb 26, 2025
@jatkinson1000 jatkinson1000 force-pushed the 269_tensor-op-different-device-error branch from 50125ff to f02e334 on February 27, 2025 11:10
@jatkinson1000 jatkinson1000 marked this pull request as ready for review February 27, 2025 11:10
Member

@jatkinson1000 jatkinson1000 left a comment


Hey @jwallwork23 I just rebased this off of current main and then modified an example to check that, if we try to add two tensors with different device types, we get an error raised as expected.

I used the below:

! Assumed context from the rest of the example: the ftorch module is used
! (including assignment(=) and operator(+)), wp is the working-precision kind
! and tensor_layout is the usual integer layout array.
real(wp), dimension(5), target :: a, b, combine_data
type(torch_tensor) :: tensor_a, tensor_b, combine_tensor

a = [0.0_wp, 1.0_wp, 2.0_wp, 3.0_wp, 4.0_wp]
b = [0.0_wp, 1.0_wp, 2.0_wp, 3.0_wp, 4.0_wp]

! Construct one tensor on the CPU and one on the MPS (Apple GPU) backend
call torch_tensor_from_array(tensor_a, a, tensor_layout, torch_kCPU)
call torch_tensor_from_array(tensor_b, b, tensor_layout, torch_kMPS)

! Adding tensors on different device types should now raise an error
combine_tensor = tensor_a + tensor_b
call torch_tensor_print(combine_tensor)

When tensors a and b are both on the CPU this passes fine, but in the above state (b on MPS) we get:

 Error :: cannot add tensors with different device types
Note: The following floating-point exceptions are signalling: IEEE_OVERFLOW_FLAG
STOP 1

Since we looked at this together in person, I am happy to approve this now.

For posterity, we also discussed that this error message could be made friendlier (parent routine, line number, etc.), but this is not a simple task and not to be done here.

@jwallwork23
Collaborator Author

* Can we add pFUnit unit tests that check these errors are raised as expected?

I don't think pFUnit currently supports this: as far as I can tell, it can only mark certain tests as expected to fail, not interrogate error messages.

@jwallwork23 jwallwork23 force-pushed the 269_tensor-op-different-device-error branch from f02e334 to 4cfa4ab on February 27, 2025 11:49
@jwallwork23
Collaborator Author

[Rebased on top of main, fixing various conflicts]

* Drop torch_tensor_to_<array/blob>
* Remove torch_tensor_to_array from autograd example
* Use std::move for assignment
* Update constructors/destructors tests to avoid to_array
* Update summary of autograd example
* Update overloads tests to avoid to_array
* Fix previous API change note
@jwallwork23 jwallwork23 merged commit 6deab9a into main Mar 3, 2025
5 of 6 checks passed
@jwallwork23 jwallwork23 deleted the 269_tensor-op-different-device-error branch March 3, 2025 18:13
@jatkinson1000
Member

jatkinson1000 commented Mar 3, 2025

I think I hadn't re-approved this because of the issues I had seen when trying to print a tensor.
See where I was investigating on the mix_devices_temp branch.

Did we get to the bottom of that? I know it was mentioned in the meeting this morning.

Looking at the failing simplenet example that I uploaded, I wonder if it was due to not importing the overloaded =.
There are also a few things that have changed with the addition of #303, so I will try to re-check this at some point to be certain.

@jatkinson1000
Member

OK, got nerd sniped.

The segfault was resolved by ensuring use ftorch, only : assignment(=) was present.

So that is good news. The bad news, I guess, is that this can be a tricky bug/error to spot and resolve, as there isn't much to go on from the FTorch output.
Not sure there is an easy fix for this, however, but it is definitely something to be aware of and to make very clear to users when writing the docs @jwallwork23 .
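For reference, a minimal sketch of the kind of only clause that avoids the problem. The exact import list depends on the example; the point is that assignment(=) (and, presumably, any operators used, such as operator(+)) must be listed explicitly when restricting imports:

  ! Illustrative only: when using an only clause, the overloaded tensor
  ! assignment must be imported explicitly, otherwise assigning the result
  ! of a tensor expression can segfault with little diagnostic output.
  use ftorch, only : torch_tensor, torch_tensor_from_array, torch_tensor_print, &
                     torch_kCPU, assignment(=), operator(+)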

@jwallwork23
Copy link
Collaborator Author

The segfault was resolved by ensuring use ftorch, only : assignment(=) was present.

So that is good news. The bad news, I guess, is that this can be a tricky bug/error to spot and resolve, as there isn't much to go on from the FTorch output. Not sure there is an easy fix for this, however, but it is definitely something to be aware of and to make very clear to users when writing the docs @jwallwork23 .

Ah yes! Good work tracking it down. Opened #305 to address this.
