Remove dtype parameter, use previously existing "precision" instead #208

Merged
RaulPPelaez merged 9 commits into torchmd:main on Aug 8, 2023

Conversation

RaulPPelaez
Collaborator

The dtype parameter was causing some issues (see #205).
I realized that there is already a "precision" parameter that can be used for the same thing.
This PR removes the dtype argument and uses precision instead, which can be 16, 32 or 64.
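For reference, a minimal sketch of the idea (the names below are illustrative, not necessarily the ones used in torchmd-net): the single precision value is mapped to a torch dtype once, instead of carrying a separate dtype argument around.

import torch

# Hypothetical helper: map the user-facing "precision" option (16/32/64) to a torch dtype.
PRECISION_TO_DTYPE = {16: torch.float16, 32: torch.float32, 64: torch.float64}

def dtype_from_precision(precision: int) -> torch.dtype:
    # e.g. dtype_from_precision(64) -> torch.float64
    return PRECISION_TO_DTYPE[precision]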

@AntonioMirarchi could you check that the issues you were seeing are gone with this PR?

@AntonioMirarchi
Contributor

Yes, let me run a training and I will let you know. Looks good to me!

@RaulPPelaez
Collaborator Author

I worked with @AntonioMirarchi here to include the possibility of training in double precision.
Lightning complains when the module is set to double but the DataModule provides single precision inputs.

Ideally, every dataset class would process its corresponding files in either a user-provided dtype or simply float64, and have its "get" method cast to whatever type is needed.
However, this is a huge undertaking given the large number of datasets currently available (most of which just read/write files in float32, and all of which return float32 from their get method).

Instead I opted for writing a dataset wrapper that is used by the DataModule:

import torch
from torch_geometric.data import Dataset


class FloatCastDatasetWrapper(Dataset):
    def __init__(self, dataset, dtype=torch.float64):
        super(FloatCastDatasetWrapper, self).__init__(
            dataset.root, dataset.transform, dataset.pre_transform, dataset.pre_filter
        )
        self.dataset = dataset
        self.dtype = dtype

    def len(self):
        return len(self.dataset)

    def get(self, idx):
        data = self.dataset.get(idx)
        # Cast every floating-point tensor in the sample to the requested dtype
        for key, value in data:
            if torch.is_tensor(value) and torch.is_floating_point(value):
                setattr(data, key, value.to(self.dtype))
        return data

    def __getattr__(self, name):
        # Check if the attribute exists in the underlying dataset and delegate to it
        if hasattr(self.dataset, name):
            return getattr(self.dataset, name)
        raise AttributeError(
            f"'{type(self).__name__}' and its underlying dataset have no attribute '{name}'"
        )

This simply intercepts the get method and casts every floating-point tensor in the data to the correct type.
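As a usage sketch (the dummy dataset below is purely illustrative and not part of torchmd-net):

import torch
from torch_geometric.data import Data, Dataset


class DummyFloat32Dataset(Dataset):
    # Minimal stand-in for any dataset whose get() returns float32 tensors.
    def len(self):
        return 4

    def get(self, idx):
        return Data(pos=torch.rand(3, 3), y=torch.rand(1))


wrapped = FloatCastDatasetWrapper(DummyFloat32Dataset(), dtype=torch.float64)
print(wrapped.get(0).pos.dtype)  # torch.float64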

Bonus points: should the need arise, it would now be easy to enable training/inference in other floating-point types, such as bfloat16, or even lower-precision formats like NF4, as sketched below.
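For instance (a speculative sketch; whether the rest of the training stack handles it is a separate question, and NF4 would need external tooling rather than a plain torch dtype):

wrapped_bf16 = FloatCastDatasetWrapper(DummyFloat32Dataset(), dtype=torch.bfloat16)
print(wrapped_bf16.get(0).pos.dtype)  # torch.bfloat16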

@AntonioMirarchi
Contributor

LGTM! I trained without any problems or errors using both single and double precision.

@RaulPPelaez
Collaborator Author

@guillemsimeon @raimis please review

@RaulPPelaez merged commit dca6679 into torchmd:main on Aug 8, 2023