ObjectDetection/InstanceSegmentationTask: fix support for non-RGB images #2752
Conversation
Pull Request Overview
This PR fixes multispectral support in the ObjectDetectionTask by overriding the default transform parameters in the detector models. The changes update the initialization of FasterRCNN, FCOS, and RetinaNet with custom parameters (min_size, max_size, image_mean, and image_std) that enable multispectral inputs, and a new test is added to validate this functionality.
- Updated transform parameters for multispectral support in three detection model constructors.
- Added a new test in tests/trainers/test_detection.py to check multispectral behavior.
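To make the override concrete, here is a minimal sketch (not the torchgeo source itself) of how a torchvision Faster R-CNN can be built for multispectral input; the 4-channel stem swap and the specific parameter values are illustrative assumptions:

```python
# Sketch: a torchvision Faster R-CNN accepting 4-band imagery by overriding
# the built-in GeneralizedRCNNTransform parameters. Values are illustrative.
import torch
from torch import nn
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FasterRCNN

in_channels = 4  # e.g. RGB + NIR
backbone = resnet_fpn_backbone(
    backbone_name="resnet18", weights=None, trainable_layers=3
)
# Replace the stem so the backbone accepts in_channels bands instead of 3.
backbone.body.conv1 = nn.Conv2d(
    in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False
)

model = FasterRCNN(
    backbone,
    num_classes=11,
    # Override the default ImageNet RGB stats with per-band no-op stats.
    image_mean=[0.0] * in_channels,
    image_std=[1.0] * in_channels,
    min_size=800,
    max_size=800,
)

model.eval()
with torch.no_grad():
    predictions = model([torch.rand(in_channels, 224, 224)])
```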
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description
---|---
torchgeo/trainers/detection.py | Updated model constructors to override transform parameters for multispectral data.
tests/trainers/test_detection.py | Added a test case to validate multispectral support with a non-RGB input channel.
Can you also check instance segmentation?
FYI, I confirmed no issues with OD on my 4-channel dataset.

OK, whilst the error is resolved, the loss for the OD models I train is always zero. Are there validated results I can reproduce? Note this might just be my datasets, which I recently updated for the new format.
This by default was resizing all imagery to a min of 800. What transforms are you using to preprocess your imagery?
@adamjstewart fixed the instance segmentation task. It had the same issue. |
@isaaccorley can you elaborate on this, given that typically chip_size = 224?

Torchvision Faster-RCNN and MaskRCNN have a `GeneralizedRCNNTransform` baked in. One trick that works well for object detection in remote sensing is to simply resize your small patches to be larger. This may be why you're getting poor performance.
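For example, a user-side preprocessing step along these lines (an illustrative sketch, not torchgeo API; Kornia scales the bounding boxes together with the images):

```python
# Sketch: upsample small chips (e.g. 224x224) to a larger size before
# detection, letting Kornia resize the boxes consistently with the images.
import torch
import kornia.augmentation as K

resize = K.AugmentationSequential(
    K.Resize((800, 800)),  # upscale 224x224 patches to 800x800
    data_keys=["input", "bbox_xyxy"],
)

images = torch.rand(2, 4, 224, 224)                     # 4-band chips
boxes = torch.tensor([[[10.0, 20.0, 60.0, 90.0]]] * 2)  # (B, N, 4) xyxy boxes
out_images, out_boxes = resize(images, boxes)
print(out_images.shape)  # torch.Size([2, 4, 800, 800])
```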
@isaaccorley good to know! Perhaps we should document this?
This PR basically removes this transform, so a user can decide which Kornia normalize and resize transforms they want to apply themselves.
I've ruled out issues with my dataset, and the remaining differences I see between my legacy implementation and this implementation are details such as the anchor sizes I've utilised. I suggest we merge this approach and then, as a follow-up (and pending a suitable test dataset), work on further optimisations in another PR.
LGTM, thanks for the fix!
Except for the tests...
(Force-pushed from 7c10564 to e4c6832.)
As discussed in Slack with @isaaccorley, better results are achieved with the torchvision defaults for norm, together with min_size=800, max_size=800.

Noting allenai/rslearn#171, where they choose 800 for both, which makes sense as we usually deal with square images in RS.
Some benchmarking based on the defaults using VHR10 for 25 epochs:

```python
task = ObjectDetectionTask(
    model="faster-rcnn",
    backbone="resnet18",
    weights=True,
    in_channels=3,
    num_classes=11,
    trainable_layers=3,
    lr=1e-3,
    patience=10,
    freeze_backbone=False,
)
```

Results for each configuration (metric plots omitted here):

- This PR
- Equal sizes - as in rslearn, but actually skipped
- Torchvision defaults

Ideally we just don't transform of course - will check if we can use the …
(Force-pushed from c196441 to bd5d31d.)
```python
min_size=800,
max_size=800,
```
Is this a hard constraint? It would be nice to be as flexible as possible here.
I actually think it might be better to pass this in as an arg, image_size or similar. However as mentioned it would be preferable if we could avoid this processing altogether.
I'll make it an input arg
Can't we make the range huge and let the datamodules control this?
Do all the datamodules support image_size as an input arg? I have it as one for mine and always perform a resize - definitely want control over this
Not consistently, no, but we could. We definitely could for all the object detection/instance segmentation ones, there aren't that many.
Let me try updating the datamodules with controllable sizes and see if I can get the models to converge. Ideally we could set this range to be a no-op, which is why I originally set it to (1, 4096).
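As a sketch of the "pass it in as an arg" idea discussed in this thread (a hypothetical signature, not the current torchgeo API), the resize bounds could be surfaced on the constructor:

```python
# Hypothetical sketch: expose the detector's resize target as an argument
# instead of hard-coding 800. `image_size` and this helper are illustrative.
from torchvision.models.detection import fasterrcnn_resnet50_fpn

def build_detector(num_classes: int, image_size: int = 800):
    # With square inputs, equal bounds resize to exactly image_size x image_size.
    return fasterrcnn_resnet50_fpn(
        weights=None,
        num_classes=num_classes,
        min_size=image_size,
        max_size=image_size,
    )

model = build_detector(num_classes=11, image_size=1024)
```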
One of the wonderful surprises of torchvision's detector models is that a `GeneralizedRCNNTransform` gets added under the hood, which defaults to ImageNet RGB mean/std normalization plus dynamic resizing in the range of (800, 1333). This PR fixes this by loading pretrained weights but overriding this transform to simply subtract 0 and divide by 1, which is a no-op, and changes the dynamic resize to allow for a min/max input shape in the range of (1, 4096).
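In torchvision terms, the override described above can be sketched as follows (illustrative, not a verbatim excerpt from this PR):

```python
# Sketch: replace a detector's default transform with a no-op normalization
# (subtract 0, divide by 1) and a wide (1, 4096) resize range.
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.transform import GeneralizedRCNNTransform

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # keep pretrained weights
in_channels = 3  # these COCO weights expect 3 bands; shown for shape only
model.transform = GeneralizedRCNNTransform(
    min_size=1,
    max_size=4096,
    image_mean=[0.0] * in_channels,  # subtract 0 -> no mean shift
    image_std=[1.0] * in_channels,   # divide by 1 -> no scaling
)
```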
Alternatives considered:
I attempted to simply replace `model.transform` with `nn.Identity()`, but this doesn't work because the detection models pass multiple args to the transform, which throws an error.

Fixes #2749