Changing image encoder #19

Open
oscarloch opened this issue Jan 21, 2025 · 1 comment

Comments

@oscarloch

Hi!

I was wondering:
What do you think is the easiest way to change the image encoder from CvT to UniFormer?

I want to make this change, but I’m a little confused about how to do it.
My initial idea is to edit the file:
modules/transformers/single_model/modelling_single.py
However, I’m not sure whether this is the only .py file I need to modify, or which functions in that file I can reuse when implementing UniFormer and which I cannot.

Thanks again for all the help! This is a really useful model =).

@anicolson
Member

Hi @oscarloch,

In the lightning_module, e.g., at `if self.warm_start_modules:`, change this:

    if self.warm_start_modules:
        encoder = CvtWithProjectionHead.from_pretrained(encoder_ckpt_name, config=config_encoder)
        decoder = transformers.BertLMHeadModel(config=config_decoder)
        self.encoder_decoder = SingleCXREncoderDecoderModel(encoder=encoder, decoder=decoder)
    else:
        config = transformers.VisionEncoderDecoderConfig.from_pretrained(encoder_decoder_ckpt_name)
        self.encoder_decoder = SingleCXREncoderDecoderModel(config=config)

To this (ignoring self.warm_start_modules to make things easier):

    encoder = transformers.AutoModel.from_pretrained('aehrc/uniformer_base_tl_384', trust_remote_code=True)
    decoder = transformers.BertLMHeadModel(config=config_decoder)
    self.encoder_decoder = SingleCXREncoderDecoderModel(encoder=encoder, decoder=decoder)

The same change should apply to the multi-image and the longitudinal, multi-image cases.

I haven't tested this, so let me know what errors occur. They will probably relate to the shape of the UniFormer output versus the CvT output, but we should be able to fix that easily.
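If the shape mismatch does show up, one possible fix is a small adapter between the encoder and the decoder. The sketch below is untested against this repository and rests on assumptions: that the new encoder returns a feature map of shape (batch, channels, height, width), and that the decoder expects a token sequence of shape (batch, tokens, hidden size). The class name FeatureMapToTokens and the sizes 512, 768 and 12 are placeholders for illustration, not values taken from the UniFormer checkpoint or from this codebase.

```python
import torch
from torch import nn


class FeatureMapToTokens(nn.Module):
    """Hypothetical adapter: (B, C, H, W) feature map -> (B, H*W, D) token sequence."""

    def __init__(self, in_channels: int, decoder_hidden_size: int) -> None:
        super().__init__()
        # Linear projection from the encoder channel size to the decoder hidden size.
        self.projection = nn.Linear(in_channels, decoder_hidden_size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        if features.dim() == 4:
            # (B, C, H, W) -> (B, C, H*W) -> (B, H*W, C)
            features = features.flatten(2).transpose(1, 2)
        # Inputs that are already (B, N, C) are only projected.
        return self.projection(features)


# Placeholder sizes; check the real encoder output before choosing these.
adapter = FeatureMapToTokens(in_channels=512, decoder_hidden_size=768)
dummy_features = torch.randn(2, 512, 12, 12)
tokens = adapter(dummy_features)
print(tokens.shape)  # torch.Size([2, 144, 768])
```

Printing the shape of the real encoder output on one batch first would show whether an adapter like this is needed at all, and which channel size to pass in.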

A.
