This repository contains the code for the blog post "Bridging the Gap: Enhancing CLIP's Linguistic Adaptability"
The code runs in an environment similar to OpenCLIP's; we also provide the `environment.yml` file as a reference.
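For example, the environment can be created with conda:

```
conda env create -f environment.yml
```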
- config/
  - `config.yaml`: stores dataset paths (a minimal sketch of this file is shown after this list)
- dataset_management/
  - `dir_label_name_1k.json`: a mapping from ImageNet class names to their corresponding WordNet synset IDs
  - `open_image_noun_filtered.csv`: a filtered version of the OpenImage dataset, containing the corresponding WordNet synset IDs
  - `train_image_ids.txt`: the file for downloading the OpenImage dataset
- `environment.yml`: reference file for setting up the conda environment
- main/
  - `classification_csv_maker.py`: scripts to create subsets of the FER2013/ImageNet/OpenImage datasets for the experiments
  - `clip_classification.py`: evaluation script for the CLIP model that shows the impact of lexical adaptation
  - `training_utility.py`: utility functions for training
  - `train_text_encoder.py`: main script for training the text encoder
- `README.md`
- zero_shot_analyze/
  - imagenet/
    - `100_synonym_sample.csv`: sample file for the ImageNet synonym task, for evaluation purposes
    - `100_synonym_sample.json`: sample file for the ImageNet synonym task, for training purposes
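For reference, a minimal sketch of `config/config.yaml`. Only the `imagenet_root_path_train` key is named in this README (see the ImageNet setup below); the OpenImage key and both paths are hypothetical placeholders:

```yaml
# Dataset paths read by the scripts in main/
imagenet_root_path_train: /data/imagenet/train   # key named in this README; path is a placeholder
openimage_root_path: /data/openimage/train       # hypothetical key; check the shipped config.yaml
```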
- ImageNet
  - Download the training set from the ImageNet official website.
  - Set `imagenet_root_path_train:` in `config/config.yaml` to the path of the downloaded training set.
  - Make sure that it is structured in the following format (a small sanity-check sketch follows the tree below):
    ```
    ├── n01443537
    ├── n01484850
    ├── n01491361
    ......
    ```
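A minimal sanity check (not part of the repository) that the configured training root contains one folder per WordNet synset ID:

```python
import os
import re
import yaml

# Read the training root from the config file described above.
with open("config/config.yaml") as f:
    root = yaml.safe_load(f)["imagenet_root_path_train"]

# Synset folders follow the n + 8-digit-offset pattern, e.g. n01443537.
dirs = sorted(os.listdir(root))
bad = [d for d in dirs if not re.fullmatch(r"n\d{8}", d)]
print(f"{len(dirs)} class folders; {len(bad)} do not look like synset IDs")
```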
- OpenImage
  - Follow the instructions on the official website ("Download Manually"), using the file `train_image_ids.txt`.
  - Insert the image path before the first column of `open_image_noun_filtered.csv` (a sketch of this step follows this list).
  - Alternatively, the dataset can be downloaded directly via Kaggle.
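A minimal sketch of the path-insertion step (not part of the repository). It assumes the first existing column of the CSV holds the OpenImage image ID and that the images were downloaded as `<image_id>.jpg`; adjust both assumptions to your local copy.

```python
import pandas as pd

IMAGE_DIR = "/path/to/openimage/train"  # hypothetical download location

csv_path = "dataset_management/open_image_noun_filtered.csv"
df = pd.read_csv(csv_path)

# Prepend a column of local image paths before the existing first column.
id_col = df.columns[0]  # assumed to hold the OpenImage image ID
df.insert(0, "image_path", df[id_col].map(lambda i: f"{IMAGE_DIR}/{i}.jpg"))
df.to_csv(csv_path, index=False)
```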
To our knowledge, there is no widely used image-classification dataset that contains synset IDs.
If you have such a dataset and want to test the method on it, please refer to the format of the provided sample files and `classification_csv_maker.py` for dataset preparation.
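If you need to derive synset IDs for your own class names first, a minimal sketch using NLTK's bundled WordNet (not part of this repository): the `n` + 8-digit-offset formatting matches the ImageNet-style IDs used above, but taking the first noun sense is a naive choice you should verify by hand.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def synset_id(class_name):
    """Return an ImageNet-style synset ID (e.g. n01443537) for a noun, or None."""
    synsets = wn.synsets(class_name.replace(" ", "_"), pos=wn.NOUN)
    if not synsets:
        return None
    s = synsets[0]  # naive: first noun sense; check it matches your label
    return f"{s.pos()}{s.offset():08d}"

print(synset_id("goldfish"))  # n01443537 with WordNet 3.0
```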
You can do a trial run of the training script directly (without evaluation) by using the following commands:

```
cd main
python train_text_encoder.py
```

- During the run, the script prints information about the training process.
- You should expect the Loss and Reg Loss to be similar after the first few epochs.
- At the end, the script saves the trained model to the `tuned_models` directory (created automatically if it does not exist).
If the dataset is prepared, you can run the full pipeline as follows:
- Select a function in `classification_csv_maker.py` to create a subset of the dataset:

  - `imagenet_csv_synonym(number_of_classes=)`: for the ImageNet Synonym task
  - `imagenet_csv_hypernym(number_of_classes=, level=)`: for the ImageNet Hypernym task
  - `imagenet_mixed(number_of_classes=, level=)`: for the ImageNet Synonym + Hypernym task
  - `openimage_csv_synonym(number_of_classes=)`: for the OpenImage Synonym task
  - `openimage_csv_hypernym(number_of_classes=, level=)`: for the OpenImage Hypernym task
  - `fer_csv_synonym()`: for the FER2013 Synonym task

  Note: we usually suggest setting `level` to 1, since high-level hypernyms are not very informative. `number_of_classes` corresponds to the number of classes you want to use for the dataset; if it is too large, the script will automatically select the largest possible non-overlapping subset for the given task.

- Change the variables `dataset_name` and `target_file_name` in `train_text_encoder.py` to the name of the dataset you want to use (see the sketch after this list).

- Run the training script; the evaluation will run automatically after the training:

  ```
  python train_text_encoder.py
  ```
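A hedged end-to-end sketch of these three steps: the function names, `dataset_name`, and `target_file_name` come from this README, while the concrete values (100 classes, the file names) are hypothetical placeholders.

```python
# Step 1: build a dataset subset (run from the main/ directory).
from classification_csv_maker import imagenet_csv_hypernym

imagenet_csv_hypernym(number_of_classes=100, level=1)  # level=1 as suggested above

# Step 2: inside train_text_encoder.py, point the script at that subset.
# Hypothetical values; use the names produced by step 1.
dataset_name = "imagenet"
target_file_name = "imagenet_hypernym_100"

# Step 3: from main/, run `python train_text_encoder.py`;
# evaluation runs automatically after training.
```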