Prior work in concept-bottleneck visual recognition aims to leverage discriminative visual concepts to enable more accurate object classification. Escher is an approach for iteratively evolving a visual concept library using feedback from a VLM critic to discover descriptive visual concepts.
More details can be found on the project page.
With Escher:
$ conda create --name escher-dev python=3.12
$ conda activate escher-dev
$ pip install -e . # Installs escher in editable mode
$ vim escher/cbd_utils/server.py
# Edit this file to point to the correct GPT model, API key location, etc. (see the sketch after this command listing).
$ CUDA_VISIBLE_DEVICES=4 python escher/iteration.py --dataset cub --topk 50 --prompt_type confound_w_descriptors_with_conversational_history --distance_type confusion --subselect -1 --decay_factor 10 --classwise_topk 10 --num_iters 100 --perc_labels 0.0 --perc_initial_descriptors 1.00 --algorithm lm4cv --salt "1.debug"
# ^ This runs with OpenAI's gpt-3.5-turbo; no vLLM instance is required.
$ CUDA_VISIBLE_DEVICES=1 python escher/iteration.py --dataset cub --topk 50 --openai_model LOCAL:meta-llama/Llama-3.1-8B-Instruct --prompt_type confound_w_descriptors_with_conversational_history --distance_type confusion --subselect -1 --decay_factor 10 --classwise_topk 10 --num_iters 100 --perc_labels 0.0 --perc_initial_descriptors 1.00 --algorithm lm4cv --salt "1.debug"
# ^ Same command, but the LOCAL: prefix makes it use the vllm_client defined in `serve.py` and call the Llama-3.1-8B-Instruct model.
# To run on multiple datasets, read cmds.sh
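The server configuration step above amounts to pointing an OpenAI-compatible client at the right model and credentials. The following is a hedged sketch of the kind of settings involved; the actual variable names and structure in escher/cbd_utils/server.py may differ.

```python
# Illustrative only: the real names in escher/cbd_utils/server.py may differ.
import os
from openai import OpenAI

OPENAI_MODEL = "gpt-3.5-turbo"                 # default model when no LOCAL: prefix is given
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # or read the key from a file

# Client for hosted OpenAI models.
openai_client = OpenAI(api_key=OPENAI_API_KEY)

# Client for models passed with the LOCAL: prefix; it talks to the vLLM
# OpenAI-compatible server started in the "With vLLM" section below.
vllm_client = OpenAI(
    base_url="http://localhost:11440/v1",
    api_key="token-abc123",
)
```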
With vLLM (if using local LLM):

$ conda create --name vllm python=3.9
$ conda activate vllm
$ pip install vllm
$ CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key token-abc123 --port 11440 --tensor-parallel-size 4 --disable-log-requests
# ^ This is a debugging server. Running with meta-llama/Llama-3.3-70B-Instruct is recommended.
$ CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve meta-llama/Llama-3.3-70B-Instruct --api-key token-abc123 --port 11440 --tensor-parallel-size 4 --disable-log-requests
# ^ See if this fits. I wasn't able to get it to work and had to use a quantized model (which also works well).
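Once the vLLM server is running, it can be sanity-checked independently of Escher using the standard OpenAI-compatible client. The port, token, and model name below mirror the serve commands above; adjust them to whatever you launched.

```python
# Minimal connectivity check against the vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11440/v1", api_key="token-abc123")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # the model passed to `vllm serve`
    messages=[{"role": "user", "content": "List three visual features of an albatross."}],
)
print(resp.choices[0].message.content)
```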
The repository comes with initial descriptors generated with gpt-4o and gpt-3.5-turbo. However, these initial descriptors need to be regenerated for each new LLM. The code to do this is borrowed from Sachit's repo here and is available in escher/utils/generate_cbd_descriptors.py. Here's how it can be used for an arbitrary set of classes:

(base) atharvas@linux:/var/local/atharvas/f/escher$ conda activate escher-dev
(escher-dev) atharvas@linux:/var/local/atharvas/f/escher$ ipython
Python 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.0.2 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `object?` to see the help on `object`, `object??` to view it's source
In [1]: from escher.utils.generate_cbd_descriptors import generate_cbd_descriptors
In [2]: generate_cbd_descriptors(["Black-Footed Albatross", "Laysian Albatross"])
100%|██████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.06s/it]
Out[2]:
[['large seabird',
'white head, neck, and underparts',
'dark grey or black back, wings, and tail',
'long, narrow wings',
'hooked beak',
'dark, piercing eyes',
'webbed feet with black claws',
'black feet and legs'],
['large seabird',
'white feathers on head, neck, and underbelly',
'dark grey or black feathers on back, wings, and tail',
'long, narrow wings',
'hooked beak',
'webbed feet with sharp claws',
'distinct black eye patch']]

The repository is organized as follows:

- `escher/iteration.py`: Main file that runs the self-evolving process. It parses the command line arguments, loads the initial set of concepts defined in `descriptors/cbd_descriptors/descriptors_{dataset}.json`, and instantiates the model.
- `library/library.py`: A wrapper around the library of concepts. This class is responsible for loading, updating, and saving the concepts.
- `history_conditioned_library.py`: An extension of `library.py` that uses the history of the concepts to condition the library generation.
- `models/model.py`: The abstract class for a concept-bottleneck based model.
- `model_zero_shot.py`: Implementation of the model which does not train any parameters.
- `model_lm4cv.py`: Implementation of the model which trains the model parameters.
- `utils/dataset_loader.py`: Main entry point for loading the dataset.
- `cbd_utils`: Utility functions for GPT calling, training, evaluation, and caching, useful for implementing the "zeroshot" Escher model.
- `lm4cv`: Utility functions for implementing the "lm4cv" Escher model.
- `descriptors/`: Contains the initial descriptors for each dataset.
- `cmds.sh`: A set of hacky scripts to run escher on multiple datasets.
- `cache/`: A cache of the image/text embeddings for all the datasets. This folder is too big to keep on GitHub, but a zipped version is available at this Google Drive link: Link, or by emailing me at [email protected]. I hope this eases some of the pain of setting up the datasets, but unfortunately I'm not equipped to answer general questions about dataset setup.
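For reference, the initial descriptor files loaded by escher/iteration.py presumably map class names to lists of descriptors, in the style of the classify-by-description code they are generated with. The snippet below is a hedged illustration for inspecting one of them; the exact file name and schema may differ in your checkout, so verify against the loader code.

```python
# Hedged illustration: peek at the initial descriptors for the CUB dataset.
# Assumes a {class_name: [descriptor, ...]} schema; check the JSON files in
# descriptors/cbd_descriptors/ for the exact format.
import json

with open("descriptors/cbd_descriptors/descriptors_cub.json") as f:
    descriptors = json.load(f)

for cls, descs in list(descriptors.items())[:2]:
    print(cls, "->", descs[:3])
```

If you regenerate descriptors for a new LLM with generate_cbd_descriptors, writing them back out in the same shape should let iteration.py pick them up (again, verify against the loader).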