Hallucination, in which a model's output is partially inconsistent with the image content, is a common and hard-to-eradicate problem for Large Vision-Language Models (LVLMs), especially in long generations. To mitigate it, current studies focus either on the model's inference process or on its generated results, but their solutions often fail to handle the variety of query types and the distinct hallucinations those queries induce. To deal with these varied hallucinations accurately, we present Dentist, a unified framework for hallucination mitigation. The core idea is to first classify the query and then apply a mitigation procedure tailored to the classification result, just as a dentist first examines the teeth and then makes a treatment plan. In a simple deployment, Dentist classifies queries as perception or reasoning and readily mitigates potential hallucinations in the answers, as demonstrated in our experiments. On MMBench, we achieve accuracy improvements of 13.44%/10.2%/15.8% over the InstructBLIP/LLaVA/VisualGLM baselines on Image Quality, a Coarse Perception visual question answering (VQA) task.
To the best of our knowledge, our work is the first to tailor treatment to a classification of hallucinations and to use a validation cycle to remove them. We encourage readers with questions to contact us at [email protected].
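The classify-then-treat loop described above can be sketched as follows. This is only an illustrative sketch: the function names (`classify_query`, `dentist_answer`) and the cue-word classifier are our own assumptions, not the actual Dentist implementation, which uses LVLMs end to end.

```python
# Illustrative sketch of "classify, then treat, then validate".
# All names and the toy keyword classifier are hypothetical.

def classify_query(query: str) -> str:
    """Toy classifier: reasoning queries ask why/how/explain/compare;
    everything else is treated as perception. A real system would
    delegate this decision to a language model."""
    reasoning_cues = ("why", "how", "explain", "compare")
    return "reasoning" if any(c in query.lower() for c in reasoning_cues) else "perception"

def dentist_answer(query, image, generate, verify, max_rounds=3):
    """Generate an answer, then run a validation cycle: regenerate
    until the verifier accepts the answer or rounds run out."""
    kind = classify_query(query)
    answer = generate(query, image, mode=kind)
    for _ in range(max_rounds):
        if verify(query, image, answer, mode=kind):
            break  # answer passed validation; stop the cycle
        answer = generate(query, image, mode=kind)
    return kind, answer
```

Here `generate` and `verify` stand in for calls to the underlying LVLM; the point is only to show how classification selects the mitigation mode and how the validation cycle wraps generation.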
We first select three mainstream LVLMs as our baseline models: InstructBLIP, LLaVA, and VisualGLM.
The baseline parameter sizes are as follows; please refer to our paper for more details.
We use the above three models to run experiments on MMBench, LLaVA-QA90, CHAIR, and POPE. For comparison, each experiment also includes Woodpecker as a control. The experimental results are shown below; see our paper for more details.
- To set up a conda environment:
conda create -n dentist python=3.9
conda activate dentist
- Install the related packages
- Different benchmarks may have conflicting environment requirements; please configure the environment for each benchmark as needed. The requirements.txt below contains only the basic packages.
pip install -r requirements.txt
- Download the baseline models
- Install Dentist
cd Dentist_ws
pip install -e .
You can run the LLaVA demo with the config in ./Dentist/config/llava/llava_config.ini (which needs to be filled in):
cd Dentist_ws
python demo.py
To use a different config file:
cd Dentist_ws
python demo.py --config_path new_path
Or you can specify the options directly:
cd Dentist_ws
python demo.py --device 0 --limited_cnt 1 --model_path your_path --openai_key your_key
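For reference, a filled-in config file might look like the sketch below. The section and key names are assumptions inferred from the command-line flags above, not the actual format of llava_config.ini; check the shipped template for the real keys.

```ini
; Hypothetical llava_config.ini sketch; keys inferred from the CLI flags
[model]
model_path = /path/to/llava

[runtime]
device = 0
limited_cnt = 1

[api]
openai_key = your_key
```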
.
├── demo.py
└── Dentist
├── config
│ ├── instructblip
│ ├── llava
│ └── visualglm
│
└── model
├── instructblip
│ └── instructblip_verifier.py
│
├── llava
│ └── llava_verifier.py
│
├── only_detection
│ ├── instructblip_detector.py
│ ├── llava_detector.py
│ └── visualglm_detector.py
│
├── verifier.py
│
└── visualglm
└── visualglm_verifier.py
The base class is defined in Dentist/model/verifier.py.
For examples of subclasses that override the base-class methods, see the other folders under Dentist/model/.
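As a rough illustration of this inheritance pattern, a model-specific verifier might look like the sketch below. The class and method names here are guesses for illustration only; the real interface lives in Dentist/model/verifier.py and the model-specific files such as llava_verifier.py.

```python
# Hypothetical sketch of extending a base Verifier.
# The actual base class is in Dentist/model/verifier.py and may differ.

class Verifier:
    """Stand-in for the base class: subclasses implement verify()."""
    def verify(self, query, image, answer):
        raise NotImplementedError

class LlavaVerifier(Verifier):
    """Model-specific subclass, analogous to llava_verifier.py."""
    def __init__(self, model):
        # `model` is any callable (prompt, image) -> text response.
        self.model = model

    def verify(self, query, image, answer):
        # Ask the backbone model whether the answer matches the image,
        # and treat a leading "yes" in the reply as a pass.
        prompt = f"Does the answer '{answer}' match the image for: {query}?"
        return self.model(prompt, image).strip().lower().startswith("yes")
```

The design choice to isolate per-model logic in subclasses keeps the validation cycle itself model-agnostic: only the prompt construction and backbone call change between InstructBLIP, LLaVA, and VisualGLM.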
This work is inspired by MMBench and Woodpecker. Sincere thanks for their awesome work.
If you find our project helpful to your research, please consider citing: