Hallucination, in which a model's output is partially inconsistent with the image content, is a common and hard-to-eradicate problem for Large Vision-Language Models (LVLMs), especially in long generations. To mitigate it, current studies focus either on the model's inference process or on its generated results, but their solutions often fail to handle the variety of query types and the distinct hallucinations those queries induce. To deal with these varied hallucinations accurately, we present Dentist, a unified framework for hallucination mitigation. The core idea is to first classify the query and then apply a mitigation procedure tailored to the classification result, just as a dentist first examines the teeth and then makes a treatment plan. In a simple deployment, Dentist classifies queries as perception or reasoning and readily mitigates potential hallucinations in the answers, as demonstrated in our experiments. On MMBench, we achieve accuracy improvements of 13.44%/10.2%/15.8% over the InstructBLIP/LLaVA/VisualGLM baselines on Image Quality, a Coarse Perception visual question answering (VQA) task.
To the best of our knowledge, our work is the first to tailor treatment to a classification of hallucinations and to use a validation cycle to remove them. We encourage readers with questions to contact us at [email protected].
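The classify-then-treat loop described above can be sketched as follows. This is only an illustrative sketch: the function names (`classify_query`, `dentist_answer`) and the cue-word classifier are our own assumptions, not the actual Dentist implementation, which uses LVLMs end to end.

```python
# Illustrative sketch of "classify, then treat, then validate".
# All names and the toy keyword classifier are hypothetical.

def classify_query(query: str) -> str:
    """Toy classifier: reasoning queries ask why/how/explain/compare;
    everything else is treated as perception. A real system would
    delegate this decision to a language model."""
    reasoning_cues = ("why", "how", "explain", "compare")
    return "reasoning" if any(c in query.lower() for c in reasoning_cues) else "perception"

def dentist_answer(query, image, generate, verify, max_rounds=3):
    """Generate an answer, then run a validation cycle: regenerate
    until the verifier accepts the answer or rounds run out."""
    kind = classify_query(query)
    answer = generate(query, image, mode=kind)
    for _ in range(max_rounds):
        if verify(query, image, answer, mode=kind):
            break  # answer passed validation; stop the cycle
        answer = generate(query, image, mode=kind)
    return kind, answer
```

Here `generate` and `verify` stand in for calls to the underlying LVLM; the point is only to show how classification selects the mitigation mode and how the validation cycle wraps generation.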
We first select three mainstream LVLMs as our baseline models: InstructBLIP, LLaVA, and VisualGLM.
The baseline parameter sizes are as follows; please refer to our paper for more details.
We use the above three models to run experiments on MMBench, LLaVA-QA90, CHAIR, and POPE. For comparison, each experiment also includes Woodpecker as a control. The experimental results are shown below; see our paper for more details.
- To set up a conda environment:
conda create -n dentist python=3.9
conda activate dentist
- Install the related packages
- Different benchmarks may have conflicting environment requirements; please configure the environment for each benchmark as needed. The requirements.txt below contains only the basic packages.
pip install -r requirements.txt
- Download the baseline models
- Install Dentist
cd Dentist_ws
pip install -e .
You can run the LLaVA demo with the config in ./Dentist/config/llava/llava_config.ini (which needs to be filled in):
cd Dentist_ws
python demo.py
To use a different config file:
cd Dentist_ws
python demo.py --config_path new_path
Or you can specify the options directly:
cd Dentist_ws
python demo.py --device 0 --limited_cnt 1 --model_path your_path --openai_key your_key
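For reference, a filled-in config file might look like the sketch below. The section and key names are assumptions inferred from the command-line flags above, not the actual format of llava_config.ini; check the shipped template for the real keys.

```ini
; Hypothetical llava_config.ini sketch; keys inferred from the CLI flags
[model]
model_path = /path/to/llava

[runtime]
device = 0
limited_cnt = 1

[api]
openai_key = your_key
```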
.
├── demo.py
└── Dentist
├── config
│ ├── instructblip
│ ├── llava
│ └── visualglm
│
└── model
├── instructblip
│ └── instructblip_verifier.py
│
├── llava
│ └── llava_verifier.py
│
├── only_detection
│ ├── instructblip_detector.py
│ ├── llava_detector.py
│ └── visualglm_detector.py
│
├── verifier.py
│
└── visualglm
└── visualglm_verifier.py
The base class is defined in Dentist/model/verifier.py.
For examples of subclasses that override the base-class methods, see the other folders under Dentist/model/.
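As a rough illustration of this inheritance pattern, a model-specific verifier might look like the sketch below. The class and method names here are guesses for illustration only; the real interface lives in Dentist/model/verifier.py and the model-specific files such as llava_verifier.py.

```python
# Hypothetical sketch of extending a base Verifier.
# The actual base class is in Dentist/model/verifier.py and may differ.

class Verifier:
    """Stand-in for the base class: subclasses implement verify()."""
    def verify(self, query, image, answer):
        raise NotImplementedError

class LlavaVerifier(Verifier):
    """Model-specific subclass, analogous to llava_verifier.py."""
    def __init__(self, model):
        # `model` is any callable (prompt, image) -> text response.
        self.model = model

    def verify(self, query, image, answer):
        # Ask the backbone model whether the answer matches the image,
        # and treat a leading "yes" in the reply as a pass.
        prompt = f"Does the answer '{answer}' match the image for: {query}?"
        return self.model(prompt, image).strip().lower().startswith("yes")
```

The design choice to isolate per-model logic in subclasses keeps the validation cycle itself model-agnostic: only the prompt construction and backbone call change between InstructBLIP, LLaVA, and VisualGLM.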
This work is inspired by MMBench and Woodpecker. Sincere thanks for their awesome work.
If you find our project helpful to your research, please consider citing: