BLIP is a Vision-Language Pre-training (VLP) framework that performs well on both vision-language understanding and generation tasks. It combines a Multimodal mixture of Encoder-Decoder (MED) architecture with a Captioning and Filtering (CapFilt) method for cleaning noisy web data. In this project, we use BLIP to detect deepfake content.
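The sketch below shows one way a single frame could be scored with BLIP's image-text matching (ITM) head. It assumes the Hugging Face `transformers` BLIP checkpoints; the prompts, the `match_probability` helper, and the decision rule are illustrative assumptions and may differ from what `script.py` actually does.

```python
# Minimal sketch: scoring one frame with BLIP's image-text matching (ITM) head.
# The prompts and the comparison rule below are assumptions for illustration.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForImageTextRetrieval

processor = BlipProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco")
model.eval()

def match_probability(image: Image.Image, prompt: str) -> float:
    """Return the ITM probability that `prompt` matches `image`."""
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # itm_score holds [no-match, match] logits; softmax turns them into probabilities.
    return torch.softmax(outputs.itm_score, dim=1)[0, 1].item()

image = Image.open("frame_000.png").convert("RGB")  # hypothetical frame path
p_fake = match_probability(image, "a deepfake, digitally manipulated face")
p_real = match_probability(image, "an authentic, unedited photo of a face")
print("fake" if p_fake > p_real else "real")
```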
The dataset used in this project consists of 300 randomly selected examples from the FaceForensics++ (FF++) dataset:
- 150 Deepfake examples
- 150 Real examples
The examples are drawn from 50 different videos, with each video contributing 3 frames. You can access the dataset at this Kaggle link.
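For illustration only, a rough sketch of how a fixed number of evenly spaced frames could be sampled from a video with OpenCV is shown below. The paths and naming are hypothetical; the published Kaggle dataset already contains the extracted frames.

```python
# Rough sketch: sample 3 evenly spaced frames from a video with OpenCV.
# Paths and file naming are hypothetical.
import cv2

def extract_frames(video_path: str, out_prefix: str, n_frames: int = 3) -> None:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for i in range(n_frames):
        # Jump to an evenly spaced frame index and decode it.
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // n_frames)
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(f"{out_prefix}_frame{i}.png", frame)
    cap.release()

extract_frames("ff++/deepfake/000_003.mp4", "dataset/fake/000_003")
```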
Follow these steps to run the code:
- Clone the repository.
- Download the dataset from the provided link.
- Execute the script by running the following command in your terminal:
python3 script.py /path/to/your/dataset
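The sketch below illustrates how the dataset path passed on the command line might be consumed: walking a hypothetical layout with `fake/` and `real/` subfolders, classifying each frame, and logging the outcome. The folder layout, the log format, and the `classify` stand-in are assumptions; the real `script.py` may be organized differently.

```python
# Sketch of consuming the dataset-path argument (hypothetical layout and log format).
import sys
from pathlib import Path
from PIL import Image

def classify(image: Image.Image) -> str:
    """Stand-in for the BLIP-based prediction (see the ITM sketch above).
    Replace with the real model call; returns 'fake' or 'real'."""
    return "real"  # placeholder only

def main(dataset_root: str) -> None:
    root = Path(dataset_root)
    for label in ("fake", "real"):
        correct = total = 0
        with open(f"logs_for_{label}s.txt", "w") as log:
            for path in sorted((root / label).glob("*.png")):
                pred = classify(Image.open(path).convert("RGB"))
                correct += int(pred == label)
                total += 1
                log.write(f"{path.name}\t{pred}\n")
        print(f"{label}: {correct}/{total} correct")

if __name__ == "__main__":
    main(sys.argv[1])
```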
The logs of the testing process are written to the logs_for_fakes.txt and logs_for_reals.txt files.
- Accuracy on deepfakes: 58%
  - True positives: 29
  - False negatives: 21
- Accuracy on real images: 62%
  - True negatives: 31
  - False positives: 19
- Overall accuracy: 60%
  - Total correct: 60
  - Total incorrect: 40
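As a quick sanity check, the reported percentages follow directly from the confusion counts above, treating "deepfake" as the positive class:

```python
# Reproduce the reported accuracies from the confusion counts.
tp, fn = 29, 21   # deepfake frames: detected vs. missed
tn, fp = 31, 19   # real frames: correctly kept vs. wrongly flagged

acc_fake = tp / (tp + fn)                    # 29 / 50 = 0.58
acc_real = tn / (tn + fp)                    # 31 / 50 = 0.62
acc_all  = (tp + tn) / (tp + fn + tn + fp)   # 60 / 100 = 0.60
print(f"{acc_fake:.0%} {acc_real:.0%} {acc_all:.0%}")  # 58% 62% 60%
```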