We introduce SpikeCLIP, a novel spike-to-image reconstruction framework that goes beyond traditional supervised training paradigms. Leveraging the CLIP model's powerful capability to align text and images, we use textual descriptions of the captured scenes and unpaired high-quality image datasets as supervision. Our experiments on the real-world low-light datasets U-CALTECH and U-CIFAR demonstrate that SpikeCLIP significantly enhances the texture details and luminance balance of recovered images. Furthermore, the reconstructed images are well aligned with the broader visual features needed for downstream tasks, ensuring more robust and versatile performance in challenging environments.
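As a rough illustration of the idea (not the exact training code of SpikeCLIP), the sketch below shows how a CLIP text-image similarity score could serve as a supervision signal for a reconstruction network. The `reconstructor` network and the preprocessing of its outputs to CLIP's input resolution are assumptions; the sketch only relies on the public OpenAI `clip` package.

```python
# Minimal sketch: CLIP text-image alignment as a supervision signal.
# Assumptions (not from the SpikeCLIP release): a reconstruction network whose
# outputs are already resized/normalized to CLIP's expected input statistics,
# and the open-source `clip` package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()  # CLIP stays frozen; gradients still flow back to the reconstructor

def clip_text_loss(recon_images, captions):
    """Return 1 - cosine similarity between reconstructed images and scene descriptions."""
    # recon_images: (B, 3, 224, 224) tensor normalized with CLIP's mean/std.
    tokens = clip.tokenize(captions).to(device)
    image_feat = clip_model.encode_image(recon_images)
    text_feat = clip_model.encode_text(tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (image_feat * text_feat).sum(dim=-1).mean()
```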
The UHSR real-world spike dataset with class labels is available for download here. We split the data so that samples 0-4999 form the training set and the remaining samples form the test set.
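A hedged sketch of that split is shown below; the way samples are enumerated (`samples`) is a placeholder, not part of the released dataset tools.

```python
# Illustrative index-based split for UHSR: samples 0-4999 for training, the rest for testing.
# `samples` stands in for however the raw UHSR files are enumerated on disk.
def split_uhsr(samples, train_size=5000):
    ordered = sorted(samples)  # assume a deterministic ordering by index/filename
    return ordered[:train_size], ordered[train_size:]

# Example usage:
# train_set, test_set = split_uhsr(all_uhsr_files)
```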
Overall, the project is structured as follows:
```
<project root>
├── imgs
├── data
│   ├── U-CALTECH
│   │   ├── train
│   │   └── test
│   └── U-CIFAR
│       ├── train
│       └── test
└── evaluate.py
```
- To evaluate our proposed SpikeCLIP on the U-CALTECH dataset, run:
```bash
python evaluate.py
```
For comparisons with other reconstruction methods, please refer to Spike-Zoo.
If you find our work useful in your research, please cite:
```
@article{chen2025spikeclip,
  title={Rethinking High-speed Image Reconstruction Framework with Spike Camera},
  author={Chen, Kang and Zheng, Yajing and Huang, Tiejun and Yu, Zhaofei},
  journal={arXiv preprint arXiv:2501.04477},
  year={2025}
}
```