I'm passionate about Video Understanding and Multi-modal Large Language Models (MLLMs).
- MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations.
  Kyungho Bae, Jinhyung Kim, Sihaeng Lee, Soonyoung Lee, Gunhee Lee, and Jinwoo Choi*.
- ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning.
  Jongseo Lee†, Kyungho Bae†, Kyle Min, Gyeongmoon Park, and Jinwoo Choi*.
- DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding.
  Kyungho Bae†, Geo Ahn†, Youngrae Kim†, and Jinwoo Choi.
  ECCV 2024 Oral (acceptance rate = 2.33%).
- GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap.
  Hyogun Lee†, Kyungho Bae†, Yumin Ko, Seongjong Ha, Gyeongmoon Park, and Jinwoo Choi*.
  WACV 2024.
- Video Understanding
- Multi-modal Large Language Models (MLLMs)