Bug in qwen2-vl ? #552

LiJunscs · 2025-02-23T12:27:18Z

  for i in range(len(contexts)):
      if "<image>" in contexts[i]:
          contexts[i] = contexts[i].replace("<image>", "")

  messages = []
  processed_visuals = []
  for i, context in enumerate(contexts):
      if "<image>" in context:
          context = context.replace("<image>", "")

Code above is from the Qwen2_VL's generate_until function,. It replaces all the "<image>" token to empty string. When I evaluate the longvideobench with subtitles, it interleaves and subtitle text. In my opinion， it seems to evaluate in this format, but the implement of qwen2vl makes it only include text. Is this a bug?

Additionally, I see that the Qwen2_5_VL delete these code.

kcz358 · 2025-02-25T02:54:50Z

Because this is a dummy image token in LLaVA, so I don't think it would make any effect for Qwen2-VL. To truly interleave, the inputs are packed with messages format

LiJunscs · 2025-02-25T03:08:48Z

Because this is a dummy image token in LLaVA, so I don't think it would make any effect for Qwen2-VL. To truly interleave, the inputs are packed with messages format

Thank you for your answer.
Another question, the option to w/wo subtitiles is false default.

Does this meet the requirements of the benchmark?

kcz358 · 2025-02-25T05:08:26Z

I think there are 2 splits, one is interleave one is v right? Does this correspond to with and without subtitle?

LiJunscs · 2025-02-25T14:36:36Z

I think there are 2 splits, one is interleave one is v right? Does this correspond to with and without subtitle?

  if lmms_eval_specific_kwargs.get("insert_interleave_subtitles", False):
      with open(Path(__file__).parent / "longvideobench_val_i.yaml", "r") as f:
          raw_data = f.readlines()
          safe_data = []
          for i, line in enumerate(raw_data):
              # remove function definition since yaml load cannot handle it
              if "!function" not in line:
                  safe_data.append(line)
      cache_name = yaml.safe_load("".join(safe_data))["dataset_kwargs"]["cache_dir"]
      subtitle_subdir_name = yaml.safe_load("".join(safe_data))["dataset_kwargs"].get("subtitle_subdir", "subtitles")
      cache_dir = os.path.join(base_cache_dir, cache_name, subtitle_subdir_name)
      with open(os.path.join(cache_dir, doc["subtitle_path"])) as f:
          subtitles = json.load(f)

      max_num_frames = yaml.safe_load("".join(safe_data))["dataset_kwargs"].get("max_num_frames", 16)

      frame_timestamps = compute_frame_timestamps(doc["duration"], max_num_frames)
      interleaved_prefix = insert_subtitles_into_frames(frame_timestamps, subtitles, doc["starting_timestamp_for_subtitles"], doc["duration"])
      return f"{pre_prompt}{interleaved_prefix}\n{question}\n{post_prompt}"
  else:
      return f"{pre_prompt}{question}\n{post_prompt}"

This is the code segment to construct the prompt of longvideobench.
If the insert_interleaved_subtitles is false(default is false), there is no subtitles.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in qwen2-vl ? #552

Bug in qwen2-vl ? #552

LiJunscs commented Feb 23, 2025 •

edited

Loading

kcz358 commented Feb 25, 2025

LiJunscs commented Feb 25, 2025

kcz358 commented Feb 25, 2025 •

edited

Loading

LiJunscs commented Feb 25, 2025

Bug in qwen2-vl ? #552

Bug in qwen2-vl ? #552

Comments

LiJunscs commented Feb 23, 2025 • edited Loading

kcz358 commented Feb 25, 2025

LiJunscs commented Feb 25, 2025

kcz358 commented Feb 25, 2025 • edited Loading

LiJunscs commented Feb 25, 2025

LiJunscs commented Feb 23, 2025 •

edited

Loading

kcz358 commented Feb 25, 2025 •

edited

Loading