Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in qwen2-vl ? #552

Open
LiJunscs opened this issue Feb 23, 2025 · 4 comments
Open

Bug in qwen2-vl ? #552

LiJunscs opened this issue Feb 23, 2025 · 4 comments

Comments

@LiJunscs
Copy link

LiJunscs commented Feb 23, 2025

  for i in range(len(contexts)):
      if "<image>" in contexts[i]:
          contexts[i] = contexts[i].replace("<image>", "")

  messages = []
  processed_visuals = []
  for i, context in enumerate(contexts):
      if "<image>" in context:
          context = context.replace("<image>", "")

Code above is from the Qwen2_VL's generate_until function,. It replaces all the "<image>" token to empty string. When I evaluate the longvideobench with subtitles, it interleaves and subtitle text. In my opinion, it seems to evaluate in this format, but the implement of qwen2vl makes it only include text. Is this a bug?

Additionally, I see that the Qwen2_5_VL delete these code.

@kcz358
Copy link
Collaborator

kcz358 commented Feb 25, 2025

Because this is a dummy image token in LLaVA, so I don't think it would make any effect for Qwen2-VL. To truly interleave, the inputs are packed with messages format

@LiJunscs
Copy link
Author

Because this is a dummy image token in LLaVA, so I don't think it would make any effect for Qwen2-VL. To truly interleave, the inputs are packed with messages format

Thank you for your answer.
Another question, the option to w/wo subtitiles is false default.

Does this meet the requirements of the benchmark?

@kcz358
Copy link
Collaborator

kcz358 commented Feb 25, 2025

I think there are 2 splits, one is interleave one is v right? Does this correspond to with and without subtitle?

@LiJunscs
Copy link
Author

I think there are 2 splits, one is interleave one is v right? Does this correspond to with and without subtitle?

  if lmms_eval_specific_kwargs.get("insert_interleave_subtitles", False):
      with open(Path(__file__).parent / "longvideobench_val_i.yaml", "r") as f:
          raw_data = f.readlines()
          safe_data = []
          for i, line in enumerate(raw_data):
              # remove function definition since yaml load cannot handle it
              if "!function" not in line:
                  safe_data.append(line)
      cache_name = yaml.safe_load("".join(safe_data))["dataset_kwargs"]["cache_dir"]
      subtitle_subdir_name = yaml.safe_load("".join(safe_data))["dataset_kwargs"].get("subtitle_subdir", "subtitles")
      cache_dir = os.path.join(base_cache_dir, cache_name, subtitle_subdir_name)
      with open(os.path.join(cache_dir, doc["subtitle_path"])) as f:
          subtitles = json.load(f)

      max_num_frames = yaml.safe_load("".join(safe_data))["dataset_kwargs"].get("max_num_frames", 16)

      frame_timestamps = compute_frame_timestamps(doc["duration"], max_num_frames)
      interleaved_prefix = insert_subtitles_into_frames(frame_timestamps, subtitles, doc["starting_timestamp_for_subtitles"], doc["duration"])
      return f"{pre_prompt}{interleaved_prefix}\n{question}\n{post_prompt}"
  else:
      return f"{pre_prompt}{question}\n{post_prompt}"

This is the code segment to construct the prompt of longvideobench.
If the insert_interleaved_subtitles is false(default is false), there is no subtitles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants