Possible typo in WebQA retrieval R@5 result (71.96 → 75.96?) #25

@ChaoyiZh

Hi, thank you for the great work! We have reproduced the MMQA results and they match the paper exactly. However, when reproducing the WebQA CLIP retrieval results, we found a potential typo in the reported R@5 value.
We evaluated using the provided pre-built FAISS index and CLIP-ViT-L/14-336px, running `utils/indexing_faiss.py` with `--datasets WebQA --clip_type clip --topk {2,5,10}`.
Our results:

| Metric | Paper | Ours |
| --- | --- | --- |
| R@2 | 57.10 | 57.04 |
| R@5 | 71.96 | 75.96 |
| R@10 | 84.86 | 84.84 |

R@2 and R@10 agree to within 0.06 points, which is small enough to attribute to floating-point precision. R@5, however, differs by 4.00 points. Since the retrieval is fully deterministic (same model, same pre-built IndexFlatIP index), a gap of this size is unexpected. The two values, 71.96 and 75.96, differ in a single digit (1 vs 5), which looks like a typo in the paper. Could you confirm the correct R@5 value?
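
To make the determinism argument concrete, here is a minimal NumPy sketch of exhaustive inner-product retrieval (the kind of exact search an IndexFlatIP index performs) followed by a recall computation. It is not the repository's code: `topk_inner_product` and `recall_at_k` are hypothetical helpers, the data is random, and the recall definition used here (a query counts as a hit if at least one gold item appears in the top k) is an assumption that may differ from what `utils/indexing_faiss.py` actually computes.

```python
import numpy as np


def topk_inner_product(queries, corpus, k):
    """Exhaustive inner-product search: indices of the k best corpus rows per query."""
    scores = queries @ corpus.T
    # Stable sort on negated scores, so ties always break toward the lower index.
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]


def recall_at_k(retrieved, gold):
    """Percentage of queries with at least one gold item among the retrieved indices.

    Assumed definition, for illustration only.
    """
    hits = [bool(set(row.tolist()) & set(g)) for row, g in zip(retrieved, gold)]
    return 100.0 * sum(hits) / len(hits)


# Synthetic stand-ins for the CLIP embeddings and gold labels (hypothetical data).
rng = np.random.default_rng(0)
corpus = rng.standard_normal((200, 16)).astype(np.float32)
queries = rng.standard_normal((50, 16)).astype(np.float32)
gold = [[int(i)] for i in rng.integers(0, 200, size=50)]

ks = (2, 5, 10)
results = {k: recall_at_k(topk_inner_product(queries, corpus, k), gold) for k in ks}
rerun = {k: recall_at_k(topk_inner_product(queries, corpus, k), gold) for k in ks}

for k in ks:
    print(f"R@{k}: {results[k]:.2f} (rerun: {rerun[k]:.2f})")
```

With fixed embeddings and an exact search, repeated runs give identical numbers, and recall can only grow as k increases because each top-k list is a prefix of the next. That monotonicity check does not settle the question here, since both 71.96 and 75.96 lie between the reported R@2 and R@10.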
