reranking dataset #4

nabsabraham · 2024-10-10T03:30:04Z

Hello! Thank you for this IR dataset!
The excerpt below says there are 100 queries for reranking, but on Hugging Face there are 5000 queries (of which only 4991 are unique). Am I looking in the wrong place for the rerank eval dataset?

Additionally, for a smaller 100-question IR/Rerank evaluation, the dataset contains one positive for each question and 99 false negatives extracted by hard-negative mining using BM25 and sentence vector models.

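from datasets import load_dataset

# Load the "eval" config and convert its "eval" split to a pandas DataFrame
ds = load_dataset("hotchpotch/JaCWIR", "eval")
df = ds['eval'].to_pandas()
# Number of unique queries vs. total number of rows
print(df['query'].nunique(), len(df['query']))
# 4991 5000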
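
To see which queries are repeated, something like the following might help. It is only a minimal sketch using the standard library: `duplicated_queries` is a hypothetical helper name, and the `sample` list is toy data standing in for `df['query'].tolist()`, not real JaCWIR queries.

```python
from collections import Counter


def duplicated_queries(queries):
    """Return {query: count} for every query that appears more than once."""
    counts = Counter(queries)
    return {q: n for q, n in counts.items() if n > 1}


# Toy data (assumption): replace with df['query'].tolist() for the real dataset.
sample = ["q1", "q2", "q1", "q3", "q2", "q1"]
dups = duplicated_queries(sample)
print(dups)

# Total rows minus unique queries gives the number of surplus rows;
# for the counts reported above that would be 5000 - 4991 = 9.
print(len(sample) - len(set(sample)))
```

Passing the real query column through this helper should show whether the 9 surplus rows come from a handful of queries repeated many times or from several queries repeated once each.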
