reranking dataset #4

Open
nabsabraham opened this issue Oct 10, 2024 · 0 comments
Comments

@nabsabraham

Hello! Thank you for this IR dataset!
The excerpt below says there are 100 queries for reranking, but on Hugging Face there are 5000 queries (only 4991 of them unique). Am I looking in the wrong place for the rerank eval dataset?

"Additionally, for a smaller 100-question IR/Rerank evaluation, the dataset contains one positive for each question and 99 false negatives extracted by hard-negative mining using BM25 and sentence vector models."

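from datasets import load_dataset

# Load the "eval" config and convert its "eval" split to a DataFrame
ds = load_dataset("hotchpotch/JaCWIR", "eval")
df = ds['eval'].to_pandas()

# Unique query count vs. total row count
print(df['query'].nunique(), len(df['query']))
# 4991 5000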
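Since 5000 rows with 4991 unique queries means 9 rows repeat an earlier query, it may help to see which ones repeat. Below is a minimal sketch of one way to list them. The helper name `duplicated_queries` and the toy `sample` list are my own illustrations, not part of the dataset or of the `datasets` library; the toy list only stands in for `df['query'].tolist()`.

```python
from collections import Counter


def duplicated_queries(queries):
    """Return {query: count} for every query that appears more than once."""
    counts = Counter(queries)
    return {q: n for q, n in counts.items() if n > 1}


# Toy stand-in for df['query'].tolist(); these strings are made up.
sample = ["q1", "q2", "q1", "q3", "q2", "q1"]

dupes = duplicated_queries(sample)
print(dupes)

# Same unique-vs-total comparison as in the snippet above
print(len(set(sample)), len(sample))
```

On the real data this would presumably be called as `duplicated_queries(df['query'].tolist())`, which should show whether the repeated queries are exact duplicates of whole rows or the same question paired with different passages.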