Release the evaluation dataset splits

Thanks for sharing this amazing work with the community. Could you share the filtered subsets of the datasets you used in your evaluation?

Could you please also release the inference script in the codebase?