Thank you for this detailed survey of LLM judges!
My team is starting a project to build a unified framework for LLM judging, and we would like a way to automatically evaluate each method's human agreement and bias. Would you consider open-sourcing the code you used to produce the results in Section 5 of your paper?
A permissive license like MIT would also be important for us, so we can reuse the code without copyright concerns. That would make things much easier for our project :)