How to test my own VLA model on this benchmark?

Hi, 
I’m looking to test my own VLA model on the LIBERO-Plus benchmark, and I want to reproduce the pi_0 results (Table 10 in the paper) first. Since I couldn't find any ready-made evaluation scripts in the repository, do I need to implement the pi_0 policy inference logic myself and write a custom script to evaluate it across all 10,030 tasks?
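
To make the question concrete, here is a rough sketch of the kind of custom script I have in mind. Everything in it is a placeholder of my own and not taken from the repository: the `Policy` and `Env` interfaces, `run_episode`, `evaluate`, and the toy environment and policy are hypothetical stand-ins for the real pi_0 inference code and the benchmark's task environments.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class Policy(Protocol):
    """Hypothetical policy interface: maps an observation and instruction to an action."""
    def act(self, observation: dict, instruction: str) -> List[float]: ...


class Env(Protocol):
    """Hypothetical environment interface (not the benchmark's actual API)."""
    def reset(self) -> dict: ...
    def step(self, action: List[float]) -> Tuple[dict, bool]: ...


def run_episode(policy: Policy, env: Env, instruction: str, max_steps: int) -> bool:
    """Roll out one episode; return True if the env reports success in time."""
    obs = env.reset()
    for _ in range(max_steps):
        action = policy.act(obs, instruction)
        obs, success = env.step(action)
        if success:
            return True
    return False


def evaluate(
    policy: Policy,
    tasks: Dict[str, Tuple[Callable[[], Env], str]],
    episodes_per_task: int = 1,
    max_steps: int = 300,
) -> Dict[str, float]:
    """Return the success rate per task; `tasks` maps name -> (env factory, instruction)."""
    results = {}
    for name, (make_env, instruction) in tasks.items():
        wins = sum(
            run_episode(policy, make_env(), instruction, max_steps)
            for _ in range(episodes_per_task)
        )
        results[name] = wins / episodes_per_task
    return results


class ToyEnv:
    """Stand-in env that reports success after a fixed number of steps."""
    def __init__(self, steps_to_success: int):
        self.steps_to_success = steps_to_success
        self.t = 0

    def reset(self) -> dict:
        self.t = 0
        return {"t": self.t}

    def step(self, action: List[float]) -> Tuple[dict, bool]:
        self.t += 1
        return {"t": self.t}, self.t >= self.steps_to_success


class ZeroPolicy:
    """Stand-in policy that always outputs a 7-dimensional zero action."""
    def act(self, observation: dict, instruction: str) -> List[float]:
        return [0.0] * 7


if __name__ == "__main__":
    toy_tasks = {
        "easy": (lambda: ToyEnv(5), "pick up the object"),
        "hard": (lambda: ToyEnv(1000), "place the object"),
    }
    print(evaluate(ZeroPolicy(), toy_tasks, episodes_per_task=2, max_steps=10))
```

In a real run, `ToyEnv` and `ZeroPolicy` would be replaced by the benchmark environments and the actual model, and the per-task success rates would be averaged for comparison against the paper's table. Is this roughly the intended workflow, or is there an official script I missed?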