feat(accuracy): BigBench-Hard DeepEval-backed benchmark (AIP-878) #924
Codecov / codecov/patch
succeeded
May 23, 2026 in 0s
95.00% of diff hit (target 82.03%)