Describe the bug
During kernel optimization, agents sometimes produce patches that cheat the benchmark, yielding unrealistic speedups of more than 100x. Today we have to inspect each generated patch manually (using Cursor or Claude Code) to decide whether it is legitimate. To address this, we should add a patch verification step to the save_and_test tool so that suspicious or invalid patches are detected automatically before benchmarking.