Skip to content

Agent Cheat on Kernel Optimization #254

Description

@yueliu14

Describe the bug

During kernel optimization, agents may introduce cheating behavior that produces unrealistic speedups of over 100x. At present, we have to manually inspect generated patches using Cursor or Claude Code to determine whether they are legitimate. To address this, we should add a patch verification step to the save_and_test tool so that suspicious or invalid patches can be automatically detected before benchmarking.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions