Aqua tests: agentic support #5820
It takes about a minute to establish an ames flow between ships that have had no previous contact. When creating a snapshot, we allow agents both to establish ames flows and to process events generated during desk reloading.
The expose agent updates the user's citations in the contacts profile on each reload. This causes non-deterministic behavior: the profile after a restart depends on whether expose was updated. This is especially a problem in aqua tests. We fix this by preventing expose from generating updates when there are no citations.
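The guard described in this commit can be illustrated with a small language-agnostic sketch. This is not the actual Hoon change: the function names and the `expose-cites` profile field are hypothetical, chosen only to show why skipping the update for an empty citation set keeps the profile identical across reloads.

```python
from typing import Optional


def citations_update(citations: list[str]) -> Optional[dict]:
    """Build a profile update, or return None when there is nothing to publish.

    Hypothetical stand-in for the expose agent's reload logic.
    """
    if not citations:
        # No citations: emit no update, so a reload leaves the profile untouched.
        return None
    # "expose-cites" is an assumed field name, used for illustration only.
    return {"expose-cites": sorted(citations)}


def reload(profile: dict, citations: list[str]) -> dict:
    """Apply the (possibly absent) citations update to a copy of the profile."""
    update = citations_update(citations)
    if update is None:
        return dict(profile)
    return {**profile, **update}
```

With this shape, a ship with no citations ends up with the same profile whether or not expose was reloaded, which is the property the aqua tests rely on.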
Fang- left a comment
Very cool. Fair improvements to the setup/structure overall.
Haven't reviewed the .md files closely yet, but output looks good. Tests are particularly well-suited for generation because they sort of naturally gravitate towards the more-verbose, linear style of hoon that the models find easy to write.
Some comments and questions below!
- Do not hardcode thread ids in the aqua test runner
- Do not hardcode any ships in the `ph-fleet` thread
- Minor refactoring
Closes TLON-5746
Changes look good, thank you. CI isn't happy, but the trace ( That may not be the root cause though, we may just be trying to eval bad code. Given it's complaining about

Summary
This PR significantly reworks the aqua tests tooling to accommodate model-assisted generation of aqua tests.
We implemented a few tools to assist with setting up a virtual aqua environment and wrote human-oriented aqua documentation that also provides enough context for models.
We tested this setup by having an agent generate two basic aqua tests for the contacts agent.
Changes
- Added `-ph-fleet` and `-ph-test-ls`. `-ph-fleet` aids in management of aqua snapshots, while `-ph-test-ls` compiles and lists available aqua tests.
- Updated `-ph-test` to work with snapshots generated by `-ph-fleet`.
How did I test?
I ran the tests.
Risks and impact
Rollback plan
Revert.