
Aqua tests: agentic support #5820

Merged: mikolajpp merged 20 commits into develop from mp/aqua-tests on May 18, 2026

mikolajpp (Collaborator) commented May 5, 2026

Summary

This PR significantly reworks the aqua tests tooling to accommodate model-assisted generation of aqua tests.
We implemented a few tools to assist with setting up a virtual aqua environment and wrote human-oriented aqua documentation that also provides enough context for models.

We tested this setup by having an agent generate two basic aqua tests for the contacts agent.

Changes

  1. Implement two new tools, -ph-fleet and -ph-test-ls. -ph-fleet helps manage aqua snapshots, while -ph-test-ls compiles and lists the available aqua tests.
  2. Adjust the aqua test runner -ph-test to work with snapshots generated by -ph-fleet.
  3. Introduce aqua documentation.
  4. Adjust the backend test runner to use the new tooling.
  5. Change the CI testing fleet to include real notify and bait providers, together with parent galaxies.
  6. Add two model-generated aqua tests for contacts.

How did I test?

I ran the tests.

Risks and impact

  • Safe to rollback without consulting PR author? Yes
  • Affects important code area:
    • Onboarding
    • State / providers
    • Message sync
    • Channel display
    • Notifications
    • Other:

Rollback plan

Revert.

@mikolajpp mikolajpp changed the title from "Aqua tests: agent support" to "Aqua tests: agentic support" May 5, 2026
mikolajpp added 9 commits May 6, 2026 14:55
It takes about a minute to establish an ames flow between
ships that had no previous contact. When creating a snapshot,
we give agents time both to establish ames flows and to process
events generated during desk reloading.

The expose agent updates the user's citations in the contacts profile
on each reload. This causes non-deterministic behavior, where the
profile after a restart depends on whether expose was updated.
This is especially a problem in aqua tests. We fix this by
preventing expose from generating updates when there are no
citations.
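
The guard described in the commit message above can be illustrated with a small sketch. This is not the actual Hoon code from the expose agent. It is a Python model under stated assumptions: `Profile`, `expose_update`, and `apply_reload` are hypothetical names, and the `revision` counter stands in for whatever makes a profile observably different after an update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical stand-in for a contacts profile.
    citations: tuple = ()
    revision: int = 0


def expose_update(citations):
    """Return the update to publish on reload, or None if there is nothing to publish."""
    if not citations:
        # The fix: with no citations, no update is generated at all.
        return None
    return tuple(citations)


def apply_reload(profile, citations):
    """Model one reload: the profile changes only if an update was generated."""
    update = expose_update(citations)
    if update is None:
        return profile
    return Profile(citations=update, revision=profile.revision + 1)
```

With this guard, reloading any number of times without citations leaves the profile identical, so a test that compares profiles after a restart does not depend on how many reloads happened.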
@mikolajpp mikolajpp marked this pull request as ready for review May 13, 2026 06:41


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aeaa1a2a53


Comment thread backend/run-aqua-tests.sh Outdated
Comment thread desk/ted/ph/test-ls.hoon

@Fang- Fang- self-requested a review May 13, 2026 09:10
@mikolajpp mikolajpp requested a review from arthyn May 13, 2026 09:26
Fang- (Member) left a comment

Very cool. Fair improvements to the setup/structure overall.

Haven't reviewed the .md files closely yet, but output looks good. Tests are particularly well-suited for generation because they sort of naturally gravitate towards the more-verbose, linear style of hoon that the models find easy to write.

Some comments and questions below!

Comment thread backend/run-aqua-tests.sh Outdated
Comment thread desk/lib/ph/test.hoon Outdated
Comment thread desk/ted/ph/test.hoon Outdated
Comment thread desk/ted/ph/fleet.hoon Outdated
Comment thread desk/ted/ph/fleet.hoon
Comment thread desk/tests/ph/app/contacts.hoon
- Do not hardcode thread ids in the aqua test runner
- Do not hardcode any ships in the `ph-fleet` thread
- Minor refactoring
@mikolajpp (Collaborator, Author)

Closes TLON-5746

Fang- (Member) commented May 15, 2026

Changes look good, thank you. CI isn't happy, but the trace (/lib/strandio/hoon:<[740 13].[740 27]>) doesn't make sense for our /desk/lib/strandio.hoon. Looks like it could be the one from urbit/urbit's master branch though. Are we copying these in, in the wrong order?

That may not be the root cause, though; we may just be trying to eval bad code. Given it's complaining about @tatid, it could be related to the thread id tweaking.

Fang- (Member) left a comment

All green!


@mikolajpp mikolajpp merged commit 9975d42 into develop May 18, 2026
4 checks passed
@mikolajpp mikolajpp deleted the mp/aqua-tests branch May 18, 2026 00:36