Scope: DoWhiz_service only.
Status legend:
- `AUTO`: runs in CI/local without external paid services (may still require local toolchain)
- `LIVE`: requires real external credentials/services/accounts
- `MANUAL`: script-assisted or exploratory verification
- `PLANNED`: coverage gap to implement later

- For every `DoWhiz_service` code change, run all relevant AUTO suites below.
- For LIVE/MANUAL/PLANNED, report `SKIP` with reason unless explicitly executed.
- If env/infra prevents a relevant AUTO suite from running, report `SKIP` with blocker details.
- Runtime env policy in tests follows production behavior:
  - runtime `.env` uses unprefixed keys
  - `DEPLOY_TARGET` is policy metadata, not shell remapping
  - runtime

| Test ID | Command | Scope | When Required |
|---|---|---|---|
| AUTO-RUN-01 | `cargo test -p run_task_module` | run_task core/unit/integration coverage | Any `run_task_module` change |
| AUTO-MAIL-01 | `cargo test -p send_emails_module` | Postmark payload construction + module behavior | Any `send_emails_module` change |
| AUTO-SCH-01 | `cargo test -p scheduler_module --test scheduler_basic` | scheduler core lifecycle via Mongo-backed scheduler store | Any scheduler logic change (`MONGODB_URI` required; mark `SKIP` with blocker details when unavailable) |
| AUTO-SCH-02 | `cargo test -p scheduler_module --test scheduler_agent_e2e` | scheduler + run_task integration path | scheduler/run_task orchestration changes |
| AUTO-SCH-03 | `cargo test -p scheduler_module --test email_html_e2e` | inbound email HTML handling | email ingress/sanitization changes |
| AUTO-SCH-04 | `cargo test -p scheduler_module --test email_html_e2e_2` | advanced HTML/body fallback behavior | email ingress/sanitization changes |
| AUTO-SCH-05 | `cargo test -p scheduler_module --test github_env_e2e` | GitHub/x402 env propagation | env injection / github/x402 changes |
| AUTO-SCH-06 | `cargo test -p scheduler_module --test memory_e2e` | workspace memory sync | memory sync changes |
| AUTO-SCH-07 | `cargo test -p scheduler_module --test secrets_e2e` | per-user secrets sync | secret sync changes |
| AUTO-SCH-08 | `cargo test -p scheduler_module --test scheduler_followups` | scheduled follow-up persistence | follow-up/scheduler action changes |
| AUTO-SCH-09 | `cargo test -p scheduler_module --test scheduler_concurrency` | scheduler concurrency behavior | concurrency/throughput changes |
| AUTO-SCH-10 | `cargo test -p scheduler_module --test send_reply_outbound_e2e` | multi-channel outbound adapters (mocked) | outbound adapter changes |
| AUTO-SCH-11 | `cargo test -p scheduler_module --test scheduler_retry_notifications_e2e` | retry + notification behavior, including transient Codex retry alerts | retry/notification changes (`MONGODB_URI` required) |
| AUTO-SCH-12 | `cargo test -p scheduler_module --test scheduler_retry_notifications_slack_e2e` | Slack retry notifications | Slack retry changes (`MONGODB_URI` required) |
| AUTO-SCH-13 | `cargo test -p scheduler_module --test scheduler_x402_env_e2e` | scheduler x402 env bridge | x402/env bridge changes |
| AUTO-SCH-14 | `cargo test -p scheduler_module --test thread_latest_epoch_e2e` | stale-thread cancellation, latest-epoch rule | thread state / email race changes |
| AUTO-SCH-15 | `cargo test -p scheduler_module` | broad scheduler_module sweep (includes unit + integration with env-gated skips) | major scheduler/gateway refactors |
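
The `MONGODB_URI` requirement noted for some suites above can be checked before invoking `cargo`. The following is a minimal sketch, assuming a POSIX shell; the function names `needs_mongo` and `suite_preflight` and the output format are hypothetical, and only the suite IDs and the `MONGODB_URI` requirement come from the table.

```shell
# Hypothetical helper, not part of the repository.
# Returns success for the suites the table marks as requiring MONGODB_URI.
needs_mongo() {
  case "$1" in
    AUTO-SCH-01|AUTO-SCH-11|AUTO-SCH-12) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints "<id> RUN" when the suite can be attempted, or a SKIP line
# with blocker details when MONGODB_URI is required but unset or empty.
suite_preflight() {
  id="$1"
  if needs_mongo "$id" && [ -z "${MONGODB_URI:-}" ]; then
    echo "$id SKIP blocker: MONGODB_URI is not set"
  else
    echo "$id RUN"
  fi
}
```

One possible use is to call `suite_preflight AUTO-SCH-01` first, run the matching `cargo test` command only when the line ends in `RUN`, and otherwise paste the `SKIP` line into the verification notes. The check only confirms that the variable is set, not that the database is reachable.
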
| Test ID | Command / Script | Scope | Required Env |
|---|---|---|---|
| LIVE-SCH-01 | `cargo test -p scheduler_module --test service_real_email -- --nocapture` | real inbound/outbound email flow | `RUST_SERVICE_LIVE_TEST=1` + Postmark + public hook URL (`POSTMARK_TEST_HOOK_URL`/`POSTMARK_INBOUND_HOOK_URL`; ngrok only for local tunneling) |
| LIVE-SCH-02 | `cargo test -p scheduler_module --test google_docs_cli_e2e -- --nocapture` | real Google Docs CLI behavior | Google OAuth creds + target docs |
| LIVE-SCH-03 | `cargo test -p scheduler_module --test unified_memo_e2e -- --ignored --nocapture` | unified memo/account/blob flow | service URL + Supabase + Azure Blob creds + test account |
| LIVE-SCH-04 | `cargo test -p scheduler_module --test billing_e2e -- --nocapture` | billing/account DB logic vs real DB | `SUPABASE_DB_URL` + test account data |
| LIVE-SCH-05 | `cargo test -p scheduler_module --test email_verification_e2e -- --nocapture` | email verification token flows | `SUPABASE_DB_URL` + test account data |
| LIVE-MAIL-01 | `POSTMARK_LIVE_TEST=1 cargo test -p send_emails_module -- --nocapture` | real Postmark delivery tests | Postmark credentials |

| Test ID | Script / Action | Scope |
|---|---|---|
| MAN-OPS-01 | `DoWhiz_service/scripts/test_auth_api.sh` | `/auth/*` endpoint roundtrip |
| MAN-OPS-02 | `DoWhiz_service/scripts/test_auth_link_only.sh` | link/verify flow without full account deletion |
| MAN-OPS-03 | `DoWhiz_service/scripts/test_blob_store.sh` | Azure Blob upload/download/list roundtrip |
| MAN-BB-01 | Send a task to [email protected] that opens `/service/browserbase-handoff-demo?run=<unique-id>`, stops at the blocked page, requests HAG help, then resumes after the human completes the same-tab demo | Deterministic Browserbase same-tab handoff validation on staging |
| MAN-BB-02 | Send a task to [email protected] that signs into Google as [email protected], lets the agent use `GOOGLE_PASSWORD` if available, and uses HAG for any remaining 2FA/device/CAPTCHA blocker | Real Browserbase + HAG login handoff validation on staging |
| MAN-GWS-01 | `DoWhiz_service/scheduler_module/tests/google_workspace_cli_test.sh` | Google Workspace CLI smoke test |
| MAN-GWS-02 | `DoWhiz_service/scheduler_module/tests/google_workspace_e2e_test.sh` | Google Workspace comment workflow smoke test |
| MAN-DRIFT-01 | `cargo test -p run_task_module --test drift_audit -- --ignored --nocapture` | detection-only audit for prompt/skills/CLI contract drift; expected to fail when mismatches are present |

For MAN-BB-01 and MAN-BB-02, capture all of the following in the verification notes:
- The HAG help email showing the top live-browser button/link.
- The live handoff page in the blocked state the agent originally hit.
- The same live page after the human completes the unblock step.
- The final DoWhiz reply showing that the agent resumed and completed the task.
Additional expectations:
- For the demo route, the page must stay in one tab and persist state by `run` via browser localStorage.
- For the real Google flow, prefer the staging admin mailbox [email protected] and allow HAG only after the site is explicitly waiting for the human step.
- If a live-browser button is present, use that path first rather than replying with raw codes unless the page itself requires it.

| Gap ID | Priority | Gap |
|---|---|---|
| GAP-01 | P0 | Explicit stress test for ingestion queue multi-worker claim race |
| GAP-02 | P0 | Deterministic coverage for Azure ACI end-to-end lifecycle (create/run/cleanup) under CI-like conditions |
| GAP-03 | P1 | Automated failure-injection tests for outbound Slack/Discord/SMS API 5xx retry mapping |
| GAP-04 | P1 | Supabase raw payload backend upload/download integration tests (currently mostly env-dependent path checks) |
| GAP-05 | P2 | DST/timezone boundary cron behavior regression suite |
Use this table in verification summaries when requested:
| Test ID | Status (PASS/FAIL/SKIP) | Evidence (short) | Notes / Reason |
|---|---|---|---|
| AUTO-... | | | |

Rules:
- Include every relevant AUTO suite touched by the change.
- Mark each LIVE/MANUAL/PLANNED row as `SKIP` with reason unless executed.
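
The rules above can be applied mechanically when assembling the summary table. Below is a minimal sketch, assuming a POSIX shell; `summary_row` is a hypothetical helper, and the default reason text `not executed` is an assumption rather than a required wording.

```shell
# Hypothetical formatter for one verification-summary row.
# Arguments: test ID, status, evidence, notes.
# An empty status defaults to SKIP for LIVE-/MAN-/GAP- IDs, mirroring the
# rule that those rows are SKIP unless executed. AUTO rows must always
# carry an explicit status, so an empty one is rejected.
summary_row() {
  id="$1"; status="$2"; evidence="$3"; notes="$4"
  if [ -z "$status" ]; then
    case "$id" in
      LIVE-*|MAN-*|GAP-*)
        status="SKIP"
        notes="${notes:-not executed}"
        ;;
      *)
        echo "summary_row: status required for $id" >&2
        return 1
        ;;
    esac
  fi
  printf '| %s | %s | %s | %s |\n' "$id" "$status" "$evidence" "$notes"
}
```

For example, `summary_row LIVE-SCH-01 "" "" "no Postmark credentials"` emits a `SKIP` row with that reason, while an AUTO row is only emitted when the caller passes an explicit `PASS`, `FAIL`, or `SKIP`.
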