Merge v4_Researcher → GEAK_v4: Deep Research Agent (DRA) for kernel_workflow by Umangatamd · Pull Request #293 · AMD-AGI/GEAK

Umangatamd · 2026-06-23T06:16:10Z

Promotes the Deep Research Agent (DRA) work from v4_Researcher into GEAK_v4 (via #291).

What DRA does

An opt-in Research phase in kernel_workflow (gated by dra_enabled, default off). After profiling, DRA:

- extracts kernel facts and ranked bottleneck/design hypotheses,
- generates ranked research questions (bottleneck + design-space / alternative-implementation),
- researches them in parallel via native WebSearch/WebFetch,
- synthesizes a compact, ranked portfolio of optimization directions → deep_search_brief.md (full evidence in deep_search.md).

The brief is handed to the TechLead planner as advisory suggestions: the optimizer does its own profile/code analysis first, then decides which (if any) to adopt.

Results (A/B: no-DRA vs DRA, budget=3)

| Kernel | no-DRA | DRA | Δ |
|---|---|---|---|
| KNN (HIP) | 9.25x | 11.70x | +2.45x (+27%) |
| gemm_a16wfp4 (Triton) | 1.539x | 1.531x | ~tie |
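
As a quick check on the Δ column, the KNN gain can be recomputed from the two speedups in the table. This is only a worked-arithmetic sketch; the variable names are illustrative and the inputs are the figures reported above. The relative gain comes out at about 26.5%, close to the +27% shown in the table.

```python
# Speedups copied from the KNN (HIP) row of the results table.
no_dra_speedup = 9.25
dra_speedup = 11.70

# Absolute gain in "x" units, and the gain relative to the no-DRA run.
absolute_gain = dra_speedup - no_dra_speedup
relative_gain = absolute_gain / no_dra_speedup

print(f"absolute gain: +{absolute_gain:.2f}x")
print(f"relative gain: +{relative_gain:.1%}")
```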

On KNN, the DRA gain traces to adopted brief directions — warp-cooperative WarpSelect (wave64), Template<K> scratch-spill elimination into VGPRs, and wrapper/output-layout fixes.

Add the `researcher` persona (kernel_workflow/roles/researcher.md): a v4-native Deep Research Agent that mirrors v3's Stage 0-7 pipeline (fact extraction → ranked research questions → per-question native web research → optional blindspot critique → ranked-directions portfolio), with phase contracts for research_plan / research_question / research_blindspot / research_synthesize.

Wire a new opt-in phase('Research') into kernel_workflow.js after Profile and before the optimize loop, gated behind args.dra_enabled (default off, so existing runs stay byte-identical). The phase fans research questions out in parallel via parallel(), wraps every research agent in the agentT() hang-guard so that a hung research agent resolves to null instead of wedging the round barrier, and writes deep_search.md / deep_search_brief.md / deep_search.json into EVAL_DIR.

Add the RESEARCH_PLAN/QUESTION/BLINDSPOT/RESEARCH schemas and thread the brief path into tech_lead plan_round.
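
The parallel fan-out with a hang-guard described in this commit can be illustrated with a small Python analogue. This is a sketch under stated assumptions, not the code in kernel_workflow.js: the signatures of parallel() and agentT() are not shown on this page, so `guarded`, `research_question`, and `run_research_phase` are hypothetical names, and a timeout stands in for whatever hang detection the real guard uses.

```python
import asyncio


async def guarded(coro, timeout_s):
    """Hang-guard: return the coroutine's result, or None if it times out."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def research_question(question, delay_s):
    """Hypothetical stand-in for one research agent; delay_s simulates latency."""
    await asyncio.sleep(delay_s)
    return f"findings for: {question}"


async def run_research_phase(questions, timeout_s):
    """Fan all questions out concurrently; a hung agent yields None, not a stall."""
    tasks = [guarded(research_question(q, d), timeout_s) for q, d in questions]
    return await asyncio.gather(*tasks)


# One fast agent and one that "hangs" well past the timeout.
questions = [
    ("memory-bound top-k selection", 0.01),
    ("a question whose agent hangs", 5.0),
]
results = asyncio.run(run_research_phase(questions, timeout_s=0.2))
answered = [r for r in results if r is not None]
print(results)
```

Because each agent is wrapped individually, the gather step always completes within roughly the timeout, and the synthesis step can simply skip the None entries.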

… brief

plan_round now Reads EVAL_DIR/deep_search_brief.md (when DEEP_SEARCH_BRIEF is set) and seeds directions[] from the ranked DRA directions, carrying over v3's hard-won lessons:

- DIVERSIFY: spread different ranked directions across parallel engineers, always keep >=1 free explorer slot, and never anchor all engineers on one theme.
- Treat HIGH-CEILING rewrites (raw-HIP/load_inline, HIP/CUDA graph capture, algorithmic reformulation) as FIRST-CLASS, not secondary.
- Don't over-prescribe (idea/mechanism only).

The brief is a prior, never a cage: profile/per-case data and measurement still rule. No-op when the brief path is empty.

Add WebSearch + WebFetch to interface/run_e2e.py ALLOWED_TOOLS so the Deep Research Agent's per-question research agents can do native web research. Harmless when dra_enabled is off (nothing opts into them).
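
A minimal sketch of what such an allowlist extension might look like follows. The actual contents of ALLOWED_TOOLS in interface/run_e2e.py are not shown on this page, so the baseline entries and the `is_tool_allowed` helper are assumptions for illustration; only the two added tool names come from the commit message.

```python
# Hypothetical baseline; the real ALLOWED_TOOLS entries are not shown in this PR.
BASE_ALLOWED_TOOLS = ["Read", "Write", "Bash"]

# The two tools this commit adds so research agents can do native web research.
RESEARCH_TOOLS = ["WebSearch", "WebFetch"]

ALLOWED_TOOLS = BASE_ALLOWED_TOOLS + RESEARCH_TOOLS


def is_tool_allowed(tool_name, allowed=ALLOWED_TOOLS):
    """Illustrative check: a tool may be used only if it is on the allowlist."""
    return tool_name in allowed


print(is_tool_allowed("WebSearch"))
print(is_tool_allowed("Shell"))
```

Allowlisting a tool only makes it available; as the commit notes, nothing requests these tools unless dra_enabled is set.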

Document the opt-in Research phase (Stage 0-7 flow, parallel fan-out + hang-guard, brief->plan_round handoff with diversity + de-conservatism), the dra_enabled / dra_max_questions / dra_blindspot / dra_max_blindspots args, the deep_search.* artifacts + research/ trail, the researcher role, and the WebSearch/WebFetch allowlist requirement.
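
The four arguments and the opt-in gating can be pictured with the sketch below. Only the argument names and the "dra_enabled defaults to off" behavior come from this PR; the other default values, the `DraArgs` class, the `planned_phases` helper, and the "Optimize" phase label are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraArgs:
    dra_enabled: bool = False     # stated in the PR: default off
    dra_max_questions: int = 4    # assumed default, not given in the PR
    dra_blindspot: bool = False   # assumed default, not given in the PR
    dra_max_blindspots: int = 1   # assumed default, not given in the PR


def planned_phases(args):
    """Return the phase order; Research appears only when DRA is opted into."""
    phases = ["Profile"]
    if args.dra_enabled:
        phases.append("Research")
    phases.append("Optimize")  # placeholder label for the optimize loop
    return phases


print(planned_phases(DraArgs()))
print(planned_phases(DraArgs(dra_enabled=True)))
```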

CONCERN 1 (fusion): a single-kernel DRA could overlook fusion entirely. Add a "Fusion & kernel scope" section + a fusion angle to research_plan question generation + a synthesis rule so fusion is never buried: intra-kernel fusion (collapse dispatches / fold epilogue) is surfaced as a first-class executable direction; cross-kernel fusion (merge with an adjacent op) is recorded as an e2e-level ESCALATION in open_measurements (the single-kernel layer can't extract a neighbor against its immutable single-op oracle) rather than lost. The researcher must not propose keeping an op standalone against an upstream fusion.

CONCERN 2 (advisory-not-dominant): add an explicit "You are ADVISORY, not the decision-maker" section and reframe the Stage 7 portfolio as suggestions to be vetted against the profile, never mandates.

…nant

Rewrite plan_round rule 2b so the TechLead remains THE decision-maker and the DRA brief cannot regress into v3-style anchoring:

- The brief is ADVISORY/OPTIONAL, not a plan to execute; critically evaluate each Dk against THIS kernel's profile/per-case data and reject/ignore ones that don't fit.
- The DRA NEVER fills 100% of the round: always generate >=1 of the TechLead's own profile-driven directions, keep >=1 free explorer slot, and let the brief seed at most BUDGET_REMAINING-1 directions.
- DIVERSIFY: spread different Dk across engineers, never converge on one theme.
- HIGH-CEILING directions are first-class WHEN they fit the profile.
- FUSION: intra-kernel fusion is a normal direction; a cross-kernel-fusion escalation is NOT executable here, so leave it as the researcher's note.
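
The cap in rule 2b can be sketched as a small slot-allocation helper. This is an illustration only: in the PR the TechLead applies these rules by judgment inside a prompt, not through code, and the commit message does not say how the free explorer slot counts against the budget. The `seed_round` function and its labels are hypothetical, and the sketch assumes the one reserved non-brief slot goes to a TechLead direction, falling back to a free explorer slot when none is available.

```python
def seed_round(brief_directions, own_directions, budget_remaining):
    """Pick directions for one round so the brief never fills every slot.

    brief_directions: ranked DRA suggestions the TechLead has already vetted.
    own_directions:   the TechLead's own profile-driven directions, ranked.
    """
    if budget_remaining <= 0:
        return []
    # The brief may seed at most BUDGET_REMAINING-1 directions.
    n_brief = min(len(brief_directions), budget_remaining - 1)
    picked = [("brief", d) for d in brief_directions[:n_brief]]
    # The remaining slots (always at least one) go to the TechLead's own ideas.
    n_own = budget_remaining - n_brief
    picked += [("own", d) for d in own_directions[:n_own]]
    # Any slot still unfilled is left as a free explorer slot.
    while len(picked) < budget_remaining:
        picked.append(("explorer", None))
    return picked


plan = seed_round(["D1", "D2", "D3"], ["coalesce global loads"], budget_remaining=3)
print(plan)
```

With a budget of 3 and three brief suggestions, only D1 and D2 are seeded from the brief, and the last slot stays with the TechLead.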

… first

Strengthen the advisory framing so the DRA brief is unambiguously a set of SUGGESTIONS to consider, not directives. plan_round now mandates an explicit order: the TechLead does its OWN independent profile/code analysis and forms its own candidate directions FIRST, then consults the brief and decides by its own judgment which (if any) suggestions to adopt; it is free to adapt, ignore, or reject all of them (adopting none is valid). The researcher.md synthesis tone is reworded to "consider/one option is" rather than imperative. The existing diversity, free-explorer, high-ceiling-first, and fusion rules are preserved. node --check passes.

feat(dra): Deep Research Agent (DRA) for kernel_workflow

zihaoanllm · 2026-06-24T02:47:40Z

Could you also add the runtime overhead introduced by DRA?

In addition, DRA does not appear to provide much benefit for the LLM head kernels based on the current results. Could you include more head-kernel benchmark results to better evaluate its effectiveness in that scenario?

Umangatamd · 2026-06-24T03:17:48Z

Runtime overhead: Right now the DRA research phase adds roughly 20% to the run's wall-clock time. It is a one-time, opt-in cost paid before the optimize loop, and it is tunable via the question/blindspot budget.

Head kernels: Agreed it's worth more coverage there. Could you recommend a few head kernels you'd like benchmarked? Happy to run the no-DRA vs DRA A/B on them.

dra and others added 9 commits June 22, 2026 03:25

feat(dra): allowlist WebSearch/WebFetch for the research agents

af54f57

docs(dra): list the opt-in Research phase in kernel_workflow meta.phases

88944c4

Merge pull request #291 from AMD-AGI/feat/dra-researcher

a59ca84

feat(dra): Deep Research Agent (DRA) for kernel_workflow

Umangatamd requested a review from zihaoanllm June 23, 2026 11:28


Umangatamd wants to merge 9 commits into GEAK_v4 from v4_Researcher
