Refit for embedded use as well #103
🤖 GPT-5.4 Code Review
NEEDS CHANGES
Real issues
1. Data race on callback registration/use
static ApiEventCallback g_api_event_cb = nullptr;
static void* g_api_event_cb_data = nullptr;
...
These are read from app threads ( This matters even if README says “register before OnLoad”; the API itself exposes mutable global state and
2. Inconsistent shutdown behavior when only API callback is set
In
if (!trace_db::get_kernel_event_callback()) {
    trace_db::get_trace_db().flush();
    trace_db::get_trace_db().close();
}
and in
if (!trace_db::get_kernel_event_callback()) {
    (void)trace_db::get_trace_db();
}
But HIP API interception uses
That contradicts the README claim that “No SQLite file is created” when callbacks are set. The condition should consider either callback, not just the kernel callback.
3. Callback lifetime contract is unsafe as implemented
Both callback record structs carry raw
Those pointers are valid only for the duration of the callback invocation. The README does not state this explicitly, and embedders may reasonably retain the record or enqueue it asynchronously, causing use-after-scope bugs. Not necessarily a bug in current internal use if callbacks consume synchronously, but the API contract is unsafe/underspecified and easy to misuse. Either:
Other checks
Dependency violations
No new references to
Performance
No obvious new hot-path allocation regressions beyond existing behavior. Callback path avoids SQLite, which is good.
Security
No new format-string issue seen.
Recommendation
Fix the callback synchronization and the SQLite gating logic before merging. Also tighten/document the callback string lifetime contract.
Model: gpt-5.4 |
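A minimal sketch of the gating fix from issue 2, assuming the startup and shutdown paths can share one predicate. Only `trace_db::get_kernel_event_callback()` is named in the review; the `sketch` namespace, `get_api_event_callback()`, `any_callback_registered()`, and `should_use_trace_db()` are hypothetical names introduced for illustration, and the callback signatures are assumed. Synchronization (issue 1) is deliberately left out here.

```cpp
// Hypothetical stand-ins for the PR's types and accessors. The callback
// signatures and the API-callback accessor name are assumptions.
namespace sketch {

using ApiEventCallback = void (*)(const void* record, void* user_data);
using KernelEventCallback = void (*)(const void* record, void* user_data);

static ApiEventCallback g_api_event_cb = nullptr;
static KernelEventCallback g_kernel_event_cb = nullptr;

ApiEventCallback get_api_event_callback() { return g_api_event_cb; }
KernelEventCallback get_kernel_event_callback() { return g_kernel_event_cb; }

// True when either embedder callback is set, not just the kernel one.
bool any_callback_registered() {
    return get_api_event_callback() != nullptr ||
           get_kernel_event_callback() != nullptr;
}

// Both the open path and the flush/close path would test this one
// predicate, so they cannot disagree about whether SQLite is in use.
bool should_use_trace_db() { return !any_callback_registered(); }

}  // namespace sketch
```

With a shared predicate, registering only the API callback would skip both the open and the flush/close of the trace database, which is what the README claim requires.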
🤖 GPT-5.4 Code Review
NEEDS CHANGES
Real issues
1. Data race on callback pointers/user-data
static ApiEventCallback g_api_event_cb = nullptr;
static void* g_api_event_cb_data = nullptr;
static KernelEventCallback g_kernel_event_cb = nullptr;
static void* g_kernel_event_cb_data = nullptr;
These are read from app threads and the completion worker without synchronization. Even if the intended contract is “set before OnLoad”, the code also exposes
Why it matters: unsynchronized cross-thread access to non-atomic globals, where at least one access is a write, is a data race and therefore UB in C++.
Fix: use
2.
|
🤖 GPT-5.4 Code Review
NEEDS CHANGES
Real issues
1. Data race on callback pointers/user_data
static ApiEventCallback g_api_event_cb = nullptr;
static void* g_api_event_cb_data = nullptr;
and reads them from multiple threads/hot paths ( The comment says registration happens before
Fix: store callback + user_data atomically as one immutable struct pointer, or use atomics for both with acquire/release semantics and document that changing after startup is unsupported.
2. Exceptions escaping callbacks can terminate/interfere with runtime threads
Both callback paths invoke embedder code directly:
If a C++ callback throws through these interception frames, the behavior is unpredictable at best and can terminate the process. The header comment says “must be noexcept”, but this is not enforced.
Fix: wrap callback invocation in
3.
|
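The first fix suggested above (callback and user_data published together as one immutable struct pointer) could look roughly like the following. This is a sketch under assumptions, not the PR's code: `CallbackSlot`, `register_api_callback()`, and `dispatch_api_event()` are hypothetical names, and the callback signature is assumed.

```cpp
#include <atomic>

// Assumed callback signature; the real ApiEventCallback may differ.
using ApiEventCallback = void (*)(const void* record, void* user_data);

// Callback and user_data travel together, so a reader can never observe
// a function pointer from one registration paired with data from another.
struct CallbackSlot {
    ApiEventCallback fn;
    void* user_data;
};

static std::atomic<const CallbackSlot*> g_api_slot{nullptr};

// Publishes a new immutable slot with release semantics. The previous
// slot is intentionally leaked in this sketch, because a concurrent
// reader may still be using it.
void register_api_callback(ApiEventCallback fn, void* user_data) {
    const CallbackSlot* slot = fn ? new CallbackSlot{fn, user_data} : nullptr;
    g_api_slot.store(slot, std::memory_order_release);
}

// Hot-path read: one acquire load, then a call through the snapshot.
// Returns true if a callback was invoked.
bool dispatch_api_event(const void* record) {
    const CallbackSlot* slot = g_api_slot.load(std::memory_order_acquire);
    if (slot == nullptr) return false;
    slot->fn(record, slot->user_data);
    return true;
}
```

Leaking the old slot keeps the sketch short. If re-registration after startup has to be supported, some reclamation scheme would be needed, which is one reason to document that changing callbacks after startup is unsupported.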
Test report: PR #103 (refit for embedded use)
Verdict: ✅ ready to merge. Overhead-neutral vs main, new callback API works as documented, all CPU tests pass.
Tested on
Build
CPU tests
GPU smoke (50× 1024×1024 matmul, RTL lite)
Default path: bit-for-bit identical to main. With a callback registered, the SQLite write is suppressed and 55 callback invocations fire instead, which confirms the "instead of TraceDB" semantics in the header comment.
E2E serving overhead: GPT-OSS 120B TP=8
ATOM serving + PR vs main (apples-to-apples, both RTL lite):
All deltas are inside run-to-run noise on a 4–6 second wall time. Refactor is overhead-neutral. ✅
Gap (non-blocking)
PR adds three exported C++ symbols (
RTL-vs-baseline aside (not a PR issue)
Both RTL phases show ~30% wall-time penalty on this short workload (4s → 5.7s), driven by mean_tpot +47% and p99_itl 3.5×, but median_itl only +1.5%. The gap suggests a tail of high-TPOT events under signal injection. This affects main and PR equally, so it's not a regression for this PR. Worth a separate look at the mean/p99 ITL spread on short-context decode paths. |
🤖 GPT-5.4 Code Review
NEEDS CHANGES
Real issues
1. Data race on callback pointers/user_data
The comment says registration happens before
Where
Fix
2. Callback exceptions can unwind through C/HSA/HIP interception paths
The header documents callbacks “must be noexcept”, but the implementation does not enforce containment. If an embedder throws from either callback, that exception can unwind through:
That can terminate the process or corrupt shutdown behavior.
Where
Fix
3.
|
|
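One way to enforce containment at the call site is a small `noexcept` wrapper that catches everything the embedder throws. The sketch below is an assumption about how that might look, not the fix adopted in the PR: `invoke_contained()` is a hypothetical helper, and the callback signature is assumed.

```cpp
#include <cstdio>
#include <exception>

// Assumed callback signature; the real KernelEventCallback may differ.
using KernelEventCallback = void (*)(const void* record, void* user_data);

// Invokes an embedder callback and swallows anything it throws, so no
// exception can unwind into the interception frames of the caller.
// Returns false if the callback threw, true otherwise.
bool invoke_contained(KernelEventCallback cb, const void* record,
                      void* user_data) noexcept {
    if (cb == nullptr) return true;
    try {
        cb(record, user_data);
        return true;
    } catch (const std::exception& e) {
        std::fprintf(stderr, "event callback threw: %s\n", e.what());
    } catch (...) {
        std::fprintf(stderr, "event callback threw a non-standard exception\n");
    }
    return false;
}
```

Logging and dropping the event is only one possible policy. Disabling the callback after the first throw would be another, and the header comment should state whichever behavior is chosen.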
@mwootton friendly ping: the embedded-tests commit (
Lint (trivial):
GPT-5.4 review (NEEDS CHANGES): three real correctness items on the new callback API:
Once those are in, I'll rerun the full recipe (~30 min on mi355-gpu-15) against the new head and update the verdict. Build + GPU 8× tests are already passing, so I expect the fixes to be ~50 LOC and the re-test to come back green. |
Changes to allow the core tracing to be embedded into an application.
When callbacks are registered, record output is handed to those callback functions instead of being written to trace_db.