BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization by Theodlz · Pull Request #476 · boom-astro/boom

Theodlz · 2026-05-15T13:23:13Z

While the new model is much larger in parameter count (8.8M vs 230k for the old model), performance is improved and VRAM usage on the GPU is much lower: on my system, the old model uses around 6GB at batch size = 1024, while the new one uses around 2GB. A huge improvement. Runtime is also better: I see 30 ms instead of 45 ms.

Now here's the issue. BTSbot v2 wants the images in (N,3,63,63) format (NCHW), but ACAI (and the old BTSbot) expect (N,63,63,3) format (NHWC). That's a bit of a pain, because we would now need to hold 2 sets of image tensors! But while I was looking at the ONNX graph of ACAI, I noticed something I had stumbled upon a few years back: ACAI takes (N,63,63,3) images as input, but its first ONNX operator is a transpose (0,3,1,2) that converts them to (N,3,63,63), exactly the input format of BTSbot v2! Not only that, but this seemingly unnecessary transpose means CUDA has to use twice the memory it needs to hold the images. So we do useless compute and use more VRAM.
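To make the layout difference concrete, here is a minimal sketch of what a transpose with permutation (0,3,1,2) does to a batch. It uses plain nested lists and hypothetical helper names (`nhwc_to_nchw`, `shape`) purely for illustration; it is not the code in this PR, and in practice an array library would do this in a single call.

```python
def nhwc_to_nchw(batch):
    """Reorder a nested-list image batch from (N, H, W, C) to (N, C, H, W).

    This mirrors a transpose with permutation (0, 3, 1, 2).
    """
    n_channels = len(batch[0][0][0])
    return [
        [
            [[pixel[c] for pixel in row] for row in image]
            for c in range(n_channels)
        ]
        for image in batch
    ]


def shape(nested):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Toy batch: 2 images of 63x63 pixels with 3 channels.
# Each element stores its own (n, h, w, c) index so the move is easy to check.
N, H, W, C = 2, 63, 63, 3
nhwc = [
    [[[(n, h, w, c) for c in range(C)] for w in range(W)] for h in range(H)]
    for n in range(N)
]
nchw = nhwc_to_nchw(nhwc)

print(shape(nhwc))  # (2, 63, 63, 3)
print(shape(nchw))  # (2, 3, 63, 63)
# The element at [n][c][h][w] in NCHW is the one that sat at [n][h][w][c] in NHWC.
print(nchw[1][2][5][7])  # (1, 5, 7, 2)
```

Note that the reordered batch is a second full copy of the data, which is the duplication described above when the transpose runs inside the model.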

So I got Claude's help to edit the ONNX files of ACAI to drop the transpose and take (N,3,63,63) images as input directly. VRAM usage at a batch size of 1024 drops from 3GB to 2GB. However, latency is somehow worse: each model takes 20ms instead of 15ms. Since we have 5 ACAI models, that adds up, but the 15ms shaved off BTSbot compensates a bit. To me, that tradeoff is definitely acceptable given the VRAM savings.
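As a rough back-of-the-envelope check of that tradeoff, the sketch below adds up the latencies quoted in this description. It assumes the figures are per-batch latencies, that the 5 ACAI models and BTSbot run sequentially, and that nothing else in the pipeline changes; the variable names are illustrative only.

```python
N_ACAI_MODELS = 5

# Per-batch latencies in ms, as quoted in the PR description.
# "old" = ACAI with the transpose / BTSbot v1.0.1
# "new" = ACAI without the transpose / BTSbot v2.0.0
latency_ms = {
    "acai": {"old": 15, "new": 20},
    "btsbot": {"old": 45, "new": 30},
}


def pipeline_latency(version):
    """Total latency of running all ACAI models plus BTSbot once, sequentially."""
    return N_ACAI_MODELS * latency_ms["acai"][version] + latency_ms["btsbot"][version]


old_total = pipeline_latency("old")  # 5 * 15 + 45
new_total = pipeline_latency("new")  # 5 * 20 + 30
net_change = new_total - old_total

print(f"old: {old_total} ms, new: {new_total} ms, net change: {net_change:+d} ms")
```

Under these assumptions the ACAI slowdown costs 25 ms and BTSbot gives back 15 ms, so a batch ends up about 10 ms slower in model time.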

So all in all, with the new BTSbot and the "nchw" ACAI models to match its input format and lower VRAM, running a batch size of 1024 through all the models sequentially uses 7.9GB of VRAM on my system, when previously it needed so much that it went OOM (I believe Sushant said it needed something like 16GB, which if true means a 2x improvement). Throughput might be slightly lower due to the (curious) increased runtime of the modified ACAI models, but I'd still call that a win: VRAM is our limiting factor for ZTF, not throughput.

Notes:

I wrote scripts to validate that ACAI outputs (old vs new no-transpose model) were identical, and on our ZTF night used for throughput testing, they are literally bit-wise identical!!! So, no worries there.
BTSbot v2 outputs logits instead of a probability. I asked Nabeel and as I suspected we just need to pipe that through a sigmoid, which is what this PR adds too.
Even though we need less VRAM than before, I did not lower the 10 GiB requirement we enforce at scheduler startup. It was clearly under-estimated before, since some edge cases (an enrichment worker processing 1k alerts at once) would go OOM. Now that the max VRAM usage is about 8GB, it's not a bad idea to keep a safety margin and stay at a 10GB minimum. Besides, that gives us the room we need for the VRAM budget of the parametric lightcurve fitting :)
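For the logit-to-probability step mentioned in the notes, a minimal sketch of a numerically stable sigmoid is shown below. It is an illustration only, not the implementation added by this PR, and the function name is an assumption.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability in [0, 1].

    The two branches avoid calling exp() on a large positive number,
    which would overflow for extreme logits.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# A logit of 0 sits exactly on the decision boundary.
print(sigmoid(0.0))  # 0.5
# Extreme logits saturate instead of raising OverflowError.
print(sigmoid(1000.0), sigmoid(-1000.0))  # 1.0 0.0
```

Because the sigmoid is monotonic, applying it does not change the ranking of candidates, only the scale of the score.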

TODOs:

Validate the BTSbot v2.0.0 outputs. I haven't compared them to BTSbot v1 yet. It would be good to run this over a recent night of ZTF or something like that, and compare with what v1 gives us (could check against prod).
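One possible shape for such a comparison is sketched below. It is not the validation script mentioned in the notes: `compare_scores`, the 0.5 threshold, and the sample scores are all hypothetical. It reports whether two score lists are bit-wise identical as float32 (the check relevant to old vs. new ACAI), plus the largest difference and the fraction of alerts on the same side of a threshold (more relevant to v1 vs. v2 BTSbot, which are not expected to match exactly).

```python
import struct


def f32_bits(x: float) -> bytes:
    """Return the raw float32 byte pattern of x."""
    return struct.pack("<f", x)


def compare_scores(a, b, threshold=0.5):
    """Compare two equally long lists of scores in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    pairs = list(zip(a, b))
    agree = sum((x >= threshold) == (y >= threshold) for x, y in pairs)
    return {
        "bitwise_identical": all(f32_bits(x) == f32_bits(y) for x, y in pairs),
        "max_abs_diff": max((abs(x - y) for x, y in pairs), default=0.0),
        "agreement": agree / len(pairs) if pairs else 1.0,
    }


# Made-up scores, for illustration only.
same = compare_scores([0.1, 0.9, 0.5], [0.1, 0.9, 0.5])
diff = compare_scores([0.2, 0.8, 0.6], [0.3, 0.7, 0.4])
print(same)
print(diff)
```

In the second call, two of the three made-up alerts stay on the same side of the threshold, so the agreement is 2/3 even though no score is identical.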


Copilot

Pull request overview

Updates BTSBot from v1.0.1 to v2.0.0 for ZTF enrichment and GPU validation.

Changes:

Replaces BTSBot model paths in shared model loading and GPU smoke validation.
Adds the new BTSBot v2.0.0 ONNX LFS pointer and removes v1.0.1.
Removes the old BTSBot model copy step from the GPU Dockerfile.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


File	Description
`src/enrichment/models/mod.rs`	Loads BTSBot v2.0.0 for CPU and GPU model pools.
`src/bin/scheduler.rs`	Uses BTSBot v2.0.0 in GPU inference validation.
`Dockerfile.gpu`	Removes old BTSBot v1.0.1 model copy.
`data/models/btsbot-v2.0.0.onnx`	Adds the new ONNX model via Git LFS pointer.
`data/models/btsbot-v1.0.1.onnx`	Removes the old ONNX model pointer.


github-actions · 2026-05-15T17:07:16Z

Throughput results (f7bdd6efcedd3fbd1eb118555734aa209883b618):

New wall time	Baseline wall time	Difference
234.0	245.1	-4.00%

github-actions · 2026-05-21T13:19:23Z

Throughput results (864f85d355a2ad6d67d2d9ba764abf5383858651):

Storage	New wall time	Baseline wall time	Difference
mongo	234.0	245.2	-4.00%
s3	262.8	294.1	-10.00%

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

github-actions · 2026-05-22T07:56:20Z

Throughput results (c5515f32f7298902b2d7e001addbabdc2775e175):

Storage	New wall time	Baseline wall time	Difference
mongo	233.9	248.8	-5.00%
s3	272.6	268.9	1.00%

move to btsbot v2.0.0, the model is larger in parameter size but better performance and uses a lot less vram at runtime

dc443a8

Copilot AI review requested due to automatic review settings May 15, 2026 13:23

Copilot started reviewing on behalf of Theodlz May 15, 2026 13:23

Copilot AI reviewed May 15, 2026


Comment thread src/bin/scheduler.rs

replace acai onnx models with a version that drops the transpose and simply takes the post-transpose image format as input (which happens to match btsbot v2, so we don't hold 2 copies of it in memory and lower the vram usage a lot, even though inference seems slower somehow)

edc0441

Theodlz changed the title ~~BTSbot update: v1.0.1 -> v2.0.0~~ BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization May 15, 2026

Theodlz changed the title ~~BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization~~ WIP: BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization May 15, 2026

Merge branch 'main' into btsbot-v2.0.0

9483e38

Theodlz changed the title ~~WIP: BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization~~ BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization May 21, 2026

Merge branch 'main' into btsbot-v2.0.0

6cddc2c

Theodlz requested a review from Copilot May 22, 2026 07:41

Copilot started reviewing on behalf of Theodlz May 22, 2026 07:41

Copilot AI reviewed May 22, 2026


Theodlz wants to merge 4 commits into main from btsbot-v2.0.0
