BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization #476

Open
Theodlz wants to merge 4 commits into main from btsbot-v2.0.0
@Theodlz Theodlz commented May 15, 2026

Collaborator

Although the new model is much larger (8.8M parameters vs. 230k for the old one), performance is improved and GPU VRAM usage is much lower: on my system, the old model uses around 6 GB at batch size 1024, while the new one uses around 2 GB. A huge improvement. Runtime is better too: I see 30 ms instead of 45 ms.

Now here's the issue. BTSbot v2 wants the images in (N,3,63,63) format (NCHW), but ACAI (and the old BTSbot) expect (N,63,63,3) format (NHWC). That's a bit of a pain, because we would now need to keep 2 sets of image tensors! But while looking at the ONNX graph of ACAI, I noticed something I had stumbled upon a few years back: ACAI takes (N,63,63,3) images as input, but its first ONNX operator is a transpose (0,3,1,2) that converts them to (N,3,63,63), exactly the input format of BTSbot v2! Not only that, but this seemingly unnecessary transpose means CUDA has to use twice the memory it needs for the images. So we do useless compute and use more VRAM.
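To make the layout change concrete, here is a minimal pure-Python sketch of what a transpose with permutation (0,3,1,2) does to an NHWC batch. The helper names (`nhwc_to_nchw`, `shape`) and the tiny 4x4 stand-in images are illustrative assumptions, not code from this PR; the real cutouts are 63x63 and live in GPU tensors.

```python
def nhwc_to_nchw(batch):
    """Reorder a nested-list batch from (N, H, W, C) to (N, C, H, W).

    Equivalent to a transpose with permutation (0, 3, 1, 2):
    out[n][c][h][w] == batch[n][h][w][c].
    """
    n_channels = len(batch[0][0][0])
    return [
        [
            [[pixel[c] for pixel in row] for row in image]
            for c in range(n_channels)
        ]
        for image in batch
    ]


def shape(nested):
    """Return the shape of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical stand-in batch: N=2, H=W=4, C=3. Each value encodes its own
# (n, h, w, c) index so the reordering is easy to check by eye.
nhwc = [
    [
        [[n * 1000 + h * 100 + w * 10 + c for c in range(3)] for w in range(4)]
        for h in range(4)
    ]
    for n in range(2)
]
nchw = nhwc_to_nchw(nhwc)
```

Both layouts hold exactly the same values, which is why keeping the two side by side (or letting the graph materialize the transposed copy) costs a second copy of the image data.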

So I got Claude's help to edit the ACAI ONNX files to drop the transpose and take (N,3,63,63) images as input directly. VRAM usage at a batch size of 1024 drops from 3 GB to 2 GB. However, latency is somehow worse: the model takes 20 ms instead of 15 ms. Since we have 5 ACAI models, that adds up, but the 15 ms shaved off BTSbot compensates a bit. To me, that tradeoff is definitely acceptable given the VRAM savings.
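As a rough worked version of that tradeoff, the arithmetic can be written out as below. This is only a sketch: it assumes all the timings quoted in this description were taken at the same batch size and that each model runs exactly once, strictly one after the other. The constant and function names are made up for illustration.

```python
# Per-batch latencies in ms, taken from the numbers quoted in this description.
BTSBOT_V1_MS, BTSBOT_V2_MS = 45, 30
ACAI_OLD_MS, ACAI_NCHW_MS = 15, 20
N_ACAI_MODELS = 5


def sequential_latency_ms(btsbot_ms, acai_ms, n_acai=N_ACAI_MODELS):
    """Total latency when every model runs once, one after the other."""
    return btsbot_ms + n_acai * acai_ms


before = sequential_latency_ms(BTSBOT_V1_MS, ACAI_OLD_MS)  # 45 + 5 * 15
after = sequential_latency_ms(BTSBOT_V2_MS, ACAI_NCHW_MS)  # 30 + 5 * 20
print(before, after, after - before)
```

Under those assumptions the ACAI slowdown costs 25 ms per batch and the BTSbot speedup recovers 15 ms, for a net of about 10 ms more per batch.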

So all in all, with the new BTSbot and the "nchw" ACAI models matching its input format and lowering VRAM, running a batch of 1024 through all the models sequentially uses 7.9 GB of VRAM on my system, whereas previously it needed so much that it went OOM (I believe Sushant said it needed something like 16 GB, which, if true, means a 2x improvement). Throughput might be slightly lower due to the (curious) increased runtime of the modified ACAI models, but I'd still call that a win: VRAM is our limiting factor for ZTF, not throughput.

Notes:

  • I wrote scripts to validate that the ACAI outputs (old vs. new no-transpose model) are identical, and on the ZTF night we use for throughput testing they are literally bit-wise identical!!! So, no worries there.
  • BTSbot v2 outputs logits instead of a probability. I asked Nabeel and as I suspected we just need to pipe that through a sigmoid, which is what this PR adds too.
  • Even though we need less VRAM than before, I did not lower the 10 GiB requirement we enforce at scheduler startup. It was clearly underestimated, since some edge cases (an enrichment worker processing 1k alerts at once) would go OOM. Now that the max VRAM usage is 8 GB, it's not a bad idea to keep a safety margin and stay at a 10 GiB minimum. Besides, that gives us the room we need for the VRAM budget of the parametric lightcurve fitting :)
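For reference, the logit-to-probability step mentioned in the notes is just the logistic function. The snippet below is a minimal, numerically stable sketch in plain Python; it is not the code added by this PR (which lives in the Rust enrichment code), and the function names are assumptions chosen for illustration.

```python
import math


def sigmoid(logit):
    """Numerically stable logistic function mapping a logit to [0, 1]."""
    if logit >= 0:
        # exp(-logit) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative logits, exp(logit) < 1, which again avoids overflow.
    z = math.exp(logit)
    return z / (1.0 + z)


def logits_to_probabilities(logits):
    """Apply the sigmoid to every raw model output in a batch."""
    return [sigmoid(x) for x in logits]
```

Splitting on the sign of the logit avoids calling `exp` on a large positive argument, so extreme logits saturate to 0.0 or 1.0 instead of raising an overflow error.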

TODOs:

  • Validate the BTSbot v2.0.0 outputs. I haven't compared them to BTSbot v1 yet; it would be good to run this over a recent night of ZTF or similar and compare with what v1 gives us (could check against prod).
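One possible shape for that validation, sketched in plain Python under stated assumptions: scores are per-alert probabilities in matching order, model outputs are float32, and the 0.5 decision threshold is purely hypothetical. `bitwise_equal` mirrors the kind of exact check described for ACAI in the notes, while `compare_scores` summarizes how far v1 and v2 diverge. Neither function is taken from the scripts mentioned above.

```python
import struct


def bitwise_equal(a, b):
    """True if two score lists have identical float32 bit patterns."""
    if len(a) != len(b):
        return False
    # Pack as little-endian float32 so that 0.0 and -0.0 count as different.
    return struct.pack(f"<{len(a)}f", *a) == struct.pack(f"<{len(b)}f", *b)


def compare_scores(v1, v2, threshold=0.5):
    """Summarize how two sets of per-alert probabilities differ.

    `threshold` is a hypothetical decision cut, used only to count alerts
    whose classification would flip between the two models.
    """
    assert len(v1) == len(v2) and v1, "need two non-empty lists of equal length"
    diffs = [abs(x - y) for x, y in zip(v1, v2)]
    flipped = sum((x >= threshold) != (y >= threshold) for x, y in zip(v1, v2))
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "flipped_fraction": flipped / len(v1),
    }
```

Exact equality is the right bar for the no-transpose ACAI models, since the weights are unchanged. For BTSbot v1 vs. v2 the models differ, so a summary of differences and flipped classifications is more informative than an equality check.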

…er performance and uses a lot less vram at runtime
Copilot AI review requested due to automatic review settings May 15, 2026 13:23

Copilot AI left a comment

Contributor

Pull request overview

Updates BTSBot from v1.0.1 to v2.0.0 for ZTF enrichment and GPU validation.

Changes:

  • Replaces BTSBot model paths in shared model loading and GPU smoke validation.
  • Adds the new BTSBot v2.0.0 ONNX LFS pointer and removes v1.0.1.
  • Removes the old BTSBot model copy step from the GPU Dockerfile.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/enrichment/models/mod.rs | Loads BTSBot v2.0.0 for CPU and GPU model pools. |
| src/bin/scheduler.rs | Uses BTSBot v2.0.0 in GPU inference validation. |
| Dockerfile.gpu | Removes old BTSBot v1.0.1 model copy. |
| data/models/btsbot-v2.0.0.onnx | Adds the new ONNX model via Git LFS pointer. |
| data/models/btsbot-v1.0.1.onnx | Removes the old ONNX model pointer. |

Comment thread src/bin/scheduler.rs
…simply takes the post-transpose image format as input (which happens to match btsbot v2, so we don't hold 2 copies of it in memory and lower the vram usage a lot, even though inference seems slower somehow)
@Theodlz Theodlz changed the title BTSbot update: v1.0.1 -> v2.0.0 BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization May 15, 2026
@github-actions

Throughput results (f7bdd6efcedd3fbd1eb118555734aa209883b618):

| New wall time | Baseline wall time | Difference |
| --- | --- | --- |
| 234.0 | 245.1 | -4.00% |

@Theodlz Theodlz changed the title BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization WIP: BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization May 15, 2026
@github-actions

Throughput results (864f85d355a2ad6d67d2d9ba764abf5383858651):

| Storage | New wall time | Baseline wall time | Difference |
| --- | --- | --- | --- |
| mongo | 234.0 | 245.2 | -4.00% |
| s3 | 262.8 | 294.1 | -10.00% |

@Theodlz Theodlz changed the title WIP: BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization BTSbot update: v1.0.1 -> v2.0.0; acai vram optimization May 21, 2026

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

@github-actions

Throughput results (c5515f32f7298902b2d7e001addbabdc2775e175):

| Storage | New wall time | Baseline wall time | Difference |
| --- | --- | --- | --- |
| mongo | 233.9 | 248.8 | -5.00% |
| s3 | 272.6 | 268.9 | 1.00% |
