Feat: RTF + mTAN encoder integration: feature preparation, inference, and embedding storage#484

Open
KeshavMajithia wants to merge 19 commits into boom-astro:main from KeshavMajithia:feat/rtf-mtan-encoders

@KeshavMajithia

Adds RTF and mTAN encoder inference to the ZTF enrichment pipeline. When a ZTF alert is processed, BOOM now computes a 128D RTF embedding and a 2D mTAN embedding alongside the existing ACAI/BTSBot classifications, and stores them in the alert document under an embeddings field.

Changes

src/enrichment/models/rtf.rs: prepare_features() constructs the (1, 257, 37) photometry tensor, the (1, 257) padding mask, and the (1, 3, 63, 63) CHW cutout tensor from raw alert data. The 37 channels match the training pipeline: log time deltas, logflux, band one-hots, and 30 alert metadata keys.
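The shapes above can be illustrated with a small sketch of the padding mask and the CHW cutout buffer. This is not the PR's code: the names `pad_mask`, `stack_chw`, and `chw_index` are hypothetical, the convention that `true` marks a padded slot is an assumption, and so is the science/template/difference channel order.

```rust
// Dimensions taken from the description: (1, 257) mask, (1, 3, 63, 63) cutouts.
const SEQ_LEN: usize = 257;
const CUTOUT: usize = 63;

/// Padding mask for `n_obs` real observations.
/// Assumed convention: `true` marks a padded (ignored) slot.
fn pad_mask(n_obs: usize) -> Vec<bool> {
    (0..SEQ_LEN).map(|i| i >= n_obs).collect()
}

/// Stack three row-major 63x63 cutouts into one flat CHW buffer.
/// The channel order here is an assumption, not confirmed by the PR.
fn stack_chw(science: &[f32], template: &[f32], difference: &[f32]) -> Option<Vec<f32>> {
    let plane = CUTOUT * CUTOUT;
    if science.len() != plane || template.len() != plane || difference.len() != plane {
        return None; // refuse malformed cutouts instead of panicking
    }
    let mut out = Vec::with_capacity(3 * plane);
    out.extend_from_slice(science);
    out.extend_from_slice(template);
    out.extend_from_slice(difference);
    Some(out)
}

/// Flat offset of pixel (y, x) in channel `c` of the CHW buffer.
fn chw_index(c: usize, y: usize, x: usize) -> usize {
    (c * CUTOUT + y) * CUTOUT + x
}
```

Returning `None` for a cutout of the wrong size fits the description's "fail gracefully" behavior, since the caller can log a warning and skip the embedding.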

src/enrichment/models/mtan.rs: prepare_features() filters to g/r bands, merges nearby observations, normalizes magnitudes and time to [0,1], and pads to 200 steps. pool_embedding() mean-pools qz0_mean across 50 query times to produce the final 2D vector. Alerts with fewer than 3 observations are skipped.
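A minimal sketch of the normalize, pad, and pool steps described for mtan.rs. It assumes per-light-curve min-max scaling and a row-major (n_query, dim) layout for qz0_mean; neither is confirmed by this description, and the real code may use fixed training constants instead. `normalize_unit`, `pad_to`, `mean_pool_rows`, and `MAX_STEPS` are illustrative names, not the PR's API.

```rust
/// Sequence length from the description ("pads to 200 steps").
const MAX_STEPS: usize = 200;

/// Min-max scale values into [0, 1]. A constant series maps to all zeros.
/// Assumption: scaling is per light curve.
fn normalize_unit(values: &[f32]) -> Vec<f32> {
    let min = values.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = values.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let span = max - min;
    values
        .iter()
        .map(|v| if span > 0.0 { (v - min) / span } else { 0.0 })
        .collect()
}

/// Right-pad with zeros (or truncate) to exactly `len` entries.
fn pad_to(mut values: Vec<f32>, len: usize) -> Vec<f32> {
    values.resize(len, 0.0);
    values
}

/// Mean-pool a row-major (n_query, dim) buffer into one `dim`-vector.
/// Returns None if the buffer is empty or not a multiple of `dim`.
fn mean_pool_rows(qz0_mean: &[f32], dim: usize) -> Option<Vec<f32>> {
    if dim == 0 || qz0_mean.is_empty() || qz0_mean.len() % dim != 0 {
        return None;
    }
    let n_query = qz0_mean.len() / dim;
    let mut pooled = vec![0.0f32; dim];
    for row in qz0_mean.chunks_exact(dim) {
        for (acc, v) in pooled.iter_mut().zip(row) {
            *acc += *v;
        }
    }
    for acc in pooled.iter_mut() {
        *acc /= n_query as f32;
    }
    Some(pooled)
}
```

With 50 query times and `dim = 2`, `mean_pool_rows` would reduce a 100-element buffer to the 2-element vector stored under `embeddings.mtan`.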

src/enrichment/ztf.rs — Adds ZtfAlertEmbeddings struct, compute_embeddings() method, and integrates it into process_alerts(). Both models fail gracefully (log a warning and store None for that embedding).

k8s/08-boom-scheduler-ztf.yaml — Adds an initContainer that downloads all 8 ONNX model files (5 ACAI, BTSBot, RTF, mTAN) from boom-astro/boomEncoders into an emptyDir volume mounted at /app/data/models.

MongoDB schema after this change

{
  "classifications": { "acai_h": 0.95, ... },
  "embeddings": {
    "rtf": [0.123, -0.456, ...],   // 128 floats
    "mtan": [0.789, -0.012]        // 2 floats
  }
}

- rtf.rs: prepare_features() builds (1,257,37) photometry tensor,
  (1,257) pad mask, and (1,3,63,63) CHW cutout from ZTF alert data.
  30 metadata channels match ALERT_META_KEYS from training pipeline.

- mtan.rs: prepare_features() filters g/r photometry, merges nearby
  obs, normalizes magnitudes and time, pads to 200 steps. Adds
  pool_embedding() to mean-pool qz0_mean into 2D vector.

- ztf.rs: ZtfAlertEmbeddings struct, compute_embeddings() method,
  integrated into process_alerts() to store embeddings in MongoDB
  alongside existing classifications.

Adds an initContainer to the ZTF scheduler k8s deployment that pulls
rtf_embed.onnx and mtan_embed.onnx from boom-astro/boomEncoders into
an emptyDir volume mounted at /app/data/models. The main container
reads these model files at startup to initialize ONNX inference.

Copilot AI review requested due to automatic review settings June 3, 2026 18:23

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces RTF and mTAN encoder embeddings into the ZTF enrichment pipeline, adjusts Kafka topic handling to avoid an async deletion race, and makes select filter/schema endpoints publicly accessible.

Changes:

  • Add RTF + mTAN ONNX encoder model support and persist per-alert embeddings during enrichment.
  • Avoid Kafka topic deletion to prevent “ghost topic” races during initialization.
  • Expose filter test and schema-related endpoints without authentication and add a ZTF scheduler deployment that downloads ONNX models at startup.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.

Summary per file:

| File | Description |
| --- | --- |
| src/kafka/base.rs | Skips topic deletion to avoid Kafka async delete race during initialization. |
| src/enrichment/ztf.rs | Adds embeddings schema + computes/persists RTF/mTAN embeddings per alert. |
| src/enrichment/models/rtf.rs | New RTF encoder wrapper and feature preparation for ONNX inference. |
| src/enrichment/models/mtan.rs | New mTAN encoder wrapper, feature preparation, and pooling logic. |
| src/enrichment/models/mod.rs | Registers new models and adds them to SharedModels. |
| src/api/routes/filters.rs | Removes explicit auth requirement for filter test endpoints. |
| src/api/auth.rs | Adds filter test and schema routes to the public allowlist. |
| k8s/08-boom-scheduler-ztf.yaml | Adds a scheduler deployment that downloads ONNX models via initContainer. |
| Dockerfile | Switches Kafka download URL to Apache archive. |
| .github/workflows/build-fork.yaml | Adds workflow to build/push image for specific branches. |

Comment thread src/enrichment/ztf.rs Outdated
Comment on lines +926 to +929
results.push(Some(ZtfAlertEmbeddings {
    rtf: rtf_embedding,
    mtan: mtan_embedding,
}));
Comment thread src/enrichment/ztf.rs Outdated
// RTF embedding
let rtf_embedding = match RtfModel::prepare_features(&item.alert, &item.cutouts) {
    Ok((x, pad_mask, images)) => {
        match models.rtf_embed.lock().unwrap().embed(&x, &pad_mask, &images) {
Comment thread src/enrichment/ztf.rs Outdated
Comment on lines +910 to +914
match models
    .mtan_embed
    .lock()
    .unwrap()
    .embed_raw(&x, &time_steps, &query_times)
Comment thread src/kafka/base.rs
Comment on lines +262 to +268
// The topic and data directory are inconsistent.
// NOTE: We intentionally skip delete_topic here because Kafka
// deletes topics asynchronously, causing a race condition where
// initialize_topic later sees a ghost topic with 0 partitions.
// Instead, we let initialize_topic handle it after downloading.
// warn!("recreating topic {}", topic_name);
// delete_topic(&self.server_url(), &topic_name).await?;
Comment thread src/api/routes/filters.rs
Comment on lines 853 to 856
pub async fn post_filter_test(
    db: web::Data<Database>,
    body: web::Json<FilterTestRequest>,
    current_user: Option<web::ReqData<User>>,
) -> HttpResponse {
# ACAI classifiers (~35 KB each)
for variant in acai_h acai_n acai_v acai_o acai_b; do
  echo "Downloading ${variant}.d1_dnn_20201130.onnx ..."
  curl -fsSL "${HF_BASE}/${variant}.d1_dnn_20201130.onnx" -o "/models/${variant}.d1_dnn_20201130.onnx"
done

# BTSBot classifier (~900 KB)
echo "Downloading btsbot-v1.0.1.onnx ..."
curl -fsSL "${HF_BASE}/btsbot-v1.0.1.onnx" -o /models/btsbot-v1.0.1.onnx

# RTF encoder (8.7 MB)
echo "Downloading rtf_embed.onnx ..."
curl -fsSL "${HF_BASE}/rtf_embed.onnx" -o /models/rtf_embed.onnx

# mTAN encoder (383 KB)
echo "Downloading mtan_embed.onnx ..."
curl -fsSL "${HF_BASE}/mtan_embed.onnx" -o /models/mtan_embed.onnx
Comment on lines +164 to +170
let mut j = i;
while j < points.len() && (points[j].jd - t).abs() <= MERGE_TOL_DAYS {
let band = points[j].band_idx;
mp.mag[band] = points[j].mag as f32;
mp.mask[band] = 1.0;
j += 1;
}
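For context, the loop above can be placed in a self-contained sketch of a last-write-wins merge. The outer loop, the `Point` and `MergedPoint` types, and the tolerance value are assumptions made for illustration; only the inner `while` loop comes from the diff. The sketch assumes `points` is sorted by `jd` and that `band_idx` is 0 (g) or 1 (r).

```rust
/// Hypothetical tolerance; the PR's actual MERGE_TOL_DAYS value is not shown.
const MERGE_TOL_DAYS: f64 = 0.01;

/// One photometry point (illustrative type, not the PR's).
#[derive(Debug, Clone, Copy)]
struct Point {
    jd: f64,
    band_idx: usize, // assumed: 0 = g, 1 = r
    mag: f64,
}

/// One merged time step with a per-band magnitude and observed-mask.
#[derive(Debug, Clone, PartialEq)]
struct MergedPoint {
    jd: f64,
    mag: [f32; 2],
    mask: [f32; 2],
}

/// Group points within MERGE_TOL_DAYS of the first point in each group.
/// If a band repeats within a group, the later point overwrites the earlier
/// one (last write wins), matching the behavior noted in the review thread.
fn merge_nearby(points: &[Point]) -> Vec<MergedPoint> {
    let mut merged = Vec::new();
    let mut i = 0;
    while i < points.len() {
        let t = points[i].jd;
        let mut mp = MergedPoint { jd: t, mag: [0.0; 2], mask: [0.0; 2] };
        let mut j = i;
        while j < points.len() && (points[j].jd - t).abs() <= MERGE_TOL_DAYS {
            let band = points[j].band_idx;
            mp.mag[band] = points[j].mag as f32;
            mp.mask[band] = 1.0;
            j += 1;
        }
        merged.push(mp);
        i = j; // the inner loop always consumes at least points[i]
    }
    merged
}
```

An alternative would be to average repeated same-band points inside a group, but that would diverge numerically from a training pipeline that overwrites.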
…dings

- compute_embeddings: replace .lock().unwrap() with match on lock()
  to gracefully handle poisoned mutex instead of panicking the worker
- compute_embeddings: return None when both RTF and mTAN fail, so
  MongoDB doesn't get { rtf: null, mtan: null } written needlessly
- mtan.rs: add comment explaining last-write-wins merge matches the
  Python training pipeline for numerical consistency
- k8s: remove stale NOTE about uploading ACAI/BTSBot (already done)
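
The first two items in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in ztf.rs: `Encoder`, `Embeddings`, `try_embed`, and this `compute_embeddings` signature are hypothetical stand-ins for the PR's ONNX wrappers and `ZtfAlertEmbeddings`.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the PR's embeddings struct.
#[derive(Debug, PartialEq)]
struct Embeddings {
    rtf: Option<Vec<f32>>,
    mtan: Option<Vec<f32>>,
}

/// Hypothetical stand-in for an ONNX session wrapper.
struct Encoder {
    dim: usize,
    fail: bool, // lets the sketch simulate an inference error
}

impl Encoder {
    fn embed(&self, input: &[f32]) -> Result<Vec<f32>, String> {
        if self.fail || input.is_empty() {
            Err("inference failed".to_string())
        } else {
            Ok(vec![0.0; self.dim])
        }
    }
}

/// Run one encoder. A poisoned lock or an inference error is logged and
/// mapped to None instead of panicking the worker.
fn try_embed(model: &Mutex<Encoder>, input: &[f32]) -> Option<Vec<f32>> {
    match model.lock() {
        Ok(guard) => match guard.embed(input) {
            Ok(v) => Some(v),
            Err(e) => {
                eprintln!("embedding failed: {e}");
                None
            }
        },
        Err(_) => {
            eprintln!("model mutex poisoned; skipping embedding");
            None
        }
    }
}

/// Return None when both encoders fail, so no all-null document is written.
fn compute_embeddings(
    rtf: &Mutex<Encoder>,
    mtan: &Mutex<Encoder>,
    input: &[f32],
) -> Option<Embeddings> {
    let rtf = try_embed(rtf, input);
    let mtan = try_embed(mtan, input);
    if rtf.is_none() && mtan.is_none() {
        return None;
    }
    Some(Embeddings { rtf, mtan })
}
```

Matching on `lock()` treats a poisoned mutex like any other inference failure. Recovering the guard with `PoisonError::into_inner` is another option, at the cost of reusing a session whose state may be inconsistent.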
@petebachant

Collaborator

I'm curious about the inclusion of a Kubernetes manifest here. AFAIK, Kubernetes is not used at Caltech or UMN. So far, models have been saved in this repo using Git LFS and baked into the Docker images.

@mcoughlin

Collaborator

@petebachant @antoine-le-calloch I do worry about the difference in deployment methods and how we can effectively develop / add unit tests for some of these forthcoming features. k8s is how we will deploy testing instances at NRP, so we need to support that too.

@petebachant

Collaborator

> @petebachant @antoine-le-calloch I do worry about the difference in deployment methods and how we can effectively develop / add unit tests for some of these forthcoming features. k8s is how we will deploy testing instances at NRP, so we need to support that too.

Can we write up the use case, i.e., who will be running it, what they'll be testing, etc.? We could convert from Docker Compose to Helm, but if it's just to test ML models or something, the solution would look different.

4 participants