Feat: RTF + mTAN encoder integration: feature preparation, inference, and embedding storage#484
Feat: RTF + mTAN encoder integration: feature preparation, inference, and embedding storage#484KeshavMajithia wants to merge 19 commits into
Conversation
- rtf.rs: prepare_features() builds (1,257,37) photometry tensor, (1,257) pad mask, and (1,3,63,63) CHW cutout from ZTF alert data. 30 metadata channels match ALERT_META_KEYS from training pipeline. - mtan.rs: prepare_features() filters g/r photometry, merges nearby obs, normalizes magnitudes and time, pads to 200 steps. Adds pool_embedding() to mean-pool qz0_mean into 2D vector. - ztf.rs: ZtfAlertEmbeddings struct, compute_embeddings() method, integrated into process_alerts() to store embeddings in MongoDB alongside existing classifications.
Adds an initContainer to the ZTF scheduler k8s deployment that pulls rtf_embed.onnx and mtan_embed.onnx from boom-astro/boomEncoders into an emptyDir volume mounted at /app/data/models. The main container reads these model files at startup to initialize ONNX inference.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces RTF and mTAN encoder embeddings into the ZTF enrichment pipeline, adjusts Kafka topic handling to avoid an async deletion race, and makes select filter/schema endpoints publicly accessible.
Changes:
- Add RTF + mTAN ONNX encoder model support and persist per-alert embeddings during enrichment.
- Avoid Kafka topic deletion to prevent “ghost topic” races during initialization.
- Expose filter test and schema-related endpoints without authentication and add a ZTF scheduler deployment that downloads ONNX models at startup.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| src/kafka/base.rs | Skips topic deletion to avoid Kafka async delete race during initialization. |
| src/enrichment/ztf.rs | Adds embeddings schema + computes/persists RTF/mTAN embeddings per alert. |
| src/enrichment/models/rtf.rs | New RTF encoder wrapper and feature preparation for ONNX inference. |
| src/enrichment/models/mtan.rs | New mTAN encoder wrapper, feature preparation, and pooling logic. |
| src/enrichment/models/mod.rs | Registers new models and adds them to SharedModels. |
| src/api/routes/filters.rs | Removes explicit auth requirement for filter test endpoints. |
| src/api/auth.rs | Adds filter test and schema routes to the public allowlist. |
| k8s/08-boom-scheduler-ztf.yaml | Adds a scheduler deployment that downloads ONNX models via initContainer. |
| Dockerfile | Switches Kafka download URL to Apache archive. |
| .github/workflows/build-fork.yaml | Adds workflow to build/push image for specific branches. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| results.push(Some(ZtfAlertEmbeddings { | ||
| rtf: rtf_embedding, | ||
| mtan: mtan_embedding, | ||
| })); |
| // RTF embedding | ||
| let rtf_embedding = match RtfModel::prepare_features(&item.alert, &item.cutouts) { | ||
| Ok((x, pad_mask, images)) => { | ||
| match models.rtf_embed.lock().unwrap().embed(&x, &pad_mask, &images) { |
| match models | ||
| .mtan_embed | ||
| .lock() | ||
| .unwrap() | ||
| .embed_raw(&x, &time_steps, &query_times) |
| // The topic and data directory are inconsistent. | ||
| // NOTE: We intentionally skip delete_topic here because Kafka | ||
| // deletes topics asynchronously, causing a race condition where | ||
| // initialize_topic later sees a ghost topic with 0 partitions. | ||
| // Instead, we let initialize_topic handle it after downloading. | ||
| // warn!("recreating topic {}", topic_name); | ||
| // delete_topic(&self.server_url(), &topic_name).await?; |
| pub async fn post_filter_test( | ||
| db: web::Data<Database>, | ||
| body: web::Json<FilterTestRequest>, | ||
| current_user: Option<web::ReqData<User>>, | ||
| ) -> HttpResponse { |
| # ACAI classifiers (~35 KB each) | ||
| for variant in acai_h acai_n acai_v acai_o acai_b; do | ||
| echo "Downloading ${variant}.d1_dnn_20201130.onnx ..." | ||
| curl -fsSL "${HF_BASE}/${variant}.d1_dnn_20201130.onnx" -o "/models/${variant}.d1_dnn_20201130.onnx" |
|
|
||
| # BTSBot classifier (~900 KB) | ||
| echo "Downloading btsbot-v1.0.1.onnx ..." | ||
| curl -fsSL "${HF_BASE}/btsbot-v1.0.1.onnx" -o /models/btsbot-v1.0.1.onnx |
|
|
||
| # RTF encoder (8.7 MB) | ||
| echo "Downloading rtf_embed.onnx ..." | ||
| curl -fsSL "${HF_BASE}/rtf_embed.onnx" -o /models/rtf_embed.onnx |
|
|
||
| # mTAN encoder (383 KB) | ||
| echo "Downloading mtan_embed.onnx ..." | ||
| curl -fsSL "${HF_BASE}/mtan_embed.onnx" -o /models/mtan_embed.onnx |
| let mut j = i; | ||
| while j < points.len() && (points[j].jd - t).abs() <= MERGE_TOL_DAYS { | ||
| let band = points[j].band_idx; | ||
| mp.mag[band] = points[j].mag as f32; | ||
| mp.mask[band] = 1.0; | ||
| j += 1; | ||
| } |
# Conflicts: # src/enrichment/ztf.rs
…dings
- compute_embeddings: replace .lock().unwrap() with match on lock()
to gracefully handle poisoned mutex instead of panicking the worker
- compute_embeddings: return None when both RTF and mTAN fail, so
MongoDB doesn't get { rtf: null, mtan: null } written needlessly
- mtan.rs: add comment explaining last-write-wins merge matches the
Python training pipeline for numerical consistency
- k8s: remove stale NOTE about uploading ACAI/BTSBot (already done)
|
I'm curious about the inclusion of a Kubernetes manifest here. AFAIK, Kubernetes is not used at Caltech or UMN. So far, models have been saved in this repo using Git LFS and baked into the Docker images. |
|
@petebachant @antoine-le-calloch I do worry about the difference in deployment methods and how we can effectively develop / add unit tests for some of these forthcoming features. k8s is how we will deploy testing instances at NRP, so we need to support that too. |
Can we write up the use case, i.e., who will be running it, what they'll be testing, etc.? We could convert from Docker Compose to Helm, but if it's just to test ML models or something, the solution would look different. |
Adds RTF and mTAN encoder inference to the ZTF enrichment pipeline. When a ZTF alert is processed, BOOM now computes a 128D RTF embedding and a 2D mTAN embedding alongside the existing ACAI/BTSBot classifications, and stores them in the alert document under an
embeddingsfield.Changes
src/enrichment/models/rtf.rs—prepare_features()constructs the (1, 257, 37) photometry tensor, (1, 257) padding mask, and (1, 3, 63, 63) CHW cutout tensor from raw alert data. The 37 channels match the training pipeline: log time deltas, logflux, band one-hots, and 30 alert metadata keys.src/enrichment/models/mtan.rs—prepare_features()filters to g/r bands, merges nearby observations, normalizes magnitudes and time to [0,1], and pads to 200 steps.pool_embedding()mean-pools qz0_mean across 50 query times to produce the final 2D vector. Skips alerts with fewer than 3 observations.src/enrichment/ztf.rs— AddsZtfAlertEmbeddingsstruct,compute_embeddings()method, and integrates it intoprocess_alerts(). Both models fail gracefully (log a warning and storeNonefor that embedding).k8s/08-boom-scheduler-ztf.yaml— Adds an initContainer that downloads all 8 ONNX model files (5 ACAI, BTSBot, RTF, mTAN) from boom-astro/boomEncoders into an emptyDir volume mounted at/app/data/models.MongoDB schema after this change
{ "classifications": { "acai_h": 0.95, ... }, "embeddings": { "rtf": [0.123, -0.456, ...], // 128 floats "mtan": [0.789, -0.012] // 2 floats } }