64 changes: 64 additions & 0 deletions .github/workflows/docker-build.yml
@@ -0,0 +1,64 @@
name: Build and Push Docker Image

on:
  push:
    branches:
      - master
      - dockerise
    tags:
      - 'v*'
  pull_request:
    branches:
      - master

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest

    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Image digest
        run: echo "Image pushed successfully to ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}"
18 changes: 16 additions & 2 deletions Dockerfile
@@ -1,13 +1,27 @@
FROM python:3.10
FROM python:3.11-slim

WORKDIR /app

ENV PYTHONBUFFERED 1
ENV PYTHONUNBUFFERED=1

# Install system dependencies for SSL certificates
RUN apt-get update && \
apt-get install -y --no-install-recommends ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Create symlink for RedHat-style certificate path (required by librdkafka OIDC)
# librdkafka's OIDC token retrieval looks for /etc/pki/tls/certs/ca-bundle.crt
RUN mkdir -p /etc/pki/tls/certs && \
ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

RUN pip install --upgrade pip
COPY ./requirements.txt /app
RUN pip install -r requirements.txt

COPY . /app

# Install the gwtm_cron package
RUN pip install -e .

CMD ["python", "src/gwtm_cron/gwtm_listener/listener.py"]
269 changes: 261 additions & 8 deletions README.md
@@ -1,20 +1,273 @@
# gwtm_cron
Gravitational Wave Treasure Map cron functions - a Kafka-based listener system for processing:
* LIGO/Virgo/KAGRA gravitational wave alerts
* IceCube neutrino coincidence notices

**Requirements**: Python 3.11+

## How It Works

The listeners are **real-time streaming processors** that:

1. **Subscribe to NASA GCN Kafka streams**
- LIGO Listener: `igwn.gwalert` topic for gravitational wave detections
- IceCube Listener: `gcn.notices.icecube.lvk_nu_track_search` for neutrino coincidences

2. **Process each alert** as it arrives (typically within seconds of detection):
- Parse alert JSON and extract metadata (event ID, classification, instruments)
- Decode and analyze probability skymaps (FITS format)
- Calculate sky localization statistics (90%/50% credible areas, average position)
- Generate derived products:
- Sky contours (GeoJSON for visualization)
- MOC (Multi-Order Coverage) files
- Satellite visibility maps (Fermi, LAT)
- Query galaxy catalogs to identify potential host galaxies

3. **Store products** to cloud storage (S3/Azure/OpenStack Swift):
- Raw alert JSON
- Processed FITS skymaps
- Visualization-ready contours and maps
- One event can produce 5-10 files as it evolves (Early Warning → Preliminary → Update)

4. **POST to GWTM API** for public consumption by astronomers worldwide

**Important**: Listeners only process **new alerts** that arrive after they start. Historical alerts are not backfilled. If you start with empty storage, it will remain empty until the next gravitational wave or neutrino detection is announced.
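
For reference, here is a minimal sketch of step 1 using the `gcn-kafka` client, with credentials taken from the `KAFKA_CLIENT_ID`/`KAFKA_CLIENT_SECRET` variables described below. It is illustrative only; the actual listeners layer the skymap processing, storage uploads, and GWTM API posts on top of this consume loop.

```python
# Minimal sketch of step 1: subscribing to the igwn.gwalert topic with the
# gcn-kafka client. The real listeners add skymap processing, storage
# uploads, and GWTM API posts on top of this loop.
import json
import os

from gcn_kafka import Consumer

consumer = Consumer(
    client_id=os.environ["KAFKA_CLIENT_ID"],
    client_secret=os.environ["KAFKA_CLIENT_SECRET"],
)
consumer.subscribe(["igwn.gwalert"])

while True:
    for message in consumer.consume(timeout=1):
        if message.error():
            continue  # skip errored Kafka messages
        alert = json.loads(message.value())
        # e.g. "S230518h", "PRELIMINARY"
        print(alert.get("superevent_id"), alert.get("alert_type"))
```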

## Quick Start

### Docker Compose (Local Development)

```bash
# Build and run both listeners
docker compose up

# Run specific listener
docker compose up ligo-listener
docker compose up icecube-listener
```

### Manual Docker Build

```bash
docker build -t gwtm_cron .
docker tag gwtm_cron:latest 929887798640.dkr.ecr.us-east-2.amazonaws.com/gwtm_cron_listener:latest
./ecrlogin.sh

docker push 929887798640.dkr.ecr.us-east-2.amazonaws.com/gwtm_cron_listener:latest
```

**Note**: The GitHub Actions workflow above (`.github/workflows/docker-build.yml`) also builds the image automatically and pushes it to GitHub Container Registry (`ghcr.io`) on commits to master and on version tags.

## Environment Variables

The following environment variables configure the listeners in Docker/Kubernetes/Helm deployments:

### Required Variables

#### Kafka Configuration
- `KAFKA_CLIENT_ID` - GCN Kafka client ID for authentication
- `KAFKA_CLIENT_SECRET` - GCN Kafka client secret

#### GWTM API Configuration
- `API_TOKEN` - Authentication token for GWTM API
- `API_BASE` - Base URL for GWTM API (e.g., `https://treasuremap.space/api/v0/`)
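
As an illustration of how these two values are typically combined (not the listeners' actual client code), with a placeholder endpoint path:

```python
# Illustration only: combining API_BASE and API_TOKEN in a request to the
# GWTM API. "some_endpoint" is a placeholder, not a documented route, and
# whether the token goes in the payload or a header depends on the API.
import os

import requests

api_base = os.environ["API_BASE"].rstrip("/")      # e.g. https://treasuremap.space/api/v0
payload = {"api_token": os.environ["API_TOKEN"]}   # plus the alert-specific fields

response = requests.post(f"{api_base}/some_endpoint", json=payload, timeout=30)
response.raise_for_status()
```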

#### Cloud Storage Configuration

**Option 1: AWS S3**
- `AWS_ACCESS_KEY_ID` - AWS access key
- `AWS_SECRET_ACCESS_KEY` - AWS secret key
- `AWS_DEFAULT_REGION` - AWS region (default: `us-east-2`)
- `AWS_BUCKET` - S3 bucket name (default: `gwtreasuremap`)
- `STORAGE_BUCKET_SOURCE=s3` - Set to `s3` for AWS storage

**Option 2: Azure Blob Storage**
- `AZURE_ACCOUNT_NAME` - Azure storage account name
- `AZURE_ACCOUNT_KEY` - Azure storage account key
- `STORAGE_BUCKET_SOURCE=abfs` - Set to `abfs` for Azure storage

**Option 3: OpenStack Swift**
- `OS_AUTH_URL` - OpenStack authentication endpoint (e.g., `https://openstack.example.com:5000/v3`)
- `OS_USERNAME` - OpenStack username
- `OS_PASSWORD` - OpenStack password
- `OS_PROJECT_NAME` - OpenStack project/tenant name
- `OS_USER_DOMAIN_NAME` - User domain name (default: `Default`)
- `OS_PROJECT_DOMAIN_NAME` - Project domain name (default: `Default`)
- `OS_CONTAINER_NAME` - Swift container name (default: `gwtreasuremap`)
- `STORAGE_BUCKET_SOURCE=swift` - Set to `swift` for OpenStack storage
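
A sketch of how `STORAGE_BUCKET_SOURCE` could select one of these backends, assuming the standard `boto3`, `azure-storage-blob`, and `python-swiftclient` packages. This is illustrative only, not the project's actual storage module, and the Azure container name below is an assumption.

```python
# Illustrative backend dispatch keyed on STORAGE_BUCKET_SOURCE; not the
# project's storage module.
import os

def make_uploader():
    source = os.environ.get("STORAGE_BUCKET_SOURCE", "s3")

    if source == "s3":
        import boto3
        # boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
        s3 = boto3.client("s3", region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-2"))
        bucket = os.environ.get("AWS_BUCKET", "gwtreasuremap")
        return lambda key, data: s3.put_object(Bucket=bucket, Key=key, Body=data)

    if source == "abfs":
        from azure.storage.blob import BlobServiceClient
        account = os.environ["AZURE_ACCOUNT_NAME"]
        service = BlobServiceClient(
            account_url=f"https://{account}.blob.core.windows.net",
            credential=os.environ["AZURE_ACCOUNT_KEY"],
        )
        # container name is an assumption for this sketch
        container = service.get_container_client("gwtreasuremap")
        return lambda key, data: container.upload_blob(name=key, data=data, overwrite=True)

    if source == "swift":
        import swiftclient
        conn = swiftclient.Connection(
            authurl=os.environ["OS_AUTH_URL"],
            user=os.environ["OS_USERNAME"],
            key=os.environ["OS_PASSWORD"],
            auth_version="3",
            os_options={
                "project_name": os.environ["OS_PROJECT_NAME"],
                "user_domain_name": os.environ.get("OS_USER_DOMAIN_NAME", "Default"),
                "project_domain_name": os.environ.get("OS_PROJECT_DOMAIN_NAME", "Default"),
            },
        )
        container = os.environ.get("OS_CONTAINER_NAME", "gwtreasuremap")
        return lambda key, data: conn.put_object(container, key, contents=data)

    raise ValueError(f"Unknown STORAGE_BUCKET_SOURCE: {source}")
```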

### Optional Variables

- `OBSERVING_RUN` - Observing run identifier (default: `O4`)
- `PATH_TO_GALAXY_CATALOG_CONFIG` - Path to galaxy catalog config file (only needed for LIGO listener if generating galaxy lists)

### Listener Control Variables

- `DRY_RUN` - Set to `true` or `1` to run in dry-run mode (no API calls, no storage writes)
- `WRITE_TO_STORAGE` - Set to `false` or `0` to disable storage writes (default: `true`)
- `VERBOSE` - Set to `false` or `0` to disable verbose logging (default: `true`)

**Note**: Storage type (S3, Azure, Swift) is controlled by `STORAGE_BUCKET_SOURCE`, not by `WRITE_TO_STORAGE`.

### Logging Configuration

- `LOG_FORMAT` - Set to `json` to enable structured JSON logging for Kubernetes (default: print statements)
- `LOG_LEVEL` - Set log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` (default: `INFO`)
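
A sketch of how the control and logging variables above might be interpreted, assuming the `true`/`1` and `false`/`0` conventions listed; illustrative only, not the listeners' actual configuration code.

```python
# Illustrative parsing of the control and logging environment variables;
# not the listeners' actual configuration code.
import logging
import os

def env_flag(name: str, default: bool) -> bool:
    """Interpret 'true'/'1' as True and 'false'/'0' as False, per the docs above."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1")

DRY_RUN = env_flag("DRY_RUN", False)
WRITE_TO_STORAGE = env_flag("WRITE_TO_STORAGE", True)
VERBOSE = env_flag("VERBOSE", True)

level = getattr(logging, os.environ.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
if os.environ.get("LOG_FORMAT", "").lower() == "json":
    # hand log records to a JSON-style formatter for Kubernetes log scraping
    logging.basicConfig(level=level, format='{"level": "%(levelname)s", "message": "%(message)s"}')
else:
    logging.basicConfig(level=level)
```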

### Example Kubernetes/Helm Values

**AWS S3 Example:**
```yaml
env:
  - name: KAFKA_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-id
  - name: KAFKA_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-secret
  - name: API_TOKEN
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: api-token
  - name: API_BASE
    value: "https://treasuremap.space/api/v0/"
  - name: STORAGE_BUCKET_SOURCE
    value: "s3"
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: secret-access-key
  - name: AWS_DEFAULT_REGION
    value: "us-east-2"
  - name: AWS_BUCKET
    value: "gwtreasuremap"
  - name: OBSERVING_RUN
    value: "O4"
```

**OpenStack Swift Example:**
```yaml
env:
  - name: KAFKA_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-id
  - name: KAFKA_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-secret
  - name: API_TOKEN
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: api-token
  - name: API_BASE
    value: "https://treasuremap.space/api/v0/"
  - name: STORAGE_BUCKET_SOURCE
    value: "swift"
  - name: OS_AUTH_URL
    value: "https://openstack.example.com:5000/v3"
  - name: OS_USERNAME
    valueFrom:
      secretKeyRef:
        name: openstack-credentials
        key: username
  - name: OS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: openstack-credentials
        key: password
  - name: OS_PROJECT_NAME
    value: "gwtm-project"
  - name: OS_USER_DOMAIN_NAME
    value: "Default"
  - name: OS_PROJECT_DOMAIN_NAME
    value: "Default"
  - name: OS_CONTAINER_NAME
    value: "gwtreasuremap"
  - name: OBSERVING_RUN
    value: "O4"
```

## Deployment Architecture

The system runs two independent listener processes:

1. **LIGO Listener** (`docker/run_ligo_listener.py`)
- Subscribes to `igwn.gwalert` Kafka topic
- Processes gravitational wave alerts
- Generates skymaps, contours, and galaxy lists
- Posts to GWTM API

2. **IceCube Listener** (`docker/run_icecube_listener.py`)
- Subscribes to `gcn.notices.icecube.lvk_nu_track_search` Kafka topic
- Processes neutrino coincidence notices
- Posts to GWTM API

Both listeners run continuously and process alerts in real-time as they arrive on the Kafka stream.

## Data Migration

When migrating between storage backends (e.g., moving from AWS to OpenStack):

```bash
# Dry run to see what would be migrated (includes size estimation)
python scripts/migrate_storage.py --source s3 --dest swift --container fit --dry-run

# Actual migration with progress tracking
python scripts/migrate_storage.py --source s3 --dest swift --container fit

# Migrate test data
python scripts/migrate_storage.py --source s3 --dest swift --container test
```

**Migration Features**:
- **Size Estimation**: Calculates total data size before transfer (samples files if >100)
- **Time Estimation**: Shows ETA and transfer rate during migration
- **Progress Tracking**: Real-time progress with percentage complete
- **Transfer Statistics**: Reports total size, time, and average transfer rate
- **Error Handling**: Continues on errors and reports failed files at end
- **Dry-run Mode**: Preview migration without transferring data
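
For illustration, here is a sketch of the kind of arithmetic behind the size sampling and ETA figures described above (not the actual `scripts/migrate_storage.py` implementation):

```python
# Illustrative arithmetic behind the size and ETA figures reported by the
# migration script; not the actual scripts/migrate_storage.py code.
import random

def estimate_total_bytes(keys, get_size, sample_limit=100):
    """Sum sizes directly for small listings; sample and extrapolate past the limit."""
    if not keys:
        return 0
    if len(keys) <= sample_limit:
        return sum(get_size(k) for k in keys)
    sample = random.sample(keys, sample_limit)
    average = sum(get_size(k) for k in sample) / sample_limit
    return int(average * len(keys))

def eta_seconds(bytes_done, bytes_total, elapsed_seconds):
    """ETA from the average transfer rate observed so far."""
    if bytes_done == 0:
        return float("inf")
    rate = bytes_done / elapsed_seconds  # bytes per second
    return (bytes_total - bytes_done) / rate
```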

**Example Output**:
```
Scanning source (s3)...
Found 1523 files. Calculating total size...
Total size: ~4.23 GB (1523 files)
--------------------------------------------------------------------------------

[1/1523 - 0.1%] fit/S230518h-Preliminary.fits.gz
Size: 2.45 MB, Time: 1.2s, Rate: 2.04 MB/s, ETA: 30.5m
[2/1523 - 0.1%] fit/S230518h-contours-smooth.json (145.23 KB) ✓
...
[1523/1523 - 100.0%] test/MS181101ab-retraction.json (8.12 KB) ✓

================================================================================
Migration complete!
Successful: 1523/1523
Total size: 4.23 GB
Total time: 28.3m
Avg rate: 2.56 MB/s
```

## Testing

```bash
# Run local ingestion tests with sample alerts
python tests/listener_tests/test_local_ingest.py
python tests/icecube_tests/test_local_ingest.py
```
