64 changes: 64 additions & 0 deletions .github/workflows/docker-build.yml
@@ -0,0 +1,64 @@
name: Build and Push Docker Image

on:
  push:
    branches:
      - master
      - dockerise
    tags:
      - 'v*'
  pull_request:
    branches:
      - master

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest

    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Image digest
        run: echo "Image pushed successfully to ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}"
18 changes: 16 additions & 2 deletions Dockerfile
@@ -1,13 +1,27 @@
FROM python:3.10
FROM python:3.11-slim

WORKDIR /app

ENV PYTHONBUFFERED 1
ENV PYTHONUNBUFFERED=1

# Install system dependencies for SSL certificates
RUN apt-get update && \
apt-get install -y --no-install-recommends ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Create symlink for RedHat-style certificate path (required by librdkafka OIDC)
# librdkafka's OIDC token retrieval looks for /etc/pki/tls/certs/ca-bundle.crt
RUN mkdir -p /etc/pki/tls/certs && \
ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

RUN pip install --upgrade pip
COPY ./requirements.txt /app
RUN pip install -r requirements.txt

COPY . /app

# Install the gwtm_cron package
RUN pip install -e .

CMD ["python", "src/gwtm_cron/gwtm_listener/listener.py"]
269 changes: 261 additions & 8 deletions README.md
@@ -1,20 +1,273 @@
# gwtm_cron
Gravitational Wave Treasure Map cron functions - a Kafka-based listener system for processing:
* LIGO/Virgo/KAGRA gravitational wave alerts
* IceCube neutrino coincidence notices

**Requirements**: Python 3.11+

## How It Works

The listeners are **real-time streaming processors** that:

1. **Subscribe to NASA GCN Kafka streams**
- LIGO Listener: `igwn.gwalert` topic for gravitational wave detections
- IceCube Listener: `gcn.notices.icecube.lvk_nu_track_search` for neutrino coincidences

2. **Process each alert** as it arrives (typically within seconds of detection):
- Parse alert JSON and extract metadata (event ID, classification, instruments)
- Decode and analyze probability skymaps (FITS format)
- Calculate sky localization statistics (90%/50% credible areas, average position)
- Generate derived products:
- Sky contours (GeoJSON for visualization)
- MOC (Multi-Order Coverage) files
- Satellite visibility maps (Fermi, LAT)
- Query galaxy catalogs to identify potential host galaxies

3. **Store products** to cloud storage (S3/Azure/OpenStack Swift):
- Raw alert JSON
- Processed FITS skymaps
- Visualization-ready contours and maps
- One event can produce 5-10 files as it evolves (Early Warning → Preliminary → Update)

4. **POST to GWTM API** for public consumption by astronomers worldwide

**Important**: Listeners only process **new alerts** that arrive after they start. Historical alerts are not backfilled. If you start with empty storage, it will remain empty until the next gravitational wave or neutrino detection is announced.
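
For reference, here is a minimal sketch of step 1 using the `gcn-kafka` client, with credentials taken from the `KAFKA_CLIENT_ID`/`KAFKA_CLIENT_SECRET` variables described below. It is illustrative only; the actual listeners layer the skymap processing, storage uploads, and GWTM API posts on top of this consume loop.

```python
# Minimal sketch of step 1: subscribing to the igwn.gwalert topic with the
# gcn-kafka client. The real listeners add skymap processing, storage
# uploads, and GWTM API posts on top of this loop.
import json
import os

from gcn_kafka import Consumer

consumer = Consumer(
    client_id=os.environ["KAFKA_CLIENT_ID"],
    client_secret=os.environ["KAFKA_CLIENT_SECRET"],
)
consumer.subscribe(["igwn.gwalert"])

while True:
    for message in consumer.consume(timeout=1):
        if message.error():
            continue  # skip errored Kafka messages
        alert = json.loads(message.value())
        # e.g. "S230518h", "PRELIMINARY"
        print(alert.get("superevent_id"), alert.get("alert_type"))
```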

## Quick Start

### Docker Compose (Local Development)

```bash
# Build and run both listeners
docker compose up

# Run specific listener
docker compose up ligo-listener
docker compose up icecube-listener
```

### Manual Docker Build

```bash
docker build -t gwtm_cron .
docker tag gwtm_cron:latest 929887798640.dkr.ecr.us-east-2.amazonaws.com/gwtm_cron_listener:latest
./ecrlogin.sh

docker push 929887798640.dkr.ecr.us-east-2.amazonaws.com/gwtm_cron_listener:latest
```

**Note**: The GitHub Actions workflow above (`.github/workflows/docker-build.yml`) also builds the image automatically and pushes it to GitHub Container Registry (`ghcr.io`) on commits to master and on version tags.

## Environment Variables

The following environment variables configure the listeners in Docker/Kubernetes/Helm deployments:

### Required Variables

#### Kafka Configuration
- `KAFKA_CLIENT_ID` - GCN Kafka client ID for authentication
- `KAFKA_CLIENT_SECRET` - GCN Kafka client secret

#### GWTM API Configuration
- `API_TOKEN` - Authentication token for GWTM API
- `API_BASE` - Base URL for GWTM API (e.g., `https://treasuremap.space/api/v0/`)
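
As an illustration of how these two values are typically combined (not the listeners' actual client code), with a placeholder endpoint path:

```python
# Illustration only: combining API_BASE and API_TOKEN in a request to the
# GWTM API. "some_endpoint" is a placeholder, not a documented route, and
# whether the token goes in the payload or a header depends on the API.
import os

import requests

api_base = os.environ["API_BASE"].rstrip("/")      # e.g. https://treasuremap.space/api/v0
payload = {"api_token": os.environ["API_TOKEN"]}   # plus the alert-specific fields

response = requests.post(f"{api_base}/some_endpoint", json=payload, timeout=30)
response.raise_for_status()
```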

#### Cloud Storage Configuration

**Option 1: AWS S3**
- `AWS_ACCESS_KEY_ID` - AWS access key
- `AWS_SECRET_ACCESS_KEY` - AWS secret key
- `AWS_DEFAULT_REGION` - AWS region (default: `us-east-2`)
- `AWS_BUCKET` - S3 bucket name (default: `gwtreasuremap`)
- `STORAGE_BUCKET_SOURCE=s3` - Set to `s3` for AWS storage

**Option 2: Azure Blob Storage**
- `AZURE_ACCOUNT_NAME` - Azure storage account name
- `AZURE_ACCOUNT_KEY` - Azure storage account key
- `STORAGE_BUCKET_SOURCE=abfs` - Set to `abfs` for Azure storage

**Option 3: OpenStack Swift**
- `OS_AUTH_URL` - OpenStack authentication endpoint (e.g., `https://openstack.example.com:5000/v3`)
- `OS_USERNAME` - OpenStack username
- `OS_PASSWORD` - OpenStack password
- `OS_PROJECT_NAME` - OpenStack project/tenant name
- `OS_USER_DOMAIN_NAME` - User domain name (default: `Default`)
- `OS_PROJECT_DOMAIN_NAME` - Project domain name (default: `Default`)
- `OS_CONTAINER_NAME` - Swift container name (default: `gwtreasuremap`)
- `STORAGE_BUCKET_SOURCE=swift` - Set to `swift` for OpenStack storage
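
A sketch of how `STORAGE_BUCKET_SOURCE` could select one of these backends, assuming the standard `boto3`, `azure-storage-blob`, and `python-swiftclient` packages. This is illustrative only, not the project's actual storage module, and the Azure container name below is an assumption.

```python
# Illustrative backend dispatch keyed on STORAGE_BUCKET_SOURCE; not the
# project's storage module.
import os

def make_uploader():
    source = os.environ.get("STORAGE_BUCKET_SOURCE", "s3")

    if source == "s3":
        import boto3
        # boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
        s3 = boto3.client("s3", region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-2"))
        bucket = os.environ.get("AWS_BUCKET", "gwtreasuremap")
        return lambda key, data: s3.put_object(Bucket=bucket, Key=key, Body=data)

    if source == "abfs":
        from azure.storage.blob import BlobServiceClient
        account = os.environ["AZURE_ACCOUNT_NAME"]
        service = BlobServiceClient(
            account_url=f"https://{account}.blob.core.windows.net",
            credential=os.environ["AZURE_ACCOUNT_KEY"],
        )
        # container name is an assumption for this sketch
        container = service.get_container_client("gwtreasuremap")
        return lambda key, data: container.upload_blob(name=key, data=data, overwrite=True)

    if source == "swift":
        import swiftclient
        conn = swiftclient.Connection(
            authurl=os.environ["OS_AUTH_URL"],
            user=os.environ["OS_USERNAME"],
            key=os.environ["OS_PASSWORD"],
            auth_version="3",
            os_options={
                "project_name": os.environ["OS_PROJECT_NAME"],
                "user_domain_name": os.environ.get("OS_USER_DOMAIN_NAME", "Default"),
                "project_domain_name": os.environ.get("OS_PROJECT_DOMAIN_NAME", "Default"),
            },
        )
        container = os.environ.get("OS_CONTAINER_NAME", "gwtreasuremap")
        return lambda key, data: conn.put_object(container, key, contents=data)

    raise ValueError(f"Unknown STORAGE_BUCKET_SOURCE: {source}")
```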

### Optional Variables

- `OBSERVING_RUN` - Observing run identifier (default: `O4`)
- `PATH_TO_GALAXY_CATALOG_CONFIG` - Path to galaxy catalog config file (only needed for LIGO listener if generating galaxy lists)

### Listener Control Variables

- `DRY_RUN` - Set to `true` or `1` to run in dry-run mode (no API calls, no storage writes)
- `WRITE_TO_STORAGE` - Set to `false` or `0` to disable storage writes (default: `true`)
- `VERBOSE` - Set to `false` or `0` to disable verbose logging (default: `true`)

**Note**: Storage type (S3, Azure, Swift) is controlled by `STORAGE_BUCKET_SOURCE`, not by `WRITE_TO_STORAGE`.

### Logging Configuration

- `LOG_FORMAT` - Set to `json` to enable structured JSON logging for Kubernetes (default: print statements)
- `LOG_LEVEL` - Set log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` (default: `INFO`)
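
A sketch of how the control and logging variables above might be interpreted, assuming the `true`/`1` and `false`/`0` conventions listed; illustrative only, not the listeners' actual configuration code.

```python
# Illustrative parsing of the control and logging environment variables;
# not the listeners' actual configuration code.
import logging
import os

def env_flag(name: str, default: bool) -> bool:
    """Interpret 'true'/'1' as True and 'false'/'0' as False, per the docs above."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1")

DRY_RUN = env_flag("DRY_RUN", False)
WRITE_TO_STORAGE = env_flag("WRITE_TO_STORAGE", True)
VERBOSE = env_flag("VERBOSE", True)

level = getattr(logging, os.environ.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
if os.environ.get("LOG_FORMAT", "").lower() == "json":
    # hand log records to a JSON-style formatter for Kubernetes log scraping
    logging.basicConfig(level=level, format='{"level": "%(levelname)s", "message": "%(message)s"}')
else:
    logging.basicConfig(level=level)
```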

### Example Kubernetes/Helm Values

**AWS S3 Example:**
```yaml
env:
  - name: KAFKA_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-id
  - name: KAFKA_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-secret
  - name: API_TOKEN
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: api-token
  - name: API_BASE
    value: "https://treasuremap.space/api/v0/"
  - name: STORAGE_BUCKET_SOURCE
    value: "s3"
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: aws-credentials
        key: secret-access-key
  - name: AWS_DEFAULT_REGION
    value: "us-east-2"
  - name: AWS_BUCKET
    value: "gwtreasuremap"
  - name: OBSERVING_RUN
    value: "O4"
```

**OpenStack Swift Example:**
```yaml
env:
  - name: KAFKA_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-id
  - name: KAFKA_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: kafka-client-secret
  - name: API_TOKEN
    valueFrom:
      secretKeyRef:
        name: gwtm-secrets
        key: api-token
  - name: API_BASE
    value: "https://treasuremap.space/api/v0/"
  - name: STORAGE_BUCKET_SOURCE
    value: "swift"
  - name: OS_AUTH_URL
    value: "https://openstack.example.com:5000/v3"
  - name: OS_USERNAME
    valueFrom:
      secretKeyRef:
        name: openstack-credentials
        key: username
  - name: OS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: openstack-credentials
        key: password
  - name: OS_PROJECT_NAME
    value: "gwtm-project"
  - name: OS_USER_DOMAIN_NAME
    value: "Default"
  - name: OS_PROJECT_DOMAIN_NAME
    value: "Default"
  - name: OS_CONTAINER_NAME
    value: "gwtreasuremap"
  - name: OBSERVING_RUN
    value: "O4"
```

## Deployment Architecture

The system runs two independent listener processes:

1. **LIGO Listener** (`docker/run_ligo_listener.py`)
- Subscribes to `igwn.gwalert` Kafka topic
- Processes gravitational wave alerts
- Generates skymaps, contours, and galaxy lists
- Posts to GWTM API

2. **IceCube Listener** (`docker/run_icecube_listener.py`)
- Subscribes to `gcn.notices.icecube.lvk_nu_track_search` Kafka topic
- Processes neutrino coincidence notices
- Posts to GWTM API

Both listeners run continuously and process alerts in real-time as they arrive on the Kafka stream.

## Data Migration

When migrating between storage backends (e.g., moving from AWS to OpenStack):

```bash
# Dry run to see what would be migrated (includes size estimation)
python scripts/migrate_storage.py --source s3 --dest swift --container fit --dry-run

# Actual migration with progress tracking
python scripts/migrate_storage.py --source s3 --dest swift --container fit

# Migrate test data
python scripts/migrate_storage.py --source s3 --dest swift --container test
```

**Migration Features**:
- **Size Estimation**: Calculates total data size before transfer (samples files if >100)
- **Time Estimation**: Shows ETA and transfer rate during migration
- **Progress Tracking**: Real-time progress with percentage complete
- **Transfer Statistics**: Reports total size, time, and average transfer rate
- **Error Handling**: Continues on errors and reports failed files at end
- **Dry-run Mode**: Preview migration without transferring data
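
For illustration, here is a sketch of the kind of arithmetic behind the size sampling and ETA figures described above (not the actual `scripts/migrate_storage.py` implementation):

```python
# Illustrative arithmetic behind the size and ETA figures reported by the
# migration script; not the actual scripts/migrate_storage.py code.
import random

def estimate_total_bytes(keys, get_size, sample_limit=100):
    """Sum sizes directly for small listings; sample and extrapolate past the limit."""
    if not keys:
        return 0
    if len(keys) <= sample_limit:
        return sum(get_size(k) for k in keys)
    sample = random.sample(keys, sample_limit)
    average = sum(get_size(k) for k in sample) / sample_limit
    return int(average * len(keys))

def eta_seconds(bytes_done, bytes_total, elapsed_seconds):
    """ETA from the average transfer rate observed so far."""
    if bytes_done == 0:
        return float("inf")
    rate = bytes_done / elapsed_seconds  # bytes per second
    return (bytes_total - bytes_done) / rate
```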

**Example Output**:
```
Scanning source (s3)...
Found 1523 files. Calculating total size...
Total size: ~4.23 GB (1523 files)
--------------------------------------------------------------------------------

[1/1523 - 0.1%] fit/S230518h-Preliminary.fits.gz
Size: 2.45 MB, Time: 1.2s, Rate: 2.04 MB/s, ETA: 30.5m
[2/1523 - 0.1%] fit/S230518h-contours-smooth.json (145.23 KB) ✓
...
[1523/1523 - 100.0%] test/MS181101ab-retraction.json (8.12 KB) ✓

================================================================================
Migration complete!
Successful: 1523/1523
Total size: 4.23 GB
Total time: 28.3m
Avg rate: 2.56 MB/s
```

## Testing

```bash
# Run local ingestion tests with sample alerts
python tests/listener_tests/test_local_ingest.py
python tests/icecube_tests/test_local_ingest.py
```
