SLEAP-specific deployment of LabLink infrastructure to AWS for cloud-based pose estimation
Deploy SLEAP LabLink infrastructure for cloud-based VM allocation and management. This repository uses Terraform and GitHub Actions to automate deployment of the LabLink allocator service to AWS for SLEAP pose estimation workflows.
- 📖 Main Documentation: https://lablink.talmolab.org
- 🚀 Deployment Guide: See DEPLOYMENT.md for step-by-step deployment instructions
- 📋 Deployment Checklist: See DEPLOYMENT_CHECKLIST.md for pre-deployment verification
- 📊 AWS Resources: See AWS_RESOURCES.md for EIPs, AMIs, and DNS details
SLEAP LabLink automates deployment and management of cloud-based VMs for SLEAP pose estimation. It provides:
- Web interface for requesting and managing SLEAP VMs
- Automatic VM provisioning with SLEAP and GPU drivers pre-installed
- GPU support for ML/AI workloads (g4dn.xlarge instances)
- Chrome Remote Desktop access to VM GUI
- Tutorial data pre-loaded from sleap-tutorial-data repository
Full deployment instructions: See DEPLOYMENT.md
| Environment | Purpose | URL | Deploy Method |
|---|---|---|---|
| Dev | Local development | http://<IP>:5000 | Local Terraform |
| Test | Staging | http://test.lablink.sleap.ai | GitHub Actions |
| Prod | Production | https://lablink.sleap.ai | GitHub Actions |
1. Copy test configuration:

   ```bash
   cd lablink-infrastructure
   cp config/config-test.yaml config/config.yaml
   git add config/config.yaml
   git commit -m "Configure for test deployment"
   git push
   ```

2. Run GitHub Actions:
   - Go to Actions → Deploy LabLink Infrastructure
   - Click Run workflow
   - Select environment: `test`
   - Click Run workflow

3. Access after deployment:
   - URL: http://test.lablink.sleap.ai
   - Admin: http://test.lablink.sleap.ai/admin (username: `admin`)
   - SSH: Download `lablink-key-test.pem` from workflow artifacts (usage sketch below)
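Once the key is downloaded, you can SSH into the allocator VM. A minimal sketch, assuming the instance's default user is `ubuntu` (typical for the Ubuntu-based AMIs described below):

```bash
# SSH refuses keys that are world-readable
chmod 600 lablink-key-test.pem

# Hostname assumes the test DNS record; use the public IP if DNS is disabled
ssh -i lablink-key-test.pem ubuntu@test.lablink.sleap.ai
```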
1. Copy production configuration:

   ```bash
   cd lablink-infrastructure
   cp config/config-prod.yaml config/config.yaml
   git add config/config.yaml
   git commit -m "Configure for production deployment"
   git push
   ```

2. Run GitHub Actions:
   - Go to Actions → Deploy LabLink Infrastructure
   - Click Run workflow
   - Select environment: `prod`
   - Click Run workflow

3. Access after deployment:
   - URL: https://lablink.sleap.ai (HTTPS with SSL; certificate check below)
   - Admin: https://lablink.sleap.ai/admin (username: `admin`)
   - SSH: Download `lablink-key-prod.pem` from workflow artifacts
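To spot-check that the SSL certificate was issued correctly after a production deploy (assumes `curl` is available):

```bash
# Print response headers; a successful TLS handshake means the cert is valid
curl -sSI https://lablink.sleap.ai | head -n 5

# Inspect the certificate issuer and expiry dates
curl -vI https://lablink.sleap.ai 2>&1 | grep -E "issuer|expire"
```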
For detailed instructions including Dev (local) deployment, see DEPLOYMENT.md
- AWS Account with permissions to create:
  - EC2 instances
  - Security Groups
  - Elastic IPs
  - (Optional) Route 53 records for DNS
- GitHub Account with the ability to:
  - Create repositories from templates
  - Configure GitHub Actions secrets
  - Run GitHub Actions workflows
- Basic Knowledge of:
  - Terraform (helpful but not required)
  - AWS services
Before deploying, you must set up:
- S3 Bucket for Terraform state storage
- IAM Role for GitHub Actions OIDC authentication
- (Optional) Elastic IP for persistent allocator address
- (Optional) Route 53 Hosted Zone for custom domain
See AWS Setup Guide below for detailed instructions.
Create an IAM role with an OIDC provider for GitHub Actions:
1. Create the OIDC provider in IAM (if it doesn't exist):
   - Provider URL: `https://token.actions.githubusercontent.com`
   - Audience: `sts.amazonaws.com`

2. Create an IAM role with this trust policy:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringLike": {
             "token.actions.githubusercontent.com:sub": "repo:YOUR_ORG/YOUR_REPO:*"
           }
         }
       }
     ]
   }
   ```

3. Attach permissions: `PowerUserAccess` (or a custom policy with EC2, VPC, S3, Route53, and IAM permissions)

4. Copy the Role ARN and add it to GitHub secrets (see the CLI sketch below)
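The same setup can be scripted with the AWS CLI. A sketch only; the role name `lablink-github-actions` is illustrative, and `trust-policy.json` is the trust policy above with your account ID and repository filled in:

```bash
# Register GitHub's OIDC provider (skip if your account already has one).
# The thumbprint shown is the commonly documented GitHub value; verify it
# against current AWS/GitHub docs before relying on it.
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# Create the role using the trust policy from step 2
aws iam create-role \
  --role-name lablink-github-actions \
  --assume-role-policy-document file://trust-policy.json

# Attach permissions (step 3)
aws iam attach-role-policy \
  --role-name lablink-github-actions \
  --policy-arn arn:aws:iam::aws:policy/PowerUserAccess

# Print the Role ARN for the GitHub secret (step 4)
aws iam get-role --role-name lablink-github-actions --query Role.Arn --output text
```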
The AWS region where your infrastructure will be deployed. Must match the region in your `config.yaml`.

Common regions:
- `us-west-2` (Oregon)
- `us-east-1` (N. Virginia)
- `eu-west-1` (Ireland)

Important: AMI IDs are region-specific. If you change regions, update the `ami_id` in `config.yaml`.
Password for accessing the allocator web interface. Choose a strong password (12+ characters, mixed case, numbers, symbols).
This password is used to log in to the admin dashboard where you can:
- Create and destroy client VMs
- View VM status
- Assign VMs to users
Password for the PostgreSQL database used by the allocator service. Choose a strong password different from `ADMIN_PASSWORD`.
This is stored securely and injected into the configuration at deployment time.
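If you use the GitHub CLI, the secrets can be set without touching the web UI. A sketch; the secret name `AWS_ROLE_ARN` for the role ARN is a hypothetical placeholder, so match the names your workflow files actually reference:

```bash
# Hypothetical secret names; check .github/workflows/ for the real ones
gh secret set AWS_ROLE_ARN --body "arn:aws:iam::YOUR_ACCOUNT_ID:role/lablink-github-actions"
gh secret set AWS_REGION --body "us-west-2"
gh secret set ADMIN_PASSWORD  # prompts for the value interactively
gh secret set DB_PASSWORD     # prompts for the value interactively
```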
```bash
# Create bucket (must be globally unique)
aws s3 mb s3://tf-state-YOUR-ORG-lablink --region us-west-2

# Enable versioning (recommended)
aws s3api put-bucket-versioning \
  --bucket tf-state-YOUR-ORG-lablink \
  --versioning-configuration Status=Enabled
```
Update `bucket_name` in `lablink-infrastructure/config/config.yaml` to match.
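A quick sanity check that versioning took effect before the first deploy:

```bash
# Should print "Status": "Enabled"
aws s3api get-bucket-versioning --bucket tf-state-YOUR-ORG-lablink
```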
For persistent allocator IP address across deployments:
```bash
# Allocate EIP
aws ec2 allocate-address --domain vpc --region us-west-2

# Tag it for reuse
aws ec2 create-tags \
  --resources eipalloc-XXXXXXXX \
  --tags Key=Name,Value=lablink-eip
```
Update `eip.tag_name` in `config.yaml` if using a different tag name.
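To confirm the tag and look up the address later:

```bash
# Find the EIP by its Name tag; prints the public IP and allocation ID
aws ec2 describe-addresses \
  --filters "Name=tag:Name,Values=lablink-eip" \
  --query "Addresses[0].{IP:PublicIp,AllocationId:AllocationId}" \
  --region us-west-2
```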
If using a custom domain:
1. Create or use an existing hosted zone:

   ```bash
   aws route53 create-hosted-zone --name your-domain.com --caller-reference $(date +%s)
   ```

2. Update your domain's nameservers to point to the Route 53 NS records (verification sketch below)

3. Update the `dns` section in `config.yaml`:

   ```yaml
   dns:
     enabled: true
     domain: "your-domain.com"
     zone_id: "Z..."  # Optional - will auto-lookup if empty
   ```
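To verify the delegation, compare the zone's NS records with what the public internet sees (assumes `dig` is installed; substitute the zone ID returned by the create call above):

```bash
# Nameservers Route 53 assigned to the zone
aws route53 get-hosted-zone --id YOUR_ZONE_ID --query "DelegationSet.NameServers"

# Nameservers your registrar is actually publishing; these should match
dig NS your-domain.com +short
```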
See GitHub Secrets Setup above for detailed IAM role configuration.
All configuration is in `lablink-infrastructure/config/config.yaml`.
```yaml
db:
  dbname: "lablink_db"
  user: "lablink"
  password: "PLACEHOLDER_DB_PASSWORD"  # Injected from GitHub secret
  host: "localhost"
  port: 5432

machine:
  machine_type: "g4dn.xlarge"  # AWS instance type
  image: "ghcr.io/talmolab/lablink-client-base-image:latest"  # Docker image
  ami_id: "ami-0601752c11b394251"  # Region-specific AMI
  repository: "https://github.com/YOUR_ORG/YOUR_REPO.git"  # Your code/data repo
  software: "your-software"  # Software identifier
  extension: "ext"  # Data file extension
```
Instance Types:
- `g4dn.xlarge` - GPU instance (NVIDIA T4, good for ML)
- `t3.large` - CPU-only, cheaper
- `p3.2xlarge` - More powerful GPU (NVIDIA V100)

AMI IDs (custom Ubuntu 24.04 - see AWS_RESOURCES.md):
- Client VM (GPU): `ami-0601752c11b394251` (Docker + NVIDIA GPU drivers)
- Allocator VM: `ami-0bd08c9d4aa9f0bc6` (Docker only)
- Region: us-west-2 only (custom AMIs maintained by the SLEAP team)
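To confirm an AMI is visible in your account and region before deploying:

```bash
# Returns the image's name and state if it exists in us-west-2; errors otherwise
aws ec2 describe-images \
  --image-ids ami-0601752c11b394251 \
  --region us-west-2 \
  --query "Images[0].{Name:Name,State:State}"
```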
```yaml
app:
  admin_user: "admin"
  admin_password: "PLACEHOLDER_ADMIN_PASSWORD"  # Injected from secret
  region: "us-west-2"  # Must match AWS_REGION secret

dns:
  enabled: false  # true to use DNS, false for IP-only
  terraform_managed: false  # true = Terraform creates records
  domain: "lablink.example.com"
  zone_id: ""  # Leave empty for auto-lookup
  app_name: "lablink"
  pattern: "auto"  # "auto" or "custom"
```
DNS Patterns:
- `auto`: Creates `{env}.{app_name}.{domain}` (e.g., `test.lablink.example.com`)
- `custom`: Uses the `custom_subdomain` value
```yaml
ssl:
  provider: "none"  # "letsencrypt", "cloudflare", or "none"
  email: "your-email@example.com"  # For Let's Encrypt notifications
  staging: true  # true = staging certs, false = production certs
```
SSL Providers:
- `none`: HTTP only (for testing)
- `letsencrypt`: Automatic SSL with Caddy
- `cloudflare`: Use Cloudflare proxy for SSL
```yaml
eip:
  strategy: "persistent"  # "persistent" or "dynamic"
  tag_name: "lablink-eip"  # Tag to find reusable EIP
```
Deploys or updates your LabLink infrastructure.
Triggers:
- Manual: Actions → "Deploy LabLink Infrastructure" → Run workflow
- Automatic: Push to the `test` branch

Inputs:
- `environment`: `test` or `prod`
- `image_tag`: (Optional) Specific Docker image tag for prod
What it does:
- Configures AWS credentials via OIDC
- Injects passwords from GitHub secrets into config
- Runs Terraform to create/update infrastructure
- Verifies deployment and DNS
- Uploads SSH key as artifact
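The workflow can also be dispatched from the GitHub CLI instead of the Actions tab; a sketch, using the workflow file name from the repository structure below:

```bash
# Trigger a test deployment and follow the run
gh workflow run terraform-deploy.yml -f environment=test
gh run watch
```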
Triggers:
- Manual only: Actions → "Destroy LabLink Infrastructure" → Run workflow
Inputs:
- `confirm_destroy`: Must type "yes" to confirm
- `environment`: `test` or `prod`
Tests that client VMs can be provisioned correctly.
Triggers:
- Manual only
1. Update `config.yaml`:

   ```yaml
   machine:
     repository: "https://github.com/your-org/your-software-data.git"
     software: "your-software-name"
     extension: "your-file-ext"  # e.g., "h5", "npy", "csv"
   ```

2. (Optional) Use a custom Docker image:

   ```yaml
   machine:
     image: "ghcr.io/your-org/your-custom-image:latest"
   ```
1. Update `config.yaml`:

   ```yaml
   app:
     region: "eu-west-1"  # Your region

   machine:
     ami_id: "ami-XXXXXXX"  # Region-specific AMI
   ```

2. Update the GitHub secret `AWS_REGION`

3. Find an appropriate AMI for the region (Ubuntu 24.04 with Docker); see the lookup sketch below
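One way to locate a base image for step 3 (a sketch: `099720109477` is Canonical's public owner ID, but the image name pattern changes between releases, so verify in the AWS console; the resulting AMI still needs Docker added, per AWS_RESOURCES.md):

```bash
# Latest Canonical Ubuntu 24.04 (noble) amd64 AMI in eu-west-1
aws ec2 describe-images \
  --owners 099720109477 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*" \
  --query "sort_by(Images, &CreationDate)[-1].ImageId" \
  --region eu-west-1 --output text
```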
```yaml
machine:
  machine_type: "t3.xlarge"  # No GPU, cheaper
  # or
  machine_type: "p3.2xlarge"  # More powerful GPU
```
See AWS EC2 Instance Types for options.
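To compare the candidates' vCPU, memory, and GPU from the CLI:

```bash
# GPU is null for CPU-only types like t3.xlarge
aws ec2 describe-instance-types \
  --instance-types t3.xlarge g4dn.xlarge p3.2xlarge \
  --query "InstanceTypes[].{Type:InstanceType,vCPU:VCpuInfo.DefaultVCpus,MemMiB:MemoryInfo.SizeInMiB,GPU:GpuInfo.Gpus[0].Name}"
```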
Cause: AMI ID doesn't exist in your region
Solution: Update `ami_id` in `config.yaml` with a region-appropriate AMI
Cause: Security group or DNS not configured
Solution:
- Check security group allows inbound traffic on port 5000
- If using DNS, verify DNS records propagated
- Try accessing via public IP first
Cause: A previous deployment didn't complete or clean up
Solution:
```bash
# In lablink-infrastructure/
terraform force-unlock LOCK_ID
```
Cause: DNS propagation delay or Route 53 not configured
Solution:
- Wait 5-10 minutes for propagation
- Verify Route 53 hosted zone exists
- Check nameservers match at domain registrar
- Use `nslookup your-domain.com` to test (see below)
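For the last check, query a public resolver directly to rule out local caching:

```bash
# Answer from your default resolver
nslookup your-domain.com

# Answer from Google's public resolver; a mismatch suggests propagation lag
dig +short your-domain.com @8.8.8.8
```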
- Main Documentation: https://talmolab.github.io/lablink/
- Infrastructure Docs: lablink-infrastructure/README.md
- GitHub Issues: https://github.com/talmolab/lablink/issues
- Deployment Checklist: DEPLOYMENT_CHECKLIST.md
```
lablink-template/
├── .github/workflows/                    # GitHub Actions workflows
│   ├── terraform-deploy.yml              # Deploy infrastructure
│   ├── terraform-destroy.yml             # Destroy infrastructure
│   └── client-vm-infrastructure-test.yml
├── lablink-infrastructure/               # Terraform infrastructure
│   ├── config/
│   │   ├── config.yaml                   # Main configuration
│   │   └── example.config.yaml           # Configuration reference
│   ├── main.tf                           # Core Terraform config
│   ├── backend.tf                        # Terraform backend
│   ├── backend-*.hcl                     # Environment-specific backends
│   ├── terraform.tfvars                  # Terraform variables
│   ├── user_data.sh                      # EC2 initialization script
│   ├── verify-deployment.sh              # Deployment verification
│   └── README.md                         # Infrastructure documentation
├── README.md                             # This file
├── DEPLOYMENT_CHECKLIST.md               # Pre-deployment checklist
└── LICENSE
```
Found an issue with the template or want to suggest improvements?
- Open an issue: https://github.com/talmolab/lablink-template/issues
- For LabLink core issues: https://github.com/talmolab/lablink/issues
BSD 2-Clause License - see LICENSE file for details.
- Main LabLink Repository: https://github.com/talmolab/lablink
- Documentation: https://talmolab.github.io/lablink/
- Template Repository: https://github.com/talmolab/lablink-template
- Example Deployment: https://github.com/talmolab/sleap-lablink (SLEAP-specific configuration)
Need Help? Check the Deployment Checklist or Troubleshooting section above.