
🚀 Connectome Deployment Instructions

Date: 2025-10-25
Status: Ready for deployment
API Keys: OpenAI ✅ | NGC ✅ | Anthropic ⚠️ (optional)


Prerequisites Checklist

Before running deployment, verify:

  • SSH access to Connectome server
  • Git repository cloned on Connectome
  • Docker installed with GPU runtime
  • nvidia-smi shows 8x RTX 3090 GPUs
  • GPUs 1, 5, 6 are available (≥20GB free memory each)
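The locally verifiable items on this checklist can be run as a quick pre-flight script. This is a minimal sketch; the `check` helper is hypothetical and only probes what can be tested from the shell:

```shell
#!/usr/bin/env sh
# Pre-flight check for locally verifiable prerequisites.
# check: print PASS/FAIL for a description plus a command to probe.
check() {
  desc=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $desc"
  else
    echo "FAIL: $desc"
  fi
}

check "git repository present"   test -d .git
check "docker installed"         command -v docker
check "nvidia-smi available"     command -v nvidia-smi
check "GPUs visible"             nvidia-smi
```

SSH access and per-GPU free memory still need a manual check on the server itself.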

Step-by-Step Deployment

1. SSH to Connectome Server

ssh <user>@<connectome-host>   # replace with your Connectome login
cd /path/to/AI-CoScientist

2. Copy Environment File

# Copy the local .env.local you just created
# Option A: Copy from your local machine
scp .env.local connectome:/path/to/AI-CoScientist/.env.production

# Option B: Create directly on server
cat > .env.production << 'EOF'
# Copy contents from .env.local below
EOF

Contents to copy (.env.local):

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CRITICAL API KEYS
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

NGC_API_KEY=YOUR_NGC_API_KEY_HERE
OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE

# Anthropic (optional - skip for now, has model access issues)
# ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY_HERE

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# GPU CONFIGURATION
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

NEMOTRON_GPU_ID=1
NEMO_EMBEDDER_GPU_ID=5
NEMO_RERANKER_GPU_ID=6

ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO

HYBRID_MODE=true
USE_GPT4_FOR_EVALUATION=true
USE_CLAUDE_FOR_EVALUATION=false  # Disabled until Claude key fixed
USE_NEMOTRON_FOR_SUMMARIZATION=true
USE_NEMOTRON_FOR_EXTRACTION=true

# Ensemble weights (without Claude: 60% GPT-4, 40% Nemotron)
ENSEMBLE_WEIGHT_GPT4=0.60
ENSEMBLE_WEIGHT_CLAUDE=0.0
ENSEMBLE_WEIGHT_NEMOTRON=0.40

NEMOTRON_CONFIDENCE_THRESHOLD=0.75

# OpenAI Configuration
OPENAI_MODEL=gpt-4
OPENAI_TEMPERATURE=0.3
OPENAI_MAX_TOKENS=4096

# Nemotron Configuration
NIM_OPTIMIZATION_PROFILE=throughput
NEMOTRON_BASE_URL=http://nemotron-llm:8000/v1
NEMOTRON_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_TEMPERATURE=0.7
NEMOTRON_MAX_TOKENS=2048

NEMO_EMBEDDER_URL=http://nemo-embedder:8000/v1
NEMO_EMBEDDER_MODEL=nvidia/llama-3.2-nv-embedqa-1b-v2
EMBEDDING_DIMENSION=1024

NEMO_RERANKER_URL=http://nemo-reranker:8000/v1
NEMO_RERANKER_MODEL=nvidia/llama-3.2-nv-rerankqa-1b-v2
RERANKER_TOP_K=5

# Database (will be auto-generated by deployment script)
POSTGRES_USER=postgres
POSTGRES_DB=ai_coscientist
POSTGRES_PORT=5432

CHROMADB_HOST=chromadb
CHROMADB_PORT=8000  # container port (exposed on the host as 8003)
CHROMA_TELEMETRY=FALSE

REDIS_HOST=redis
REDIS_PORT=6379

# Application
APP_NAME=AI-CoScientist
APP_VERSION=1.0.0
API_PORT=8080

# Performance
UVICORN_WORKERS=4
CELERY_CONCURRENCY=4

# Monitoring
PROMETHEUS_PORT=9090
GRAFANA_USER=admin
GRAFANA_PORT=3000

# Paths
PAPERS_COLLECTION_DIR=./papers_collection
LOGS_DIR=./logs

CORS_ORIGINS=http://localhost,http://127.0.0.1
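Before deploying, it helps to confirm the critical keys were actually filled in rather than left as placeholders. A small sketch; `check_env` is a hypothetical helper, extend the key list as needed:

```shell
# check_env: verify each required key has a non-placeholder value in an env file.
check_env() {
  file=$1
  for key in NGC_API_KEY OPENAI_API_KEY; do
    if grep -Eq "^${key}=.+" "$file" && ! grep -q "^${key}=YOUR_" "$file"; then
      echo "OK:      $key"
    else
      echo "MISSING: $key"
    fi
  done
}

# Usage on the server:
#   check_env .env.production
```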

3. Run Deployment Script

# Make script executable
chmod +x scripts/deploy_to_connectome_hybrid.sh

# Run deployment (takes 10-15 minutes)
./scripts/deploy_to_connectome_hybrid.sh

Expected Output:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AI-CoScientist - Connectome Hybrid Deployment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

[1/9] Checking prerequisites...
✓ GPU prerequisites verified (8x RTX 3090 found)
✓ Docker GPU runtime configured
✓ NGC API key configured

[2/9] Generating secure passwords...
✓ PostgreSQL password: ******************
✓ Redis password: ******************
✓ Grafana password: ******************
✓ Secret key: ******************

[3/9] Pulling Docker images...
⏳ Downloading NIM containers (10-15 minutes)...
✓ nemotron-llm:latest pulled (5.2 GB)
✓ nemo-embedder:latest pulled (2.1 GB)
✓ nemo-reranker:latest pulled (2.1 GB)
✓ All images pulled successfully

[4/9] Starting infrastructure services...
✓ postgres (port 5432)
✓ redis (port 6379)
✓ chromadb (port 8003)

[5/9] Starting Nemotron GPU services...
⏳ Loading models (3-5 minutes)...
✓ nemotron-llm on GPU 1 (18GB VRAM)
✓ nemo-embedder on GPU 5 (4GB VRAM)
✓ nemo-reranker on GPU 6 (4GB VRAM)

[6/9] Starting application services...
✓ api (port 8080)
✓ celery-worker
✓ celery-beat

[7/9] Starting monitoring services...
✓ prometheus (port 9090)
✓ grafana (port 3000)

[8/9] Running health checks...
✓ All 10 services healthy (1 skipped: claude)

[9/9] Deployment summary...

Deployment successful! 🎉

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SERVICE URLS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

API Documentation:  http://localhost:8080/docs
API Health:         http://localhost:8080/api/v1/health
Hybrid RAG Status:  http://localhost:8080/api/v1/hybrid-rag/status

Nemotron LLM:       http://localhost:8000/v1/health
NeMo Embedder:      http://localhost:8001/v1/health
NeMo Reranker:      http://localhost:8002/v1/health
ChromaDB:           http://localhost:8003/api/v1/heartbeat

Prometheus:         http://localhost:9090
Grafana:            http://localhost:3000 (admin / <auto-generated-password>)

GPU Monitoring:     nvidia-smi -l 1

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

4. Verify Deployment

# Check all services running
docker-compose -f docker-compose.connectome.yml ps

# Test API health
curl http://localhost:8080/api/v1/health

# Test hybrid RAG status
curl http://localhost:8080/api/v1/hybrid-rag/status

# Monitor GPU usage
nvidia-smi -l 1
# Should show:
# GPU 1: nemotron-llm (~18GB VRAM)
# GPU 5: nemo-embedder (~4GB VRAM)
# GPU 6: nemo-reranker (~4GB VRAM)

5. Run Test Evaluation

# Test hybrid evaluation
curl -X POST http://localhost:8080/api/v1/hybrid-rag/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "paper_text": "Recent advances in deep learning have revolutionized natural language processing. Our novel transformer architecture achieves state-of-the-art results on multiple benchmarks, demonstrating significant improvements in both accuracy and efficiency.",
    "section": "abstract",
    "use_ensemble": true
  }'

# Expected response (~2-3 seconds):
{
  "overall_quality": 8.1,
  "novelty": 7.8,
  "methodology": 8.3,
  "clarity": 8.2,
  "significance": 8.0,
  "feedback": "[gpt4] Strong methodology and clear presentation... [nemotron] Novel architecture with good benchmark results...",
  "provider_scores": {
    "gpt4": {
      "overall_quality": 8.2,
      "confidence": 0.9,
      "latency_ms": 1523
    },
    "nemotron": {
      "overall_quality": 8.0,
      "confidence": 0.75,
      "latency_ms": 234
    }
  },
  "ensemble_confidence": 0.85,
  "total_latency_ms": 1757
}
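The `overall_quality` above is consistent with the ensemble weights configured in `.env.production` (0.60 GPT-4, 0.40 Nemotron). The exact aggregation inside the service may differ, but the weighted average can be checked by hand:

```shell
# 0.60 * 8.2 (gpt4) + 0.40 * 8.0 (nemotron) = 8.12, reported rounded as 8.1
awk 'BEGIN { printf "%.2f\n", 0.60 * 8.2 + 0.40 * 8.0 }'
# prints 8.12
```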

Troubleshooting

Issue: GPU not available

# Check GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# If it fails, install nvidia-container-toolkit
# (assumes the NVIDIA container toolkit apt repository is already configured)
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Issue: Port already in use

# Check what's using the port
sudo lsof -i :8080  # API port
sudo lsof -i :8000  # Nemotron port

# Stop conflicting service or change port in .env.production

Issue: Nemotron service not starting

# Check logs
docker-compose -f docker-compose.connectome.yml logs nemotron-llm

# Verify NGC key
echo $NGC_API_KEY                         # current shell
grep -c '^NGC_API_KEY=' .env.production   # env file used by compose

# Restart service
docker-compose -f docker-compose.connectome.yml restart nemotron-llm

Issue: Out of memory

# Check GPU memory
nvidia-smi

# Verify GPU 1 has ≥20GB free before starting Nemotron.
# If not, point Nemotron at a different GPU in .env.production and restart:
NEMOTRON_GPU_ID=7  # e.g. try GPU 7
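To find a suitable replacement GPU, the free-memory column of `nvidia-smi` can be filtered. `pick_free_gpu` is a hypothetical helper; the 20480 MiB default mirrors the ≥20GB requirement above:

```shell
# pick_free_gpu: print GPUs whose free memory (MiB) meets the threshold.
pick_free_gpu() {
  threshold=${1:-20480}
  awk -F', ' -v t="$threshold" '$2 >= t { print "GPU " $1 ": " $2 " MiB free" }'
}

# Usage (on the GPU server):
#   nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | pick_free_gpu
```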

Post-Deployment

Monitor Services

# Watch GPU usage
watch -n 1 nvidia-smi

# Monitor API logs
docker-compose -f docker-compose.connectome.yml logs -f api

# Check Prometheus metrics
curl http://localhost:8080/metrics

# Access Grafana
# http://localhost:3000
# Login: admin / <password from .env.production>

Backup Database

# Create backup (-T disables TTY allocation so the redirect stays clean)
docker-compose -f docker-compose.connectome.yml exec -T postgres \
  pg_dump -U postgres ai_coscientist > backup_$(date +%Y%m%d).sql

# Backup ChromaDB
tar -czf chromadb_backup_$(date +%Y%m%d).tar.gz chromadb_data/
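Both backups can be automated with cron. The 02:00/02:15 schedule and repository path below are assumptions; note that `%` must be escaped inside a crontab and `exec -T` keeps TTY output from corrupting the dump:

```shell
# Illustrative crontab entries (install with `crontab -e`)
0 2 * * * cd /path/to/AI-CoScientist && docker-compose -f docker-compose.connectome.yml exec -T postgres pg_dump -U postgres ai_coscientist > backup_$(date +\%Y\%m\%d).sql
15 2 * * * cd /path/to/AI-CoScientist && tar -czf chromadb_backup_$(date +\%Y\%m\%d).tar.gz chromadb_data/
```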

Security Reminders

⚠️ AFTER DEPLOYMENT, ROTATE ALL API KEYS:

  1. OpenAI: https://platform.openai.com/api-keys
  2. NGC: https://org.ngc.nvidia.com/setup/api-key
  3. Anthropic (when ready): https://console.anthropic.com/settings/keys

Update .env.production with new keys and restart services:

docker-compose -f docker-compose.connectome.yml restart api celery-worker
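Editing the env file for a rotation can be scripted. `set_env_key` is a hypothetical helper, and the `sed -i` form assumes GNU sed:

```shell
# set_env_key: replace a key's value in an env file in place (GNU sed).
# Sketch only: values containing sed metacharacters (| & \) need escaping.
set_env_key() {
  key=$1; value=$2; file=$3
  sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

# Usage after obtaining a fresh key:
#   set_env_key OPENAI_API_KEY "sk-new..." .env.production
```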

Next Steps

  1. ✅ Deployment complete
  2. 📊 Run test evaluations
  3. 🔍 Monitor GPU utilization
  4. 📈 Check Grafana dashboards
  5. 🔄 Set up automated backups
  6. 🔐 Rotate API keys

Support: See claudedocs/NEMOTRON_HYBRID_GUIDE.md for detailed documentation

Issues: Report at https://github.com/Transconnectome/AI-CoScientist/issues