Date: 2025-10-25
Status: Ready for deployment
API Keys: OpenAI ✅ | NGC ✅ | Anthropic ⏸️ (optional — skipped for now due to model access issues)
Before running deployment, verify:
- SSH access to Connectome server
- Git repository cloned on Connectome
- Docker installed with GPU runtime
- nvidia-smi shows 8x RTX 3090 GPUs
- GPUs 1, 5, 6 are available (≥20GB free memory each)
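The checks above can be scripted. A minimal sketch (assumes `nvidia-smi` and `docker` are on the PATH; adjust GPU IDs to match your layout):

```shell
#!/usr/bin/env bash
# Sketch: verify deployment prerequisites before running the deploy script
set -euo pipefail

# Count visible GPUs (expect 8x RTX 3090)
gpu_count=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
echo "GPUs found: ${gpu_count}"

# Confirm Docker can reach the GPUs
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi >/dev/null \
  && echo "Docker GPU runtime: OK"

# Check free memory on GPUs 1, 5, 6 (need >= 20 GB, i.e. ~20480 MiB each)
for id in 1 5 6; do
  free_mib=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i "$id")
  echo "GPU ${id}: ${free_mib} MiB free"
done
```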
ssh [email protected]
cd /path/to/AI-CoScientist

# Copy the local .env.local you just created
# Option A: Copy from your local machine
scp .env.local connectome:/path/to/AI-CoScientist/.env.production
# Option B: Create directly on server
cat > .env.production << 'EOF'
# Copy contents from .env.local below
EOF

Contents to copy (.env.local):
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CRITICAL API KEYS
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NGC_API_KEY=YOUR_NGC_API_KEY_HERE
OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE
# Anthropic (optional - skip for now, has model access issues)
# ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY_HERE
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# GPU CONFIGURATION
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NEMOTRON_GPU_ID=1
NEMO_EMBEDDER_GPU_ID=5
NEMO_RERANKER_GPU_ID=6
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
HYBRID_MODE=true
USE_GPT4_FOR_EVALUATION=true
USE_CLAUDE_FOR_EVALUATION=false # Disabled until Claude key fixed
USE_NEMOTRON_FOR_SUMMARIZATION=true
USE_NEMOTRON_FOR_EXTRACTION=true
# Ensemble weights (without Claude: 60% GPT-4, 40% Nemotron)
ENSEMBLE_WEIGHT_GPT4=0.60
ENSEMBLE_WEIGHT_CLAUDE=0.0
ENSEMBLE_WEIGHT_NEMOTRON=0.40
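# Note: assuming the ensemble score is a simple weighted average, provider
# scores of gpt4=8.2 and nemotron=8.0 combine to 0.60*8.2 + 0.40*8.0 = 8.12.
# The three weights should sum to 1.0.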
NEMOTRON_CONFIDENCE_THRESHOLD=0.75
# OpenAI Configuration
OPENAI_MODEL=gpt-4
OPENAI_TEMPERATURE=0.3
OPENAI_MAX_TOKENS=4096
# Nemotron Configuration
NIM_OPTIMIZATION_PROFILE=throughput
NEMOTRON_BASE_URL=http://nemotron-llm:8000/v1
NEMOTRON_MODEL=nvidia/nvidia-nemotron-nano-9b-v2
NEMOTRON_TEMPERATURE=0.7
NEMOTRON_MAX_TOKENS=2048
NEMO_EMBEDDER_URL=http://nemo-embedder:8000/v1
NEMO_EMBEDDER_MODEL=nvidia/llama-3.2-nv-embedqa-1b-v2
EMBEDDING_DIMENSION=1024
NEMO_RERANKER_URL=http://nemo-reranker:8000/v1
NEMO_RERANKER_MODEL=nvidia/llama-3.2-nv-rerankqa-1b-v2
RERANKER_TOP_K=5
# Database (will be auto-generated by deployment script)
POSTGRES_USER=postgres
POSTGRES_DB=ai_coscientist
POSTGRES_PORT=5432
CHROMADB_HOST=chromadb
CHROMADB_PORT=8000  # container-internal port; exposed on host port 8003
CHROMA_TELEMETRY=FALSE
REDIS_HOST=redis
REDIS_PORT=6379
# Application
APP_NAME=AI-CoScientist
APP_VERSION=1.0.0
API_PORT=8080
# Performance
UVICORN_WORKERS=4
CELERY_CONCURRENCY=4
# Monitoring
PROMETHEUS_PORT=9090
GRAFANA_USER=admin
GRAFANA_PORT=3000
# Paths
PAPERS_COLLECTION_DIR=./papers_collection
LOGS_DIR=./logs
CORS_ORIGINS=http://localhost,http://127.0.0.1

# Make script executable
chmod +x scripts/deploy_to_connectome_hybrid.sh
# Run deployment (takes 10-15 minutes)
./scripts/deploy_to_connectome_hybrid.sh

Expected Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AI-CoScientist - Connectome Hybrid Deployment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[1/9] Checking prerequisites...
✓ GPU prerequisites verified (8x RTX 3090 found)
✓ Docker GPU runtime configured
✓ NGC API key configured
[2/9] Generating secure passwords...
✓ PostgreSQL password: ******************
✓ Redis password: ******************
✓ Grafana password: ******************
✓ Secret key: ******************
[3/9] Pulling Docker images...
⏳ Downloading NIM containers (10-15 minutes)...
✓ nemotron-llm:latest pulled (5.2 GB)
✓ nemo-embedder:latest pulled (2.1 GB)
✓ nemo-reranker:latest pulled (2.1 GB)
✓ All images pulled successfully
[4/9] Starting infrastructure services...
✓ postgres (port 5432)
✓ redis (port 6379)
✓ chromadb (port 8003)
[5/9] Starting Nemotron GPU services...
⏳ Loading models (3-5 minutes)...
✓ nemotron-llm on GPU 1 (18GB VRAM)
✓ nemo-embedder on GPU 5 (4GB VRAM)
✓ nemo-reranker on GPU 6 (4GB VRAM)
[6/9] Starting application services...
✓ api (port 8080)
✓ celery-worker
✓ celery-beat
[7/9] Starting monitoring services...
✓ prometheus (port 9090)
✓ grafana (port 3000)
[8/9] Running health checks...
✓ All 10 services healthy (1 skipped: claude)
[9/9] Deployment summary...
Deployment successful! 🎉
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SERVICE URLS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
API Documentation: http://localhost:8080/docs
API Health: http://localhost:8080/api/v1/health
Hybrid RAG Status: http://localhost:8080/api/v1/hybrid-rag/status
Nemotron LLM: http://localhost:8000/v1/health
NeMo Embedder: http://localhost:8001/v1/health
NeMo Reranker: http://localhost:8002/v1/health
ChromaDB: http://localhost:8003/api/v1/heartbeat
Prometheus: http://localhost:9090
Grafana: http://localhost:3000 (admin / <auto-generated-password>)
GPU Monitoring: nvidia-smi -l 1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Check all services running
docker-compose -f docker-compose.connectome.yml ps
# Test API health
curl http://localhost:8080/api/v1/health
# Test hybrid RAG status
curl http://localhost:8080/api/v1/hybrid-rag/status
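# Optionally test each NIM service directly (sketch; ports per the service URLs above)
curl http://localhost:8000/v1/health   # Nemotron LLM
curl http://localhost:8001/v1/health   # NeMo Embedder
curl http://localhost:8002/v1/health   # NeMo Reranker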
# Monitor GPU usage
nvidia-smi -l 1
# Should show:
# GPU 1: nemotron-llm (~18GB VRAM)
# GPU 5: nemo-embedder (~4GB VRAM)
# GPU 6: nemo-reranker (~4GB VRAM)

# Test hybrid evaluation
curl -X POST http://localhost:8080/api/v1/hybrid-rag/evaluate \
-H "Content-Type: application/json" \
-d '{
"paper_text": "Recent advances in deep learning have revolutionized natural language processing. Our novel transformer architecture achieves state-of-the-art results on multiple benchmarks, demonstrating significant improvements in both accuracy and efficiency.",
"section": "abstract",
"use_ensemble": true
}'
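With `jq` installed, individual fields can be pulled from the evaluation response. A sketch (the `paper_text` value is a placeholder):

```shell
# Extract just the headline numbers from the evaluation response (assumes jq)
curl -s -X POST http://localhost:8080/api/v1/hybrid-rag/evaluate \
  -H "Content-Type: application/json" \
  -d '{"paper_text": "Sample abstract text...", "section": "abstract", "use_ensemble": true}' \
  | jq '{overall_quality, ensemble_confidence, total_latency_ms}'
```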
# Expected response (~2-3 seconds):
{
"overall_quality": 8.1,
"novelty": 7.8,
"methodology": 8.3,
"clarity": 8.2,
"significance": 8.0,
"feedback": "[gpt4] Strong methodology and clear presentation... [nemotron] Novel architecture with good benchmark results...",
"provider_scores": {
"gpt4": {
"overall_quality": 8.2,
"confidence": 0.9,
"latency_ms": 1523
},
"nemotron": {
"overall_quality": 8.0,
"confidence": 0.75,
"latency_ms": 234
}
},
"ensemble_confidence": 0.85,
"total_latency_ms": 1757
}

# Check GPU runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# If this fails, install the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Check what's using the port
sudo lsof -i :8080 # API port
sudo lsof -i :8000 # Nemotron port
# Stop the conflicting service, or change the port in .env.production

# Check logs
docker-compose -f docker-compose.connectome.yml logs nemotron-llm
# Verify NGC key
echo $NGC_API_KEY
# Restart service
docker-compose -f docker-compose.connectome.yml restart nemotron-llm

# Check GPU memory
nvidia-smi
# Verify GPU 1 has ≥20GB free before starting Nemotron
# If not, use a different GPU by editing .env.production, e.g.:
NEMOTRON_GPU_ID=7  # Try GPU 7
# then restart the nemotron-llm service

# Watch GPU usage
watch -n 1 nvidia-smi
# Monitor API logs
docker-compose -f docker-compose.connectome.yml logs -f api
# Check Prometheus metrics
curl http://localhost:8080/metrics
# Access Grafana
# http://localhost:3000
# Login: admin / <password from .env.production>

# Create backup
docker-compose -f docker-compose.connectome.yml exec postgres \
pg_dump -U postgres ai_coscientist > backup_$(date +%Y%m%d).sql
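To restore a dump, the reverse works through the same container (a sketch; substitute the date in the filename with your backup's):

```shell
# Restore a PostgreSQL backup into the running container
docker-compose -f docker-compose.connectome.yml exec -T postgres \
  psql -U postgres ai_coscientist < backup_YYYYMMDD.sql
```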
# Backup ChromaDB
tar -czf chromadb_backup_$(date +%Y%m%d).tar.gz chromadb_data/

API key sources:
- OpenAI: https://platform.openai.com/api-keys
- NGC: https://org.ngc.nvidia.com/setup/api-key
- Anthropic (when ready): https://console.anthropic.com/settings/keys
Update .env.production with new keys and restart services:
docker-compose -f docker-compose.connectome.yml restart api celery-worker

Next steps:
- ✅ Deployment complete
- 📊 Run test evaluations
- 🔍 Monitor GPU utilization
- 📈 Check Grafana dashboards
- 🔄 Set up automated backups
- 🔐 Rotate API keys
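For the automated-backups item, a minimal cron entry (a sketch; adjust the repository path and make sure a `backups/` directory exists):

```shell
# Daily 02:00 PostgreSQL backup via cron (add with: crontab -e)
# Note: % must be escaped as \% inside crontab entries
0 2 * * * cd /path/to/AI-CoScientist && docker-compose -f docker-compose.connectome.yml exec -T postgres pg_dump -U postgres ai_coscientist > backups/backup_$(date +\%Y\%m\%d).sql
```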
Support: See claudedocs/NEMOTRON_HYBRID_GUIDE.md for detailed documentation
Issues: Report at https://github.com/Transconnectome/AI-CoScientist/issues