| 1 | +# JVM Analysis Service |
| 2 | + |
| 3 | +A Spring Boot microservice that automates JVM performance analysis. It processes alert webhooks, retrieves thread dumps and profiling data, generates flame graphs, and uses AWS Bedrock to produce AI-generated recommendations. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **Automated JVM Analysis**: Processes performance alerts and generates comprehensive analysis reports |
| 8 | +- **AI-Powered Recommendations**: Uses AWS Bedrock (Claude 3.7 Sonnet) for intelligent performance insights |
| 9 | +- **Flame Graph Generation**: Converts profiling data to interactive HTML flame graphs |
| 10 | +- **S3 Integration**: Stores and retrieves profiling data, thread dumps, and analysis results |
| 11 | +- **Resilient Design**: Built-in retry mechanisms for external service calls |
| 12 | + |
| 13 | +## Architecture |
| 14 | + |
| 15 | +### Components |
| 16 | + |
| 17 | +- **JvmAnalysisController**: REST API endpoint for webhook processing |
| 18 | +- **JvmAnalysisService**: Core business logic orchestrating the analysis workflow |
| 19 | +- **AIRecommendation**: AWS Bedrock integration for AI-powered analysis |
| 20 | +- **FlameGraphConverter**: Converts collapsed profiling data to HTML flame graphs |
| 21 | +- **S3Connector**: Handles all S3 operations for data storage and retrieval |
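
To make the input of `FlameGraphConverter` concrete, here is a minimal sketch of reading collapsed profiling data. It assumes the common "collapsed stacks" layout (semicolon-separated frames followed by a sample count, one stack per line); the class and method names are hypothetical and are not taken from this service.

```java
import java.util.HashMap;
import java.util.Map;

class CollapsedStacksSketch {

    // Sums sample counts per leaf frame, i.e. the last frame of each stack.
    // Assumes lines shaped like "frameA;frameB;frameC 42".
    static Map<String, Long> samplesByLeafFrame(String collapsed) {
        Map<String, Long> totals = new HashMap<>();
        for (String rawLine : collapsed.split("\\R")) {
            String line = rawLine.strip();
            int lastSpace = line.lastIndexOf(' ');
            if (lastSpace <= 0) {
                continue; // blank or malformed line
            }
            long count;
            try {
                count = Long.parseLong(line.substring(lastSpace + 1));
            } catch (NumberFormatException e) {
                continue; // trailing token is not a sample count
            }
            String stack = line.substring(0, lastSpace);
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            totals.merge(leaf, count, Long::sum);
        }
        return totals;
    }
}
```

A flame graph renderer would keep the full stack hierarchy rather than only the leaf frames; the leaf totals are just the simplest aggregate to check by hand.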
| 22 | + |
| 23 | +### Workflow |
| 24 | + |
| 25 | +1. Receives alert webhook with pod information |
| 26 | +2. Retrieves thread dump from target pod |
| 27 | +3. Fetches latest profiling data from S3 |
| 28 | +4. Converts profiling data to flame graph |
| 29 | +5. Analyzes performance using AI recommendations |
| 30 | +6. Stores results (thread dump, flame graph, analysis) in S3 |
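
The steps above can be sketched as a single orchestration method. This is an illustration only: the `Steps` interface and its method names are hypothetical stand-ins for the real components, and step 1 (webhook parsing) is assumed to have already produced the pod name and IP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WorkflowSketch {

    // Hypothetical collaborators standing in for the real components.
    interface Steps {
        String fetchThreadDump(String podIp);
        String latestProfile(String podName);
        String toFlameGraphHtml(String collapsedProfile);
        String analyze(String threadDump, String collapsedProfile);
    }

    // Runs steps 2-5 and returns the artifacts that step 6 would upload to S3.
    static Map<String, String> run(Steps steps, String podName, String podIp) {
        String threadDump = steps.fetchThreadDump(podIp);            // step 2
        String profile = steps.latestProfile(podName);               // step 3
        String flameGraph = steps.toFlameGraphHtml(profile);         // step 4
        String analysis = steps.analyze(threadDump, profile);        // step 5

        Map<String, String> artifacts = new LinkedHashMap<>();       // step 6 input
        artifacts.put("threaddump", threadDump);
        artifacts.put("profiling", profile);
        artifacts.put("flamegraph", flameGraph);
        artifacts.put("analysis", analysis);
        return artifacts;
    }
}
```

Returning the artifacts instead of uploading them inside `run` keeps the sketch free of AWS dependencies and easy to exercise with fakes.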
| 31 | + |
| 32 | +## API Reference |
| 33 | + |
| 34 | +### POST /webhook |
| 35 | + |
| 36 | +Processes performance alert webhooks and triggers JVM analysis. |
| 37 | + |
| 38 | +**Request Body:** |
| 39 | +```json |
| 40 | +{ |
| 41 | + "alerts": [ |
| 42 | + { |
| 43 | + "labels": { |
| 44 | + "pod": "my-app-pod-123", |
| 45 | + "instance": "10.0.1.100:8080" |
| 46 | + } |
| 47 | + } |
| 48 | + ] |
| 49 | +} |
| 50 | +``` |
| 51 | + |
| 52 | +**Response:** |
| 53 | +```json |
| 54 | +{ |
| 55 | + "message": "Processed alerts", |
| 56 | + "count": 1 |
| 57 | +} |
| 58 | +``` |
| 59 | + |
| 60 | +**Status Codes:** |
| 61 | +- `200 OK`: Successfully processed alerts |
| 62 | +- `400 Bad Request`: Invalid request format |
| 63 | +- `500 Internal Server Error`: Processing failed |
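
For manual testing, a request like the one above could be built with the JDK's `java.net.http` client. This is a sketch under assumptions: the class name is hypothetical, and the base URL passed in (for example `http://localhost:8080`) is a placeholder, since the service's own port is not stated here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

class WebhookRequestSketch {

    // Builds the JSON body for a single alert.
    // Values are inserted verbatim, so they must not contain quotes or backslashes.
    static String alertPayload(String pod, String instance) {
        return "{\"alerts\":[{\"labels\":{\"pod\":\"" + pod
                + "\",\"instance\":\"" + instance + "\"}}]}";
    }

    // Builds (but does not send) a POST request to <baseUrl>/webhook.
    static HttpRequest webhookRequest(String baseUrl, String pod, String instance) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/webhook"))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(alertPayload(pod, instance)))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the response status compared against the codes listed above.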
| 64 | + |
| 65 | +### Health Endpoints |
| 66 | + |
| 67 | +- `GET /actuator/health`: Application health status |
| 68 | +- `GET /health`: Custom health endpoint for readiness probe |
| 69 | + |
| 70 | +## Configuration |
| 71 | + |
| 72 | +### Environment Variables |
| 73 | + |
| 74 | +| Variable | Description | Default | |
| 75 | +|----------|-------------|---------| |
| 76 | +| `AWS_REGION` | AWS region for services | `us-east-1` | |
| 77 | +| `AWS_S3_BUCKET` | S3 bucket for data storage | `default_bucket_name` | |
| 78 | +| `AWS_S3_PREFIX_ANALYSIS` | S3 prefix for analysis results | `analysis/` | |
| 79 | +| `AWS_S3_PREFIX_PROFILING` | S3 prefix for profiling data | `profiling/` | |
| 80 | +| `AWS_BEDROCK_MODEL_ID` | Bedrock model identifier | `us.anthropic.claude-3-7-sonnet-20250219-v1:0` | |
| 81 | +| `AWS_BEDROCK_MAX_TOKENS` | Maximum tokens for AI analysis | `10000` | |
| 82 | +| `THREADDUMP_URL_TEMPLATE` | Thread dump endpoint template | `http://{podIp}:8080/actuator/threaddump` | |
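
The sketch below illustrates how a variable from the table falls back to its default and how the `{podIp}` placeholder in `THREADDUMP_URL_TEMPLATE` might be expanded. The service itself presumably resolves these through Spring property placeholders; the class and method names here are hypothetical.

```java
import java.util.Map;

class ConfigSketch {

    // Returns the variable's value, or the documented default when it is unset or blank.
    static String resolve(Map<String, String> env, String name, String defaultValue) {
        String value = env.get(name);
        return (value == null || value.isBlank()) ? defaultValue : value;
    }

    // Expands the {podIp} placeholder in the thread dump URL template.
    static String threadDumpUrl(Map<String, String> env, String podIp) {
        String template = resolve(env, "THREADDUMP_URL_TEMPLATE",
                "http://{podIp}:8080/actuator/threaddump");
        return template.replace("{podIp}", podIp);
    }
}
```

In a running process the map would be `System.getenv()`, for example `ConfigSketch.resolve(System.getenv(), "AWS_REGION", "us-east-1")`.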
| 83 | + |
| 84 | +### Application Properties |
| 85 | + |
| 86 | +```properties |
| 87 | +# Resilience4J retry configuration |
| 88 | +resilience4j.retry.instances.threadDump.max-attempts=3 |
| 89 | +resilience4j.retry.instances.threadDump.wait-duration=2s |
| 90 | +resilience4j.retry.instances.threadDump.exponential-backoff-multiplier=2 |
| 91 | +``` |
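
To show what these settings mean in practice, the following sketch computes the waits between attempts, assuming exponential backoff is active (in Resilience4j's Spring Boot configuration this usually also requires `enable-exponential-backoff=true`) and that `max-attempts` counts the initial call. The class name is hypothetical and the code does not use Resilience4j itself.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class RetryScheduleSketch {

    // Wait after the n-th failed attempt: initial * multiplier^(n-1).
    // The final attempt is not followed by a wait, so the list has maxAttempts - 1 entries.
    static List<Duration> waits(int maxAttempts, Duration initial, double multiplier) {
        List<Duration> result = new ArrayList<>();
        double millis = initial.toMillis();
        for (int failed = 1; failed < maxAttempts; failed++) {
            result.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;
        }
        return result;
    }
}
```

With the values above (3 attempts, 2s, multiplier 2), the sketch yields a 2s wait after the first failure and a 4s wait after the second, so a thread dump request is abandoned after roughly 6s of waiting plus the time spent in the three calls.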
| 92 | + |
| 93 | +## Prerequisites |
| 94 | + |
| 95 | +- Java 21+ |
| 96 | +- Maven 3.6+ |
| 97 | +- AWS Account with appropriate permissions |
| 98 | +- S3 bucket for data storage |
| 99 | +- AWS Bedrock access (Claude 3.7 Sonnet model) |
| 100 | + |
| 101 | +### Required AWS Permissions |
| 102 | + |
| 103 | +```json |
| 104 | +{ |
| 105 | + "Version": "2012-10-17", |
| 106 | + "Statement": [ |
| 107 | + { |
| 108 | + "Effect": "Allow", |
| 109 | + "Action": [ |
| 110 | + "s3:GetObject", |
| 111 | + "s3:PutObject", |
| 112 | + "s3:ListBucket" |
| 113 | + ], |
| 114 | + "Resource": [ |
| 115 | + "arn:aws:s3:::your-bucket-name", |
| 116 | + "arn:aws:s3:::your-bucket-name/*" |
| 117 | + ] |
| 118 | + }, |
| 119 | + { |
| 120 | + "Effect": "Allow", |
| 121 | + "Action": [ |
| 122 | + "bedrock:InvokeModel" |
| 123 | + ], |
| 124 | + "Resource": ["arn:aws:bedrock:*:*:foundation-model/anthropic.claude-3-7-sonnet-*", "arn:aws:bedrock:*:*:inference-profile/us.anthropic.claude-3-7-sonnet-*"] |
| 125 | + } |
| 126 | + ] |
| 127 | +} |
| 128 | +``` |
| 129 | + |
| 130 | +## Development |
| 131 | + |
| 132 | +### Build |
| 133 | + |
| 134 | +```bash |
| 135 | +mvn clean compile |
| 136 | +``` |
| 137 | + |
| 138 | +### Test |
| 139 | + |
| 140 | +```bash |
| 141 | +mvn test |
| 142 | +``` |
| 143 | + |
| 144 | +### Package |
| 145 | + |
| 146 | +```bash |
| 147 | +mvn clean package |
| 148 | +``` |
| 149 | + |
| 150 | +### Run Locally |
| 151 | + |
| 152 | +```bash |
| 153 | +mvn spring-boot:run |
| 154 | +``` |
| 155 | + |
| 156 | +### Docker Build |
| 157 | + |
| 158 | +```bash |
| 159 | +mvn compile jib:dockerBuild |
| 160 | +``` |
| 161 | + |
| 162 | +## Deployment |
| 163 | + |
| 164 | +### Kubernetes |
| 165 | + |
| 166 | +1. **Set up AWS permissions:** |
| 167 | + ```bash |
| 168 | + ./k8s/enable-s3-bedrock-access.sh |
| 169 | + ``` |
| 170 | + |
| 171 | +2. **Deploy to cluster:** |
| 172 | + ```bash |
| 173 | + ./k8s/deploy.sh |
| 174 | + ``` |
| 175 | + |
| 176 | +### Manual Deployment |
| 177 | + |
| 178 | +```bash |
| 179 | +# Apply Kubernetes manifests |
| 180 | +kubectl apply -f k8s/deployment.yaml |
| 181 | + |
| 182 | +# Wait for deployment |
| 183 | +kubectl wait deployment jvm-analysis-service -n monitoring --for condition=Available=True --timeout=120s |
| 184 | + |
| 185 | +# Check logs |
| 186 | +kubectl logs -l app=jvm-analysis-service -n monitoring |
| 187 | +``` |
| 188 | + |
| 189 | +## Monitoring |
| 190 | + |
| 191 | +### Health Checks |
| 192 | + |
| 193 | +- **Readiness Probe**: `GET /health` (30s initial delay, 10s interval) |
| 194 | +- **Liveness Probe**: `GET /actuator/health` (60s initial delay, 30s interval) |
| 195 | + |
| 196 | +### Resource Requirements |
| 197 | + |
| 198 | +- **CPU**: 1 core (request and limit) |
| 199 | +- **Memory**: 2Gi (request and limit) |
| 200 | + |
| 201 | +## Data Storage |
| 202 | + |
| 203 | +### S3 Structure |
| 204 | + |
| 205 | +``` |
| 206 | +bucket/ |
| 207 | +├── profiling/ |
| 208 | +│ └── {pod-name}/ |
| 209 | +│ └── {date}/ |
| 210 | +│ └── {timestamp}.txt |
| 211 | +└── analysis/ |
| 212 | + ├── {timestamp}_profiling_{pod-name}.txt |
| 213 | + ├── {timestamp}_profiling_{pod-name}.html |
| 214 | + ├── {timestamp}_threaddump_{pod-name}.json |
| 215 | + └── {timestamp}_analysis_{pod-name}.md |
| 216 | +``` |
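
The `analysis/` naming pattern above could be produced by a small helper like the one below. The timestamp layout (`yyyyMMdd'T'HHmmss'Z'` in UTC) is an assumption, as are the class and method names; the service's actual format is not documented here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class S3KeySketch {

    // Assumed timestamp layout; the real service may format timestamps differently.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

    // kind: "profiling", "threaddump" or "analysis"; extension: "txt", "html", "json" or "md".
    static String analysisKey(String prefix, Instant time, String kind,
                              String podName, String extension) {
        return prefix + TIMESTAMP.format(time) + "_" + kind + "_" + podName + "." + extension;
    }
}
```

Putting a sortable timestamp first means a plain prefix listing returns the artifacts of all pods in chronological order.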
| 217 | + |
| 218 | +## AI Analysis Output |
| 219 | + |
| 220 | +The service generates comprehensive analysis reports including: |
| 221 | + |
| 222 | +- **Health Status**: Overall application health rating |
| 223 | +- **Thread Analysis**: Thread state distribution and patterns |
| 224 | +- **Top Issues**: Critical performance problems with root causes |
| 225 | +- **Performance Hotspots**: CPU consumers and bottlenecks from flame graphs |
| 226 | +- **Recommendations**: Immediate and short-term improvement suggestions |
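
As a rough illustration of the thread state distribution mentioned above, the sketch below counts states in a thread dump. It assumes the JSON from the Actuator `threaddump` endpoint carries a `threadState` field per thread, and it uses a regular expression instead of a JSON parser to stay dependency-free; the class name is hypothetical, and the AI analysis itself does not necessarily work this way.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadStateSketch {

    // Matches entries such as "threadState":"RUNNABLE" (whitespace around ':' allowed).
    private static final Pattern STATE =
            Pattern.compile("\"threadState\"\\s*:\\s*\"([A-Z_]+)\"");

    // Counts how many threads are in each state, sorted by state name.
    static Map<String, Integer> distribution(String threadDumpJson) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = STATE.matcher(threadDumpJson);
        while (matcher.find()) {
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }
}
```

A large share of `BLOCKED` or `WAITING` threads in such a count is the kind of pattern the Thread Analysis section of a report would be expected to call out.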
| 227 | + |
| 228 | +## Troubleshooting |
| 229 | + |
| 230 | +### Common Issues |
| 231 | + |
| 232 | +1. **Thread dump retrieval fails** |
| 233 | + - Verify that the pod IP and port are reachable from the analysis service |
| 234 | + - Check that the Actuator `threaddump` endpoint is enabled and exposed on target pods |
| 235 | + |
| 236 | +2. **S3 access denied** |
| 237 | + - Verify AWS credentials and permissions |
| 238 | + - Check bucket name and region configuration |
| 239 | + |
| 240 | +3. **Bedrock model access fails** |
| 241 | + - Ensure model is available in your region |
| 242 | + - Verify Bedrock permissions and quotas |
| 243 | + |
| 244 | +### Logs |
| 245 | + |
| 246 | +Check application logs for detailed error information: |
| 247 | +```bash |
| 248 | +kubectl logs -l app=jvm-analysis-service -n monitoring -f |
| 249 | +``` |
| 250 | + |
| 251 | +## Contributing |
| 252 | + |
| 253 | +1. Fork the repository |
| 254 | +2. Create a feature branch |
| 255 | +3. Make changes with appropriate tests |
| 256 | +4. Submit a pull request |
| 257 | + |
| 258 | +## License |
| 259 | + |
| 260 | +This project is licensed under the MIT License. |