
Commit 96ad1e8

Added jvm-analysis-service to process trace and metric data and provide recommendations to improve application scalability and performance
1 parent 3593bc1 commit 96ad1e8

23 files changed: +2448 -0 lines changed
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
/mvnw text eol=lf
*.cmd text eol=crlf
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
HELP.md
target/
.mvn/wrapper/maven-wrapper.jar
!**/src/main/**/target/
!**/src/test/**/target/

### STS ###
.apt_generated
.classpath
.factorypath
.project
.settings
.springBeans
.sts4-cache

### IntelliJ IDEA ###
.idea
*.iws
*.iml
*.ipr

### NetBeans ###
/nbproject/private/
/nbbuild/
/dist/
/nbdist/
/.nb-gradle/
build/
!**/src/main/**/build/
!**/src/test/**/build/

### VS Code ###
.vscode/
Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
# JVM Analysis Service

A Spring Boot microservice that provides automated JVM performance analysis using AI-powered recommendations. The service processes alert webhooks, retrieves thread dumps and profiling data, generates flame graphs, and provides intelligent analysis using AWS Bedrock.

## Features

- **Automated JVM Analysis**: Processes performance alerts and generates comprehensive analysis reports
- **AI-Powered Recommendations**: Uses AWS Bedrock (Claude 3.7 Sonnet) for intelligent performance insights
- **Flame Graph Generation**: Converts profiling data to interactive HTML flame graphs
- **S3 Integration**: Stores and retrieves profiling data, thread dumps, and analysis results
- **Resilient Design**: Built-in retry mechanisms for external service calls

## Architecture

### Components

- **JvmAnalysisController**: REST API endpoint for webhook processing
- **JvmAnalysisService**: Core business logic orchestrating the analysis workflow
- **AIRecommendation**: AWS Bedrock integration for AI-powered analysis
- **FlameGraphConverter**: Converts collapsed profiling data to HTML flame graphs
- **S3Connector**: Handles all S3 operations for data storage and retrieval

### Workflow

1. Receives alert webhook with pod information
2. Retrieves thread dump from target pod
3. Fetches latest profiling data from S3
4. Converts profiling data to flame graph
5. Analyzes performance using AI recommendations
6. Stores results (thread dump, flame graph, analysis) in S3

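
To make step 4 of the workflow more concrete, the sketch below parses "collapsed" profiling data and totals samples per leaf frame, which is the kind of aggregation a flame graph is built on. It assumes the common folded-stack format (`frame;frame;frame count`, one stack per line); this README does not specify the exact input format, and `CollapsedStacks` is a hypothetical helper, not the service's `FlameGraphConverter`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the service. */
final class CollapsedStacks {

    /**
     * Sums sample counts per leaf (innermost) frame.
     * Assumes each line looks like "main;Service.handle;Repo.query 42".
     */
    static Map<String, Long> samplesByLeafFrame(String collapsed) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : collapsed.split("\\R")) {
            String trimmed = line.trim();
            int lastSpace = trimmed.lastIndexOf(' ');
            if (lastSpace < 0) {
                continue; // blank line or no sample count: skip
            }
            long samples;
            try {
                samples = Long.parseLong(trimmed.substring(lastSpace + 1));
            } catch (NumberFormatException e) {
                continue; // trailing token is not a number: skip
            }
            String stack = trimmed.substring(0, lastSpace);
            // The leaf frame is the last ';'-separated element of the stack.
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            totals.merge(leaf, samples, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "main;Service.handle;Repo.query 42",
                "main;Service.handle;Json.encode 8",
                "main;Scheduler.tick;Repo.query 10");
        System.out.println(samplesByLeafFrame(sample));
    }
}
```

A real converter would keep the full stack hierarchy so that frame widths can be nested; the leaf totals here only show how hotspots can be ranked from the same input.
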
## API Reference

### POST /webhook

Processes performance alert webhooks and triggers JVM analysis.

**Request Body:**
```json
{
  "alerts": [
    {
      "labels": {
        "pod": "my-app-pod-123",
        "instance": "10.0.1.100:8080"
      }
    }
  ]
}
```

**Response:**
```json
{
  "message": "Processed alerts",
  "count": 1
}
```

**Status Codes:**
- `200 OK`: Successfully processed alerts
- `400 Bad Request`: Invalid request format
- `500 Internal Server Error`: Processing failed

### Health Endpoints

- `GET /actuator/health`: Application health status
- `GET /health`: Custom health endpoint for readiness probe

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AWS_REGION` | AWS region for services | `us-east-1` |
| `AWS_S3_BUCKET` | S3 bucket for data storage | `default_bucket_name` |
| `AWS_S3_PREFIX_ANALYSIS` | S3 prefix for analysis results | `analysis/` |
| `AWS_S3_PREFIX_PROFILING` | S3 prefix for profiling data | `profiling/` |
| `AWS_BEDROCK_MODEL_ID` | Bedrock model identifier | `us.anthropic.claude-3-7-sonnet-20250219-v1:0` |
| `AWS_BEDROCK_MAX_TOKENS` | Maximum tokens for AI analysis | `10000` |
| `THREADDUMP_URL_TEMPLATE` | Thread dump endpoint template | `http://{podIp}:8080/actuator/threaddump` |

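
As an illustration of how `THREADDUMP_URL_TEMPLATE` might be expanded, the sketch below fills the `{podIp}` placeholder from an alert's `instance` label. That the pod IP is taken from the `instance` label is an assumption made for this example; the service may resolve it differently. `ThreadDumpUrl` is a hypothetical helper, and the host parsing handles only `host:port` values, not bracketed IPv6 addresses.

```java
/** Hypothetical helper for illustration only; not part of the service. */
final class ThreadDumpUrl {

    // Default taken from the environment variable table above.
    static final String DEFAULT_TEMPLATE = "http://{podIp}:8080/actuator/threaddump";

    /** Strips the port from an "ip:port" instance label (assumed format). */
    static String podIp(String instance) {
        int colon = instance.lastIndexOf(':');
        return colon < 0 ? instance : instance.substring(0, colon);
    }

    /** Replaces the {podIp} placeholder with the host part of the instance label. */
    static String expand(String template, String instance) {
        return template.replace("{podIp}", podIp(instance));
    }

    public static void main(String[] args) {
        String template = System.getenv()
                .getOrDefault("THREADDUMP_URL_TEMPLATE", DEFAULT_TEMPLATE);
        System.out.println(expand(template, "10.0.1.100:8080"));
    }
}
```
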
### Application Properties

```properties
# Resilience4J retry configuration
resilience4j.retry.instances.threadDump.max-attempts=3
resilience4j.retry.instances.threadDump.wait-duration=2s
resilience4j.retry.instances.threadDump.exponential-backoff-multiplier=2
```

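
The sketch below works out the wait schedule these settings describe, assuming exponential backoff is actually in effect. In Resilience4j, `max-attempts` counts the initial call, so three attempts allow two retries. Note that, as far as the Resilience4j Spring Boot configuration goes, the multiplier is normally only applied when `enable-exponential-backoff=true` is also set; that property is not shown above, so verify it against your version. `BackoffSchedule` is a hypothetical helper, not Resilience4j code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; it does not use Resilience4j. */
final class BackoffSchedule {

    /**
     * Returns the wait before each retry. maxAttempts includes the first call,
     * so there are (maxAttempts - 1) waits.
     */
    static List<Duration> waits(int maxAttempts, Duration initialWait, double multiplier) {
        List<Duration> waits = new ArrayList<>();
        double millis = initialWait.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier; // each subsequent wait grows by the multiplier
        }
        return waits;
    }

    public static void main(String[] args) {
        // Values from the properties above: 3 attempts, 2s wait, multiplier 2.
        System.out.println(waits(3, Duration.ofSeconds(2), 2.0)); // prints [PT2S, PT4S]
    }
}
```

With the values above, a thread dump request that keeps failing would be abandoned after roughly six seconds of waiting, plus the time spent in the three calls themselves.
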
## Prerequisites

- Java 21+
- Maven 3.6+
- AWS Account with appropriate permissions
- S3 bucket for data storage
- AWS Bedrock access (Claude 3.7 Sonnet model)

### Required AWS Permissions

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "arn:aws:bedrock:*:*:foundation-model/anthropic.claude-3-7-sonnet-*"
    }
  ]
}
```

## Development

### Build

```bash
mvn clean compile
```

### Test

```bash
mvn test
```

### Package

```bash
mvn clean package
```

### Run Locally

```bash
mvn spring-boot:run
```

### Docker Build

```bash
mvn compile jib:dockerBuild
```

## Deployment

### Kubernetes

1. **Set up AWS permissions:**
   ```bash
   ./k8s/enable-s3-bedrock-access.sh
   ```

2. **Deploy to cluster:**
   ```bash
   ./k8s/deploy.sh
   ```

### Manual Deployment

```bash
# Apply Kubernetes manifests
kubectl apply -f k8s/deployment.yaml

# Wait for deployment
kubectl wait deployment jvm-analysis-service -n monitoring --for condition=Available=True --timeout=120s

# Check logs
kubectl logs -l app=jvm-analysis-service -n monitoring
```

## Monitoring

### Health Checks

- **Readiness Probe**: `GET /health` (30s initial delay, 10s interval)
- **Liveness Probe**: `GET /actuator/health` (60s initial delay, 30s interval)

### Resource Requirements

- **CPU**: 1 core (request and limit)
- **Memory**: 2Gi (request and limit)

## Data Storage

### S3 Structure

```
bucket/
├── profiling/
│   └── {pod-name}/
│       └── {date}/
│           └── {timestamp}.txt
└── analysis/
    ├── {timestamp}_profiling_{pod-name}.txt
    ├── {timestamp}_profiling_{pod-name}.html
    ├── {timestamp}_threaddump_{pod-name}.json
    └── {timestamp}_analysis_{pod-name}.md
```

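
The naming scheme under `analysis/` can be sketched as a small key builder. The timestamp format is not stated in this README, so the value used below (`20250101T120000Z`) is purely a placeholder assumption, and `AnalysisKeys` is a hypothetical helper rather than the service's `S3Connector`.

```java
/** Hypothetical helper for illustration only; not part of the service. */
final class AnalysisKeys {

    /**
     * Builds "{prefix}{timestamp}_{kind}_{podName}.{extension}", following the
     * layout shown in the tree above. The prefix is expected to end with '/'.
     */
    static String key(String prefix, String timestamp, String kind,
                      String podName, String extension) {
        return prefix + timestamp + "_" + kind + "_" + podName + "." + extension;
    }

    public static void main(String[] args) {
        String prefix = System.getenv().getOrDefault("AWS_S3_PREFIX_ANALYSIS", "analysis/");
        String ts = "20250101T120000Z"; // assumed timestamp format, not from the source
        String pod = "my-app-pod-123";
        System.out.println(key(prefix, ts, "profiling", pod, "html"));
        System.out.println(key(prefix, ts, "threaddump", pod, "json"));
        System.out.println(key(prefix, ts, "analysis", pod, "md"));
    }
}
```

Putting the timestamp first means that a lexicographic listing of the prefix groups all artifacts from one analysis run together, provided the timestamp format sorts chronologically.
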
## AI Analysis Output

The service generates comprehensive analysis reports including:

- **Health Status**: Overall application health rating
- **Thread Analysis**: Thread state distribution and patterns
- **Top Issues**: Critical performance problems with root causes
- **Performance Hotspots**: CPU consumers and bottlenecks from flame graphs
- **Recommendations**: Immediate and short-term improvement suggestions

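
To show what a "thread state distribution" looks like in practice, the sketch below counts threads per state using the JDK's own `ThreadMXBean`. It inspects the local JVM only; the service itself reads thread dumps from the target pod's actuator endpoint and hands them to the model, so this is an illustration of the metric, not of the service's code. `ThreadStates` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the service. */
final class ThreadStates {

    /** Counts how many threads are in each Thread.State. */
    static Map<Thread.State, Integer> distribution(ThreadInfo[] threads) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads) {
            if (info != null) { // entries can be null for threads that have exited
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Dump the current JVM's threads without lock details.
        ThreadInfo[] threads = ManagementFactory.getThreadMXBean()
                .dumpAllThreads(false, false);
        System.out.println(distribution(threads));
    }
}
```

A large share of `BLOCKED` or `WAITING` threads relative to `RUNNABLE` is the kind of pattern such a report would typically flag for closer inspection.
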
## Troubleshooting

### Common Issues

1. **Thread dump retrieval fails**
   - Verify pod IP and port accessibility
   - Check actuator endpoints are enabled on target pods

2. **S3 access denied**
   - Verify AWS credentials and permissions
   - Check bucket name and region configuration

3. **Bedrock model access**
   - Ensure model is available in your region
   - Verify Bedrock permissions and quotas

### Logs

Check application logs for detailed error information:
```bash
kubectl logs -l app=jvm-analysis-service -n monitoring -f
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes with appropriate tests
4. Submit a pull request

## License

This project is licensed under the MIT License.
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jvm-analysis-service
  namespace: monitoring
  labels:
    app: jvm-analysis-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jvm-analysis-service
  template:
    metadata:
      labels:
        app: jvm-analysis-service
    spec:
      serviceAccountName: jvm-analysis-service
      containers:
        - name: jvm-analysis-service
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "1"
              memory: "2Gi"
          image: ${ECR_URI}:latest
          ports:
            - containerPort: 8080
          env:
            - name: AWS_REGION
              value: "${AWS_REGION:-us-east-1}"
            - name: AWS_S3_BUCKET
              value: "${S3_BUCKET}"
            - name: AWS_S3_PREFIX_ANALYSIS
              value: "analysis/"
            - name: AWS_S3_PREFIX_PROFILING
              value: "profiling/"
            - name: SPRING_AI_BEDROCK_CONVERSE_CHAT_OPTIONS_MODEL
              value: "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
            - name: SPRING_AI_BEDROCK_CONVERSE_CHAT_OPTIONS_MAX_TOKENS
              value: "10000"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: jvm-analysis-service
  namespace: monitoring
  labels:
    app: jvm-analysis-service
spec:
  selector:
    app: jvm-analysis-service
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jvm-analysis-service
  namespace: monitoring
