Skip to content

Nifesimi-p/aws-3tier-java-deployment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AWS 3-Tier Architecture - Production-Grade Java Application Deployment

AWS Infrastructure Java CI/CD

Production-grade AWS infrastructure demonstrating enterprise networking, high availability, auto-scaling, and comprehensive monitoring.


Table of Contents


Project Overview

This project implements a production-grade 3-tier web application architecture on AWS, demonstrating enterprise networking patterns, high availability design, and modern DevOps practices.

Tech Stack

Layer Technology
Frontend Nginx (Reverse Proxy)
Backend Apache Tomcat 9 (Java 11)
Database RDS MySQL 8.0.39 (Multi-AZ)
Infrastructure AWS (EC2, VPC, ALB, ASG, RDS, Transit Gateway)
CI/CD GitHub Actions → SonarCloud → Nexus → CodeDeploy
Monitoring CloudWatch (Metrics, Logs, Alarms, Dashboard)

Project Timeline

  • Duration: 6 days (February 20-25, 2026)
  • Total Cost: $46.40 ($7.73/day)
  • Lines of Infrastructure Code: 200+ AWS Console configurations
  • Tests Performed: 43 tests across 8 categories

Architecture

High-Level Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                           INTERNET                                   │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                    ┌────────▼────────┐
                    │  Frontend ALB   │ (Internet-Facing)
                    │  (us-east-1a/b) │
                    └────────┬────────┘
                             │
          ┌──────────────────┴──────────────────┐
          │                                     │
    ┌─────▼─────┐                         ┌────▼──────┐
    │  Nginx-1  │ (Public Subnet AZ1a)    │  Nginx-2  │ (Public Subnet AZ1b)
    │ t3.micro  │                         │ t3.micro  │
    └─────┬─────┘                         └────┬──────┘
          │                                     │
          └──────────────────┬──────────────────┘
                             │
                    ┌────────▼────────┐
                    │  Backend ALB    │ (Internal)
                    │  (us-east-1a/b) │
                    └────────┬────────┘
                             │
          ┌──────────────────┴──────────────────┐
          │                                     │
    ┌─────▼──────┐                        ┌────▼───────┐
    │ Tomcat-1   │ (Private Subnet AZ1a)  │ Tomcat-2   │ (Private Subnet AZ1b)
    │ t3.micro   │                        │ t3.micro   │
    └─────┬──────┘                        └────┬───────┘
          │                                     │
          └──────────────────┬──────────────────┘
                             │
                    ┌────────▼─────────┐
                    │   RDS MySQL      │
                    │   Multi-AZ       │
                    │  (db.t3.micro)   │
                    │ Primary: AZ1a    │
                    │ Standby: AZ1b    │
                    └──────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    MANAGEMENT VPC (172.32.0.0/16)                    │
│                                                                       │
│    ┌──────────────┐          Transit Gateway          ┌──────────┐  │
│    │ Bastion Host │◄───────────────┬─────────────────►│ AppVPC   │  │
│    │  (t2.micro)  │                │                  │          │  │
│    └──────────────┘                │                  └──────────┘  │
│         ▲                           │                                │
│         │                           │                                │
│    SSH from Laptop                  └─ Secure Admin Access           │
└─────────────────────────────────────────────────────────────────────┘

Network Architecture

AppVPC (192.168.0.0/16):

  • Public Subnets: 192.168.1.0/24, 192.168.2.0/24 (Nginx instances)
  • Private App Subnets: 192.168.3.0/24, 192.168.4.0/24 (Tomcat instances)
  • Private DB Subnets: 192.168.5.0/24, 192.168.6.0/24 (RDS MySQL)
  • Availability Zones: us-east-1a, us-east-1b

ManagementVPC (172.32.0.0/16):

  • Bastion Subnet: 172.32.1.0/24
  • Purpose: Secure admin access via Transit Gateway

Key Features

High Availability

  • Multi-AZ Deployment across 2 availability zones
  • RDS Multi-AZ with automatic failover (tested: 2 min 8 sec)
  • Auto-healing instances (replacement in 6 minutes)
  • No single points of failure in application path

Scalability

  • Auto Scaling Groups (2-4 instances per tier)
  • Dynamic scaling based on CPU utilization
  • Load distribution across multiple instances
  • Tested capacity: 200,000 users/day

Security

  • Defense-in-depth: 4 security layers (VPC, Subnets, Security Groups, IAM)
  • Private subnets for application and database tiers
  • Bastion host for secure admin access
  • Least privilege IAM roles and security group rules

Monitoring & Observability

  • CloudWatch Dashboard with 5 widgets
  • 6 proactive alarms with email notifications
  • Comprehensive metrics (CPU, memory, response time, request count)
  • Health checks at ALB and ASG levels

CI/CD Pipeline

  • Automated deployment: GitHub → Actions → SonarCloud → Nexus → CodeDeploy
  • Quality gates (SonarCloud analysis on every deployment)
  • Rolling deployments (zero downtime)
  • Artifact management (versioned deployments to Nexus)
  • Deployment Speed: 85 seconds end-to-end (code push → live production)

Pipeline Breakdown:

Build & Test:           53 seconds (includes SonarCloud analysis)
Artifact Upload:        8 seconds (Nexus repository)
AWS Configuration:      1 second
S3 Upload:              6 seconds
CodeDeploy Trigger:     2 seconds
Instance Deployment:    7-10 seconds per instance
────────────────────────────────────────────────
Total:                  85 seconds (1 min 25 sec)

Design Decisions

Why AWS?

Need: 3-tier Java app requiring HA, auto-scaling, and enterprise networking
Decision: AWS provides mature RDS Multi-AZ (automatic failover), native ASG+ALB integration (self-healing), and Transit Gateway (enterprise networking demonstration).
vs Others: Azure better for .NET; GCP better for containers. AWS best for traditional infrastructure patterns.

Why EC2 over Containers?

Need: Tomcat-based Java app with persistent connections
Decision: EC2 fits traditional Tomcat architecture. Lambda has 3-5s cold starts (poor UX). Containers deferred to Round 3 (learn infrastructure first, then abstraction).
Trade-off: More operational overhead but deeper infrastructure understanding.

Why RDS MySQL Multi-AZ?

Need: Relational data with HA requirement
Decision: MySQL for structured relationships. Multi-AZ for automatic failover.
Trade-off: 2x cost ($30 vs $15/month) but 2-minute automatic recovery vs 20-30 minute manual restore.
vs Others: DynamoDB adds NoSQL complexity; Aurora costs 2x for features not needed at this scale.

Key Trade-Offs

Decision Cost Benefit Justification
Multi-AZ RDS +$15/mo 2-min failover Worth it: Automatic recovery vs 20-min manual
2 NAT Gateways +$32/mo Per-AZ independence Eliminated in Round 2 for cost optimization
Transit Gateway +$36/mo Enterprise networking Demonstrates complex patterns; simplified in Round 2
2 ALBs vs 1 +$20/mo Security isolation Backend completely isolated from internet

Implementation

Phase 1: Network Foundation

  • Created 2 VPCs with Transit Gateway connectivity
  • Configured 7 subnets across 2 availability zones
  • Set up NAT Gateways for private subnet internet access
  • Implemented security groups with least privilege

Phase 2: Compute Tier

  • Created Launch Templates with User Data scripts
  • Configured Auto Scaling Groups (2-4 instances per tier)
  • Installed Nginx (frontend) and Tomcat (backend)
  • Integrated CloudWatch Agent for custom metrics

Phase 3: Load Balancing

  • Deployed public-facing ALB for Nginx tier
  • Deployed internal ALB for Tomcat tier
  • Configured health checks (/health endpoint)
  • Set up target groups with proper routing

Phase 4: Database

  • Provisioned RDS MySQL Multi-AZ (db.t3.micro)
  • Created DB subnet group in private subnets
  • Configured automated backups (7-day retention)
  • Tested automatic failover (2 min 8 sec)

Phase 5: CI/CD Pipeline

  • Set up Nexus Repository on EC2
  • Configured GitHub Actions workflow
  • Integrated SonarCloud for code quality
  • Deployed CodeDeploy for automated deployments

Deployment Strategy

Rolling Deployment via Auto Scaling Groups:

  1. Update Launch Template with new configuration
  2. ASG terminates one old instance
  3. ASG launches one new instance
  4. New instance passes health checks (2 consecutive successes)
  5. ALB routes traffic to new instance
  6. Repeat for remaining instances
  7. Result: Zero downtime deployment

Performance Metrics

Load Test Results (Artillery)

Test Configuration:

  • Tool: Artillery
  • Duration: ~6 minutes per run
  • Virtual Users: 4,500
  • Requests: ~9,000 HTTP requests at 23 req/sec
  • Test Phases: Warm-up (5 users/sec) → Sustained (10 users/sec) → Peak (20 users/sec)

Results: Baseline vs Optimized

Metric Baseline (Before) Optimized (After) Improvement Assessment
Success Rate 98.8% (8,860/8,963) 100% (9,000/9,000) +1.2% Perfect
Timeouts (ETIMEDOUT) 103 errors 0 errors 100% eliminated Excellent
Response Time (Min) 195 ms 194 ms Consistent
Response Time (Median) 247 ms 223 ms -9.7% faster Excellent
Response Time (Mean) 379.7 ms 231.8 ms -38.9% faster Excellent
Response Time (p95) 757.6 ms 262.5 ms -65.3% faster Excellent
Response Time (p99) 4,770.6 ms 314.2 ms -93.4% faster Outstanding
Response Time (Max) 8,901 ms 2,671 ms -70% reduction
Throughput 23 req/sec 23 req/sec Consistent

What Changed Between Tests:

  • Auto-Scaling Policy: Added dual-metric scaling (CPU + ALB RequestCountPerTarget)
  • Scale-Out Threshold: RequestCountPerTarget > 500 for 3 datapoints
  • Result: System now scales proactively based on actual request pressure, not just CPU

Key Achievement:

p99 latency reduced by 93.4% (4,770ms → 314ms) — Nearly eliminated slow outliers that cause user abandonment. Zero timeouts under sustained load proves the system can handle peak traffic without connection exhaustion.

Capacity Analysis:

  • Current: ~200,000 users/day (2 instances per tier)
  • Scaled (4 instances): ~400,000 users/day
  • Cost per User: $0.00048/user

High Availability Metrics

RDS Multi-AZ Failover Test:

Failover Initiated:   14:35:22 UTC
Status "Rebooting":   14:35:45 UTC (+23 sec)
Status "Failing-over": 14:36:12 UTC (+50 sec)
Status "Available":   14:37:18 UTC (+1 min 56 sec)
Application Recovery: 14:37:30 UTC (+2 min 8 sec)

 Zero data loss (synchronous replication)
 Zero manual intervention
 Automatic DNS update

Auto-Healing Test:

Instance Failure:      00:00 (Tomcat stopped)
Unhealthy Detection:   01:00 (+1 min, 2 failed health checks)
ASG Termination:       02:00 (+2 min)
New Instance Launch:   02:15 (+2 min 15 sec)
New Instance Healthy:  06:30 (+6 min 30 sec)

 Automatic replacement
 No user impact (other instance serving traffic)
 Zero manual intervention

Auto-Scaling Test:

Test Setup:
- Tool: Artillery (from EC2 inside VPC)
- Target: Internal Backend ALB
- Load: 50 arrivals/second for 5 minutes
- Policy: Target Tracking (RequestCountPerTarget > 500)

Results:
High Load Generated:     11:20 UTC (50 req/sec)
Scale-Out Triggered:     11:23 UTC (+3 min)
New Instances Launched:  11:23 UTC (2 → 4 instances)
CPU Before Scale:        36%
CPU After Scale:         <10% (load distributed)
Result:                  100% capacity increase

 Dual-metric scaling (CPU + RequestCountPerTarget)
 Proactive scaling based on actual request pressure
 70% CPU reduction proves effective load distribution

Reliability Target

  • Target: 99.95% uptime (21.9 minutes downtime/month)
  • Achievable: 99.88% uptime (52 minutes downtime/month)
  • Gap: Deployment downtime, scaling delays (can be closed with blue/green deployments and pre-baked AMIs)

Security

4-Layer Defense-in-Depth

Layer 1: VPC Isolation

  • 2 separate VPCs (Management + Application)
  • Transit Gateway for controlled cross-VPC communication
  • Network segmentation by function

Layer 2: Subnet Segmentation

  • Public subnets: Internet-facing services only (Nginx)
  • Private app subnets: Application logic (Tomcat)
  • Private DB subnets: Database (RDS MySQL)

Layer 3: Security Groups (Stateful Firewall)

Frontend-SG:
  Inbound:  HTTP (80) from 0.0.0.0/0, SSH (22) from 172.32.0.0/16
  Outbound: All traffic

Backend-SG:
  Inbound:  TCP (8080) from Frontend-SG, SSH (22) from 172.32.0.0/16
  Outbound: All traffic

Database-SG:
  Inbound:  MySQL (3306) from Backend-SG only
  Outbound: None

Layer 4: IAM Roles (Least Privilege)

  • EC2-SSM-Role: SSM access, CloudWatch logs/metrics, CodeDeploy artifacts only
  • CodeDeployServiceRole: Describe instances, register/deregister from ALB only
  • No permissions: Terminate instances, modify infrastructure, IAM changes

Access Control

Admin Access Pattern:

Laptop → Bastion Host (ManagementVPC) → Transit Gateway → Private Instances

Benefits:

  • Single point of entry (audit trail)
  • IP whitelisting (only your IP)
  • Key-based authentication (no passwords)
  • Jump host pattern (can't directly SSH to private instances)

Monitoring

CloudWatch Dashboard

5 Widgets Configured:

  1. Frontend Metrics: CPU, Network, Status Checks (Nginx ASG)
  2. Backend Metrics: CPU, Network, Status Checks (Tomcat ASG)
  3. ALB Performance: RequestCount, TargetResponseTime, HTTPCode_2XX/5XX
  4. RDS Performance: CPUUtilization, DatabaseConnections, FreeableMemory, Latency
  5. Target Health: HealthyHostCount, UnhealthyHostCount (both tiers)

Alarms Configured (6 Total)

Alarm Metric Threshold Action
Backend-Unhealthy-Targets UnhealthyHostCount ≥ 1 SNS Email
Frontend-Unhealthy-Targets UnhealthyHostCount ≥ 1 SNS Email
Backend-High-CPU CPUUtilization > 80% (2 periods) SNS Email
Frontend-High-CPU CPUUtilization > 80% (2 periods) SNS Email
RDS-High-CPU CPUUtilization > 75% (2 periods) SNS Email
Backend-High-Response-Time TargetResponseTime > 2 seconds SNS Email

All alarms tested and confirmed working

Health Checks

ALB Health Checks:

  • Protocol: HTTP
  • Path: /health (Nginx), / (Tomcat)
  • Interval: 30 seconds
  • Timeout: 5 seconds
  • Healthy/Unhealthy Threshold: 2 consecutive checks

ASG Health Checks:

  • Type: ELB (detects application failures, not just EC2 status)
  • Grace Period: 300 seconds (allows User Data to complete)

Deliverables

  • VPC-overview
  • Subnets-architecture
  • Artillery-report
  • FE
  • CI/CD-pipeline
  • ASG-activity
  • Cloudwatch-dashboard

Challenges & Solutions

Challenge 1: NAT Gateway Misconfiguration

Problem: Created NAT Gateways in private subnets → instances couldn't download packages
Root Cause: NAT Gateways need internet access themselves; must be in public subnets
Solution: Deleted and recreated NAT Gateways in public subnets with Elastic IPs

Challenge 2: Network ACL Blocking Cross-VPC Traffic

Problem: Transit Gateway configured, routes added, but ping still failed
Root Cause: Network ACL had default DENY rule blocking subnet-level traffic
Solution: Changed Network ACL from DENY to ALLOW
Lesson: Security Groups (instance-level) + Network ACLs (subnet-level) both must allow traffic

Challenge 3: Health Check Path Wrong

Problem: Nginx running, port 80 open, but ALB marked instances unhealthy
Root Cause: Health check path was / (slow, proxies to backend) instead of /health (fast)
Solution: Changed Target Group health check path to /health endpoint
Lesson: Health checks should be fast and independent (not depend on backend services)
Impact: Instances became healthy in 2 minutes after fix


Cost Analysis

6-Day Actual Costs

Component Daily Cost Total (6 days) % of Total
NAT Gateway (2) $2.20 $13.20 28.5%
Transit Gateway $1.20 $7.20 15.5%
Load Balancers (2) $1.33 $7.98 17.2%
RDS Multi-AZ $1.00 $6.00 12.9%
EC2 Instances (4) $0.60 $3.60 7.8%
Nexus (t3.medium) $0.80 $4.80 10.4%
Bastion (t2.micro) $0.13 $0.78 1.7%
CloudWatch $0.05 $0.30 0.6%
TOTAL $7.73/day $46.40 100%

Monthly Projections

24/7 Operation: $232/month
Optimized (stop when not working): $55-80/month
With Reserved Instances (1-year): $143/month (38% savings)
Round 2 Simplified Architecture: $50/month (eliminates NAT/TG)

Top Cost Drivers

  1. NAT Gateways (30%): $65/month → Eliminated in Round 2
  2. Transit Gateway (17%): $36/month → Eliminated in Round 2
  3. Load Balancers (17%): $40/month → Essential, can't optimize much

Cost Per User

  • Current: $0.00048 per user/day
  • At 1M users/day: $0.00023 per user
  • Competitive with Heroku ($0.00083/user), AWS Fargate ($0.00031/user)

Future Improvements

Security Enhancements

Priority: High

  • Add SSL/TLS certificate (ACM + HTTPS redirect)
  • Implement AWS Secrets Manager for credentials
  • Enable WAF with OWASP Top 10 rules
  • Enable RDS encryption at rest

Infrastructure as Code

Priority: High

  • Rebuild with Terraform (Round 2)
  • Version control infrastructure
  • Faster deployment (5 min vs 3 days)

Performance Optimization

Priority: Medium

  • Pre-bake AMI (reduce scaling time: 7.5 → 3.5 min)
  • Add Redis caching (reduce DB load 80%)
  • Implement CloudFront CDN (global distribution)

Containerization

Priority: Future

  • Migrate to ECS Fargate (Round 3)
  • Faster scaling (< 1 minute)
  • Modern cloud-native architecture

Getting Started

Prerequisites

  • AWS Account with billing enabled
  • AWS CLI v2 installed and configured
  • Basic understanding of AWS services
  • SSH key pair for EC2 access

Repository Structure

round-1-aws-3tier/
├── README.md
├── docs/
│   ├── architecture-diagram.png
│   ├── network-diagram.png
│   ├── Round-1-Complete-Testing-Documentation.md
│   └── Round-1-Complete-Lessons-Learned-Guide.md
├── scripts/
│   ├── nginx-user-data.sh
│   ├── tomcat-user-data.sh
│   └── cloudwatch-agent-config.json
├── screenshots/
│   ├── cloudwatch-dashboard.png
│   ├── healthy-targets.png
│   ├── load-test-results.png
│   └── rds-multi-az.png
└── ci-cd/
    ├── .github/workflows/deploy.yml
    ├── appspec.yml
    └── deployment-scripts/

Deployment Steps

  1. Clone Repository

    git clone https://github.com/yourusername/round-1-aws-3tier.git
    cd round-1-aws-3tier
  2. Review Documentation

    • Read architecture overview
    • Understand network design
    • Review security considerations
  3. Deploy Infrastructure

    • Follow AWS Console setup (Round 1)
    • Or use Terraform (Round 2)
  4. Test Deployment

    • Run load tests
    • Test failover scenarios
    • Verify monitoring
  5. Clean Up

    • Delete resources to avoid charges
    • Follow deletion order in documentation

Contributing

Contributions, issues, and feature requests are welcome!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Project Metrics Summary

Metric Value
Success Rate 100% (optimized)
p99 Response Time 314 ms (was 4,771ms, 93% improvement)
Mean Response Time 232 ms (was 380ms, 39% improvement)
Timeouts 0 (eliminated 100%)
Failover Time ~60 sec (avg of 3 tests)
Auto-Scaling 2 → 4 instances (100% capacity increase)
Throughput 23 req/sec sustained
Deployment Speed 85 seconds (end-to-end)
User Capacity 200k-400k users/day
Uptime Target 99.95%
Cost $7.73/day

⭐ Star this repo if you found it helpful!

Made with ❤️

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors