AWS 3-Tier Architecture - Production-Grade Java Application Deployment

Production-grade AWS infrastructure demonstrating enterprise networking, high availability, auto-scaling, and comprehensive monitoring.

Project Overview

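This project implements a production-grade 3-tier web application architecture on AWS: an Nginx reverse proxy tier, an Apache Tomcat application tier, and an RDS MySQL database, deployed across two Availability Zones. It demonstrates enterprise networking patterns, high availability design, and modern DevOps practices.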

Tech Stack

Layer	Technology
Frontend	Nginx (Reverse Proxy)
Backend	Apache Tomcat 9 (Java 11)
Database	RDS MySQL 8.0.39 (Multi-AZ)
Infrastructure	AWS (EC2, VPC, ALB, ASG, RDS, Transit Gateway)
CI/CD	GitHub Actions → SonarCloud → Nexus → CodeDeploy
Monitoring	CloudWatch (Metrics, Logs, Alarms, Dashboard)

Project Timeline

Duration: 6 days (February 20-25, 2026)
Total Cost: $46.40 ($7.73/day)
Infrastructure Configuration: 200+ AWS Console settings (built by hand in Round 1; Terraform is planned for Round 2)
Tests Performed: 43 tests across 8 categories

Architecture

High-Level Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                           INTERNET                                   │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                    ┌────────▼────────┐
                    │  Frontend ALB   │ (Internet-Facing)
                    │  (us-east-1a/b) │
                    └────────┬────────┘
                             │
          ┌──────────────────┴──────────────────┐
          │                                     │
    ┌─────▼─────┐                         ┌────▼──────┐
    │  Nginx-1  │ (Public Subnet AZ1a)    │  Nginx-2  │ (Public Subnet AZ1b)
    │ t3.micro  │                         │ t3.micro  │
    └─────┬─────┘                         └────┬──────┘
          │                                     │
          └──────────────────┬──────────────────┘
                             │
                    ┌────────▼────────┐
                    │  Backend ALB    │ (Internal)
                    │  (us-east-1a/b) │
                    └────────┬────────┘
                             │
          ┌──────────────────┴──────────────────┐
          │                                     │
    ┌─────▼──────┐                        ┌────▼───────┐
    │ Tomcat-1   │ (Private Subnet AZ1a)  │ Tomcat-2   │ (Private Subnet AZ1b)
    │ t3.micro   │                        │ t3.micro   │
    └─────┬──────┘                        └────┬───────┘
          │                                     │
          └──────────────────┬──────────────────┘
                             │
                    ┌────────▼─────────┐
                    │   RDS MySQL      │
                    │   Multi-AZ       │
                    │  (db.t3.micro)   │
                    │ Primary: AZ1a    │
                    │ Standby: AZ1b    │
                    └──────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    MANAGEMENT VPC (172.32.0.0/16)                    │
│                                                                       │
│    ┌──────────────┐          Transit Gateway          ┌──────────┐  │
│    │ Bastion Host │◄───────────────┬─────────────────►│ AppVPC   │  │
│    │  (t2.micro)  │                │                  │          │  │
│    └──────────────┘                │                  └──────────┘  │
│         ▲                           │                                │
│         │                           │                                │
│    SSH from Laptop                  └─ Secure Admin Access           │
└─────────────────────────────────────────────────────────────────────┘

Network Architecture

AppVPC (192.168.0.0/16):

Public Subnets: 192.168.1.0/24, 192.168.2.0/24 (Nginx instances)
Private App Subnets: 192.168.3.0/24, 192.168.4.0/24 (Tomcat instances)
Private DB Subnets: 192.168.5.0/24, 192.168.6.0/24 (RDS MySQL)
Availability Zones: us-east-1a, us-east-1b

ManagementVPC (172.32.0.0/16):

Bastion Subnet: 172.32.1.0/24
Purpose: Secure admin access via Transit Gateway
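
The addressing plan above can be sanity-checked with Python's standard ipaddress module. This is a minimal sketch: the CIDR blocks are copied from the lists above, while the dictionary layout and the function name check_plan are illustrative assumptions, not part of the project.

```python
import ipaddress
from itertools import combinations

# CIDR blocks copied from the network lists above; the dict layout is illustrative.
VPCS = {
    "AppVPC": ("192.168.0.0/16", [
        "192.168.1.0/24", "192.168.2.0/24",   # public (Nginx)
        "192.168.3.0/24", "192.168.4.0/24",   # private app (Tomcat)
        "192.168.5.0/24", "192.168.6.0/24",   # private DB (RDS)
    ]),
    "ManagementVPC": ("172.32.0.0/16", ["172.32.1.0/24"]),
}

def check_plan(vpcs):
    """Return a list of problems found in the addressing plan (empty if none)."""
    problems = []
    for name, (vpc_cidr, subnet_cidrs) in vpcs.items():
        vpc = ipaddress.ip_network(vpc_cidr)
        subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
        for s in subnets:
            if not s.subnet_of(vpc):
                problems.append(f"{s} is outside {name} ({vpc})")
        for a, b in combinations(subnets, 2):
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b} in {name}")
    # VPCs joined by a Transit Gateway need non-overlapping CIDRs to route cleanly.
    for (n1, (c1, _)), (n2, (c2, _)) in combinations(vpcs.items(), 2):
        if ipaddress.ip_network(c1).overlaps(ipaddress.ip_network(c2)):
            problems.append(f"{n1} overlaps {n2}")
    return problems

problems = check_plan(VPCS)
subnet_count = sum(len(subnets) for _, subnets in VPCS.values())
# 172.32.0.0/16 sits just past the RFC 1918 block 172.16.0.0/12 (172.16 to 172.31).
mgmt_in_rfc1918 = ipaddress.ip_network("172.32.0.0/16").subnet_of(
    ipaddress.ip_network("172.16.0.0/12"))
print(problems, subnet_count, mgmt_in_rfc1918)
```

The plan passes the containment and overlap checks and yields seven subnets in total. The last check returns False: AWS accepts 172.32.0.0/16 as a VPC CIDR, but it lies outside RFC 1918 private space, so a range inside 172.16.0.0/12 would be the more conventional choice for the management VPC.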

Key Features

High Availability

Multi-AZ Deployment across 2 availability zones
RDS Multi-AZ with automatic failover (tested: 2 min 8 sec)
Auto-healing instances (tested: replacement healthy in 6 min 30 sec)
No single points of failure in application path

Scalability

Auto Scaling Groups (2-4 instances per tier)
Dynamic scaling based on CPU utilization
Load distribution across multiple instances
Tested capacity: 200,000 users/day

Security

Defense-in-depth: 4 security layers (VPC, Subnets, Security Groups, IAM)
Private subnets for application and database tiers
Bastion host for secure admin access
Least privilege IAM roles and security group rules

Monitoring & Observability

CloudWatch Dashboard with 5 widgets
6 proactive alarms with email notifications
Comprehensive metrics (CPU, memory, response time, request count)
Health checks at ALB and ASG levels

CI/CD Pipeline

Automated deployment: GitHub → Actions → SonarCloud → Nexus → CodeDeploy
Quality gates (SonarCloud analysis on every deployment)
Rolling deployments (zero downtime)
Artifact management (versioned deployments to Nexus)
Deployment Speed: 85 seconds end-to-end (code push → live production)

Pipeline Breakdown:

Build & Test:           53 seconds (includes SonarCloud analysis)
Artifact Upload:        8 seconds (Nexus repository)
AWS Configuration:      1 second
S3 Upload:              6 seconds
CodeDeploy Trigger:     2 seconds
Instance Deployment:    7-10 seconds per instance
────────────────────────────────────────────────
Total:                  85 seconds (1 min 25 sec)
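
The breakdown can be cross-checked with a few lines of arithmetic. The sketch below assumes two instances deployed one after another; the source gives only the per-instance range, so the instance count and the sequential order are assumptions.

```python
# Stage durations in seconds, copied from the breakdown above.
FIXED_STAGES = {
    "build_and_test": 53,
    "artifact_upload": 8,
    "aws_configuration": 1,
    "s3_upload": 6,
    "codedeploy_trigger": 2,
}
PER_INSTANCE_RANGE = (7, 10)   # seconds per instance, from the breakdown
INSTANCE_COUNT = 2             # assumption: two instances, deployed one at a time

def pipeline_bounds(fixed, per_instance, instances):
    """Return (best_case, worst_case) end-to-end duration in seconds."""
    base = sum(fixed.values())
    low, high = per_instance
    return base + low * instances, base + high * instances

best, worst = pipeline_bounds(FIXED_STAGES, PER_INSTANCE_RANGE, INSTANCE_COUNT)
print(f"fixed stages: {sum(FIXED_STAGES.values())} s, total: {best}-{worst} s")
```

Under those assumptions the fixed stages take 70 seconds and the end-to-end time falls between 84 and 90 seconds, which is consistent with the reported 85 seconds.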

Design Decisions

Why AWS?

Need: 3-tier Java app requiring HA, auto-scaling, and enterprise networking
Decision: AWS provides mature RDS Multi-AZ (automatic failover), native ASG+ALB integration (self-healing), and Transit Gateway (enterprise networking demonstration).
vs Others: Azure better for .NET; GCP better for containers. AWS best for traditional infrastructure patterns.

Why EC2 over Containers?

Need: Tomcat-based Java app with persistent connections
Decision: EC2 fits the traditional Tomcat architecture. Lambda was ruled out because of Java cold starts (estimated at 3-5 s), which would hurt UX. Containers are deferred to Round 3 (learn the infrastructure first, then the abstraction).
Trade-off: More operational overhead but deeper infrastructure understanding.

Why RDS MySQL Multi-AZ?

Need: Relational data with HA requirement
Decision: MySQL for structured relationships. Multi-AZ for automatic failover.
Trade-off: 2x cost ($30 vs $15/month) but 2-minute automatic recovery vs 20-30 minute manual restore.
vs Others: DynamoDB adds NoSQL complexity; Aurora costs 2x for features not needed at this scale.

Key Trade-Offs

Decision	Cost	Benefit	Justification
Multi-AZ RDS	+$15/mo	2-min failover	Worth it: automatic recovery vs 20-30 min manual restore
2 NAT Gateways	+$32/mo	Per-AZ independence	Eliminated in Round 2 for cost optimization
Transit Gateway	+$36/mo	Enterprise networking	Demonstrates complex patterns; simplified in Round 2
2 ALBs vs 1	+$20/mo	Security isolation	Backend completely isolated from internet

Implementation

Phase 1: Network Foundation

Created 2 VPCs with Transit Gateway connectivity
Configured 7 subnets across 2 availability zones
Set up NAT Gateways for private subnet internet access
Implemented security groups with least privilege

Phase 2: Compute Tier

Created Launch Templates with User Data scripts
Configured Auto Scaling Groups (2-4 instances per tier)
Installed Nginx (frontend) and Tomcat (backend)
Integrated CloudWatch Agent for custom metrics

Phase 3: Load Balancing

Deployed public-facing ALB for Nginx tier
Deployed internal ALB for Tomcat tier
Configured health checks (/health endpoint)
Set up target groups with proper routing

Phase 4: Database

Provisioned RDS MySQL Multi-AZ (db.t3.micro)
Created DB subnet group in private subnets
Configured automated backups (7-day retention)
Tested automatic failover (2 min 8 sec)

Phase 5: CI/CD Pipeline

Set up Nexus Repository on EC2
Configured GitHub Actions workflow
Integrated SonarCloud for code quality
Deployed CodeDeploy for automated deployments

Deployment Strategy

Rolling Deployment via Auto Scaling Groups:

1. Update Launch Template with new configuration
2. ASG terminates one old instance
3. ASG launches one new instance
4. New instance passes health checks (2 consecutive successes)
5. ALB routes traffic to new instance
6. Repeat for remaining instances

Result: Zero downtime deployment
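
The terminate-then-launch order above can be modelled with a short simulation. This is an illustrative sketch, not the ASG's actual algorithm; the function name rolling_replace is hypothetical.

```python
def rolling_replace(fleet_size):
    """Simulate replacing instances one at a time; return in-service counts per step."""
    fleet = ["old"] * fleet_size
    in_service_history = []
    for i in range(fleet_size):
        fleet[i] = None                      # ASG terminates one old instance
        in_service_history.append(sum(x is not None for x in fleet))
        fleet[i] = "new"                     # replacement launches and passes health checks
        in_service_history.append(sum(x is not None for x in fleet))
    return fleet, in_service_history

fleet, history = rolling_replace(2)
print(fleet, history, min(history))
```

For a two-instance tier the in-service count dips to one during each replacement, so zero downtime depends on a single instance being able to carry the traffic while its peer is replaced.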

Performance Metrics

Load Test Results (Artillery)

Test Configuration:

Tool: Artillery
Duration: ~6 minutes per run
Virtual Users: 4,500
Requests: ~9,000 HTTP requests at 23 req/sec
Test Phases: Warm-up (5 users/sec) → Sustained (10 users/sec) → Peak (20 users/sec)

Results: Baseline vs Optimized

Metric	Baseline (Before)	Optimized (After)	Improvement	Assessment
Success Rate	98.85% (8,860/8,963)	100% (9,000/9,000)	+1.15 percentage points	Perfect
Timeouts (ETIMEDOUT)	103 errors	0 errors	100% eliminated	Excellent
Response Time (Min)	195 ms	194 ms	1 ms faster	Consistent
Response Time (Median)	247 ms	223 ms	9.7% faster	Excellent
Response Time (Mean)	379.7 ms	231.8 ms	39.0% faster	Excellent
Response Time (p95)	757.6 ms	262.5 ms	65.4% faster	Excellent
Response Time (p99)	4,770.6 ms	314.2 ms	93.4% faster	Outstanding
Response Time (Max)	8,901 ms	2,671 ms	70.0% lower	-
Throughput	23 req/sec	23 req/sec	No change	Consistent
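
The improvement figures can be recomputed from the raw latencies. The sketch below uses only numbers from the table; the helper name reduction_pct is an illustrative choice.

```python
# (baseline_ms, optimized_ms) pairs copied from the results table above.
LATENCY_MS = {
    "median": (247.0, 223.0),
    "mean": (379.7, 231.8),
    "p95": (757.6, 262.5),
    "p99": (4770.6, 314.2),
    "max": (8901.0, 2671.0),
}

def reduction_pct(before, after):
    """Percentage reduction from before to after, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)

improvements = {name: reduction_pct(b, a) for name, (b, a) in LATENCY_MS.items()}
success_rate = round(8860 / 8963 * 100, 2)  # baseline successes / baseline requests
print(improvements, success_rate)
```

The recomputed values agree with the table to within 0.1 percentage points; small differences come down to rounding versus truncation.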

What Changed Between Tests:

Auto-Scaling Policy: Added dual-metric scaling (CPU + ALB RequestCountPerTarget)
Scale-Out Threshold: RequestCountPerTarget > 500 for 3 datapoints
Result: System now scales proactively based on actual request pressure, not just CPU
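
The scale-out rule can be pictured as a check over recent datapoints. The sketch below is a simplification of how an alarm-style evaluation behaves, not the exact Auto Scaling logic; the sample values are hypothetical, and treating either metric as sufficient on its own is an assumption about how the two policies combine.

```python
def breaches_threshold(datapoints, threshold=500, required=3):
    """True if the most recent `required` datapoints all exceed `threshold`."""
    recent = datapoints[-required:]
    return len(recent) == required and all(v > threshold for v in recent)

def should_scale_out(cpu_breach, request_breach):
    """Dual-metric rule: either signal alone is enough to add capacity."""
    return cpu_breach or request_breach

# Hypothetical RequestCountPerTarget samples, with CPU still below its own threshold.
requests_per_target = [320, 480, 610, 650, 700]
request_breach = breaches_threshold(requests_per_target)
decision = should_scale_out(cpu_breach=False, request_breach=request_breach)
print(request_breach, decision)
```

Here the last three samples exceed 500, so the group scales out even though CPU has not crossed its threshold. That is the behaviour the request-based policy was added to provide.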

Key Achievement:

p99 latency fell by 93.4% (4,770 ms → 314 ms), nearly eliminating the slow outliers that cause user abandonment. Zero timeouts under sustained load indicate that the system can handle the tested peak traffic without connection exhaustion.

Capacity Analysis:

Current: ~200,000 users/day (2 instances per tier)
Scaled (4 instances): ~400,000 users/day
Cost per User: $0.00048/user

High Availability Metrics

RDS Multi-AZ Failover Test:

Failover Initiated:   14:35:22 UTC
Status "Rebooting":   14:35:45 UTC (+23 sec)
Status "Failing-over": 14:36:12 UTC (+50 sec)
Status "Available":   14:37:18 UTC (+1 min 56 sec)
Application Recovery: 14:37:30 UTC (+2 min 8 sec)

- Zero data loss (synchronous replication)
- Zero manual intervention
- Automatic DNS update
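
The elapsed times in the log follow directly from the timestamps. A minimal sketch that recomputes them with the standard library (the event labels are shortened from the log above):

```python
from datetime import datetime

# Timestamps (UTC) copied from the failover test log above.
EVENTS = [
    ("Failover initiated", "14:35:22"),
    ("Rebooting", "14:35:45"),
    ("Failing-over", "14:36:12"),
    ("Available", "14:37:18"),
    ("Application recovery", "14:37:30"),
]

def offsets_seconds(events):
    """Return {event: seconds elapsed since the first event}."""
    parsed = [(name, datetime.strptime(ts, "%H:%M:%S")) for name, ts in events]
    start = parsed[0][1]
    return {name: int((t - start).total_seconds()) for name, t in parsed}

offsets = offsets_seconds(EVENTS)
for name, secs in offsets.items():
    print(f"{name:<22} +{secs // 60} min {secs % 60:02d} sec")
```

The database reports "Available" after 116 seconds, and the application needs a further 12 seconds to reconnect, giving the 2 min 8 sec total.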

Auto-Healing Test:

Instance Failure:      00:00 (Tomcat stopped)
Unhealthy Detection:   01:00 (+1 min, 2 failed health checks)
ASG Termination:       02:00 (+2 min)
New Instance Launch:   02:15 (+2 min 15 sec)
New Instance Healthy:  06:30 (+6 min 30 sec)

- Automatic replacement
- No user impact (other instance serving traffic)
- Zero manual intervention

Auto-Scaling Test:

Test Setup:
- Tool: Artillery (from EC2 inside VPC)
- Target: Internal Backend ALB
- Load: 50 arrivals/second for 5 minutes
- Policy: Target Tracking (RequestCountPerTarget > 500)

Results:
High Load Generated:     11:20 UTC (50 req/sec)
Scale-Out Triggered:     11:23 UTC (+3 min)
New Instances Launched:  11:23 UTC (2 → 4 instances)
CPU Before Scale:        36%
CPU After Scale:         <10% (load distributed)
Result:                  100% capacity increase

- Dual-metric scaling (CPU + RequestCountPerTarget)
- Proactive scaling based on actual request pressure
- CPU fell from 36% to under 10% (a drop of more than 70%), indicating effective load distribution

Reliability Target

Target: 99.95% uptime (21.9 minutes downtime/month)
Achievable: 99.88% uptime (52 minutes downtime/month)
Gap: Deployment downtime, scaling delays (can be closed with blue/green deployments and pre-baked AMIs)
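
The downtime budgets follow from the uptime percentages. The sketch below assumes an average month of 43,800 minutes (365 days / 12); the source does not state which month length it used.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 / 12   # 43,800: an average month of a 365-day year

def downtime_budget_minutes(uptime_pct):
    """Allowed downtime per month, in minutes, for a given uptime percentage."""
    return (100 - uptime_pct) / 100 * MINUTES_PER_MONTH

target = downtime_budget_minutes(99.95)
achievable = downtime_budget_minutes(99.88)
print(f"99.95% -> {target:.1f} min/month, 99.88% -> {achievable:.1f} min/month")
```

With that month length the target works out to about 21.9 minutes and the achievable figure to about 52.6 minutes, so closing the gap means removing roughly 30 minutes of downtime per month.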

Security

4-Layer Defense-in-Depth

Layer 1: VPC Isolation

2 separate VPCs (Management + Application)
Transit Gateway for controlled cross-VPC communication
Network segmentation by function

Layer 2: Subnet Segmentation

Public subnets: Internet-facing services only (Nginx)
Private app subnets: Application logic (Tomcat)
Private DB subnets: Database (RDS MySQL)

Layer 3: Security Groups (Stateful Firewall)

Frontend-SG:
  Inbound:  HTTP (80) from 0.0.0.0/0, SSH (22) from 172.32.0.0/16
  Outbound: All traffic

Backend-SG:
  Inbound:  TCP (8080) from Frontend-SG, SSH (22) from 172.32.0.0/16
  Outbound: All traffic

Database-SG:
  Inbound:  MySQL (3306) from Backend-SG only
  Outbound: None
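
The chaining of security groups can be expressed as data and queried. This is a simplified model under stated assumptions: it matches sources by exact string rather than evaluating CIDR containment, and the helper names are hypothetical.

```python
# Inbound rules copied from the security group listing above.
# Sources are either a CIDR string or the name of another security group.
INBOUND = {
    "Frontend-SG": [(80, "0.0.0.0/0"), (22, "172.32.0.0/16")],
    "Backend-SG": [(8080, "Frontend-SG"), (22, "172.32.0.0/16")],
    "Database-SG": [(3306, "Backend-SG")],
}

def allows(dest_sg, port, source):
    """True if dest_sg has an inbound rule for this port and exact source."""
    return (port, source) in INBOUND.get(dest_sg, [])

def internet_exposed(sg):
    """Ports on sg reachable from any IPv4 address."""
    return [port for port, src in INBOUND[sg] if src == "0.0.0.0/0"]

exposure = {sg: internet_exposed(sg) for sg in INBOUND}
print(exposure)
```

Only the frontend group exposes a port to the internet. The database accepts MySQL traffic from the backend group alone, so a request has to pass through each tier in turn.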

Layer 4: IAM Roles (Least Privilege)

EC2-SSM-Role: SSM access, CloudWatch logs/metrics, CodeDeploy artifacts only
CodeDeployServiceRole: Describe instances, register/deregister from ALB only
No permissions: Terminate instances, modify infrastructure, IAM changes

Access Control

Admin Access Pattern:

Laptop → Bastion Host (ManagementVPC) → Transit Gateway → Private Instances

Benefits:

Single point of entry (audit trail)
IP allowlisting (the bastion accepts SSH only from the administrator's IP)
Key-based authentication (no passwords)
Jump host pattern (can't directly SSH to private instances)

Monitoring

CloudWatch Dashboard

5 Widgets Configured:

Frontend Metrics: CPU, Network, Status Checks (Nginx ASG)
Backend Metrics: CPU, Network, Status Checks (Tomcat ASG)
ALB Performance: RequestCount, TargetResponseTime, HTTPCode_2XX/5XX
RDS Performance: CPUUtilization, DatabaseConnections, FreeableMemory, Latency
Target Health: HealthyHostCount, UnhealthyHostCount (both tiers)

Alarms Configured (6 Total)

Alarm	Metric	Threshold	Action
Backend-Unhealthy-Targets	UnhealthyHostCount	≥ 1	SNS Email
Frontend-Unhealthy-Targets	UnhealthyHostCount	≥ 1	SNS Email
Backend-High-CPU	CPUUtilization	> 80% (2 periods)	SNS Email
Frontend-High-CPU	CPUUtilization	> 80% (2 periods)	SNS Email
RDS-High-CPU	CPUUtilization	> 75% (2 periods)	SNS Email
Backend-High-Response-Time	TargetResponseTime	> 2 seconds	SNS Email

All alarms tested and confirmed working
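
One of the CPU alarms from the table could be defined programmatically along these lines. The sketch only builds the keyword arguments for CloudWatch's PutMetricAlarm call; the ASG name, the SNS topic ARN, and the 5-minute period are assumptions, since the source does not give them.

```python
def cpu_alarm_params(alarm_name, asg_name, threshold_pct, topic_arn,
                     periods=2, period_s=300):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": period_s,                 # assumption: 5-minute periods
        "EvaluationPeriods": periods,       # "2 periods" from the table
        "Threshold": float(threshold_pct),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],        # SNS topic that sends the email
    }

# Hypothetical ASG name and ARN, for illustration only.
params = cpu_alarm_params(
    alarm_name="Backend-High-CPU",
    asg_name="tomcat-asg",
    threshold_pct=80,
    topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmName"], params["Threshold"], params["EvaluationPeriods"])
```

Keeping the parameters in a plain dictionary makes the six alarms easy to review and version before anything is sent to AWS.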

Health Checks

ALB Health Checks:

Protocol: HTTP
Path: /health (Nginx), / (Tomcat)
Interval: 30 seconds
Timeout: 5 seconds
Healthy/Unhealthy Threshold: 2 consecutive checks

ASG Health Checks:

Type: ELB (detects application failures, not just EC2 status)
Grace Period: 300 seconds (allows User Data to complete)
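
These settings determine how quickly a failure is noticed. A rough sketch of the timing, using the values listed above and ignoring the per-check timeout:

```python
INTERVAL_S = 30          # ALB health check interval
UNHEALTHY_THRESHOLD = 2  # consecutive failures before a target is marked unhealthy
HEALTHY_THRESHOLD = 2    # consecutive successes before a target is marked healthy
GRACE_PERIOD_S = 300     # ASG health check grace period

def detection_time(interval_s, threshold):
    """Approximate seconds for `threshold` consecutive checks at a fixed interval."""
    return interval_s * threshold

time_to_unhealthy = detection_time(INTERVAL_S, UNHEALTHY_THRESHOLD)
time_to_healthy = detection_time(INTERVAL_S, HEALTHY_THRESHOLD)
# Number of check intervals that elapse while the ASG still ignores ELB health.
checks_within_grace = GRACE_PERIOD_S // INTERVAL_S
print(time_to_unhealthy, time_to_healthy, checks_within_grace)
```

The 60-second detection time matches the "+1 min, 2 failed health checks" entry in the auto-healing test, and the grace period spans about ten check intervals while User Data completes.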

Deliverables

Challenges & Solutions

Challenge 1: NAT Gateway Misconfiguration

Problem: Created NAT Gateways in private subnets → instances couldn't download packages
Root Cause: NAT Gateways need internet access themselves; must be in public subnets
Solution: Deleted and recreated NAT Gateways in public subnets with Elastic IPs

Challenge 2: Network ACL Blocking Cross-VPC Traffic

Problem: Transit Gateway configured, routes added, but ping still failed
Root Cause: Network ACL had default DENY rule blocking subnet-level traffic
Solution: Changed the Network ACL rule from DENY to ALLOW
Lesson: Security Groups (instance-level) and Network ACLs (subnet-level) must both allow the traffic. Network ACLs are stateless, so they need matching inbound and outbound rules.

Challenge 3: Health Check Path Wrong

Problem: Nginx running, port 80 open, but ALB marked instances unhealthy
Root Cause: Health check path was / (slow, proxies to backend) instead of /health (fast)
Solution: Changed Target Group health check path to /health endpoint
Lesson: Health checks should be fast and independent (not depend on backend services)
Impact: Instances became healthy in 2 minutes after fix

Cost Analysis

6-Day Actual Costs

Component	Daily Cost	Total (6 days)	% of Total
NAT Gateway (2)	$2.20	$13.20	28.4%
Transit Gateway	$1.20	$7.20	15.5%
Load Balancers (2)	$1.33	$7.98	17.2%
RDS Multi-AZ	$1.00	$6.00	12.9%
EC2 Instances (4)	$0.60	$3.60	7.8%
Nexus (t3.medium)	$0.80	$4.80	10.3%
Bastion (t2.micro)	$0.13	$0.78	1.7%
CloudWatch	$0.05	$0.30	0.6%
Other (not itemized)	$0.42	$2.54	5.5%
TOTAL	$7.73/day	$46.40	100%
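
The table can be reconciled against the stated total with a short script. The figures are the six-day totals of the eight named components; the variable names are illustrative.

```python
# Six-day totals (USD) of the named components in the table above.
ITEMIZED = {
    "NAT Gateway (2)": 13.20,
    "Transit Gateway": 7.20,
    "Load Balancers (2)": 7.98,
    "RDS Multi-AZ": 6.00,
    "EC2 Instances (4)": 3.60,
    "Nexus (t3.medium)": 4.80,
    "Bastion (t2.micro)": 0.78,
    "CloudWatch": 0.30,
}
STATED_TOTAL = 46.40
DAYS = 6

itemized_sum = round(sum(ITEMIZED.values()), 2)
remainder = round(STATED_TOTAL - itemized_sum, 2)
shares = {k: round(v / STATED_TOTAL * 100, 1) for k, v in ITEMIZED.items()}
daily_average = round(STATED_TOTAL / DAYS, 2)
print(itemized_sum, remainder, daily_average)
```

The eight named components add up to $43.86 of the $46.40 total. The remaining $2.54 is not broken out in the source, and the daily average of $7.73 is computed from the stated total.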

Monthly Projections

24/7 Operation: $232/month
Optimized (stop when not working): $55-80/month
With Reserved Instances (1-year): $143/month (38% savings)
Round 2 Simplified Architecture: $50/month (eliminates NAT/TG)

Top Cost Drivers

NAT Gateways (28%): $65/month → Eliminated in Round 2
Transit Gateway (16%): $36/month → Eliminated in Round 2
Load Balancers (17%): $40/month → Essential, can't optimize much

Cost Per User

Current: $0.00048 per user/day
At 1M users/day: $0.00023 per user
Competitive with Heroku ($0.00083/user), AWS Fargate ($0.00031/user)

Future Improvements

Security Enhancements

Priority: High

Add SSL/TLS certificate (ACM + HTTPS redirect)
Implement AWS Secrets Manager for credentials
Enable WAF with OWASP Top 10 rules
Enable RDS encryption at rest

Infrastructure as Code

Priority: High

Rebuild with Terraform (Round 2)
Version control infrastructure
Faster deployment (5 min vs 3 days)

Performance Optimization

Priority: Medium

Pre-bake AMI (reduce scaling time: 7.5 → 3.5 min)
Add Redis caching (reduce DB load 80%)
Implement CloudFront CDN (global distribution)

Containerization

Priority: Future

Migrate to ECS Fargate (Round 3)
Faster scaling (< 1 minute)
Modern cloud-native architecture

Getting Started

Prerequisites

AWS Account with billing enabled
AWS CLI v2 installed and configured
Basic understanding of AWS services
SSH key pair for EC2 access

Repository Structure

round-1-aws-3tier/
├── README.md
├── docs/
│   ├── architecture-diagram.png
│   ├── network-diagram.png
│   ├── Round-1-Complete-Testing-Documentation.md
│   └── Round-1-Complete-Lessons-Learned-Guide.md
├── scripts/
│   ├── nginx-user-data.sh
│   ├── tomcat-user-data.sh
│   └── cloudwatch-agent-config.json
├── screenshots/
│   ├── cloudwatch-dashboard.png
│   ├── healthy-targets.png
│   ├── load-test-results.png
│   └── rds-multi-az.png
└── ci-cd/
    ├── .github/workflows/deploy.yml
    ├── appspec.yml
    └── deployment-scripts/

Deployment Steps

1. Clone Repository

   git clone https://github.com/yourusername/round-1-aws-3tier.git
   cd round-1-aws-3tier

2. Review Documentation
   - Read architecture overview
   - Understand network design
   - Review security considerations
3. Deploy Infrastructure
   - Follow AWS Console setup (Round 1)
   - Or use Terraform (Round 2)
4. Test Deployment
   - Run load tests
   - Test failover scenarios
   - Verify monitoring
5. Clean Up
   - Delete resources to avoid charges
   - Follow deletion order in documentation

Contributing

Contributions, issues, and feature requests are welcome!

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Project Metrics Summary

Metric	Value
Success Rate	100% (optimized)
p99 Response Time	314 ms (was 4,771ms, 93% improvement)
Mean Response Time	232 ms (was 380ms, 39% improvement)
Timeouts	0 (eliminated 100%)
Failover Time	~60 sec (avg of 3 tests); 2 min 8 sec to full application recovery in the documented run
Auto-Scaling	2 → 4 instances (100% capacity increase)
Throughput	23 req/sec sustained
Deployment Speed	85 seconds (end-to-end)
User Capacity	200k-400k users/day
Uptime Target	99.95%
Cost	$7.73/day

⭐ Star this repo if you found it helpful!

Made with ❤️
