AttaKenn/jenkins-auto-scaling-program

Jenkins Auto-Scaling Script Documentation

Overview

This script automates scaling Jenkins build agents on AWS EC2 based on build queue length, wait times, and ongoing builds.

It is managed by a Linux cron service. The cron job is scheduled to run every minute on the Jenkins Controller instance.

Update: Slack notifications are now limited to critical system failures and rate limited to one message per hour per error type to prevent alert fatigue.
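The one-message-per-hour-per-error-type rule can be sketched roughly as follows. This is an illustration only, not the script's actual code: the class name `AlertRateLimiter`, the `should_send` method, and the error-type strings are hypothetical.

```python
import time

RATE_LIMIT_SECONDS = 3600  # one message per hour per error type


class AlertRateLimiter:
    """Tracks when each error type last produced a notification."""

    def __init__(self, window_seconds=RATE_LIMIT_SECONDS, clock=time.time):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be exercised without waiting
        self.last_sent = {}  # error type -> timestamp of the last alert

    def should_send(self, error_type):
        """Return True and record the send if no alert of this type went out within the window."""
        now = self.clock()
        previous = self.last_sent.get(error_type)
        if previous is not None and now - previous < self.window_seconds:
            return False
        self.last_sent[error_type] = now
        return True
```

Because the cron job starts a fresh process every minute, a real implementation would need to persist the timestamps between runs (for example in a small state file) rather than keep them in memory as this sketch does.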

Features

1. Queue-based Scaling

  • Monitors Jenkins build queue length
  • Starts agents as the queue grows:
    • agent 1 when queue length > 0
    • agent 2 when queue length > 3
    • agent 3 when queue length > 6
    • agent 4 when queue length > 9
    • agent 5 when queue length > 12
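The thresholds above amount to a simple lookup. A minimal sketch, assuming the function name `agents_for_queue_length` (hypothetical, not taken from the script):

```python
# Agent N starts when the queue length exceeds QUEUE_THRESHOLDS[N - 1].
QUEUE_THRESHOLDS = [0, 3, 6, 9, 12]


def agents_for_queue_length(queue_length):
    """Return how many agents (0 to 5) the queue length calls for."""
    return sum(1 for threshold in QUEUE_THRESHOLDS if queue_length > threshold)
```

For example, `agents_for_queue_length(4)` gives 2, because 4 exceeds the first two thresholds (0 and 3) but not the third (6).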

2. Wait Time-based Scaling

  • Monitors how long jobs have been waiting and starts:
    • agent 1 when there's a build in the queue
    • agent 2 when wait time > 2 mins
    • agent 3 when wait time > 5 mins
    • agent 4 when wait time > 8 mins
    • agent 5 when wait time > 11 mins
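The wait-time rule can be sketched the same way. The function name is hypothetical, and the sketch assumes "wait time" means the wait of the longest-waiting build, which the list above does not state explicitly.

```python
# Wait-time thresholds in minutes for agents 2 to 5.
# Agent 1 only needs a build to be present in the queue.
WAIT_THRESHOLDS_MINUTES = [2, 5, 8, 11]


def agents_for_wait_time(queue_length, longest_wait_minutes):
    """Return how many agents (0 to 5) the current wait time calls for."""
    if queue_length <= 0:
        return 0  # nothing is waiting, so the wait-time rule asks for no agents
    extra = sum(1 for t in WAIT_THRESHOLDS_MINUTES if longest_wait_minutes > t)
    return 1 + extra
```

One plausible way to combine this with queue-based scaling is to take the larger of the two counts, so that either signal alone can trigger a scale-up.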

3. Intelligent Agent Management

  • Monitors 5 EC2 instances as Jenkins agents
  • Tracks agent states:
    • EC2 instance status (running/stopped)
    • Jenkins connection status (connected/disconnected)
    • Build activity (idle/busy)
  • Automatically enables agents if they're temporarily offline
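The three tracked states map naturally onto a small record plus a decision function. The sketch below is illustrative: `AgentState`, `next_action`, and the action names are hypothetical, and the ordering of the checks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    name: str
    ec2_running: bool          # EC2 instance status: running (True) or stopped (False)
    jenkins_connected: bool    # Jenkins connection status
    busy: bool                 # build activity: busy (True) or idle (False)
    temporarily_offline: bool  # marked offline in Jenkins while the instance is up


def next_action(agent: AgentState, wanted: bool) -> str:
    """Pick one hypothetical action name for this agent on the current run.

    `wanted` says whether the scaling rules currently call for this agent.
    """
    if wanted:
        if not agent.ec2_running:
            return "start_instance"
        if agent.temporarily_offline:
            return "enable_agent"
        if not agent.jenkins_connected:
            return "wait_for_connection"
        return "none"
    if agent.ec2_running:
        return "begin_shutdown"
    return "none"
```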

4. Graceful Shutdown Process

  • Checks for running builds before shutdown
  • Two shutdown modes:
    • Immediate: If no active builds
    • Graceful: Waits for builds to complete
  • Forces shutdown after MAX_GRACEFUL_SHUTDOWN_MINUTES (60 minutes)
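The choice between the shutdown modes can be sketched as a single decision function. The function name and return values are hypothetical, and treating exactly 60 minutes as the point at which a forced shutdown happens is an assumption.

```python
MAX_GRACEFUL_SHUTDOWN_MINUTES = 60


def shutdown_decision(active_builds, minutes_pending):
    """Decide what to do with an agent that has been selected for shutdown.

    active_builds:   number of builds still running on the agent
    minutes_pending: how long the agent has been waiting to shut down
    """
    if active_builds == 0:
        return "immediate"  # nothing running, stop the instance now
    if minutes_pending >= MAX_GRACEFUL_SHUTDOWN_MINUTES:
        return "force"      # builds did not finish within the limit
    return "wait"           # graceful mode: check again on the next run
```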

5. Cooldown Period

  • Implements COOLDOWN_MINUTES (3 minutes) between scaling actions
  • Prevents rapid start/stop cycles
  • Still allows shutdown checks during cooldown
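A rough sketch of how a single run might honor the cooldown while still performing shutdown checks. The function names and step names are hypothetical; timestamps are assumed to be epoch seconds.

```python
COOLDOWN_MINUTES = 3


def in_cooldown(last_action_epoch, now_epoch):
    """True while fewer than COOLDOWN_MINUTES have passed since the last scaling action."""
    return (now_epoch - last_action_epoch) < COOLDOWN_MINUTES * 60


def plan_run(last_action_epoch, now_epoch):
    """Return the steps a run would perform, in order."""
    steps = ["check_pending_shutdowns"]  # always allowed, even during cooldown
    if last_action_epoch is None or not in_cooldown(last_action_epoch, now_epoch):
        steps.append("evaluate_scaling")  # start/stop decisions only outside cooldown
    return steps
```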

Common Scenarios

Scenario 1: High Build Load

Condition:
- Queue length > 6 builds OR
- Wait time > 5 minutes
Action:
- Starts agents 1 to 3 if stopped
- Enables agents if temporarily offline

Scenario 2: Moderate Build Load

Condition:
- Queue length > 0 but ≤ 3 AND
- Wait time ≤ 2 minutes
Action:
- Starts one agent (if stopped)
- Keeps the remaining agents stopped unless needed

Scenario 3: Low/No Build Load

Condition:
- Queue length = 0 AND
- No ongoing builds
Action:
- Initiates shutdown process for idle running agents
- Checks for running builds before shutdown

Scenario 4: Agent with Running Builds

Condition:
- Scale down triggered but builds running
Action:
- Marks agent as offline (no new builds) if there are no builds in the queue
- Creates pending shutdown record
- Monitors build completion
- Forces shutdown after 60 minutes if builds don't complete and sends a Slack notification

Scenario 5: Agent Connection Issues

Condition:
- Agent temporarily offline after startup
Action:
- Detects offline status
- Attempts to enable agent

Monitoring and Notifications

Slack Notifications

  • Critical System Error conditions
  • Force Shutdowns

When a new build joins the queue:

[Screenshot: New build in Queue]

When the build queue is empty:

[Screenshot: No builds]

Logging

Detailed logs are written to /var/log/jenkins-autoscale.log and cover:

  • Queue length
  • Build status
  • Agent states
  • Error conditions

Note: Fluent Bit will later be configured to ship logs to our Elasticsearch database.

Error Handling

  • Jenkins API connection failures
  • AWS API errors
  • Agent connection issues
  • Invalid JSON responses
  • Failed scaling actions
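As an illustration of guarding against invalid JSON responses, the sketch below parses a queue payload defensively. It assumes the payload has the shape returned by the Jenkins queue endpoint, an `items` array whose entries carry `inQueueSince` in epoch milliseconds. The function name `summarize_queue` is hypothetical.

```python
import json


def summarize_queue(raw_json, now_ms):
    """Return (queue_length, longest_wait_minutes), or None if the payload is unusable."""
    try:
        payload = json.loads(raw_json)
        items = payload["items"]
        waits = [now_ms - item["inQueueSince"] for item in items]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, or unexpected types: report failure
        # so the caller can log the error and skip scaling on this run.
        return None
    longest_ms = max(waits, default=0)
    return len(items), max(longest_ms, 0) / 60000
```

Returning `None` instead of raising lets the caller treat a bad response as "no decision this run" rather than scaling on partial data.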

Dependencies

  • AWS CLI configured with the required permissions
  • jq for JSON parsing
  • curl for API requests
  • Jenkins API access
  • Slack webhook for notifications
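A quick preflight check for the command-line dependencies might look like the sketch below. The function name `missing_tools` is hypothetical, and the executable names `aws`, `jq`, and `curl` are assumed to be how the tools appear on the PATH.

```python
import shutil

REQUIRED_TOOLS = ["aws", "jq", "curl"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required executables that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` on the Jenkins Controller before enabling the cron job returns an empty list when all three tools are installed. It does not verify AWS permissions, Jenkins API access, or the Slack webhook.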

About

Intelligent Program for Autoscaling Jenkins Agents (EC2 instances)
