Load Balancing

Thomas Mangin edited this page Nov 13, 2025 · 6 revisions

Load Balancing with ExaBGP

Dynamic BGP-based traffic distribution without hardware load balancers

βš–οΈ BGP-based traffic distribution - automatic failover and equal-cost multi-path routing

⚠️ Important Limitation: BGP provides equal distribution (ECMP) or primary/backup (MED). It does NOT provide weighted/proportional distribution. For weighted load balancing, use Layer 7 load balancers (HAProxy/NGINX).


Overview

Load balancing with ExaBGP eliminates the need for expensive hardware load balancers by using BGP to distribute traffic across backend servers.

Traditional Load Balancing Problem

Hardware load balancer approach:

Internet β†’ [Hardware LB] β†’ Backend Servers
              (SPOF)
              (Expensive)
              (Vendor lock-in)

Issues:

  • Single point of failure
  • Expensive hardware ($10K-$100K+)
  • Vendor lock-in
  • Limited scalability
  • Manual configuration

ExaBGP Load Balancing Solution

BGP-based approach:

Internet β†’ [Network (ECMP)] β†’ Backend Servers
              (Distributed)       (ExaBGP announces routes)
              (No SPOF)           (Health-aware)

Benefits:

  • No single point of failure
  • Open source (zero licensing cost)
  • Vendor-neutral
  • Unlimited horizontal scaling
  • Application-aware (real-time metrics)
  • Dynamic (automatic failover)

Load Balancing Strategies

1. Equal Distribution (ECMP)

All servers announce same route with equal cost:

Server 1 β†’ announces 100.10.0.100/32 β†’ receives 33% traffic
Server 2 β†’ announces 100.10.0.100/32 β†’ receives 33% traffic
Server 3 β†’ announces 100.10.0.100/32 β†’ receives 34% traffic

Use case: Identical servers with equal capacity


2. Primary/Backup with MED

⚠️ Important: MED does NOT provide proportional distribution

MED (Multi-Exit Discriminator) affects BGP route selection but does NOT distribute traffic proportionally:

  • Lower MED = preferred path
  • If one route has lower MED, it receives ALL traffic (not "more" traffic)
  • ECMP (equal distribution) only works when routes have equal cost AFTER considering MED

MED is for primary/backup, not weighted load balancing.

MED for primary/backup failover:

Primary server    β†’ MED 100 β†’ receives ALL traffic (preferred)
Backup server     β†’ MED 200 β†’ receives NO traffic (unless primary fails)

Use case: Active/standby configuration with automatic failover
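The same primary/backup split can also be expressed statically in the ExaBGP configuration, without an API process. A sketch (addresses and ASNs are illustrative; the backup server would use `med 200` instead):

```
# Primary server - announces the service IP with the preferred (lower) MED
neighbor 192.168.1.1 {
    router-id 192.168.1.10;
    local-address 192.168.1.10;
    local-as 65001;
    peer-as 65000;

    static {
        route 100.10.0.100/32 next-hop self med 100;
    }
}
```

Static announcements are simpler but lose the health-awareness of the scripted approach: the route stays up as long as the BGP session does.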


3. Active-Passive Failover (MED-based)

One primary, one or more backups:

Server 1: MED 100 β†’ Active (receives all traffic)
Server 2: MED 200 β†’ Standby (receives no traffic unless Server 1 fails)
Server 3: MED 300 β†’ Standby (receives no traffic unless Server 1 & 2 fail)

Use case: Traditional active-passive HA with priority ordering


4. Multi-Service Distribution

Different metrics per service IP:

Server 1:
  - Service A (100.10.0.10) β†’ MED 100 (primary)
  - Service B (100.10.0.20) β†’ MED 150 (backup)

Server 2:
  - Service A (100.10.0.10) β†’ MED 150 (backup)
  - Service B (100.10.0.20) β†’ MED 100 (primary)

Result: Service A primarily on Server 1, Service B primarily on Server 2

Use case: Load distribution across multiple services


Architecture Patterns

Pattern 1: Direct ECMP

Simple, flat architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Internet / Clients              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚ Edge Router   β”‚ ← Receives routes from all servers
         β”‚ (ECMP enabled)β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Server 1β”‚ β”‚ Server 2β”‚ β”‚ Server 3β”‚
β”‚ ExaBGP  β”‚ β”‚ ExaBGP  β”‚ β”‚ ExaBGP  β”‚
β”‚ 100.10. β”‚ β”‚ 100.10. β”‚ β”‚ 100.10. β”‚
β”‚ 0.100   β”‚ β”‚ 0.100   β”‚ β”‚ 0.100   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Characteristics:

  • Direct BGP peering to edge router
  • ECMP distributes traffic equally
  • Per-flow load balancing (same source β†’ same server)
  • Simple configuration

Configuration:

# Each server runs identical ExaBGP config
neighbor 192.168.1.1 {
    router-id 192.168.1.10;
    local-address 192.168.1.10;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 unicast;
    }

    api {
        processes [ load-balancer ];
    }
}

process load-balancer {
    run /etc/exabgp/lb-health.py;
    encoder text;
}

Pattern 2: Route Server (Scalable)

For large deployments:

                  Internet
                     β”‚
                     β–Ό
             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
             β”‚ Edge Routers  β”‚
             β”‚ (100s-1000s)  β”‚
             β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
             β”‚ Route         β”‚ ← BIRD/FRRouting route reflectors
             β”‚ Reflectors    β”‚    Select best paths
             β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β–Ό           β–Ό           β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Server 1β”‚ β”‚ Server 2β”‚ β”‚ Server 3β”‚
    β”‚ ExaBGP  β”‚ β”‚ ExaBGP  β”‚ β”‚ ExaBGP  β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Benefits:

  • Scales to thousands of servers
  • Centralized policy enforcement
  • Reduced BGP session overhead
  • Clean separation of concerns
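As a sketch, a BIRD 2 route reflector for the server-facing sessions might look like the fragment below. This assumes an iBGP design in which the reflector and the ExaBGP servers share AS 65000 (the page's other examples use eBGP; route reflection is an iBGP mechanism), and the addresses are illustrative:

```
# bird.conf (BIRD 2.x) - illustrative route-reflector fragment
protocol bgp servers {
    local as 65000;
    neighbor range 192.168.1.0/24 as 65000;  # dynamic sessions from the ExaBGP servers
    rr client;
    ipv4 {
        import all;
        export all;
    };
}
```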

Pattern 3: Multi-Tier Load Balancing

Four-tier architecture (Vincent Bernat pattern):

Tier 0: DNS (Geographic distribution)
           β”‚
           β–Ό
Tier 1: BGP + ExaBGP + ECMP (L3 distribution)
           β”‚
           β–Ό
Tier 2: IPVS (L4 consistent hashing)
           β”‚
           β–Ό
Tier 3: HAProxy (L7 application routing)
           β”‚
           β–Ό
       Backend Servers

Each tier serves different purpose:

  1. Tier 0 (DNS): Geographic load balancing
  2. Tier 1 (ExaBGP + ECMP): Network-level distribution
  3. Tier 2 (IPVS): Consistent hashing L4 (minimizes connection disruption)
  4. Tier 3 (HAProxy): Application-level routing (host headers, paths, etc.)

ExaBGP's role: Announce load balancer IPs to enable ECMP distribution


MED-Based Distribution

What is MED?

MED (Multi-Exit Discriminator) is a BGP attribute that influences path selection.

  • Lower MED = preferred path (receives all the traffic)
  • Higher MED = less preferred (receives traffic only if the lower-MED route disappears)
  • MED is compared only among routes received from the same AS

Basic MED Example

Three servers in an explicit preference order:

#!/usr/bin/env python3
"""
MED-based primary/backup selection
Each server announces the same route with a different metric;
the lowest MED is preferred and receives all traffic
"""
import sys
import time

SERVICE_IP = "100.10.0.100"

# Server preference (lowest MED is preferred)
# High-end server:  MED 50  (primary)
# Mid-range server: MED 100 (first backup)
# Low-end server:   MED 150 (second backup)
SERVER_MED = 100  # Set per server

time.sleep(2)

while True:
    sys.stdout.write(
        f"announce route {SERVICE_IP}/32 next-hop self med {SERVER_MED}\n"
    )
    sys.stdout.flush()

    time.sleep(30)  # Refresh every 30 seconds

Result: The high-end server (MED 50) receives all traffic; the others take over, in MED order, only if it withdraws


Dynamic MED Based on Load

Adjust MED based on real-time CPU load:

#!/usr/bin/env python3
"""
Dynamic load-based path preference
Higher CPU usage -> higher MED -> less likely to be the preferred (all-traffic) path
"""
import sys
import time
import psutil

SERVICE_IP = "100.10.0.100"
BASE_MED = 100

def calculate_med():
    """Calculate MED based on current system load"""
    # Get CPU usage
    cpu_percent = psutil.cpu_percent(interval=1)

    # Get memory usage
    mem = psutil.virtual_memory()
    mem_percent = mem.percent

    # Get connection count
    connections = len(psutil.net_connections(kind='inet'))

    # Calculate load factor
    # CPU: 0-100% β†’ 0-100 points
    # Memory: 0-100% β†’ 0-50 points
    # Connections: 0-10000 β†’ 0-50 points
    load_factor = int(
        cpu_percent +
        (mem_percent * 0.5) +
        (min(connections, 10000) / 10000 * 50)
    )

    # MED = BASE_MED + load_factor
    # Low load: MED ~100
    # High load: MED ~300
    med = BASE_MED + load_factor

    return med

time.sleep(2)
sys.stderr.write("[LOAD-BALANCER] Dynamic load balancer started\n")

while True:
    med = calculate_med()

    sys.stdout.write(
        f"announce route {SERVICE_IP}/32 next-hop self med {med}\n"
    )
    sys.stdout.flush()

    sys.stderr.write(f"[LOAD] Announced with MED={med}\n")

    # Update every 30 seconds
    time.sleep(30)

How it works:

Server 1: 30% CPU β†’ MED 130 β†’ lowest metric β†’ receives all traffic βœ“
Server 2: 60% CPU β†’ MED 160 β†’ standby (takes over if Server 1's MED rises above 160)
Server 3: 90% CPU β†’ MED 190 β†’ standby

Result: Traffic follows the least-loaded server. This steers preference rather than splitting traffic proportionally; if server loads oscillate around each other, add hysteresis (see the dampening pattern under Best Practices) to avoid route churn.


Multi-Service Load Distribution

Distribute different services across servers:

#!/usr/bin/env python3
"""
Multi-service load distribution
Each server is primary for different service IPs
"""
import sys
import time

# Service IP configuration
# Each server has different primary service
SERVICES = [
    ("100.10.0.10", 100),  # Web service
    ("100.10.0.20", 150),  # API service
    ("100.10.0.30", 200),  # Database read replicas
]

# On Server 1: Web primary (100), API backup (150), DB backup (200)
# On Server 2: API primary (100), DB backup (150), Web backup (200)
# On Server 3: DB primary (100), Web backup (150), API backup (200)

time.sleep(2)

while True:
    for service_ip, med in SERVICES:
        sys.stdout.write(
            f"announce route {service_ip}/32 next-hop self med {med}\n"
        )

    sys.stdout.flush()
    time.sleep(30)

Result: Each service IP is served entirely by its lowest-MED primary, so services, not individual flows, are spread across the pool


ECMP Load Balancing

What is ECMP?

ECMP (Equal-Cost Multi-Path) allows routers to distribute traffic across multiple equal-cost paths.

How ECMP Works

1. Multiple servers announce same route:

Server 1 β†’ announce 100.10.0.100/32
Server 2 β†’ announce 100.10.0.100/32
Server 3 β†’ announce 100.10.0.100/32

2. Router sees 3 equal-cost paths:

Router RIB:
100.10.0.100/32 via 192.168.1.10 (Server 1)
                via 192.168.1.11 (Server 2)
                via 192.168.1.12 (Server 3)

3. Router distributes traffic:

Flow hashing (src IP, dst IP, src port, dst port, protocol)
β†’ Hash determines which path
β†’ Same flow always goes to same server (connection persistence)
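The flow hashing above can be sketched in a few lines of Python. This is a toy model of what the router does in hardware, not ExaBGP code; the addresses are illustrative:

```python
import hashlib

def pick_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Hash the 5-tuple and map it onto one of the equal-cost paths."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# The same flow always maps to the same path index,
# which is why a TCP connection stays on one server.
flow = ("203.0.113.5", "100.10.0.100", 50000, 80, "tcp")
assert pick_path(*flow, 3) == pick_path(*flow, 3)
```

Because the mapping depends only on the packet header, no per-connection state is needed on the router; the trade-off is that traffic is only balanced statistically, across many flows.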

Enable ECMP on Routers

Cisco IOS-XR:

router bgp 65000
 address-family ipv4 unicast
  maximum-paths ibgp 8
  maximum-paths ebgp 8
 !
!

Juniper Junos:

protocols {
    bgp {
        group servers {
            multipath;
        }
    }
}

Arista EOS:

router bgp 65000
   maximum-paths 8

ECMP Load Balancing Script

Simple health-check based announcement:

#!/usr/bin/env python3
"""
ECMP load balancing with health checks
All healthy servers announce same route
"""
import sys
import time
import socket

SERVICE_IP = "100.10.0.100"
SERVICE_PORT = 80
CHECK_INTERVAL = 5

def is_healthy():
    """Check if local service is healthy"""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(2)
        result = sock.connect_ex(('127.0.0.1', SERVICE_PORT))
        sock.close()
        return result == 0
    except OSError:
        return False

time.sleep(2)
announced = False

sys.stderr.write("[ECMP] Load balancer started\n")

while True:
    healthy = is_healthy()

    if healthy and not announced:
        # Service healthy, announce route
        sys.stdout.write(
            f"announce route {SERVICE_IP}/32 next-hop self\n"
        )
        sys.stdout.flush()
        sys.stderr.write(f"[ECMP] Service healthy, announcing route\n")
        announced = True

    elif not healthy and announced:
        # Service failed, withdraw route
        sys.stdout.write(
            f"withdraw route {SERVICE_IP}/32 next-hop self\n"
        )
        sys.stdout.flush()
        sys.stderr.write(f"[ECMP] Service failed, withdrawing route\n")
        announced = False

    time.sleep(CHECK_INTERVAL)

Proportional Load Distribution (NOT Possible with BGP Alone)

⚠️ Critical: BGP Cannot Do Weighted/Proportional Traffic Distribution

Reality:

  • BGP + ECMP provides equal distribution across announced routes (flow-based hashing)
  • There is NO way to make one server receive "twice as much traffic" as another via BGP
  • MED does NOT provide proportional distribution (it's for primary/backup selection)

For proportional/weighted load balancing, use:

  • Layer 7 Load Balancer (HAProxy, NGINX) with weighted backends
  • DNS-based weighted round-robin (limited, client-side caching issues)
  • Multi-tier architecture: ExaBGP β†’ L4 load balancers β†’ Layer 7 weighted distribution
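For comparison, weighted distribution at Layer 7 is a one-line `weight` parameter per backend in HAProxy. A sketch (names and addresses are illustrative):

```
# haproxy.cfg fragment: 'big' receives roughly 4x the connections of 'small'
backend app
    balance roundrobin
    server big    10.0.0.1:80 weight 4 check
    server medium 10.0.0.2:80 weight 2 check
    server small  10.0.0.3:80 weight 1 check
```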

Reality Check: Heterogeneous Server Pools

Problem:

Server 1: 64 GB RAM, 16 CPU cores (high-capacity)
Server 2: 32 GB RAM,  8 CPU cores (medium-capacity)
Server 3: 16 GB RAM,  4 CPU cores (low-capacity)

What you CANNOT do:

  • Make Server 1 receive 4x traffic of Server 3 via BGP
  • Proportionally distribute traffic based on capacity
  • Adjust traffic percentage dynamically

What you CAN do:

Option 1: ECMP with Load-Based Withdrawal

#!/usr/bin/env python3
"""
Binary health check - withdraw when overloaded
Each server announces same route, ECMP distributes equally
Overloaded servers withdraw to prevent failure
"""
import sys
import time
import psutil

SERVICE_IP = "100.10.0.100"

def is_overloaded():
    cpu = psutil.cpu_percent(interval=1)
    # Low-capacity server: withdraw at 80% CPU
    # High-capacity server: withdraw at 95% CPU
    threshold = 80  # Adjust per server capacity
    return cpu > threshold

announced = False
time.sleep(2)

while True:
    if is_overloaded():
        if announced:
            sys.stdout.write(f"withdraw route {SERVICE_IP}/32\n")
            sys.stdout.flush()
            announced = False
    else:
        if not announced:
            sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
            sys.stdout.flush()
            announced = True

    time.sleep(5)

Result: Equal distribution, but overloaded servers drop out

Option 2: Multiple Service IPs

Announce 3 different service IPs, assign to servers based on capacity:
- 100.10.0.10 β†’ All 3 servers announce (ECMP: equal split)
- 100.10.0.11 β†’ Only Server 1 announces (100% to Server 1)
- 100.10.0.12 β†’ Only Server 1 announces (100% to Server 1)

Client-side uses all 3 IPs (e.g., DNS returns all 3)
If traffic splits evenly across the IPs, Server 1 gets roughly 7/9 (~78%): all of .11 and .12 plus a third of .10. Servers 2 and 3 get ~11% each.
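Assuming clients spread requests evenly across the three service IPs (an assumption, since DNS-based spreading is imperfect), the resulting shares can be checked with exact fractions:

```python
from fractions import Fraction

# Each of the three service IPs carries 1/3 of total traffic.
per_ip = Fraction(1, 3)

# .10 is ECMP-split three ways; .11 and .12 go only to Server 1.
server1 = per_ip / 3 + per_ip + per_ip  # a third of .10, plus all of .11 and .12
server2 = per_ip / 3                    # a third of .10 only
server3 = per_ip / 3

print(server1, server2, server3)  # 7/9 1/9 1/9
```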

Option 3: Multi-Tier with Layer 7

ExaBGP (BGP layer)
    ↓ ECMP (equal distribution)
HAProxy/NGINX Tier (multiple instances)
    ↓ Weighted backends (2:1:1 ratio)
Backend Servers (heterogeneous capacity)

This is the correct architecture for proportional distribution.


Multi-Tier Load Balancing

Vincent Bernat's Four-Tier Architecture

Production pattern for hyperscale deployments:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Tier 0: DNS (GeoDNS)                 β”‚ ← Geographic distribution
β”‚ Returns nearest datacenter IP        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Tier 1: BGP + ExaBGP + ECMP          β”‚ ← Network-level distribution
β”‚ Edge routers use ECMP                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Tier 2: IPVS (L4 LB)                 β”‚ ← Consistent hashing
β”‚ Maglev scheduling minimizes          β”‚    (connection persistence)
β”‚ connection disruption                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Tier 3: HAProxy (L7 LB)              β”‚ ← Application routing
β”‚ Host headers, URL paths, SSL term    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
               β–Ό
         Backend Servers

ExaBGP Configuration for Multi-Tier

Tier 1 Load Balancer Config:

# /etc/exabgp/multi-tier-lb.conf
neighbor 192.168.1.1 {
    router-id 192.168.1.10;
    local-address 192.168.1.10;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 unicast;
        ipv6 unicast;
    }

    api {
        processes [ tier1-announcer ];
    }
}

process tier1-announcer {
    run /etc/exabgp/tier1-announcer.py;
    encoder text;
}

Health Check Script:

#!/usr/bin/env python3
"""
Multi-tier load balancer health check
Announces loopback IPs when IPVS/HAProxy are ready
"""
import sys
import os
import time
import netifaces

READY_FILE = '/etc/lb/v6-ready'
DISABLE_FILE = '/etc/lb/disable'
LOOPBACK_INTERFACE = 'lo'
CHECK_INTERVAL = 5

def get_loopback_ips():
    """Get all IPs configured on loopback interface"""
    addrs = netifaces.ifaddresses(LOOPBACK_INTERFACE)
    ips = []

    # IPv4 addresses
    if netifaces.AF_INET in addrs:
        ips.extend([a['addr'] for a in addrs[netifaces.AF_INET]])

    # IPv6 addresses
    if netifaces.AF_INET6 in addrs:
        ips.extend([a['addr'].split('%')[0] for a in addrs[netifaces.AF_INET6]])

    return [ip for ip in ips if not ip.startswith('127.') and ip != '::1']

def is_service_ready():
    """Check if service should announce routes"""
    return os.path.exists(READY_FILE) and not os.path.exists(DISABLE_FILE)

def check_ipvs_healthy():
    """Check IPVS is running and has at least one destination configured"""
    try:
        import subprocess
        result = subprocess.run(
            ['ipvsadm', '-L', '-n'],
            capture_output=True,
            timeout=2
        )
        # Destination lines in 'ipvsadm -L -n' output start with '->'
        return result.returncode == 0 and b'->' in result.stdout
    except Exception:
        return False

def check_haproxy_healthy():
    """Check HAProxy has healthy backends"""
    try:
        import socket
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect('/var/run/haproxy.sock')
        s.send(b'show stat\n')
        stats = s.recv(8192).decode()
        s.close()

        # Check for UP backends
        return 'UP' in stats
    except Exception:
        return False

# Get service IPs to announce
service_ips = get_loopback_ips()

time.sleep(2)
sys.stderr.write(f"[TIER1] Multi-tier LB started, monitoring {len(service_ips)} IPs\n")

announced = False

while True:
    # All checks must pass
    ready = (
        is_service_ready() and
        check_ipvs_healthy() and
        check_haproxy_healthy()
    )

    if ready and not announced:
        # Announce all service IPs
        for ip in service_ips:
            if ':' in ip:
                # IPv6
                sys.stdout.write(f'announce route {ip}/128 next-hop self\n')
            else:
                # IPv4
                sys.stdout.write(f'announce route {ip}/32 next-hop self\n')
        sys.stdout.flush()
        sys.stderr.write(f"[TIER1] Services healthy, announced {len(service_ips)} routes\n")
        announced = True

    elif not ready and announced:
        # Withdraw all service IPs
        for ip in service_ips:
            if ':' in ip:
                sys.stdout.write(f'withdraw route {ip}/128\n')
            else:
                sys.stdout.write(f'withdraw route {ip}/32\n')
        sys.stdout.flush()
        sys.stderr.write(f"[TIER1] Services unhealthy, withdrew routes\n")
        announced = False

    time.sleep(CHECK_INTERVAL)

Maintenance workflow:

# Enter maintenance mode (gradual traffic drain)
touch /etc/lb/disable
# Wait for connections to drain (~60 seconds)
sleep 60

# Perform maintenance
systemctl restart ipvsadm
systemctl restart haproxy

# Exit maintenance mode
rm /etc/lb/disable

Health Check Integration

Comprehensive Health Checks

Check all dependencies before announcing:

#!/usr/bin/env python3
"""
Comprehensive health checking for load balancing
Checks web server, database, cache, disk, memory
"""
import sys
import time
import socket
import urllib.request
import psycopg2
import redis

SERVICE_IP = "100.10.0.100"
CHECK_INTERVAL = 5

def check_web_server():
    """Check web server responds"""
    try:
        response = urllib.request.urlopen('http://127.0.0.1/health', timeout=2)
        return response.getcode() == 200
    except Exception:
        return False

def check_database():
    """Check database is accessible"""
    try:
        conn = psycopg2.connect(
            host='127.0.0.1',
            database='mydb',
            user='monitor',
            password='secret',
            connect_timeout=2
        )
        cursor = conn.cursor()
        cursor.execute('SELECT 1')
        result = cursor.fetchone()
        conn.close()
        return result[0] == 1
    except Exception:
        return False

def check_redis():
    """Check Redis is accessible"""
    try:
        r = redis.Redis(host='127.0.0.1', port=6379, socket_timeout=2)
        return r.ping()
    except Exception:
        return False

def check_system_resources():
    """Check disk and memory"""
    import shutil
    import psutil

    # Check disk space (at least 10% free)
    stat = shutil.disk_usage('/')
    free_percent = (stat.free / stat.total) * 100
    if free_percent < 10:
        return False

    # Check memory (at least 1 GB free)
    mem = psutil.virtual_memory()
    if mem.available < 1024 * 1024 * 1024:
        return False

    return True

def comprehensive_health_check():
    """Run all health checks"""
    checks = {
        'web': check_web_server(),
        'database': check_database(),
        'redis': check_redis(),
        'resources': check_system_resources(),
    }

    # Log individual check results
    for name, result in checks.items():
        status = "OK" if result else "FAIL"
        sys.stderr.write(f"[HEALTH] {name}: {status}\n")

    # All checks must pass
    return all(checks.values())

time.sleep(2)
announced = False

while True:
    healthy = comprehensive_health_check()

    if healthy and not announced:
        sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        sys.stderr.write("[HEALTH] All checks passed, announcing route\n")
        announced = True

    elif not healthy and announced:
        sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        sys.stderr.write("[HEALTH] Health checks failed, withdrawing route\n")
        announced = False

    time.sleep(CHECK_INTERVAL)

Implementation Examples

Complete ECMP Setup

Step 1: Configure loopback IP on all servers

# Add service IP to loopback
ip addr add 100.10.0.100/32 dev lo

Step 2: ExaBGP configuration

# /etc/exabgp/lb.conf
neighbor 192.168.1.1 {
    router-id 192.168.1.10;
    local-address 192.168.1.10;
    local-as 65001;
    peer-as 65000;

    family {
        ipv4 unicast;
    }

    api {
        processes [ lb-health ];
    }
}

process lb-health {
    run /etc/exabgp/lb-health.py;
    encoder text;
}

Step 3: Health check script

#!/usr/bin/env python3
import sys
import time
import socket

SERVICE_IP = "100.10.0.100"

def is_healthy():
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(2)
        result = sock.connect_ex(('127.0.0.1', 80))
        sock.close()
        return result == 0
    except OSError:
        return False

time.sleep(2)
announced = False

while True:
    if is_healthy() and not announced:
        sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = True
    elif not is_healthy() and announced:
        sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = False

    time.sleep(5)

Step 4: Start ExaBGP

exabgp /etc/exabgp/lb.conf

Step 5: Enable ECMP on router

router bgp 65000
 maximum-paths 8

Step 6: Verify

# Check routes on router
show ip bgp 100.10.0.100

# Should see multiple paths

Best Practices

1. Configure Service IPs on Loopback

Always use loopback interface:

# Correct
ip addr add 100.10.0.100/32 dev lo

# Wrong (don't use physical interface)
# ip addr add 100.10.0.100/24 dev eth0

Why: Loopback IPs don't fail when interface goes down


2. Enable ECMP on All Routers

# Cisco
router bgp 65000
 maximum-paths ibgp 8
 maximum-paths ebgp 8

# Juniper
set protocols bgp group servers multipath

# Arista
router bgp 65000
   maximum-paths 8

3. Implement Health Check Dampening

Prevent route flapping:

# Assumes time, check_health(), announce_route() and withdraw_route()
# from the surrounding script
RISE_THRESHOLD = 3  # consecutive successes required to announce
FALL_THRESHOLD = 2  # consecutive failures required to withdraw

rise_count = 0
fall_count = 0
announced = False

while True:
    healthy = check_health()

    if healthy:
        rise_count += 1
        fall_count = 0
        if rise_count >= RISE_THRESHOLD and not announced:
            announce_route()
            announced = True
    else:
        fall_count += 1
        rise_count = 0
        if fall_count >= FALL_THRESHOLD and announced:
            withdraw_route()
            announced = False

    time.sleep(5)

4. Monitor BGP Sessions

import subprocess

def check_bgp_session():
    """Verify BGP session is established"""
    result = subprocess.run(
        ['exabgpcli', 'show', 'neighbor', 'summary'],
        capture_output=True
    )
    return b'Established' in result.stdout

5. Log All Route Changes

import logging

logging.basicConfig(
    filename='/var/log/exabgp-lb.log',
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s'
)

def announce_route():
    sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
    sys.stdout.flush()
    logging.info(f"ANNOUNCE: {SERVICE_IP}")

Monitoring and Metrics

Key Metrics

1. Load Distribution:

  • Requests per server
  • Bandwidth per server
  • Connection count per server

2. Health Checks:

  • Success rate
  • Latency
  • Consecutive failures

3. BGP State:

  • Session status
  • Routes announced
  • Routes withdrawn

4. System Metrics:

  • CPU usage
  • Memory usage
  • Network throughput

Prometheus Integration

#!/usr/bin/env python3
from prometheus_client import start_http_server, Gauge, Counter
import time

# Metrics
route_announced = Gauge('lb_route_announced', 'Route announcement status')
health_check_status = Gauge('lb_health_check_status', 'Health check result')
health_checks_total = Counter('lb_health_checks_total', 'Health checks', ['result'])
route_changes_total = Counter('lb_route_changes_total', 'Route changes', ['action'])

# Start metrics server
start_http_server(9100)

announced = False

while True:
    # check_health/announce_route/withdraw_route as defined in the earlier scripts
    healthy = check_health()

    health_check_status.set(1 if healthy else 0)
    health_checks_total.labels(result='success' if healthy else 'failure').inc()

    if healthy and not announced:
        announce_route()
        route_announced.set(1)
        route_changes_total.labels(action='announce').inc()
        announced = True

    elif not healthy and announced:
        withdraw_route()
        route_announced.set(0)
        route_changes_total.labels(action='withdraw').inc()
        announced = False

    time.sleep(5)

Troubleshooting

Issue 1: Uneven Traffic Distribution

Symptoms: One server receives all traffic despite ECMP

Check:

# Verify ECMP enabled
show ip bgp 100.10.0.100
# Should show "multipath" or multiple paths

# Check routing table
show ip route 100.10.0.100
# Should show multiple next-hops

Solutions:

# Enable ECMP
router bgp 65000
 maximum-paths 8

# Verify BGP best path selection
show ip bgp 100.10.0.100 bestpath

Issue 2: Route Flapping

Symptoms: Routes repeatedly announced/withdrawn

Diagnosis:

# Monitor BGP updates
show ip bgp neighbors 192.168.1.10 | include Last

Solutions:

  • Implement rise/fall thresholds
  • Increase health check interval
  • Add retry logic
  • Fix unstable service

Issue 3: Slow Failover

Symptoms: Traffic continues to failed server

Check:

# Check BGP timers
show ip bgp neighbors 192.168.1.10

# Check health check frequency
tail -f /var/log/exabgp.log

Solutions:

  • Reduce health check interval (5s recommended)
  • Tune BGP timers (keepalive 10s, hold 30s)
  • Enable BFD for fast failure detection
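For the timer tuning, ExaBGP supports a per-neighbor hold time; the keepalive interval is derived from it. A sketch matching the values above:

```
neighbor 192.168.1.1 {
    # ... existing session settings ...
    hold-time 30;   # keepalives are sent at one third of the hold time (~10s)
}
```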

Next Steps


Ready to implement load balancing? See Quick Start β†’


πŸ‘» Ghost written by Claude (Anthropic AI)
