Flight Control uses a Redis-compatible key-value store for two primary purposes: caching external configuration data and managing an event-driven task queue. This document describes both use cases and the resilience mechanisms that ensure system reliability.
Flight Control uses different KV store implementations based on the platform:
- RHEL9:
  - Implementation: Redis 7
  - CLI Tools: redis-cli
  - Binary: redis-server
  - Configuration Path: /etc/redis/redis.conf
- RHEL10:
  - Implementation: Valkey 8 (Redis-compatible)
  - CLI Tools: valkey-cli
  - Binary: valkey-server
  - Configuration Path: /etc/valkey/valkey.conf
All implementations maintain full API and protocol compatibility.
The key-value store serves as:
- Cache Layer: Stores external configuration data (Git repositories, HTTP endpoints, Kubernetes secrets)
- Event Queue: Manages asynchronous task processing through Redis Streams
- Resilience Backend: Provides automatic recovery from failures
Flight Control caches external configuration sources to improve performance and reduce load on external systems. The cache is organized by organization, fleet, and template version to ensure proper isolation.
| Data Type | Key Pattern | Description |
|---|---|---|
| Git Repository URLs | v1/{orgId}/{fleet}/{templateVersion}/repo-url/{repository} | Repository URL mappings |
| Git Revisions | v1/{orgId}/{fleet}/{templateVersion}/git-hash/{repository}/{targetRevision} | Git commit hashes for specific revisions |
| Git File Contents | v1/{orgId}/{fleet}/{templateVersion}/git-data/{repository}/{targetRevision}/{path} | Actual file contents from Git repositories |
| Kubernetes Secrets | v1/{orgId}/{fleet}/{templateVersion}/k8ssecret-data/{namespace}/{name} | Secret data from Kubernetes clusters |
| HTTP Response Data | v1/{orgId}/{fleet}/{templateVersion}/http-data/{md5(url)} | Content fetched from HTTP endpoints |
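The key patterns in the table can be illustrated with a small sketch. The helper names `cache_key` and `http_cache_key` and the sample identifiers are hypothetical, and the sketch assumes that `md5(url)` means the lowercase hex digest of the UTF-8 encoded URL, which the table does not specify.

```python
import hashlib


def cache_key(org_id: str, fleet: str, template_version: str, kind: str, *parts: str) -> str:
    """Join the scope (org, fleet, template version), the data kind, and the
    source-specific parts into a v1/... key shaped like the table rows."""
    return "/".join(["v1", org_id, fleet, template_version, kind, *parts])


def http_cache_key(org_id: str, fleet: str, template_version: str, url: str) -> str:
    # Assumption: md5(url) is the hex digest of the UTF-8 encoded URL.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return cache_key(org_id, fleet, template_version, "http-data", digest)


# Hypothetical identifiers, for illustration only.
git_key = cache_key("org-1", "edge-fleet", "tv-42", "git-hash", "app-config", "main")
http_key = http_cache_key("org-1", "edge-fleet", "tv-42", "https://example.com/config.json")

print(git_key)   # v1/org-1/edge-fleet/tv-42/git-hash/app-config/main
print(http_key)
```

Because every key starts with the same `v1/{orgId}/{fleet}/{templateVersion}/` prefix, all entries for one template version share a common scope.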
- Cache Keys: Automatically scoped by organization, fleet, and template version
- Cache Invalidation: Keys are deleted when template versions change
- Cache Miss Handling: External sources are fetched on-demand when cache misses occur
- Atomic Operations: Uses a custom Lua script to implement get-or-set-if-not-exists behavior, preventing race conditions during concurrent access
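If external content changes while its branch name, tag, or URL stays the same, devices can end up with different values:
- Before cache deletion: Some devices get value1 from cache
- After cache deletion: Other devices get value2 from fresh fetch
- Result: Inconsistent device configurations across the fleet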
Best Practice: Always update branch names, tags, or URLs when changing external configuration content to ensure cache consistency.
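The get-or-set-if-not-exists behavior and the inconsistency it can expose may be easier to follow with a small in-memory simulation. This is a sketch under stated assumptions, not Flight Control's Lua script: `InMemoryKV` is a hypothetical stand-in for the KV store, and its lock plays the role of the atomicity that a server-side script would provide.

```python
import threading


class InMemoryKV:
    """Hypothetical stand-in for the KV store. The lock makes the
    check-then-write in get_or_set a single atomic step."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get_or_set(self, key, value):
        # Return the stored value if the key exists; otherwise store
        # `value` and return it.
        with self._lock:
            return self._data.setdefault(key, value)

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)


kv = InMemoryKV()
key = "v1/org-1/edge-fleet/tv-42/http-data/abc123"  # hypothetical key

# The first fetch caches value1.
first = kv.get_or_set(key, "value1")
# The source content changes to value2, but the URL (and so the key) does not:
# the cached value1 is still returned.
second = kv.get_or_set(key, "value2")
# Once the cached entry is deleted, the next fetch stores the new content.
kv.delete(key)
third = kv.get_or_set(key, "value2")

print(first, second, third)  # value1 value1 value2
```

Devices rendered before the deletion see value1 and devices rendered after it see value2, which is the inconsistency that changing the branch name, tag, or URL avoids.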
Flight Control uses Redis Streams with consumer groups to process events asynchronously. Events are published to the task-queue stream and processed by worker components.
flowchart TD
A[API Event Created] --> B[Event Published to task-queue]
B --> C[Worker Consumes Event]
C --> D{Event Type Analysis}
D --> E[Fleet Rollout Task]
D --> F[Fleet Selector Matching Task]
D --> G[Fleet Validation Task]
D --> H[Device Render Task]
D --> I[Repository Update Task]
E --> J[Task Completion]
F --> J
G --> J
H --> J
I --> J
J --> K[Event Acknowledged]
K --> L[Checkpoint Advanced]
| Task | Triggering Events | Description |
|---|---|---|
| Fleet Rollout | • Device owner/labels updated • Device created • Fleet rollout batch dispatched • Fleet rollout started (immediate strategy) | Manages device configuration updates according to fleet templates |
| Fleet Selector Matching | • Fleet label selector updated • Fleet created/deleted • Device created • Device labels updated | Matches devices to fleets based on label selectors |
| Fleet Validation | • Fleet template updated • Fleet created • Referenced repository updated | Validates fleet templates and creates template versions |
| Device Render | • Device spec updated • Device created • Fleet rollout device selected • Referenced repository updated | Renders device configurations from templates |
| Repository Update | • Repository spec updated • Repository created | Updates repository references and invalidates related caches |
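The table above can be read as a mapping from event types to the tasks they trigger. The sketch below encodes that mapping; the task names follow the table, while the event identifiers (such as `device_created`) and the `tasks_for` helper are hypothetical names chosen for illustration, not Flight Control's internal event types.

```python
# Task -> triggering events, transcribed from the table.
# Event identifiers are hypothetical.
TRIGGERS = {
    "fleet_rollout": {
        "device_owner_updated", "device_labels_updated", "device_created",
        "fleet_rollout_batch_dispatched", "fleet_rollout_started",
    },
    "fleet_selector_matching": {
        "fleet_label_selector_updated", "fleet_created", "fleet_deleted",
        "device_created", "device_labels_updated",
    },
    "fleet_validation": {
        "fleet_template_updated", "fleet_created", "referenced_repository_updated",
    },
    "device_render": {
        "device_spec_updated", "device_created",
        "fleet_rollout_device_selected", "referenced_repository_updated",
    },
    "repository_update": {"repository_spec_updated", "repository_created"},
}


def tasks_for(event_type: str) -> list:
    """Return every task whose trigger set contains the event type,
    in the order the tasks are listed in TRIGGERS."""
    return [task for task, events in TRIGGERS.items() if event_type in events]


print(tasks_for("device_created"))
# ['fleet_rollout', 'fleet_selector_matching', 'device_render']
print(tasks_for("repository_created"))
# ['repository_update']
```

A single event can fan out to several tasks, as the device-created case shows.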
- Consumer Groups: Automatic message tracking and load balancing
- Message Acknowledgment: Messages are acknowledged after successful processing
- Timeout Handling: Messages that exceed processing timeout are automatically retried
- Failed Message Handling: Failed messages are retried with exponential backoff up to a maximum number of retries, after which an event is emitted to report the permanent failure
- Checkpoint Tracking: Global checkpoint ensures no message loss during failures
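The retry behavior in the list above can be sketched as follows. The base delay, growth factor, delay cap, and retry limit are assumptions picked for illustration (the document does not state the actual values), and the sketch records the delays instead of sleeping so it runs instantly.

```python
def retry_delay(attempt: int, base_seconds: float = 1.0,
                factor: float = 2.0, cap_seconds: float = 60.0) -> float:
    """Delay before retry number `attempt` (1-based), doubling each time
    and never exceeding the cap. All defaults are assumed values."""
    return min(base_seconds * factor ** (attempt - 1), cap_seconds)


def process_with_retries(handler, message, max_retries: int = 5):
    """Try the handler once, then retry up to `max_retries` times.

    Returns ("acked", delays) on success, or ("permanent_failure", delays)
    once the retries are exhausted. `delays` lists the backoff that would
    be waited before each retry.
    """
    delays = []
    for attempt in range(max_retries + 1):  # first try + max_retries retries
        try:
            handler(message)
            return "acked", delays
        except Exception:
            if attempt == max_retries:
                break  # no retries left
            delays.append(retry_delay(attempt + 1))
    return "permanent_failure", delays


def always_fails(message):
    raise RuntimeError("simulated failure")


status, delays = process_with_retries(always_fails, {"id": "1-0"})
print(status, delays)  # permanent_failure [1.0, 2.0, 4.0, 8.0, 16.0]
```

In a real worker, the "acked" outcome would correspond to acknowledging the message, and the "permanent_failure" outcome to emitting the failure event.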
Flight Control implements a dual-persistence architecture to ensure no event loss during Redis failures (at-least-once delivery; duplicate processing possible):
- Redis Streams: Primary queue for fast event processing
- PostgreSQL Database: Persistent storage for events and checkpoints
- Recovery Mechanism: Automatic event republishing from database
When Redis fails or is restarted:
- Checkpoint Detection: System detects missing Redis checkpoint
- Database Checkpoint Retrieval: Last known checkpoint is retrieved from PostgreSQL
- Event Republishing: All events since the last checkpoint are republished to Redis
- Queue Restoration: Fresh Redis instance receives all missed events
- Normal Operation: Processing resumes; events since the checkpoint may be reprocessed. Handlers must be idempotent.
Note: The replay window equals “now - last persisted checkpoint”. Increase checkpoint persistence frequency to shorten replay/duplication.
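The recovery steps and the idempotency requirement can be illustrated with an in-memory simulation. Everything here is a simplified sketch: `Database`, `Event`, `recover`, and `IdempotentHandler` are hypothetical names, the integer `seq` stands in for whatever ordering the real checkpoint uses, and a production handler would need durable deduplication state rather than an in-process set.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    seq: int        # hypothetical monotonically increasing event number
    payload: str


@dataclass
class Database:
    """Stand-in for PostgreSQL: durable events plus the last persisted checkpoint."""
    events: list = field(default_factory=list)
    checkpoint: int = 0


def recover(db: Database, redis_checkpoint):
    """Return the events to republish when the Redis checkpoint is missing."""
    if redis_checkpoint is not None:
        return []  # Redis state is intact, nothing to replay
    return [e for e in db.events if e.seq > db.checkpoint]


class IdempotentHandler:
    """Applies each event at most once, even if it is delivered twice."""

    def __init__(self):
        self.seen = set()
        self.applied = []

    def handle(self, event: Event):
        if event.seq in self.seen:
            return  # duplicate delivery after replay, ignore
        self.seen.add(event.seq)
        self.applied.append(event.payload)


db = Database(events=[Event(i, f"event-{i}") for i in range(1, 6)], checkpoint=2)
handler = IdempotentHandler()

# Events 1-4 were processed, but the checkpoint was only persisted up to 2.
for e in db.events[:4]:
    handler.handle(e)

# Redis restarts and loses its checkpoint: events 3, 4, 5 are republished.
replayed = recover(db, redis_checkpoint=None)
for e in replayed:
    handler.handle(e)

print([e.seq for e in replayed])  # [3, 4, 5]
print(handler.applied)            # each event applied exactly once
```

Events 3 and 4 are delivered twice here, which is the duplication window the note describes.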
- Events: Automatically republished from PostgreSQL database
- Cache Data: Must be re-fetched from external sources (Git, HTTP, Kubernetes)
- Cache Invalidation: Occurs automatically when template versions change
Flight Control uses Redis as an in-memory store, which requires careful memory management to prevent unbounded growth and ensure system stability.
Redis memory usage is controlled by two key parameters:
| Parameter | Description | Default | Tuning Guidance |
|---|---|---|---|
| maxmemory | Total memory limit for Redis | 1gb | Set to 70-80% of available container memory |
| maxmemory-policy | Eviction policy when limit reached | allkeys-lru | See policy recommendations below |
Choose the appropriate eviction policy based on your use case:
| Policy | Description | Use Case | Recommendation |
|---|---|---|---|
| allkeys-lru | Evict least recently used keys | General caching (default) | ✅ Recommended for most deployments |
| allkeys-lfu | Evict least frequently used keys | Long-running caches | Good for stable workloads |
| volatile-lru | Evict LRU keys with expiration | Mixed cache/queue data | Use if some keys have TTL |
| noeviction | Return errors when limit reached | Critical data preservation | ❌ Not recommended - causes failures |
Understanding Redis memory usage helps with proper sizing:
- Git repository contents: Large files, multiple versions
- HTTP response data: External API responses
- Kubernetes secrets: Configuration data
- Template rendering results: Processed configurations
- Task queue messages: Event processing data
- Failed message retry queue: Exponential backoff storage
- In-flight task tracking: Processing state management
# Set environment variables before starting containers
export REDIS_MAXMEMORY="2gb"
export REDIS_MAXMEMORY_POLICY="allkeys-lru"
export REDIS_LOGLEVEL="warning"

- KV store memory usage:
  - RHEL9: redis-cli INFO memory
  - RHEL10: valkey-cli INFO memory
- Evicted keys count:
  - RHEL9: redis-cli INFO stats | grep evicted
  - RHEL10: valkey-cli INFO stats | grep evicted
- Cache hit ratio: Monitor cache effectiveness
- Queue depth: Monitor task processing backlog
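The cache hit ratio is not reported directly, but it can be derived from the `keyspace_hits` and `keyspace_misses` counters in the `INFO stats` output. The sketch below parses a sample of that output; the helper names and the sample numbers are made up for illustration, and in practice the text would come from redis-cli or valkey-cli.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output: `key:value` lines, with `#` section headers skipped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def cache_hit_ratio(stats: dict):
    """hits / (hits + misses), or None when there have been no lookups yet."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Sample text shaped like `INFO stats` output (numbers are made up).
sample = """# Stats
keyspace_hits:900
keyspace_misses:100
evicted_keys:12
"""

stats = parse_info(sample)
ratio = cache_hit_ratio(stats)
print(ratio)                       # 0.9
print(int(stats["evicted_keys"]))  # 12
```

The counters are cumulative since the server started, so comparing two samples taken some time apart gives a better picture of current behavior than a single reading.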
Increase memory if:
- High eviction rates (keys being removed frequently)
- Cache hit ratio below 80%
- Queue processing delays due to memory pressure
Decrease memory if:
- Memory usage consistently below 50%
- System has memory constraints
- Other services need more memory
Recommended Redis Memory = (Available Container Memory × 0.75) - 200MB

Where:
- 0.75 = 75% of container memory for Redis
- 200MB = Buffer for Redis overhead and OS
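As a worked example of the sizing formula, the sketch below applies it to a container with 4096 MB of memory. The function name `recommended_maxmemory_mb` is hypothetical, and the result is a starting point to adjust against observed usage rather than a fixed rule.

```python
def recommended_maxmemory_mb(container_memory_mb: float) -> float:
    """(Available Container Memory x 0.75) - 200MB, with all values in MB."""
    return container_memory_mb * 0.75 - 200


# 4096 MB container: 4096 * 0.75 = 3072, minus the 200 MB buffer = 2872 MB.
print(recommended_maxmemory_mb(4096))  # 2872.0
```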
# values.yaml
kv:
  enabled: true
  maxmemory: "2gb"
  maxmemoryPolicy: "allkeys-lru"
  loglevel: "warning"
  resources:
    requests:
      memory: "2.5Gi" # Container memory should be > maxmemory
      cpu: "1000m"

# flightctl-kv.container
[Container]
Environment=REDIS_MAXMEMORY=2gb
Environment=REDIS_MAXMEMORY_POLICY=allkeys-lru
Environment=REDIS_LOGLEVEL=warning

Problem: Redis running out of memory
Error: OOM command not allowed when used memory > 'maxmemory'
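Solution: Increase maxmemory, or switch to an eviction policy that can free keys. This error typically appears with noeviction, or with a volatile-* policy when no keys have a TTL.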
Problem: High eviction rates
# Check eviction stats
# RHEL9:
redis-cli INFO stats | grep evicted
# RHEL10:
valkey-cli INFO stats | grep evicted
Solution: Increase memory allocation or optimize cache usage
Problem: Slow queue processing
Solution: Monitor queue depth and increase memory if needed
Problem: Redis memory overcommit warning
WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition.
Solution: To resolve this warning, run sudo sysctl vm.overcommit_memory=1, and add vm.overcommit_memory = 1 to /etc/sysctl.conf so the setting persists across reboots.