Blog post readonly ingester scaling (cortexproject#7054)
Signed-off-by: Daniel Deluiggi <[email protected]>
1 parent 5d23973 commit c28cecf
---
date: 2025-01-14
title: "Introducing READONLY State: Gradual and Safe Ingester Scaling"
linkTitle: READONLY Ingester Scaling
tags: [ "blog", "cortex", "ingester", "scaling" ]
categories: [ "blog" ]
projects: [ "cortex" ]
description: >
  Learn about Cortex's new READONLY state for ingesters introduced in version 1.19.0 that enables gradual, safe scaling down operations without data loss or performance impact.
author: Cortex Team
---

## Introduction

Scaling down ingesters in Cortex has traditionally been a complex and risky operation. The conventional approach required setting `querier.query-store-after=0s`, which forces all queries to hit storage directly and significantly degrades query performance. Cortex 1.19.0 introduces a new **READONLY state** for ingesters that lets you scale down your Cortex clusters gradually and safely, without that trade-off.

## Why Traditional Scaling Falls Short

The legacy approach to ingester scaling had several issues:

**Performance Impact**: Setting `querier.query-store-after=0s` forces every query to hit storage, including queries for recent data, which increases query latency and storage load.

**Operational Complexity**: Traditional scaling required coordinating configuration changes across multiple components, precise timing, manual monitoring of bucket scanning intervals, and scaling ingesters one by one with waiting periods between each shutdown.

**Risk of Data Loss**: Without proper coordination, scaling down could result in data loss if in-memory data wasn't flushed to storage before the ingester was terminated.

## What is the READONLY State?

The READONLY state addresses these challenges. When an ingester transitions to READONLY state, it:

- **Stops accepting new writes** - Push requests are rejected and redistributed to ACTIVE ingesters
- **Continues serving queries** - Existing data remains available, maintaining query performance
- **Gradually ages out data** - Data naturally expires according to your retention settings
- **Enables safe removal** - Ingesters can be terminated once data has aged out

## How to Use READONLY State

### Step 1: Transition to READONLY

```bash
# Set multiple ingesters to READONLY simultaneously
curl -X POST http://ingester-1:8080/ingester/mode -d '{"mode": "READONLY"}'
curl -X POST http://ingester-2:8080/ingester/mode -d '{"mode": "READONLY"}'
curl -X POST http://ingester-3:8080/ingester/mode -d '{"mode": "READONLY"}'
```
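
The plain `curl` calls above do not stop on an HTTP error, so a rejected request can go unnoticed. A minimal sketch of a wrapper that reports which ingesters failed to change mode is shown below. The function names `set_ingester_mode` and `set_mode_all` are hypothetical helpers, not part of Cortex; the URL, port, and request body simply mirror the commands above.

```bash
#!/bin/bash
# Hypothetical helper: POST a mode to one ingester.
# URL, port 8080, and JSON body are copied from the commands in this post.
set_ingester_mode() {
  local host=$1 mode=$2
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
  curl --fail --silent --show-error -X POST \
    "http://${host}:8080/ingester/mode" -d "{\"mode\": \"${mode}\"}"
}

# Hypothetical wrapper: apply one mode to several ingesters and
# return non-zero if any request failed.
set_mode_all() {
  local mode=$1; shift
  local failed=()
  local host
  for host in "$@"; do
    if ! set_ingester_mode "$host" "$mode" >/dev/null; then
      failed+=("$host")
    fi
  done
  if ((${#failed[@]} > 0)); then
    echo "failed to set ${mode} on: ${failed[*]}" >&2
    return 1
  fi
}
```

With these definitions, `set_mode_all READONLY ingester-1 ingester-2 ingester-3` would replace the three separate commands and exit non-zero if any of them was rejected.
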

### Step 2: Monitor Data Status (Optional)

```bash
# Check user statistics and loaded blocks on the ingester
curl http://ingester-1:8080/ingester/all_user_stats
```
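
To act on that response in a script, it helps to decide programmatically whether it is empty. The sketch below assumes the endpoint returns a JSON array and that an empty array means no user data remains, as the advanced example later in this post does. `stats_is_empty` is a hypothetical helper, and the non-empty sample in the usage note is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the response body is an empty JSON array.
# Assumes /ingester/all_user_stats returns JSON, as this post's later example implies.
stats_is_empty() {
  local body=$1
  # Strip all whitespace so "[]", "[ ]" and a trailing newline compare equal
  local compact=${body//[[:space:]]/}
  [[ "$compact" == "[]" ]]
}
```

For example, `stats_is_empty "$(curl -s http://ingester-1:8080/ingester/all_user_stats)"` succeeds only for an empty array, while an illustrative body such as `[{"userID":"tenant-a"}]` or an empty string (for instance after a failed request) is treated as "not empty".
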

### Step 3: Choose Removal Strategy

You have three options:

- **Immediate removal**: Safe for service availability but may impact query performance
- **Conservative removal**: Wait for the `querier.query-ingesters-within` duration to elapse (recommended)
- **Complete data aging**: Wait for the full retention period, until all blocks have been removed from the ingesters
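
The three strategies differ only in how long you wait after setting READONLY. A small sketch of that mapping follows; the strategy labels `immediate`, `conservative`, and `complete` and the function `removal_wait_seconds` are names chosen here for illustration, and both durations are passed in as seconds by the caller.

```bash
#!/bin/bash
# Hypothetical mapping from removal strategy to the wait (in seconds) after READONLY.
#   $1: strategy label (illustrative names, not Cortex settings)
#   $2: querier.query-ingesters-within, in seconds
#   $3: retention period, in seconds
removal_wait_seconds() {
  local strategy=$1 query_within_s=$2 retention_s=$3
  case "$strategy" in
    immediate)    echo 0 ;;
    conservative) echo "$query_within_s" ;;
    complete)     echo "$retention_s" ;;
    *) echo "unknown strategy: $strategy" >&2; return 1 ;;
  esac
}
```

For a cluster with `querier.query-ingesters-within=5h`, `removal_wait_seconds conservative 18000 86400` prints `18000`; the 24-hour retention value in that call is only an example.
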

### Step 4: Remove Ingesters

```bash
# Terminate the ingester processes
kubectl delete pod ingester-1 ingester-2 ingester-3
```

## Timeline Example

For a cluster with `querier.query-ingesters-within=5h`:

- **T0**: Set ingesters to READONLY state
- **T1**: Ingesters stop receiving new data but continue serving queries
- **T2 (T0 + 5h)**: Ingesters no longer receive query requests (safe to remove)
- **T3 (T0 + retention_period)**: All blocks naturally removed from ingesters

**Any time after T2 is safe for removal without service impact.**
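
The T2 check is simple arithmetic on timestamps. As a rough sketch, assuming you record T0 as a Unix timestamp when you set the ingesters to READONLY, the helpers below (hypothetical names) compute T2 and test whether it has passed.

```bash
#!/bin/bash
# Hypothetical helpers for the timeline above. All times are Unix epoch seconds.

# Print T2 = T0 + querier.query-ingesters-within (given in seconds)
safe_removal_epoch() {
  local t0=$1 within_s=$2
  echo $((t0 + within_s))
}

# Succeed if "now" is at or after T2
is_past_t2() {
  local now=$1 t0=$2 within_s=$3
  (( now >= t0 + within_s ))
}
```

In practice you might store `t0=$(date +%s)` at T0 and later run `is_past_t2 "$(date +%s)" "$t0" 18000` before deleting the pods, where `18000` is 5h expressed in seconds.
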

## Benefits

### Performance Preservation
Unlike the traditional approach, READONLY ingesters continue serving queries, maintaining performance during the scaling transition.

### Operational Simplicity
- No configuration changes required across multiple components
- Batch operations supported - multiple ingesters can transition simultaneously (no more "one by one" requirement)
- No waiting periods between ingester transitions
- Flexible timing - remove ingesters when convenient
- Reversible operations - ingesters can return to ACTIVE state if needed

### Enhanced Safety
- Gradual data aging without manual intervention
- Data remains available during transition
- Monitoring capabilities with the `/ingester/all_user_stats` endpoint

## Practical Examples

### Basic READONLY Scaling

```bash
#!/bin/bash
INGESTERS_TO_SCALE=("ingester-1" "ingester-2" "ingester-3")
# Should be at least your querier.query-ingesters-within value
WAIT_DURATION="5h"

# Set ingesters to READONLY
for ingester in "${INGESTERS_TO_SCALE[@]}"; do
  echo "Setting $ingester to READONLY..."
  curl -X POST "http://$ingester:8080/ingester/mode" -d '{"mode": "READONLY"}'
done

# Wait for safe removal window
echo "Waiting $WAIT_DURATION for safe removal..."
sleep "$WAIT_DURATION"

# Remove ingesters
for ingester in "${INGESTERS_TO_SCALE[@]}"; do
  echo "Removing $ingester..."
  kubectl delete pod "$ingester"
done
```
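
Note that `sleep 5h` relies on unit suffixes, which GNU `sleep` accepts but POSIX does not require. If the script may run elsewhere, one option is to convert the duration to seconds first. The sketch below uses a hypothetical helper, `duration_to_seconds`, that handles only a single number with an optional `s`, `m`, `h`, or `d` suffix; compound durations such as `1h30m` are rejected rather than parsed.

```bash
#!/bin/bash
# Hypothetical helper: convert "<number>[s|m|h|d]" to seconds.
# Only a single unit is supported; anything else is rejected.
duration_to_seconds() {
  local d=$1
  local n=${d%[smhd]}    # numeric part (suffix removed if present)
  local unit=${d#"$n"}   # remaining suffix, empty if none
  if ! [[ "$n" =~ ^[0-9]+$ ]]; then
    echo "invalid duration: $d" >&2
    return 1
  fi
  # 10# forces base 10 so values like "08" are not read as octal
  case "$unit" in
    ""|s) echo $((10#$n)) ;;
    m)    echo $((10#$n * 60)) ;;
    h)    echo $((10#$n * 3600)) ;;
    d)    echo $((10#$n * 86400)) ;;
  esac
}
```

The wait step could then be written as `sleep "$(duration_to_seconds "$WAIT_DURATION")"`.
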

### Advanced: Check for Empty Users Before Removal

```bash
#!/bin/bash
check_ingester_ready() {
  local ingester=$1
  local response
  # Declare and assign separately so curl's exit status is not masked by `local`
  response=$(curl -s "http://$ingester:8080/ingester/all_user_stats")

  # Empty array "[]" indicates no users/data remaining
  if [[ "$response" == "[]" ]]; then
    return 0 # Ready for removal
  else
    return 1 # Still has user data
  fi
}

INGESTERS_TO_SCALE=("ingester-1" "ingester-2" "ingester-3")

# Set ingesters to READONLY
for ingester in "${INGESTERS_TO_SCALE[@]}"; do
  echo "Setting $ingester to READONLY..."
  curl -X POST "http://$ingester:8080/ingester/mode" -d '{"mode": "READONLY"}'
done

# Wait and check for data removal
for ingester in "${INGESTERS_TO_SCALE[@]}"; do
  echo "Waiting for $ingester to be ready for removal..."
  while ! check_ingester_ready "$ingester"; do
    echo "$ingester still has user data, waiting 30s..."
    sleep 30
  done

  echo "Removing $ingester (no user data remaining)..."
  kubectl delete pod "$ingester"
done
```
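
The polling loop above waits indefinitely if an ingester never reports an empty result, for example because it is unreachable. One way to bound the wait is a generic retry helper with a deadline. `wait_until` below is a hypothetical function, sketched with bash's built-in `SECONDS` counter; it is not part of Cortex.

```bash
#!/bin/bash
# Hypothetical helper: run a command repeatedly until it succeeds,
# giving up once timeout_s seconds have elapsed.
#   $1: timeout in seconds, $2: pause between attempts in seconds,
#   remaining arguments: the command to run
wait_until() {
  local timeout_s=$1 interval_s=$2; shift 2
  # SECONDS is a bash variable counting seconds since the shell started
  local deadline=$((SECONDS + timeout_s))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval_s"
  done
}
```

Combined with the function from the script above, `wait_until 86400 30 check_ingester_ready ingester-1` would poll every 30 seconds and give up after a day; the one-day limit is only an example and should reflect your own retention settings.
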

## Best Practices

- **Test in non-production first** to validate the process with your configuration
- **Scale gradually** - don't remove too many ingesters simultaneously
- **Monitor throughout** - watch metrics during the entire process
- **Understand your query patterns** - know your `querier.query-ingesters-within` setting
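
To make "scale gradually" concrete, a script can split the ingesters to remove into fixed-size batches and process one batch at a time. The sketch below only does the grouping; `print_batches` is a hypothetical helper, and the right batch size depends on your cluster.

```bash
#!/bin/bash
# Hypothetical helper: print the given names in groups of at most $1 per line.
print_batches() {
  local size=$1; shift
  (( size > 0 )) || { echo "batch size must be positive" >&2; return 1; }
  local names=("$@")
  local i
  for ((i = 0; i < ${#names[@]}; i += size)); do
    # Array slice: up to $size elements starting at index $i
    echo "${names[@]:i:size}"
  done
}
```

For instance, `print_batches 2 ingester-1 ingester-2 ingester-3` prints `ingester-1 ingester-2` on the first line and `ingester-3` on the second; each line can then be handled as one READONLY-and-remove round.
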

## Emergency Rollback

If issues arise, return ingesters to ACTIVE state:

```bash
# Revert to ACTIVE state
curl -X POST http://ingester-1:8080/ingester/mode -d '{"mode": "ACTIVE"}'
```
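
During an incident a typo in the mode name is easy to make. As a small safeguard, a script can build the request body from a validated mode. The sketch below accepts only the two modes named in this post; `mode_payload` is a hypothetical helper, and whether Cortex accepts other values is outside its scope.

```bash
#!/bin/bash
# Hypothetical helper: print the JSON body for a mode change,
# accepting only the two modes discussed in this post.
mode_payload() {
  case "$1" in
    ACTIVE|READONLY) printf '{"mode": "%s"}\n' "$1" ;;
    *) echo "unsupported mode: $1" >&2; return 1 ;;
  esac
}
```

A rollback for several ingesters could then loop over their names and pass `-d "$(mode_payload ACTIVE)"` to the same `curl` command shown above.
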

## Conclusion

The READONLY state makes scaling down ingesters safer, simpler, and more flexible than the traditional approach, while preserving query performance. Configuration changes across multiple components are no longer required: set ingesters to READONLY and remove them when convenient.

For detailed information and examples, check out our [Ingesters Scaling Guide](../../docs/guides/ingesters-scaling-up-and-down/).
