
Commit 6df8240

update workload autoscaler docs
Signed-off-by: Vacant2333 <[email protected]>
1 parent 6500fb8 commit 6df8240

File tree: 4 files changed, +70 −19 lines changed

src/content/guide/workload_autoscaler/autoscaling_policy.mdx

Lines changed: 27 additions & 3 deletions
@@ -7,7 +7,7 @@ title: AutoscalingPolicy
 **AutoscalingPolicy** defines **which Workloads** should have their **Requests and Limits** automatically adjusted, **when** these adjustments should occur, and **how** they should be applied.
 By properly configuring an AutoscalingPolicy, you can continuously adjust the Requests of a group of Workloads to a reasonable and efficient level.

-> **Note:** Fields marked with * are required.
+> **Note:** If a Pod contains sidecar containers (e.g., Istio), we won’t modify them, and they will be excluded from recommendation calculations. We detect sidecars by diffing the container names between the workload’s Pod template and the actual Pod; any names that exist only in the Pod are treated as injected sidecars.

 ## Enable*

@@ -49,6 +49,19 @@ You can configure multiple TargetRefs to cover a broader set of Workloads.
 | **Name** | Any valid Workload name \| *empty* | No | Name of the Workload. If left empty, it matches **all Workloads** within the namespace or cluster (depending on `Namespace`). |
 | **Namespace** | Any valid namespace \| *empty* | No | Namespace of the Workload. If left empty, it matches **all namespaces** in the cluster. |

+**Name and Namespace support shell-style glob patterns** (`*`, `?`, and character classes like `[a-z]`); patterns match the entire value, and an empty field (or `*`) matches all.
+
+| Pattern | Meaning | Matches | Doesn’t match |
+|-------------------|-------------------------------------|--------------------------|--------------------|
+| `*` | Any value | `web`, `ns-1`, `default` ||
+| `web-*` | Values starting with `web-` | `web-1`, `web-prod-a` | `api-web-1` |
+| `*-prod` | Values ending with `-prod` | `core-prod`, `a-prod` | `prod-core` |
+| `front?` | `front` + exactly 1 char | `front1`, `fronta` | `front10`, `front` |
+| `job-??` | `job-` + exactly 2 chars | `job-01`, `job-ab` | `job-1`, `job-001` |
+| `ns-[0-9][0-9]-*` | `ns-` + two digits + `-` + anything | `ns-01-a`, `ns-99-x` | `ns-1-a` |
+| `db[0-2]` | `db0`, `db1`, or `db2` only | `db0`, `db2` | `db3`, `db-2` |
+| `[^0-9]*` | Does **not** start with a digit | `app1`, `ns-x` | `9-app` |
+
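For reference, a minimal sketch of how these glob patterns could appear when selecting Workloads. The YAML field names (`targetRefs`, `name`, `namespace`) are assumptions for illustration only; consult the actual AutoscalingPolicy schema for the exact spelling.

```yaml
# Hypothetical sketch; field names are assumed, not the authoritative schema.
targetRefs:
  - name: "web-*"        # matches web-1, web-prod-a, but not api-web-1
    namespace: "*-prod"  # matches core-prod, a-prod, but not prod-core
  - name: ""             # empty name matches all Workloads ...
    namespace: "team-a"  # ... within the team-a namespace
```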
 ## Update Schedule

**UpdateSchedule** defines **when** a Workload should use a particular update mode.
@@ -88,6 +101,8 @@ You can visit [here](https://crontab.cronhub.io/) to refer to how the Cron synta

 When the `UpdateMode` is set to either `ReCreate` or `InPlace`, the `OnCreate` mode will also be applied automatically. This ensures that when a Pod restarts normally, the newly created Pod will always receive the latest recommendations, regardless of the Drift Thresholds.

+For `ReCreate` operations, when attempting to evict a **single-replica** Deployment **without PVCs**, we perform a **rolling update** to avoid service interruption during the update.
+
 > **Note:** The `InPlace` mode has certain limitations and may automatically fall back to `ReCreate` in some cases. For details, see [InPlace Limitations](./best_practices_and_limitations#inplace-update-mode-limitations).

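As an illustration of the schedule behavior described above, a hedged sketch of a single schedule entry. The field names (`updateSchedule`, `cron`, `updateMode`) are assumptions; only the Cron syntax itself follows the linked reference.

```yaml
# Hypothetical sketch; field names are assumed.
updateSchedule:
  - cron: "0 2 * * *"    # 02:00 every day (see https://crontab.cronhub.io/)
    updateMode: InPlace  # ReCreate/InPlace also implies OnCreate for restarted Pods
```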
## Update Resources*
@@ -101,9 +116,9 @@ Available resources: `CPU` / `Memory`.
 - Only the selected resources will be actively updated.
 - This setting does **not** affect how recommendations are calculated.

-If you don’t have specific requirements or if you already use HPA, we recommend allowing **both CPU and Memory** to be managed.
+If you don’t have specific requirements or if you already use `HPA`, we recommend allowing **both `CPU` and `Memory`** to be managed.

-> **Note:** After optimizations have been applied, changing Update Resources will not roll back modifications that are already in effect. By default, we do not recommend updating this field. Instead, create a new AutoscalingPolicy and gradually replace the existing configuration.
+> **Note:** When you modify the `Update Resources`, an update operation may be triggered based on the deviation between the recommended value and the current value. This operation will take effect immediately once the conditions of the `Update Schedule` are met.

 ## Drift Thresholds

@@ -132,14 +147,23 @@ If the deviation for **any resource** in a Pod exceeds the threshold, the Pod wi
 | `ReCreate` | Roll back to the pre-policy **requests** by **recreating** target Workloads (rolling replace). | Restarts, brief downtime | Cluster does not support in-place vertical changes; require scheduler to reassign resources. | Ensure safe rolling strategy. **Limits** typically remain unchanged unless your controller handles them. |
 | `InPlace` | Roll back to the pre-policy **requests** via **in-place** Pod updates (no recreate). | Usually zero/low disruption | Cluster supports in-place vertical resizing; prioritize minimal disturbance. | Requires cluster/runtime support for in-place updates. **Limits** unchanged unless otherwise implemented. |

+For `ReCreate` operations, when attempting to evict a **single-replica** Deployment **without PVCs**, we perform a **rolling update** to avoid service interruption during the update.
+
 > **Note:** The `InPlace` mode has certain limitations and may automatically fall back to `ReCreate` in some cases. For details, see [InPlace Limitations](./best_practices_and_limitations#inplace-update-mode-limitations). When unexpected situations prevent us from restoring the Pod Request for 10 minutes, we will allow the configuration to be deleted directly without restoring the Pod Request.

 ## Limit Policy*

 **LimitPolicy** defines how Pod limits should be reset.
+By default, **we recommend using `RemoveLimit`** to ensure that a Workload can occasionally preempt more resources when needed.
+
+When using `Multiplier`, we suggest setting a reasonable lower bound for `CPU`/`Memory` recommendations. In rare cases (e.g., in testing environments where actual usage is extremely low), the recommended values may not be sufficient for stable Pod startup or handling sudden traffic spikes.

 | Field | Behavior |
 |---------------|----------------------------------------------------------------------------|
 | `RemoveLimit` | Remove Pod `limits` (no CPU/Memory caps). |
 | `KeepLimit` | Keep existing Pod `limits` unchanged. |
 | `Multiplier` | Recalculate `limits` by multiplying with the recommended request value. |
+
+When you modify the `Limit Policy`, an update operation may be triggered. This decision is based on the deviation between the current values and the recommended values, as well as whether existing Pods have their limits set according to the expected configuration. Once the conditions of the `Update Schedule` are met, the update will take effect immediately.
+
+> **Note:** When using `KeepLimit`, the final recommended values will never exceed your configured Limits. If you want Pods to be able to use more resources in certain cases, consider using `RemoveLimit` or `Multiplier` instead.
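To make the `Multiplier` behavior concrete, a hedged sketch with illustrative numbers; the field names (`limitPolicy`, `policy`, `multiplier`) are assumptions, not the exact schema.

```yaml
# Hypothetical sketch; field names and values are illustrative only.
# With a recommended request of cpu: 250m / memory: 256Mi and a multiplier of 2,
# the resulting limits would be cpu: 500m / memory: 512Mi.
limitPolicy:
  policy: Multiplier
  multiplier: 2
```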

src/content/guide/workload_autoscaler/best_practices_and_limitations.mdx

Lines changed: 26 additions & 14 deletions
@@ -12,28 +12,40 @@ This document describes best practices for the Workload Autoscaler and the limit

 Your Kubernetes cluster version must be **1.33 or higher**.

-### Memory limits (decrease vs. increase)
+### Memory Limits: Decrease vs Increase

-**Decreasing Memory Limit is not allowed in place.**
-In InPlace mode, the Workload Autoscaler will **not** proactively reduce a Pod’s Memory Limit. Memory Limits are only reassigned when the Pod is **recreated normally**.
+#### 🔻 Decreasing Memory Limit

-**Increasing Memory Limit may require container restarts.**
-If a workload (e.g., **Java** applications) cannot dynamically adapt to Memory Limit changes, configure the container’s **`ResizePolicy`** so that the **memory** resource is set to **`RestartContainer`**. Attempts to increase the Memory Limit will then automatically **restart** the corresponding container to apply the new limit.
+- **Not supported in `InPlace`.**
+  `InPlace` resizing does not allow lowering the Memory limit.
+
+- **Fallback behavior:**
+  When a new recommendation would reduce an existing Pod’s Memory limit, the Workload Autoscaler automatically falls back to `ReCreate` mode and recreates the Pod.
+
+#### 🔺 Increasing Memory Limit
+
+- **May require container restarts.**
+  Some workloads (e.g., **Java applications**) cannot dynamically adapt to Memory limit changes.
+
+- **Best practice:**
+  Configure the container’s **`ResizePolicy`** so that the **Memory** resource is set to **`RestartContainer`**.
+  In this case, attempts to increase the Memory limit will automatically **restart the container** to apply the new limit.
+
+#### Notes

-**Notes**
 - By default, the Workload Autoscaler sets the **ResizePolicy** for **all resources** of **all containers** to **`NotRequired`**.
 - If you have **manually configured** a container’s ResizePolicy for any resource, the Workload Autoscaler **will not overwrite** it. For details, see the
 [Kubernetes documentation example](https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/#example-1-resizing-cpu-without-restart).

 ### Pod QoS class must not change

-A Pod’s QoS class is determined at creation time (one of **Guaranteed**, **Burstable**, or **BestEffort**). InPlace updates must **not** cause a change in QoS class:
+A Pod’s QoS class is determined at creation time (one of **Guaranteed**, **Burstable**, or **BestEffort**). `InPlace` updates must **not** cause a change in QoS class:

-- **BestEffort Pods** (no CPU/memory requests or limits at startup): You **cannot** add any CPU/memory requests or limits, because adding requests would convert the Pod to **Burstable**, which is **not allowed** in InPlace updates. Therefore, BestEffort Pods **cannot** use in-place vertical scaling. If you need scaling, specify requests at creation time so the Pod is at least Burstable.
-- **Guaranteed Pods** (for every container, CPU and memory **requests equal limits**): After InPlace adjustments, each container must still satisfy **`requests == limits`**. To increase or decrease CPU/memory, you must update **both** request **and** limit to the **same** value. For example, going from 2 CPU to 3 CPU requires setting **both** request and limit to 3. You cannot change only one of them, or the Pod will no longer be Guaranteed.
-- **Burstable Pods** (have requests, but not all equal to limits, or some containers may have no requests): You may adjust CPU/memory, but **must not** turn the Pod into Guaranteed. It is forbidden to make **both CPU and memory requests equal to their limits** across all containers after the update; otherwise the Pod would become Guaranteed. You also must not clear all requests and turn the Pod into BestEffort. In short, the Pod **must keep its original QoS class** unchanged.
+- **BestEffort Pods** (no CPU/Memory requests or limits at startup): You **cannot** add any CPU/Memory requests or limits, because adding requests would convert the Pod to **Burstable**, which is **not allowed** in `InPlace` updates. Therefore, BestEffort Pods **cannot** use in-place vertical scaling. If you need scaling, specify requests at creation time so the Pod is at least Burstable.
+- **Guaranteed Pods** (for every container, CPU and Memory **requests equal limits**): After `InPlace` adjustments, each container must still satisfy **`requests == limits`**. To increase or decrease CPU/Memory, you must update **both** request **and** limit to the **same** value. For example, going from 2 CPU to 3 CPU requires setting **both** request and limit to 3. You cannot change only one of them, or the Pod will no longer be Guaranteed.
+- **Burstable Pods** (have requests, but not all equal to limits, or some containers may have no requests): You may adjust CPU/Memory, but **must not** turn the Pod into Guaranteed. It is forbidden to make **both CPU and Memory requests equal to their limits** across all containers after the update; otherwise the Pod would become Guaranteed. You also must not clear all requests and turn the Pod into BestEffort. In short, the Pod **must keep its original QoS class** unchanged.

-If an InPlace operation would violate any of the above QoS rules, the Workload Autoscaler **falls back to `ReCreate` mode** and explicitly recreates (re-schedules) the target Pod.
+If an `InPlace` operation violates any of the above QoS rules, the Workload Autoscaler **falls back to `ReCreate` mode** and explicitly recreates (re-schedules) the target Pod.

 > **Note:** Such fallback events are expected to occur only when a Workload is first configured with an AutoscalingPolicy or when certain related configurations of the AutoscalingPolicy are modified. They should not occur during normal operation.
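The `ResizePolicy` referenced above is the standard Kubernetes container field. A minimal sketch of a container that resizes CPU in place but restarts when its Memory limit changes; the Pod, container, and image names are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo                           # placeholder name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # placeholder image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired          # CPU can change without a restart
        - resourceName: memory
          restartPolicy: RestartContainer     # Memory changes restart this container
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
```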
@@ -45,7 +57,7 @@ In this scenario, the Workload Autoscaler will **fall back to `ReCreate` mode**

 ### Coexisting with HPA

-Using the Workload Autoscaler **together** with **HPA (Horizontal Pod Autoscaler)** can produce unexpected behavior. If you need both, configure them to manage **different resources**—for example, let HPA scale by **CPU usage**, while the Workload Autoscaler adjusts only **memory**.
+Using the Workload Autoscaler **together** with **HPA (Horizontal Pod Autoscaler)** can produce unexpected behavior. If you need both, configure them to manage **different resources**—for example, let HPA scale by **CPU usage**, while the Workload Autoscaler adjusts only **Memory**.
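A sketch of the split suggested above: an HPA that scales replicas on CPU only, while the Workload Autoscaler's Update Resources would be restricted to Memory. The target Deployment name is a placeholder.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # placeholder target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu            # HPA owns CPU scaling; leave Memory to the Workload Autoscaler
        target:
          type: Utilization
          averageUtilization: 70
```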

 ## Best Practices

@@ -57,6 +69,6 @@ Whenever possible, set **resource requests** for every container in all workload

 Avoid specifying **limits** whenever feasible. Instead, set **requests** to place Pods in the **Burstable** QoS class.

-### Set a restart policy for workloads that cannot adapt memory InPlace
+### Set a restart policy for workloads that cannot adapt Memory InPlace

-For workloads like **Java** that cannot adjust to Memory Limit changes dynamically, manually configure the container’s **`ResizePolicy`** so that when the InPlace update modifies the Memory Limit, the **container will restart** to apply the new limit (set memory ResizePolicy to **`RestartContainer`**).
+For workloads like **Java** that cannot adjust to Memory Limit changes dynamically, manually configure the container’s **`ResizePolicy`** so that when the `InPlace` update modifies the Memory Limit, the **container will restart** to apply the new limit (set Memory ResizePolicy to **`RestartContainer`**).
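A minimal sketch of the requests-only pattern recommended above (names and values are placeholders); setting requests without limits keeps the Pod in the Burstable QoS class and eligible for in-place vertical scaling.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-demo                        # placeholder name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # placeholder image
      resources:
        requests:                             # requests only, no limits -> Burstable QoS
          cpu: 250m
          memory: 256Mi
```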

src/content/guide/workload_autoscaler/installation.mdx

Lines changed: 14 additions & 0 deletions
@@ -74,3 +74,17 @@ Similarly, the `Workload Autoscaler` component is uninstalled together with the
 to uninstall the `Workload Autoscaler` component independently, please contact our technical support team for assistance.

 > **Note:** Before uninstalling the Workload Autoscaler, please make sure that all AutoscalingPolicies have been deleted or disabled, and confirm that all Workloads have been restored to their original state.
+
+## Configure the Update/Evict Limiter
+
+By default, the **Workload Autoscaler** enables a **Limiter** that throttles the number of **in-place updates** and **Pod evictions**. This helps prevent large clusters from becoming unstable when many Pods are updated or evicted in a short period.
+
+You can tune the Limiter with the environment variables below. If not set, the defaults apply.
+
+| ENV var | Default | What it controls |
+|----------------------------|---------|--------------------------------------------------------------------------------|
+| `LIMITER_QUOTA_PER_WINDOW` | **5** | Tokens added to the bucket each window. |
+| `LIMITER_BURST` | **10** | Maximum tokens allowed in the bucket (peak operations within a window). |
+| `LIMITER_WINDOW_SECONDS` | **30** | Window length in seconds; every window adds `LIMITER_QUOTA_PER_WINDOW` tokens. |
+
+> **Note:** For eviction operations, when attempting to evict a **single-replica** Deployment **without PVCs**, we perform a **rolling update** to avoid service interruption during the update.
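A sketch of how these variables might be overridden. The snippet below is only the `env` fragment; where it goes (which Deployment and container run the Workload Autoscaler controller) depends on your installation.

```yaml
# Hypothetical fragment; attach to the controller container of your install.
env:
  - name: LIMITER_QUOTA_PER_WINDOW
    value: "3"    # 3 tokens added per window (default 5)
  - name: LIMITER_BURST
    value: "6"    # bucket holds at most 6 tokens (default 10)
  - name: LIMITER_WINDOW_SECONDS
    value: "60"   # window length in seconds (default 30)
```

With the defaults (5/10/30), the bucket allows bursts of up to 10 update or evict operations, with 5 tokens added back every 30 seconds.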

src/content/guide/workload_autoscaler/recommendation_policy.mdx

Lines changed: 3 additions & 2 deletions
@@ -9,7 +9,7 @@ It allows you to define the range of recommendation values, enabling more flexib

 This document explains the meaning and valid range of each field in the `Recommendation Policy`.

-> **Note:** Fields marked with * are required.
+> **Note:** For all containers, if the recommended values are below the minimums, the system automatically raises them to: CPU `20m` and Memory `20Mi`. This ensures that resource requests never fall below safe operational thresholds.

 ## Strategy Type*

@@ -86,7 +86,6 @@ For critical workloads, you can set it to 7 days to ensure recommendations accou
 ![resource_limits](./img/recommendation_policy/resource_limits.png)

 You can set both `Min` and `Max` limits for `CPU` and `Memory`. This ensures that the recommended values will not fall below or exceed the range you define.
-
 The `Max limit` is applied after the `Buffer` is calculated, meaning the final recommended value (including the `Buffer`) will not exceed the `Max limit`.

 For `Resource Limits`, you can use either percentages or absolute values:
@@ -100,6 +99,8 @@ In most cases, we recommend ***using percentages so the system can adjust based
 For example, if you set `CPU` to `30%` ~ `200%`, the final recommended value will never be lower than `30%` of the original Request,
 nor higher than `200%` of the original Request.

+We strongly recommend that you configure Min limits for both `CPU` and `Memory` resources to prevent recommended values from being too low in certain cases, which could cause Pods to fail to run properly.
+
 > **Note:** When using percentages for `Resource Limits`, you must ensure that all containers within the workloads governed by this `Recommendation Policy` have defined Request values for the corresponding resource. Otherwise, the system will not be able to calculate a recommendation.

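A small worked example of how the `Buffer` and the percentage `Max limit` compose; the numbers, the 50% buffer, and the field layout below are illustrative assumptions, not the exact schema.

```yaml
# Illustrative only; field names and the buffer value are assumed.
# Original container Request:          cpu: 1000m
# Raw recommendation from usage:       cpu: 1400m
# Buffer of 50% (assumed):             1400m * 1.5 = 2100m
# Max limit 200% of original Request:  cap at 2000m -> final recommendation 2000m
# Min limit 30% of original Request:   floor at 300m (not triggered here)
resourceLimits:
  cpu:
    min: "30%"
    max: "200%"
```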
## Evaluation Period*
