| 1 | +--- |
| 2 | +sidebar_position: 6 |
| 3 | +sidebar_label: Host |
| 4 | +title: "Host" |
| 5 | +--- |
| 6 | + |
| 7 | +<head> |
| 8 | + <link rel="canonical" href="https://docs.harvesterhci.io/v1.3/troubleshooting/host"/> |
| 9 | +</head> |
| 10 | + |
| 11 | +## Node in Maintenance Mode Becomes Stuck in Cordoned State |
| 12 | + |
| 13 | +When you enable Maintenance Mode on a node using the Harvester UI, the node becomes stuck in the `Cordoned` state and the menu shows the **Enable Maintenance Mode** option instead of **Disable Maintenance Mode**. |
| 14 | + |
| 15 | + |
| 16 | + |
| 17 | +The Harvester pod logs contain messages similar to the following: |
| 18 | + |
| 19 | +``` |
| 20 | +time="2024-08-05T19:03:02Z" level=info msg="evicting pod longhorn-system/instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7" |
| 21 | +time="2024-08-05T19:03:02Z" level=info msg="error when evicting pods/\"instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget." |
| 22 | + |
| 23 | +time="2024-08-05T19:03:07Z" level=info msg="evicting pod longhorn-system/instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7" |
| 24 | +time="2024-08-05T19:03:07Z" level=info msg="error when evicting pods/\"instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget." |
| 25 | + |
| 26 | +time="2024-08-05T19:03:12Z" level=info msg="evicting pod longhorn-system/instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7" |
| 27 | +time="2024-08-05T19:03:12Z" level=info msg="error when evicting pods/\"instance-manager-68cd2514dd3f6d59b95cbd865d5b08f7\" -n \"longhorn-system\" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget." |
| 28 | +``` |
| 29 | + |
| 30 | +The Longhorn Instance Manager uses a PodDisruptionBudget (PDB) to protect itself from accidental eviction, which could result in loss of volume data. The eviction error in the logs indicates that the `instance-manager` pod is still serving volumes or replicas. |
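To see which pods are blocking the drain, the repeated eviction errors can be summarized from saved Harvester pod logs. The following is a minimal sketch that assumes the log lines have exactly the shape shown in the excerpt above; `blocked_pods` is a hypothetical helper, not part of Harvester or Longhorn.

```python
import re
from collections import Counter

# Matches the eviction-failure message shown in the log excerpt.
# The pod and namespace are wrapped in escaped quotes (\"...\") inside msg="...".
BLOCKED_EVICTION = re.compile(
    r'error when evicting pods/\\"(?P<pod>[^"\\]+)\\" -n \\"(?P<namespace>[^"\\]+)\\"'
    r'.*disruption budget'
)


def blocked_pods(log_lines):
    """Count PDB-blocked eviction attempts per (namespace, pod)."""
    counts = Counter()
    for line in log_lines:
        match = BLOCKED_EVICTION.search(line)
        if match:
            counts[(match["namespace"], match["pod"])] += 1
    return counts
```

A pod whose count keeps growing across retries is the one still protected by its PDB.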
| 31 | + |
| 32 | +The following sections describe the known causes and their corresponding workarounds. |
| 33 | + |
| 34 | +### Manually Attached Volumes |
| 35 | + |
| 36 | +A volume that is attached to a node using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards) can cause the error. This is because the object is attached to a node name instead of the pod name. |
| 37 | + |
| 38 | +You can check how a volume is attached from the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards). |
| 39 | + |
| 40 | + |
| 41 | + |
| 42 | +The manually attached object is attached to a node name instead of the pod name. |
| 43 | + |
| 44 | +You can also use the CLI to retrieve the details of the CRD object `VolumeAttachment`. |
| 45 | + |
| 46 | +Example of a volume that was attached using the Longhorn UI: |
| 47 | + |
| 48 | +``` |
| 49 | +- apiVersion: longhorn.io/v1beta2 |
| 50 | + kind: VolumeAttachment |
| 51 | +... |
| 52 | + spec: |
| 53 | + attachmentTickets: |
| 54 | + longhorn-ui: |
| 55 | + id: longhorn-ui |
| 56 | + nodeID: node-name |
| 57 | +... |
| 58 | + volume: pvc-9b35136c-f59e-414b-aa55-b84b9b21ff89 |
| 59 | +``` |
| 60 | + |
| 61 | +Example of a volume that was attached using the Longhorn CSI driver: |
| 62 | + |
| 63 | +``` |
| 64 | +- apiVersion: longhorn.io/v1beta2 |
| 65 | + kind: VolumeAttachment |
| 66 | + spec: |
| 67 | + attachmentTickets: |
| 68 | + csi-b5097155cddde50b4683b0e659923e379cbfc3873b5b2ee776deb3874102e9bf: |
| 69 | + id: csi-b5097155cddde50b4683b0e659923e379cbfc3873b5b2ee776deb3874102e9bf |
| 70 | + nodeID: node-name |
| 71 | +... |
| 72 | + volume: pvc-3c6403cd-f1cd-4b84-9b46-162f746b9667 |
| 73 | +``` |
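The difference between the two examples above is the attachment ticket ID: `longhorn-ui` for a manual attachment and `csi-...` for the CSI driver. The following is a minimal sketch of that check, assuming the objects were exported as JSON (for instance with `kubectl get volumeattachments.longhorn.io -n longhorn-system -o json`) and the `items` list was loaded into Python. `manually_attached_volumes` is a hypothetical helper, and the placement of `volume` directly under `spec` is an assumption based on the examples.

```python
def manually_attached_volumes(volume_attachments):
    """Return (volume, nodeID) pairs that hold a `longhorn-ui` attachment ticket.

    `volume_attachments` is the parsed `items` list of Longhorn
    VolumeAttachment objects (assumed input shape, see the examples above).
    """
    found = []
    for item in volume_attachments:
        spec = item.get("spec", {})
        tickets = spec.get("attachmentTickets") or {}
        for ticket in tickets.values():
            # Tickets created through the Longhorn UI use the ID "longhorn-ui";
            # tickets created by the CSI driver use IDs that start with "csi-".
            if ticket.get("id") == "longhorn-ui":
                found.append((spec.get("volume"), ticket.get("nodeID")))
    return found
```

Any volume returned for the node you want to drain is a candidate for the workarounds described in this section.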
| 74 | + |
| 75 | +:::note |
| 76 | + |
| 77 | +Manually attaching a volume to the node is not recommended. |
| 78 | + |
| 79 | +Harvester automatically attaches and detaches volumes based on operations such as creating or migrating a VM. |
| 80 | + |
| 81 | +::: |
| 82 | + |
| 83 | +#### Workaround 1: Set `Detach Manually Attached Volumes When Cordoned` to `True` |
| 84 | + |
| 85 | +The Longhorn setting [Detach Manually Attached Volumes When Cordoned](https://longhorn.io/docs/1.6.0/references/settings/#detach-manually-attached-volumes-when-cordoned) determines whether Longhorn automatically detaches manually attached volumes when a node is cordoned. When the setting is `false`, manually attached volumes stay attached and block node draining. |
| 86 | + |
| 87 | +The default value of this setting depends on the embedded Longhorn version: |
| 88 | + |
| 89 | +| Harvester version | Embedded Longhorn version | Default value | |
| 90 | +| --- | --- | --- | |
| 91 | +| v1.3.1 | v1.6.0 | `true` | |
| 92 | +| v1.4.0 | v1.7.0 | `false` | |
| 93 | + |
| 94 | +Set this option to `true` from the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards). |
| 95 | + |
| 96 | +#### Workaround 2: Manually Detach the Volume |
| 97 | + |
| 98 | +Detach the volume using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards). |
| 99 | + |
| 100 | + |
| 101 | + |
| 102 | +Once the volume is detached, you can successfully enable Maintenance Mode on the node. |
| 103 | + |
| 104 | + |
| 105 | + |
| 106 | +### Single-Replica Volumes |
| 107 | + |
| 108 | +Harvester allows you to create custom StorageClasses that describe how Longhorn must provision volumes. If necessary, you can create a StorageClass with the [Number of Replicas](../advanced/storageclass.md#number-of-replicas) parameter set to `1`. |
| 109 | + |
| 110 | +When a volume is created using such a StorageClass and is attached to a node using the CSI driver or other methods, the lone replica stays on that node even after the volume is detached. |
| 111 | + |
| 112 | +You can check this using the CRD object `Volume`. |
| 113 | + |
| 114 | +``` |
| 115 | +- apiVersion: longhorn.io/v1beta2 |
| 116 | + kind: Volume |
| 117 | +... |
| 118 | + spec: |
| 119 | +... |
| 120 | + numberOfReplicas: 1 # number of replicas |
| 121 | +... |
| 122 | + status: |
| 123 | +... |
| 124 | + ownerID: nodeName |
| 125 | +... |
| 126 | + state: attached |
| 127 | +``` |
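The fields shown above can also be filtered programmatically when a cluster has many volumes. The following is a minimal sketch, assuming the Longhorn `Volume` objects were exported as JSON (for instance with `kubectl get volumes.longhorn.io -n longhorn-system -o json`) and the `items` list was loaded into Python. `single_replica_volumes_on_node` is a hypothetical helper that relies only on the fields in the example (`spec.numberOfReplicas`, `status.ownerID`, `status.state`).

```python
def single_replica_volumes_on_node(volumes, node_name):
    """Return (volume name, state) for single-replica volumes owned by a node.

    `volumes` is the parsed `items` list of Longhorn Volume objects
    (assumed input shape, see the example above).
    """
    matches = []
    for volume in volumes:
        spec = volume.get("spec", {})
        status = volume.get("status", {})
        if spec.get("numberOfReplicas") == 1 and status.get("ownerID") == node_name:
            name = volume.get("metadata", {}).get("name")
            matches.append((name, status.get("state")))
    return matches
```

Volumes returned for the node that is stuck in the `Cordoned` state are the ones affected by the node drain policy described below.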
| 128 | + |
| 129 | +#### Workaround: Set `Node Drain Policy` |
| 130 | + |
| 131 | +The Longhorn [Node Drain Policy](https://longhorn.io/docs/1.6.0/references/settings/#node-drain-policy) is set to `block-if-contains-last-replica` by default. This option forces Longhorn to block node draining when the node contains the last healthy replica of a volume. |
| 132 | + |
| 133 | +To address the issue, change the value to `allow-if-replica-is-stopped` using the [embedded Longhorn UI](./harvester.md#access-embedded-rancher-and-longhorn-dashboards). |
| 134 | + |
| 135 | +:::info important |
| 136 | + |
| 137 | +If you plan to remove the node after Maintenance Mode is enabled, back up single-replica volumes or redeploy the related workloads to other nodes in advance so that the volumes are scheduled to other nodes. |
| 138 | + |
| 139 | +::: |
| 140 | + |
| 141 | +Starting with Harvester v1.4.0, the `Node Drain Policy` is set to `allow-if-replica-is-stopped` by default. |