Commit b4d3062

fix(bootstrap): load flannel iptables modules and fix nftables healthcheck
Flannel's embedded traffic manager in k3s v1.35.x is compiled without the nft backend; it only supports iptables-legacy, which requires kernel modules (ip_tables, iptable_nat, iptable_filter, iptable_mangle) that modern distributions (Fedora 43+, RHEL 10+) no longer load by default.

Changes:

- cluster-entrypoint.sh: When running under Podman, explicitly load the legacy iptables kernel modules via modprobe before starting k3s. The container already runs privileged, so modprobe works when /lib/modules is available.
- docker.rs: Bind-mount /lib/modules from the host into the gateway container (read-only) so the entrypoint's modprobe calls can find the host kernel modules.
- cluster-healthcheck.sh: Replace the hardcoded 127.0.0.1 NodePort check with the node's actual InternalIP. When kube-proxy runs in nftables mode, NodePort DNAT rules only match the node's real IP addresses. Loopback is not in the nftables nodeport-ips set, so the old check always failed.

Tested on Fedora 43 (kernel 6.19, Podman 5.8.1) with the full lifecycle: gateway start, provider create/list/delete, sandbox create/exec/delete.
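To see whether a host is affected before starting the gateway, one option is to compare the four modules named above against the kernel's loaded-module list. The sketch below is illustrative and not part of this commit: the function name `missing_iptables_modules` is hypothetical, and it assumes a `/proc/modules`-style listing where the module name is the first field on each line. Modules compiled directly into the kernel never appear in that file, so a "missing" result is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): print each legacy iptables
# module that is absent from a /proc/modules-style listing.
# $1 is the listing to read; it defaults to /proc/modules.
missing_iptables_modules() {
    modules_file="${1:-/proc/modules}"
    for mod in ip_tables iptable_nat iptable_filter iptable_mangle; do
        # awk exits 0 only if some line has the module name as its first field.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}
```

Calling `missing_iptables_modules` with no argument reads the live `/proc/modules`; an empty result means all four modules are already loaded as loadable modules.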
1 parent 4b67305 commit b4d3062

3 files changed

Lines changed: 47 additions & 7 deletions

crates/openshell-bootstrap/src/docker.rs

Lines changed: 11 additions & 1 deletion
@@ -705,7 +705,17 @@ pub async fn ensure_container(
         privileged: Some(true),
         cgroupns_mode: Some(cgroupns),
         port_bindings: Some(port_bindings),
-        binds: Some(vec![format!("{}:/var/lib/rancher/k3s", volume_name(name))]),
+        binds: Some({
+            let mut binds = vec![format!("{}:/var/lib/rancher/k3s", volume_name(name))];
+            // Mount host kernel modules so the entrypoint can load iptable_nat
+            // and related modules required by flannel. Modern distributions
+            // (Fedora 43+, RHEL 10+) no longer load legacy iptables modules by
+            // default, and the container image doesn't ship its own modules.
+            if std::path::Path::new("/lib/modules").exists() {
+                binds.push("/lib/modules:/lib/modules:ro".to_string());
+            }
+            binds
+        }),
         network_mode: Some(network_name(name)),
         // Automatically restart the container when Docker restarts, unless the
         // user explicitly stopped it with `gateway stop`.
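
For reproducing the container setup by hand, the conditional bind list built in this hunk can be approximated in shell. This is a rough sketch under assumptions, not the project's code: `build_binds` and its arguments are hypothetical names, and the second argument exists only so the optional branch can be exercised against a temporary path.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Rust logic above: always emit the k3s
# data-volume bind, and add the read-only modules bind only when the host
# path exists.
# $1: volume name; $2: host modules path (defaults to /lib/modules).
build_binds() {
    volume="$1"
    modules_dir="${2:-/lib/modules}"
    echo "${volume}:/var/lib/rancher/k3s"
    if [ -e "$modules_dir" ]; then
        echo "${modules_dir}:/lib/modules:ro"
    fi
}
```

Each printed line has the `source:target[:options]` shape accepted by the `-v` flag of `docker run` and `podman run`. As in the Rust code, a host without the modules directory simply gets the single volume bind.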

deploy/docker/cluster-entrypoint.sh

Lines changed: 22 additions & 2 deletions
@@ -675,12 +675,32 @@ fi
 # Select kube-proxy mode
 # ---------------------------------------------------------------------------
 # Under Podman, use native nftables kube-proxy mode so no legacy iptables
-# kernel modules (ip_tables, iptable_nat, etc.) are required on the host.
-# Docker retains the default iptables mode for maximum compatibility.
+# kernel modules are needed for kube-proxy service routing.
+#
+# Flannel's embedded traffic manager in k3s v1.35.x still uses the iptables
+# binary (no nft backend compiled in). The iptables binary inside the
+# container is iptables-legacy, which requires the iptable_nat, iptable_filter,
+# and ip_tables kernel modules. Modern distributions (Fedora 43+, RHEL 10+)
+# no longer load these modules by default. Since the container runs
+# privileged with /lib/modules mounted from the host, load them explicitly
+# so flannel masquerade rules work.
+#
+# Docker retains the default iptables kube-proxy mode for maximum compatibility.
 EXTRA_KUBE_PROXY_ARGS=""
 if [ "${CONTAINER_RUNTIME:-}" = "podman" ]; then
     echo "Podman detected — using nftables kube-proxy mode"
     EXTRA_KUBE_PROXY_ARGS="--kube-proxy-arg=proxy-mode=nftables"
+
+    # Load legacy iptables kernel modules required by flannel.
+    # The container runs privileged with /lib/modules bind-mounted from the
+    # host, so modprobe works. These modules are needed because flannel's
+    # traffic manager calls iptables-legacy for masquerade rules, which
+    # requires the kernel-side iptable_nat and related modules.
+    for _mod in ip_tables iptable_nat iptable_filter iptable_mangle; do
+        if ! modprobe "$_mod" 2>/dev/null; then
+            echo "Warning: could not load kernel module $_mod — flannel masquerade may fail" >&2
+        fi
+    done
 fi

 # Execute k3s with explicit resolv-conf passed as a kubelet arg.
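
The warn-and-continue loop in this hunk is hard to exercise without root, so one way to try its failure path is to make the loader command injectable. The sketch below is illustrative only: `load_flannel_modules` and the `LOADER` variable are hypothetical names that do not appear in the commit, and the function additionally prints a failure count that the original loop does not track.

```shell
#!/bin/sh
# Hypothetical wrapper around the loop above. LOADER defaults to modprobe;
# overriding it (for example with `true` or `false`) exercises the success
# and warning paths without touching the kernel.
load_flannel_modules() {
    loader="${LOADER:-modprobe}"
    failed=0
    for _mod in ip_tables iptable_nat iptable_filter iptable_mangle; do
        if ! "$loader" "$_mod" 2>/dev/null; then
            # As in the entrypoint, a failed load is a warning, not a fatal error.
            echo "Warning: could not load kernel module $_mod" >&2
            failed=$((failed + 1))
        fi
    done
    # Print the number of modules that failed to load.
    echo "$failed"
}
```

Running `LOADER=false load_flannel_modules` emits one warning per module on stderr and still returns normally, which matches the entrypoint's choice to keep starting k3s even when a module cannot be loaded.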

deploy/docker/cluster-healthcheck.sh

Lines changed: 14 additions & 4 deletions
@@ -75,8 +75,18 @@ kubectl -n openshell get secret openshell-ssh-handshake >/dev/null 2>&1 || exit
 # ---------------------------------------------------------------------------
 # Verify the gateway NodePort (30051) is actually accepting TCP connections.
 # After a container restart, kube-proxy may need extra time to re-program
-# iptables rules for NodePort routing. Without this check the health check
-# can pass before the port is routable, causing "Connection refused" on the
-# host-mapped port.
+# iptables/nftables rules for NodePort routing. Without this check the
+# health check can pass before the port is routable, causing "Connection
+# refused" on the host-mapped port.
+#
+# When kube-proxy runs in nftables mode (Podman), NodePort DNAT rules only
+# match traffic destined to the node's real IP addresses — loopback
+# (127.0.0.1) is not in the nodeport-ips set. Use the node's InternalIP
+# so the check works with both iptables and nftables kube-proxy modes.
 # ---------------------------------------------------------------------------
-timeout 2 bash -c 'echo >/dev/tcp/127.0.0.1/30051' 2>/dev/null || exit 1
+NODEPORT_CHECK_IP="127.0.0.1"
+NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}' 2>/dev/null || true)
+if [ -n "$NODE_IP" ]; then
+    NODEPORT_CHECK_IP="$NODE_IP"
+fi
+timeout 2 bash -c "echo >/dev/tcp/${NODEPORT_CHECK_IP}/30051" 2>/dev/null || exit 1
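
The fallback logic in this hunk can be isolated into a small function for testing. The sketch below is an assumption-laden illustration, not the committed script: `pick_nodeport_ip` is a hypothetical name, and it adds one behavior the hunk does not have. If the jsonpath filter matches more than one InternalIP (for example on a dual-stack node), kubectl prints the values separated by spaces, which would not form a valid `/dev/tcp/` path, so the sketch keeps only the first value.

```shell
#!/bin/sh
# Hypothetical helper: choose the address for the NodePort TCP probe.
# $1 is the raw kubectl jsonpath output, which may be empty or may contain
# several space-separated addresses.
pick_nodeport_ip() {
    node_ip="$1"
    if [ -n "$node_ip" ]; then
        # Keep only the first address if several were returned.
        echo "${node_ip%% *}"
    else
        # Same fallback as the healthcheck: loopback when no IP is known.
        echo "127.0.0.1"
    fi
}
```

In the healthcheck, the result would take the place of `NODEPORT_CHECK_IP`. With an empty input the behavior matches the pre-commit check against 127.0.0.1.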
