Commit f02afd0

Filesystem: on stop, try umount directly, before scanning for users
48ed6e6 (Filesystem: improve stop-action and allow setting term/kill signals and signal_delay for large filesystems, 2023-07-04) changed the logic from "try umount; if that fails, find and kill users; repeat" to "try to find and kill users; then try umount; repeat".

But even just walking /proc may take "a long time" on busy systems, and may still come up with "no users found".

It will take even longer for "force_umount=safe" (observed 8 to 10 seconds just for get_pids() with "safe" to return nothing) than for "force_umount=yes" (still ~2 to 3 seconds), but it will take "a long time" in any case.

(BTW, that may be longer than the hardcoded default of 6 seconds for "fast_stop", which is also the default on many systems now.)

If the dependencies are properly configured, there should be no users left, and the umount should just work.

Revert back to "try umount first", and only then try to find "rogue" users.
1 parent e3ba7ba commit f02afd0

1 file changed, +5 -0 lines changed

heartbeat/Filesystem

Lines changed: 5 additions & 0 deletions
@@ -732,6 +732,11 @@ fs_stop() {
 	local SUB="$1" timeout=$2 grace_time ret
 	grace_time=$((timeout/2))
 
+	# Just walking /proc may take "a long time", even if we don't find any users of this FS.
+	# If dependencies are properly configured, umount should just work.
+	# Only if that fails, try to find and kill processes that still use it.
+	try_umount "" "$SUB" && return $OCF_SUCCESS
+
 	# try gracefully terminating processes for up to half of the configured timeout
 	fs_stop_loop "" "$SUB" "$OCF_RESKEY_term_signals" &
 	timeout_child $! $grace_time
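
The ordering this change restores can be illustrated with a small, self-contained sketch. It is not the agent's code: fake_umount, fake_scan_and_kill and stop_sketch are hypothetical stand-ins for try_umount, the /proc scan and fs_stop, and the sketch ignores the term/kill signal split and the timeout handling. It only shows that the expensive scan is skipped entirely when the first umount attempt succeeds.

```shell
#!/bin/sh
# Sketch only: all helpers here are hypothetical stand-ins, not the
# Filesystem agent's real try_umount / fs_stop_loop implementations.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1

SCANS=0          # counts how often the (simulated) expensive scan ran
BUSY_ATTEMPTS=0  # number of umount attempts that should fail (simulated)

# Stand-in for one umount attempt: fails while BUSY_ATTEMPTS > 0.
fake_umount() {
	if [ "$BUSY_ATTEMPTS" -gt 0 ]; then
		BUSY_ATTEMPTS=$((BUSY_ATTEMPTS - 1))
		return 1
	fi
	return 0
}

# Stand-in for walking /proc and signalling users of the filesystem.
fake_scan_and_kill() {
	SCANS=$((SCANS + 1))
}

# Try umount first; scan for users only if that fails, then retry.
stop_sketch() {
	max_tries=$1
	fake_umount && return $OCF_SUCCESS
	tries=0
	while [ "$tries" -lt "$max_tries" ]; do
		fake_scan_and_kill
		fake_umount && return $OCF_SUCCESS
		tries=$((tries + 1))
	done
	return $OCF_ERR_GENERIC
}
```

With BUSY_ATTEMPTS=0, stop_sketch returns success without calling fake_scan_and_kill at all, which corresponds to the "dependencies are properly configured" case described in the commit message.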
