Unmount disconnected filesystem when quorum lost #14

Open
beornf wants to merge 1 commit into master from unmount-quorum-lost
Conversation

@beornf beornf commented May 6, 2021

This addresses errors when unmounting a LINSTOR Docker volume after quorum is lost on the primary. The kernel logs in this edge case are as follows:

XFS (drbd1005): metadata I/O error in "xlog_iodone" at daddr 0x1a15a len 64 error 5
XFS (drbd1005): xfs_do_force_shutdown(0x2) called from line 1271 of file fs/xfs/xfs_log.c.  Return address = 000000002625d9d1
XFS (drbd1005): Log I/O Error Detected.  Shutting down filesystem
XFS (drbd1005): Please umount the filesystem and rectify the problem(s)

When the container is terminated, the resource definition may not be returned because the connection to the controller has been lost. As a fallback, the presence of the mount path is checked.

The error returned from checking the mount point indicates that the filesystem is corrupted but still mounted: os.Stat on the mount path returns an input/output error. See https://github.com/kubernetes/kubernetes/blob/v1.14.6/pkg/util/mount/mount_helper_unix.go#L27.
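
For illustration, here is a minimal Go sketch of that fallback (hypothetical helper names, not the plugin's exact code): an input/output error from os.Stat is taken to mean the filesystem is still mounted but corrupted, so a lazy unmount is attempted anyway.

package main

import (
	"errors"
	"log"
	"os"
	"syscall"
)

// isCorruptedMount reports whether stat'ing the mount path fails with EIO,
// which is how an XFS filesystem that has shut itself down presents itself.
func isCorruptedMount(mountPath string) bool {
	_, err := os.Stat(mountPath)
	return errors.Is(err, syscall.EIO)
}

func main() {
	path := "/mnt/test" // illustrative mount path, not the plugin's real one
	if isCorruptedMount(path) {
		// The controller may be unreachable, so unmount directly instead of
		// waiting for the resource definition to be returned.
		if err := syscall.Unmount(path, syscall.MNT_DETACH); err != nil {
			log.Printf("unmount failed: %v", err)
		}
	}
}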

@beornf beornf requested a review from rck May 6, 2021 13:16

rck commented May 7, 2021

I hoped to get that done today but did not have time. Monday I'm on vacation, so I'm afraid this has to wait till Tuesday.

beornf commented May 7, 2021

All good, have a nice vacation.

rck commented May 11, 2021

Hm, if I get that right, you basically assume that after quorum loss the FS is corrupted, right? That is true by default as of now, but it might even change.

A little backstory:
When you lose quorum, DRBD can do one of two things: freeze IO (the suspend-io policy) or throw IO errors (the io-error policy). The DRBD default is suspend-io, but the LINSTOR default is io-error. Usually that is a good setting, because applications typically handle IO errors; they are used to them. They usually don't handle a freeze too well, because that is uncommon. But we also saw that users have horrible networks where quorum loss followed shortly by quorum regain is pretty common. Now assume you have such a network and 100 VMs, and all the disks throw IO errors and remount the FS read-only because of a transient network hiccup. We have seen that too often, and we will probably change the default LINSTOR sets in the plugins back to freeze.

Long story short: there are two settings, and io-error is just the current default. If we rely on that, then we should probably document it, or even make sure that volumes are definitely configured like this by explicitly setting what is currently the default.

All in all I guess this improves the situation, but maybe it is not complete yet (even if completing it is just a matter of explicitly setting io-error for every volume this plugin creates).
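
For illustration, pinning the current default explicitly would boil down to DRBD resource options along these lines (option names are from the drbd.conf manual; the exact block the plugin would generate is an assumption):

options {
    quorum majority;
    on-no-quorum io-error;
}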

beornf commented May 11, 2021

My motivation here is that I did encounter this issue: my VM had a network loss, and the only way to recover was to SSH in and manually unmount the FS. I first tried using the suspend-io policy in LINSTOR, but that freezes the Docker container even after the network recovers and forces the user to do a full system restart.

This change only applies when io-error is set, but you'd be free to use the suspend-io policy, as the FS is unlikely to be corrupted in that case. That said, I couldn't get suspend-io to resume operation during my testing.

rck commented May 11, 2021

There have been fixes in this area; does that reproduce with the latest DRBD (9.1 or 9.0)? And was that triggered by on-no-quorum=suspend-io (i.e., quorum loss) and not on-no-data-accessible=suspend-io (i.e., data was still accessible)? How did you trigger quorum loss and how did you restore quorum?

beornf commented May 18, 2021

I upgraded to the latest DRBD version 9.0.29-1, then set the controller config option DrbdOptions/auto-quorum to suspend-io, which automatically sets DrbdOptions/Resource/on-no-quorum to suspend-io.
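
For illustration, with the standard linstor client that controller property can be set roughly like this (a sketch; the subcommand form is assumed, only the property name and value come from the comment above):

linstor controller set-property DrbdOptions/auto-quorum suspend-io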

To reproduce the frozen container, I ran the following on the node:

docker volume create --driver lade/linstor:latest --opt fs=xfs --opt size=100MB test
docker run --rm --mount type=volume,volume-driver=lade/linstor:latest,src=test,target=/data -it alpine

Then, in another session, I dropped packets to trigger loss of quorum:

sudo iptables -A OUTPUT -d 172.16.0.0/12 -j DROP

In the Docker container, I ran:

rm /data/foo

Then I restored quorum with:

sudo iptables -D OUTPUT 1

The syslog after the quorum is restored:

May 18 05:51:50 nodeA kernel: [ 9074.551892] drbd test nodeB: Handshake to peer 0 successful: Agreed network protocol version 120
May 18 05:51:50 nodeA kernel: [ 9074.553265] drbd test nodeB: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 18 05:51:50 nodeA kernel: [ 9074.556615] drbd test nodeB: Peer authenticated using 20 bytes HMAC
May 18 05:51:50 nodeA kernel: [ 9074.558693] drbd test nodeB: Starting ack_recv thread (from drbd_r_test [7267])
May 18 05:51:50 nodeA kernel: [ 9074.637121] drbd test nodeB: Preparing remote state change 3158070484
May 18 05:51:50 nodeA kernel: [ 9074.642511] drbd test nodeB: Committing remote state change 3158070484 (primary_nodes=4)
May 18 05:51:50 nodeA kernel: [ 9074.643894] drbd test nodeB: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
May 18 05:51:50 nodeA kernel: [ 9074.645262] drbd test/0 drbd1005 nodeB: pdsk( DUnknown -> Consistent ) repl( Off -> Established )
May 18 05:51:50 nodeA kernel: [ 9074.649387] drbd test/0 drbd1005 nodeB: pdsk( Consistent -> UpToDate )
May 18 05:51:50 nodeA kernel: [ 9074.934988] drbd test nodeC: Handshake to peer 1 successful: Agreed network protocol version 120
May 18 05:51:50 nodeA kernel: [ 9074.936396] drbd test nodeC: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 18 05:51:50 nodeA kernel: [ 9074.941936] drbd test nodeC: Peer authenticated using 20 bytes HMAC
May 18 05:51:50 nodeA kernel: [ 9074.943481] drbd test nodeC: Starting ack_recv thread (from drbd_r_test [7269])
May 18 05:51:50 nodeA kernel: [ 9074.997572] drbd test nodeC: Preparing remote state change 2190151639
May 18 05:51:50 nodeA kernel: [ 9075.004081] drbd test nodeC: Committing remote state change 2190151639 (primary_nodes=4)
May 18 05:51:50 nodeA kernel: [ 9075.005585] drbd test: susp-io( quorum -> no)
May 18 05:51:50 nodeA kernel: [ 9075.007035] drbd test nodeC: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
May 18 05:51:50 nodeA kernel: [ 9075.008522] drbd test/0 drbd1005: quorum( no -> yes )
May 18 05:51:50 nodeA kernel: [ 9075.010004] drbd test/0 drbd1005 nodeC: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
May 18 05:51:50 nodeA kernel: [ 9075.014163] drbd test/0 drbd1005 nodeC: pdsk( Outdated -> UpToDate )
May 18 05:52:52 nodeA dockerd[832]: time="2021-05-18T05:52:52.463155213-04:00" level=info msg="Container 51a087e885b1 failed to exit within 10 seconds of kill - trying direct SIGKILL"
May 18 05:52:52 nodeA kernel: [ 9137.133162] drbd test/0 drbd1005: Remote failed to finish a request within 85624ms > ko-count (7) * timeout (60 * 0.1s)
May 18 05:52:52 nodeA kernel: [ 9137.134871] drbd test/0 drbd1005: Would lose quorum, but using tiebreaker logic to keep
May 18 05:52:52 nodeA kernel: [ 9137.136574] drbd test nodeC: conn( Connected -> Timeout ) peer( Secondary -> Unknown )
May 18 05:52:52 nodeA kernel: [ 9137.138284] drbd test/0 drbd1005 nodeC: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
May 18 05:52:52 nodeA kernel: [ 9137.140069] drbd test nodeC: ack_receiver terminated
May 18 05:52:52 nodeA kernel: [ 9137.141821] drbd test nodeC: Terminating ack_recv thread
May 18 05:52:52 nodeA kernel: [ 9137.143608] drbd test nodeC: Terminating sender thread
May 18 05:52:52 nodeA kernel: [ 9137.145320] drbd test nodeC: Starting sender thread (from drbd_r_test [7269])
May 18 05:52:52 nodeA kernel: [ 9137.150127] drbd test/0 drbd1005: sending new current UUID: 62D168568B078825
May 18 05:52:52 nodeA kernel: [ 9137.151804] drbd test nodeC: Connection closed
May 18 05:52:52 nodeA kernel: [ 9137.153872] drbd test nodeC: helper command: /sbin/drbdadm disconnected
May 18 05:52:52 nodeA kernel: [ 9137.156760] drbd test nodeC: helper command: /sbin/drbdadm disconnected exit code 0
May 18 05:52:52 nodeA kernel: [ 9137.158837] drbd test nodeC: conn( Timeout -> Unconnected )
May 18 05:52:52 nodeA kernel: [ 9137.160478] drbd test nodeC: Restarting receiver thread
May 18 05:52:52 nodeA kernel: [ 9137.162100] drbd test nodeC: conn( Unconnected -> Connecting )
May 18 05:52:52 nodeA kernel: [ 9137.163759] drbd test nodeB: Preparing remote state change 1602432883
May 18 05:52:52 nodeA kernel: [ 9137.172010] drbd test nodeB: Committing remote state change 1602432883 (primary_nodes=4)
May 18 05:52:52 nodeA kernel: [ 9137.173651] drbd test/0 drbd1005 nodeC: pdsk( DUnknown -> Outdated )
May 18 05:52:53 nodeA kernel: [ 9137.719141] drbd test nodeC: Handshake to peer 1 successful: Agreed network protocol version 120
May 18 05:52:53 nodeA kernel: [ 9137.720739] drbd test nodeC: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 18 05:52:53 nodeA kernel: [ 9137.725094] drbd test nodeC: Peer authenticated using 20 bytes HMAC
May 18 05:52:53 nodeA kernel: [ 9137.726950] drbd test nodeC: Starting ack_recv thread (from drbd_r_test [7269])
May 18 05:52:53 nodeA kernel: [ 9137.745121] drbd test nodeC: Preparing remote state change 4189286633
May 18 05:52:53 nodeA kernel: [ 9137.752558] drbd test nodeC: Committing remote state change 4189286633 (primary_nodes=4)
May 18 05:52:53 nodeA kernel: [ 9137.754178] drbd test nodeC: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
May 18 05:52:53 nodeA kernel: [ 9137.755743] drbd test/0 drbd1005 nodeC: repl( Off -> Established )
May 18 05:52:53 nodeA kernel: [ 9137.807392] drbd test/0 drbd1005 nodeC: pdsk( Outdated -> Inconsistent ) resync-susp( no -> peer )
May 18 05:52:53 nodeA kernel: [ 9137.817622] drbd test/0 drbd1005 nodeC: pdsk( Inconsistent -> UpToDate ) resync-susp( peer -> no )
May 18 05:53:39 nodeA kernel: [ 9184.237299] INFO: task rm:7380 blocked for more than 120 seconds.
May 18 05:53:39 nodeA kernel: [ 9184.238900]       Tainted: G           OE     4.19.0-11-amd64 #1 Debian 4.19.146-1
May 18 05:53:39 nodeA kernel: [ 9184.240460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 18 05:53:39 nodeA kernel: [ 9184.242028] rm              D    0  7380   7311 0x00000104
May 18 05:53:39 nodeA kernel: [ 9184.243563] Call Trace:
May 18 05:53:39 nodeA kernel: [ 9184.245110]  __schedule+0x29f/0x840
May 18 05:53:39 nodeA kernel: [ 9184.246667]  ? drbd_queue_unplug+0x4f/0x60 [drbd]
May 18 05:53:39 nodeA kernel: [ 9184.248174]  schedule+0x28/0x80
May 18 05:53:39 nodeA kernel: [ 9184.249683]  schedule_timeout+0x26d/0x3b0
May 18 05:53:39 nodeA kernel: [ 9184.251120]  ? blk_finish_plug+0x21/0x2e
May 18 05:53:39 nodeA kernel: [ 9184.252552]  ? _xfs_buf_ioapply+0x30d/0x470 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.253935]  wait_for_completion+0x11f/0x190
May 18 05:53:39 nodeA kernel: [ 9184.255294]  ? wake_up_q+0x70/0x70
May 18 05:53:39 nodeA kernel: [ 9184.256662]  ? __xfs_buf_submit+0x122/0x230 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.258034]  ? xfs_buf_read_map+0x106/0x170 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.259372]  ? xfs_trans_read_buf_map+0xaa/0x2e0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.260688]  xfs_buf_iowait+0x22/0xd0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.262048]  __xfs_buf_submit+0x122/0x230 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.263379]  xfs_buf_read_map+0x106/0x170 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.264664]  xfs_trans_read_buf_map+0xaa/0x2e0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.265938]  xfs_btree_read_buf_block.constprop.42+0x95/0xd0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.267142]  xfs_btree_lookup_get_block+0x95/0x170 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.268336]  ? kmem_zone_alloc+0x61/0xe0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.269503]  xfs_btree_lookup+0xd0/0x460 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.270703]  ? kmem_zone_alloc+0x61/0xe0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.271888]  xfs_difree_inobt.isra.14+0x85/0x370 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.273082]  xfs_difree+0xac/0x190 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.274268]  xfs_ifree+0x76/0x140 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.275411]  xfs_inactive_ifree+0xa1/0x1b0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.276539]  xfs_inactive+0x9e/0x140 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.277708]  xfs_fs_destroy_inode+0xa8/0x1c0 [xfs]
May 18 05:53:39 nodeA kernel: [ 9184.278835]  do_unlinkat+0x263/0x300
May 18 05:53:39 nodeA kernel: [ 9184.279988]  do_syscall_64+0x53/0x110
May 18 05:53:39 nodeA kernel: [ 9184.281108]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 18 05:53:39 nodeA kernel: [ 9184.282244] RIP: 0033:0x7fed1d9c9a2a
May 18 05:53:39 nodeA kernel: [ 9184.283333] Code: Bad RIP value.
May 18 05:53:39 nodeA kernel: [ 9184.284414] RSP: 002b:00007ffd93dad900 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
May 18 05:53:39 nodeA kernel: [ 9184.285525] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fed1d9c9a2a
May 18 05:53:39 nodeA kernel: [ 9184.286610] RDX: 0000000000008000 RSI: 0000000000000002 RDI: 00007ffd93dadf63
May 18 05:53:39 nodeA kernel: [ 9184.287629] RBP: 00007ffd93dadf63 R08: 00000000ffffff9c R09: 00007ffd93dadf63
May 18 05:53:39 nodeA kernel: [ 9184.288628] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
May 18 05:53:39 nodeA kernel: [ 9184.289636] R13: 0000000000000001 R14: 00007fed1da07b48 R15: 0000000000000000

@Philipp-Reisner

Hi @beornf,

I am trying to reproduce the issue (without Docker, just an XFS). So far I am failing to reproduce what you are describing. In my logs it looks like this:

[543099.990742] drbd r0 d3: sock was shut down by peer
[543099.991520] drbd r0 d3: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
[543099.992618] drbd r0/0 drbd0 d3: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[543099.993885] drbd r0 d3: ack_receiver terminated
[543099.994562] drbd r0 d3: Terminating ack_recv thread
[543099.995799] drbd r0 d3: Terminating sender thread
[543099.998621] drbd r0 d3: Starting sender thread (from drbd_r_r0 [3720])
[543100.013776] drbd r0 d3: Connection closed
[543100.014797] drbd r0 d3: helper command: /sbin/drbdadm disconnected
[543100.017492] drbd r0 d2: PingAck did not arrive in time.
[543100.018103] drbd r0: susp-io( no -> quorum)
[543100.018571] drbd r0 d2: conn( Connected -> NetworkFailure ) peer( Secondary -> Unknown )
[543100.019440] drbd r0/0 drbd0: quorum( yes -> no )
[543100.019940] drbd r0/0 drbd0 d2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[543100.021005] drbd r0 d2: ack_receiver terminated
[543100.021529] drbd r0 d2: Terminating ack_recv thread
[543100.022509] drbd r0 d2: Terminating sender thread
[543100.023159] drbd r0 d2: Starting sender thread (from drbd_r_r0 [3719])
[543100.026202] drbd r0/0 drbd0: helper command: /sbin/drbdadm quorum-lost
[543100.029864] drbd r0/0 drbd0: helper command: /sbin/drbdadm quorum-lost exit code 0
[543100.030428] drbd r0 d3: helper command: /sbin/drbdadm disconnected exit code 127 (0x7f00)
[543100.031078] drbd r0 d3: conn( BrokenPipe -> Unconnected )
[543100.031493] drbd r0 d3: Restarting receiver thread
[543100.031834] drbd r0 d3: conn( Unconnected -> Connecting )
[543100.036416] drbd r0 d2: Connection closed
[543100.036864] drbd r0 d2: helper command: /sbin/drbdadm disconnected
[543100.045984] drbd r0 d2: helper command: /sbin/drbdadm disconnected exit code 127 (0x7f00)
[543100.047392] drbd r0 d2: conn( NetworkFailure -> Unconnected )
[543100.048112] drbd r0 d2: Restarting receiver thread
[543100.048727] drbd r0 d2: conn( Unconnected -> Connecting )

In this case I interrupted the connection using iptables -A OUTPUT -d x.x.x.x -j DROP. (I also tried drbdadm --force disconnect <res> on the peer side, and freezing the backing block device on the peer side.)

After restoring quorum, it looks like this in the logs:

[543265.377878] drbd r0 d3: Handshake to peer 3 successful: Agreed network protocol version 120
[543265.379403] drbd r0 d3: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[543265.381364] drbd r0 d3: Starting ack_recv thread (from drbd_r_r0 [3720])
[543265.457611] drbd r0: Preparing cluster-wide state change 1758432408 (1->3 499/145)
[543265.468689] drbd r0/0 drbd0 d3: drbd_sync_handshake:
[543265.469251] drbd r0/0 drbd0 d3: self 0CB18188DD9EC474:0000000000000000:0000000000000000:0000000000000000 bits:76 flags:520
[543265.470408] drbd r0/0 drbd0 d3: peer 0CB18188DD9EC474:0000000000000000:942F5D2215562A00:0000000000000000 bits:0 flags:1120
[543265.471605] drbd r0/0 drbd0 d3: uuid_compare()=no-sync by rule=reconnected
[543265.472346] drbd r0/0 drbd0 d3: bitmap content (76 bits)
[543265.473092] drbd r0: State change 1758432408: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFF5
[543265.474009] drbd r0: Committing cluster-wide state change 1758432408 (16ms)
[543265.474882] drbd r0: susp-io( quorum -> no)
[543265.475452] drbd r0 d3: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[543265.476523] drbd r0/0 drbd0: quorum( no -> yes )
[543265.477196] drbd r0/0 drbd0 d3: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[543268.388086] drbd r0 d2: Handshake to peer 2 successful: Agreed network protocol version 120
[543268.389658] drbd r0 d2: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[543268.391657] drbd r0 d2: Starting ack_recv thread (from drbd_r_r0 [3719])
[543268.513631] drbd r0: Preparing cluster-wide state change 4188264683 (1->2 499/145)
[543268.520736] drbd r0/0 drbd0 d2: drbd_sync_handshake:
[543268.521674] drbd r0/0 drbd0 d2: self 0CB18188DD9EC474:0000000000000000:0000000000000000:0000000000000000 bits:6249 flags:120
[543268.523625] drbd r0/0 drbd0 d2: peer 0CB18188DD9EC474:0000000000000000:942F5D2215562A00:0000000000000000 bits:0 flags:120
[543268.525634] drbd r0/0 drbd0 d2: uuid_compare()=no-sync by rule=reconnected
[543268.526442] drbd r0/0 drbd0 d2: bitmap content (6249 bits)
[543268.527139] drbd r0: State change 4188264683: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFF1
[543268.528050] drbd r0: Committing cluster-wide state change 4188264683 (14ms)
[543268.528958] drbd r0 d2: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[543268.530029] drbd r0/0 drbd0 d2: pdsk( DUnknown -> Outdated ) repl( Off -> Established )
[543268.533584] drbd r0/0 drbd0 d2: pdsk( Outdated -> UpToDate )

and rm simply unfreezes.
What seems different in your log is that the request timer fires: Remote failed to finish a request within 85624ms > ko-count (7) * timeout (60 * 0.1s). I am using the same value of 7 for ko-count. I will continue to look into this tomorrow. Please double check that you are on drbd-9.0.29 and do not have 9.0.28 loaded by mistake!
(Maybe I need to make one of the 3 nodes diskless, but that is something for tomorrow)

beornf commented May 18, 2021

Hi @Philipp-Reisner,

The output from /proc/drbd is:

version: 9.0.29-1 (api:2/proto:86-120)
GIT-hash: 9a7bc817880ab1ac800f4c53f2b832ddd5da87c5 build by root@nodeA, 2021-05-18 03:15:02
Transports (api:17): tcp (9.0.29-1)

I probably should have mentioned that the drbdadm status from nodeA is:

test role:Primary
  disk:Diskless
  nodeB role:Secondary
    peer-disk:UpToDate
  nodeC role:Secondary
    peer-disk:UpToDate

I'm running the XFS mount on a diskless node so it can be unmounted and mounted on another node quickly. I'll try reproducing the issue without Docker and compare logs.

Thanks,
Beorn

beornf commented May 19, 2021

I've done further testing by running drbdadm --force disconnect test on both peers. This time rm doesn't freeze; instead it triggers an XFS I/O error. A manual mount and a Docker volume both trigger the same error:

May 19 02:37:31 nodeA kernel: [46725.896654] drbd test nodeB: sock was shut down by peer
May 19 02:37:31 nodeA kernel: [46725.898244] drbd test/0 drbd1005: Would lose quorum, but using tiebreaker logic to keep
May 19 02:37:31 nodeA kernel: [46725.899778] drbd test nodeB: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
May 19 02:37:31 nodeA kernel: [46725.901320] drbd test/0 drbd1005 nodeB: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
May 19 02:37:31 nodeA kernel: [46725.902894] drbd test nodeB: ack_receiver terminated
May 19 02:37:31 nodeA kernel: [46725.904416] drbd test nodeB: Terminating ack_recv thread
May 19 02:37:31 nodeA kernel: [46725.908249] drbd test nodeB: Terminating sender thread
May 19 02:37:31 nodeA kernel: [46725.910031] drbd test nodeB: Starting sender thread (from drbd_r_test [4396])
May 19 02:37:31 nodeA kernel: [46725.911696] drbd test nodeB: Connection closed
May 19 02:37:31 nodeA kernel: [46725.913238] drbd test nodeB: helper command: /sbin/drbdadm disconnected
May 19 02:37:31 nodeA kernel: [46725.916464] drbd test nodeB: helper command: /sbin/drbdadm disconnected exit code 0
May 19 02:37:31 nodeA kernel: [46725.918232] drbd test nodeB: conn( BrokenPipe -> Unconnected )
May 19 02:37:31 nodeA kernel: [46725.919656] drbd test nodeB: Restarting receiver thread
May 19 02:37:31 nodeA kernel: [46725.921049] drbd test nodeB: conn( Unconnected -> Connecting )
May 19 02:37:36 nodeA kernel: [46730.817409] drbd test nodeC: sock was shut down by peer
May 19 02:37:36 nodeA kernel: [46730.818811] drbd test: susp-io( no -> quorum)
May 19 02:37:36 nodeA kernel: [46730.820172] drbd test nodeC: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
May 19 02:37:36 nodeA kernel: [46730.821572] drbd test/0 drbd1005: quorum( yes -> no )
May 19 02:37:36 nodeA kernel: [46730.822942] drbd test/0 drbd1005 nodeC: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
May 19 02:37:36 nodeA kernel: [46730.824370] drbd test nodeC: ack_receiver terminated
May 19 02:37:36 nodeA kernel: [46730.825753] drbd test nodeC: Terminating ack_recv thread
May 19 02:37:36 nodeA kernel: [46730.827146] drbd test/0 drbd1005 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr
May 19 02:37:36 nodeA kernel: [46730.830637] drbd test nodeC: Terminating sender thread
May 19 02:37:36 nodeA kernel: [46730.832054] drbd test nodeC: Starting sender thread (from drbd_r_test [4398])
May 19 02:37:36 nodeA kernel: [46730.834925] drbd test/0 drbd1005 nodeB: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
May 19 02:37:36 nodeA kernel: [46730.836347] drbd test/0 drbd1005 nodeC: helper command: /sbin/drbdadm pri-on-incon-degr
May 19 02:37:36 nodeA kernel: [46730.839387] drbd test/0 drbd1005 nodeC: helper command: /sbin/drbdadm pri-on-incon-degr exit code 0
May 19 02:37:36 nodeA kernel: [46730.840818] drbd test/0 drbd1005: helper command: /sbin/drbdadm quorum-lost
May 19 02:37:36 nodeA kernel: [46730.842839] drbd test nodeC: Connection closed
May 19 02:37:36 nodeA kernel: [46730.844300] drbd test nodeC: helper command: /sbin/drbdadm disconnected
May 19 02:37:36 nodeA kernel: [46730.847088] drbd test/0 drbd1005: helper command: /sbin/drbdadm quorum-lost exit code 0
May 19 02:37:36 nodeA kernel: [46730.849688] drbd test nodeC: helper command: /sbin/drbdadm disconnected exit code 0
May 19 02:37:36 nodeA kernel: [46730.851019] drbd test nodeC: conn( BrokenPipe -> Unconnected )
May 19 02:37:36 nodeA kernel: [46730.852323] drbd test nodeC: Restarting receiver thread
May 19 02:37:36 nodeA kernel: [46730.853655] drbd test nodeC: conn( Unconnected -> Connecting )
May 19 02:37:45 nodeA kernel: [46740.292677] XFS (drbd1005): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x8 len 8 error 5
May 19 02:37:45 nodeA kernel: [46740.294013] XFS (drbd1005): xfs_do_force_shutdown(0x1) called from line 300 of file fs/xfs/xfs_trans_buf.c.  Return address = 00000000bbc1a790
May 19 02:37:45 nodeA kernel: [46740.296652] XFS (drbd1005): metadata I/O error in "xlog_iodone" at daddr 0x1a10c len 64 error 5
May 19 02:37:45 nodeA kernel: [46740.298016] XFS (drbd1005): xfs_do_force_shutdown(0x2) called from line 1271 of file fs/xfs/xfs_log.c.  Return address = 00000000664d2879
May 19 02:37:45 nodeA kernel: [46740.300718] XFS (drbd1005): Log I/O Error Detected.  Shutting down filesystem
May 19 02:37:45 nodeA kernel: [46740.302097] XFS (drbd1005): Please umount the filesystem and rectify the problem(s)
May 19 02:37:45 nodeA kernel: [46740.303488] XFS (drbd1005): xfs_difree_inobt: xfs_inobt_lookup() returned error -5.

@Philipp-Reisner

Hi @beornf,

Good catch!
What happens here is that you have the settings:
[...]
on-no-data-accessible io-error; # default
on-no-quorum suspend-io;
[...]

It might be unintuitive, but the on-no-data setting has precedence in this case.

This needs to be mentioned in the documentation. (Or changed in the code if someone has good reasons)
...and LINSTOR needs to do this in a better way. (Well, I need to tell the LINSTOR devs how to do it...)

Still, what you posted on May 18 was something different. I will continue to investigate the May 18 issue. I just wanted to let you know what the reason for the IO error was when you are on a diskless primary...

beornf commented May 19, 2021

Hi @Philipp-Reisner,

I've set both settings to suspend-io to reproduce the May 18 issue. This time I mounted to a folder on the diskless primary at ~/demo.

After interrupting the connection with iptables -A OUTPUT -d x.x.x.x -j DROP, I ran cd ~/demo/data, which froze as expected.

On restoring the connection the command did not unfreeze. The relevant errors from the kernel were:

[  967.988763] INFO: task bash:2423 blocked for more than 120 seconds.
[  967.990382]       Tainted: G           OE     4.19.0-14-cloud-amd64 #1 Debian 4.19.171-2
[  967.992427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  967.994651] bash            D    0  2423   2422 0x00000004

@beornf beornf force-pushed the unmount-quorum-lost branch from f2f2ef7 to c14374f on July 7, 2021 13:10

beornf commented Jul 17, 2021

Hi @Philipp-Reisner,

Just wondering if the new DRBD versions 9.0.30-1 or 9.1.3 might have addressed the frozen XFS device issue.

Thanks,
Beorn

@Philipp-Reisner

Hi @beornf,

There was no patch merged in this regard.

But let me share what I did:
I created a new test for our test suite, called quorum-suspend-timeout2. It exercises roughly what is described here. I was able to reproduce an issue where DRBD reports a mismatching "barrier ack number" after unfreeze.
Then I had to switch to other tasks.
I will return to that, fix the bug, and bring both (the fix to DRBD and the new test) to the repositories where they belong.

Just to be sure: the stack trace you see in the kernel logs is not a problem or a bug. Since IO is frozen, a task that waits for IO completion is blocked for more than 120 seconds, and the kernel prints this warning with a full stack trace. That is expected and okay.
