It seems our NSD secondary has triggered some sort of intermittent bug.
After several weeks or months of running, nsd stops forking with the new zone data.
A manual nsd-control transfer some.zone. or even nsd-control force_transfer some.zone has no effect.
Only a restart of nsd solves the problem.
The logs give no hint other than that process nsd[3185521] stopped reporting updates:
[...]
Dec 27 05:41:04 sunic nsd[1413]: zone some.zone. serial 1735277408 is updated to 1735278004
Dec 27 05:46:55 sunic nsd[1413]: xfrd: zone some.zone. committed "received update to serial 1735278005 at 2024-12-27T05:46:55 from 2001:6b0:: TSIG verified with key tsig-sunet"
Dec 27 05:46:58 sunic nsd[3185521]: zone some.zone. received update to serial 1735278005 at 2024-12-27T05:46:55 from 2001:6b0:: TSIG verified with key tsig-sunet of 1369 bytes in 9.7e-05 seconds
Dec 27 05:47:02 sunic nsd[1413]: zone some.zone. serial 1735278004 is updated to 1735278005
Dec 27 05:07:59 sunic nsd[1413]: zone some.zone. serial 1735275607 is updated to 1735275608
Dec 27 05:10:42 sunic nsd[1413]: zone some.zone. serial 1735275608 is updated to 1735275609
Dec 27 05:11:04 sunic nsd[1413]: zone some.zone. serial 1735275609 is updated to 1735276204
[...]
And nsd-control zone_status says state: ok, which seems odd since several updates have failed to be applied to the zone (served-serial is still behind commit-serial):
zone: some.zone
state: ok
served-serial: "1735279204 since 2024-12-27T06:01:01"
commit-serial: "1735311605 since 2024-12-27T15:01:10"
wait: "3510 sec between attempts"
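The mismatch is visible in the status output itself: served-serial (1735279204) lags commit-serial (1735311605) while state still reads ok. Below is a minimal Python sketch of a check that flags such zones. It assumes the zone_status output keeps the "key: value" layout shown above, and that a commit-serial different from served-serial means xfrd has committed a transfer that has not been loaded yet. The helper names parse_zone_status, serial_of and stalled_zones are hypothetical and not part of NSD.

```python
import re

# Matches the leading serial in values such as
# "1735279204 since 2024-12-27T06:01:01" (opening quote optional).
SERIAL_RE = re.compile(r'"?(\d+)')


def parse_zone_status(text):
    """Split zone_status-style output into one dict per zone.

    Each "key: value" line is split on its first colon only, so the
    timestamps inside served-serial/commit-serial stay intact.
    """
    zones = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "zone":
            current = {"zone": value}
            zones.append(current)
        elif current is not None:
            current[key] = value
    return zones


def serial_of(value):
    """Return the serial at the start of a status value, or None."""
    match = SERIAL_RE.match(value or "")
    return int(match.group(1)) if match else None


def stalled_zones(zones):
    """Return (zone, served, committed) for every zone whose committed
    serial differs from the served one, i.e. a transfer that was
    committed but has not (yet) been loaded and served."""
    stalled = []
    for z in zones:
        served = serial_of(z.get("served-serial"))
        committed = serial_of(z.get("commit-serial"))
        if committed is not None and committed != served:
            stalled.append((z["zone"], served, committed))
    return stalled
```

The input text could come from running nsd-control zone_status periodically, for example from a monitoring job. A mismatch also exists briefly during every normal reload, so it is probably only worth alerting when the same zone is reported over several consecutive runs. The comparison is plain inequality rather than RFC 1982 serial-number arithmetic; that is enough to detect a mismatch, but not to tell which serial is newer.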
Unfortunately, we restarted nsd since nothing else worked, so we don't know whether nsd[3185521] died or was stuck in a defunct state.
We currently run nsd 4.11.0 based on Debian's build/package, but this issue has also happened with nsd 4.10.x.
# nsd -v
NSD version 4.11.0
Written by NLnet Labs.
Configure line: --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-configdir=/etc/nsd --with-nsd_conf_file=/etc/nsd/nsd.conf --with-pidfile=/run/nsd/nsd.pid --with-dbfile=/var/lib/nsd/nsd.db --with-zonesdir=/etc/nsd --with-xfrdfile=/var/lib/nsd/xfrd.state --disable-largefile --disable-recvmmsg --enable-root-server --enable-mmap --enable-ratelimit --enable-zone-stats --enable-systemd --enable-checking --enable-dnstap
Event loop: libevent 2.1.12-stable (uses epoll)
Linked with OpenSSL 3.0.2 15 Mar 2022
Copyright (C) 2001-2024 NLnet Labs. This is free software.
There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
Something perhaps less common about our nsd configuration is that we use nsd's cpu-affinity and also have catalog zones enabled. Let me know if more data is needed, and which data.
Thanks @pettai,
As @ttyS4 pointed out on the mailing list, this is indeed quite similar to issue #338.
I plan to implement some watchdog processes that can log, and potentially terminate, a stage of the reload process when it takes too long.
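A small timer-based sketch can illustrate the general idea of a watchdog that logs a reload stage running too long. This is not NSD's implementation: NSD is written in C and the planned watchdog is not described here. The sketch only shows the pattern in Python, assuming one named stage and one timeout per call. stage_watchdog and its on_timeout hook are hypothetical names.

```python
import logging
import threading
from contextlib import contextmanager

log = logging.getLogger("reload-watchdog")


@contextmanager
def stage_watchdog(stage, timeout, on_timeout=None):
    """Warn if the wrapped stage is still running after `timeout` seconds.

    Yields a threading.Event that is set once the deadline has passed.
    `on_timeout` (hypothetical hook) is called with the stage name and is
    where a real watchdog might decide to terminate the stuck stage.
    """
    expired = threading.Event()

    def _fire():
        expired.set()
        log.warning("reload stage %r still running after %.1f s", stage, timeout)
        if on_timeout is not None:
            on_timeout(stage)

    timer = threading.Timer(timeout, _fire)
    timer.daemon = True  # never keep the process alive just for the watchdog
    timer.start()
    try:
        yield expired
    finally:
        timer.cancel()  # stage finished (or raised): disarm the watchdog
```

The timer runs in its own thread, so it still fires while the stage itself is blocked, which is the situation a hung reload would produce. The callback could, for example, signal a stuck child process. Whether terminating it is safe depends on what that stage was doing at the time.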