nsd stops forking with new zone data #417

Open
pettai opened this issue Dec 27, 2024 · 1 comment

pettai commented Dec 27, 2024

It seems our NSD secondary has hit some sort of intermittent bug.
After several weeks or months of running, nsd stops forking with the new zone data.

A manual nsd-control transfer some.zone. or even nsd-control force_transfer some.zone. won't work.
Only a restart of nsd solves the problem.

The logs don't give a hint, other than that process nsd[3185521] stopped reporting updates.

[...]
Dec 27 05:41:04 sunic nsd[1413]: zone some.zone. serial 1735277408 is updated to 1735278004
Dec 27 05:46:55 sunic nsd[1413]: xfrd: zone some.zone. committed "received update to serial 1735278005 at 2024-12-27T05:46:55 from 2001:6b0:: TSIG verified with key tsig-sunet"
Dec 27 05:46:58 sunic nsd[3185521]: zone some.zone. received update to serial 1735278005 at 2024-12-27T05:46:55 from 2001:6b0:: TSIG verified with key tsig-sunet of 1369 bytes in 9.7e-05 seconds
Dec 27 05:47:02 sunic nsd[1413]: zone some.zone. serial 1735278004 is updated to 1735278005
Dec 27 05:07:59 sunic nsd[1413]: zone some.zone. serial 1735275607 is updated to 1735275608
Dec 27 05:10:42 sunic nsd[1413]: zone some.zone. serial 1735275608 is updated to 1735275609
Dec 27 05:11:04 sunic nsd[1413]: zone some.zone. serial 1735275609 is updated to 1735276204
[...]

And nsd-control zonestatus says state: ok (which seems odd, since several updates have failed to update the zone).

zone:	some.zone
	state: ok
	served-serial: "1735279204 since 2024-12-27T06:01:01"
	commit-serial: "1735311605 since 2024-12-27T15:01:10"
	wait: "3510 sec between attempts"
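
For monitoring, the gap between the two serials above can be checked automatically. The following is a minimal sketch, not part of NSD: it assumes the text layout shown in the output above, assumes that commit-serial is the serial xfrd has committed and served-serial is the one actually being answered, and uses plain subtraction (it ignores RFC 1982 serial wrap-around). The function names are hypothetical.

```python
import re

# Sample copied from the status output in this report.
SAMPLE = (
    "zone:\tsome.zone\n"
    "\tstate: ok\n"
    '\tserved-serial: "1735279204 since 2024-12-27T06:01:01"\n'
    '\tcommit-serial: "1735311605 since 2024-12-27T15:01:10"\n'
    '\twait: "3510 sec between attempts"\n'
)


def parse_zonestatus(text):
    """Turn the `key: value` lines of one zone block into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def leading_serial(value):
    """Return the integer at the start of a field value, or None."""
    match = re.match(r"(\d+)", value)
    return int(match.group(1)) if match else None


def serial_lag(fields):
    """Committed serial minus served serial (assumption: no wrap-around)."""
    served = leading_serial(fields.get("served-serial", ""))
    commit = leading_serial(fields.get("commit-serial", ""))
    if served is None or commit is None:
        return None
    return commit - served


if __name__ == "__main__":
    lag = serial_lag(parse_zonestatus(SAMPLE))
    print(f"served serial is behind committed serial by {lag}")
```

A nonzero lag that persists across several checks would point at the situation described here, even while state still reads ok.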

Unfortunately, we restarted nsd since nothing else worked.
So we don't know whether nsd[3185521] died or ended up in a defunct state.
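
For the next occurrence, the state of the child could be captured before restarting. A minimal sketch, assuming Linux and the /proc/<pid>/stat layout, where the state letter follows the parenthesised command name (Z means zombie/defunct, D uninterruptible sleep, S sleeping, R running). The helper names are hypothetical.

```python
def parse_proc_state(stat_line):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def process_state(pid):
    """Return the state letter for pid, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as handle:
            return parse_proc_state(handle.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


if __name__ == "__main__":
    # 3185521 is the PID from the log excerpt in this report.
    print(process_state(3185521))
```

A result of None means the process has exited; Z means it is defunct and waiting to be reaped by its parent.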

Currently we run nsd 4.11.0 from Debian's package, but this issue has happened with nsd 4.10.x too.

# nsd -v
NSD version 4.11.0
Written by NLnet Labs.

Configure line: --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-configdir=/etc/nsd --with-nsd_conf_file=/etc/nsd/nsd.conf --with-pidfile=/run/nsd/nsd.pid --with-dbfile=/var/lib/nsd/nsd.db --with-zonesdir=/etc/nsd --with-xfrdfile=/var/lib/nsd/xfrd.state --disable-largefile --disable-recvmmsg --enable-root-server --enable-mmap --enable-ratelimit --enable-zone-stats --enable-systemd --enable-checking --enable-dnstap
Event loop: libevent 2.1.12-stable (uses epoll)
Linked with OpenSSL 3.0.2 15 Mar 2022

Copyright (C) 2001-2024 NLnet Labs.  This is free software.
There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.

Something that is perhaps less common about our nsd configuration is that we use nsd's cpu-affinity and also have catalog zones enabled. Let me know if more data (and what data) is needed.

wtoorop commented Dec 30, 2024

Thanks @pettai,
As @ttyS4 pointed out on the mailing list, this is indeed quite similar to issue #338.
I plan to implement some watchdog processes that can log, and potentially terminate, a stage in the reload process when it is taking too long.
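
To illustrate the general idea only (this is not NSD code and not the planned implementation), a watchdog around a stage that runs as a child process could look like the sketch below. The function name, the thresholds, and the use of a separate command per stage are all assumptions made for the example.

```python
import logging
import subprocess
import sys
import time

log = logging.getLogger("watchdog")


def run_stage(name, argv, warn_after, kill_after, poll=0.05):
    """Run one stage as a child process under a simple watchdog.

    Logs a warning once the stage has run for warn_after seconds and
    terminates it after kill_after seconds. Returns the exit code, or
    None if the watchdog had to terminate the stage.
    """
    start = time.monotonic()
    child = subprocess.Popen(argv)
    warned = False
    while True:
        try:
            return child.wait(timeout=poll)
        except subprocess.TimeoutExpired:
            elapsed = time.monotonic() - start
        if elapsed >= kill_after:
            log.error("stage %s stuck for %.1fs, terminating", name, elapsed)
            child.terminate()
            child.wait()
            return None
        if elapsed >= warn_after and not warned:
            log.warning("stage %s is taking long (%.1fs)", name, elapsed)
            warned = True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Hypothetical stuck stage: a child that sleeps far past the limit.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_stage("reload", stuck, warn_after=0.5, kill_after=1.0))
```

Logging before terminating matters here: the report above had no log line explaining why the child stopped, which is the gap a watchdog would close.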
