
net/frr: OSPF CARP interface costs don't survive a service restart #4702



Open
3 tasks done
AndyX90 opened this issue May 14, 2025 · 6 comments · May be fixed by #4712

Comments

@AndyX90
Contributor

AndyX90 commented May 14, 2025

Important notices
Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug
The FRR plugin can interact with CARP in three ways:

  1. CARP failover mode
  2. CARP demote
  3. Influence interface cost based on CARP

This issue is about number 3.

If you select a CARP VIP for an OSPF interface to depend on, this works as long as the FRR daemons keep running.
If the daemons on the firewall in CARP state BACKUP stop and start again, they come up with the normal interface costs instead of the demoted ones.
Both firewalls then advertise the same path costs, which causes routing problems.
To get it working again, you have to move the VIPs; the costs are then corrected on both firewalls.
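To illustrate why the restart matters, here is a simplified Python sketch of how a neighbor chooses between the two firewalls: OSPF prefers the lowest cost, and when costs tie the paths are treated as equal-cost. It compares only the interface cost, not full path costs. The demoted cost of 1000 comes from the ospfd log later in this thread; the normal cost of 10 and the names fw1/fw2 are hypothetical.

```python
from typing import Dict, List

NORMAL_COST = 10     # hypothetical normal interface cost
DEMOTED_COST = 1000  # demoted cost seen in the frr_carp log in this thread


def interface_cost(carp_state: str) -> int:
    """Cost a firewall should advertise: demoted unless it is CARP MASTER."""
    return NORMAL_COST if carp_state == "MASTER" else DEMOTED_COST


def best_next_hops(costs: Dict[str, int]) -> List[str]:
    """Return every next hop sharing the lowest cost (a tie means equal-cost paths)."""
    lowest = min(costs.values())
    return sorted(hop for hop, cost in costs.items() if cost == lowest)


# Before the restart: fw1 is BACKUP and advertises the demoted cost.
healthy = {"fw1": interface_cost("BACKUP"), "fw2": interface_cost("MASTER")}
print(best_next_hops(healthy))    # ['fw2']

# After the restart: fw1 comes back with the normal cost, so the costs tie.
restarted = {"fw1": NORMAL_COST, "fw2": interface_cost("MASTER")}
print(best_next_hops(restarted))  # ['fw1', 'fw2']
```

In the second case traffic may be spread across the MASTER and the BACKUP firewall, which matches the routing problems described above.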

Manually running python3 /usr/local/opnsense/scripts/frr/carp_event_handler also applies the correct costs.

To Reproduce
Steps to reproduce the behavior:

  1. Use two OPNsense firewalls in an HA setup
  2. Add an interface to OSPF, select a CARP VIP to track ("depend on (carp)"), and set both the interface cost and the demoted cost
  3. Move the VIPs to the second firewall (the first firewall, now in CARP state BACKUP, receives the demoted costs)
  4. Reboot the first firewall, which is now in CARP state BACKUP
  5. After the reboot, the backup firewall comes up with the interface's default costs instead of the demoted ones

Expected behavior
The firewall in CARP state BACKUP should receive the demoted costs after a reboot (or a service restart), not the default ones.

Screenshots
none

Relevant log files
Error on system startup:
>>> Error in start script '50-frr'

Additional context
Interestingly, this log line ("Starting CARP event handler now") never appears.

Environment
OPNsense 25.1.6_4 (amd64).
Virtual testing appliance

@AndyX90
Contributor Author

AndyX90 commented May 15, 2025

The startup error is not the cause of this issue. It comes from this startup hook, which I think can be removed, but it is unrelated here.

The cause of this issue is that the start_postcmd override no longer works.

Does anyone have an idea how to solve this elegantly?

@AdSchellevis
Member

@AndyX90 I suppose you mean the script parts inside postcmd; when I place an echo at that spot, it does print output on start.

@AndyX90
Copy link
Contributor Author

AndyX90 commented May 15, 2025

@AdSchellevis You are totally right, the override works.
I did the same test, but I must have overlooked the log entry.
In its current form, "Starting CARP event handler now" never appears, but carp_event_handler fires 9 times at startup on my side.

The relevant log line is also present in FRR: "ospfd demote interface vtnet2 (cost 1000)", but it has no effect.
Maybe it gets overridden afterwards.

2025-05-15T17:50:05+02:00 fw.localdomain zebra 97570 - [QS0NJ-H5QKJ] Zebra final shutdown
2025-05-15T17:50:28+02:00 fw.localdomain frr_carp 58103 - FRR received carp configuration event
2025-05-15T17:50:29+02:00 fw.localdomain frr_carp 47538 - FRR received carp configuration event.
2025-05-15T17:50:29+02:00 fw.localdomain frr_carp 47538 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:29+02:00 fw.localdomain ospfd 47422 - [YWPB2-VEAQY] ASBR[default:Status:1]: Update
2025-05-15T17:50:29+02:00 fw.localdomain zebra 46576 -[VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:29+02:00 fw.localdomain ospfd 47422 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:30+02:00 fw.localdomain ospfd 47422 - [S5PCG-77H23] Packet[DD]: Neighbor 192.168.122.92 Negotiation done (Master).
2025-05-15T17:50:30+02:00 fw.localdomain frr_carp 58103 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:30+02:00 fw.localdomain frr_carp 58103 - ospfd demote interface vtnet2 (cost 1000).
2025-05-15T17:50:31+02:00 fw.localdomain frr_carp 51073 - FRR received carp configuration event.
2025-05-15T17:50:31+02:00 fw.localdomain frr_carp 51073 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:31+02:00 fw.localdomain frr_carp 56837 - FRR received carp configuration event.
2025-05-15T17:50:31+02:00 fw.localdomain frr_carp 56837 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 64780 - FRR received carp configuration event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 64780 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 70775 - FRR received carp configuration event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 70775 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 76358 - FRR received carp configuration event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 76358 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 82511 - FRR received carp configuration event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 82511 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 89237 - FRR received carp configuration event.
2025-05-15T17:50:32+02:00 fw.localdomain frr_carp 89237 - FRR trigger OspfdEventHandler event.
2025-05-15T17:50:34+02:00 fw.localdomain watchfrr 37123 - [QDG3Y-BY5TN] zebra state -> up : connect succeeded
2025-05-15T17:50:34+02:00 fw.localdomain watchfrr 37123 - [QDG3Y-BY5TN] ospfd state -> up : connect succeeded
2025-05-15T17:50:34+02:00 fw.localdomain watchfrr 37123 - [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
2025-05-15T17:50:34+02:00 fw.localdomain kernel - <118>WARNING: Old rc.d/watchfrr detected, this file must be deleted
2025-05-15T17:50:34+02:00 fw.localdomain kernel - <118>Checking intergrated config...
2025-05-15T17:50:34+02:00 fw.localdomain kernel - <118>watchfrr already running? (pid=37123).
2025-05-15T17:50:34+02:00 fw.localdomain kernel - <118>>>> Error in start script '50-frr'
2025-05-15T17:50:34+02:00 fw.localdomain zebra 46576 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:34+02:00 fw.localdomain ospfd 47422 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:34+02:00 fw.localdomain ospfd 47422 - [JPMW2-G68GC] Zebra[Redistribute]: distribute-list update timer fired!
2025-05-15T17:50:34+02:00 fw.localdomain zebra 46576 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:34+02:00 fw.localdomain ospfd 47422 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
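To check the "overridden afterwards" theory against a log like the one above, here is a small Python sketch that counts the CARP configuration events and lists what ospfd and watchfrr logged after the first demote line. The field layout is inferred from the lines in this thread, and the helper names are hypothetical. An ospfd configuration read after the demote is consistent with, but does not prove, the demoted cost being overwritten.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# A few lines copied verbatim from the log above; pass the full syslog in practice.
SAMPLE = """\
2025-05-15T17:50:29+02:00 fw.localdomain frr_carp 47538 - FRR received carp configuration event.
2025-05-15T17:50:29+02:00 fw.localdomain ospfd 47422 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:30+02:00 fw.localdomain frr_carp 58103 - ospfd demote interface vtnet2 (cost 1000).
2025-05-15T17:50:31+02:00 fw.localdomain frr_carp 51073 - FRR received carp configuration event.
2025-05-15T17:50:34+02:00 fw.localdomain watchfrr 37123 - [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
2025-05-15T17:50:34+02:00 fw.localdomain ospfd 47422 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2025-05-15T17:50:34+02:00 fw.localdomain ospfd 47422 - [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
"""

# Layout as seen above: timestamp, host, program, optional pid, "-", message.
# The pid is optional because the kernel lines have none.
LINE = re.compile(r"^(\S+) \S+ (\S+) (?:\d+ )?- ?(.*)$")


def parse(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, program, message) for every line that matches LINE."""
    records = []
    for raw in lines:
        m = LINE.match(raw.strip())
        if m:
            records.append((m.group(1), m.group(2), m.group(3)))
    return records


def analyse(lines: Iterable[str]) -> Dict[str, object]:
    """Count CARP events and report what ospfd/watchfrr logged after the first demote."""
    records = parse(lines)
    carp_events = sum(
        1 for _, prog, msg in records
        if prog == "frr_carp" and msg.startswith("FRR received carp configuration event")
    )
    demote_idx: Optional[int] = next(
        (i for i, (_, prog, msg) in enumerate(records)
         if prog == "frr_carp" and "demote interface" in msg),
        None,
    )
    after = records[demote_idx + 1:] if demote_idx is not None else []
    return {
        "carp_events": carp_events,
        "demote_found": demote_idx is not None,
        # ospfd re-reading its configuration after the demote could reset the cost.
        "ospfd_reloads_after_demote": [
            ts for ts, prog, msg in after
            if prog == "ospfd" and "Configuration Read in" in msg
        ],
        "startup_complete_after_demote": any(
            prog == "watchfrr" and "all daemons up" in msg for _, prog, msg in after
        ),
    }


result = analyse(SAMPLE.splitlines())
print(result["carp_events"], result["ospfd_reloads_after_demote"])
```

Fed with the full log above, the counter reports the 9 "FRR received carp configuration event" lines mentioned earlier, and both the watchfrr startup-complete notification and the later ospfd configuration reads show up after the demote.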

@AndyX90 AndyX90 linked a pull request May 17, 2025 that will close this issue
@AndyX90
Contributor Author

AndyX90 commented May 18, 2025

To summarize:
Our CARP startup logic dates from a time when watchfrr was not used and the routing daemons were started directly through rc. Nowadays watchfrr is enabled by default (and recommended) and manages the daemons.
More information on the current startup behavior:
https://cgit.freebsd.org/ports/tree/net/frr8/files/frr.in#n15
https://cgit.freebsd.org/ports/tree/net/frr8/files/watchfrr.in#n25
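One possible direction, sketched only as an illustration and not as what PR #4712 actually does: since watchfrr now starts the daemons, re-apply the CARP-based costs once the daemons are up. The readiness check is a caller-supplied function (hypothetically, one that looks for watchfrr's "all daemons up" message); apply_after_startup and run_handler are hypothetical names, and the handler path is the one from the original report.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Handler path from the original report; running it by hand re-applies the correct costs.
CARP_HANDLER: Sequence[str] = ("python3", "/usr/local/opnsense/scripts/frr/carp_event_handler")


def run_handler() -> int:
    """Run the CARP event handler once and return its exit status."""
    return subprocess.run(list(CARP_HANDLER), check=False).returncode


def apply_after_startup(
    daemons_up: Callable[[], bool],
    action: Callable[[], int] = run_handler,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> Optional[int]:
    """Poll daemons_up() until it returns True, then call action() exactly once.

    Returns action()'s result, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if daemons_up():
            return action()
        time.sleep(interval)
    return None
```

Polling keeps the sketch independent of how watchfrr signals readiness; the point is only the ordering, i.e. that the costs are applied after the daemons watchfrr starts are running rather than before.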

By the way, I noticed two unrelated cosmetic issues:

  • the failing startup hook introduced in f26a704; I think this can be removed nowadays?
  • FRR complains about an old watchfrr file

@Monviech
Member

For reference, the old watchfrr file should be removed automatically when setup.sh is called:

#4552

@devinbarry

This is probably not related to the original issue, but I am also seeing "Error in start script '50-frr'" and "WARNING: Old rc.d/watchfrr detected, this file must be deleted". I am not using CARP, just OSPF between a few routers.

Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants