On rack2 I noticed that only one switch had an active BGP session. BGP was configured in the port settings, so I looked at the maghemite logs and found the following. Within the same second, we send an apply request with a peer followed by an apply request with no peers, which causes mgd to flap the peer.
02:11:17.844Z INFO slog-rs: apply: ApplyRequest {
asn: 65002,
originate: [
Prefix4 {
value: 172.20.26.0,
length: 24,
},
],
checker: None,
shaper: None,
peers: {
"qsfp18": [
BgpPeerConfig {
host: 172.20.15.43:179,
name: "172.20.15.43",
hold_time: 6,
idle_hold_time: 0,
delay_open: 3,
connect_retry: 3,
keepalive: 2,
resolution: 100,
passive: false,
remote_asn: None,
min_ttl: None,
md5_auth_key: None,
multi_exit_discriminator: None,
communities: [],
local_pref: None,
enforce_first_as: false,
allow_import: NoFiltering,
allow_export: Allow(
{
V4(
Prefix4 {
value: 172.20.26.0,
length: 24,
},
),
},
),
vlan_id: None,
},
],
},
}
02:11:17.844Z INFO slog-rs: nbr: current []
02:11:17.844Z INFO slog-rs: nbr: adding [
Nbr {
addr: 172.20.15.43,
asn: 65002,
},
]
02:11:17.844Z INFO slog-rs: nbr: removing []
02:11:17.844Z INFO slog-rs: add neighbor: Neighbor {
asn: 65002,
name: "172.20.15.43",
host: 172.20.15.43:179,
hold_time: 6,
idle_hold_time: 0,
delay_open: 3,
connect_retry: 3,
keepalive: 2,
resolution: 100,
group: "qsfp18",
passive: false,
remote_asn: None,
min_ttl: None,
md5_auth_key: None,
multi_exit_discriminator: None,
communities: [],
local_pref: None,
enforce_first_as: false,
allow_import: NoFiltering,
allow_export: Allow(
{
V4(
Prefix4 {
value: 172.20.26.0,
length: 24,
},
),
},
),
vlan_id: None,
}
02:11:17.844Z INFO slog-rs: spawning new session
02:11:17.844Z DEBG slog-rs: [172.20.15.43] starting peer state machine
02:11:17.844Z INFO slog-rs: [172.20.15.43] idle -> connect
02:11:17.844Z INFO slog-rs: request completed
latency_us = 351
local_addr = [::]:4676
method = POST
remote_addr = [fd00:1122:3344:103::3]:56172
req_id = d3ad7f7f-c7ec-4262-a43f-1277832b3187
response_code = 204
unit = api-server
uri = /bgp/omicron/apply
02:11:17.844Z INFO slog-rs: apply: ApplyRequest {
asn: 65002,
originate: [
Prefix4 {
value: 172.20.26.0,
length: 24,
},
],
checker: None,
shaper: None,
peers: {},
}
02:11:17.844Z INFO slog-rs: nbr: current [
BgpNeighborInfo {
asn: 65002,
name: "172.20.15.43",
host: 172.20.15.43:179,
hold_time: 6,
idle_hold_time: 0,
delay_open: 3,
connect_retry: 3,
keepalive: 2,
resolution: 100,
group: "qsfp18",
passive: false,
remote_asn: None,
min_ttl: None,
md5_auth_key: None,
multi_exit_discriminator: None,
communities: [],
local_pref: None,
enforce_first_as: false,
allow_import: NoFiltering,
allow_export: Allow(
{
V4(
Prefix4 {
value: 172.20.26.0,
length: 24,
},
),
},
),
vlan_id: None,
},
]
02:11:17.844Z INFO slog-rs: nbr: adding []
02:11:17.844Z INFO slog-rs: nbr: removing [
Nbr {
addr: 172.20.15.43,
asn: 65002,
},
]
02:11:17.844Z INFO slog-rs: remove neighbor: 172.20.15.43
02:11:17.844Z INFO slog-rs: request completed
error_message_external = Not Found
error_message_internal = not found: no bgp router configured
latency_us = 139
local_addr = [::]:4676
method = POST
remote_addr = [fd00:1122:3344:103::3]:56172
req_id = 8c0c7e35-30e0-471a-b885-620afcc33f12
response_code = 404
unit = api-server
uri = /bgp/omicron/apply
02:11:17.845Z INFO slog-rs: apply: ApplyRequest {
asn: 65002,
originate: [
Prefix4 {
value: 172.20.26.0,
length: 24,
},
],
checker: None,
shaper: None,
peers: {},
}
02:11:17.845Z INFO slog-rs: request completed
error_message_external = Not Found
error_message_internal = not found: no bgp router configured
latency_us = 27
local_addr = [::]:4676
method = POST
remote_addr = [fd00:1122:3344:103::3]:56172
req_id = c3ed8177-4140-481e-8f85-f3c930414093
response_code = 404
unit = api-server
uri = /bgp/omicron/apply
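To illustrate why the empty request tears the session down, here is a minimal sketch of the reconciliation that the `nbr: current` / `nbr: adding` / `nbr: removing` log lines suggest. This is not mgd's actual code: the `Nbr` struct only mirrors the fields printed in the log, and the `diff` helper is a hypothetical name. It assumes the apply handler compares the configured neighbors against the requested set using plain set differences.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Hypothetical stand-in for the `Nbr { addr, asn }` entries seen in the log.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Nbr {
    addr: IpAddr,
    asn: u32,
}

/// Assumed reconciliation: returns (adding, removing) as set differences
/// between the requested neighbors and the currently configured ones.
fn diff(current: &BTreeSet<Nbr>, requested: &BTreeSet<Nbr>) -> (Vec<Nbr>, Vec<Nbr>) {
    let adding = requested.difference(current).cloned().collect();
    let removing = current.difference(requested).cloned().collect();
    (adding, removing)
}

fn main() {
    let peer = Nbr {
        addr: "172.20.15.43".parse().unwrap(),
        asn: 65002,
    };
    let mut current = BTreeSet::new();

    // First apply: the request carries one peer, so it is added.
    let requested = BTreeSet::from([peer.clone()]);
    let (adding, removing) = diff(&current, &requested);
    println!("adding {:?}, removing {:?}", adding, removing);
    current = requested;

    // Second apply: `peers: {}` is treated as the full desired state,
    // so the peer that was just added is removed again.
    let requested = BTreeSet::new();
    let (adding, removing) = diff(&current, &requested);
    println!("adding {:?}, removing {:?}", adding, removing);
}
```

Under that assumption, any request with an empty peer map removes every existing neighbor, so interleaving it with a populated request would flap the session as seen above.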