-
Couldn't load subscription status.
- Fork 597
Clear suppressed notifications only after the TimePeriod is resumed #10613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a6ba476 to
20560af
Compare
|
|
||
| if ((suppressedTypes & type) && !checkable->NotificationReasonApplies(type)) { | ||
| subtract |= type; | ||
| suppressedTypes &= ~type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't read the issue, but the fix shouldn't hurt. After all, this is also the code branch where notifications are sent.
1fd58ee to
22d7a0d
Compare
c08038d to
fea4554
Compare
fea4554 to
9fa3acc
Compare
This includes a few common scenarios and a reproduction of the current behavior affected by the underlying bug of issue #10575. This is done both to document the change in behavior, as well as to ensure the behavior of the other scenarios stays the same before and after the fix is applied.
Without this commit, every time the NotificationTimerHandler runs it will discard notifications that don't apply to the reason of the latest check result. This is probably intended to clear outdated suppressed notifications immediately when the TimePeriod resumes, but it also clears out important ones (see the test case). This commit fixes that by clearing out inapplicable notifications when the timer runs the first time after the TimePeriod resumes. By that time we can expect that no new suppressed notifications will be added and all notifications that don't conflict with the last check-result can still be run. Fixes #10575
9fa3acc to
41b5383
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely certain what the purpose of that check is, but the only scenario I could think of is when a
TimePeriodbegins and a check result produces a notification beforeNotificationTimerHandlercan run and send out the suppressed notifications.
That's pretty much the reasoning behind it. Its purpose is to avoid sending notifications that are no longer applicable, e.g. a problem notification when the checkable has already recovered. For instance, if a problem notification was suppressed due to being outside the time period, but recovers right after the time period starts and before the NotificationTimerHandler runs, it won't clear the suppressed problem notification, thus sending out the suppressed problem notification would be incorrect.
| if (!(suppressedTypes & type) || checkable->NotificationReasonSuppressed(type)) | ||
| continue; | ||
|
|
||
| if ((suppressedTypes & type) && !checkable->NotificationReasonApplies(type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first check of this condition is superfluous and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and the whole if (suppressedTypes) ... surrounding this loop has also become redundant, thus you can either remove the whole if or the other if (!suppressedTypes) ... at the beginning of this function. Your choice.
| notification-notificationcomponent.cpp | ||
| remote-certificate-fixture.cpp | ||
| remote-filterutility.cpp | ||
| remote-configpackageutility.cpp | ||
| remote-httpserverconnection.cpp | ||
| remote-httpmessage.cpp | ||
| remote-url.cpp | ||
| ${base_OBJS} | ||
| $<TARGET_OBJECTS:config> | ||
| $<TARGET_OBJECTS:remote> | ||
| $<TARGET_OBJECTS:icinga> | ||
| $<TARGET_OBJECTS:methods> | ||
| $<TARGET_OBJECTS:notification> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add these dependencies conditionally based on the ICINGA2_WITH_NOTIFICATION variable.
| // black magic | ||
| template<auto prvPtr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's indeed a black magic, but describing it a bit more what it does and it's used for would help readers. Also, does the abbreviated prv of the template param name stand for private or previous? Maybe use a bit of descriptive name instead.
| template<auto prvPtr> | ||
| struct InvokeTimerHandlerImpl | ||
| { | ||
| friend void InvokeTimerHandler(const NotificationComponent::Ptr& nc) { return ((*nc).*prvPtr)(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| friend void InvokeTimerHandler(const NotificationComponent::Ptr& nc) { return ((*nc).*prvPtr)(); } | |
| friend void InvokeTimerHandler(const NotificationComponent::Ptr& nc) { return (*nc.*prvPtr)(); } |
| // Store the old periods from the config snippets to reuse them later. | ||
| m_AllTimePeriod = m_TimePeriod->GetRanges(); | ||
|
|
||
| Service::OnNotificationSentToUser.connect(NotificationSentToUserHandler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using service to register the signal here looks odd! Why not just use Checkable instead?
| // Rerunning the timer before the next interval should not trigger a reminder notification. | ||
| NotificationTimerHandler(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the interval set to 0.15 at the beginning of this test case? How can you be sure that it's going to call this function before the 15ms interval has passed? Apart from that, the WaitForExpectedNotificationCount call above can block 5ms in the worst case, so I don't get how this all fit with what this comment says.
If I understand what this test is supposed to achieve, then I would just call NotificationTimerHandler() repeatedly before performing any assertions above instead, so you only get a very tight time window for race conditions and these assertions can be merged with the above ones.
Bug Description 🐞
When multiple state changes happen outside a
TimePeriod,NotificationComponent::FireSuppressedNotifications()will clear notifications that don't apply too soon, leading the state of the suppressed notifications to become inconsistent.icinga2/lib/notification/notificationcomponent.cpp
Lines 86 to 91 in 5d46ca4
Ultimately this will cause any suppressed notifications to be discarded when the
TimePeriodresumes.Fix Description 🔧
The trivial fix is to just move the check on
NotificationReasonApplies()inside thetp->IsInside()condition, thereby removing them after theTimePeriodresumes and leaving the list of suppressed notifications intact until no further notifications will be added to it.I'm not entirely certain what the purpose of that check is, but the only scenario I could think of is when a
TimePeriodbegins and a check result produces a notification beforeNotificationTimerHandlercan run and send out the suppressed notifications. I've added a unit-test to ensure that behavior is the same as before.@Al2Klimov, since you're the one who wrote the original code, maybe you can give other scenarios where this check might be affecting and that could potentially stop working with this PR. In that case we can add additional test-cases and look for a more complex fix.
Unit-Tests ✅
I've added tests for some common scenarios including the bugged behavior of the issue this PR fixes. This way the behavior of the
NotificationComponentcan be verified before and after this PR. I'm happy to add additional test-cases for behavior that might be affected by this issue and the fix.Fixes #10575