
Conversation

@yhabteab (Member) commented Sep 22, 2025

So, after the long discussions in #10082 that didn't lead to a final conclusion, here's a minimal implementation without too many refactorings or changes to existing logic. This PR changes the checker's queueing behaviour a bit so that it never reschedules checkables with ongoing checks. This way we avoid piling up checkables with running checks in the queue that would never be executed anyway. Since this change affects a critical part of Icinga 2, we should be very careful not to introduce any regressions or performance degradations. So far I've let it run some checks with 500ms intervals and didn't notice any issues (see the attached graph); the small hiccup at "10:35" was caused by some API requests I triggered myself. Another behaviour change: previously the checker didn't trigger the now-dropped Checkable::OnNextCheckUpdated signal when active checks were disabled. However, since the Icinga DB SendNextUpdate function already sends a ZREM Redis command when active checks are disabled, the checker now triggers the Checkable::OnNextCheckChanged signal unconditionally.

[Attached plot: Icinga 2]

As for the individual changes, each commit explains what it does, why it's needed, and some more details about the reasoning behind it. So, please have a look at the individual commits.

Tests

Just like before, a simple /v1/actions/reschedule-check request triggers two icinga:nextupdate:host events.

docker exec -i redis redis-cli -p 6380 -n 12 monitor
OK
----
1758547635.252940 [12 172.18.0.1:57800] "HSET" "icinga:host:state" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "{\"affects_children\":false,\"check_attempt\":1,\"check_source\":\"mbp-yhabteab.fritz.box\",\"check_timeout\":60000,\"environment_id\":\"30e34a7a6c432baf29e051ed428aee3825f25e67\",\"execution_time\":0,\"hard_state\":0,\"host_id\":\"4c80e4ad47034480b3c3bb28885970b6bb50983e\",\"id\":\"4c80e4ad47034480b3c3bb28885970b6bb50983e\",\"in_downtime\":false,\"is_acknowledged\":false,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":true,\"is_reachable\":true,\"is_sticky_acknowledgement\":false,\"last_state_change\":1758545183248,\"last_update\":1758545183248,\"latency\":0,\"next_check\":1758547635218,\"next_update\":1758547695218,\"normalized_performance_data\":\"wrong=85;0;0;0 ok=1;0;0;0 max=215.600000;0;0;0\",\"output\":\" /dev/dm-13: 1.0MiB/s read1, 4.3KiB/s write1, 1.0MiB/s total, 215.6MiB/s max\",\"performance_data\":\"wrong=85;0;0;0; ok=1;0;0;0; max=215.6;0;0;0;\",\"previous_hard_state\":99,\"previous_soft_state\":0,\"scheduling_source\":\"mbp-yhabteab.fritz.box\",\"severity\":2080,\"soft_state\":1,\"state_type\":\"soft\"}"
1758547635.252982 [12 172.18.0.1:57800] "HSET" "icinga:checksum:host:state" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "{\"checksum\":\"b4f84390d527074ed45087fc3ca7bf581dda9522\"}"
1758547635.253002 [12 172.18.0.1:57800] "ZADD" "icinga:nextupdate:host" "1758547695.218023" "4c80e4ad47034480b3c3bb28885970b6bb50983e"
1758547635.255312 [12 172.18.0.1:57800] "HSET" "icinga:host:state" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "{\"affects_children\":false,\"check_attempt\":1,\"check_commandline\":\"'dummy'\",\"check_source\":\"mbp-yhabteab.fritz.box\",\"check_timeout\":60000,\"environment_id\":\"30e34a7a6c432baf29e051ed428aee3825f25e67\",\"execution_time\":0,\"hard_state\":0,\"host_id\":\"4c80e4ad47034480b3c3bb28885970b6bb50983e\",\"id\":\"4c80e4ad47034480b3c3bb28885970b6bb50983e\",\"in_downtime\":false,\"is_acknowledged\":false,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":false,\"is_reachable\":true,\"is_sticky_acknowledgement\":false,\"last_state_change\":1758547635219,\"last_update\":1758547635219,\"latency\":1,\"next_check\":1758547930999,\"next_update\":1758548231003,\"output\":\"Check was successful.\",\"previous_hard_state\":0,\"previous_soft_state\":1,\"scheduling_source\":\"mbp-yhabteab.fritz.box\",\"severity\":0,\"soft_state\":0,\"state_type\":\"hard\"}"
1758547635.255358 [12 172.18.0.1:57800] "HSET" "icinga:checksum:host:state" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "{\"checksum\":\"23738343385294cead94b823a79916c6620dd3b4\"}"
1758547635.255374 [12 172.18.0.1:57800] "ZADD" "icinga:nextupdate:host" "1758548231.003308" "4c80e4ad47034480b3c3bb28885970b6bb50983e"
1758547635.255827 [12 172.18.0.1:57800] "XADD" "icinga:runtime:state" "MAXLEN" "~" "1000000" "*" "runtime_type" "upsert" "redis_key" "icinga:host:state" "checksum" "23738343385294cead94b823a79916c6620dd3b4" "affects_children" "0" "check_attempt" "1" "check_commandline" "'dummy'" "check_source" "mbp-yhabteab.fritz.box" "check_timeout" "60000" "environment_id" "30e34a7a6c432baf29e051ed428aee3825f25e67" "execution_time" "0" "hard_state" "0" "host_id" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "id" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "in_downtime" "0" "is_acknowledged" "0" "is_active" "1" "is_flapping" "0" "is_handled" "0" "is_problem" "0" "is_reachable" "1" "is_sticky_acknowledgement" "0" "last_state_change" "1758547635219" "last_update" "1758547635219" "latency" "1" "next_check" "1758547930999" "next_update" "1758548231003" "output" "Check was successful." "previous_hard_state" "0" "previous_soft_state" "1" "scheduling_source" "mbp-yhabteab.fritz.box" "severity" "0" "soft_state" "0" "state_type" "hard"

... while a /v1/actions/process-check-result only triggers a single icinga:nextupdate:host event.

1758547833.559175 [12 172.18.0.1:57800] "HSET" "icinga:host:state" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "{\"affects_children\":false,\"check_attempt\":1,\"check_source\":\"mbp-yhabteab.fritz.box\",\"check_timeout\":60000,\"environment_id\":\"30e34a7a6c432baf29e051ed428aee3825f25e67\",\"execution_time\":0,\"hard_state\":0,\"host_id\":\"4c80e4ad47034480b3c3bb28885970b6bb50983e\",\"id\":\"4c80e4ad47034480b3c3bb28885970b6bb50983e\",\"in_downtime\":false,\"is_acknowledged\":false,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":true,\"is_reachable\":true,\"is_sticky_acknowledgement\":false,\"last_state_change\":1758547833522,\"last_update\":1758547833522,\"latency\":0,\"next_check\":1758548133522,\"next_update\":1758548193522,\"normalized_performance_data\":\"wrong=85;0;0;0 ok=1;0;0;0 max=215.600000;0;0;0\",\"output\":\" /dev/dm-13: 1.0MiB/s read1, 4.3KiB/s write1, 1.0MiB/s total, 215.6MiB/s max\",\"performance_data\":\"wrong=85;0;0;0; ok=1;0;0;0; max=215.6;0;0;0;\",\"previous_hard_state\":0,\"previous_soft_state\":0,\"scheduling_source\":\"mbp-yhabteab.fritz.box\",\"severity\":2080,\"soft_state\":1,\"state_type\":\"soft\"}"
1758547833.559289 [12 172.18.0.1:57800] "HSET" "icinga:checksum:host:state" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "{\"checksum\":\"0ed63c1311be890a7ad5ffae340a3489db7d57ca\"}"
1758547833.559305 [12 172.18.0.1:57800] "ZADD" "icinga:nextupdate:host" "1758548193.522986" "4c80e4ad47034480b3c3bb28885970b6bb50983e"
1758547833.559430 [12 172.18.0.1:57800] "XADD" "icinga:runtime:state" "MAXLEN" "~" "1000000" "*" "runtime_type" "upsert" "redis_key" "icinga:host:state" "checksum" "0ed63c1311be890a7ad5ffae340a3489db7d57ca" "affects_children" "0" "check_attempt" "1" "check_source" "mbp-yhabteab.fritz.box" "check_timeout" "60000" "environment_id" "30e34a7a6c432baf29e051ed428aee3825f25e67" "execution_time" "0" "hard_state" "0" "host_id" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "id" "4c80e4ad47034480b3c3bb28885970b6bb50983e" "in_downtime" "0" "is_acknowledged" "0" "is_active" "1" "is_flapping" "0" "is_handled" "0" "is_problem" "1" "is_reachable" "1" "is_sticky_acknowledgement" "0" "last_state_change" "1758547833522" "last_update" "1758547833522" "latency" "0" "next_check" "1758548133522" "next_update" "1758548193522" "normalized_performance_data" "wrong=85;0;0;0 ok=1;0;0;0 max=215.600000;0;0;0" "output" " /dev/dm-13: 1.0MiB/s read1, 4.3KiB/s write1, 1.0MiB/s total, 215.6MiB/s max" "performance_data" "wrong=85;0;0;0; ok=1;0;0;0; max=215.6;0;0;0;" "previous_hard_state" "0" "previous_soft_state" "0" "scheduling_source" "mbp-yhabteab.fritz.box" "severity" "2080" "soft_state" "1" "state_type" "soft"

fixes #10082

@cla-bot added the cla/signed label Sep 22, 2025
@yhabteab added the bug, area/distributed, and ref/IP labels Sep 22, 2025
@jschmidt-icinga (Contributor) left a comment


So basically, the DB will only be updated in these cases:

  • In the non-forced External Command Pipe commands
  • Unconditionally via the RescheduleCheck ApiAction
  • When scheduling a check that is not forced, is active, and enable_service_checks is true

In all other cases we just trigger internal state-changes.

I'm not sure SetNextCheck() is the right place to decide which event to persist to the DB, but I get that this is a low-impact change that fixes the issue, so I don't want to restart the bikeshedding from #10082.

Comment on lines 154 to 158
-if (host && !service && (!checkable->GetEnableActiveChecks() || !icingaApp->GetEnableHostChecks())) {
+activeChecksDisabled = !checkable->GetEnableActiveChecks() || !icingaApp->GetEnableServiceChecks();
+if (activeChecksDisabled && host && !service) {
 	Log(LogNotice, "CheckerComponent")
 		<< "Skipping check for host '" << host->GetName() << "': active host checks are disabled";
 	check = false;
-}
-if (host && service && (!checkable->GetEnableActiveChecks() || !icingaApp->GetEnableServiceChecks())) {
+} else if (activeChecksDisabled) {

Can you please explain this diff a bit more?

Why are we now checking icingaApp->GetEnableServiceChecks() in the host case? Or is that a mistake?

Is this change meaningful for what this PR does or just a small refactoring since we need the information for the activeChecksDisabled flag?

Also, since we're touching this anyway, maybe it can be simplified to have less duplication, since all that's different is the word "host"/"service" in the log message.

@yhabteab removed the request for review from julianbrost September 24, 2025 07:09
@yhabteab marked this pull request as draft September 24, 2025 07:09
This commit changes the ordering of CheckableScheduleInfo in the
multi-index container to ensure that checkables with running checks are
pushed to the end of the ordering. This prevents them from being
prioritized for scheduling ahead of others, which could lead to
unnecessary CPU load due to repeated scheduling attempts. By using a
very large value for the index of checkables with running checks, they
are effectively deprioritized until their current check is completed and
they can be reinserted with their actual next check time.
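
A minimal, self-contained sketch of this deprioritization idea, assuming Boost.MultiIndex is available; all type and member names below are illustrative stand-ins, not the actual Icinga 2 ones:

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <limits>
#include <string>

struct ScheduleInfo
{
	std::string Name;
	double NextCheck;
	bool Running;
};

// Key extractor: entries with a check in flight sort behind everything else,
// so the scheduler never picks them up again before the check has finished.
struct NextCheckExtractor
{
	typedef double result_type;

	double operator()(const ScheduleInfo& si) const
	{
		return si.Running ? std::numeric_limits<double>::max() : si.NextCheck;
	}
};

typedef boost::multi_index_container<
	ScheduleInfo,
	boost::multi_index::indexed_by<
		boost::multi_index::ordered_non_unique<NextCheckExtractor>
	>
> ScheduleQueue;

int main()
{
	ScheduleQueue queue;
	queue.insert({"host-a", 100.0, false});
	queue.insert({"host-b", 50.0, true}); // check in flight, sorts last despite the earlier time
	queue.insert({"host-c", 75.0, false});

	// queue.begin() now points at "host-c"; "host-b" only moves forward again
	// once it is re-inserted with Running == false and its real next check time.
	return 0;
}
```
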
Since the scheduler accounts for already running checks, we only need to
update the `next_check` timestamp in `Checkable::ExecuteCheck()` where it
actually makes sense to do so, and for local checks it doesn't make sense
at all. There are only two cases where we need to update the next check
beforehand:

1) The execute command event is sent to a connected remote endpoint, so
   we need to set the next check to a time in the future until we actually
   receive the check result back from the remote endpoint. However, it must
   not be too far in the future, so that the check isn't left unscheduled
   for too long in case the remote endpoint never responds.
2) The check is a remote check, but either the endpoint is currently syncing
   replay logs or not connected at all, and we are within the magical 5min
   cold startup window. In these cases the check is effectively skipped and
   no check result will come in, so we manually update the next check as if
   the check had been executed.

In the other cases, either the check is executed locally, which means the
`m_RunningCheck` flag already prevents the scheduler from re-running the check,
or this is a remote check and the endpoint is not connected, but we are outside
the cold startup window, in which case we also don't do anything as we've already
called `Checkable::ProcessCheckResult()` with an appropriate error state, which
in turn will call `Checkable::UpdateNextCheck()`.
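
A rough standalone sketch of that decision; the flag names and the 60-second bound are illustrative, not the actual `ExecuteCheck()` logic:

```cpp
#include <algorithm>

// Returns the next_check value that would be set up front, or a negative
// value meaning "leave next_check alone" (the running-check flag and
// ProcessCheckResult() handle the rescheduling in those cases).
double NextCheckBeforeResult(bool sentToRemoteEndpoint, bool endpointUnavailable,
	bool withinColdStartup, double now, double checkInterval)
{
	if (sentToRemoteEndpoint) {
		// Case 1: bounded bump until the remote result arrives, so the check
		// isn't left unscheduled forever if the endpoint never answers.
		return now + std::min(checkInterval, 60.0);
	}

	if (endpointUnavailable && withinColdStartup) {
		// Case 2: the check is effectively skipped and no result will come in,
		// so advance next_check as if the check had run.
		return now + checkInterval;
	}

	return -1; // local check or outside the cold startup window: do nothing here
}
```
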
@yhabteab (Member, Author) commented Sep 24, 2025

I've changed the implementation according to yesterday's offline discussion with @jschmidt-icinga and added a bunch of comments. The change list is a bit bigger than before, but I think it's way better. So please have a look!

PS: I've also updated the PR description!

@yhabteab marked this pull request as ready for review September 24, 2025 09:54
@jschmidt-icinga (Contributor) commented Sep 24, 2025

> Since this change affects a critical part of Icinga 2, we should be very careful not to introduce any regressions or performance degradations.

Would it be possible to add test-cases for all permutations of state changes? The checker component doesn't have any dependencies on its own and you could make a simple mock object connecting to the same events the databases receive. Then reschedule checks on checkables and verify everything is updated and consistent.

Also note that unit tests act as a kind of documentation too. Well-commented tests sometimes describe what the code does better than the code itself. (Speaking as someone who still doesn't grasp the entirety of check scheduling in its complexity.)

Edit: To clarify: I don't mean a full unit-test suite for the entire checker component, though that might be useful in the future, but for now, maybe just a few cases for the specific issue fixed in this PR.
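
A self-contained sketch of the suggested mock-object approach, using Boost.Test and Boost.Signals2; `MockCheckable` and its signal are stand-ins for whatever the real fixture would wire up, not Icinga 2 types:

```cpp
#define BOOST_TEST_MODULE checker_signals
#include <boost/test/included/unit_test.hpp>
#include <boost/signals2.hpp>

struct MockCheckable
{
	double NextCheck = 0;

	// Stand-in for the signal a database backend would subscribe to.
	boost::signals2::signal<void(double)> OnNextCheckChanged;

	void SetNextCheck(double next)
	{
		NextCheck = next;
		OnNextCheckChanged(next);
	}
};

BOOST_AUTO_TEST_CASE(reschedule_emits_next_check_changed)
{
	MockCheckable checkable;
	int emitted = 0;

	checkable.OnNextCheckChanged.connect([&emitted](double) { ++emitted; });
	checkable.SetNextCheck(42.0);

	// The "database" listener saw exactly one update and the state is consistent.
	BOOST_CHECK_EQUAL(emitted, 1);
	BOOST_CHECK(checkable.NextCheck == 42.0);
}
```
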

This commit introduces a new, somewhat special `OnRescheduleCheck` signal
that is emitted whenever we want to tell the checker to reschedule the
checkable at a specific timestamp without actually changing the next
check time. Previously, we called `SetNextCheck` with some random
timestamp just to force the checker to pick the checkable up either
immediately or at a specific time. At some point, subscribing to the
`OnNextCheckChanged` signal thus became effectively unusable for any
purpose other than informing the checker about a new next check time,
which resulted in a new signal that is solely responsible for informing
Icinga DB and the IDO about a new next check time in places where
calling `SetNextCheck` did make sense. This commit does quite the
opposite: it replaces all calls to `SetNextCheck` that were only used to
inform the checker about a new next check time with `OnRescheduleCheck`
calls. Only places where we actually want to change the next check time
still call `SetNextCheck` and thus inform the checker and all other
listeners about the new next check time. As a bonus, we no longer have
to hold the object locks for child and parent at the same time.
It also removes the now-redundant extra `SendNextUpdate()` call from the
`NewCheckResultHandler` in Icinga DB, since Icinga DB is subscribed to
the `NextCheckChanged` event anyway and that event is always emitted
before the `NewCheckResult` event gets triggered.
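
A self-contained illustration of the two notification paths described above; `FakeCheckable` and the signal signatures are stand-ins, not the actual Icinga 2 declarations:

```cpp
#include <boost/signals2.hpp>
#include <iostream>

struct FakeCheckable
{
	double NextCheck = 1000;

	// Everyone listens here: checker, Icinga DB, IDO, cluster, ...
	boost::signals2::signal<void(double)> OnNextCheckChanged;

	// Only the checker listens here: "revisit me at this time",
	// without touching the stored next_check value.
	boost::signals2::signal<void(double)> OnRescheduleCheck;

	void SetNextCheck(double next)
	{
		NextCheck = next;
		OnNextCheckChanged(next);
	}
};

int main()
{
	FakeCheckable c;

	// Old pattern: fake a next_check just to wake the checker (also reaches DB listeners).
	// c.SetNextCheck(42);

	// New pattern: nudge the checker only; NextCheck stays at 1000.
	c.OnRescheduleCheck(42);

	std::cout << c.NextCheck << "\n"; // still 1000
	return 0;
}
```
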
@yhabteab (Member, Author) commented:

I've added basic unit tests for the checker as requested, so the PR size has drastically increased now :)! But I hope the tests give you a little overview of how the checker works, or is actually supposed to work.

@jschmidt-icinga (Contributor) left a comment

First of all, amazing work adding these tests with all that infrastructure in what must only have been a few hours of time.

I'm not at all concerned with the increase in diff size due to the tests, as it is pretty clear that the production code actually gets a reduction in complexity from this PR.

For now I'm only going to comment on some of the test methods used, but I'm also continuing to look at what they do and the rest of the PR, which looks very promising.

If a logger is started with `Activate()` before `SetActive()`, it won't
log anything: the "min severity" value is only updated when loggers are
started, and loggers that aren't active at that point are simply
ignored, so the min severity remains at info.
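
A tiny standalone model of that pitfall; the classes below are fakes, not Icinga 2's Logger machinery:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct FakeLogger
{
	bool Active = false;
	int MinSeverity = 3; // 0 = debug ... 3 = info
};

int g_MinSeverity = 3; // global threshold, starts at "info"

// Mimics the start-up pass: only loggers that are already active lower the threshold.
void Activate(const std::vector<FakeLogger*>& loggers)
{
	for (auto* logger : loggers) {
		if (logger->Active)
			g_MinSeverity = std::min(g_MinSeverity, logger->MinSeverity);
	}
}

int main()
{
	FakeLogger debugLog;
	debugLog.MinSeverity = 0;

	Activate({&debugLog});  // Activate() before SetActive(): logger is ignored
	debugLog.Active = true; // too late, the global threshold was already computed

	std::cout << g_MinSeverity << "\n"; // prints 3, debug messages get dropped
	return 0;
}
```
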
@yhabteab (Member, Author) commented:

I think I've addressed all your requested changes. Plus, our checker should now have roughly 70-90% code coverage. I've added some more unit tests to cover remotely executed checks, aka command_endpoint.

@Al2Klimov (Member) left a comment

Test: redis-cli MONITOR with one check.

c2c9b6b

1759312891.029505 [0 192.168.65.1:60842] "HSET" "icinga:host:state" "d7e93a933920b2a35baf2479691400a961d92311" "{\"affects_children\":false,\"check_attempt\":1,\"check_commandline\":\"'dummy'\",\"check_source\":\"ws-aklimov7777777.local\",\"check_timeout\":60000,\"environment_id\":\"824bc16396c1df319c0a1e50a335de0df3d2ce9d\",\"execution_time\":0,\"hard_state\":0,\"host_id\":\"d7e93a933920b2a35baf2479691400a961d92311\",\"id\":\"d7e93a933920b2a35baf2479691400a961d92311\",\"in_downtime\":false,\"is_acknowledged\":false,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":false,\"is_reachable\":true,\"is_sticky_acknowledgement\":false,\"last_state_change\":1759312863888,\"last_update\":1759312891027,\"latency\":7,\"next_check\":1759312901019,\"next_update\":1759312911035,\"output\":\"Check was successful.\",\"previous_hard_state\":99,\"previous_soft_state\":99,\"scheduling_source\":\"ws-aklimov7777777.local\",\"severity\":0,\"soft_state\":0,\"state_type\":\"hard\"}"
1759312891.029533 [0 192.168.65.1:60842] "HSET" "icinga:checksum:host:state" "d7e93a933920b2a35baf2479691400a961d92311" "{\"checksum\":\"acb9e73a6beec90b4ed53726d693d85f30085e77\"}"
1759312891.029545 [0 192.168.65.1:60842] "ZADD" "icinga:nextupdate:host" "1759312911.035350" "d7e93a933920b2a35baf2479691400a961d92311"

5f3b7d2

1759313097.299299 [0 192.168.65.1:32736] "HSET" "icinga:host:state" "d7e93a933920b2a35baf2479691400a961d92311" "{\"affects_children\":false,\"check_attempt\":1,\"check_commandline\":\"'dummy'\",\"check_source\":\"ws-aklimov7777777.local\",\"check_timeout\":60000,\"environment_id\":\"824bc16396c1df319c0a1e50a335de0df3d2ce9d\",\"execution_time\":0,\"hard_state\":0,\"host_id\":\"d7e93a933920b2a35baf2479691400a961d92311\",\"id\":\"d7e93a933920b2a35baf2479691400a961d92311\",\"in_downtime\":false,\"is_acknowledged\":false,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":false,\"is_reachable\":true,\"is_sticky_acknowledgement\":false,\"last_state_change\":1759312863888,\"last_update\":1759313097297,\"latency\":6,\"next_check\":1759313106638,\"next_update\":1759313116651,\"output\":\"Check was successful.\",\"previous_hard_state\":99,\"previous_soft_state\":99,\"scheduling_source\":\"ws-aklimov7777777.local\",\"severity\":0,\"soft_state\":0,\"state_type\":\"hard\"}"
1759313097.299331 [0 192.168.65.1:32736] "HSET" "icinga:checksum:host:state" "d7e93a933920b2a35baf2479691400a961d92311" "{\"checksum\":\"6a8714aa442b2d076193cfa3baf4603a35cf5fec\"}"
1759313097.299346 [0 192.168.65.1:32736] "ZADD" "icinga:nextupdate:host" "1759313116.651438" "d7e93a933920b2a35baf2479691400a961d92311"
1759313097.300295 [0 192.168.65.1:32736] "HSET" "icinga:host:state" "d7e93a933920b2a35baf2479691400a961d92311" "{\"affects_children\":false,\"check_attempt\":1,\"check_commandline\":\"'dummy'\",\"check_source\":\"ws-aklimov7777777.local\",\"check_timeout\":60000,\"environment_id\":\"824bc16396c1df319c0a1e50a335de0df3d2ce9d\",\"execution_time\":0,\"hard_state\":0,\"host_id\":\"d7e93a933920b2a35baf2479691400a961d92311\",\"id\":\"d7e93a933920b2a35baf2479691400a961d92311\",\"in_downtime\":false,\"is_acknowledged\":false,\"is_active\":true,\"is_flapping\":false,\"is_handled\":false,\"is_problem\":false,\"is_reachable\":true,\"is_sticky_acknowledgement\":false,\"last_state_change\":1759312863888,\"last_update\":1759313097297,\"latency\":6,\"next_check\":1759313106638,\"next_update\":1759313116651,\"output\":\"Check was successful.\",\"previous_hard_state\":99,\"previous_soft_state\":99,\"scheduling_source\":\"ws-aklimov7777777.local\",\"severity\":0,\"soft_state\":0,\"state_type\":\"hard\"}"
1759313097.300320 [0 192.168.65.1:32736] "HSET" "icinga:checksum:host:state" "d7e93a933920b2a35baf2479691400a961d92311" "{\"checksum\":\"6a8714aa442b2d076193cfa3baf4603a35cf5fec\"}"

Conclusion

Smells like #9501; I didn't even test state changes yet.

Consider a dead simple additional OnNextCheckUpdated cluster event instead.

* the magical 5min cold startup window. In both cases, we just don't do anything and wait for
* the next check interval to re-try the check again. So, this check is effectively skipped.
*/
UpdateNextCheck();

Too bad if my check runs once a day. (The else if case at least schedules the next check just one retry interval in the future.)

 if (child->GetProblem() && child->GetEnableActiveChecks()) {
 	auto nextCheck (now + Utility::Random() % 60);

-	ObjectLock oLock (child);

Sure the lock is no longer needed? Below you still have a get and kind of a set operation.

 continue;

 if (parent->GetNextCheck() >= now + parent->GetRetryInterval()) {
-	ObjectLock olock(parent);

Same here. While we're at it, I guess it makes sense for the lock to cover both operations, get and set, just as in the child case above.

 delta *= (double)std::rand() / RAND_MAX;
-SetNextCheck(now + delta);
+// We only want to jitter the next check a bit, and inform the scheduler about it, so not setting it directly.
+SetNextCheck(now + delta, true);

To "inform the scheduler about it" you suppress the signal?

});

-Checkable::OnNextCheckUpdated.connect([](const Checkable::Ptr& checkable) { NextCheckUpdatedHandler(checkable); });
+Checkable::OnNextCheckChanged.connect([](const Checkable::Ptr& checkable, const Value&) { NextCheckUpdatedHandler(checkable); });

Seems you missed renaming NextCheckUpdatedHandler.


checkable->SetNextCheck(params->Get("next_check"), false, origin);
bool isOriginClientOld = params->Get("origin_client_old").ToBool();
if (isOriginClientOld || endpoint->GetIcingaVersion() < 21600) { // TODO: Exact required version??

You don't have to worry about exact versions if you introduce a new capability.
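
A generic illustration of the capability-flag idea (all names and bits below are hypothetical, not taken from the Icinga 2 code base): instead of comparing version numbers, each endpoint advertises a feature bitmask and the sender just tests the relevant bit.

```cpp
#include <cstdint>

// Hypothetical capability bit for this PR's new behaviour.
enum Capability : std::uint64_t {
	RescheduleCheckEvent = 1u << 0u,
};

inline bool SupportsRescheduleEvent(std::uint64_t endpointCapabilities)
{
	return (endpointCapabilities & RescheduleCheckEvent) != 0;
}
```
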

- * queues and ensures that checks are not fired multiple times. ProcessCheckResult()
- * is called too late. See #6421.
- */
-UpdateNextCheck();

Why is this no longer necessary, especially in the local check case? Sure, now you return early if !!m_CheckRunning. But where is this checkable re-indexed inside the scheduler queue?

