Scheduler instance should update heartbeat in every iteration #396

@kevinwallimann

Additional context

Please read about the job scheduler first: https://github.com/AbsaOSS/hyperdrive-trigger/wiki/How-the-scheduler-works

A PR for this issue already exists, see #415. The logic is fine, but the tests should not be commented out; they should be fixed if needed.

Describe the bug

Currently, the scheduler instance does not necessarily update its heartbeat in every iteration.
In JobScheduler, if runningAssignWorkflows is not completed, the heartbeat is not updated. If runningAssignWorkflows takes more time than the configured lagThreshold, the instance will be wrongly determined to be lagging behind and deactivated by another instance. An instance should only be deactivated if it isn't responding at all (e.g. due to network problems), but not if it's just under high load.
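
To make the failure mode concrete, here is a small simulation of the scheduler loop. It is only a sketch and not the project's code: the function names, the 30-second assignment duration and the 20-second lagThreshold are assumptions chosen for illustration, and only the 5-second iteration interval comes from this issue.

```python
from dataclasses import dataclass


@dataclass
class SimResult:
    heartbeats: list   # simulated times (seconds) at which a heartbeat was written
    max_gap: float     # longest interval between two consecutive heartbeats


def simulate(iteration_s: float, assign_duration_s: float, total_s: float,
             heartbeat_every_iteration: bool) -> SimResult:
    """Simulate scheduler iterations (hypothetical model, not the real JobScheduler).

    assign_duration_s models how long one runningAssignWorkflows call takes.
    With heartbeat_every_iteration=False the heartbeat is written only in
    iterations where the previous assignment has completed, which mirrors
    the behaviour described in this issue.
    """
    heartbeats = []
    assign_done_at = 0.0  # time at which the in-flight assignment completes
    t = 0.0
    while t <= total_s:
        assignment_completed = t >= assign_done_at
        if heartbeat_every_iteration or assignment_completed:
            heartbeats.append(t)
        if assignment_completed:
            # Start the next assignment; it stays in flight for assign_duration_s.
            assign_done_at = t + assign_duration_s
        t += iteration_s
    gaps = [b - a for a, b in zip(heartbeats, heartbeats[1:])]
    return SimResult(heartbeats, max(gaps, default=0.0))


def is_considered_lagging(max_gap_s: float, lag_threshold_s: float) -> bool:
    # Another instance would deactivate this one if its heartbeat is too old.
    return max_gap_s > lag_threshold_s


LAG_THRESHOLD_S = 20.0  # assumed value, for illustration only

current = simulate(5.0, 30.0, 120.0, heartbeat_every_iteration=False)
expected = simulate(5.0, 30.0, 120.0, heartbeat_every_iteration=True)

print(is_considered_lagging(current.max_gap, LAG_THRESHOLD_S))   # busy instance looks dead
print(is_considered_lagging(expected.max_gap, LAG_THRESHOLD_S))  # busy instance stays alive
```

In this model the instance is healthy in both runs; only the coupling of the heartbeat to the completion of the assignment makes it appear to be lagging.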

To Reproduce

It's hard to reproduce this issue, as it will occur in reality only if the database connection is very slow. For testing purposes, the issue could be reproduced by instrumenting the code, e.g. by adding an application property that makes the WorkflowBalancer.getAssignedWorkflows method sleep for 5 seconds.

Expected behavior

The scheduler instance heartbeat should be written to the database every 5 seconds, even if runningAssignWorkflows is not finished yet.
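
One possible way to get this behaviour is to write the heartbeat on its own timer, independent of the assignment. The sketch below illustrates the idea only; it is not the logic of #415, and `HeartbeatWriter`, `slow_get_assigned_workflows` and the short intervals are hypothetical names and values picked so the example runs quickly.

```python
import threading
import time


class HeartbeatWriter:
    """Calls write_heartbeat at a fixed interval on its own thread (hypothetical helper)."""

    def __init__(self, interval_s, write_heartbeat):
        self._interval_s = interval_s
        self._write_heartbeat = write_heartbeat
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns True once stop() has been called, which ends the loop.
        while not self._stop.wait(self._interval_s):
            self._write_heartbeat()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def slow_get_assigned_workflows(delay_s):
    # Stand-in for a slow database call; the sleep plays the role of the
    # artificial delay suggested under "To Reproduce".
    time.sleep(delay_s)
    return [1, 2, 3]


def run_demo(interval_s=0.02, delay_s=0.3):
    timestamps = []
    writer = HeartbeatWriter(interval_s, lambda: timestamps.append(time.monotonic()))
    writer.start()
    workflows = slow_get_assigned_workflows(delay_s)  # blocks the caller
    writer.stop()
    return len(timestamps), workflows


heartbeat_count, workflows = run_demo()
print(heartbeat_count, workflows)
```

The heartbeat count keeps growing while the slow call is still blocked, which is the property this issue asks for. In the real scheduler the same effect could also be achieved inside the existing loop, as long as the heartbeat update does not wait for runningAssignWorkflows.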

Labels

bug: Something isn't working
