37 changes: 33 additions & 4 deletions docs/web/background_tasks.md
Tasks are generally spawned by API handlers, executed in the control flow of a T

1. An **API** request arrives (later, this might be extended with a _`cron`_ -like scheduler) which exercises an endpoint that results in the need for a task.
2. _(Optionally)_ some conformance checks are executed on the input, in order to not even create the task if the input is ill-formed.
3. A task **`token`** is _`ALLOCATED`_: the **`BackgroundTask`** record is written into the database, and now we have a unique identifier for the task.
4. The task is **pushed** to a shared, synchronised _task queue_ of the CodeChecker server, resulting in the _`ENQUEUED`_ status.
* `AbstractTask` subclasses **MUST** be `pickle`-able and reasonably small.
* The library offers means to store additional large data on the file system, in a temporary directory specific to the task.
5. The **`task token`** is returned to the user via the RPC API call, and the API worker is free to respond to other requests.
6. The API handler exits and the Thrift RPC connection is terminated.
7. If the user wishes to track progress, they poll the `getTaskInfo()` API in a loop (executed in the context of any _API worker_ process, synchronised over the database) to query whether the task has completed.
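The polling loop in the last step can be sketched as follows. This is a minimal, illustrative sketch: the `client` object and the `status` field name are assumptions about the Thrift API shape, while `getTaskInfo()` and the terminal status names follow this document.

```py3
import time


def wait_for_task(client, token, poll_interval_sec=5, timeout_sec=300):
    """Poll getTaskInfo() until the task reaches a terminal status.

    `client` is assumed to be an authenticated Thrift API client; the
    terminal status names follow the life cycle described in this document.
    """
    terminal = {"COMPLETED", "FAILED", "CANCELLED", "DROPPED"}
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        info = client.getTaskInfo(token)
        if info.status in terminal:
            return info
        time.sleep(poll_interval_sec)
    raise TimeoutError(f"Task '{token}' did not finish in {timeout_sec} s")
```

Because tasks only exercise side-effects, the returned task information describes the outcome of the task, not a computed "result".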

The API request dispatching of the CodeChecker server has a **`TaskManager`** instance which should be passed to the API handler implementation, if not already available.
Then, you can use this _`TaskManager`_ object to perform the necessary actions to enqueue the execution of a task:
The business logic of tasks is implemented by subclassing the _`AbstractTask`_
4. The implementation does its thing, periodically calling _`task_manager.heartbeat()`_ to update the progress timestamp of the task, and, if appropriate, checking with _`task_manager.should_cancel()`_ whether the admins requested the task to cancel or the server is shutting down.
5. If _`should_cancel()`_ returned `True`, the task does some appropriate clean-up, and exits by raising the special _`TaskCancelHonoured`_ exception, indicating that it responded to the request. (At this point, the status becomes either _`CANCELLED`_ or _`DROPPED`_, depending on the circumstances of the service.)
6. Otherwise, or if the task is for some reason not cancellable without causing damage, the task executes its logic.
7. If the task's _`_implementation()`_ method exits cleanly, it reaches the _`COMPLETED`_ status; otherwise, if any exception escapes from the _`_implementation()`_ method, the task becomes _`FAILED`_, and exception information is logged into the `BackgroundTask.comments` column of the database.

**Caution!** Tasks execute in a separate background process, one of the many processes spawned by a CodeChecker server, and no longer have the ability to communicate synchronously with the user!
This also includes the inability to "return" a value: tasks **only exercise side-effects**, but do not calculate a "result".
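The checkpointing loop described in the steps above can be sketched like this. It is a minimal, self-contained sketch: `AbstractTask` and `TaskCancelHonoured` are stand-in definitions, the class name and the chunked work are purely illustrative, and the exact `heartbeat()`/`should_cancel()` signatures are assumptions modelled on the examples in this document.

```py3
class AbstractTask:
    """Stand-in for the real library base class."""


class TaskCancelHonoured(Exception):
    """Stand-in for the real library exception."""


class ChunkedWorkTask(AbstractTask):
    """Illustrative task that processes its input in chunks, with checkpoints."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.done = []

    def _implementation(self, task_manager):
        for chunk in self._chunks:
            task_manager.heartbeat(self)          # Refresh the progress timestamp.
            if task_manager.should_cancel(self):  # Admin cancel or server shutdown?
                raise TaskCancelHonoured(self)
            self.done.append(chunk)               # The task's observable side-effect.
```

If `should_cancel()` never fires, `_implementation()` exits cleanly and the task reaches _`COMPLETED`_; if it fires, the raised `TaskCancelHonoured` moves the task to _`CANCELLED`_ or _`DROPPED`_.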
class MyTask(AbstractTask):
foo(element)
```

### Abnormal path 1: admin cancellation

At any point following the _`ALLOCATED`_ status, but most likely in the _`ENQUEUED`_ and _`RUNNING`_ statuses, a **`SUPERUSER`** may issue a _`cancelTask()`_ order.
This sets `BackgroundTask.cancel_flag`, and the task is expected (although not required!) to poll its own _`should_cancel()`_ status at internal checkpoints, and to terminate gracefully in response to this request. This is done by **`_implementation()`** exiting by raising a **`TaskCancelHonoured`** exception.
(If the task does not raise one, it will be allowed to conclude normally, or fail in some other manner.)
Tasks cancelled gracefully will have the _`CANCELLED`_ status.

For example, a background task that performs an action over a set of input files generally should be implemented like this:

```py3
def _implementation(self, tm: TaskManager):
    for file in INPUTS:
        if tm.should_cancel(self):
            # Undo partial work before acknowledging the cancellation.
            ROLLBACK()
            raise TaskCancelHonoured(self)

        DO_LOGIC(file)
```

### Abnormal path 2: server shutdown

Alternatively, at any point in this life cycle, the server might receive the command to terminate itself (kill signals `SIGINT`, `SIGTERM`; alternatively caused by `CodeChecker server --stop`). Following the termination of _API workers_, the _background workers_ will also shut down one by one.
At this point, the default behaviour is to raise a special _cancel event_ which tasks currently _`RUNNING`_ may still gracefully honour, as if it were a `SUPERUSER`'s single-task cancel request. All other tasks that have not started executing yet and are in the _`ALLOCATED`_ or _`ENQUEUED`_ status will never start.

All tasks not in a _normal termination state_ will be set to the _`DROPPED`_ status, with the `comments` field containing a log of the state in which the task was dropped, and why. (Together, _`CANCELLED`_ and _`DROPPED`_ are the _"abnormal termination states"_, indicating that the task terminated due to some external influence.)

Client-side handling
--------------------

46 changes: 45 additions & 1 deletion docs/web/server_config.md
using the package's installed `config/server_config.json` as a template.

Table of Contents
=================
* [Task handling](#task-handling)
  * [Number of API worker processes](#number-of-api-worker-processes)
  * [Number of task worker processes](#number-of-task-worker-processes)
* [Run limitation](#run-limitations)
* [Storage](#storage)
  * [Directory of analysis statistics](#directory-of-analysis-statistics)
* [Limits](#limits)
  * [Maximum size of failure zips](#maximum-size-of-failure-zips)
  * [Size of the compilation database](#size-of-the-compilation-database)
* [Keepalive](#keepalive)
  * [Idle time](#idle-time)
  * [Interval time](#interval-time)
  * [Probes](#probes)
* [Authentication](#authentication)
* [Secrets](#secrets)
  * [server_secrets.json](#server_secretsjson)
  * [Environmental variables](#environmental-variables)

## Task handling

### Number of API worker processes
The `worker_processes` section of the config file controls how many processes
will be started on the server to process API requests.

processes will be started on the server to process background jobs.

The server needs to be restarted if the value is changed in the config file.
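For illustration, the relevant portion of `server_config.json` might look like the fragment below. The `worker_processes` key is documented above; `background_worker_processes` is shown as an assumed key name for the background-job worker pool described in this section, and both values are examples only.

```json
{
  "worker_processes": 10,
  "background_worker_processes": 4
}
```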

### `--machine-id`
Unfortunately, servers do not always terminate gracefully (see the aforementioned
`SIGKILL`; the container, VM, or host machine could also simply die during
execution, in ways the server is unable to handle). Because tasks are not shared
across server processes, and crucial bits of information needed to execute a
task live in the now-dead process's memory, a server later restarting in place
of a previously dead one should be able to identify which tasks its
"predecessor" left behind without clean-up.

This is achieved by storing the running computer's identifier, configurable via
`CodeChecker server --machine-id`, as an additional piece of information for
each task. By default, the machine ID is constructed from
`gethostname():portnumber`, e.g., `cc-server:80`.
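The default construction described above can be sketched as follows; this is illustrative only (the real construction lives inside CodeChecker), and the function name is hypothetical.

```py3
import socket


def default_machine_id(port: int) -> str:
    """Mimic the documented default machine ID: 'gethostname():portnumber'."""
    return f"{socket.gethostname()}:{port}"
```

On a host named `cc-server` serving port `80`, this yields `cc-server:80`.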

In containerised environments, relying on `gethostname()` may not be entirely
stable! For example, Docker exposes the first 12 characters of the container's
unique hash as the _"hostname"_ inside the container. If the container is
started with `--restart always` or `--restart unless-stopped`, this is fine;
however, more advanced systems, such as _Docker Swarm_, will **create a new
container** in case the old one died (!), resulting in a new value of
`gethostname()`.

In such environments, service administrators must take additional care and
configure their instances by setting `--machine-id` accordingly for subsequent
executions of the "same" server. If a server with machine ID **`M`** starts up
(usually after a container or "system" restart), it will set every task that is
not in a termination state and is associated with machine ID **`M`** to the
_`DROPPED`_ status (with an appropriately formatted comment accompanying it),
signifying that the _previous instance_ "dropped" these tasks but had no chance
of recording this fact.

## Run limitation
The `max_run_count` section of the config file controls how many runs can be
stored on the server for a product.