Commit bf94fdd

Merge pull request #558 from liangxia/timeout
Add doc for longer timeout
2 parents a7a7381 + abb5cd4 commit bf94fdd

1 file changed: +32 -4 lines changed

content/en/docs/architecture/timeouts.md

Lines changed: 32 additions & 4 deletions
@@ -64,13 +64,14 @@ In OpenShift CI, this timeout and grace period apply to the `ci-operator` orches
 
 ```yaml
 plank: # Prow's controller to launch Pods for jobs
-  default_decoration_configs:
-    '*':
-      grace_period: 30m0s
+  default_decoration_config_entries:
+  - config:
+      grace_period: 1h0m0s
       timeout: 4h0m0s
-    'org/repo': # overwrite the job timeout at repo level
+  - config:
       grace_period: 45m0s
       timeout: 6h0m0s
+    repo: org1/repo1 # overwrite the job timeout at repo level
 ```
 
 In special cases, long-running, generated jobs can raise the cap with job-specific configuration [like][generated-timeout-example]:
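
As a reading aid for the `default_decoration_config_entries` hunk above, here is a minimal Python sketch of how a repo-scoped entry could take precedence over the catch-all entry. It assumes that an entry without `repo` applies to every repository and that later matching entries override earlier ones field by field. `resolve_decoration_config` and `ENTRIES` are hypothetical names for illustration, not Prow's implementation, and Prow's real matching likely supports more filters than shown here.

```python
# Entries mirror the YAML in the hunk above.
# Assumption: an entry without "repo" applies to every repository.
ENTRIES = [
    {"config": {"grace_period": "1h0m0s", "timeout": "4h0m0s"}},
    {"config": {"grace_period": "45m0s", "timeout": "6h0m0s"}, "repo": "org1/repo1"},
]


def resolve_decoration_config(entries, repo):
    """Merge all entries that match `repo`, in order (hypothetical helper).

    Assumption: later matching entries override earlier ones field by field.
    """
    resolved = {}
    for entry in entries:
        entry_repo = entry.get("repo")
        if entry_repo is None or entry_repo == repo:
            resolved.update(entry["config"])
    return resolved


print(resolve_decoration_config(ENTRIES, "org1/repo1"))
print(resolve_decoration_config(ENTRIES, "some/other-repo"))
```

Under these assumptions, `org1/repo1` ends up with the 6h timeout and 45m grace period, while any other repository keeps the 4h timeout and 1h grace period from the catch-all entry.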
@@ -148,6 +149,33 @@ ref:
 The `pod.spec.activeDeadlineSeconds` setting on a `Pod` only implicitly bounds the amount of time that a `Pod` executes for on a Kubernetes cluster. The active deadline begins at the first moment that a `kubelet` acknowledges the `Pod`, which is after it is scheduled to a specific node but before it pulls images, sets up a container sandbox, _etc_. It is therefore possible to exceed the active deadline without ever having a container in the `Pod` execute. Please see the [API documentation](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podspec-v1-core) for more details. For these reasons, no timeout configured in the system makes use of this setting, instead relying on a thin wrapper around the executing code that's injected by Prow itself.
 {{< /alert >}}
 
+#### How to configure a customized timeout
+
+If you need a timeout longer than the default 24 hours, but no more than 72 hours, you can raise it at the repository level or the job level.
+At the [repository level](https://github.com/openshift/release/blob/6a5999d35c9bedca66a608cf5a9a2ad6bff49712/core-services/prow/02_config/_config.yaml#L442), add a `config` entry for your repo as shown below:
+```yaml
+plank:
+  default_decoration_config_entries:
+  ...
+  - config:
+      grace_period: 1h30m0s
+      timeout: 36h0m0s
+    repo: org2/repo2 # overwrite the job timeout at repo level
+```
+At the [job level](https://github.com/openshift/release/blob/5f3a72424aeee5027525e6dd471235139ef77108/ci-operator/config/openshift/release/openshift-release-master__ci-4.21.yaml#L88), add a `timeout` field for your job as shown below:
+```yaml
+- as: any-job-name-you-have
+  interval: 4h
+  steps:
+    cluster_profile: aws-2
+    workflow: openshift-upgrade-aws-ovn
+  timeout: 36h0m0s
+```
+
+{{< alert title="Note" color="info" >}}
+If you use a longer timeout, you might also need to reach out to the [DPP team](https://devservices.dpp.openshift.com/support/) to make sure your cloud account allows OCP clusters to run for as long as this timeout.
+{{< /alert >}}
+
 ## How Interruptions May Be Handled
 
 Two main approaches exist to handling interruptions for a test process: first, the test process itself may listen for and handle `SIGTERM`; second, `post` steps may be declared in a test `workflow` to be run after an interruption occurs. The first approach is most useful when relevant state for responding to the interrupt exists only in the test process itself, and the response is fairly short. This approach has the downside of requiring complex test process code and signal handling implementation. The second approach is suggested as it is more robust and tunable. In this approach, state needed to respond to the interrupt should be stored in the [`${SHARED_DIR}`](/docs/architecture/step-registry/#sharing-data-between-steps) for use by the `post` step. The `post` step may be marked as [best-effort](/docs/architecture/step-registry/#marking-post-steps-best-effort) if it only gathers artifacts or cleans up resources. Examples of both approaches follow.
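
A minimal Python sketch of the first approach, in which the test process itself reacts to `SIGTERM`, might look like the following. All names (`handle_sigterm`, `install_handler`, `run_steps`) are hypothetical and not part of the documented examples. The sketch assumes the work can be split into short steps so the interruption flag is checked often, and that cleanup finishes within the configured grace period.

```python
import signal
import threading

# Set when SIGTERM arrives; the main loop polls it between steps.
interrupted = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the interruption, do no heavy work here."""
    interrupted.set()


def install_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_steps(steps, cleanup):
    """Run callables in order, stop before the next one once interrupted.

    `cleanup` always runs, whether the steps finish, fail, or are interrupted.
    Returns the names of the steps that completed.
    """
    completed = []
    try:
        for step in steps:
            if interrupted.is_set():
                break
            step()
            completed.append(step.__name__)
    finally:
        cleanup()
    return completed
```

A test entry point would call `install_handler()` once at startup and then pass its test phases to `run_steps`. Keeping the handler down to setting a flag avoids doing complex work inside signal context, which is part of why this approach needs more care than a `post` step.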
