Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a decision task for taskcluster #19401

Merged
merged 1 commit into from
Nov 22, 2019
Merged

Create a decision task for taskcluster #19401

merged 1 commit into from
Nov 22, 2019

Conversation

jgraham
Copy link
Contributor

@jgraham jgraham commented Sep 30, 2019

This replaces the current scheme in which all tasks are specified in the .taskcluster.yml file with a setup in which only an initial "decision" task is specified in that file. The job of the decision task is to examine the event that caused it to be scheduled and the corresponding changes (if any) and select an appropriate set of jobs to run.

Compared to the previous setup of having a single set of tasks this provides some new features:

  • Ability to only schedule those tasks that should actually be run. Previously for PRs we were using a scheme inherited from travis in which all the jobs would be scheduled but a subset of them would bail out without doing anything. This made for inefficient use of resources and poor UI since users would have to look at the results of jobs that didn't run, alongside those of jobs that actually did run.
  • Ability to specify dependencies between tasks. This allows tasks that don't start until another task completes. An example of this partially implemented in this PR is downloading a single browser binary used for all subsequent runs of that browser, ensuring constant versions both for the initial task and also for any retries. So far the dependency is implemented, but not the logic to actually use the downloaded Firefox for subsequent tasks.

There are also various things that now become possible that previously were hard to implement. For example we could invent a tc-try:<jobs> syntax to allow trying specific master runs on a PR without having to push to the triggers/ branches.

The current state of the feature is that tasks are described in a custom YAML format file in tools/ci/tasks/test.yml (there is some intent to allow using multiple tasks in the future). The format of this file and processing model is described in tools/ci/README.md. The setup has been verified on the jgraham/web-platform-tests fork of this repo.

One problem with landing this is that taskcluster always reads the .taskcluster.yml from the branch head, whereas PR tests are run on the merge of the base branch with the PR head. This means that some care is required not to break anything in such merges, or alternatively to rebase all PRs once the change lands.

@ghost
Copy link

ghost commented Sep 30, 2019

Submitting the task to Taskcluster failed. Details

InterpreterError at template.tasks[0].payload.command[3]: object has no property after

@foolip
Copy link
Member

foolip commented Oct 22, 2019

One problem with landing this is that taskcluster always reads the .taskcluster.yml from the branch head, whereas PR tests are run on the merge of the base branch with the PR head. This means that some care is required not to break anything in such merges, or alternatively to rebase all PRs once the change lands.

What's your preference here? Rebasing all PRs would cause a huge spike in CI resource usage and noise that'll probably cause people to mark all their notifications as read for a few days.

What does the "care is required" approach look like in practice? What are the sorts of changes we can't make in the setup after this is landed that would break existing PRs?

See also web-platform-tests/wpt-pr-bot#81 for an idea I proposed yesterday.

Copy link
Member

@foolip foolip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to and including tools/ci/tc/decision.py, need a break :)

owner: "${event.sender.login}@users.noreply.github.com"
source: ${event.repository.clone_url}
payload:
image: harjgam/web-platform-tests:0.33
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be updated if #19754 is merge before this. It'd be a conflict, but the conflicts in this file will be huge.

@jgraham
Copy link
Contributor Author

jgraham commented Nov 4, 2019

I updated this, and replied to some comments, so I think it's ready for a second round of review.

@foolip
Copy link
Member

foolip commented Nov 4, 2019

@jgraham there are still conflicts, can you resolve those first?

@jgraham
Copy link
Contributor Author

jgraham commented Nov 4, 2019

s/still/again/ since all the conflicts were resolved in the last push and then more changes happened on master.

@foolip
Copy link
Member

foolip commented Nov 6, 2019

I didn't notice the intermediate state of no conflicts :)

@jgraham jgraham force-pushed the decision_task branch 6 times, most recently from 620562b to e6e2412 Compare November 11, 2019 15:04
@jgraham jgraham closed this Nov 11, 2019
@jgraham jgraham reopened this Nov 11, 2019
@community-tc-integration
Copy link

Submitting the task to Taskcluster failed. Details

InterpreterError at template.tasks[0].payload.command[3]: object has no property after

@foolip
Copy link
Member

foolip commented Nov 15, 2019

@jgraham more conflicts and CI failures :)

Maybe we should try to work on this sync on IRC at some point to get it green and then immediately reviewed and merged?

@jgraham
Copy link
Contributor Author

jgraham commented Nov 18, 2019

@foolip I found the parts of this that got lost, so AFAIK it's now up to date with master, and the resolved review issues are correct. There are still some open questions, so I anticipate needing at least one more round of fixes.

@jgraham jgraham closed this Nov 18, 2019
@jgraham jgraham reopened this Nov 18, 2019
@community-tc-integration
Copy link

Submitting the task to Taskcluster failed. Details

InterpreterError at template.tasks[0].payload.command[3]: object has no property after

A decision task is the first task run, and ensures that we only
schedule subsequent tasks which are relevant to the current push or
pull request. This dynamic scheduling helps reduce load since we avoid
spinning up workers for tasks that ultimately don't run, and unlocks
new possibilities since we are able to schedule tasks that are
dependent on other tasks.

The tasks, their scheduling criteria and their dependnecies are
specified in a a YAML format configuration file in
tools/ci/tc/tasks/test.yml. This has a bespoke format, adopting some
ideas from Azure and Gecko's taskcluster integration. The format is
documented in `tools/ci/tc/README.md`. The data in that file undergoes
trandformations to produce a set of tasks, which are then filtered
according to the event that caused the decision task to run.

To initially prove out the implementation of dependent tasks we make
the Firefox tasks depend on a download task. But this does not yet
include the work to actually make the dependent tasks download Firefox
from the parent task.
@jgraham jgraham merged commit f57ead1 into master Nov 22, 2019
@jgraham jgraham deleted the decision_task branch November 22, 2019 16:47
@jgraham jgraham restored the decision_task branch November 22, 2019 16:47
@gsnedders gsnedders deleted the decision_task branch January 24, 2020 18:05
@gsnedders gsnedders restored the decision_task branch January 24, 2020 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci infra Taskcluster tox.ini webdriver wg-testing_browser wpt wptrunner The automated test runner, commonly called through ./wpt run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants