-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47582: [CI][Packaging] Move linux-packaging tasks to apache/arrow repository #47600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kou I've started working on this and I think I have a working base that could work. I wanted to discuss it with you before moving more packages to validate we are all ok with the approach.
Let me know your thoughts. I have a couple of questions to discuss too, let me know what you think on those too.
.github/workflows/cpp_extra.yml
Outdated
uses: ./.github/workflows/check_labels.yml | ||
secrets: inherit | ||
with: | ||
label: "CI: Extra" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to move these labels to either:
CI Extra: cpp
CI Extra: CPP
CI: cpp
CI: CPP
So we can have the others the same like:
CI Extra: linux-packaging
CI Extra: Linux Packaging
CI: linux-packaging
CI: Linux Packaging
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like CI: Extra: C++
/CI: Extra: Package: Linux
or CI: cpp
/CI: package: linux
/CI: package: linux: apt
.
…he/arrow repository
…y are present on arm64 too and add folder for specific job artifact on check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
.github/workflows/cpp_extra.yml
Outdated
uses: ./.github/workflows/check_labels.yml | ||
secrets: inherit | ||
with: | ||
label: "CI: Extra" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like CI: Extra: C++
/CI: Extra: Package: Linux
or CI: cpp
/CI: package: linux
/CI: package: linux: apt
.
# apache/arrow-adbc uses debian-bookworm. So the following | ||
# glob must much both of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
But I can change ADBC to simplify this script.
run: | | ||
set -ex | ||
pushd dev/tasks/linux-packages | ||
tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's simplify paths more to simplify binary-task.rb
:
tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories | |
mkdir -p artifacts | |
cp -a */${TASK_NAMESPACE}/repositories/* artifacts/ | |
tar cvzf ${{ matrix.id }}.tar.gz -C artifacts . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to simplify the paths slightly more but I am finding quite difficult to understand how all tangles together. How do we generate the repositories from those rpms, how we merge these files with other existing files from the repo, how do we create the testing repositories, etcetera so I am reverting my latest changes. I am happy to leave it like this (current generated structure) and try to simplify on a future PR or if you want to guide me on how to do it or if you want to push to this branch.
|
||
# Wait for the GitHub Workflow that creates the Linux packages | ||
# to finish before downloading the artifacts. | ||
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need .
here because all information is passed via arguments:
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" | |
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" |
dev/release/binary-task.rb
Outdated
# glob must much both of them. | ||
Dir.glob("#{source_dir_prefix}*/*") do |path| | ||
base_name = File.basename(path) | ||
destination_prefix = "#{incoming_dir}/#{distribution}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing Ruby!
Sure. I can help. Can I push something to this branch if it's needed?
BTW, destination_dir
is better for this variable name because this is a directory name not a prefix of path. (In this code, "prefix" is used for abc
for abcde
directory and "dir" is used for abcde
directory.)
# Wait for the GitHub Workflow that creates the GitHub Release | ||
# to finish before updating the release notes. | ||
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}" | ||
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need .
here.
download_artifacts() { | ||
RUN_ID=$(get_run_id) | ||
echo "Downloading artitfacts for workflow with ID: ${RUN_ID}" | ||
gh run download \ | ||
${RUN_ID} \ | ||
--repo "${REPOSITORY}" \ | ||
--dir "$1" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Defining this function in this file isn't good because this file name is watch-gh-workflow
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could rename the file to utils-gh-workflow.sh
but if we use release for the artifacts this won't be necessary.
# to finish before downloading the artifacts. | ||
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml" | ||
|
||
RUN_ID=$(get_run_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you want to use a function defined in utils-watch-gh-workflow.sh
.
|
||
RUN_ID=$(get_run_id) | ||
# Download the artifacts created by the package_linux.yml workflow | ||
download_artifacts "${SOURCE_DIR}/../../packages/${CROSSBOW_JOB_ID}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using GitHub Release for RC instead of GitHub Actions artifacts? If we use GitHub Release, we don't need RUN_ID
: gh release download apache-arrow-${VERSION}-rc${RC} ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you mean, for scheduled or PR/push use artifacts but if workflow_dispatch
was used, which is trigger of RC, upload to release? I don't want to create a release every time the job is executed but that could work. In that case we might use artifacts to test and upload to release if tests are successful and job type was workflow_dispatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to upload artifacts to GitHub Release only when apache-arrow-${VERSION}-rc${RC}
tag is pushed. Can we use on.push.tags
instead of workflow_dispatch
for RC? (I can provide changes for the approach.)
…m/repositories subfolders Try fixing yum tests
Co-authored-by: Sutou Kouhei <[email protected]>
@kou I think there are two main things missing here:
I can work on the second point regardless of the approach (unless you want to do the tag part). |
Can I work on this next week? I'll push something to this branch. |
Absolutely. I will try to deeply review the changes for simplifying the repositories structure so I can learn more about how we build those packages. It would be good to spread the knowledge to help maintaining those. |
TBD
Rationale for this change
What changes are included in this PR?
check_labels
andreport_ci
jobs to independent reusable workflows.cpp_extra
to use those.linux_packaging.yml
workflow replicating work that was done on crossbow. Integrate that workflow withcheck_labels
andreport_ci
Are these changes tested?
Some via CI on fork and some manual testing.
Are there any user-facing changes?
No