Skip to content

Conversation

raulcd
Copy link
Member

@raulcd raulcd commented Sep 19, 2025

TBD

Rationale for this change

What changes are included in this PR?

  • Move check_labels and report_ci jobs to independent reusable workflows.
  • Update cpp_extra to use those.
  • Create new linux_packaging.yml workflow replicating work that was done on crossbow. Integrate that workflow with check_labels and report_ci
  • Update release binary submit and binary download to run workflow and download the artifacts from the job instead of from the crossbow repository.

Are these changes tested?

Some via CI on fork and some manual testing.

Are there any user-facing changes?

No

Copy link

⚠️ GitHub issue #47582 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Sep 19, 2025
Copy link
Member Author

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kou I've started working on this and I think I have a working base that could work. I wanted to discuss it with you before moving more packages to validate we are all ok with the approach.
Let me know your thoughts. I have a couple of questions to discuss too, let me know what you think on those too.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 19, 2025
uses: ./.github/workflows/check_labels.yml
secrets: inherit
with:
label: "CI: Extra"
Copy link
Member Author

@raulcd raulcd Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to move these labels to either:

  • CI Extra: cpp
  • CI Extra: CPP
  • CI: cpp
  • CI: CPP

So we can have the others the same like:

  • CI Extra: linux-packaging
  • CI Extra: Linux Packaging
  • CI: linux-packaging
  • CI: Linux Packaging

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like CI: Extra: C++/CI: Extra: Package: Linux or CI: cpp/CI: package: linux/CI: package: linux: apt.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 19, 2025
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Sep 19, 2025
…y are present on arm64 too and add folder for specific job artifact on check
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

uses: ./.github/workflows/check_labels.yml
secrets: inherit
with:
label: "CI: Extra"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like CI: Extra: C++/CI: Extra: Package: Linux or CI: cpp/CI: package: linux/CI: package: linux: apt.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 19, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 19, 2025
@github-actions github-actions bot added the awaiting changes Awaiting changes label Sep 23, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 23, 2025
Comment on lines -1654 to -1655
# apache/arrow-adbc uses debian-bookworm. So the following
# glob must much both of them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

But I can change ADBC to simplify this script.

run: |
set -ex
pushd dev/tasks/linux-packages
tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify paths more to simplify binary-task.rb:

Suggested change
tar cvzf ${{ matrix.id }}.tar.gz */${TASK_NAMESPACE}/repositories
mkdir -p artifacts
cp -a */${TASK_NAMESPACE}/repositories/* artifacts/
tar cvzf ${{ matrix.id }}.tar.gz -C artifacts .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to simplify the paths slightly more but I am finding quite difficult to understand how all tangles together. How do we generate the repositories from those rpms, how we merge these files with other existing files from the repo, how do we create the testing repositories, etcetera so I am reverting my latest changes. I am happy to leave it like this (current generated structure) and try to simplify on a future PR or if you want to guide me on how to do it or if you want to push to this branch.


# Wait for the GitHub Workflow that creates the Linux packages
# to finish before downloading the artifacts.
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need . here because all information is passed via arguments:

Suggested change
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml"
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml"

# glob must much both of them.
Dir.glob("#{source_dir_prefix}*/*") do |path|
base_name = File.basename(path)
destination_prefix = "#{incoming_dir}/#{distribution}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing Ruby!

Sure. I can help. Can I push something to this branch if it's needed?

BTW, destination_dir is better for this variable name because this is a directory name not a prefix of path. (In this code, "prefix" is used for abc for abcde directory and "dir" is used for abcde directory.)

# Wait for the GitHub Workflow that creates the GitHub Release
# to finish before updating the release notes.
"${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}"
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${TAG}" "${WORKFLOW}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need . here.

Comment on lines +43 to +50
download_artifacts() {
RUN_ID=$(get_run_id)
echo "Downloading artitfacts for workflow with ID: ${RUN_ID}"
gh run download \
${RUN_ID} \
--repo "${REPOSITORY}" \
--dir "$1"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Defining this function in this file isn't good because this file name is watch-gh-workflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rename the file to utils-gh-workflow.sh but if we use release for the artifacts this won't be necessary.

# to finish before downloading the artifacts.
. "${SOURCE_DIR}/utils-watch-gh-workflow.sh" "${release_tag}" "package_linux.yml"

RUN_ID=$(get_run_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you want to use a function defined in utils-watch-gh-workflow.sh.


RUN_ID=$(get_run_id)
# Download the artifacts created by the package_linux.yml workflow
download_artifacts "${SOURCE_DIR}/../../packages/${CROSSBOW_JOB_ID}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using GitHub Release for RC instead of GitHub Actions artifacts? If we use GitHub Release, we don't need RUN_ID: gh release download apache-arrow-${VERSION}-rc${RC} ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you mean, for scheduled or PR/push use artifacts but if workflow_dispatch was used, which is trigger of RC, upload to release? I don't want to create a release every time the job is executed but that could work. In that case we might use artifacts to test and upload to release if tests are successful and job type was workflow_dispatch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to upload artifacts to GitHub Release only when apache-arrow-${VERSION}-rc${RC} tag is pushed. Can we use on.push.tags instead of workflow_dispatch for RC? (I can provide changes for the approach.)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 23, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Sep 23, 2025
…m/repositories subfolders

Try fixing yum tests
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 25, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 25, 2025
Co-authored-by: Sutou Kouhei <[email protected]>
@raulcd
Copy link
Member Author

raulcd commented Sep 25, 2025

@kou I think there are two main things missing here:

  • whether we want to further simplify the repositories structure and the .tar.gz (this can be done on a separate PR)
  • whether we want to use the tag instead of workflow dispatch for the release and use gh release download instead of gh run download. If we don't want to do this on this PR I should probably rename utils-watch-gh-workflow.sh to utils-gh-workflow.sh so I can source the file and add the new function.

I can work on the second point regardless of the approach (unless you want to do the tag part).

@kou
Copy link
Member

kou commented Sep 25, 2025

Can I work on this next week? I'll push something to this branch.

@raulcd
Copy link
Member Author

raulcd commented Sep 25, 2025

Can I work on this next week? I'll push something to this branch.

Absolutely. I will try to deeply review the changes for simplifying the repositories structure so I can learn more about how we build those packages. It would be good to spread the knowledge to help maintaining those.
Thank you very much @kou .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting change review Awaiting change review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants