generated from CDCgov/template
A script for setting the pool #177
Merged
36 commits
68cd948 the smallest thing that creates a pool (natemcintosh)
c833b25 added image and autoscale (natemcintosh)
97ffe8a forgot pool id (natemcintosh)
1666c7f more progress (natemcintosh)
a1c7dc9 Innovate team says to leave commented for now (natemcintosh)
33fb5e6 mostly working, but not how I envisioned it (natemcintosh)
6040972 Merge branch 'main' into nam-create-pool-script (natemcintosh)
5dc3fba replace dependency with constant string (natemcintosh)
8549fc0 attempt to refactor create pool workflow (natemcintosh)
f0f6d31 needed to run az login? (natemcintosh)
62b26d2 maybe this login is not necessary (natemcintosh)
c268b04 make cred names match GH creds (natemcintosh)
1cc381f make secrets available as env vars (natemcintosh)
0e9b797 try different resource group (natemcintosh)
55daa4f Merge branch 'main' into nam-create-pool-script (natemcintosh)
5bf7b92 using wrong env (natemcintosh)
e273538 Merge branch 'main' into nam-create-pool-script (natemcintosh)
bc3f47f try printing out the response (natemcintosh)
e9de774 Merge branch 'main' into nam-create-pool-script (natemcintosh)
b332dc1 try just calling the az command (natemcintosh)
4fa6f4f Merge branch 'main' into nam-create-pool-script (natemcintosh)
09f2139 use env pool id (natemcintosh)
a94c1c5 This works, all the way through post (natemcintosh)
bb27868 try getting rid of the config toml stuff (natemcintosh)
6d656b1 once again using the wrong secret (natemcintosh)
9188dae updating subnet_id var (giomrella)
03d1246 updating container_image_name var (giomrella)
511ee77 use url not server (natemcintosh)
586ae59 use server not url (natemcintosh)
190f8c0 moving from self hosted runner to runner-action (#253) (giomrella)
cf47711 keep env vars up to date (natemcintosh)
161331f Removed invalid condition referencing steps.check_pool_id.outputs.poo… (giomrella)
ddf8495 Merge branch 'main' into nam-create-pool-script (micahwiesner67)
9825eec update docs on script (natemcintosh)
0335295 remove unused files (natemcintosh)
203ecc9 Merge branch 'main' into nam-create-pool-script (natemcintosh)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
# /// script | ||
# requires-python = ">=3.13" | ||
# dependencies = [ | ||
# "azure-batch", | ||
# "azure-identity", | ||
# "azure-mgmt-batch", | ||
# "msrest", | ||
# ] | ||
# /// | ||
""" | ||
If running locally, use: | ||
uv run --env-file .env .github/scripts/create_pool.py | ||
Requires a `.env` file with at least the following: | ||
BATCH_ACCOUNT="<batch account name>" | ||
SUBSCRIPTION_ID="<azure subscription id>" | ||
BATCH_USER_ASSIGNED_IDENTITY="<user assigned identity>" | ||
AZURE_BATCH_ACCOUNT_CLIENT_ID="<azure client id>" | ||
PRINCIPAL_ID="<principal id>" | ||
CONTAINER_REGISTRY_SERVER="<container registry server>" | ||
CONTAINER_IMAGE_NAME="https://full-cr-server/<container image name>:tag" | ||
POOL_ID="<pool id>" | ||
SUBNET_ID="<subnet id>" | ||
RESOURCE_GROUP="<resource group name>" | ||
|
||
If running in CI, all of the above environment variables should be set in the repo | ||
secrets. | ||
""" | ||
|
||
import os | ||
|
||
from azure.identity import DefaultAzureCredential | ||
from azure.mgmt.batch import BatchManagementClient | ||
|
||
AUTO_SCALE_FORMULA = """ | ||
// In this example, the pool size | ||
// is adjusted based on the number of tasks in the queue. | ||
// Note that both comments and line breaks are acceptable in formula strings. | ||
|
||
// Get pending tasks for the past 5 minutes. | ||
$samples = $ActiveTasks.GetSamplePercent(TimeInterval_Minute * 5); | ||
// If we have fewer than 70 percent data points, we use the last sample point, otherwise we use the maximum of last sample point and the history average. | ||
$tasks = $samples < 70 ? max(0, $ActiveTasks.GetSample(1)) : | ||
max( $ActiveTasks.GetSample(1), avg($ActiveTasks.GetSample(TimeInterval_Minute * 5))); | ||
// If number of pending tasks is not 0, set targetVM to pending tasks, otherwise half of current dedicated. | ||
$targetVMs = $tasks > 0 ? $tasks : max(0, $TargetDedicatedNodes / 2); | ||
// The pool size is capped at 100, if target VM value is more than that, set it to 100. | ||
cappedPoolSize = 100; | ||
$TargetDedicatedNodes = max(0, min($targetVMs, cappedPoolSize)); | ||
// Set node deallocation mode - keep nodes active only until tasks finish | ||
$NodeDeallocationOption = taskcompletion; | ||
""" | ||
|
||
|
||
def main() -> None: | ||
# Create the BatchManagementClient | ||
batch_mgmt_client = BatchManagementClient( | ||
credential=DefaultAzureCredential(), | ||
subscription_id=os.environ["SUBSCRIPTION_ID"], | ||
) | ||
|
||
# Assemble the pool parameters | ||
pool_parameters = { | ||
"identity": { | ||
"type": "UserAssigned", | ||
"userAssignedIdentities": { | ||
os.environ["BATCH_USER_ASSIGNED_IDENTITY"]: { | ||
"clientId": os.environ["AZURE_BATCH_ACCOUNT_CLIENT_ID"], | ||
"principalId": os.environ["PRINCIPAL_ID"], | ||
} | ||
}, | ||
}, | ||
"properties": { | ||
"vmSize": "STANDARD_d4d_v5", | ||
"interNodeCommunication": "Disabled", | ||
"taskSlotsPerNode": 1, | ||
"taskSchedulingPolicy": {"nodeFillType": "Spread"}, | ||
"deploymentConfiguration": { | ||
"virtualMachineConfiguration": { | ||
"imageReference": { | ||
"publisher": "microsoft-dsvm", | ||
"offer": "ubuntu-hpc", | ||
"sku": "2204", | ||
"version": "latest", | ||
}, | ||
"nodeAgentSkuId": "batch.node.ubuntu 22.04", | ||
"containerConfiguration": { | ||
"type": "dockercompatible", | ||
"containerImageNames": [os.environ["CONTAINER_IMAGE_NAME"]], | ||
"containerRegistries": [ | ||
{ | ||
"identityReference": { | ||
"resourceId": os.environ[ | ||
"BATCH_USER_ASSIGNED_IDENTITY" | ||
] | ||
}, | ||
"registryServer": os.environ[ | ||
"CONTAINER_REGISTRY_SERVER" | ||
], | ||
} | ||
], | ||
}, | ||
} | ||
}, | ||
"networkConfiguration": { | ||
"subnetId": os.environ["SUBNET_ID"], | ||
"publicIPAddressConfiguration": {"provision": "NoPublicIPAddresses"}, | ||
"dynamicVnetAssignmentScope": "None", | ||
}, | ||
"scaleSettings": { | ||
"autoScale": { | ||
"evaluationInterval": "PT5M", | ||
"formula": AUTO_SCALE_FORMULA, | ||
} | ||
}, | ||
"resizeOperationStatus": { | ||
"targetDedicatedNodes": 1, | ||
"nodeDeallocationOption": "Requeue", | ||
"resizeTimeout": "PT15M", | ||
"startTime": "2023-07-05T13:18:25.7572321Z", | ||
}, | ||
"currentDedicatedNodes": 0, | ||
"currentLowPriorityNodes": 0, | ||
"targetNodeCommunicationMode": "Simplified", | ||
"currentNodeCommunicationMode": "Simplified", | ||
}, | ||
} | ||
|
||
batch_mgmt_client.pool.create( | ||
resource_group_name=os.environ["RESOURCE_GROUP"], | ||
account_name=os.environ["BATCH_ACCOUNT"], | ||
pool_name=os.environ["POOL_ID"], | ||
parameters=pool_parameters, | ||
) | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
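
The comments inside `AUTO_SCALE_FORMULA` describe how the target node count is derived from the pending-task samples. As a rough illustration only, the same arithmetic can be sketched in plain Python. The function name `target_dedicated_nodes` and its parameters are hypothetical stand-ins for the values Azure Batch supplies (`$ActiveTasks.GetSamplePercent(...)`, `$ActiveTasks.GetSample(...)`, and the current `$TargetDedicatedNodes`), and the sketch does not model how the service samples or rounds values.

```python
from statistics import mean

# Mirrors cappedPoolSize in the formula above.
CAPPED_POOL_SIZE = 100


def target_dedicated_nodes(
    sample_percent: float,
    last_sample: float,
    recent_samples: list[float],
    current_target: float,
) -> float:
    """Approximate one evaluation of the autoscale formula (hypothetical helper).

    sample_percent: share of expected samples available for the last 5 minutes.
    last_sample: the most recent pending-task count.
    recent_samples: pending-task counts over the last 5 minutes
        (must be non-empty when sample_percent >= 70).
    current_target: the current value of $TargetDedicatedNodes.
    """
    if sample_percent < 70:
        # Too few data points: trust only the last sample.
        tasks = max(0, last_sample)
    else:
        # Enough data: take the larger of the last sample and the average.
        tasks = max(last_sample, mean(recent_samples))

    # With pending tasks, ask for one node per task; otherwise halve the pool.
    target_vms = tasks if tasks > 0 else max(0, current_target / 2)

    # Never request more nodes than the cap, and never a negative count.
    return max(0, min(target_vms, CAPPED_POOL_SIZE))
```

Under these assumptions, an idle pool shrinks by half at each evaluation and a large backlog is clamped at 100 nodes.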
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,6 +38,10 @@ env: | |
jobs: | ||
|
||
build-pipeline-image: | ||
permissions: | ||
id-token: write # This is required for requesting the JWT | ||
contents: read # This is required for actions/checkout | ||
packages: write # This is required for ACR import | ||
runs-on: ubuntu-latest | ||
name: Build image | ||
|
||
|
@@ -84,40 +88,66 @@ jobs: | |
|
||
acr-import: | ||
needs: build-pipeline-image | ||
runs-on: cfa-cdcgov-aca | ||
runs-on: ubuntu-latest | ||
environment: production | ||
permissions: | ||
id-token: write # This is required for requesting the JWT | ||
contents: read # This is required for actions/checkout | ||
packages: write # This is required for ACR import | ||
|
||
name: Copy image from GHCR to ACR | ||
outputs: | ||
tag: ${{ needs.build-pipeline-image.outputs.tag }} | ||
steps: | ||
|
||
- name: Azure login with OIDC | ||
uses: azure/login@v2 | ||
# From: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-cloud-providers#requesting-the-jwt-using-the-actions-core-toolkit | ||
- name: Install OIDC Client from Core Package | ||
run: npm install @actions/[email protected] @actions/http-client | ||
- name: Get Id Token | ||
uses: actions/github-script@v7 | ||
id: idtoken | ||
with: | ||
creds: ${{ secrets.EDAV_CFA_PREDICT_NNHT_SP }} | ||
script: | | ||
const coredemo = require('@actions/core') | ||
const id_token = await coredemo.getIDToken('api://AzureADTokenExchange') | ||
coredemo.setOutput('id_token', id_token) | ||
|
||
- name: Copy Image | ||
run: | | ||
IMAGE_TAG=${{ env.IMAGE_NAME }}:${{ needs.build-pipeline-image.outputs.tag }} | ||
az acr import --name ${{ env.REGISTRY }} \ | ||
--source "ghcr.io/cdcgov/$IMAGE_TAG" \ | ||
--username ${{ github.actor }} \ | ||
--password ${{ secrets.GITHUB_TOKEN }} \ | ||
--image "$IMAGE_TAG" \ | ||
--force && echo 'Copied image!' | ||
if [ $? -ne 0 ]; then | ||
echo "Failed to copy image" | ||
fi | ||
- name: ACR Import | ||
uses: CDCgov/cfa-actions/[email protected] | ||
with: | ||
github_app_id: ${{ secrets.CDCENT_ACTOR_APP_ID }} | ||
github_app_pem: ${{ secrets.CDCENT_ACTOR_APP_PEM }} | ||
wait_for_completion: true | ||
print_logs: true | ||
script: | | ||
echo "Logging into Azure CLI" | ||
az login --service-principal \ | ||
--username ${{ secrets.AZURE_NNHT_SP_CLIENT_ID }} \ | ||
--tenant ${{ secrets.TENANT_ID }} \ | ||
--federated-token ${{ steps.idtoken.outputs.id_token }} \ | ||
--output none | ||
|
||
IMAGE_TAG=${{ env.IMAGE_NAME }}:${{ needs.build-pipeline-image.outputs.tag }} | ||
az acr import --name ${{ env.REGISTRY }} \ | ||
--source "ghcr.io/cdcgov/$IMAGE_TAG" \ | ||
--username ${{ github.actor }} \ | ||
--password ${{ secrets.GITHUB_TOKEN }} \ | ||
--image "$IMAGE_TAG" \ | ||
--force && echo 'Copied image!' | ||
|
||
if [ $? -ne 0 ]; then | ||
echo "Failed to copy image" | ||
fi | ||
|
||
batch-pool: | ||
|
||
name: Create Batch Pool and Submit Jobs | ||
runs-on: cfa-cdcgov-aca | ||
runs-on: ubuntu-latest | ||
needs: acr-import | ||
|
||
environment: production | ||
permissions: | ||
contents: read | ||
packages: write | ||
id-token: write | ||
|
||
env: | ||
TAG: ${{ needs.acr-import.outputs.tag }} | ||
|
@@ -136,65 +166,77 @@ jobs: | |
id: checkout_repo | ||
uses: actions/checkout@v4 | ||
|
||
# This step is only needed during the action to write the | ||
# config file. Users can have a config file stored in their VAP | ||
# sessions. In the future, we will have the config.toml file | ||
# distributed with the repo (encrypted). | ||
- name: Writing out config file | ||
run: | | ||
cat <<EOF > pool-config-${{ github.sha }}.toml | ||
${{ secrets.POOL_CONFIG_TOML }} | ||
EOF | ||
|
||
# Replacing placeholders in the config file | ||
sed -i 's|{{ IMAGE_NAME }}|${{ env.REGISTRY }}${{ env.IMAGE_NAME }}:${{ env.TAG }}|g' pool-config-${{ github.sha }}.toml | ||
sed -i 's|{{ VM_SIZE }}|${{ env.VM_SIZE }}|g' pool-config-${{ github.sha }}.toml | ||
sed -i 's|{{ BATCH_SUBNET_ID }}|${{ env.BATCH_SUBNET_ID }}|g' pool-config-${{ github.sha }}.toml | ||
sed -i 's|{{ POOL_ID }}|${{ env.POOL_ID }}|g' pool-config-${{ github.sha }}.toml | ||
|
||
|
||
- name: Login to Azure with NNH Service Principal | ||
id: azure_login_2 | ||
uses: azure/login@v2 | ||
# From: https://stackoverflow.com/a/58035262/2097171 | ||
- name: Extract branch name | ||
shell: bash | ||
run: echo "branch=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}" >> $GITHUB_OUTPUT | ||
id: get-branch | ||
|
||
# From: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-cloud-providers#requesting-the-jwt-using-the-actions-core-toolkit | ||
- name: Install OIDC Client from Core Package | ||
run: npm install @actions/[email protected] @actions/http-client | ||
- name: Get Id Token | ||
uses: actions/github-script@v7 | ||
id: idtoken | ||
with: | ||
# managed by EDAV. Contact Amit Mantri or Jon Kislin if you have issues. | ||
creds: ${{ secrets.EDAV_CFA_PREDICT_NNHT_SP }} | ||
|
||
######################################################################### | ||
# Checking if the pool exists | ||
# This is done via az batch pool list. If there is no pool matching the | ||
# pool id (which is a function of the tag, i.e., branch name), then | ||
# pool-exists will be ''. | ||
######################################################################### | ||
- name: Check if pool exists | ||
id: check_pool_id | ||
run: | | ||
|
||
az batch account login \ | ||
--resource-group ${{ secrets.PRD_RESOURCE_GROUP }} \ | ||
--name "${{ env.BATCH_ACCOUNT }}" | ||
script: | | ||
const coredemo = require('@actions/core') | ||
const id_token = await coredemo.getIDToken('api://AzureADTokenExchange') | ||
coredemo.setOutput('id_token', id_token) | ||
|
||
az batch pool list \ | ||
--output tsv \ | ||
--filter "(id eq '${{ env.POOL_ID }}')" \ | ||
--query "[].[id, allocationState, creationTime]" > \ | ||
pool-list-${{ github.sha }} | ||
|
||
echo "pool-exists=$(cat pool-list-${{ github.sha }})" >> \ | ||
$GITHUB_OUTPUT | ||
|
||
- name: Create cfa-epinow2-pipeline Pool | ||
id: create_batch_pool | ||
|
||
# This is a conditional step that will only run if the pool does not | ||
# exist | ||
if: ${{ steps.check_pool_id.outputs.pool-exists == '' }} | ||
|
||
# The call to the az cli that actually generates the pool | ||
run: | | ||
# Running the python script azure/pool.py passing the config file | ||
# as an argument | ||
pip install -r azure/requirements.txt | ||
python3 azure/pool.py \ | ||
pool-config-${{ github.sha }}.toml \ | ||
batch-autoscale-formula.txt | ||
# Removed invalid condition referencing steps.check_pool_id.outputs.pool-exists | ||
uses: CDCgov/cfa-actions/[email protected] | ||
with: | ||
github_app_id: ${{ secrets.CDCENT_ACTOR_APP_ID }} | ||
github_app_pem: ${{ secrets.CDCENT_ACTOR_APP_PEM }} | ||
wait_for_completion: true | ||
print_logs: true | ||
script: | | ||
echo "Setting env vars" | ||
export BATCH_ACCOUNT=${{ secrets.BATCH_ACCOUNT }} | ||
export SUBSCRIPTION_ID=${{ secrets.SUBSCRIPTION_ID }} | ||
export BATCH_USER_ASSIGNED_IDENTITY=${{ secrets.BATCH_USER_ASSIGNED_IDENTITY }} | ||
export AZURE_BATCH_ACCOUNT_CLIENT_ID=${{ secrets.AZURE_BATCH_ACCOUNT_CLIENT_ID }} | ||
export PRINCIPAL_ID=${{ secrets.PRINCIPAL_ID }} | ||
export CONTAINER_REGISTRY_SERVER=${{ secrets.CONTAINER_REGISTRY_SERVER }} | ||
export CONTAINER_REGISTRY_USERNAME=${{ secrets.CONTAINER_REGISTRY_USERNAME }} | ||
export CONTAINER_REGISTRY_PASSWORD=${{ secrets.CONTAINER_REGISTRY_PASSWORD }} | ||
export CONTAINER_REGISTRY_URL=${{ secrets.CONTAINER_REGISTRY_URL }} | ||
export CONTAINER_IMAGE_NAME=${{ env.REGISTRY }}${{ env.IMAGE_NAME }}:${{ env.TAG }} | ||
export POOL_ID=${{ env.POOL_ID }} | ||
export SUBNET_ID=${{ secrets.BATCH_SUBNET_ID }} | ||
export RESOURCE_GROUP=${{ secrets.RESOURCE_GROUP }} | ||
|
||
|
||
echo "Logging into Azure CLI" | ||
az login --service-principal \ | ||
--username ${{ secrets.AZURE_NNHT_SP_CLIENT_ID }} \ | ||
--tenant ${{ secrets.TENANT_ID }} \ | ||
--federated-token ${{ steps.idtoken.outputs.id_token }} \ | ||
--output none | ||
|
||
echo "Logging into batch" | ||
az batch account login \ | ||
--resource-group ${{ secrets.PRD_RESOURCE_GROUP }} \ | ||
--name "${{ env.BATCH_ACCOUNT }}" | ||
|
||
echo "Listing batch pools" | ||
az batch pool list \ | ||
--output tsv \ | ||
--filter "(id eq '${{ env.POOL_ID }}')" \ | ||
--query "[].[id, allocationState, creationTime]" > pool-list-${{ github.sha }} | ||
|
||
if [ -s pool-list-${{ github.sha }} ]; then | ||
echo "Pool already exists!" | ||
else | ||
CURRENT_BRANCH="${{ steps.get-branch.outputs.branch }}" | ||
echo "Cloning repo at branch '$CURRENT_BRANCH'" | ||
git clone -b "$CURRENT_BRANCH" https://github.com/${{ github.repository }}.git | ||
cd cfa-epinow2-pipeline | ||
|
||
echo "Running create pool script" | ||
uv run .github/scripts/create_pool.py | ||
fi |
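
The "Extract branch name" step relies on the shell expansion `${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}`: it uses `GITHUB_HEAD_REF` when that variable is set and non-empty (pull request events), and otherwise strips the `refs/heads/` prefix from `GITHUB_REF`. A minimal Python sketch of the same fallback, assuming a hypothetical helper named `branch_name` that takes the environment as a mapping:

```python
from collections.abc import Mapping


def branch_name(env: Mapping[str, str]) -> str:
    """Mimic ${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}} (hypothetical helper)."""
    head_ref = env.get("GITHUB_HEAD_REF", "")
    if head_ref:
        # Set on pull request events: this is the source branch of the PR.
        return head_ref
    # Otherwise drop a leading "refs/heads/" if present; any other ref
    # (for example a tag ref) is returned unchanged, as in the shell version.
    return env.get("GITHUB_REF", "").removeprefix("refs/heads/")
```

Passing `os.environ` as `env` would reproduce the value the workflow writes to `$GITHUB_OUTPUT`, under the assumption that both variables carry the values GitHub Actions normally provides.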
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -389,3 +389,6 @@ azure/*.toml | |
# Careful with Secrets! | ||
*.env | ||
*.env.gpg | ||
|
||
# vscode settings | ||
.vscode/ |