83 commits
696b820
bump version
amogkam Mar 24, 2021
c018908
[tune] Limit maximum number of pending trials. Add convergence test. …
krfricke Mar 24, 2021
6422956
Revert "[core] Set a configurable max memory for fetched objects (#14…
stephanie-wang Mar 24, 2021
f729046
[RLlib] Issue 14533: `tf.enable_eager_execution()` must be called at …
sven1977 Mar 24, 2021
cd86578
[autoscaler][aws] Use subnets in only one VPC (#14868)
DmitriGekhtman Mar 24, 2021
e0fe20b
[tune] fix long running release test WIP (#14866)
krfricke Mar 25, 2021
8d52281
Revert "[RLlib] Issue 14533: `tf.enable_eager_execution()` must be ca…
rkooo567 Mar 25, 2021
c957786
[tune] fix long running release test WIP (#14866)
krfricke Mar 25, 2021
ff08daf
Merge branch 'releases/1.3.0' of https://github.com/ray-project/ray i…
amogkam Mar 25, 2021
b669fcd
[HotFix] Avoid pushing nightly tags without py in them (#14916)
ijrsvt Mar 25, 2021
6a8e5dc
Fixed Dask on Ray for dask>=2021.3.1 which dropped Python 3.6 (#14991)
tgaddair Mar 30, 2021
f480943
Revert "[core] Fix worker type in python (#14823)" (#14910)
fishbone Mar 24, 2021
de90942
[tune] Reconcile placement groups every N seconds to avoid bottleneck…
krfricke Apr 1, 2021
43d4128
[core] Fix UTIL worker issue (#14925)
fishbone Apr 2, 2021
9a78641
[core] Fix placement group GPU assignment bug (#15049)
Apr 2, 2021
994976f
Fix ray[full] -> ray[cluster] #15112
richardliaw Apr 5, 2021
62e4bfe
[core] Internal kv support in gcs (#14656)
fishbone Apr 5, 2021
7d5f057
[core] Internal kv support in gcs (#14656)
fishbone Apr 5, 2021
eb4f34b
Merge branch 'releases/1.3.0' of github.com:ray-project/ray into rele…
fishbone Apr 5, 2021
cb3661e
[dask-on-ray] Fix Dask-on-Ray scheduler break caused by changing inte…
clarkzinzow Apr 6, 2021
db0b6a6
[RLlib] Minor release 1.3 warnings cleanups. (#15272)
sven1977 Apr 14, 2021
5d02860
ray[cluster] -> ray[default] (#15251)
richardliaw Apr 14, 2021
4e5b702
[Stats] Basic implementation for the the periodic asio stats printing…
rkooo567 Mar 30, 2021
df67214
[core] Cap total memory used by executing tasks' arguments (#15027)
stephanie-wang Mar 31, 2021
de41c1a
Take care of failed killing request (#15313)
fishbone Apr 15, 2021
f0652ae
[Log] Fix log monitor issue. (#15302)
rkooo567 Apr 15, 2021
61b875e
[Release branch only] Revert unhandled exception PR for 1.3.0 tempora…
rkooo567 Apr 15, 2021
c300eb2
[core] Log warning on bad max task args value (#15314)
stephanie-wang Apr 15, 2021
d6bd56b
Merge branch 'releases/1.3.0' of https://github.com/ray-project/ray i…
amogkam Apr 15, 2021
0b4b444
[RLlib] APEX returns incorrect default resources (PleacementGroupFact…
sven1977 Apr 15, 2021
a277aca
[tune] Allow 0 CPU head bundles in for placement group factories (#15…
krfricke Apr 15, 2021
cb25437
[Client] Add metadata to Terminate Calls to make ray.kill() and ray.c…
ijrsvt Apr 13, 2021
cdfdde5
Fix release test -- client remote put (#15325)
richardliaw Apr 15, 2021
8ed3de0
rllib hotfix
amogkam Apr 15, 2021
f60f1bd
[autoscaler] Do not divide by zero in resource demand scheduler (#15323)
DmitriGekhtman Apr 16, 2021
1c0f103
Revert "[autoscaler] Do not divide by zero in resource demand schedul…
amogkam Apr 18, 2021
d5c46d4
[minor] improve warning message for Ray (#15005)
richardliaw Apr 5, 2021
197fa3e
[autoscaler] Do not divide by zero in resource demand scheduler (#15323)
DmitriGekhtman Apr 16, 2021
9f45548
[core] Fixing of actor creation failure (#15411)
fishbone Apr 20, 2021
2a02b97
Move scalability envelope back down to 250 nodes (#15381)
Apr 17, 2021
1231f35
bump java version (#15471)
chaokunyang Apr 26, 2021
412fd55
Edpalenc/1.3.0 bonsai sync (#69)
Edilmo May 14, 2021
9f429e0
No Case: Support the FCNet to work with list of activation functions.…
RuofanKong May 20, 2021
34b82fc
CQL-DQN support 1.x (#72)
Edilmo Jun 2, 2021
f2e7cc9
revert cql minor change (#76)
Edilmo Jun 8, 2021
40ac264
No Case: Fixed the RLLib contract change for SampleBatch that's incon…
RuofanKong Jul 16, 2021
47220b8
No Case: Fixed the main branch of 1.3.0. (#81)
RuofanKong Aug 6, 2021
a7770f1
No Case: Generate a new has number. (#82)
RuofanKong Aug 6, 2021
31973af
Fix replay buffer test (#83)
Edilmo Aug 11, 2021
ce94fa8
No Case: Fix the sample collector key bug.
Aug 11, 2021
f5b6897
Merge pull request #84 from BonsaiAI/rukon/sample_collector_fix
RuofanKong Aug 12, 2021
84d4434
No Case: Fix the forcing cast to Numpy Float32 in SampleBatch. (#79)
RuofanKong Aug 16, 2021
7482e1d
No Case: Expose the Sample Collector for the cached buffer data. (#80)
RuofanKong Aug 16, 2021
0a69f8b
No Case: Fixed the PPO runtime error metrics. (#86)
RuofanKong Aug 18, 2021
5d1939b
No Case: Fixed the RLLib 1.3.0 replay buffer bug for SAC. (#87)
RuofanKong Aug 23, 2021
0183e86
No Case: Fixed the sample batch downcasting to float32 issues. (#89)
RuofanKong Aug 26, 2021
60cd62f
No Case: Fixed CQL tf import. (#90)
RuofanKong Aug 28, 2021
1548033
No Case: RLLib fixed the CQL-SAC Q loss.
Sep 1, 2021
c7b948f
fixed the tests.
Sep 1, 2021
6784382
Merge pull request #91 from BonsaiAI/rukon/fix_cql_sac
bimalkmehta Sep 1, 2021
6350a82
No Case: fixed the torch tests.
Sep 1, 2021
768cd9a
Merge branch 'releases/1.3.0' into rukon/fix_torch_sac_tests
Sep 1, 2021
998e230
change torch sac loss.
Sep 1, 2021
93645d9
Merge pull request #92 from BonsaiAI/rukon/fix_torch_sac_tests
bimalkmehta Sep 2, 2021
1e9caf8
No Case: RLLib 1.3.0 fixed the DQN convergence issue per master branc…
Sep 7, 2021
8e751a8
Added the post processing tests.
Sep 7, 2021
71b1912
Changed the test tag.
Sep 7, 2021
68ac95f
Fixed the unit tests.
Sep 7, 2021
d6685c3
Merge pull request #93 from BonsaiAI/rukon/fix_dqn
bimalkmehta Sep 8, 2021
27e76cc
Fix MAC wheels generation (#94)
Edilmo Sep 29, 2021
5ad5920
Fix pip conf (#99)
Edilmo Oct 11, 2021
e6ce4d6
Change agent OS image (#102)
Edilmo Oct 27, 2021
bba67f5
Fix minor bugs (#104)
Edilmo Nov 3, 2021
3a3aa12
handle psutil exceptions when generating RayOutOfMemory message (#109)
Kiko-Aumond Nov 19, 2021
e3d0243
Adding L2, Dropout and VIB Regularizer (#107)
abhiksingla Nov 22, 2021
cdf0e2d
updated log4j-slf4j dependency for Ray 1.3.0
Kiko-Aumond Dec 21, 2021
065929e
Merge branch 'releases/1.3.0' into kiko/log4j_ray_1.3.0
Kiko-Aumond Dec 21, 2021
b04aed7
updated log4j version for Ray 1.3.0
Kiko-Aumond Dec 21, 2021
ddd76fe
Merge remote-tracking branch 'origin/releases/1.3.0' into kiko/log4j_…
Kiko-Aumond Dec 21, 2021
312336a
Merge branch 'kiko/log4j_ray_1.3.0' of github.com:BonsaiAI/ray into k…
Kiko-Aumond Dec 21, 2021
bf9b42a
fixed slf4j version mistake
Kiko-Aumond Dec 21, 2021
679f5ef
Merge pull request #113 from BonsaiAI/kiko/log4j_ray_1.3.0
bimalkmehta Dec 22, 2021
7a992e3
No Case: Sync Bonsai changes with Ray v1.11.0.
Mar 25, 2022
1 change: 1 addition & 0 deletions .bazelversion
@@ -0,0 +1 @@
5.0.0
80 changes: 33 additions & 47 deletions .github/CODEOWNERS
@@ -1,81 +1,67 @@
# Each line is a file pattern followed by one or more owners.
# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file

# It uses the same pattern rule for gitignore file,
# see https://git-scm.com/docs/gitignore#_pattern_format.

# ==== Ray core ====
# ==== Ray default ====
# These owners will be the default owners for everything in
# the repo. Unless a later match takes precedence,
# @BonsaiAI/ray-code-owners will be requested for
# review when someone opens a pull request.
* @BonsaiAI/ray-code-owners

# API compatibility
/src/ray/protobuf/common.proto @wuisawesome @ericl @ameerhajali @robertnishihara @pcmoritz @raulchen
/src/ray/protobuf/gcs.proto @wuisawesome @ericl @ameerhajali @robertnishihara @pcmoritz @raulchen
/src/ray/protobuf/gcs_service.proto @wuisawesome @ericl @ameerhajali @robertnishihara @pcmoritz @raulchen
/dashboard/modules/snapshot @wuisawesome @ijrsvt @joeybai @alanwguo @architkulkarni @kombuchafox

# Metrics
/src/ray/stats/metric_defs.h @ericl @scv119 @rkooo567
/src/ray/stats/metric_defs.cc @ericl @scv119 @rkooo567
# ==== Ray core ====

# All C++ code.
# /src/ray @ray-project/ray-core-cpp
/src/ray @BonsaiAI/ray-maintainers

# Dashboard.
/dashboard/ @BonsaiAI/ray-maintainers

# Dependencies
/python/setup.py @richardliaw @ericl @edoakes
/python/setup.py @BonsaiAI/ray-maintainers

# Formatting tool
/ci/travis/format.sh @richardliaw @ericl @edoakes
/ci/travis/format.sh @BonsaiAI/ray-maintainers

# Python worker.
#/python/ray/ @ray-project/ray-core-python
#!/python/ray/tune/ @ray-project/ray-core-python
#!/python/ray/rllib/ @ray-project/ray-core-python
/python/ray/ @BonsaiAI/ray-maintainers
!/python/ray/tune/ @BonsaiAI/ray-maintainers
!/python/ray/rllib/ @BonsaiAI/ray-maintainers

# Java worker.
/java/dependencies.bzl @jovany-wang @kfstorm @raulchen @ericl @iycheng
/java/pom.xml @jovany-wang @kfstorm @raulchen @ericl @iycheng
/java/pom_template.xml @jovany-wang @kfstorm @raulchen @ericl @iycheng
/java/*/pom_template.xml @jovany-wang @kfstorm @raulchen @ericl @iycheng
/java/api/ @jovany-wang @kfstorm @raulchen @ericl @iycheng
/java/ @BonsaiAI/ray-maintainers

# Ray Client
/src/ray/protobuf/ray_client.proto @ijrsvt @ameerhajali @ckw017 @mwtian
# Kube Operator.
/deploy/ @BonsaiAI/ray-maintainers

# Runtime Env
# TODO(SongGuyang): Add new items to guarantee runtime env API compatibility in multiple languages.
/src/ray/protobuf/runtime_env_common.proto @SongGuyang @raulchen @edoakes @architkulkarni
/src/ray/protobuf/runtime_env_agent.proto @SongGuyang @raulchen @edoakes @architkulkarni
# Doc
/doc/ @BonsaiAI/ray-maintainers

# ==== Libraries and frameworks ====

# Ray tune.
/python/ray/tune/ @ray-project/ray-tune

# Ray data.
/python/ray/data/ @ericl @scv119
/doc/source/data/ @ericl @scv119

# Ray workflows.
/python/ray/workflow/ @ericl @iycheng
/doc/source/workflows/ @ericl @iycheng
/python/ray/tune/ @BonsaiAI/ray-code-owners

# RLlib.
/rllib/ @sven1977 @gjoliver @avnishn
/python/ray/rllib/ @BonsaiAI/ray-code-owners
/rllib/ @BonsaiAI/ray-code-owners

# ML Docker Dependencies
/python/requirements/ml/requirements_dl.txt @amogkam @sven1977 @richardliaw @matthewdeng
/python/requirements_ml_docker.txt @amogkam @sven1977 @richardliaw @matthewdeng

# Ray symbol export
src/ray/ray_version_script.lds @mwtian @iycheng @ericl @scv119
src/ray/ray_exported_symbols.lds @mwtian @iycheng @ericl @scv119
/python/requirements/ml/requirements_dl.txt @BonsaiAI/ray-code-owners
/python/requirements_ml_docker.txt @BonsaiAI/ray-maintainers

# ==== Build and CI ====

# Bazel.
#/BUILD.bazel @ray-project/ray-core
#/WORKSPACE @ray-project/ray-core
#/bazel/ @ray-project/ray-core
/BUILD.bazel @BonsaiAI/ray-code-owners
/WORKSPACE @BonsaiAI/ray-code-owners
/bazel/ @BonsaiAI/ray-code-owners

# CI scripts.
#/.travis.yml @ray-project/ray-core
#/ci/travis/ @ray-project/ray-core
/.travis.yml @BonsaiAI/ray-maintainers
/ci/ @BonsaiAI/ray-maintainers

6 changes: 6 additions & 0 deletions .gitignore
@@ -1,6 +1,7 @@
# The build output should clearly not be checked in
*test-output.xml
/bazel-*
/bazel-ray/
/python/ray/core
/python/ray/pickle5_files/
/python/ray/thirdparty_files/
@@ -13,6 +14,7 @@
/python/ray/serve/generated
/thirdparty/pkg/
/build/java
/python/ray/dashboard
.jar
/dashboard/client/build

@@ -210,3 +212,7 @@ workflow_data/

# vscode java extention generated
.factorypath

# PyCharm
.ijwb/
.run/
4 changes: 2 additions & 2 deletions bazel/ray_deps_setup.bzl
@@ -138,8 +138,8 @@ def ray_deps_setup():

auto_http_archive(
name = "bazel_common",
url = "https://github.com/google/bazel-common/archive/084aadd3b854cad5d5e754a7e7d958ac531e6801.tar.gz",
sha256 = "a6e372118bc961b182a3a86344c0385b6b509882929c6b12dc03bb5084c775d5",
url = "https://github.com/google/bazel-common/archive/bf87eb1a4ddbfc95e215b0897f3edc89b2254a1a.tar.gz",
sha256 = "dab4cbd634aae4bc9b116f4de5737e4d3c0754c3a1d712ad4a9b75140d278614",
)

auto_http_archive(
225 changes: 225 additions & 0 deletions ci/azure_pipelines/README.md
@@ -0,0 +1,225 @@
# Azure Pipelines

This folder contains the code required to create the Azure Pipelines for the CI/CD of the Ray project.
Keep in mind that these instructions could be outdated.
Check the following links if you need to update the procedure:
- [Azure virtual machine scale set agents](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops)
- [Repo for the Azure Pipelines images](https://github.com/actions/virtual-environments)

## Self-hosted Linux Agents

### Create VM Image

The following instructions build the VM image of a self-hosted Linux agent using a Virtual Hard Drive (VHD).
The image is the same one used by the Microsoft-hosted Linux agents. This approach
simplifies maintenance and keeps the pipeline code compatible with both
types of agents.

Requirements:
- Install packer : https://www.packer.io/downloads.html
- Install azure-cli : https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest

Steps for Mac and Ubuntu:
- Clone the GitHub Actions virtual environments repo: `git clone https://github.com/actions/virtual-environments.git`
- Move into the folder of the repo cloned above: `pushd virtual-environments/images/linux`
- Log in to your Azure account: `az login`
- Set your Azure subscription id and tenant id:
- Check your subscriptions: `az account list --output table`
- Set your default (replace your Subscription id in the command): `az account set -s {Subscription Id}`
- Get the subscription id: `SUBSCRIPTION_ID=$(az account show --query 'id' --output tsv)`
- Get the tenant id: `TENANT_ID=$(az account show --query 'tenantId' --output tsv)`
- Select the azure location: `AZURE_LOCATION="eastus"`
- Create and select the name of the resource group where the Azure resources will be created:
- Set the group: `RESOURCE_GROUP_NAME="RayADOAgents"`
- Try to create the group. If the resource group exists, the details for it will be returned: `az group create -n $RESOURCE_GROUP_NAME -l $AZURE_LOCATION`
- Create a Storage Account:
- Set Storage Account name: `STORAGE_ACCOUNT_NAME="rayadoagentsimage"`
- Create the Storage Account: `az storage account create -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME -l $AZURE_LOCATION --sku "Standard_LRS"`
- Create a Service Principal. If you have an existing Service Principal, it can also be used instead of creating a new one:
- Set the object id: `OBJECT_ID="http://rayadoagents"`
- Create client and get secret: `CLIENT_SECRET=$(az ad sp create-for-rbac -n $OBJECT_ID --scopes="/subscriptions/${SUBSCRIPTION_ID}" --query 'password' -o tsv)`. If the Principal already exists, this command returns the id of the role assignment. In that case, use your old password, or delete the existing Principal with `az ad sp delete --id $OBJECT_ID`.
- Get client id: `CLIENT_ID=$(az ad sp show --id $OBJECT_ID --query 'appId' -o tsv)`
- Set Install password: `INSTALL_PASSWORD="$CLIENT_SECRET"`
- Create a Key Vault. If you have an existing Key Vault, it can also be used instead of creating a new one:
- Set Key Vault name: `KEY_VAULT_NAME="ray-agent-secrets"`
- Create the Key Vault: `az keyvault create --name $KEY_VAULT_NAME --resource-group $RESOURCE_GROUP_NAME --location $AZURE_LOCATION`. If the Key Vault exists, this command returns its info.
- Set a GitHub Personal Access Token with rights to download:
- Set Key Pair name: `GITHUB_FEED_TOKEN_NAME="raygithubfeedtoken"`
- Upload your PAT to the vault (replace your token in the command): `az keyvault secret set --name $GITHUB_FEED_TOKEN_NAME --vault-name $KEY_VAULT_NAME --value "{GitHub Token}"`
- Get PAT from the Vault: `GITHUB_FEED_TOKEN=$(az keyvault secret show --name $GITHUB_FEED_TOKEN_NAME --vault-name $KEY_VAULT_NAME --query 'value' --output tsv)`
- Create the Managed Disk image:
- Create a packer variables file:
```
cat << EOF > azure-variables.json
{
"client_id": "${CLIENT_ID}",
"client_secret": "${CLIENT_SECRET}",
"subscription_id": "${SUBSCRIPTION_ID}",
"tenant_id": "${TENANT_ID}",
"object_id": "${OBJECT_ID}",
"location": "${AZURE_LOCATION}",
"resource_group": "${RESOURCE_GROUP_NAME}",
"storage_account": "${STORAGE_ACCOUNT_NAME}",
"install_password": "${INSTALL_PASSWORD}",
"github_feed_token": "${GITHUB_FEED_TOKEN}"
}
EOF
```
- Execute packer build: `packer build -var-file=azure-variables.json ubuntu1604.json`

For more details, [check the following doc in the virtual environments repo](https://github.com/actions/virtual-environments/blob/master/help/CreateImageAndAzureResources.md).
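
The packer variables file above is filled from shell variables set in earlier steps, so an unset variable would silently become an empty string in `azure-variables.json`. A minimal sketch of a pre-flight check follows; the function name `missing_vars` is hypothetical and not part of this repo.

```shell
# Hypothetical helper (not part of the repo): print the names of the given
# variables that are unset or empty, and return non-zero if any are missing.
missing_vars() {
  _mv_missing=""
  for _mv_name in "$@"; do
    # Look up the value of the variable whose name is stored in $_mv_name.
    eval "_mv_value=\${$_mv_name:-}"
    if [ -z "$_mv_value" ]; then
      _mv_missing="${_mv_missing:+$_mv_missing }$_mv_name"
    fi
  done
  if [ -n "$_mv_missing" ]; then
    echo "$_mv_missing"
    return 1
  fi
  return 0
}
```

One way to use it is to run `missing_vars CLIENT_ID CLIENT_SECRET SUBSCRIPTION_ID TENANT_ID OBJECT_ID AZURE_LOCATION RESOURCE_GROUP_NAME STORAGE_ACCOUNT_NAME INSTALL_PASSWORD GITHUB_FEED_TOKEN` before the `cat << EOF` step and stop if it prints anything.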


### Create Agent Pool

#### 1. Create the Virtual Machine Scale Set (VMSS)

Creation of the VMSS is done using the Azure Resource Manager (ARM) template, `image/agentpool.json`. The following are important fixed parameters that could be changed:

| Parameter | Description |
| ------------- | ------------- |
| vmssName | name of the VMSS to be created |
| instanceCount | number of VMs to create in initial deployment (can be changed later) |

Steps for Mac and Ubuntu:
- Log in to your Azure account: `az login`
- Set your Azure subscription id and tenant id:
- Check your subscriptions: `az account list --output table`
- Set your default: `az account set -s {Subscription Id}`
- Get the subscription id: `SUBSCRIPTION_ID=$(az account show --query 'id' --output tsv)`
- Get the tenant id: `TENANT_ID=$(az account show --query 'tenantId' --output tsv)`
- Set Storage Account name (same as above): `STORAGE_ACCOUNT_NAME="rayadoagentsimage"`
- Select the azure location: `AZURE_LOCATION="eastus"`
- Create and select the name of the resource group where the Azure resources will be created:
- Set the group: `RESOURCE_GROUP_NAME="RayADOAgents"`
- Try to create the group. If the resource group exists, the details for it will be returned: `az group create -n $RESOURCE_GROUP_NAME -l $AZURE_LOCATION`
- Create a Key Vault. If you have an existing Key Vault, it can also be used instead of creating a new one:
- Set Key Vault name: `KEY_VAULT_NAME="ray-agent-secrets"`
- Create the Key Vault: `az keyvault create --name $KEY_VAULT_NAME --resource-group $RESOURCE_GROUP_NAME --location $AZURE_LOCATION`. If the Key Vault exists, this command returns its info.
- Create a Key Pair in the Vault:
- Set Key Pair name: `SSH_KEY_PAIR_NAME="rayagentadminrsa"`
- Set the name for the public part of the Key Pair: `SSH_KEY_PAIR_NAME_PUB="${SSH_KEY_PAIR_NAME}pub"`
- Set SSH key pair file path: `SSH_KEY_PAIR_PATH="$HOME/.ssh/$SSH_KEY_PAIR_NAME"`
- Create the SSH key pair: `ssh-keygen -m PEM -t rsa -b 4096 -f $SSH_KEY_PAIR_PATH`
- Upload your key pair to the vault:
- Public part to be used by the VMs: `az keyvault secret set --name $SSH_KEY_PAIR_NAME_PUB --vault-name $KEY_VAULT_NAME --file ${SSH_KEY_PAIR_PATH}.pub`
- (Optional) Private part to be used by the VMs: `az keyvault secret set --name $SSH_KEY_PAIR_NAME --vault-name $KEY_VAULT_NAME --file $SSH_KEY_PAIR_PATH`
- Get public part from the Vault: `SSH_KEY_PUB=$(az keyvault secret show --name $SSH_KEY_PAIR_NAME_PUB --vault-name $KEY_VAULT_NAME --query 'value' --output tsv)`
- Create the VMSS:
- Set the Subnet Id of the subnet where the VMs must be: `SUBNET_ID="{Subnet Id}"`
- Set the VMSS name: `VMSS_NAME="RayPipelineAgentPoolStandardF16sv2"`
- Set the instance count: `INSTANCE_COUNT="2"`
- Get Reader role definition: `ROLE_DEFINITION_ID=$(az role definition list --subscription $SUBSCRIPTION_ID --query "([?roleName=='Reader'].id)[0]" --output tsv)`
- Set the source image VHD NAME (assuming the latest): `SOURCE_IMAGE_VHD_NAME="$(az storage blob list --subscription $SUBSCRIPTION_ID --account-name $STORAGE_ACCOUNT_NAME -c images --prefix pkr --query 'sort_by([], &properties.creationTime)[-1].name' --output tsv)"`
- Set the source image VHD URI: `SOURCE_IMAGE_VHD_URI="https://${STORAGE_ACCOUNT_NAME}.blob.core.windows.net/images/${SOURCE_IMAGE_VHD_NAME}"`
- Create the VM scale set: `az group deployment create --resource-group $RESOURCE_GROUP_NAME --template-file image/agentpool.json --parameters "vmssName=$VMSS_NAME" --parameters "instanceCount=$INSTANCE_COUNT" --parameters "sourceImageVhdUri=$SOURCE_IMAGE_VHD_URI" --parameters "sshPublicKey=$SSH_KEY_PUB" --parameters "location=$AZURE_LOCATION" --parameters "subnetId=$SUBNET_ID" --parameters "keyVaultName=$KEY_VAULT_NAME" --parameters "tenantId=$TENANT_ID" --parameters "roleDefinitionId=$ROLE_DEFINITION_ID" --name $VMSS_NAME`
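
The source image URI in the steps above is built from the result of the `az storage blob list` query, which may come back empty if no blob with the `pkr` prefix exists yet. As a sketch under that assumption, a small guard can refuse to build the URI from an empty name; `vhd_uri` is a hypothetical helper, not something defined in this repo.

```shell
# Hypothetical helper: build the VHD URI passed as sourceImageVhdUri,
# failing early when the storage account or blob name is empty.
vhd_uri() {
  _vu_account="$1"
  _vu_blob="$2"
  if [ -z "$_vu_account" ] || [ -z "$_vu_blob" ]; then
    echo "storage account or VHD blob name is empty" >&2
    return 1
  fi
  echo "https://${_vu_account}.blob.core.windows.net/images/${_vu_blob}"
}
```

For example, `SOURCE_IMAGE_VHD_URI="$(vhd_uri "$STORAGE_ACCOUNT_NAME" "$SOURCE_IMAGE_VHD_NAME")" || echo "no image found"` would stop before the deployment command runs with a broken URI.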

#### 2. Create the Agent Pool in Azure DevOps

Open Azure DevOps > "Project Settings" (bottom right) > "Agent Pools" > "New Agent Pool" > "Add pool" to create a new agent pool. Enter the agent pool's name, which must match the value you provided for `VMSS_NAME` (see steps above).

Make sure your admin is added as the administrator in ADO in 2 places:
- Azure DevOps > "Project Settings" (bottom right) > "Agent Pools" > [newly created agent pool] > "Security" tab, and
- Azure DevOps > bizair > Organization Settings > Agent Pools > Security

#### 3. Connect VMs to pool

Steps for Mac and Ubuntu:
- Copy some files to fix some errors in the generation of the agent image:
- The error is due to an issue with the packer script: it does not download a postgresql installation script.
To check whether the image was fully built, connect to the VM using ssh (see steps below) and run: `INSTALLER_SCRIPT_FOLDER="/imagegeneration/installers" source /imagegeneration/installers/test-toolcache.sh`.
If you don't get any error message, skip the following 3 steps.
- Tar the image folder: `tar -zcvf image.tar.gz image`
- Set Key Pair name: `export SSH_KEY_PAIR_NAME="rayagentadminrsa"`
- Set SSH key pair file path: `export SSH_KEY_PAIR_PATH="$HOME/.ssh/$SSH_KEY_PAIR_NAME"`
- Set the IP of your VM: `export IP={my.ip}`
- Copy to each of your machines in the Scale set: `scp -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH ./image.tar.gz agentadmin@"${IP}":/home/agentadmin`
- Delete the tar: `rm image.tar.gz`
- Connect using ssh:
- Open an ssh session: `ssh -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH agentadmin@"${IP}"`
- Fix the image:
- Untar the image file: `tar zxvf ./image.tar.gz`
- Switch to root: `sudo -s`
- On your local machine, get the PAT from the Vault:
- Set Key Pair name: `export GITHUB_FEED_TOKEN_NAME="raygithubfeedtoken"`
- Set Key Vault name: `export KEY_VAULT_NAME="ray-agent-secrets"`
- Get the token: `az keyvault secret show --name $GITHUB_FEED_TOKEN_NAME --vault-name $KEY_VAULT_NAME --query 'value' --output tsv`
- Set the PAT in your ssh session: `export GITHUB_FEED_TOKEN={ GitHub Token }`
- Add agentadmin to the root group: `sudo gpasswd -a agentadmin root`
- Install missing part: `source ./image/fix-image.sh`
- Set the system up:
```
export GITHUB_FEED_TOKEN={ GitHub Token }
export DEBIAN_FRONTEND=noninteractive
export METADATA_FILE="/imagegeneration/metadatafile"
export HELPER_SCRIPTS="/imagegeneration/helpers"
export INSTALLER_SCRIPT_FOLDER="/imagegeneration/installers"
export BOOST_VERSIONS="1.69.0"
export BOOST_DEFAULT="1.69.0"
export AGENT_TOOLSDIRECTORY=/opt/hostedtoolcache
mkdir -p $INSTALLER_SCRIPT_FOLDER/node_modules
sudo chmod --recursive a+rwx $INSTALLER_SCRIPT_FOLDER/node_modules
sudo chown -R agentadmin:root $INSTALLER_SCRIPT_FOLDER/node_modules
source $INSTALLER_SCRIPT_FOLDER/hosted-tool-cache.sh
source $INSTALLER_SCRIPT_FOLDER/test-toolcache.sh
chown -R agentadmin:root $AGENT_TOOLSDIRECTORY
echo 'export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
AGENT_TOOLSDIRECTORY="/opt/hostedtoolcache/"' >> ~/.bashrc
```
- Go to the [New Agent] option in the pool and follow the instructions for Linux agents:
- Download the agent: `wget https://vstsagentpackage.azureedge.net/agent/2.170.1/vsts-agent-linux-x64-2.170.1.tar.gz`
- Create and move to a directory for the agent: `mkdir myagent && cd myagent`
- Untar the agent: `tar zxvf ../vsts-agent-linux-x64-2.170.1.tar.gz`
- Configure the agent: `./config.sh`
- Accept the license.
- Enter your organization URL.
- Enter your ADO PAT.
- Set a Personal Access Token:
- Set Key Pair name: `ADO_TOKEN_NAME="rayagentadotoken"`
- Upload your PAT to the vault (replace your token in the command): `az keyvault secret set --name $ADO_TOKEN_NAME --vault-name $KEY_VAULT_NAME --value "{ADO Token}"`
- Enter the agent pool's name, which must match the value you provided for `VMSS_NAME` (see steps above)
- Enter or accept agent name.
- Install the ADO Agent as a service and start it:
- `sudo ./svc.sh install`
- `sudo ./svc.sh start`
- `sudo ./svc.sh status`
- Allow agent user to access Docker:
- `export VM_ADMIN_USER="agentadmin"`
- `sudo gpasswd -a "${VM_ADMIN_USER}" docker`
- `sudo chmod ga+rw /var/run/docker.sock`
- Update group permissions so docker is available without logging out and back in: `newgrp - docker`
- Test docker: `docker run hello-world`
- `export VM_ADMIN_USER="agentadmin"`
- If `/home/"$VM_ADMIN_USER"/.docker` exists:
- `sudo chown "$VM_ADMIN_USER":docker /home/"$VM_ADMIN_USER"/.docker -R`
- `sudo chmod ga+rwx "$HOME/.docker" -R`
- Create a symlink:
- `mkdir -p /home/agentadmin/myagent/_work`
- `ln -s /opt/hostedtoolcache /home/agentadmin/myagent/_work/_tool`
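
The copy step above has to be repeated for each machine in the scale set. A minimal sketch of one way to do that is to print the `scp` command for every IP first, so the list can be reviewed before anything is copied. The function name `print_copy_commands` is hypothetical, and the sketch assumes the key path and IPs contain no spaces.

```shell
# Hypothetical helper: print (not run) the scp command from the steps above
# for each VM IP. First argument is the private key path, the rest are IPs.
print_copy_commands() {
  _pc_key_path="$1"
  shift
  for _pc_ip in "$@"; do
    echo "scp -o IdentitiesOnly=yes -i ${_pc_key_path} ./image.tar.gz agentadmin@${_pc_ip}:/home/agentadmin"
  done
}
```

Calling `print_copy_commands "$SSH_KEY_PAIR_PATH" 10.0.0.4 10.0.0.5` (example IPs) prints one line per VM; piping that output to `sh` would run the copies.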

### Deleting an Agent Pool

1. Open Azure DevOps > Settings > Agent Pools > find pool to be removed and click "..." > Delete
2. Open Azure Portal > Key Vaults > ray-agent-secrets > Access Policies > delete the access policy assigned to the VMSS to be deleted
3. Open Azure Portal > All Resources > type the VMSS name into the search bar > select and delete the following resources tied to that VMSS:
- public IP address
- load balancer
- the VMSS itself

### Useful Commands

```
# Get connection info for all VMSS instances
az vmss list-instance-connection-info -g $RESOURCE_GROUP_NAME --name $VMSS_NAME

# SSH to a VMSS instance
ssh -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH agentadmin@{ PUBLIC IP}

# Download agentadmin private SSH key (formatting is lost if key is pulled from the UI)
az keyvault secret download --file $SSH_KEY_PAIR_PATH --vault-name $KEY_VAULT_NAME --name $SSH_KEY_PAIR_NAME


az keyvault secret download --file ~/downloads/PAT --vault-name $KEY_VAULT_NAME --name $ADO_TOKEN_NAME
```
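
The connection info command above is assumed here to report each instance as an `IP:PORT` string (for example `13.92.224.66:50001`); check the actual output of your deployment before relying on this. Under that assumption, a sketch of a helper that splits such a string into ssh arguments could look like the following; `ssh_args_for` is a hypothetical name.

```shell
# Hypothetical helper: turn an "IP:PORT" connection string into ssh arguments.
# Assumes the input looks like "13.92.224.66:50001".
ssh_args_for() {
  _sa_endpoint="$1"
  _sa_ip="${_sa_endpoint%:*}"      # everything before the last colon
  _sa_port="${_sa_endpoint##*:}"   # everything after the last colon
  echo "-p ${_sa_port} agentadmin@${_sa_ip}"
}
```

It could then be combined with the ssh command above, for instance `ssh -o "IdentitiesOnly=yes" -i $SSH_KEY_PAIR_PATH $(ssh_args_for "13.92.224.66:50001")`, where the command substitution is left unquoted on purpose so the port flag and the target become separate arguments.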