Grid CLI
Name | Type | Description | Default |
--debug |
boolean | Used for logging additional information for debugging purposes. | False |
-o , --output |
choice (console | json) | Output format | console |
--help |
boolean | Show this message and exit. | False |
Downloads artifacts for a given run or experiments.
This will download artifacts generated by the runs / experiments. Regex filtering is used to determine which artifacts to download.
Name | Type | Description | Default |
--download_dir |
directory | Download directory that will host all artifact files. | ./grid_artifacts |
-m , --match_regex |
text | Only show artifacts that match this regex filter. Best if quoted. | `` |
--help |
boolean | Show this message and exit. | False |
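The regex filter can be pictured with a short Python sketch. This is not Grid's implementation: the artifact names are made up, and the use of `re.search` (a match anywhere in the name) is an assumption, since this reference does not say whether the pattern must match the whole name.

```python
import re


def filter_artifacts(names, pattern=""):
    """Keep the artifact names that the regex matches somewhere (assumed semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Hypothetical artifact names, for illustration only.
artifacts = ["checkpoints/epoch=1.ckpt", "checkpoints/epoch=2.ckpt", "metrics.csv"]

print(filter_artifacts(artifacts, r"\.ckpt$"))
# ['checkpoints/epoch=1.ckpt', 'checkpoints/epoch=2.ckpt']

# An empty pattern, like the documented default, matches every name.
print(len(filter_artifacts(artifacts)))
# 3
```

Quoting the pattern on the command line keeps the shell from interpreting characters such as `*`, `$`, or `|` before the CLI sees them, which is likely why the option description recommends it.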
grid clusters [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Create a grid compute cluster with NAME from the provided AWS account details.
grid clusters aws [OPTIONS] NAME
Name | Type | Description | Default |
--external-id |
text | N/A | |
--role-arn |
text | AWS role ARN attached to the associated resources. | |
--region |
text | AWS region which is used to host the associated resources. | us-east-1 |
--instance-types |
text | Instance types to support for compute jobs within the cluster. | g2.8xlarge, g3.16xlarge, g3.4xlarge, g3.8xlarge, g3s.xlarge, g4dn.12xlarge, g4dn.16xlarge, g4dn.2xlarge, g4dn.4xlarge, g4dn.8xlarge, g4dn.metal, g4dn.xlarge, p2.16xlarge, p2.8xlarge, p2.xlarge, p3.16xlarge, p3.2xlarge, p3.8xlarge, p3dn.24xlarge, t2.large, t2.medium, t2.xlarge, t2.2xlarge, t3.large, t3.medium, t3.xlarge, t3.2xlarge |
--cost-savings |
boolean | Creates the cluster with a profile optimized for cost savings. Runs become cheaper, but start-up times may increase. | False |
--wait |
boolean | With this flag, the CLI waits until the cluster is running. | False |
--edit-before-creation |
boolean | Edit the created cluster spec before submitting to API server. | False |
--help |
boolean | Show this message and exit. | False |
Retrieve cluster logs from the managed cluster identified by CLUSTER_ID.
These logs are streamed to stdout, and can either be tailed to view log lines as they are generated, or limited to a time range.
grid clusters logs [OPTIONS] CLUSTER_ID
Name | Type | Description | Default |
-t , --tail |
boolean | Whether to tail log lines. | False |
--from |
text | The starting timestamp to query cluster logs from. | 24 hours ago |
--to |
text | The end timestamp / relative time increment to query logs for. This is ignored when tailing logs. | 0 seconds ago |
--limit |
integer | The max number of log lines returned. | 1000 |
--time-format |
choice (human | iso8601) | Timestamp formatting style | iso8601 |
--help |
boolean | Show this message and exit. | False |
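As a rough illustration of how the time-range options combine, the sketch below assembles a `grid clusters logs` invocation from the flags in the table above. The helper name and the cluster ID are hypothetical, and the relative-time strings simply mirror the style of the documented defaults (`24 hours ago`, `0 seconds ago`). Which other formats the CLI accepts is not stated here.

```python
import shlex


def cluster_logs_command(cluster_id, start=None, end=None, limit=None, tail=False):
    """Build the argument list for `grid clusters logs` (hypothetical helper)."""
    cmd = ["grid", "clusters", "logs"]
    if tail:
        cmd.append("--tail")
    if start is not None:
        cmd += ["--from", start]
    # --to is documented as ignored when tailing, so it is left out in that case.
    if end is not None and not tail:
        cmd += ["--to", end]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    cmd.append(cluster_id)
    return cmd


# "my-cluster" is a placeholder cluster ID.
print(shlex.join(cluster_logs_command("my-cluster", start="2 hours ago", limit=200)))
# grid clusters logs --from '2 hours ago' --limit 200 my-cluster
```

Building the command as a list and quoting it with `shlex.join` keeps multi-word values such as `2 hours ago` intact as a single argument.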
Manages credentials.
grid credentials [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--set_default |
text | Credential set to make default. | None |
--help |
boolean | Show this message and exit. | False |
Adds user credentials to access Grid.
grid credentials add [OPTIONS]
Name | Type | Description | Default |
--provider |
choice (aws) | Credential provider. | |
--file |
filename | Path to the JSON file that contains the credentials. | |
--alias |
text | Given name for a credential set | None |
--description |
text | Description for a credential set | None |
--help |
boolean | Show this message and exit. | False |
Manages Datastore workflows.
grid datastore [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--global |
boolean | Fetch datastores from everyone in the team when flag is passed | False |
--cluster |
text | The cluster id to list datastores for. | test-7 |
--show-incomplete |
boolean | Show any datastore uploads which were started, but killed or errored before they finished uploading all data and became "viewable" on the grid datastore user interface. | False |
--help |
boolean | Show this message and exit. | False |
Clears datastore cache which is saved on the local machine when uploading a datastore to grid.
This removes all the cached files from the local machine, meaning that resuming an incomplete upload is not possible after running this command.
grid datastore clearcache [OPTIONS]
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Creates a datastore from SOURCE.
The upload session is referenced by the name. This name must be used to resume the upload if it is interrupted.
grid datastore create [OPTIONS] [SOURCE]
Name | Type | Description | Default |
--source |
text | N/A | None |
--name |
text | Name of the datastore | None |
--cluster |
text | Cluster ID to create the datastore on. (Bring Your Own Cloud Customers Only). | test-7 |
--help |
boolean | Show this message and exit. | False |
Deletes a datastore with the given name and version tag.
For bring-your-own-cloud customers, the cluster id of the associated resource is required as well.
grid datastore delete [OPTIONS]
Name | Type | Description | Default |
--name |
text | Name of the datastore | |
--version |
integer | Version of the datastore | |
--cluster |
text | Cluster ID to delete the datastore from. (Bring Your Own Cloud Customers Only). | test-7 |
--help |
boolean | Show this message and exit. | False |
Resume uploading an incomplete datastore upload session.
grid datastore resume [OPTIONS]
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Allows you to delete grid resources.
grid delete [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Delete CLUSTER and all associated AWS resources.
Deleting a cluster also deletes all Runs and Experiments which were started on the cluster. Deletion permanently removes not only the record of all runs on a cluster, but all associated experiments, artifacts, metrics, logs, etc.
This process may take a few minutes to complete, but once started it is irreversible. Deletion not only removes the cluster from being managed by grid, but also tears down every resource grid managed (for that cluster id) in the host cloud. All object stores, container registries, logs, compute nodes, volumes, etc. are deleted and cannot be recovered.
grid delete cluster [OPTIONS] CLUSTER
Name | Type | Description | Default |
--force |
boolean | Force delete the cluster from the grid system. This does NOT delete any resources created by the cluster; it only cleans up the entry in the grid system. You should not use this under normal circumstances. | False |
--wait |
boolean | With this flag, the CLI waits until the cluster is deleted. | False |
--help |
boolean | Show this message and exit. | False |
Delete some set of EXPERIMENT_NAMES from grid.
This process is immediate and irreversible. Deletion permanently removes not only the record of the experiment, but all associated artifacts, metrics, logs, etc.
grid delete experiment [OPTIONS] EXPERIMENT_NAMES...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Delete some set of RUN_NAMES from grid.
Deleting a run also deletes all experiments contained within the run.
This process is immediate and irreversible. Deletion permanently removes not only the record of the run, but all associated experiments, artifacts, metrics, logs, etc.
grid delete run [OPTIONS] RUN_NAMES...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Open the CLI docs.
grid docs [OPTIONS]
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Edits a resource
grid edit [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Edit existing cluster
grid edit cluster [OPTIONS] CLUSTER
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
View list of historic Runs.
grid history [OPTIONS]
Name | Type | Description | Default |
--global |
boolean | Fetch history from everyone in the team when flag is passed | False |
--help |
boolean | Show this message and exit. | False |
List the compute node instance types which are available for computation.
For bring your own cloud customers, the instance types available are defined by the organizational administrators who created the cluster.
grid instance-types [OPTIONS]
Name | Type | Description | Default |
--cluster |
text | Cluster ID from which to fetch the instance types. (Bring Your Own Cloud Customers Only). | test-7 |
--help |
boolean | Show this message and exit. | False |
Authorize the CLI to access Grid AI resources for a particular user.
If no username or key is provided, the CLI will prompt for them. After providing your username, a web browser will open to your account settings page where your API key can be found.
grid login [OPTIONS]
Name | Type | Description | Default |
--key |
text | API Key from Grid | None |
--username |
text | Username used in Grid | None |
--help |
boolean | Show this message and exit. | False |
Shows stdout logs associated with some EXPERIMENT.
This includes both build and experiment logs.
Name | Type | Description | Default |
--show-build-logs |
boolean | Shows build logs if not shown by default. | None |
-l , --tail-lines |
integer | Number of lines to show from the end. | None |
--help |
boolean | Show this message and exit. | False |
Launch a Run from some SCRIPT with the provided SCRIPT_ARGS.
A run is a collection of experiments which run with a single set of SCRIPT_ARGS. The SCRIPT_ARGS passed to the run command can represent fixed values, or a set of values to be searched over for each option. If a set of values are passed, a sweep (grid-search or random-search) will be performed, launching the desired number of experiments in parallel - each with a unique set of input arguments.
The script runs on the specified instance type, and Grid collects the generated artifacts, metrics, and logs, making them available for you to view in real time (or later if so desired) on either our Web UI or via this CLI.
grid run [OPTIONS] [RUN_COMMAND]...
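The difference between the two sweep strategies described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Grid's implementation: the script argument names are hypothetical, and sampling without replacement for `random_search` is an assumption based on the `--num_trials` and `--seed` descriptions.

```python
import itertools
import random


def grid_search(space):
    """Return every combination of candidate values (the full Cartesian product)."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def random_search(space, num_trials, seed=None):
    """Sample num_trials combinations from the full search space (assumed semantics)."""
    combos = grid_search(space)
    rng = random.Random(seed)
    return rng.sample(combos, min(num_trials, len(combos)))


# Hypothetical script arguments; each combination would become one experiment.
space = {"--learning_rate": [0.001, 0.01, 0.1], "--batch_size": [32, 64]}

print(len(grid_search(space)))                           # 6
print(len(random_search(space, num_trials=3, seed=0)))   # 3
```

A grid search launches one experiment per combination, so its size grows multiplicatively with each swept argument. A random search caps the number of experiments at a fixed number of trials.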
Name | Type | Description | Default |
--config |
Path | Path to Grid config YML. | None |
--name |
text | Name for this run | None |
--cluster |
text | N/A | test-7 |
--strategy |
choice (grid_search | random_search) | Hyper-parameter search strategy | None |
--num_trials |
text | Number of samples from full search space that are used by the random_search strategy | None |
--seed |
text | Seed value for the random_search strategy | None |
--instance_type |
text | Instance type to start training session in | t2.medium |
--gpus |
integer | Number of GPUs to allocate per experiment | 0 |
--cpus |
integer | Number of CPUs to allocate per experiment | 1 |
--memory |
text | How much memory an experiment needs | 100 |
--datastore_name |
text | Datastore name to be mounted in training | None |
--datastore_version |
integer | Datastore version to be mounted in training | None |
--datastore_mount_dir |
text | Directory to mount Datastore in training job. The default datastore mount location is /datastores | None |
--framework |
text | Framework to use in training. Select from available options: lightning , torch , tensorflow , julia (will select the latest available version) , julia:1.6.1 , julia:1.6.2 , julia:1.6.3 , julia:1.6.4 , julia:1.6.5 , julia:1.7.0 , julia:1.7.1 , torchelastic | lightning |
--use_spot |
boolean | Use a spot instance. Spot (preemptible) instances can be shut down at any time. | False |
--ignore_warnings |
boolean | Ignore warnings when executing commands. | False |
--scratch_size |
integer | The size in GB of the scratch space attached to the experiment | 100 |
--scratch_mount_path |
text | The mount path to mount the scratch space | /tmp/scratch |
-l , --localdir |
boolean | Upload source code from the local directory instead of having Grid clone the repo from GitHub (default). This option is particularly useful for users that do not host their source code on GitHub. | False |
-d , --dockerfile |
text | Dockerfile for the image building | None |
--dependency_file |
text | Dependency file path. If not provided and a requirements.txt , environment.yml , or Project.toml file is present in the current working directory, then we will automatically install dependencies according to the inferred file. | None |
--auto_resume |
boolean | Mark this run as auto-resumable. If the underlying node/instance/VM is terminated, the experiment will be automatically resumed, with all artifacts restored from the last known state. The experiment code will receive a SIGTERM signal and it must exit with status code 0 upon properly dumping its state to disk. | False |
--help |
boolean | Show this message and exit. | False |
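The `--auto_resume` contract (receive SIGTERM, dump state to disk, exit with status code 0) could be met by experiment code along the following lines. This is a minimal sketch under stated assumptions: the JSON checkpoint format, the file path, and all function names are hypothetical, and a real training script would save its own model and optimizer state instead.

```python
import json
import os
import signal
import sys
import tempfile


def save_state(state, path):
    """Write state atomically so a half-written file never replaces a good one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def make_sigterm_handler(state, path):
    """Build a handler that checkpoints `state` and exits with status code 0."""
    def handler(signum, frame):
        save_state(state, path)
        sys.exit(0)
    return handler


def install_sigterm_handler(state, path):
    """Register the handler; signal.signal must be called from the main thread."""
    signal.signal(signal.SIGTERM, make_sigterm_handler(state, path))
```

A script would call `install_sigterm_handler(state, "checkpoint.json")` once at start-up and keep `state` (a mutable dict in this sketch) up to date as training progresses. On restart it would look for the checkpoint file and continue from there.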
Contains a grouping of commands to manage sessions workflows.
Executing the grid session command without any further arguments or commands renders a list of all sessions registered to your Grid user account.
grid session [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--global |
boolean | Fetch sessions from everyone in the team when flag is passed | False |
--help |
boolean | Show this message and exit. | False |
Change the instance type of a session; this allows you to upgrade or downgrade the compute capability of the session nodes while keeping all of your work in progress untouched.
The session must be PAUSED in order for this command to succeed.
Specifying --spot allows you to change the instance to an interruptible spot instance (which comes at a steep discount, but which can be interrupted and shut down at any point in time depending on cloud provider instance type demand).
Specifying --on_demand changes the instance to an on-demand type, which cannot be interrupted but is more expensive.
grid session change-instance-type [OPTIONS] SESSION_NAME INSTANCE_TYPE
Name | Type | Description | Default |
--spot |
boolean | Use a spot instance to launch the session | None |
--on_demand , --on-demand |
boolean | Use an on-demand instance to launch the session | None |
--help |
boolean | Show this message and exit. | False |
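Because the session must be paused first, a change of instance type is naturally a three-step sequence. The sketch below only assembles the command lines from subcommands documented in this reference. The helper name and session name are hypothetical, and resuming afterwards is an assumption about the typical workflow, not a documented requirement.

```python
import shlex


def change_instance_type_commands(session_name, instance_type, spot=None):
    """Return the pause / change / resume command lists (hypothetical helper)."""
    change = ["grid", "session", "change-instance-type"]
    if spot is True:
        change.append("--spot")
    elif spot is False:
        change.append("--on_demand")
    change += [session_name, instance_type]
    return [
        ["grid", "session", "pause", session_name],
        change,
        ["grid", "session", "resume", session_name],
    ]


# "my-session" is a placeholder session name.
for command in change_instance_type_commands("my-session", "g4dn.xlarge", spot=True):
    print(shlex.join(command))
# grid session pause my-session
# grid session change-instance-type --spot my-session g4dn.xlarge
# grid session resume my-session
```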
Creates a new interactive session with NAME.
Interactive sessions are optimized for development activities (before executing hyperparameter sweeps in a Run). Once created, sessions can be accessed via VSCode, Jupyter-lab, or SSH interfaces.
Grid manages the installation of any/all core libraries, drivers, and interfaces to the outside world. Sessions can be run on anything from a small 2 CPU core + 4GB memory instance to a monster machine with 96 CPU cores + 824 GB memory + eight V100 GPUs + 40 GBPS network bandwidth (no, those values aren't typos!); or really anything in between.
grid session create [OPTIONS]
Name | Type | Description | Default |
--cluster |
text | Cluster to run on | test-7 |
--instance_type |
text | Instance type to start session in. | t2.medium |
--use_spot |
boolean | Use a spot instance. Spot (preemptible) instances can be shut down at any time. | False |
--disk_size |
integer | The disk size in GB to allocate to the session. | 200 |
--datastore_name |
text | Datastore name to be mounted in the session. | None |
--datastore_version |
integer | Datastore version to be mounted in the session. | None |
--datastore_mount_dir |
text | Absolute path to mount Datastore in the session (defaults to /datastores/<datastore-name> ). | None |
--config |
Path | Path to Grid config YML | None |
--name |
text | Name for this session | None |
--help |
boolean | Show this message and exit. | False |
Deletes a session identified by SESSION_NAME.
Deleting a session will stop the running instance (and any computations being performed on it) and billing of your account. All work done on the machine is permanently removed, including any saved files, code, or downloaded data (assuming the source of the data was not a grid datastore; datastore data is not deleted).
grid session delete [OPTIONS] SESSION_NAME
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Mount session directory to local. The session is identified by SESSION and MOUNT_DIR is a path to a directory on the local machine.
To mount a filesystem use: ixNode:[dir] mountpoint
Examples:
# Mounts the home directory on the interactive node in dir data
grid session mount bluberry-122 ./data
# Mounts ~/data directory on the interactive node to ./data
grid session mount bluberry-122:~/data ./data
To unmount it:
fusermount3 -u mountpoint # Linux
umount mountpoint # OS X, FreeBSD
Under the hood this is just passing data to sshfs after syncing grid's interactive, i.e. this command is a dumbed-down sshfs.
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
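The `ixNode:[dir]` form used by the mount command can be read as "node name, optionally followed by a colon and a remote directory". A small sketch of that parsing follows. The helper is hypothetical and is not Grid's code. It defaults to `~` when no directory is given, which reflects the documented example where the home directory is mounted.

```python
def parse_mount_source(source):
    """Split 'node[:dir]' into (node, remote_dir), defaulting to the home directory."""
    node, separator, remote_dir = source.partition(":")
    if not separator or not remote_dir:
        remote_dir = "~"
    return node, remote_dir


print(parse_mount_source("bluberry-122"))         # ('bluberry-122', '~')
print(parse_mount_source("bluberry-122:~/data"))  # ('bluberry-122', '~/data')
```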
Pauses a session identified by the SESSION_NAME.
Pausing a session stops the running instance (and any computations being performed on it, so be sure to save your work!) and billing of your account for the machine. The session can be resumed at a later point with all your persisted files and saved work unchanged.
grid session pause [OPTIONS] SESSION_NAME
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Resumes a session identified by SESSION_NAME.
grid session resume [OPTIONS] SESSION_NAME
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
SSH into the interactive node identified by NODE_NAME.
If you'd like the full power of ssh, you can use any ssh client and do ssh <node_name>. This command is a stripped-down version of it.
1. Path to custom key:
grid session ssh satisfied-rabbit-962 -- -i ~/.ssh/my-key
2. Custom ssh option:
grid session ssh satisfied-rabbit-962 -- -o "StrictHostKeyChecking accept-new"
grid session ssh [OPTIONS] NODE_NAME [SSH_ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
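In the examples above, everything after the `--` separator is handed to ssh as SSH_ARGS. The sketch below shows one way to assemble such a command line safely from Python. The helper name is hypothetical, and the node name is the one used in the examples.

```python
import shlex


def session_ssh_command(node_name, ssh_args=()):
    """Build a `grid session ssh` argument list, placing SSH_ARGS after `--`."""
    cmd = ["grid", "session", "ssh", node_name]
    if ssh_args:
        cmd += ["--", *ssh_args]
    return cmd


cmd = session_ssh_command(
    "satisfied-rabbit-962", ["-o", "StrictHostKeyChecking accept-new"]
)
print(shlex.join(cmd))
# grid session ssh satisfied-rabbit-962 -- -o 'StrictHostKeyChecking accept-new'
```

Keeping the arguments in a list avoids quoting mistakes when an ssh option value contains spaces. `shlex.join` is used here only to display a copy-pasteable command.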
Manage SSH keys.
grid ssh-keys [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Register a new SSH public key by providing a path to the KEY file and a NAME for it in Grid.
grid ssh-keys add [OPTIONS] NAME KEY
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
List all registered SSH public keys in authorized_keys format.
grid ssh-keys authorized_keys [OPTIONS]
Name | Type | Description | Default |
--limit |
integer | Maximum number of public keys to fetch. | 100 |
--help |
boolean | Show this message and exit. | False |
List currently registered SSH public keys.
grid ssh-keys list [OPTIONS]
Name | Type | Description | Default |
--limit |
integer | Maximum number of public keys to fetch. | 100 |
--help |
boolean | Show this message and exit. | False |
Remove a registered SSH public key.
grid ssh-keys rm [OPTIONS] KEY_ID
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Checks the status of Runs, Experiments, and Sessions.
grid status [OPTIONS] [RUN]
Name | Type | Description | Default |
--global |
boolean | Fetch status from all collaborators when flag is passed | False |
--help |
boolean | Show this message and exit. | False |
Stop Runs or Experiments.
grid stop [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Stop one or more EXPERIMENT_NAMES.
This preserves progress completed up to this point, but stops further computations and any billing for the machines used.
grid stop experiment [OPTIONS] EXPERIMENT_NAMES...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Stop one or more RUN_NAMES.
This preserves progress completed up to this point, but stops further computations and any billing for the machines used.
grid stop run [OPTIONS] RUN_NAMES...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Synchronize the requirements file with packages and versions from the currently active environment
grid sync-env [OPTIONS]
Name | Type | Description | Default |
--config |
text | Path to Grid config YML | None |
--dependency_file |
text | Path to dependency file. Defaults to the requirements.txt or environment.yml found in the root | None |
--help |
boolean | Show this message and exit. | False |
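Conceptually, synchronizing a requirements file means listing the packages installed in the active environment and pinning each to its current version. The sketch below does this with Python's standard `importlib.metadata`. It is an illustration of the idea, not what `grid sync-env` actually runs, and both function names are hypothetical.

```python
from importlib import metadata


def format_requirements(packages):
    """Render (name, version) pairs as sorted, de-duplicated, pinned requirement lines."""
    unique = {name.lower(): (name, version) for name, version in packages}
    return [f"{name}=={version}" for _, (name, version) in sorted(unique.items())]


def installed_packages():
    """List (name, version) for every distribution visible in the active environment."""
    return [
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    ]


# Example with fixed input so the result does not depend on the local environment.
print(format_requirements([("torch", "1.10.0"), ("numpy", "1.21.0")]))
# ['numpy==1.21.0', 'torch==1.10.0']
```

Writing `"\n".join(format_requirements(installed_packages()))` to a requirements.txt file would give a pinned snapshot of the current environment.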
Show information about a TEAM_NAME.
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Show the user information of the authorized user for this CLI instance.
grid user [OPTIONS] COMMAND [ARGS]...
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Specify the default CLUSTER_NAME which all operations should be run against.
grid user set-cluster-context [OPTIONS] CLUSTER_NAME
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Specify the default CLUSTER_ID which all operations should be run against.
grid user set-default-cluster [OPTIONS] CLUSTER_NAME
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Prints CLI version to stdout.
grid version [OPTIONS]
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |
Opens a web UI page that details the output of some RUN_OR_EXPERIMENTS.
Name | Type | Description | Default |
--help |
boolean | Show this message and exit. | False |