This document describes how to train a machine learning model on a remote machine via SSH. This is useful because a model can be developed on a local, resource-constrained machine and then seamlessly trained on a much larger remote "cloud" server.
- Command-line: mltk ssh --help
- Tutorial: Cloud Training with vast.ai
The MLTK features the command: mltk ssh
which internally manages all of the details necessary to execute an MLTK command on a remote machine.
The basic flow for training a model in the cloud is as follows:
- Create a model specification on a local machine
- Invoke the command:
  mltk ssh train my_model
  which will:
  a. Open a secure connection to a remote machine
  b. Upload all necessary files to the remote machine
  c. Invoke the train command on the remote machine (which may have far more GPU/CPU/RAM resources)
- After training completes on the remote machine, the model archive and any other training files are downloaded to the local machine
In short, the model is developed on the local machine and quickly trained on a cloud machine, and all training results appear on the local machine as if the model had been trained locally.
SSH is a standard protocol for securely connecting to remote machines. With it, shell commands may be issued from a local machine and executed on a remote machine.
While the details of creating an SSH connection are out-of-scope for this document, it is important to note the following:
- The SSH Server runs on the remote machine
- The SSH Client runs on the local machine
- OpenSSH is a free, open-source tool that provides both an SSH client and an SSH server, which are available for Windows, Linux, and Mac.
- The OS of the client does not need to match the server, e.g. a Windows SSH client can connect to a Linux SSH server
While an SSH client does not need to be installed on the local machine,
it is helpful to have one to ensure the login credentials are working before using the mltk ssh
command (which internally uses its own SSH client python library).
Refer to the following documentation for how to install the SSH client on Windows: Get started with OpenSSH
On Linux, the SSH client is likely installed by default. However, on Ubuntu-like systems, it can be installed with:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install openssh-client
A keypair is required to securely connect to the SSH server. The details of creating and distributing the keypair are out-of-scope for this document; however, it is important to note the following:
- A keypair consists of one private key and one public key
- The private key resides on the local machine; its contents must be securely stored (i.e. do not share it with others)
- The public key resides on the remote machine; its contents need not be secure (i.e. it can be copied & pasted anywhere)
The MLTK features a helper command for generating an SSH keypair:
mltk ssh-keygen --help
which generates an Ed25519 keypair in the specified output directory, e.g.:
# Generate pair at: ~/.ssh/id_my_key
mltk ssh-keygen my_key
Refer to the following for additional information on creating and distributing a keypair:
When the mltk ssh <command> command is invoked, the following sequence is internally executed:
- Open an SSH connection to the remote server
  Using the settings specified in the --host option, in ~/.mltk/user_settings.yaml, or in ~/.ssh/config
- Create the remote working directory
  Specified in the --host option or in ~/.mltk/user_settings.yaml
- Create and activate an MLTK Python virtual environment
  Only if not disabled in the model specification or ~/.mltk/user_settings.yaml
- Upload files configured in the model specification and/or ~/.mltk/user_settings.yaml
- Export any environment variables configured in the model specification and/or ~/.mltk/user_settings.yaml
- Execute any startup shell commands configured in the model specification and/or ~/.mltk/user_settings.yaml
- Execute the MLTK <command> in a detached subprocess
  This way, the command continues to execute even if the SSH session prematurely disconnects
- Poll the remote MLTK command subprocess while dumping the remote log file to the local terminal
  Issuing CTRL+C will abort the remote command subprocess (use the --no-wait option to skip this step)
- Once the MLTK command completes, download the model archive file (if available)
- Download any files configured in the model specification and/or ~/.mltk/user_settings.yaml
- Download any other log files
- Execute any shutdown shell commands configured in the model specification and/or ~/.mltk/user_settings.yaml
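The detach-and-poll steps in the sequence above can be illustrated locally. The following is a minimal sketch under stated assumptions, not the MLTK implementation: run_detached and poll_log are hypothetical helpers, and a local child process writing to a log file stands in for the remote command and its remote log.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_detached(argv, log_path):
    """Start argv in its own session, sending stdout/stderr to log_path."""
    log_file = open(log_path, "wb")
    try:
        # start_new_session=True places the child in a new session on POSIX,
        # so it is not tied to the terminal that launched it.
        return subprocess.Popen(
            argv,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,
        )
    finally:
        # The child keeps its own copy of the file descriptor.
        log_file.close()


def poll_log(proc, log_path, interval=0.1):
    """Collect new log output until the process exits.

    Returns a tuple of (log text, process exit code).
    """
    chunks = []
    offset = 0
    while True:
        # Check for completion before reading, so the final read
        # picks up everything written before the process exited.
        finished = proc.poll() is not None
        with open(log_path, "rb") as f:
            f.seek(offset)
            data = f.read()
        offset += len(data)
        if data:
            chunks.append(data.decode(errors="replace"))
        if finished:
            break
        time.sleep(interval)
    return "".join(chunks), proc.returncode


# Hypothetical usage: a short Python command stands in for the MLTK command.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_log = Path(tmp_dir) / "command.log"
    demo_proc = run_detached([sys.executable, "-c", "print('training done')"], demo_log)
    demo_output, demo_exit_code = poll_log(demo_proc, demo_log)
```

Because the command writes to a log file rather than to the launching process, a later poll can pick up where an earlier one stopped, which is the property that makes resuming after a dropped connection possible.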
The following settings are used by the mltk ssh
command.
Note that most of these settings are optional and may be configured
in several different locations (see the next section, "Settings Locations", for more details).
This is the working directory where the MLTK command will execute.
- This setting is optional
- Default: .
This setting can be specified in one of three locations (in order of priority):
- The --host command-line option, e.g.:
  mltk ssh --host my_server.com/workspace
- The SshMixin model mixin property, e.g.:
  my_model.ssh_remote_dir = '~/workspace'
- The ssh.remote_dir setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    remote_dir: ~/workspace
The name of the SSH server. This can be:
- Domain name of server, e.g.: myserver.com
- IP address, e.g.: 145.243.23.222
- Host name in ~/.ssh/config
This setting is required.
This setting can be specified in one of three locations (in order of priority):
- The HostName in ~/.ssh/config that maps to the Host provided on the command-line with the --host option, e.g.:
  mltk ssh --host my_server
  -> find the Host entry in ~/.ssh/config with name my_server
  -> use the corresponding HostName value
- The --host command-line option, e.g.:
  mltk ssh --host myserver.com
- The ssh.connection.hostname setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    connection:
      hostname: myserver.com
The listening port of the SSH server.
- This setting is optional
- Default: 22
This setting can be specified in one of four locations (in order of priority):
- The --port command-line option, e.g.:
  mltk ssh --host ssh3.vast.ai -p 34567
  -> port=34567
- The --host command-line option, e.g.:
  mltk ssh --host ssh3.vast.ai:34567
  -> port=34567
- The Port in ~/.ssh/config that maps to the Host provided on the command-line with the --host option, e.g.:
  mltk ssh --host my_server
  -> find the Host entry in ~/.ssh/config with name my_server
  -> use the corresponding Port value
- The ssh.connection.port setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    connection:
      port: 2222
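The priority order above amounts to "take the first location that supplies a value, otherwise use the default". The following is a minimal sketch of that rule, not MLTK's own code; first_defined and resolve_port are hypothetical names introduced for illustration.

```python
from typing import Optional

DEFAULT_SSH_PORT = 22


def first_defined(*candidates, default=None):
    """Return the first candidate that is not None, else the default."""
    for value in candidates:
        if value is not None:
            return value
    return default


def resolve_port(
    port_option: Optional[int] = None,
    host_option_port: Optional[int] = None,
    ssh_config_port: Optional[int] = None,
    user_settings_port: Optional[int] = None,
) -> int:
    """Pick the SSH port from the four locations described above."""
    # Arguments are passed from highest to lowest priority:
    # --port, then --host, then ~/.ssh/config, then user_settings.yaml.
    return first_defined(
        port_option,
        host_option_port,
        ssh_config_port,
        user_settings_port,
        default=DEFAULT_SSH_PORT,
    )
```

The same first-match helper applies to the other single-valued settings (remote directory, hostname, username, private key path), each with its own ordered list of locations.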
The SSH login username.
- This setting is optional
This setting can be specified in one of three locations (in order of priority):
- The --host command-line option, e.g.:
  mltk ssh --host [email protected]:34567
  -> username=root
- The User in ~/.ssh/config that maps to the Host provided on the command-line with the --host option, e.g.:
  mltk ssh --host my_server
  -> find the Host entry in ~/.ssh/config with name my_server
  -> use the corresponding User value
- The ssh.connection.username setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    connection:
      username: root
The filepath to the SSH private key.
- This setting is optional
This setting can be specified in one of three locations (in order of priority):
- The -i command-line option, e.g.:
  mltk ssh --host myserver.com -i ~/.ssh/id_myserver
- The IdentityFile in ~/.ssh/config that maps to the Host provided on the command-line with the --host option, e.g.:
  mltk ssh --host my_server
  -> find the Host entry in ~/.ssh/config with name my_server
  -> use the corresponding IdentityFile value
- The ssh.connection.key_filename setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    connection:
      key_filename: ~/.ssh/id_myserver
List or dictionary of environment variables to export before executing MLTK command on remote server.
- This setting is optional
This setting can be specified in two locations (listed from higher to lower priority; the two are merged, so higher-priority values overwrite lower-priority ones):
- The SshMixin model mixin property, e.g.:
  my_model.ssh_environment = ['PROD_ENV=1', 'CUDA_DEVICES=2']
  or
  my_model.ssh_environment = dict(PROD_ENV=1, CUDA_DEVICES=2)
- The ssh.environment setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    environment:
      - PROD_ENV=1
      - CUDA_DEVICES=
  or
  ssh:
    environment:
      PROD_ENV: 1
      CUDA_DEVICES: 2
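Since the environment may be given as either a list of NAME=VALUE strings or a dictionary, merging the two locations requires normalizing both first. The following is a minimal sketch of one way to do that, assuming the model specification takes priority over user_settings.yaml as listed above; normalize_environment and merge_environment are hypothetical helpers, not MLTK functions.

```python
def normalize_environment(env):
    """Convert a list of 'NAME=VALUE' strings or a dict into a dict of strings."""
    if env is None:
        return {}
    if isinstance(env, dict):
        return {str(name): str(value) for name, value in env.items()}
    result = {}
    for entry in env:
        # Split on the first '=' only, so values may themselves contain '='.
        name, _, value = str(entry).partition("=")
        result[name] = value
    return result


def merge_environment(model_env, user_settings_env):
    """Merge both environments; model entries overwrite user-settings entries."""
    merged = normalize_environment(user_settings_env)
    merged.update(normalize_environment(model_env))
    return merged


# Hypothetical usage: the model overrides CUDA_DEVICES, DEV_ENV is inherited.
example_env = merge_environment(
    ["PROD_ENV=1", "CUDA_DEVICES=2"],
    {"CUDA_DEVICES": 0, "DEV_ENV": 1},
)
```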
List of file paths to upload from the local machine to the remote machine before executing the MLTK command.
- This setting is optional
If the path does not contain a pipe |, e.g.: dataset/*.csv, then:
- The local path is relative to the model specification script
- The remote path is relative to the remote working directory
- Absolute paths are not allowed
- The path may use the recursive glob format
If the path does contain a pipe |, e.g.: ~/patch.txt|./patch.txt, then:
- Format is <local path>|<remote path>
- The local path is relative to the model specification script
- The remote path is relative to the remote working directory
- Both paths may be absolute
- No wildcards
This setting can be specified in two locations (in order; the higher list is appended to the lower list):
- The SshMixin model mixin property, e.g.:
  my_model.ssh_upload_files = ['dataset.zip', 'dataset/*.csv']
- The ssh.upload_files setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    upload_files:
      - dataset.zip
      - dataset/*.csv
      - ~/patch.txt|./patch.txt
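To make the "recursive glob" rule for pipe-free entries concrete, the following sketch expands a pattern relative to the directory containing the model specification script. It is an illustration using Python's standard glob module, not MLTK's upload code; expand_upload_pattern and model_dir are hypothetical names.

```python
import glob
import os
import tempfile


def expand_upload_pattern(model_dir, pattern):
    """Expand a relative glob pattern against model_dir.

    Returns the matching files as sorted paths relative to model_dir.
    """
    if os.path.isabs(pattern):
        raise ValueError("absolute paths are not allowed without a pipe: " + pattern)
    # recursive=True lets '**' match any number of nested directories.
    matches = glob.glob(os.path.join(model_dir, pattern), recursive=True)
    files = [match for match in matches if os.path.isfile(match)]
    # Forward slashes keep the relative path reusable on the remote side.
    return sorted(os.path.relpath(f, model_dir).replace(os.sep, "/") for f in files)


# Hypothetical usage: build a small directory tree and expand two patterns.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "dataset", "sub"))
    for name in ("dataset/a.csv", "dataset/sub/b.csv", "dataset/notes.txt"):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write("demo")
    flat_matches = expand_upload_pattern(demo_dir, "dataset/*.csv")
    recursive_matches = expand_upload_pattern(demo_dir, "dataset/**/*.csv")
```

Because the result is relative to the model specification directory, the same relative path can be joined onto the remote working directory, matching the rule that both sides use the same relative location.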
List of shell commands to execute on remote machine before executing the MLTK command.
- This setting is optional
- The commands run in a bash shell
This setting can be specified in two locations (in order; the higher list is appended to the lower list):
- The SshMixin model mixin property, e.g.:
  my_model.ssh_startup_cmds = ['pip install mylib', 'sudo apt-get install 7zip']
- The ssh.startup_cmds setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    startup_cmds:
      - pip install mylib
      - sudo apt-get install 7zip
List of file paths to download from the remote machine to the local machine after executing the MLTK command.
- This setting is optional
If the path does not contain a pipe |, e.g.: logs/*.txt, then:
- The local path is relative to the model specification script
- The remote path is relative to the remote working directory
- Absolute paths are not allowed
- The path may use the recursive glob format
If the path does contain a pipe |, e.g.: ./results.txt|~/results.txt, then:
- Format is <remote path>|<local path>
- The local path is relative to the model specification script
- The remote path is relative to the remote working directory
- Both paths may be absolute
- No wildcards
This setting can be specified in two locations (in order; the higher list is appended to the lower list):
- The SshMixin model mixin property, e.g.:
  my_model.ssh_download_files = ['results.zip', 'logs/*.txt']
- The ssh.download_files setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    download_files:
      - results.zip
      - logs/*.txt
      - ./results.txt|~/results.txt
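The pipe format is easy to misread because the order of the two paths flips between uploads and downloads. The following sketch splits an entry into its local and remote parts for either direction. It is a hypothetical helper written for this explanation (TransferEntry and parse_transfer_entry are not MLTK names), and its wildcard check is an assumption about how "No wildcards" might be enforced.

```python
from typing import NamedTuple


class TransferEntry(NamedTuple):
    local_path: str
    remote_path: str
    allow_glob: bool


def parse_transfer_entry(entry: str, direction: str) -> TransferEntry:
    """Split an upload_files or download_files entry into local/remote paths.

    direction is "upload"   (<local path>|<remote path>) or
                 "download" (<remote path>|<local path>).
    """
    if direction not in ("upload", "download"):
        raise ValueError("direction must be 'upload' or 'download'")
    if "|" not in entry:
        # Same relative path on both sides; glob patterns are permitted.
        return TransferEntry(entry, entry, True)
    first, second = (part.strip() for part in entry.split("|", 1))
    if any(ch in first + second for ch in "*?["):
        raise ValueError("wildcards are not allowed in piped entries: " + entry)
    if direction == "upload":
        return TransferEntry(first, second, False)
    return TransferEntry(second, first, False)
```

For example, parse_transfer_entry('./results.txt|~/results.txt', 'download') treats ./results.txt as the remote path and ~/results.txt as the local path, while the same string passed with 'upload' would swap the two.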
List of shell commands to execute on remote machine after executing the MLTK command.
- This setting is optional
- The commands run in a bash shell
This setting can be specified in two locations (in order; the higher list is appended to the lower list):
- The SshMixin model mixin property, e.g.:
  my_model.ssh_shutdown_cmds = ['curl -F data=log.txt my_server.com']
- The ssh.shutdown_cmds setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    shutdown_cmds:
      - curl -F data=log.txt my_server.com
This flag allows syncing the local MLTK to the remote MLTK. This is useful if changes have been made to the local MLTK that are not in the public MLTK Python package.
- This setting is optional
- Default: false
- Specified by the ssh.sync_local_mltk setting in ~/.mltk/user_settings.yaml, e.g.:
  ssh:
    sync_local_mltk: true
The various settings may be specified in the following locations:
There are three command-line options:
mltk ssh --host [<user name>@]<host>[:<port>][/<path>]
Where:
- <user name> - Optional, user login name
- <host> - Required, SSH hostname or name in ~/.ssh/config
- <port> - Optional, SSH port, default is 22
- <path> - Optional, remote directory path
Examples:
mltk ssh --host my_server
mltk ssh --host myserver.com
mltk ssh --host 192.168.1.56
mltk ssh --host [email protected]
mltk ssh --host [email protected]:456
mltk ssh --host [email protected]/workspace
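The --host format can be checked with a small parser. The following is a sketch under stated assumptions, not the parser MLTK uses: HostSpec and parse_host_option are hypothetical names, IPv6 literals are not handled, and the path field simply holds the text after the first slash without any claim about how MLTK resolves it on the remote machine.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the documented shape: [<user name>@]<host>[:<port>][/<path>]
HOST_PATTERN = re.compile(
    r"^(?:(?P<user>[^@/:]+)@)?"  # optional "<user name>@"
    r"(?P<host>[^@/:]+)"         # required "<host>"
    r"(?::(?P<port>\d+))?"       # optional ":<port>"
    r"(?:/(?P<path>.*))?$"       # optional "/<path>"
)


class HostSpec(NamedTuple):
    user: Optional[str]
    host: str
    port: Optional[int]
    path: Optional[str]


def parse_host_option(value: str) -> HostSpec:
    """Split a --host value into user, host, port, and path components."""
    match = HOST_PATTERN.match(value)
    if match is None:
        raise ValueError("unrecognized --host value: " + value)
    port = match.group("port")
    return HostSpec(
        user=match.group("user"),
        host=match.group("host"),
        port=int(port) if port is not None else None,
        path=match.group("path"),
    )


# Hypothetical usage with an illustrative host name.
example_spec = parse_host_option("root@myserver.com:456/workspace")
```

Components that are absent come back as None, which leaves room for a lower-priority location such as ~/.ssh/config or user_settings.yaml to supply them.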
mltk ssh --port <port>
Where:
- <port> - The SSH server's listening port
mltk ssh --identity_file <file path>
Where:
- <file path> - The file path to the SSH private key
The SshMixin model mixin allows for defining model-specific SSH settings.
NOTE: This mixin is optional; it is not required to run the model with the ssh command.
# Import MLTK model object and mixins
from mltk.core import (
    MltkModel,
    TrainMixin,
    AudioDatasetMixin,
    EvaluateClassifierMixin,
    SshMixin,
)

# Define the model class with SshMixin
class MyModel(
    MltkModel,
    TrainMixin,
    AudioDatasetMixin,
    EvaluateClassifierMixin,
    SshMixin
):
    pass

# Instantiate the model
my_model = MyModel()
# Define model-specific SSH properties
my_model.ssh_remote_dir = '~/workspace'
my_model.ssh_create_venv = True
my_model.ssh_environment = ['DEV=1', 'CUDA_DEVICES=2']
my_model.ssh_startup_cmds = ['pip install mylib']
my_model.ssh_upload_files = ['dataset.zip', 'dataset/*.csv']
my_model.ssh_download_files = ['results.zip']
my_model.ssh_shutdown_cmds = ['echo "all done"']
The user_settings.yaml file allows for defining user-specific MLTK settings.
This file must be manually created at ~/.mltk/user_settings.yaml.
The following SSH settings may be added to this file (all settings are optional):
ssh:
  config_path: <path to ssh config file on local machine>
  remote_dir: <path to remote working directory>
  create_venv: <true/false, if a MLTK python venv should be automatically created on the remote machine>
  connection:
    hostname: <SSH server hostname>
    port: <SSH server listening port>
    username: <user login name>
    key_filename: <path to private key on local machine>
  environment: <list of environment variables to export on remote machine>
  upload_files: <list of files to upload to remote machine>
  startup_cmds: <list of shell commands to execute on remote machine before executing MLTK command>
  download_files: <list of files to download after executing MLTK command>
  shutdown_cmds: <list of shell commands to execute after executing MLTK command>
File: ~/.mltk/user_settings.yaml:
ssh:
  config_path: ~/ssh_config
  remote_dir: ~/workspace
  create_venv: false
  connection:
    hostname: my_server.com
    port: 222
    username: joe
    key_filename: ~/.ssh/id_my_server
  environment:
    - CUDA_VISIBLE_DEVICES=-1
    - DEV_ENV=1
  upload_files:
    - dataset.zip
    - config.txt
  startup_cmds:
    - pip install silabs-mltk
    - sudo apt install -y p7zip-full libsndfile1
  download_files:
    - custom_logs/**
  shutdown_cmds:
    - curl -F data=log.txt my_server.com
The SSH Config file is a standard file used by the SSH client.
By default, this file is located at ~/.ssh/config. This path can be overridden by defining the ssh.config_path setting in ~/.mltk/user_settings.yaml, e.g.:
ssh:
  config_path: custom/path/ssh/config
Refer to the online documentation for more details about the contents of this file: SSH Config
File: ~/.ssh/config:
Host vast_ai
  HostName ssh6.vast.ai
  Port 31521
  User root
  StrictHostKeyChecking no
  IdentityFile ~/.ssh/id_vast_ai
Then issuing the following command will use the config file settings:
mltk ssh --host vast_ai train image_example1
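To show how a Host alias maps to connection settings, the following sketch reads Host blocks from SSH config text. It is a deliberately simplified illustration, not the parser used by OpenSSH or by MLTK: parse_ssh_config is a hypothetical function, and wildcard patterns, Match blocks, and Include directives are not supported.

```python
import re


def parse_ssh_config(text):
    """Parse 'Host' blocks into {alias: {lowercased keyword: value}}."""
    hosts = {}
    current_blocks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # A keyword and its value are separated by whitespace or '='.
        match = re.match(r"(\w+)\s*=?\s*(.*)$", line)
        if match is None:
            continue
        keyword, value = match.group(1).lower(), match.group(2).strip()
        if keyword == "host":
            # A Host line may list several aliases that share one block.
            current_blocks = [hosts.setdefault(alias, {}) for alias in value.split()]
        else:
            for block in current_blocks:
                # The first value obtained for a keyword is kept.
                block.setdefault(keyword, value)
    return hosts


EXAMPLE_CONFIG = """
Host vast_ai
  HostName ssh6.vast.ai
  Port 31521
  User root
  StrictHostKeyChecking no
  IdentityFile ~/.ssh/id_vast_ai
"""

vast_ai_settings = parse_ssh_config(EXAMPLE_CONFIG)["vast_ai"]
```

With the example file above, looking up the vast_ai alias yields the hostname, port, user, and identity file that a --host vast_ai invocation would rely on.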
Executing MLTK commands on a remote machine via SSH is done using the ssh
operation.
For more details on the available command-line options, issue the command:
mltk ssh --help
When a command is invoked, it executes in a detached sub-process. This way, if the SSH connection prematurely disconnects, the command will continue to execute.
Issuing Ctrl+C
will abort the command on both the local and remote machines.
The following are examples of how remote SSH training can be invoked from the command-line:
The following shows how to train the keyword_spotting_on_off_v3 model on a remote server.
In this example, all of the SSH settings are configured in ~/.mltk/user_settings.yaml.
After training completes, the results are downloaded to the local machine.
mltk ssh train keyword_spotting_on_off_v3
The following shows how to train the keyword_spotting_on_off_v3 model on a remote server.
In this example, the SSH server settings are provided on the command-line.
After training completes, the results are downloaded to the local machine.
mltk ssh -h [email protected]/workspace -p 23452 -i ~/.ssh/id_vast_ai train keyword_spotting_on_off_v3
The -h option has the following format: [<user name>@]<host>[:<port>][/<path>]
where:
- <user name> - user login name (optional)
- <host> - SSH server hostname
- <port> - SSH server listening port (optional)
- <path> - remote directory path (optional)
The -p option is the SSH server's listening port.
The -i option points to the SSH private key file on the local machine.
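When the command is launched from a script, assembling the arguments as a list avoids quoting mistakes. The following sketch uses only the -h, -p, and -i options described above; build_mltk_ssh_argv is a hypothetical helper, and the host value shown is illustrative.

```python
def build_mltk_ssh_argv(command, model, host=None, port=None, identity_file=None):
    """Assemble the argument list for 'mltk ssh [options] <command> <model>'."""
    argv = ["mltk", "ssh"]
    if host is not None:
        # Format: [<user name>@]<host>[:<port>][/<path>]
        argv += ["-h", host]
    if port is not None:
        argv += ["-p", str(port)]
    if identity_file is not None:
        argv += ["-i", identity_file]
    # The MLTK command and model name follow the SSH options.
    argv += [command, model]
    return argv


# Hypothetical usage with an illustrative host name.
example_argv = build_mltk_ssh_argv(
    "train",
    "keyword_spotting_on_off_v3",
    host="myserver.com/workspace",
    port=23452,
    identity_file="~/.ssh/id_vast_ai",
)
```

Assuming the mltk command is installed and on the PATH, the resulting list could be passed to subprocess.run(example_argv, check=True). Note that no shell is involved in that case, so a leading ~ in the identity file path would reach the command unexpanded.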
The following shows how to train the keyword_spotting_on_off_v3 model on a remote server.
In this example, the SSH server hostname is provided and the login info is retrieved from the ~/.ssh/config file.
Since the --no-wait option is provided, the command does not wait for the training to complete on the remote server.
Instead, the command immediately returns and the training command executes on the remote server in the background.
To retrieve the training results, the --resume option can be provided later (see the --resume example below).
mltk ssh -h vast.ai train keyword_spotting_on_off_v3 --no-wait
The following shows how to retrieve the results of a previously executed command.
This is useful if the SSH connection prematurely disconnects or the --no-wait option was previously used.
This will wait until the previously invoked command has completed on the remote server then download the training results.
mltk ssh -h vast.ai train keyword_spotting_on_off_v3 --resume
HINT: You could also use the --no-wait
option to poll the remote server to see if the command has completed without waiting for it to finish.
Only one command may be active on the remote server. The --force
option may be used to abort a previously invoked command.
mltk ssh train keyword_spotting_on_off_v3 --force