\documentclass[a4paper]{article}
\usepackage{listings}
\renewcommand*\ttdefault{pcr} % use courier for mono text
\lstset{language=bash, frame=single, basicstyle=\ttfamily\small}
\usepackage{graphicx}
\usepackage[pdftex, colorlinks=true]{hyperref} % must go last!
\newenvironment{ttt}{\ttfamily}{\par} % for quoting terminal output
\title{Remotely work with Jupyter (IPython) Notebooks at UCL Geography}
\author{Chad Stainbank}
\begin{document}
\maketitle
\section{Introduction}
\label{sec:intro}
The \href{http://jupyter.org/}{Jupyter Notebook}~(previously IPython Notebook) provides a feature-rich interactive environment for learning and using \href{https://www.python.org/}{Python}.
If you're studying for a BSc or MSc at the University College London Department of Geography then it's likely that you will have worked with these Notebooks in one or more of your modules or in the preparation of a dissertation, usually while seated at one of the workstations in Pearson 110a that form the \href{http://www.geog.ucl.ac.uk/resources/computer-support/teaching-cluster}{Geography Linux Cluster}.
Since physical access to Pearson 110a is not always practical, you may wish to have \emph{remote} access to these files from another computer.
This guide will show you how to work with Notebooks, hosted on the Linux Cluster, from your own computer.
The bulk of this guide is written for somebody using the terminal on a Linux computer.
However, it is likely that your own computer runs on either macOS or Microsoft Windows.
Since macOS is a \href{http://unix.stackexchange.com/questions/1489/is-mac-os-x-unix}{Unix-based} operating system, Apple fans should be able to enter all the commands in Section \ref{sec:linuxmac} into the \href{http://www.macworld.co.uk/feature/mac-software/get-more-out-of-os-x-terminal-3608274/}{Mac Terminal} with little modification.
Windows users, on the other hand, must first set up a terminal emulator before following these instructions, as explained in Section \ref{sec:windows}.
\section{On Linux/macOS}
\label{sec:linuxmac}
Section \ref{sec:manual} provides a very explicit set of instructions to get up and running with remote Jupyter Notebooks using the Linux terminal.
If you're familiar with the command-line interface then you could skip to Section \ref{sec:auto}, where these commands are condensed to shell functions, although it's worth going through this once to ensure that everything is working smoothly.
\subsection{Manual}
\label{sec:manual}
The \emph{Listing} boxes, such as Listing \ref{lst:example} below, display lines of commands to be entered into the terminal.
If a command contains text enclosed in angle brackets (\texttt{\textless{}\textgreater{}}) then you must replace the variable (including the brackets) as specified.
Any text following the \texttt{\#} character is a comment and does not need to be entered.
Many commands take \emph{flags} to specify some of their behaviour --- be careful to distinguish between the single dash (\texttt{-}) and the double dash (\texttt{--}) when typing these out.
\begin{lstlisting}[caption={An example of a command}, label={lst:example}]
cmnd --flag <variable> # this does something
\end{lstlisting}
\subsubsection{Log in to cluster machine}
\label{sec:login}
The program \href{http://linuxcommand.org/man_pages/ssh1.html}{SSH} allows you to log into one of the machines in Pearson 110a from your own computer.
The Linux cluster is set up in a way that requires a two-step (or \emph{multi-hop}) remote login, meaning that you must first pass through a \emph{gateway server}.
To make the first hop, open a terminal window and enter the command in Listing \ref{lst:logingw}, replacing \texttt{\textless{}USER\textgreater{}} with your UCL username and \texttt{\textless{}GATEWAY\textgreater{}} with the name of any one of the Geography gateway servers listed on the \href{http://www.geog.ucl.ac.uk/resources/computer-support/linux-remote-access}{Geography Remote Access page}.
\begin{lstlisting}[caption={Login to gateway}, label={lst:logingw}]
ssh <USER>@<GATEWAY>.geog.ucl.ac.uk
\end{lstlisting}
The first time you log in to a given machine, you will get the following message:
\begin{quote}
\begin{ttt}
The authenticity of host '\textless{}GATEWAY\textgreater{} (\ldots)' can't be established.
RSA key fingerprint is \ldots{}
Are you sure you want to continue connecting (yes/no)?
\end{ttt}
\end{quote}
Type \texttt{yes} to confirm, then enter your Geography cluster password\footnote{This is not necessarily the same password that you use for other UCL services.} when prompted to log in to the gateway server.
To make the second hop, enter the command in Listing \ref{lst:loginws}, replacing \texttt{\textless{}MACHINE\textgreater{}} with the name of your favourite teaching workstation in Pearson 110a. Notice that, as you are already within the geography network, you don't have to specify \texttt{\textless{}USER\textgreater{}} or finish the address with \texttt{.geog.ucl.ac.uk} this time.
\begin{lstlisting}[caption={Login to workstation}, label={lst:loginws}]
ssh <MACHINE>
\end{lstlisting}
Simply enter your password again to complete the login to your chosen cluster machine.
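Recent versions of OpenSSH can make both hops in a single command.
As a sketch, assuming that your own computer has OpenSSH 7.3 or later (which introduced the \texttt{-J} flag) and that the gateway permits connections to be forwarded through it, Listing \ref{lst:jump} logs in to the workstation by way of the gateway.
The workstation name is looked up by the gateway, so the short name should still suffice, but \texttt{\textless{}USER\textgreater{}} must be given for both hops because the command is issued from outside the Geography network.
\begin{lstlisting}[caption={Login to workstation in one command (assumes OpenSSH 7.3 or later)}, label={lst:jump}]
# jump through the gateway, then log in to the workstation
ssh -J <USER>@<GATEWAY>.geog.ucl.ac.uk <USER>@<MACHINE>
\end{lstlisting}
You will still be asked for a password once for each hop.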
\subsubsection{Run remote Notebook Server}
\label{sec:runnb}
Start running a remote Notebook server with the command in Listing \ref{lst:runnb}.
The flag \texttt{--no-browser} prevents Firefox from opening automatically, while \texttt{--port} tells the server to listen on a particular port number.
This is set to 8888 by default, although you can replace \texttt{\textless{}PORT\textgreater{}} with just about any integer from 1024 to 65535 (lower port numbers are reserved for system services).
\begin{lstlisting}[caption={Run Notebook server}, label={lst:runnb}]
jupyter notebook --no-browser --port=<PORT>
\end{lstlisting}
Jupyter should now inform you that:
\begin{quote}
\texttt{The IPython Notebook is running at: http://localhost:\textless{}PORT\textgreater{}.}
\end{quote}
Look carefully at the link and check that the port number is the same as the one you specified.
If it has changed, simply note down the new number and use it as \texttt{\textless{}PORT\textgreater{}} for all subsequent commands.
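If the message has scrolled out of view, the address can be recovered without restarting the server.
As a sketch, assuming that the Jupyter installation on the cluster provides the \texttt{list} subcommand, enter the command in Listing \ref{lst:listnb} in a second terminal that is logged in to the same workstation (the first terminal is occupied by the running server).
\begin{lstlisting}[caption={List running Notebook servers (assumes the list subcommand is available)}, label={lst:listnb}]
jupyter notebook list # show running servers and their addresses
\end{lstlisting}
On newer installations the equivalent command may instead be \texttt{jupyter server list}.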
\subsubsection{Set up SSH tunnel}
\label{sec:tunnel}
The next step is to open an \href{http://blog.trackets.com/2014/05/17/ssh-tunnel-local-and-remote-port-forwarding-explained-with-examples.html}{\emph{SSH tunnel}} with the workstation in Section \ref{sec:login}, allowing you to connect to the Notebook server created in Section \ref{sec:runnb}.
This is achieved by passing the \texttt{-L} flag, which tells SSH to use \href{https://help.ubuntu.com/community/SSH/OpenSSH/PortForwarding}{local port forwarding} to transfer data between nodes.
In a \emph{new} terminal, enter the commands in Listings \ref{lst:tunnel1} and \ref{lst:tunnel2}, replacing variables and supplying passwords as before.
\begin{lstlisting}[caption={Tunnel to gateway server}, label={lst:tunnel1}]
ssh -L <PORT>:localhost:<PORT> <USER>@<GATEWAY>.geog.ucl.ac.uk
\end{lstlisting}
\begin{lstlisting}[caption={Tunnel to cluster machine}, label={lst:tunnel2}]
ssh -L <PORT>:localhost:<PORT> -N <MACHINE> # this will hang
\end{lstlisting}
If all has gone well then the terminal will appear to hang: the optional \texttt{-N} flag tells SSH not to run a remote command, so no prompt is shown.
This means that the SSH tunnel is now open for business.
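The two tunnel commands can also be combined into one.
As a sketch, assuming OpenSSH 7.3 or later on your own computer (for the \texttt{-J} flag) and a gateway that permits forwarding, Listing \ref{lst:tunneljump} forwards \texttt{\textless{}PORT\textgreater{}} on your computer directly to \texttt{\textless{}PORT\textgreater{}} on the workstation, passing through the gateway on the way.
\begin{lstlisting}[caption={Tunnel to cluster machine in one command (assumes OpenSSH 7.3 or later)}, label={lst:tunneljump}]
# forward the local port to the workstation via the gateway
ssh -N -L <PORT>:localhost:<PORT> \
    -J <USER>@<GATEWAY>.geog.ucl.ac.uk <USER>@<MACHINE>
\end{lstlisting}
As before, the terminal will appear to hang once both passwords have been accepted.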
\subsubsection{Open Notebook in browser}
\label{sec:opennb}
To use the Notebook, simply open the link from the first terminal (Section \ref{sec:runnb}) using a browser on your local computer.
Congratulations: you should now be looking at the familiar Jupyter Notebook file browser. If something has gone wrong, see Section \ref{sec:trouble}; otherwise move on to Section \ref{sec:auto} to make the whole remote Notebook process easier.
\subsection{Automatic}
\label{sec:auto}
\subsubsection{Generate SSH keys}
\label{sec:sshkeys}
Following all the steps in Section \ref{sec:manual} requires you to enter your password a total of four times (once for each SSH command).
\href{https://wiki.archlinux.org/index.php/SSH_keys}{\emph{SSH keys}} provide an alternative form of authentication, using a private and a public key to allow passwordless login over SSH.
Two separate pairs of keys must be set up: one for the initial hop from your computer to the gateway node and another for the onward hop to the workstation.
To set up the first pair of keys, open a terminal on your computer and enter the command:
\begin{lstlisting}[caption={Generate SSH keys}, label={lst:genkeys}]
ssh-keygen -t rsa
\end{lstlisting}
Let the program use the default file with no passphrase by pressing \emph{Enter} three times, then copy the public key to the Geography file system using Listing \ref{lst:cpkeys}, following the instructions returned by the program.
\begin{lstlisting}[caption={Copy public key to geography file system}, label={lst:cpkeys}]
ssh-copy-id <USER>@<GATEWAY>.geog.ucl.ac.uk
\end{lstlisting}
To generate a pair of keys for the second hop, simply log in to any Geography cluster machine and re-enter the commands in Listings \ref{lst:genkeys} and \ref{lst:cpkeys}.
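It is worth confirming that the keys work before relying on them.
As a sketch, Listing \ref{lst:testkeys} uses the standard OpenSSH option \texttt{BatchMode}, which disables password prompts, so the command succeeds only if key-based login is working; the message text is purely illustrative.
\begin{lstlisting}[caption={Check that passwordless login works}, label={lst:testkeys}]
# fails instead of asking for a password if the key is rejected
ssh -o BatchMode=yes <USER>@<GATEWAY>.geog.ucl.ac.uk true \
    && echo "key login works"
\end{lstlisting}
The same check can be repeated from the gateway for the second hop by replacing the address with \texttt{\textless{}MACHINE\textgreater{}}.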
\subsubsection{Geography login/tunnel functions}
\label{sec:gfuncs}
It will very quickly become tedious to perfectly type out the commands in Section \ref{sec:manual} every time you want to run a Notebook remotely.
The two \href{https://www.shellscript.sh/functions.html}{functions} in Listing \ref{lst:gfuncs} (at the end of this document), written for the \href{http://cs.lmu.edu/~ray/notes/bash/}{\emph{bash shell}}, condense most of this to a few short commands.
Be aware that Listing \ref{lst:gfuncs} is for demonstration only; it does not display properly in a pdf document and so will not work if used as-is.
Instead, a raw text file containing these functions can be found at:
\noindent{\small\url{https://github.com/stainbank/remote_ipynb/blob/master/geog_functions.sh}}
The simplest way to ``install'' these two functions is to paste the entire code block, replacing the defaults as appropriate, at the end of the \href{http://superuser.com/questions/49289/what-is-the-bashrc-file}{.bashrc}~file in your home directory.
If this file doesn't already exist, \href{http://apple.stackexchange.com/a/119714}{as is the case on Mac OS X}, simply create it and then add the following line to the file .bash\_profile~(also in your home directory):
\begin{lstlisting}[caption={Source .bashrc on startup}, label={lst:srcbashrc}]
if [ -f ~/.bashrc ]; then . ~/.bashrc; fi
\end{lstlisting}
The command \emph{geog} replaces the commands in Section \ref{sec:login}, while \emph{geogport} replaces Section \ref{sec:tunnel}.
You can therefore work with a remote Notebook by simply entering:
\begin{lstlisting}[caption={Set up and tunnel to remote Notebook server}, label={lst:usegfuncs}]
# In one terminal
geog
jupyter notebook --no-browser --port=<PORT>
# In another terminal
geogport
\end{lstlisting}
To allow you to change any variables from their default values, these functions also take positional arguments, in the order:
\texttt{geog \textless{}MACHINE\textgreater{} \textless{}GATEWAY\textgreater{} \textless{}USER\textgreater{}}
{}
\texttt{geogport \textless{}PORT\textgreater{} \textless{}MACHINE\textgreater{} \textless{}GATEWAY\textgreater{} \textless{}USER\textgreater{}}
Note that you must supply all \emph{preceding} arguments; to reuse the default value of one of them, prefix its name with \texttt{\$}. Some usage examples are provided in Listing \ref{lst:gfuncsargseg}.
\begin{lstlisting}[caption={Examples of arguments to geography login/tunnel functions}, label={lst:gfuncsargseg}]
geog ulanbator # change machine
geog $MACHINE triangle ucfaxyz # change user and gateway
geogport 8889 # change port
geogport $GPORT lima # change machine
\end{lstlisting}
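An alternative to shell functions is to record the same details in the SSH client configuration file.
The fragment in Listing \ref{lst:sshconfig} is a sketch rather than part of the functions above: it assumes OpenSSH 7.3 or later on your own computer (for \texttt{ProxyJump}), and the alias \texttt{geog-ws} is a hypothetical name chosen for illustration.
\begin{lstlisting}[caption={Sketch of an SSH client configuration (hypothetical alias)}, label={lst:sshconfig}]
# Add to ~/.ssh/config on your own computer
# "geog-ws" is a made-up alias; choose any name you like
Host geog-ws
    HostName <MACHINE>
    User <USER>
    ProxyJump <USER>@<GATEWAY>.geog.ucl.ac.uk
    LocalForward <PORT> localhost:<PORT>
\end{lstlisting}
With this in place, entering \texttt{ssh geog-ws} should both log in to the workstation and open the tunnel, so the Notebook server can be started from that same terminal.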
\section{On Windows}
\label{sec:windows}
Sections \ref{sec:wsl} and \ref{sec:putty} describe how to set up a Windows terminal emulator, into which you can enter commands adapted from Section \ref{sec:linuxmac}.
Since much of the material here refers to that section, you must read it first.
However, I strongly urge any Windows users intending to work extensively with Python and Jupyter Notebooks to consider installing a Linux distribution\ldots
\subsection{Install Linux}
\label{sec:win2lnx}
Don't be alarmed: this need not be a drastic switch as there are two simple ways to set up Linux \emph{alongside} Windows.
The first is to use a \emph{\href{https://www.howtogeek.com/196060/beginner-geek-how-to-create-and-use-virtual-machines/}{virtual machine}}, an OS that runs \emph{inside} your current system.
This is very simple to set up (see \href{http://www.storagecraft.com/blog/the-dead-simple-guide-to-installing-a-linux-virtual-machine-on-windows/}{here}) and I myself use this method to work from UCL's library computers.
The second option, and the one I use for my personal computer, is \emph{\href{https://www.howtogeek.com/187789/dual-booting-explained-how-you-can-have-multiple-operating-systems-on-your-computer/}{dual-booting}}.
Although this may seem somewhat involved, a native Linux install is always faster than a virtual one and there are plenty of guides available online to take you through the process (e.g. \href{https://itsfoss.com/guide-install-linux-mint-16-dual-boot-windows/}{1}, \href{https://www.lifewire.com/ultimate-windows-7-ubuntu-linux-dual-boot-guide-2200.53}{2}, \href{https://www.howtogeek.com/214571/how-to-dual-boot-linux-on-your-pc/}{3}, \href{http://www.pcworld.com/article/2955460/operating-systems/dual-booting-linux-with-windows-what-you-need-to-know.html}{4}).
Whichever installation option you go for, you will need to choose one of the many \emph{\href{http://distrowatch.com/dwres.php?resource=major}{distributions}} of Linux. \href{https://www.ubuntu.com/download}{Ubuntu}~and \href{https://linuxmint.com/}{Linux Mint} both appeal to new Linux users, and any support pages for one usually apply to the other.
For use on a virtual machine or an ageing laptop, I would recommend opting for a more lightweight flavour, i.e. one that features MATE, Xfce or LXDE as its desktop environment.
\subsection{Windows Subsystem for Linux}
\label{sec:wsl}
If you have Windows 10 then you'll be able to take advantage of a new Microsoft development called \emph{Windows Subsystem for Linux} (WSL), otherwise referred to as \emph{Bash on Ubuntu on Windows}, which provides a Linux command-line interface on Windows.
To use WSL to run Jupyter Notebooks, simply \href{https://msdn.microsoft.com/en-gb/commandline/wsl/install_guide}{install the feature} and use these Bash terminals to follow the instructions in Section \ref{sec:linuxmac}.
\subsection{PuTTY}
\label{sec:putty}
If you don't have access to WSL then you'll have to use PuTTY, a popular terminal emulator and SSH client, to access your Notebooks.
First install the program from its \href{http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html}{website}, then run a PuTTY instance to load its GUI configuration screen.
Navigate to the \emph{Session} screen and enter \texttt{\textless{}USER\textgreater{}@\textless{}GATEWAY\textgreater{}.geog.ucl.ac.uk} in the \emph{Host Name} box, making the appropriate replacements (described in Section \ref{sec:linuxmac}).
\footnote{Follow \href{http://www.geo.mtu.edu/geoschem/docs/putty_install.html}{this guide} if you want to be able to load X11 windows (i.e. graphical programs such as \emph{gedit} or \emph{Firefox}), although this is not necessary for using Notebooks.}
To store these gateway login settings, enter a name under \emph{Saved Sessions} and click \emph{Save} (Figure \ref{fig:putty1}).
To set up the SSH tunnel, ensure that the settings from the previous session are loaded, then navigate to the port forwarding options via \emph{Connection} \textgreater{} \emph{SSH} \textgreater{} \emph{Tunnels} (Figure \ref{fig:putty2}).
Under \emph{Source Port} supply \texttt{\textless{}PORT\textgreater{}}, and as the \emph{Destination} type \texttt{127.0.0.1:\textless{}PORT\textgreater{}}, substituting \textless{}PORT\textgreater{} with a valid port number as described in Section \ref{sec:runnb}.
Click the \emph{Add} button to commit this information to the list of \emph{Forwarded ports}.
Finally, to save these settings, navigate back to \emph{Session} and enter a \emph{new} name under \emph{Saved Sessions} before clicking \emph{Save} (Figure \ref{fig:putty3}).
To work with Jupyter Notebook on the Geography Linux Cluster, open each of these session profiles in individual PuTTY windows.
The first session you created (\emph{arch} in Figure \ref{fig:putty1}) replaces the command in Listing \ref{lst:logingw}, while the second (\emph{archport} in Figure \ref{fig:putty3}) replaces Listing \ref{lst:tunnel1}.
Therefore, to remotely work with a Jupyter Notebook, simply substitute these while following the instructions in Section \ref{sec:manual}.
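Saved sessions can also be opened without going through the configuration screen.
As a sketch, assuming that \texttt{putty} can be found from the Windows command prompt (for example, because its install folder is on your \texttt{PATH}) and that your sessions are named as in the figures, Listing \ref{lst:puttyload} uses PuTTY's \texttt{-load} option to open the login session and then the tunnel session, each in its own window.
\begin{lstlisting}[caption={Open saved PuTTY sessions from the command prompt (session names as in the figures)}, label={lst:puttyload}]
putty -load arch
putty -load archport
\end{lstlisting}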
\begin{figure}[p]
\centering
\includegraphics[width=\textwidth]{figures/putty1_save_login_session.png}
\caption{Save session to login to gateway}
\label{fig:putty1}
\end{figure}
\begin{figure}[p]
\centering
\includegraphics[width=\textwidth]{figures/putty2_setup_tunnel.png}
\caption{Set up port forwarding to gateway}
\label{fig:putty2}
\end{figure}
\begin{figure}[p]
\centering
\includegraphics[width=\textwidth]{figures/putty3_save_tunnel_session.png}
\caption{Save session to tunnel to gateway}
\label{fig:putty3}
\end{figure}
\section{Troubleshooting}
\label{sec:trouble}
Occasionally, when trying to set up the SSH tunnel, you will get a message like:
\begin{quote}
\begin{ttt}
bind: Address already in use
channel\_setup\_fwd\_listener: cannot listen to port: \textless{}PORT\textgreater{}
Could not request local forwarding
\end{ttt}
\end{quote}
This means that the tunnel could not be established as another process is using that particular port.
If this occurs, start everything over again with a different port number.
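Before choosing a new port, it can help to see what is occupying the old one; a common culprit is a tunnel left open from an earlier session.
As a sketch, assuming that the \texttt{lsof} utility is installed, run the command in Listing \ref{lst:lsof} on the machine where the error appeared.
\begin{lstlisting}[caption={Find the process using a port (assumes lsof is installed)}, label={lst:lsof}]
lsof -i :<PORT> # list processes using this port
\end{lstlisting}
If the output shows a leftover \texttt{ssh} process of your own, closing that old tunnel should free the port.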
\newpage
\lstinputlisting[float, floatplacement=p!, caption={Functions to login/tunnel to geography cluster}, label={lst:gfuncs}, firstline=2, lastline=21, breaklines=true]{geog_functions.sh}
\end{document}