Jupyter on remote servers

This guide describes how to set up Python 2 and Python 3 environments and a Jupyter kernel on a remote server. Please reach out to me with any suggestions or issues! My email is codycook@stanford.edu.

I have run code almost exclusively on a Jupyter kernel set up on a remote server for nearly four years, both while working at Uber and now at Stanford. The benefits over running anything locally are substantial: instead of overwhelming my laptop, everything is done remotely. With large datasets, running code locally is often impossible because of memory constraints.

First, we need a good Python environment manager

Managing environments with pyenv

Follow the instructions here to install pyenv, pyenv-virtualenv, and pyenv-virtualenvwrapper

The following commands should all be run from the remote server. Importantly, you do not need root access for any of them. There is a chance, however, that a package will require a system library that you cannot install yourself, in which case you will need to ask someone with root access (e.g., your university IT department).

Set up directories and add code to your .bashrc for starting pyenv on launch

# Change directory names if desired
mkdir ~/.ve
mkdir ~/workspace
echo 'export WORKON_HOME=~/.ve' >> ~/.bashrc
echo 'export PROJECT_HOME=~/workspace' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
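
One caveat with the echo lines above: running them a second time appends duplicate entries to .bashrc. A minimal sketch of a guard, assuming a POSIX shell with grep available; append_once is a hypothetical helper name, not part of pyenv:

```shell
# append_once LINE FILE: add LINE to FILE only if that exact line is not there yet.
# (append_once is an illustrative helper, not a pyenv command.)
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage, mirroring the echo commands above:
#   append_once 'export WORKON_HOME=~/.ve' ~/.bashrc
#   append_once 'export PROJECT_HOME=~/workspace' ~/.bashrc
#   append_once 'eval "$(pyenv init -)"' ~/.bashrc
```

With this in place, the setup can be re-run safely, for example on a second server where part of it was already done.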

Install a Python 2 and a Python 3 version. The versions below are examples; run pyenv install --list to see which releases are available

pyenv install 3.7.4
pyenv install 2.7.13

Set up two virtual environments, one for Python 2 and one for Python 3

pyenv virtualenv 3.7.4 jupyter3
pyenv virtualenv 2.7.13 ipython2

Now we will install packages in each of the two virtualenvs. This keeps our Python 2 and Python 3 environments separate. Soon, I'll show you how to link them so that both appear as kernels when you launch Jupyter from the jupyter3 environment.

I use a text document requirements.txt to track all packages that I want installed automatically. My current one is available here.

We'll also want to expand Jupyter's capabilities with extensions from nbextensions

Starting with the jupyter3 environment...

pyenv activate jupyter3
pip install --upgrade pip
pip install jupyter
python -m ipykernel install --user
pip install -r requirements.txt # make sure this is in the same folder!
# jupyter extensions
pip install https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tarball/master
jupyter contrib nbextension install --user
pip install jupyter_nbextensions_configurator
jupyter nbextensions_configurator enable --user
pyenv deactivate

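For the ipython2 environment, we install the appropriate versions of the packages and register a Python 2 kernel so that it shows up in Jupyter. Everything else Jupyter-related will be run from the jupyter3 environment.

# install python 2 packages
pyenv activate ipython2
pip install --upgrade pip
pip install ipykernel # skip if requirements.txt already includes it
python -m ipykernel install --user # register the Python 2 kernel
pip install -r requirements.txt
pyenv deactivate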

As you work and decide you need additional packages, be sure to install them in the appropriate environments.

Now we need to make our two environments play nicely with each other. This establishes the PATH priority of the environments.

pyenv global 3.7.4 2.7.13 jupyter3 ipython2
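
Once both environments are set up, one way to confirm that Jupyter can see both kernels is jupyter kernelspec list. The sketch below assumes the default kernel names python2 and python3 that ipykernel registers; has_kernel is a hypothetical helper, not a Jupyter command:

```shell
# has_kernel NAME: succeed if NAME is the first field of any line on stdin.
# Intended to read the output of `jupyter kernelspec list`.
# (has_kernel is an illustrative helper, not part of Jupyter.)
has_kernel() {
    awk -v k="$1" '$1 == k { found = 1 } END { exit !found }'
}

# Usage (on the server, with the jupyter3 environment active):
#   jupyter kernelspec list | has_kernel python3 && echo "Python 3 kernel registered"
#   jupyter kernelspec list | has_kernel python2 && echo "Python 2 kernel registered"
```

If one of the kernels is missing, re-running python -m ipykernel install --user inside the corresponding environment is the first thing to try.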

Finally, we want to ensure the virtualenv wrapper starts the moment you log in to the server

echo 'pyenv virtualenvwrapper_lazy' >> ~/.bashrc
exec "$SHELL" # restart the shell to confirm everything worked

Launching Jupyter and keeping the lights on with tmux

We can now launch Jupyter! Make sure to activate the jupyter3 environment first. Since we're running this on a remote server—with no UI or browser—we need to specify that we don't want Jupyter to try to open a browser link.

pyenv activate jupyter3
jupyter notebook --no-browser

One issue: if we log out now, Jupyter will be shut down along with our SSH session. If we want to leave code running overnight, for example, we'd have to keep our local computer on and connected. This would defeat one of the main benefits of using a remote server.

Never fear, there's a solution. I use tmux; others use an alternative called screen. Tmux allows you to open a 'window' on a remote server that will persist once you log out. We want to run Jupyter in that window, so that it keeps running and we can always access it, even without logging back in to the server. I recommend this guide for installing and understanding tmux.

Once you have tmux installed, we make the following small change to the above code:

tmux
pyenv activate jupyter3
jupyter notebook --no-browser

Now we can log out of the server, but our Jupyter notebook will keep running within the tmux window.
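
If you end up with several tmux sessions, a named one is easier to find later. A minimal sketch, assuming tmux is installed and pyenv is initialized in the shells tmux starts; the session name jupyter and the helper start_jupyter_session are hypothetical choices:

```shell
# start_jupyter_session [NAME]: start Jupyter inside a detached, named tmux session.
# (start_jupyter_session and the default name "jupyter" are illustrative.)
start_jupyter_session() {
    session=${1:-jupyter}
    # Do nothing if a session with this name already exists
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "tmux session '$session' already exists"
        return 0
    fi
    # -d: start detached, -s: session name
    tmux new-session -d -s "$session"
    # Type the same two commands as above into the session; C-m is Enter
    tmux send-keys -t "$session" 'pyenv activate jupyter3' C-m
    tmux send-keys -t "$session" 'jupyter notebook --no-browser' C-m
}
```

After start_jupyter_session, running tmux attach -t jupyter shows the Jupyter output, and Ctrl-b followed by d detaches again without stopping it.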

When Jupyter starts, pay attention to two things. First, the port it's hosted on; it will likely say localhost:8888. Second, the access token, which is a long string of letters and numbers in the URL it prints. Copy this somewhere. If you forget either of these, no worries: you can open this tmux window again in the future and scroll up to find them.
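
Another way to recover the port and token, instead of scrolling back in tmux, is jupyter notebook list, which prints the URL of each running server. The sketch below assumes the classic notebook server and output lines of the form http://localhost:PORT/?token=TOKEN :: DIRECTORY; parse_servers is a hypothetical helper that pulls out the port and token:

```shell
# parse_servers: read `jupyter notebook list` output on stdin and print
# "PORT TOKEN" for each server URL found. Lines without a URL are ignored.
# (parse_servers is an illustrative helper, not part of Jupyter.)
parse_servers() {
    sed -n 's|^https\{0,1\}://[^:/]*:\([0-9][0-9]*\)/.*[?&]token=\([0-9A-Za-z]*\).*$|\1 \2|p'
}

# Usage (on the server, with the jupyter3 environment active):
#   jupyter notebook list | parse_servers
```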

To check on the kernel in the future, we can just log into the server and attach that tmux window:

tmux ls # list open tmux sessions
tmux a -t 0 # attach to session 0

Accessing Jupyter locally

Now that we've set up a Jupyter notebook on a remote server, which will persist thanks to running it in tmux, we want to be able to access it in our local browser.

To do so, we'll use tunnels to create a, well, tunnel, from our local computer to the server. Tunnels are kind of a pain to manage. But fortunately, there's a nice solution: the SSH Tunnel Manager. Install it from that link.

Open the Tunnel Manager and click the gear icon to set up a new tunnel. Set up the tunnel to look as follows:

The remote port should be the port you saw when you started Jupyter. The local port is the one you will use to access it from your browser; the only requirement is that it not already be in use. The default is usually 8888, but if you are running multiple notebooks on the server then you will need to switch to 8889 and so on. For example, in my setup I use 8889 as the local port because 8888 is already in use.

Whenever you want to access your remote Jupyter notebook, you'll first just open the tunnel by clicking start. It will prompt for your password and, if necessary, 2-factor authentication.
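
If you prefer the command line, the same tunnel can be sketched with plain ssh port forwarding, run on your local computer. The host user@remote.example.edu is a placeholder, and open_tunnel with its DRY_RUN switch is a hypothetical wrapper for illustration:

```shell
# open_tunnel LOCAL_PORT REMOTE_PORT HOST: forward LOCAL_PORT on this machine
# to REMOTE_PORT on the server. Set DRY_RUN=1 to print the command instead.
# (open_tunnel and DRY_RUN are illustrative, not part of ssh.)
open_tunnel() {
    local_port=$1
    remote_port=$2
    host=$3
    # -N: do not run a remote command, just forward ports
    # -L: local_port on this machine -> localhost:remote_port as seen from the server
    set -- ssh -N -L "${local_port}:localhost:${remote_port}" "$host"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: Jupyter on remote port 8888, reached locally at localhost:8889
#   open_tunnel 8889 8888 user@remote.example.edu
```

The command keeps running while the tunnel is open; Ctrl-c closes it.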

In your local browser, go to localhost:8888 (or whichever port you used for the local connection). You should see a toolbar that looks like this:

If you click on 'new' on the right, you'll be able to start a new notebook in either Python 2 or Python 3.

Bringing it all together

Anytime you want to work:

  1. Make sure you have a Jupyter notebook running remotely. If not, set it up per above.
  2. Open tunnel manager and start the relevant tunnel
  3. In your local browser, go to localhost:8888 (or whichever port you used for the local connection)
  4. Create some sweet analyses!

A few other tips:

  • If you need to install additional packages, make sure to activate the jupyter3 or ipython2 environment first
  • Be sure to check out the nbextensions tab! You can access it at localhost:8888/nbextensions
  • If you want to use JupyterLab, just install it while setting up the jupyter3 environment. It will then be available at localhost:8888/lab (or whichever local port you chose)
  • Use the command htop to check on resource usage. If you're using a shared server, be a good citizen and don't overwhelm its resources!
  • The text editor vim is painful to learn, but incredibly powerful once you invest the time. Perhaps that will be my next guide.
  • Use git to manage your code so that it's easy to use your code on many servers (and locally). And back up your data, especially if you don't manage the server you're using: don't trust that your code and data won't be deleted!
  • To access files created remotely, either use scp or, for an easy UI way, CyberDuck.
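
For the scp route in the last tip, a minimal sketch run from your local computer. The host and paths are placeholders, and fetch_from_server is a hypothetical wrapper name:

```shell
# fetch_from_server HOST REMOTE_PATH [LOCAL_DIR]: copy a file or directory
# from the server into LOCAL_DIR (default: the current directory).
# (fetch_from_server is an illustrative helper, not part of scp.)
fetch_from_server() {
    host=$1
    remote_path=$2
    local_dir=${3:-.}
    # -r: copy directories recursively, -p: preserve modification times
    scp -r -p "${host}:${remote_path}" "$local_dir"
}

# Usage:
#   fetch_from_server user@remote.example.edu '~/workspace/results.csv' .
```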

References

My setup didn't evolve in a vacuum and I'm thankful to many guide-writers. Here are a few of my favorites