Setting up JupyterHub

Albert Huang edited this page Oct 31, 2016 · 2 revisions

Introduction

Have a spare server lying around? You can set it up to host a JupyterHub, a multi-user IPython notebook service!

This tutorial will guide you through setting up a rudimentary JupyterHub server. In particular, it will be set up with GitHub logins to allow easy sign-in/registration on the server.

(The traditional way is to manually create the accounts on the server, but that might be too painful to do...)

Requirements

This tutorial is designed for Debian 8+ users, but it should work for pretty much any user that has NGINX 1.3+ installed.

If you are using Ubuntu, Mint, or another Debian derivative, the commands below should work as-is.

JupyterHub needs to run as root in this setup. It's not the best in terms of security, but for a "rudimentary" server, it works for now. Each user's notebook server runs under that user's own system account, so it's somewhat better. That said, we strongly recommend that you limit the list of people using your installation to those that you trust. You may opt to add a whitelist - see the optional part of the JupyterHub configuration below!

Finally, you should know your hostname/IP that you would like to use for your JupyterHub. This will be important, so make sure it is already set up before continuing!

Instructions

For all of these commands, leave out sudo if you are already running as root!

  1. System Preparation

    Make sure your packages are up to date:

    sudo apt-get update
    sudo apt-get upgrade
    

    Then install the "build essentials", aka compilers and tools:

    sudo apt-get install build-essential
    

    (If you're not using a Debian-based distribution, install the equivalent package(s) here.)

  2. Installing NGINX

    Simply run this command to install:

    sudo apt-get install nginx
    

    (Again, if you are not using a Debian-based distribution, install the equivalent package(s) here.)

    Verify that you have NGINX 1.3 or newer:

    nginx -V
    

    (On Debian 8, this returns nginx version: nginx/1.6.2, which works!)
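If you'd rather check the version from a script than by eye, here is a minimal sketch in Python. It assumes you have already captured the banner text (nginx prints it to standard error), and the helper names parse_nginx_version and meets_minimum are made up for this example.

```python
import re


def parse_nginx_version(banner):
    """Extract (major, minor, patch) from a banner such as 'nginx version: nginx/1.6.2'."""
    match = re.search(r"nginx/(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no nginx version found in: %r" % banner)
    return tuple(int(part) for part in match.groups())


def meets_minimum(banner, minimum=(1, 3, 0)):
    """True when the version in the banner is at least `minimum`."""
    # Tuples compare element by element, so (1, 6, 2) >= (1, 3, 0) holds.
    return parse_nginx_version(banner) >= minimum


# The Debian 8 banner quoted above.
print(meets_minimum("nginx version: nginx/1.6.2"))
```

Comparing tuples of integers avoids the classic string-comparison trap where "1.10" sorts before "1.3".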

  3. Installing Miniconda

    Miniconda is an awesome, pre-built distribution of Python. It is a pain-free way of quickly getting Python installed on your system. And of course, it comes with everything you need to build your own Python modules, too - which will come in handy in the next step!

    To install Miniconda, run the following commands:

    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    chmod +x Miniconda3-latest-Linux-x86_64.sh 
    ./Miniconda3-latest-Linux-x86_64.sh
    exec $SHELL
    

    During the installation, make sure to:

    • Install Miniconda to /opt/conda (system-wide installation)
    • Add /opt/conda/bin to your PATH (the installer will offer to do so - say yes!)

    Once you're done installing and running the above commands, check that everything is in place:

    python -V
    

    You should see something similar to this (ignoring the version):

    Python 3.5.2 :: Continuum Analytics, Inc.
    

    If you see this, you're all set! (If not, just log out and log back in to see the changes.)

  4. Installing JupyterHub (and dependencies)

    Now you can install JupyterHub:

    pip install jupyter jupyterhub oauthenticator
    

    We also need to install Node + NPM for a proxy package.

    Install ndenv:

    git clone https://github.com/riywo/ndenv ~/.ndenv
    echo 'export PATH="$HOME/.ndenv/bin:$PATH"' >> ~/.bash_profile
    echo 'eval "$(ndenv init -)"' >> ~/.bash_profile
    exec $SHELL -l
    git clone https://github.com/nodenv/node-build.git $(ndenv root)/plugins/node-build
    eval "$(ndenv init -)"
    

    Then install Node 7.0.0 (latest at the time of writing):

    ndenv install 7.0.0
    ndenv global 7.0.0
    

    Finally, install configurable-http-proxy, a node dependency for JupyterHub:

    npm install -g configurable-http-proxy
    
  5. Configuring JupyterHub

    This is the toughest part of the installation... but it's not too bad! (Just remember that at the end, you won't have to touch this configuration ever again!)

    a. Directory Setup

    First and foremost, you need a directory to store everything in. Let's set up everything in /jupyter:

    mkdir /jupyter
    cd /jupyter
    

    b. Initial Configuration

    Now generate a template configuration file:

    jupyterhub --generate-config
    

    Inside the file, you'll be able to see what the default configuration options are for your JupyterHub installation.

    c. GitHub Setup

    Go to https://github.com/settings/developers.

    On the top-right, click on Register a new application.

    Fill out all of the information. For the homepage and callback URL, use https://HOST_OR_IP_HERE/ and https://HOST_OR_IP_HERE/hub/oauth_callback, respectively.

    Once you're done, click Register application.
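The callback URL you enter on GitHub has to match the one you put in the JupyterHub configuration later, so it can help to generate both URLs from a single hostname. A small sketch, where the function name github_oauth_urls and the host jupyter.example.org are placeholders invented for illustration:

```python
def github_oauth_urls(host):
    """Return (homepage_url, callback_url) for the given hostname or IP."""
    # Strip stray whitespace and a trailing slash so we never emit "//".
    base = "https://%s" % host.strip().rstrip("/")
    return base + "/", base + "/hub/oauth_callback"


homepage, callback = github_oauth_urls("jupyter.example.org")
print(homepage)
print(callback)
```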

    d. JupyterHub Configuration

    Now open jupyterhub_config.py (the config you just made), and add/update the following lines:

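    # New users should automatically be created!
    # Note that they can't log into our server directly:
    # they get no login shell and no password.
    # (The gecos value is an empty string; the command is run
    # without a shell, so '""' would be stored literally.)
    c.LocalAuthenticator.add_user_cmd = ['adduser', '-q', '--gecos', '', '--home', '/jupyter/USERNAME', '--shell', '/usr/sbin/nologin', '--disabled-password']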
    c.LocalAuthenticator.create_system_users = True
    
    # We disable SSL here because we can enforce this
    # easily elsewhere
    c.JupyterHub.confirm_no_ssl = True
    
    # And even better, there's no worries about that since
    # we're only allowing connections through NGINX's proxy
    c.JupyterHub.ip = '127.0.0.1'
    
    # Set the authentication to use GitHub's authenticator
    # We use LocalGitHubOAuthenticator to take advantage of the
    # automatic user creation.
    c.JupyterHub.authenticator_class = 'oauthenticator.LocalGitHubOAuthenticator'  
    
    # Configuration bits for the GitHub authenticator
    # Insert your information here!
    c.LocalGitHubOAuthenticator.oauth_callback_url = "https://HOST_OR_IP_HERE/hub/oauth_callback"
    c.LocalGitHubOAuthenticator.client_id = "YOUR_CLIENT_ID"
    c.LocalGitHubOAuthenticator.client_secret = "YOUR_CLIENT_SECRET"
    
    # Which GitHub user is an admin? Specify that here!
    c.Authenticator.admin_users = { 'MY_GITHUB_USERNAME' }
    
    # OPTIONAL: Whitelist of which GitHub users can actually
    # use this service. Users not listed here will not
    # be allowed to access. This is much more secure, and
    # is highly recommended if your installation is public.
    c.Authenticator.whitelist = { 'MY_GITHUB_USERNAME', 'GOOD_GITHUB_USERNAME_1' }

    Replace the placeholder text with your actual information. The client ID and secret should be on the GitHub application configuration page.
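It's easy to miss one of the placeholders. The sketch below scans configuration text for the placeholder tokens used in this tutorial and reports where they still appear. The function name find_placeholders and the user alice are invented for this example. USERNAME is deliberately left out of the list, on the assumption that JupyterHub substitutes it itself when it creates a user.

```python
# Placeholder tokens used in the configuration shown in this tutorial.
PLACEHOLDERS = (
    "HOST_OR_IP_HERE",
    "YOUR_CLIENT_ID",
    "YOUR_CLIENT_SECRET",
    "MY_GITHUB_USERNAME",
    "GOOD_GITHUB_USERNAME_1",
)


def find_placeholders(config_text):
    """Return (line_number, token) pairs for placeholders on non-comment text."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        # Crude comment stripping: does not handle '#' inside string literals.
        code = line.split("#", 1)[0]
        for token in PLACEHOLDERS:
            if token in code:
                hits.append((number, token))
    return hits


sample = '''c.LocalGitHubOAuthenticator.client_id = "YOUR_CLIENT_ID"
c.Authenticator.admin_users = { 'alice' }
'''
print(find_placeholders(sample))
```

To check a real file, read it with open("jupyterhub_config.py").read() and pass the result in; an empty list means none of the listed tokens remain outside comments.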

    e. Nginx Configuration

    Now open /etc/nginx/sites-enabled/default, and change the main block to:

    server {
        listen 80 default_server;
        listen [::]:80 default_server;
    
        # SSL configuration
        #
        # listen 443 ssl default_server;
        # listen [::]:443 ssl default_server;
        #
        # Self signed certs generated by the ssl-cert package
        # Don't use them in a production server!
        #
        # include snippets/snakeoil.conf;
    
        root /var/www/html;
    
        # Add index.php to the list if you are using PHP
        index index.html index.htm index.nginx-debian.html;
    
        server_name _;
    
        location / {
            proxy_pass http://127.0.0.1:8000;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;  
        }
    
        # Managing WebSocket requests between hub user servers and external proxy
        location ~* /(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/? {
            proxy_pass http://127.0.0.1:8000;
    
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            # WebSocket support   
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }

    Once you're done, reload the server configuration:

    sudo systemctl reload nginx
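To see which requests end up in the WebSocket block, you can try the location pattern against a few paths. This is only an approximation: nginx's "~*" modifier is a case-insensitive, unanchored regex match on the URI path (without the query string), which is imitated here with Python's re.search. The function name and the sample paths are made up for illustration.

```python
import re

# The pattern from the second location block above.
WEBSOCKET_LOCATION = re.compile(
    r"/(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/?",
    re.IGNORECASE,
)


def uses_websocket_block(path):
    """True when the path would match the WebSocket location pattern."""
    return WEBSOCKET_LOCATION.search(path) is not None


for path in (
    "/user/alice/api/kernels/1234-abcd/channels",
    "/user/alice/terminals/websocket/1",
    "/user/alice/api/sessions",
):
    print(path, uses_websocket_block(path))
```

The first two paths match and get the Upgrade/Connection headers; the last one falls through to the plain "location /" block.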
    

    f. SSL Configuration

    To enable SSL on your server, you can do it in two ways:

    Both options are free!

    For Let's Encrypt, follow this guide to get this to work!

  6. Running JupyterHub

    Take a deep breath. Then, when you're ready, run:

    jupyterhub
    

    If all goes well, you should now be able to run notebook instances on your server!

Troubleshooting

ndenv not working

Symptoms

Despite following all of the instructions for installing Node, you still can't use ndenv! You can run the init command manually, but it isn't loaded automatically in new shells!

Solution

It's very likely that your ~/.bash_profile isn't being loaded in the first place! Some distributions do not source this file when your shell is loaded.

To test, add a temporary echo line to your ~/.bash_profile:

echo "Hello, world"

Save, and restart your shell. If you don't see the Hello, world message, that means that your ~/.bash_profile isn't being loaded at all!

Remove that echo line, and then try this:

echo ". ~/.bash_profile" >> ~/.bashrc

This makes your bash initialization script source ~/.bash_profile. If you restart your shell now, loading should hopefully work! (Skip this if your ~/.bash_profile already sources ~/.bashrc: the two files would then source each other in an endless loop.)

(This issue also applies to those trying to get pyenv to work!)
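Before appending that line, it can be worth checking what each file already sources. Here is a rough sketch in Python that looks for ". FILE" or "source FILE" lines pointing into your home directory. The function name sources and the sample file contents are invented for this example, and the pattern only covers the common spellings (~, $HOME, ${HOME}).

```python
import re


def sources(script_text, filename):
    """True when a non-comment line sources `filename` via '.' or 'source'."""
    pattern = re.compile(
        r"(^|[;&|\s])(\.|source)\s+[\"']?(~|\$HOME|\$\{HOME\})/"
        + re.escape(filename)
        + r"[\"']?(\s|;|$)"
    )
    for line in script_text.splitlines():
        # Crude comment stripping: does not handle '#' inside quotes.
        code = line.split("#", 1)[0]
        if pattern.search(code):
            return True
    return False


bashrc = "alias ll='ls -l'\n. ~/.bash_profile\n"
bash_profile = 'eval "$(ndenv init -)"\n'

print(sources(bashrc, ".bash_profile"))  # does ~/.bashrc load the profile?
print(sources(bash_profile, ".bashrc"))  # does the profile load ~/.bashrc back?
```

If both checks come back true for your real files, the two scripts source each other, and one of the two lines should go.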

Proxy errors

This error was probably the easiest (and the most annoying) to fix!

Symptoms

When you launch your Python notebook, it loads the kernel, but gets stuck connecting to it. It stays stuck until it finally gives up and times out. It might retry the connection until it hits a limit, and then stop with an error message.

While this is happening, if you turn on proxy debugging (e.g. by adding c.JupyterHub.debug_proxy = True to your jupyterhub_config.py), you will likely see an error in the console that looks like this:

HH:NN:SS.mmm - error: [ConfigProxy] Proxy error:  Error: socket hang up
    at createHangUpError (_http_client.js:253:15)
    at Socket.socketCloseListener (_http_client.js:285:23)
    at emitOne (events.js:101:20)
    at Socket.emit (events.js:188:7)
    at TCP._handle.close [as _onclose] (net.js:501:12)

Everything works (including user creation!), but it's not able to connect to the kernel, despite it starting successfully.

Solution

When I was debugging this, I tried going to the port directly, e.g. http://my.ip.goes.here:8000/, and logging in. As it turns out, it worked!

12:30:09.309 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/sessions to http://127.0.0.1:54427
12:30:09.309 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/contents/Untitled.ipynb/checkpoints?_=SOME_TS to http://127.0.0.1:54427
12:30:09.322 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/kernelspecs/python3/logo-64x64.png to http://127.0.0.1:54427
12:30:09.439 - debug: [ConfigProxy] PROXY WS /user/alberthdev/api/kernels/kernelss-guid-guid-guid-morekernelss/channels?session_id=SESH_ID_GOES_HERE to http://127.0.0.1:54427

Then I went back to the regular host that's using the nginx proxy, and tried again. Of course, it errored out:

12:31:26.158 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/kernels/kernelss-guid-guid-guid-morekernelss/channels?session_id=SESH_ID_GOES_HEREF4C95C to http://127.0.0.1:54427
[W 2016-10-30 12:31:26.164 alberthdev handlers:255] Replacing stale connection: kernelss-guid-guid-guid-morekernelss:SESH_ID_GOES_HEREF4C95C
12:32:05.356 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/notebooks/Untitled.ipynb to http://127.0.0.1:54427
12:32:05.386 - error: [ConfigProxy] Proxy error:  Error: socket hang up
    at createHangUpError (_http_client.js:253:15)
    at Socket.socketCloseListener (_http_client.js:285:23)
    at emitOne (events.js:101:20)
    at Socket.emit (events.js:188:7)
    at TCP._handle.close [as _onclose] (net.js:501:12)
12:32:05.386 - error: [ConfigProxy] 503 GET /user/alberthdev/api/kernels/kernelss-guid-guid-guid-morekernelss/channels?session_id=SESH_ID_GOES_HEREF4C95C
[I 2016-10-30 12:32:05.391 JupyterHub log:100] 200 GET /hub/error/503?url=%2Fuser%2Falberthdev%2Fapi%2Fkernels%2Fkernelss-guid-guid-guid-morekernelss%2Fchannels%3Fsession_id%3DSESH_ID_GOES_HERE (@127.0.0.1) 1.43ms

Then I realized something: the successful connection mentioned PROXY WS, but the failed one mentioned PROXY WEB. Aha! This meant that NGINX was forwarding the request, but not as a WebSocket connection.

What was I doing wrong? Well, I looked at my NGINX configuration, and saw this:

proxy_set_header Connection "update";

I'm not updating the connection - I'm upgrading it from HTTP to WebSocket! The correct line should be:

proxy_set_header Connection "upgrade";

Once that was fixed, I reloaded NGINX's configuration, and voilà - it worked!

12:32:06.459 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/sessions to http://127.0.0.1:54427
12:32:06.468 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/contents/Untitled.ipynb/checkpoints?_=SOME_TS to http://127.0.0.1:54427
12:32:06.666 - debug: [ConfigProxy] PROXY WS /user/alberthdev/api/kernels/kernelss-guid-guid-guid-morekernelss/channels?session_id=SESH_ID_GOES_HERE to http://127.0.0.1:54427
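A typo like this is hard to spot by eye, so here is a small sketch of a lint pass over NGINX configuration text. It reports any literal value passed to the Connection header that isn't in a short list of expected values. That list, the function name, and the sample snippet are assumptions made for this example; NGINX itself accepts any string there.

```python
import re

# Literal values that make sense for the Connection header. This set is an
# assumption of this sketch, not something nginx enforces.
KNOWN_VALUES = {"upgrade", "close", "keep-alive", ""}

CONNECTION_HEADER = re.compile(
    r"^\s*proxy_set_header\s+Connection\s+(\S+?)\s*;", re.IGNORECASE
)


def suspicious_connection_headers(nginx_config):
    """Return (line_number, value) pairs for unexpected Connection values."""
    hits = []
    for number, line in enumerate(nginx_config.splitlines(), start=1):
        match = CONNECTION_HEADER.match(line)
        if match is None:
            continue
        value = match.group(1)
        if value.startswith("$"):
            continue  # variables (e.g. set by a map block) are not checked
        value = value.strip("\"'")
        if value.lower() not in KNOWN_VALUES:
            hits.append((number, value))
    return hits


broken = '''location /ws {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "update";
}'''
print(suspicious_connection_headers(broken))
```

Run against the broken snippet, it flags line 4 with the value update; the corrected "upgrade" line produces no hits.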

Two takeaways from this issue:

  • Check your debug log to make sure that your connection is being proxied correctly.
  • When in doubt, check... and recheck your configuration!
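The first takeaway can be partly automated. The sketch below scans proxy debug output for kernel channel requests that were proxied as PROXY WEB instead of PROXY WS, which is the symptom described above. The line format is taken from the log excerpts in this section (with KERNEL_ID standing in for the kernel GUID); the function name is made up, and other versions of configurable-http-proxy may log differently.

```python
import re

PROXY_LINE = re.compile(r"\[ConfigProxy\] PROXY (WS|WEB) (\S+)")
KERNEL_CHANNEL = re.compile(r"/api/kernels/[^/]+/channels/?$")


def misrouted_kernel_requests(log_text):
    """Return paths of kernel channel requests that were proxied as plain HTTP."""
    misrouted = []
    for line in log_text.splitlines():
        match = PROXY_LINE.search(line)
        if match is None:
            continue
        kind, target = match.groups()
        path = target.split("?", 1)[0]  # drop the query string
        if kind == "WEB" and KERNEL_CHANNEL.search(path):
            misrouted.append(path)
    return misrouted


log = """\
12:30:09.309 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/sessions to http://127.0.0.1:54427
12:30:09.439 - debug: [ConfigProxy] PROXY WS /user/alberthdev/api/kernels/KERNEL_ID/channels?session_id=SESH_ID_GOES_HERE to http://127.0.0.1:54427
12:31:26.158 - debug: [ConfigProxy] PROXY WEB /user/alberthdev/api/kernels/KERNEL_ID/channels?session_id=SESH_ID_GOES_HERE to http://127.0.0.1:54427
"""
print(misrouted_kernel_requests(log))
```

Only the third line is reported: the first is an ordinary HTTP request, and the second was correctly upgraded to a WebSocket.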