Install Jupyter notebook with Livy for Spark on Cloudera Hadoop

Environment

  • Cloudera CDH 5.12.x running Livy and Spark (see the other post on this website for installing Livy)
  • Anaconda parcel installed using Cloudera Manager (see the other post on this website for installing the Anaconda parcel on CDH)
  • Non-Kerberos cluster. A Kerberos-based Hadoop cluster needs a different setup, and these instructions won't work for it.

We will first install Anaconda on Windows 10 to get Jupyter Notebook, and then add Sparkmagic.

1. We strongly recommend installing Python and Jupyter using the Anaconda Distribution, which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

2. Download the Anaconda installer from https://www.anaconda.com/download/. At the time of writing this was the Anaconda 5.0.1 Windows installer, Python 3.6 version, 64-bit.

3. After the package installs successfully, open Anaconda Navigator from the Windows Start menu and click the JupyterLab Launch button.

4. In a JupyterLab notebook, try to load sparkmagic using:

%load_ext sparkmagic.magics

This fails with a module-not-found error, because sparkmagic is not installed yet.

5. We need to install sparkmagic. Run the following command in the notebook:

!pip install sparkmagic

The output reports a successful install of sparkmagic-0.12.5 along with a few other packages.

Note: If you get an error under Linux that says:

Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/opt/anaconda3/lib/python3.6/site-packages/pbr'. Consider using the `--user` option or check the permissions.

then you may need to run pip install sparkmagic as root in a Linux terminal. If you also see the error "distributed 1.21.8 requires msgpack, which is not installed.", run as root:

# pip install msgpack

Next run in the notebook:

!pip show sparkmagic

Name: sparkmagic
Version: 0.12.5
Summary: SparkMagic: Spark execution via Livy
Home-page: https://github.com/jupyter-incubator/sparkmagic
Author: Jupyter Development Team
Author-email: jupyter@googlegroups.org
License: BSD 3-clause
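
The same check can also be scripted instead of reading the pip output by eye. The following is a minimal sketch, assuming Python 3.8 or later, where `importlib.metadata` is part of the standard library (the Anaconda release used in this post shipped Python 3.6, which does not have it). The helper name `installed_version` is hypothetical and is not part of sparkmagic.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("sparkmagic")
    if version is None:
        print("sparkmagic is not installed; run: pip install sparkmagic")
    else:
        print("sparkmagic", version, "is installed")
```

Returning None rather than raising keeps the check usable in a notebook cell before deciding whether to run the pip install step.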

NEXTSTEP:

Copy example_config.json from https://github.com/jupyter-incubator/sparkmagic/blob/master/sparkmagic/example_config.json to ~/.sparkmagic/config.json (on Windows this is usually C:\Users\youruserid\.sparkmagic\config.json). Change the username, password, and url of the Livy server:

{
  "kernel_python_credentials" : {
    "username": "youruserid",
    "password": "xxxxx",
    "url": "http://livyhostname:8998",
    "auth": "None"
  },
  "kernel_scala_credentials" : {
    "username": "youruserid",
    "password": "xxxxx",
    "url": "http://livyhostname:8998",
    "auth": "None"
  },
  "kernel_r_credentials": {
    "username": "youruserid",
    "password": "xxxxx",
    "url": "http://livyhostname:8998"
  },
  ... (leave the remaining settings from example_config.json unchanged)
}
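
If you would rather generate the credential sections than edit them by hand, something like the following could work. It is a minimal sketch: `build_config` and `write_config` are hypothetical helper names, and the code produces only the three credential sections shown above, so merge them into the copied example file if you want to keep its other settings.

```python
import json
from pathlib import Path


def build_config(username: str, password: str, livy_url: str) -> dict:
    """Build the three credential sections for the Python, Scala and R kernels."""

    def credentials(with_auth: bool) -> dict:
        section = {"username": username, "password": password, "url": livy_url}
        if with_auth:
            # The Python and Scala sections above carry "auth": "None".
            section["auth"] = "None"
        return section

    return {
        "kernel_python_credentials": credentials(with_auth=True),
        "kernel_scala_credentials": credentials(with_auth=True),
        "kernel_r_credentials": credentials(with_auth=False),
    }


def write_config(config: dict, path: Path) -> Path:
    """Write `config` as JSON to `path`, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    cfg = build_config("youruserid", "xxxxx", "http://livyhostname:8998")
    print(json.dumps(cfg, indent=2))
    # To write the file for real (this overwrites an existing config.json):
    # write_config(cfg, Path.home() / ".sparkmagic" / "config.json")
```

Using `json.dumps` guarantees straight quotes and correct commas, which are the easiest things to get wrong when the configuration is copied from a web page.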

Next launch JupyterLab:

Click Windows Start -> Anaconda Navigator and launch JupyterLab. Then run the command below in a notebook:

!jupyter nbextension enable --py --sys-prefix widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
- Validating: ok

Note: If you get this error on Linux: PermissionError: [Errno 13] Permission denied: '/opt/anaconda3/etc/jupyter/nbconfig/notebook.json'

then you will need to run the above command from a Linux terminal as root.

Next run the following commands in the Jupyter notebook:

!jupyter-kernelspec install c:\users\youruserid\appdata\local\continuum\anaconda3\lib\site-packages\sparkmagic\kernels\sparkkernel

On Linux, if you get this error:

[Errno 13] Permission denied: '/usr/local/share/jupyter'
Perhaps you want to install with `sudo` or `--user`?

then log in as root and run the command from a terminal (the same applies to the remaining kernels below):

# jupyter-kernelspec install /opt/anaconda3/lib/python3.6/site-packages/sparkmagic/kernels/sparkkernel
[InstallKernelSpec] Installed kernelspec sparkkernel in /usr/local/share/jupyter/kernels/sparkkernel

!jupyter-kernelspec install c:\users\youruserid\appdata\local\continuum\anaconda3\lib\site-packages\sparkmagic\kernels\pysparkkernel

!jupyter-kernelspec install c:\users\youruserid\appdata\local\continuum\anaconda3\lib\site-packages\sparkmagic\kernels\pyspark3kernel

!jupyter-kernelspec install c:\users\youruserid\appdata\local\continuum\anaconda3\lib\site-packages\sparkmagic\kernels\sparkrkernel

!jupyter serverextension enable --py sparkmagic
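
The install paths above depend on where Anaconda put sparkmagic, so it can help to look the folder up rather than type it. The following is a minimal sketch that only prints the four jupyter-kernelspec commands for the kernels listed above; `find_package_dir` and `kernel_install_commands` are hypothetical helper names.

```python
import importlib.util
import os
from typing import List, Optional

# Kernel folder names taken from the install commands above.
KERNEL_NAMES = ["sparkkernel", "pysparkkernel", "pyspark3kernel", "sparkrkernel"]


def find_package_dir(package: str = "sparkmagic") -> Optional[str]:
    """Return the folder an installed package lives in, or None if it is missing."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def kernel_install_commands(package_dir: str) -> List[str]:
    """Build one 'jupyter-kernelspec install <path>' command per kernel."""
    return [
        "jupyter-kernelspec install " + os.path.join(package_dir, "kernels", name)
        for name in KERNEL_NAMES
    ]


if __name__ == "__main__":
    package_dir = find_package_dir()
    if package_dir is None:
        print("sparkmagic is not installed")
    else:
        # Paths containing spaces would need quoting before being pasted into a shell.
        for command in kernel_install_commands(package_dir):
            print(command)
```

Printing the commands instead of running them leaves the choice of running them in the notebook or in a root terminal, as described above.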

NEXTSTEP:

In the top right corner of the notebook, click Python 3 and change to the PySpark kernel.

Make sure the Livy server is running by querying it with curl on the Livy host:

$ curl localhost:8998/sessions
{"from":0,"total":0,"sessions":[]}
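
If curl is not available, for example on the Windows machine, the same check can be made from Python. This is a minimal sketch using only the standard library; `parse_sessions` and `livy_sessions` are hypothetical helper names, and the hostname in the commented-out call is a placeholder.

```python
import json
from urllib.request import urlopen


def parse_sessions(body: str) -> dict:
    """Summarize the JSON that Livy returns for GET /sessions."""
    data = json.loads(body)
    return {"total": data["total"], "ids": [s["id"] for s in data["sessions"]]}


def livy_sessions(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and summarize the session list from a running Livy server."""
    with urlopen(base_url.rstrip("/") + "/sessions", timeout=timeout) as response:
        return parse_sessions(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Same response as the curl output above: the server is up and has no sessions.
    print(parse_sessions('{"from":0,"total":0,"sessions":[]}'))
    # Against a live server (replace the placeholder hostname):
    # print(livy_sessions("http://livyhostname:8998"))
```

A connection error from `livy_sessions` means the server is not reachable at that URL, which is worth ruling out before debugging the notebook side.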

Run a simple command such as 1+1 in the notebook. The notebook connects to the Spark cluster to execute your commands, and it starts a Spark application with your first command.
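
Behind that first command, a Livy client has to create a session and then submit the code as a statement. The following is a rough sketch of those two requests against Livy's REST API; it only builds the method, path and body and sends nothing. The function names are hypothetical, and the requests sparkmagic actually sends may carry more fields than shown here.

```python
import json
from typing import Tuple


def create_session_request(kind: str = "pyspark") -> Tuple[str, str, str]:
    """Method, path and JSON body that ask Livy to start a new session."""
    return ("POST", "/sessions", json.dumps({"kind": kind}))


def run_statement_request(session_id: int, code: str) -> Tuple[str, str, str]:
    """Method, path and JSON body that submit code to an existing session."""
    path = "/sessions/{}/statements".format(session_id)
    return ("POST", path, json.dumps({"code": code}))


if __name__ == "__main__":
    # The session must reach the idle state before statements are accepted,
    # which is why the first cell in a notebook takes noticeably longer.
    print(create_session_request())
    print(run_statement_request(0, "1 + 1"))
```

Once a session exists, it also appears in the output of the /sessions check shown earlier.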

Run another command:

%%sql
show databases

It displays the list of databases defined in Hive on the Hadoop cluster.

Another example: draw a plot and store it in a PDF file on the Livy server:

import os

import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

mydir = '/home/yourlinuxid/'
os.chdir(mydir)  # the code runs on the Livy server, so this is a directory there
os.getcwd()

plt.plot([1, 4, 3, 6, 12, 20])
plt.savefig('myplot1.pdf')

If you now download myplot1.pdf from your home directory on the Linux server running Livy, you can see the plot in the PDF.

You have now successfully installed Jupyter Notebook on Windows 10 and run Python through PySpark, using Livy to reach Spark on the Hadoop backend.


REFERENCES:

https://github.com/jupyter-incubator/sparkmagic

https://blog.chezo.uno/livy-jupyter-notebook-sparkmagic-powerful-easy-notebook-for-data-scientist-a8b72345ea2d

https://spark-summit.org/east-2017/events/secured-kerberos-based-spark-notebook-for-data-science/

https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f
