Run a Python program to access Hadoop webhdfs with Kerberos enabled

This post's Python code makes REST calls to a secure, Kerberos-enabled Hadoop cluster, using the WebHDFS REST API to get file data. First run $ kinit userid@REALM to authenticate and obtain a Kerberos ticket for the user. Make sure the Python modules requests and requests_kerberos are installed; otherwise install them …
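The post's own code is cut off in this excerpt, so the following is only a minimal sketch of the approach it describes, not the author's program. It assumes a valid ticket from kinit, that requests and requests_kerberos are installed, and that the NameNode serves WebHDFS over plain HTTP. The function names, host, port, and file path are hypothetical.

```python
from urllib.parse import quote


def build_webhdfs_url(host, port, hdfs_path, op="OPEN", scheme="http"):
    """Build a WebHDFS REST URL of the form scheme://host:port/webhdfs/v1/<path>?op=<OP>."""
    # Percent-encode the HDFS path but keep "/" separators intact.
    path = quote(hdfs_path.lstrip("/"))
    return f"{scheme}://{host}:{port}/webhdfs/v1/{path}?op={op}"


def read_hdfs_file(host, port, hdfs_path):
    """Fetch a file's bytes over WebHDFS, authenticating with the Kerberos ticket from kinit."""
    # Imported here so the URL helper can be used before the modules are installed.
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # SPNEGO/Kerberos authentication using the current user's ticket cache.
    auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    # op=OPEN normally answers with a redirect to a DataNode; requests follows it by default.
    response = requests.get(build_webhdfs_url(host, port, hdfs_path), auth=auth)
    response.raise_for_status()
    return response.content
```

With a ticket in place, a call such as read_hdfs_file("namenode.example.com", 50070, "/tmp/sample.txt") would return the file contents as bytes. The host and path here are placeholders, and the port depends on how the cluster is configured.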

Install Jupyterhub

This blog is not yet ready, so do not use these instructions for now.
JupyterHub prerequisites: before installing JupyterHub, you will need a Linux/Unix-based system and Python 3.4 or greater. An understanding of using pip or conda for installing Python packages is helpful.
Installation using conda: check whether the Anaconda package is already installed: dpkg -l | grep conda
If …
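As a rough illustration of the prerequisite check, here is a small Python sketch that tests the interpreter version and looks for a conda executable on the PATH. It is an alternative to the dpkg query, not a replacement for it, and the function name is hypothetical.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 4)):
    """Report whether the running Python is new enough and where conda is, if found."""
    return {
        # sys.version_info compares element-wise against a (major, minor) tuple.
        "python_ok": sys.version_info >= min_python,
        # shutil.which returns the full path to the executable, or None if it is not on PATH.
        "conda_path": shutil.which("conda"),
    }
```

Calling check_prerequisites() returns a dictionary. A conda_path of None only means conda is not on the current PATH, not that Anaconda is absent from the system.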

Some helpful links

Cloudera/Hadoop:
tiny.cloudera.com/hw-reqs
tiny.cloudera.com/aws-ra
http://docs.aws.amazon.com/quickstart/latest/cloudera/welcome.html
http://docplayer.net/25124019-Hadoop-security-authors-ben-spivey-and-joey-echeverria-provide-in-depth-information-about-the-security-features-available-in-hadoop-and-organize-them.html
http://blog.cloudera.com/blog/2015/03/how-to-quickly-configure-kerberos-for-your-apache-hadoop-cluster/
http://wpcertification.blogspot.com/
https://henning.kropponline.de/
https://blogs.msdn.microsoft.com/pliu/2016/01/02/integrating-cloudera-cluster-with-active-directory-part-13/

Jupyter:
https://blog.insightdatascience.com/using-jupyter-on-apache-spark-step-by-step-with-a-terabyte-of-reddit-data-ef4d6c13959a

Docker:
https://www.dataquest.io/blog/docker-data-science/

Miscellaneous:
https://blog.daftcode.pl/hype-driven-development-3469fc2e9b22
https://github.com/parth8891/NYC_Taxi_Data_Analysis
https://keshif.me/demo/VisTools
http://blog.thedigitalgroup.com/dattatrayap/high-speed-ingestion-into-solr-with-custom-talend-component-developed-by-tdg/
http://www.bigendiandata.com/

Install Jupyter notebook with Livy for Spark on Cloudera Hadoop

Environment: Cloudera CDH 5.12.x running Livy and Spark (see the other blog on this website to install Livy), with the Anaconda parcel installed using Cloudera Manager (see the other blog on this website to install the Anaconda parcel on CDH).
We will first install Anaconda and Sparkmagic on Windows 10, using Anaconda to install Jupyter Notebook. We strongly recommend installing Python and …

Install Anaconda Python package on Cloudera CDH.

This blog will show how to install the Anaconda parcel in CDH to enable Pandas and other Python libraries in the Hue PySpark notebook.
Install steps: installing the Anaconda parcel
1. From the Cloudera Manager Admin Console, click the “Parcels” indicator in the top navigation bar.
2. Click the “Configuration” button on the top right of the Parcels …