KAFKA examples

Example: DoctorKafka (Kafka cluster healing and workload balancing, open-sourced by Pinterest Engineering): medium.com/@Pinterest_Engineering/open-sourcing-doctorkafka-kafka-cluster-healing-and-workload-balancing-e51ad25b6b17


Install Jupyterhub

This post is not yet ready, so do not use these instructions for now. JupyterHub prerequisites: before installing JupyterHub, you will need a Linux/Unix-based system and Python 3.4 or greater. An understanding of using pip or conda for installing Python packages is helpful. Installation using conda: check whether the Anaconda package is already installed with dpkg -l | grep conda. If …
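The prerequisites in the excerpt above can be checked with a few lines of Python. This is a minimal sketch, not part of the original post: the helper names `python_ok` and `find_tool` are hypothetical, and it looks for `pip` and `conda` on the PATH instead of querying `dpkg`, which only reports packages installed through the Debian package manager.

```python
import shutil
import sys

MIN_PYTHON = (3, 4)  # minimum version stated in the prerequisites


def python_ok(version=None):
    """Return True when the given (major, minor, ...) version is at least 3.4.

    Defaults to the version of the running interpreter.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("Python >= 3.4:", python_ok())
    for tool in ("pip", "conda"):
        print(tool, "->", find_tool(tool) or "not found on PATH")
```

A missing `conda` here only means it is not on the PATH of the current shell; Anaconda may still be installed in a location that has not been added to it.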

Some helpful links

Cloudera/Hadoop:
tiny.cloudera.com/hw-reqs
tiny.cloudera.com/aws-ra
http://docs.aws.amazon.com/quickstart/latest/cloudera/welcome.html
http://docplayer.net/25124019-Hadoop-security-authors-ben-spivey-and-joey-echeverria-provide-in-depth-information-about-the-security-features-available-in-hadoop-and-organize-them.html
http://blog.cloudera.com/blog/2015/03/how-to-quickly-configure-kerberos-for-your-apache-hadoop-cluster/
http://wpcertification.blogspot.com/
https://henning.kropponline.de/

Jupyter:
https://blog.insightdatascience.com/using-jupyter-on-apache-spark-step-by-step-with-a-terabyte-of-reddit-data-ef4d6c13959a

Docker:
https://www.dataquest.io/blog/docker-data-science/

Miscellaneous:
https://blog.daftcode.pl/hype-driven-development-3469fc2e9b22
https://github.com/parth8891/NYC_Taxi_Data_Analysis
https://keshif.me/demo/VisTools
http://blog.thedigitalgroup.com/dattatrayap/high-speed-ingestion-into-solr-with-custom-talend-component-developed-by-tdg/
http://www.bigendiandata.com/

Install Jupyter notebook with Livy for Spark on Cloudera Hadoop

Environment: Cloudera CDH 5.12.x running Livy and Spark (see the other post on this website for installing Livy), with the Anaconda parcel installed using Cloudera Manager (see the other post on this website for installing the Anaconda parcel on CDH). We will first install Anaconda and Sparkmagic on Windows 10 so that Jupyter Notebook can be installed using Anaconda. We strongly recommend installing Python and …

Install Anaconda Python package on Cloudera CDH.

This post shows how to install the Anaconda parcel in CDH to enable pandas and other Python libraries in the Hue PySpark notebook. Install steps: Installing the Anaconda parcel
1. From the Cloudera Manager Admin Console, click the “Parcels” indicator in the top navigation bar.
2. Click the “Configuration” button on the top right of the Parcels …

Install Hue Spark Notebook with Livy on Cloudera

This post shows simple steps to install and configure the Hue Spark notebook to run interactive PySpark scripts using Livy. Environment used: CDH 5.12.x, Cloudera Manager, Hue 4.0, Livy 0.3.0, and Spark 1.6.0 on RHEL Linux. Sentry was installed in unsecure mode. NOTE: Make sure the user who logs into Hue has access to Hive …
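For context on what "interactive PySpark scripts using Livy" involves, the sketch below builds the two HTTP requests that Livy's REST API uses for an interactive session: `POST /sessions` to start a PySpark session and `POST /sessions/{id}/statements` to run code in it. It is a hedged illustration and not taken from the post: the host name is a placeholder, the helper names are hypothetical, and the port is assumed to be Livy's default 8998. The requests are only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical host; 8998 is assumed to be the Livy server port (its default).
LIVY_URL = "http://livy-host.example.com:8998"


def build_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def new_session_request(base_url=LIVY_URL):
    """Request that asks Livy to start an interactive PySpark session."""
    return build_post(base_url + "/sessions", {"kind": "pyspark"})


def statement_request(session_id, code, base_url=LIVY_URL):
    """Request that submits a code snippet to an existing session."""
    url = "{}/sessions/{}/statements".format(base_url, session_id)
    return build_post(url, {"code": code})
```

Passing either request to `urllib.request.urlopen` would send it to a reachable Livy server. Hue and Sparkmagic issue equivalent calls on the user's behalf, so this is mainly useful for checking that Livy responds before configuring a notebook.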