Copy a local Linux file to HDFS:
$ hdfs dfs -copyFromLocal sourcefile.txt /tmp
View the file in HDFS:
$ hdfs dfs -cat /tmp/sourcefile.txt
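The two commands above can be wrapped in a small shell function so the copy and the check always run together. This is only a sketch: the function name copy_and_view and the HDFS_CMD variable are made up for illustration (HDFS_CMD lets you point the function at a different client binary, or at a stub for a dry run), and it assumes the hdfs client is on the PATH and the target directory is writable.

```shell
#!/bin/sh
# Hypothetical helper: copy a local file into an HDFS directory, then print it back.
# HDFS_CMD is an illustrative override hook, not a Hadoop setting; it defaults to "hdfs".
HDFS_CMD="${HDFS_CMD:-hdfs}"

copy_and_view() {
    src="$1"        # local file, e.g. sourcefile.txt
    dest_dir="$2"   # HDFS directory, e.g. /tmp

    # Fail early if the local file does not exist.
    if [ ! -f "$src" ]; then
        echo "local file not found: $src" >&2
        return 1
    fi

    # Same commands as in the post: copy first, then cat the copied file.
    "$HDFS_CMD" dfs -copyFromLocal "$src" "$dest_dir" || return 1
    "$HDFS_CMD" dfs -cat "$dest_dir/$(basename "$src")"
}
```

On a host with the Hadoop client installed, running copy_and_view sourcefile.txt /tmp issues the same two hdfs dfs commands shown above and stops if the copy fails.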
The following steps are needed to run a Java program in Hadoop with Kerberos security enabled: 1. Create a text file named FileCount.java and store it in your home directory, such as /home/userid. 2. Copy and paste the code below into FileCount.java. Change the Hadoop hostname from quickstart.cloudera:8020 to the correct host. import java.io.*; import … Continue reading Run a Java program in Hadoop with Kerberos enabled.
To install Cloudera Private Cloud on OpenShift, we first need Cloudera Private Cloud Base installed on bare metal or VMs. We then use Cloudera Manager to deploy CML and CDW on OpenShift, which serves mainly as the compute cluster. Storage such as HDFS, Hive, and Kudu will remain in the Base (data center) servers … Continue reading Cloudera Hadoop Install Notes
Environment: MicroStrategy Desktop 10.11, Cloudera CDH 5.12, Impala 2.x. Steps to connect MicroStrategy Desktop to Cloudera Impala: The best thing about MicroStrategy Desktop, unlike Tableau Desktop, is that it is free to download and use while still being a powerful BI visualization/query tool. Tableau Public Desktop is free, but it has only a few connectors and cannot connect to Hadoop … Continue reading MicroStrategy Desktop connect to Impala
The following steps install Cloudera Search, which is based on Apache Solr. Environment: Cloudera CDH 5.12.x, solr-spec 4.10.3. Deploying Cloudera Search: Cloudera Search (powered by Apache Solr) is included in CDH 5. If you have installed CDH 5.0 or higher, you do not need to perform any additional actions to install Search. … Continue reading Cloudera Search (Solr) install steps
Below are some reference architectures for Hadoop:
https://www.cloudera.com/documentation/other/reference-architecture.html
http://en.community.dell.com/techcenter/ready_solutions/data_analytics/
Cluster Hosts and Role Assignments:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_host_allocations.html
CDH 5 and Cloudera Manager 5 Install/Upgrade/Requirements and Supported Versions:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#concept_xdm_rgj_j1b
https://www.cloudera.com/documentation/enterprise/latest/PDF/cloudera-installation.pdf
High Availability support:
https://www.cloudera.com/documentation/enterprise/latest/topics/admin_ha.html
HDFS Directory Structure recommendation (Eric Sammer):
/data : Contains canonical, raw data sets ingested from other systems. Read only to users.
/user/<username> : …
Continue reading Hadoop architecture notes
Below are the steps to install the Kafka parcel in Cloudera Manager. Cloudera Distribution of Apache Kafka Requirements and Supported Versions: for Cloudera Kafka 2.2.x, the lowest supported versions are Cloudera Manager 5.9.x and CDH 5.9.x. General Information Regarding Installation and Upgrade: These are the official instructions: https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_jms_yb1_v5 Cloudera recommends that you deploy Kafka on dedicated hosts … Continue reading Kafka install on Cloudera Hadoop
Environment: Cloudera CDH 5.12.x running Livy and Spark (see the other post on this website for installing Livy); Anaconda parcel installed using Cloudera Manager (see the other post on this website for installing the Anaconda parcel on CDH); non-Kerberos cluster. A Kerberos-based Hadoop cluster needs a different setup, and these instructions won't work for it. We will first install Anaconda and … Continue reading Install Jupyter notebook with Livy for Spark on Cloudera Hadoop
This post shows how to install the Anaconda parcel in CDH to enable pandas and other Python libraries in the Hue PySpark notebook. http://docs.anaconda.com/anaconda/user-guide/tasks/integration/cloudera/ There are two methods of using Anaconda on an existing cluster with Cloudera CDH, Cloudera’s distribution including Apache Hadoop: Use the Anaconda parcel for Cloudera CDH. The following procedure describes how to install … Continue reading Install Anaconda Python package on Cloudera CDH.
This post shows simple steps to install and configure the Hue Spark notebook to run interactive PySpark scripts using Livy. Environment used: CDH 5.12.x, Cloudera Manager, Hue 4.0, Livy 0.3.0, and Spark 1.6.0 on RHEL Linux. Sentry was installed in unsecure mode. Kerberos was not used in the Hadoop cluster. Kerberos will need additional steps … Continue reading Install Hue Spark Notebook with Livy on Cloudera