The following steps install Cloudera Search, which is based on Apache Solr. Environment: Cloudera CDH 5.12.x, solr-spec 4.10.3. Deploying Cloudera Search: Cloudera Search (powered by Apache Solr) is included in CDH 5. If you have installed CDH 5.0 or higher, you do not need to perform any additional actions to install Search. … Continue reading Cloudera Search (Solr) install steps
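Once Search is running, one way to smoke-test a collection is to call Solr's standard `/select` request handler over HTTP. The sketch below only builds the query URL, which you can then paste into a browser or curl. The host `solr-host.example.com` and the collection `my_collection` are hypothetical placeholders. Port 8983 is assumed to be the Solr HTTP port on your cluster.

```python
from urllib.parse import urlencode


def solr_select_url(host, collection, query="*:*", rows=10, port=8983):
    """Build a Solr /select URL that returns JSON, for a quick smoke test.

    host, collection and port are assumptions: substitute your own values.
    """
    # wt=json asks Solr 4.x for a JSON response instead of the default XML
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


url = solr_select_url("solr-host.example.com", "my_collection")
print(url)
```

`urlencode` percent-encodes `*:*` as `%2A%3A%2A`, and Solr decodes it back on the server side.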
Below are some reference architectures for Hadoop:
https://www.cloudera.com/documentation/other/reference-architecture.html
http://en.community.dell.com/techcenter/ready_solutions/data_analytics/
Cluster Hosts and Role Assignments: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_host_allocations.html
CDH 5 and Cloudera Manager 5 Requirements and Supported Versions: https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#concept_xdm_rgj_j1b
High Availability support: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_ha.html
HDFS Directory Structure recommendation (Eric Sammer):
/data: Contains canonical, raw data sets ingested from other systems. Read-only to users.
/user/<username>: Home … Continue reading Hadoop architecture notes
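The two directories named in the excerpt can be created with the standard `hdfs dfs` commands. Below is a minimal sketch that only generates the command strings. The permission scheme is an assumption, not part of the recommendation as quoted: `/data` is owned by the `hdfs` superuser with mode 755, so other users can read but not write, and each `/user/<username>` is owned by that user. The username `alice` is hypothetical.

```python
def hdfs_layout_commands(usernames):
    """Return hdfs CLI commands for /data and per-user home directories.

    Assumed permissions: /data owned by the hdfs superuser, mode 755
    (read-only to other users); /user/<name> owned by <name>.
    """
    cmds = [
        "sudo -u hdfs hdfs dfs -mkdir -p /data",
        "sudo -u hdfs hdfs dfs -chmod 755 /data",
    ]
    for name in usernames:
        home = f"/user/{name}"
        cmds.append(f"sudo -u hdfs hdfs dfs -mkdir -p {home}")
        cmds.append(f"sudo -u hdfs hdfs dfs -chown {name} {home}")
    return cmds


commands = hdfs_layout_commands(["alice"])  # "alice" is a hypothetical user
for cmd in commands:
    print(cmd)
```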
Below are the steps to install the Kafka parcel in Cloudera Manager. Cloudera Distribution of Apache Kafka Requirements and Supported Versions: for Cloudera Kafka 2.2.x, the lowest supported versions are Cloudera Manager 5.9.x and CDH 5.9.x; higher versions are also supported. General Information Regarding Installation and Upgrade: these are the official instructions:
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_jms_yb1_v5
Cloudera recommends that you deploy Kafka on dedicated hosts … Continue reading Kafka install on Cloudera Hadoop
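The version requirement above is a simple minimum check on both Cloudera Manager and CDH. Here is a small sketch that compares version strings such as `5.12.1` or `5.9.x` against the 5.9 minimum from the excerpt. The wildcard handling (stopping at `x`) is an assumption made for this example.

```python
def parse_version(text):
    """Turn '5.12.1' or '5.9.x' into a comparable tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():  # stop at wildcards such as 'x'
            break
        parts.append(int(piece))
    return tuple(parts)


# Minimums for Cloudera Kafka 2.2.x, as stated above
MIN_CM = (5, 9)
MIN_CDH = (5, 9)


def kafka_22_supported(cm_version, cdh_version):
    """True if both Cloudera Manager and CDH meet the 5.9 minimum."""
    return parse_version(cm_version) >= MIN_CM and parse_version(cdh_version) >= MIN_CDH


print(kafka_22_supported("5.12.1", "5.12.0"))  # True
print(kafka_22_supported("5.8.3", "5.12.0"))   # False
```

Comparing tuples of integers avoids the string-comparison trap where `"5.12"` would sort before `"5.9"`.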
Environment: Cloudera CDH 5.12.x running Livy and Spark (see the other blog post on this website to install Livy), with the Anaconda parcel installed using Cloudera Manager (see the other blog post on this website to install the Anaconda parcel on CDH). We will first install Anaconda and Sparkmagic on Windows 10, using Anaconda to get Jupyter Notebook. We strongly recommend installing Python and … Continue reading Install Jupyter notebook with Livy for Spark on Cloudera Hadoop
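Sparkmagic finds Livy through its `config.json` file, by default under `~/.sparkmagic/`. Below is a minimal sketch that builds the credential entries for the PySpark and Scala kernels. Several details are assumptions, so check them against the `example_config.json` that ships with your Sparkmagic version:

- The host `livy-host.example.com` is a hypothetical placeholder.
- 8998 is Livy's default port.
- Sparkmagic falls back to its defaults for keys you omit.

```python
import json
from pathlib import Path


def sparkmagic_config(livy_url):
    """Minimal sparkmagic config pointing the PySpark and Scala kernels at Livy.

    Empty username/password assumes an unsecured Livy endpoint.
    """
    creds = {"username": "", "password": "", "url": livy_url}
    return {
        "kernel_python_credentials": dict(creds),
        "kernel_scala_credentials": dict(creds),
    }


def write_config(config, path=Path.home() / ".sparkmagic" / "config.json"):
    """Write the config to ~/.sparkmagic/config.json (overwrites an existing file)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path


config = sparkmagic_config("http://livy-host.example.com:8998")  # hypothetical host
print(json.dumps(config, indent=2))
# write_config(config)  # uncomment to write the file; back up any existing config first
```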
This blog shows how to install the Anaconda parcel in CDH to enable pandas and other Python libraries in the Hue pySpark notebook.
Install steps: Installing the Anaconda Parcel
1. From the Cloudera Manager Admin Console, click the “Parcels” indicator in the top navigation bar.
2. Click the “Configuration” button on the top right of the Parcels … Continue reading Install Anaconda Python package on Cloudera CDH.
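After the parcel is activated, a quick check in a Hue pySpark notebook cell shows two things: which Python interpreter the session uses, and whether pandas imports. Below is a minimal sketch that uses only the standard library and stays compatible with Python 2.7. With a default parcel layout you would expect the interpreter path to sit under `/opt/cloudera/parcels/`, but that path is an assumption about your setup.

```python
import importlib
import sys


def python_env_report(modules=("pandas", "numpy")):
    """Report the running interpreter and whether key libraries import."""
    available = {}
    for name in modules:
        try:
            importlib.import_module(name)
            available[name] = True
        except ImportError:
            available[name] = False
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "available": available,
    }


report = python_env_report()
print(report)
```

This checks only the driver process. Spark executors may use a different interpreter, depending on how `PYSPARK_PYTHON` is configured on the cluster.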
This blog shows simple steps to install and configure the Hue Spark notebook to run interactive pySpark scripts using Livy. Environment used: CDH 5.12.x, Cloudera Manager, Hue 4.0, Livy 0.3.0, Spark 1.6.0 on RHEL Linux. Sentry was installed in unsecured mode. NOTE: Make sure the user who logs into Hue has access to Hive … Continue reading Install Hue Spark Notebook with Livy on Cloudera
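The Hue notebook talks to Livy through Livy's REST API: `POST /sessions` starts an interactive session, and `POST /sessions/{id}/statements` runs code in it. Below is a minimal sketch that builds those two requests with the standard library without sending them. The host `livy-host.example.com` is hypothetical, 8998 is Livy's default port, and session id `0` is only illustrative.

```python
import json
from urllib.request import Request

LIVY_URL = "http://livy-host.example.com:8998"  # hypothetical host; 8998 is Livy's default port


def _json_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    """POST /sessions starts an interactive Spark session of the given kind."""
    return _json_post(LIVY_URL + "/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    """POST /sessions/{id}/statements submits code to a running session."""
    return _json_post(f"{LIVY_URL}/sessions/{session_id}/statements", {"code": code})


req = run_statement_request(0, "sc.parallelize(range(100)).sum()")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually run the requests, pass each one to `urllib.request.urlopen` and parse the JSON reply. Wait until `GET /sessions/{id}` reports the state `idle` before submitting statements.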
/root>sudo -u hdfs hdfs dfs -df -h
Filesystem              Size     Used  Available  Use%
hdfs://xyz.com:8020  957.6 G   74.4 G    770.2 G    8%
/root>sudo -u hdfs hdfs dfs -du -h /
0        0       /system
4.2 G    12.7 G  … Continue reading Check Hadoop space usage
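The `-h` output above can be checked by hand. Use% is Used divided by Size. The two `-du` columns are the logical file size and the disk space consumed including all replicas. Below is a minimal Python sketch that assumes the `-h` format shown above (a number, optionally followed by a unit letter) and binary units (1 G = 1024³ bytes). The path `/example` in the `-du` row is a hypothetical stand-in for the truncated one.

```python
# Binary multipliers for the unit letters printed by `hdfs dfs -df -h` / `-du -h`
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_sizes(tokens):
    """Group tokens like ['957.6', 'G', '74.4', 'G'] into byte counts."""
    sizes, i = [], 0
    while i < len(tokens):
        value = float(tokens[i])
        unit_factor = 1
        if i + 1 < len(tokens) and tokens[i + 1] in UNITS:
            unit_factor = UNITS[tokens[i + 1]]
            i += 1  # consume the unit token
        sizes.append(value * unit_factor)
        i += 1
    return sizes


def parse_df_line(line):
    """Parse a data row of `hdfs dfs -df -h` into byte counts."""
    tokens = line.split()
    size, used, available = parse_sizes(tokens[1:-1])  # drop filesystem URI and Use%
    return {"filesystem": tokens[0], "size": size, "used": used, "available": available}


def parse_du_line(line):
    """Parse a row of `hdfs dfs -du -h`: logical size, consumed with replicas, path."""
    tokens = line.split()
    logical, consumed = parse_sizes(tokens[:-1])
    return {"path": tokens[-1], "logical": logical, "consumed": consumed}


df = parse_df_line("hdfs://xyz.com:8020 957.6 G 74.4 G 770.2 G 8%")
print(round(100 * df["used"] / df["size"]))  # 8, matching Use%
gap = df["size"] - df["used"] - df["available"]
print(round(gap / UNITS["G"], 1))  # 113.0

du = parse_du_line("4.2 G 12.7 G /example")  # /example is a hypothetical path
print(round(du["consumed"] / du["logical"], 1))  # 3.0
```

The consumed-to-logical ratio of about 3 is consistent with a replication factor of 3. On Hadoop 2.x, the roughly 113 G by which Size exceeds Used + Available is generally what HDFS reports as non-DFS used. Because `-h` rounds every figure, treat both numbers as approximate.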
First, download and install the popular free tool Oracle SQL Developer for Windows from the Oracle website. This blog gives a good overview of connecting Oracle SQL Developer to Hadoop Hive:
https://blogs.oracle.com/bigdataconnectors/move-data-between-apache-hadoop-and-oracle-database-with-sql-developer
Note: when configuring the Cloudera Hive JDBC drivers, use the website below to download the 64-bit JDBC driver for Windows:
https://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-19.html
In the Tools->Preferences->Database->Third party … Continue reading Query Cloudera Hadoop Hive using Oracle SQL Developer.
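To test the Cloudera Hive JDBC driver outside SQL Developer, you need a `jdbc:hive2://` connection URL built from the host, port, database and an `AuthMech` property. Below is a minimal sketch of that string. Several details are assumptions:

- The host `hive-host.example.com` is hypothetical.
- 10000 is the HiveServer2 default port.
- The `AuthMech` codes are 0 = none, 1 = Kerberos, 2 = user name, 3 = user name and password.

Which code your HiveServer2 needs depends on its authentication setting, so confirm it in the driver's install guide.

```python
# AuthMech codes assumed from the Cloudera Hive JDBC driver documentation
AUTH_MECH = {"none": 0, "kerberos": 1, "username": 2, "username_password": 3}


def hive_jdbc_url(host, port=10000, database="default", auth="username"):
    """Build a Cloudera Hive JDBC connection URL (port 10000 is the HiveServer2 default)."""
    mech = AUTH_MECH[auth]  # raises KeyError for an unknown auth name
    return f"jdbc:hive2://{host}:{port}/{database};AuthMech={mech}"


url = hive_jdbc_url("hive-host.example.com")  # hypothetical host
print(url)  # jdbc:hive2://hive-host.example.com:10000/default;AuthMech=2
```

In SQL Developer itself, the Hive connection dialog asks for host, port and database as separate fields, so the same three values apply there.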