Cloudera Search (Solr) install steps

The following steps install Cloudera Search, which is based on Apache Solr. Environment: Cloudera CDH 5.12.x, solr-spec 4.10.3. Deploying Cloudera Search: Cloudera Search (powered by Apache Solr) is included in CDH 5. If you have installed CDH 5.0 or higher, you do not need to perform any additional actions to install Search. … Continue reading Cloudera Search (Solr) install steps

Hadoop architecture notes

Below are some reference architectures for Hadoop:
https://www.cloudera.com/documentation/other/reference-architecture.html
http://en.community.dell.com/techcenter/ready_solutions/data_analytics/
Cluster Hosts and Role Assignments:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_host_allocations.html
CDH 5 and Cloudera Manager 5 Install/Upgrade/Requirements and Supported Versions:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#concept_xdm_rgj_j1b
https://www.cloudera.com/documentation/enterprise/latest/PDF/cloudera-installation.pdf
High Availability support:
https://www.cloudera.com/documentation/enterprise/latest/topics/admin_ha.html
HDFS Directory Structure recommendation (Eric Sammer):
/data : Contains canonical, raw data sets ingested from other systems. Read only to users.
/user/<username> : … Continue reading Hadoop architecture notes

Kafka install on Cloudera Hadoop

Below are the steps to install the Kafka parcel in Cloudera Manager.
Cloudera Distribution of Apache Kafka Requirements and Supported Versions: for Cloudera Kafka 2.2.x, the lowest supported versions are Cloudera Manager 5.9.x and CDH 5.9.x; higher versions are also supported.
General Information Regarding Installation and Upgrade: these are the official instructions:
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_jms_yb1_v5
Cloudera recommends that you deploy Kafka on dedicated hosts … Continue reading Kafka install on Cloudera Hadoop
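The version requirement in the Kafka excerpt above (Cloudera Kafka 2.2.x needs Cloudera Manager 5.9.x and CDH 5.9.x or higher) can be checked before installing the parcel. The following is only a minimal sketch: the function names parse_version and kafka_22_supported are hypothetical helpers invented for illustration, not part of any Cloudera tool, and the check compares only the major and minor version numbers.

```python
# Minimum versions taken from the requirement quoted in the excerpt:
# Cloudera Kafka 2.2.x needs Cloudera Manager 5.9.x and CDH 5.9.x or higher.
MIN_CM = (5, 9)
MIN_CDH = (5, 9)


def parse_version(text):
    """Return (major, minor) from a string such as '5.12.1' or '5.9.x'.

    Hypothetical helper: everything after the minor number is ignored.
    """
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def kafka_22_supported(cm_version, cdh_version):
    """True if both versions meet the 5.9 minimum for Kafka 2.2.x."""
    return (parse_version(cm_version) >= MIN_CM
            and parse_version(cdh_version) >= MIN_CDH)


if __name__ == "__main__":
    # Example values are assumptions, not output from a real cluster.
    print(kafka_22_supported("5.12.1", "5.12.0"))
    print(kafka_22_supported("5.8.3", "5.9.0"))
```

Comparing tuples of integers rather than raw strings avoids the trap where "5.10" sorts before "5.9" as text. For the authoritative compatibility matrix, the official instructions linked in the post remain the reference.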

Business Intelligence, ETL and Data Science tools

Free or open-source BI / ETL tools:
Talend = ETL tool, leader in Gartner Magic Quadrant
StreamSets = ETL tool
Apache NiFi = ETL tool
Pentaho = BI/ETL tool, desktop and server versions
HUE = Hadoop analytics server, BI, query tool
KNIME = Data science, leader in Gartner Magic Quadrant 2017, desktop version
Jupyter Notebook … Continue reading Business Intelligence, ETL and Data Science tools

Ansible script examples

EXAMPLE: Restart NTP daemon in Linux
First create the playbook pbntprestart.yml as shown below, then run it using the following command. cdhservers is the inventory group in which you define all your hostnames.
$ ansible-playbook pbntprestart.yml -u root -k
The content of pbntprestart.yml is below:
$ cat pbntprestart.yml
---
- hosts: cdhservers
  tasks:
  - name: NTPRESTART
    shell: service ntpd restart
    register: out1
… Continue reading Ansible script examples