Kerberos setup in Cloudera Hadoop

Reference:
http://blog.cloudera.com/blog/2015/03/how-to-quickly-configure-kerberos-for-your-apache-hadoop-cluster/
Cloudera Security manual (.pdf) for CDH 5.15, on the Cloudera Documentation website

Environment: Cloudera CDH 5.15 on CentOS 7, MIT KDC Kerberos

Setting up Kerberos in Cloudera CDH is somewhat tricky. The blog post above is a good step-by-step guide to the setup. Also refer to the official Cloudera Security .pdf document on … Continue reading Kerberos setup in Cloudera Hadoop

MicroStrategy Desktop connect to Impala

Environment: MicroStrategy Desktop 10.11, Cloudera CDH 5.12, Impala 2.x

Steps to connect MicroStrategy Desktop to Cloudera Impala:

The best thing about MicroStrategy Desktop, unlike Tableau Desktop, is that it is free to download and use, and it is a powerful BI visualization/query tool. Tableau Public Desktop is free, but it has only a few connectors and cannot connect to Hadoop … Continue reading MicroStrategy Desktop connect to Impala

ESRI-GIS Tools for Hadoop

The ESRI GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

References: https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive

Aggregation Sample for Hive: point-in-polygon-aggregation-hive

The following steps are taken from the above reference.

Step-1: Make a folder anywhere on the local server where Hive is installed:
$ mkdir /tmp/esri-git

Step-2: Bring down the git repository: … Continue reading ESRI-GIS Tools for Hadoop

Using Streamsets for ETL to/from Hadoop

[blog in progress - incomplete]

This blog will show some examples of doing ETL to or from Hadoop.

USE CASE #1: Use Sqoop commands inside Streamsets to copy data to Hadoop from RDBMS
https://www.youtube.com/watch?v=k8VbTR77l8M

REFERENCES:
https://streamsets.com/tutorials/
https://github.com/streamsets/tutorials
http://www.treselle.com/blog/import-and-ingest-data-into-hdfs-using-kafka-in-streamsets/
https://streamsets.com/blog/transform-data-streamsets-data-collector/
https://streamsets.com/blog/blogreplicating-relational-databases-with-streamsets-data-collector/
https://github.com/streamsets/tutorials/blob/master/tutorial-hivedrift/readme.md
http://blog.cloudera.com/blog/2016/02/how-to-build-a-real-time-search-system-using-streamsets-apache-kafka-and-cloudera-search/
https://www.youtube.com/watch?v=Gnvl30OJNao
https://www.youtube.com/watch?v=qAyFvC4c2n4

Streamsets install Oracle JDBC driver in External Library

This blog will show how to install the Oracle JDBC driver in the Streamsets External Library.

Environment: Cloudera CDH 5.12, Streamsets 3.1.2

TASK: Update the Oracle JDBC driver inside Streamsets
https://streamsets.com/documentation/datacollector/latest/help/#datacollector/UserGuide/Configuration/ExternalLibs.html#concept_pdv_qlw_ft

Step 1. Set Up an External Directory

Setting Up for Cloudera Manager

In Cloudera Manager, select the StreamSets service and then click Configuration. On the Configuration page, in the Data … Continue reading Streamsets install Oracle JDBC driver in External Library
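As a rough illustration of what an external library directory might hold once it is set up, the sketch below copies a driver JAR into a per-stage-library `lib` folder. The stage library name `streamsets-datacollector-jdbc-lib`, the `<external dir>/<stage library>/lib` layout, and the function name `install_driver` are assumptions for illustration, not taken from the excerpt; check them against the StreamSets documentation linked above.

```shell
#!/usr/bin/env bash
# Sketch: place a JDBC driver JAR in a Data Collector external library directory.
# Assumptions (not from the post): the stage library name and the
# <external dir>/<stage library>/lib layout. Confirm both in the StreamSets docs.

STAGE_LIB="streamsets-datacollector-jdbc-lib"   # assumed stage library name

# install_driver EXTERNAL_DIR DRIVER_JAR  (hypothetical helper)
install_driver() {
  ext_dir="$1"
  driver_jar="$2"
  target="$ext_dir/$STAGE_LIB/lib"

  # Refuse to continue if the driver file is missing.
  [ -f "$driver_jar" ] || { echo "driver not found: $driver_jar" >&2; return 1; }

  # Create the library folder if needed, then copy the driver into it.
  mkdir -p "$target" && cp "$driver_jar" "$target/"
}
```

A call might look like `install_driver /opt/sdc-extras /tmp/ojdbc8.jar`, where both paths are placeholders for whatever external directory and driver file are in use.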

Upgrade JDK1.7 to JDK1.8 in Cloudera CDH

Environment: Cloudera CDH 5.12 on RHEL

Requirements
Install one of the CDH and Cloudera Manager Supported JDK Versions.
Install the same version of the Oracle JDK on each host.
Install the JDK in /usr/java/jdk-version
All nodes must run the same JDK version.
Cloudera only supports the 64-bit JDK from Oracle.

Upgrading to Oracle JDK 1.8 in a Cloudera Manager Deployment … Continue reading Upgrade JDK1.7 to JDK1.8 in Cloudera CDH
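The "same JDK version on every host" requirement above can be checked with a small script. The following is a minimal sketch, not part of the Cloudera procedure: the helper names `jdk_version_from_banner` and `all_same_version` are made up for illustration, and it assumes the usual `java -version` banner format (`java version "1.8.0_181"`).

```shell
#!/usr/bin/env bash
# Sketch: helpers for verifying that hosts report the same JDK version.
# Function names are hypothetical; the banner format is an assumption.

# Extract the quoted version from `java -version` output,
# e.g. 'java version "1.8.0_181"' yields 1.8.0_181.
jdk_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only when every version passed as an argument is identical.
all_same_version() {
  first="$1"
  for v in "$@"; do
    [ "$v" = "$first" ] || return 1
  done
  return 0
}
```

On a single host, `jdk_version_from_banner "$(java -version 2>&1)"` prints the installed version (the banner goes to stderr, hence the redirect). Collecting that value from each node, for example over ssh, and passing the results to `all_same_version` gives a quick pass/fail check.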

Streamsets install using Cloudera Manager

Environment: CDH 5.12, STREAMSETS-3.1.2.0.jar

Follow the install instructions in the link below:
https://streamsets.com/documentation/datacollector/latest/help/index.html#datacollector/UserGuide/Installation/CMInstall-Overview.html#concept_nb5_c3m_25

Installation with Cloudera Manager

To install Data Collector through Cloudera Manager, perform the following steps:
Install the StreamSets custom service descriptor (CSD).
(Optional.) Manually install the parcel and checksum files. Typically only needed when the Cloudera Manager Server does not have internet access.
Download, … Continue reading Streamsets install using Cloudera Manager