The following Python code reads a Hive table and converts it to a Pandas DataFrame so you can use Pandas to process the rows. NOTE: Be careful when copying and pasting the code below: the double quotes need to be retyped, because they get changed and cause a syntax error. import pandas as pd from pyspark import SparkConf, SparkContext … Continue reading Use Pandas in Jupyter PySpark3 kernel to query Hive table
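The excerpt above is cut off before the actual query code, so here is a minimal sketch of the general idea rather than the post's own code. It assumes a Spark 2.x or later `SparkSession` with Hive support instead of the `SparkConf`/`SparkContext` imports shown in the excerpt. The helper names and the `max_rows` parameter are hypothetical and were added for illustration.

```python
def build_session(app_name="hive-to-pandas"):
    """Create a SparkSession that can see Hive tables.

    Assumption: pyspark is installed and the cluster's Hive configuration
    is visible to Spark. In a Jupyter PySpark kernel a session usually
    already exists, and this helper is then unnecessary.
    """
    # Imported lazily so the rest of this module loads without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()
        .getOrCreate()
    )


def hive_table_to_pandas(spark, table_name, max_rows=1000):
    """Read a Hive table through Spark and return a Pandas DataFrame.

    toPandas() collects every row onto the driver, so the row count is
    capped first (max_rows is a hypothetical safety limit).
    """
    spark_df = spark.table(table_name).limit(max_rows)
    return spark_df.toPandas()
```

A call would look like `hive_table_to_pandas(build_session(), "default.my_table")`, where `default.my_table` is a placeholder table name. The result is an ordinary Pandas DataFrame, so the usual Pandas methods apply to it.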
Reference: http://website4everything.blogspot.com/2015/04/connecting-tableau-to-hive-server-2.html The basic steps to connect Tableau to Cloudera Hive or Impala with Kerberos authentication are: download and install the MIT Kerberos Client for Windows; set up C:\ProgramData\MIT\Kerberos5\krb5.ini with the Kerberos realm and server details; (optional) the KRB5CCNAME system environment variable may at times need to be set to a temporary value: FILE:C:\temp\kerberos\krb5cache … Continue reading Tableau Desktop connect to Cloudera Hadoop using Kerberos
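The excerpt does not show the krb5.ini contents. As a rough illustration of what "realm and server details" usually means, this Python sketch renders a minimal krb5.ini for a single realm. The function name is hypothetical, and the realm, KDC host and DNS domain are placeholders that you would replace with your cluster's values.

```python
def render_krb5_ini(realm, kdc_host, dns_domain):
    """Return the text of a minimal krb5.ini for one Kerberos realm.

    All three arguments are placeholders supplied by the caller; nothing
    here comes from the original post.
    """
    realm = realm.upper()  # realm names are conventionally upper case
    lines = [
        "[libdefaults]",
        f"    default_realm = {realm}",
        "",
        "[realms]",
        f"    {realm} = {{",
        f"        kdc = {kdc_host}",
        f"        admin_server = {kdc_host}",
        "    }",
        "",
        "[domain_realm]",
        f"    .{dns_domain} = {realm}",
        f"    {dns_domain} = {realm}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_krb5_ini("EXAMPLE.COM", "kdc.example.com", "example.com")` returns text that could be saved as C:\ProgramData\MIT\Kerberos5\krb5.ini. A real cluster may need extra settings, such as encryption types or several KDCs.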
Environment: MicroStrategy Desktop 10.11, Cloudera CDH 5.12, Impala 2.x. Steps to connect MicroStrategy Desktop to Cloudera Impala: The best thing about MicroStrategy Desktop is that, unlike Tableau Desktop, it is free to download and use, and it is a powerful BI visualization and query tool. Tableau Public Desktop is free, but it has only a few connectors and cannot connect to Hadoop … Continue reading MicroStrategy Desktop connect to Impala
The ESRI GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data. Reference: https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive Aggregation sample for Hive: point-in-polygon-aggregation-hive. The following steps are taken from the above reference. Step-1: Make a folder anywhere on the local server where Hive is installed: $ mkdir /tmp/esri-git Step-2: Bring down the git repository: … Continue reading ESRI-GIS Tools for Hadoop
Talend Open Studio for Big Data is a powerful open-source ETL tool. You can download it and use it to do ETL to and from Hadoop, including both HDFS and Hive. Talend install steps: Download the free Talend Open Studio for Big Data from https://www.talend.com/products/big-data/big-data-open-studio/ The download file location is set to c:\temp … Continue reading Use Talend Open Studio for Big Data to ETL to Hadoop
[CAUTION: The ES-Hadoop jars currently produce errors with Cloudera CDH, and Hue reports that multiple jars were found, so the process below is not working. Use these instructions at your own risk: they may not work, and I have not found a solution yet.] Environment: Cloudera CDH 5.12.x, elasticsearch-hadoop-6.2.1 … Continue reading Connect ElasticSearch to Cloudera Hadoop using ES-Hadoop.
Free or open-source BI / ETL tools: Talend = ETL tool, leader in Gartner Magic Quadrant; StreamSets = ETL tool; Apache NiFi = ETL tool; Pentaho = desktop and server version BI/ETL tool; HUE = Hadoop analytics server, BI, query tool; KNIME = data science leader in Gartner Magic Quadrant 2017, desktop version; Jupyter Notebook … Continue reading Business Intelligence, ETL and Data Science tools
Environment: Oracle SQL Developer version 18.1.0.095 on 64-bit Windows, and Hive on Cloudera Hadoop CDH 5.12.x. Config steps: First download and install the popular free tool Oracle SQL Developer for Windows from the Oracle website. Read this blog post for a good overview of connecting Oracle SQL Developer to Hadoop Hive: https://blogs.oracle.com/bigdataconnectors/move-data-between-apache-hadoop-and-oracle-database-with-sql-developer Note when configuring Cloudera-Hive JDBC drivers … Continue reading Query Cloudera Hadoop Hive using Oracle SQL Developer.