Tableau Server connect to Cloudera Hive with MIT Kerberos

We know Tableau Desktop works with MIT Kerberos on Windows to connect to Cloudera Hive/Impala. But the Tableau support sites give confusing information about whether Tableau Server can work with MIT Kerberos in a Windows environment. There is a note that Kerberos delegation requires Active Directory and that MIT Kerberos is not supported. But let …

Run any ad-hoc SQL query in Power BI desktop

It is not clearly documented how to run an arbitrary SQL query in Power BI Desktop, but it is easy to do: first click Edit Queries in the top ribbon, then open the Advanced Editor and type in the SQL query as shown in the picture below. …

Connect Microsoft Power BI desktop to Cloudera Impala or Hive with Kerberos

Microsoft Power BI Desktop is free and can connect to a Cloudera Impala or Hive database with Kerberos security enabled. This post shows only the Impala driver, but the same procedure works with the Hive driver. The basic steps are: Install the MIT Kerberos client for Windows and make sure you …

Use Pandas in Jupyter PySpark3 kernel to query Hive table

The following Python code reads a Hive table and converts it to a Pandas dataframe so you can use Pandas to process the rows. NOTE: Be careful when copying and pasting the code below: the double quotes need to be retyped, because they get changed and cause a syntax error.

import pandas as pd
from pyspark import SparkConf, SparkContext …
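The excerpt above is truncated, so the following is only a minimal sketch of the same idea and not the post's own listing. It assumes a Spark 2.x kernel where `SparkSession` with Hive support is available; the function names `build_select` and `hive_table_to_pandas` and the table name `default.sample_07` are hypothetical, chosen for illustration.

```python
import re

# Accepts "table" or "database.table"; anything else is rejected so that
# arbitrary text never ends up inside the SQL string.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_select(table, limit=1000):
    """Build a bounded SELECT so the driver does not pull a huge table into memory."""
    if not _IDENT.match(table):
        raise ValueError("unexpected table name: %r" % table)
    return "SELECT * FROM {} LIMIT {}".format(table, int(limit))


def hive_table_to_pandas(table, limit=1000):
    """Run the query through Spark and return the rows as a Pandas dataframe."""
    # Imported lazily so build_select() can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-to-pandas")
        .enableHiveSupport()  # needed for Spark SQL to see Hive tables
        .getOrCreate()
    )
    return spark.sql(build_select(table, limit)).toPandas()


if __name__ == "__main__":
    # Hypothetical table name; replace it with a table that exists in your cluster.
    print(build_select("default.sample_07", 10))
```

Because `toPandas()` collects every row onto the driver, the `LIMIT` is there to keep the result small; raise it only when the driver has memory to spare.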

Tableau Desktop connect to Cloudera Hadoop using Kerberos

Reference: http://website4everything.blogspot.com/2015/04/connecting-tableau-to-hive-server-2.html

The basic steps to connect Tableau to Cloudera Hive or Impala with Kerberos authentication are: Download and install the MIT Kerberos Client for Windows. Set up C:\ProgramData\MIT\Kerberos5\krb5.ini with the Kerberos realm and server details. (Optional) The KRB5CCNAME system environment variable may at times need to be set to a temporary value: FILE:C:\temp\kerberos\krb5cache. Start the MIT …
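To make the krb5.ini step more concrete, here is a minimal sketch that renders the text of such a file. The realm `EXAMPLE.COM` and the host `kdc.example.com` are placeholders, the helper names are hypothetical, and a real cluster may need more settings than the two sections shown; the values to use come from your Kerberos administrator.

```python
def render_krb5_ini(realm, kdc, admin_server=None):
    """Return minimal krb5.ini text with a default realm and one KDC entry."""
    # Assumption: when no admin server is given, the KDC host also serves that role.
    admin_server = admin_server or kdc
    lines = [
        "[libdefaults]",
        "  default_realm = {}".format(realm),
        "",
        "[realms]",
        "  {} = {{".format(realm),
        "    kdc = {}".format(kdc),
        "    admin_server = {}".format(admin_server),
        "  }",
        "",
    ]
    return "\n".join(lines)


def ccache_env_value(path):
    """Return the value for KRB5CCNAME that points at a file-based ticket cache."""
    return "FILE:" + path


if __name__ == "__main__":
    # Placeholder realm and host; print the text to review it before saving it
    # as C:\ProgramData\MIT\Kerberos5\krb5.ini.
    print(render_krb5_ini("EXAMPLE.COM", "kdc.example.com"))
    print("KRB5CCNAME=" + ccache_env_value(r"C:\temp\kerberos\krb5cache"))
```

The sketch only builds strings; writing the file and setting the system environment variable are left as manual steps, since both usually need administrator rights on Windows.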

Run a Python program to access Hadoop webhdfs and Hive with Kerberos enabled

The following Python code makes REST calls to a secure, Kerberos-enabled Hadoop cluster, using the webhdfs REST API to get file data. You first need to run $ kinit userid@REALM to authenticate and obtain the Kerberos ticket for the user. Make sure the Python modules requests and requests_kerberos are installed. Otherwise install them, for example: …
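Since the post's code is cut off in this excerpt, the following is a hedged sketch of what such a call can look like, not the author's listing. It assumes a valid ticket from kinit, the NameNode web port 50070 (the Hadoop 2.x default, which may differ on your cluster), and a placeholder host `nn.example.com`; the helper names `webhdfs_url` and `read_hdfs_file` are hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op="OPEN", port=50070, scheme="http", **params):
    """Build a WebHDFS REST URL of the form /webhdfs/v1/<path>?op=<OP>."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode(dict(op=op, **params))
    return "{}://{}:{}/webhdfs/v1{}?{}".format(scheme, host, port, quote(path), query)


def read_hdfs_file(host, path, port=50070):
    """Fetch a file's bytes over WebHDFS using the current Kerberos ticket."""
    # Imported lazily so webhdfs_url() works even if these modules are missing.
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # SPNEGO negotiation uses the ticket cache populated by kinit.
    auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    resp = requests.get(webhdfs_url(host, path, port=port), auth=auth)
    resp.raise_for_status()  # fail loudly on 401/403/404 instead of returning an error page
    return resp.content


if __name__ == "__main__":
    # Placeholder host and path, shown only to illustrate the URL shape.
    print(webhdfs_url("nn.example.com", "/tmp/data.csv"))
    print(webhdfs_url("nn.example.com", "/tmp", op="LISTSTATUS"))
```

Keeping the URL construction separate from the network call makes it easy to check the request shape before pointing it at a real cluster.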

MicroStrategy Desktop connect to Impala

Environment: MicroStrategy Desktop 10.11, Cloudera CDH 5.12, Impala 2.x. Steps to connect MicroStrategy Desktop to Cloudera Impala: The best thing about MicroStrategy Desktop, unlike Tableau Desktop, is that it is free to download and use, and it is a powerful BI visualization/query tool. Tableau Public Desktop is free, but it has only a few connectors and cannot connect to Hadoop …

ESRI-GIS Tools for Hadoop

The ESRI GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data. Reference: https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive Aggregation sample for Hive: point-in-polygon-aggregation-hive. The following steps are taken from the above reference. Step 1: Make a folder anywhere on the local server where Hive is installed: $ mkdir /tmp/esri-git Step 2: Bring down the git repository: …

Use Talend Open Studio for Big Data to ETL to Hadoop

Talend Open Studio for Big Data is a powerful, open source ETL tool. You can download it and use it to do ETL to and from Hadoop, including both HDFS and Hive. Talend install steps: Download the free Talend Open Studio for Big Data from https://www.talend.com/products/big-data/big-data-open-studio/ The download file location is set to c:\temp …

Connect ElasticSearch to Cloudera Hadoop using ES-Hadoop.

[CAUTION: The ES-Hadoop jars currently cause errors with Cloudera CDH, and Hue reports that multiple jars were found, so the process below is not working. Use these instructions at your own risk: they may not work, and no solution has been found so far.] Environment: Cloudera CDH 5.12.x, elasticsearch-hadoop-6.2.1 …