Use Pandas in Jupyter PySpark3 kernel to query Hive table

Following python code will read a Hive table and convert to Pandas dataframe so you can use Pandas to process the rows. NOTE: Be careful when copy/paste the below code the double quotes need to be retyped as they get changed and gives syntax error. -------------------------------------------------------------------------------------------------------------- import pandas as pd from pyspark import SparkConf, SparkContext … Continue reading Use Pandas in Jupyter PySpark3 kernel to query Hive table

Advertisements

Tableau Desktop connect to Cloudera Hadoop using Kerberos

Reference: http://website4everything.blogspot.com/2015/04/connecting-tableau-to-hive-server-2.html The basic steps to connect Tableau to Cloudera Hive or Impala with Kerberos authentication involves the following steps: Download and Install the MIT Kerberos Client for Window Set the C:\ProgramData\MIT\Kerberos5\krb5.ini with  the Kerberos realm and server details (Optional) KRB5CCNAME system environment variable may need to be set at times to a temporary value: FILE:C:\temp\kerberos\krb5cache … Continue reading Tableau Desktop connect to Cloudera Hadoop using Kerberos

Business Intelligence, ETL and Data Science tools

Free or Opensource BI / ETL tools: Talend = ETL tool, leader in Gartner Magic Quadrant Streamsets = ETL tool Apache Nifi = ETL tool Pentaho = desktop and server version BI/ETL tool HUE = Hadoop Analytics server, BI, Query tool KNIME = Data Science leader in Gartner Magic Quadrant 2017 desktop version Jupyter Notebook … Continue reading Business Intelligence, ETL and Data Science tools