Use Pandas in Jupyter PySpark3 kernel to query Hive table

The following Python code reads a Hive table and converts it to a Pandas DataFrame so you can use Pandas to process the rows. NOTE: Be careful when copying and pasting the code below: the double quotes need to be retyped, because they get changed in the copy and cause a syntax error.
import pandas as pd
from pyspark import SparkConf, SparkContext
…
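The excerpt above is cut off before the full code, so the following is only a rough sketch of the idea it describes, not the original post's code. It assumes Spark 2 or later with Hive support and uses SparkSession rather than the SparkConf/SparkContext imports shown above. The function names and the table name default.sample_07 are hypothetical.

```python
def build_select(table, limit=None):
    """Build a simple SELECT statement for a Hive table (hypothetical helper)."""
    query = "SELECT * FROM {}".format(table)
    if limit is not None:
        query += " LIMIT {}".format(int(limit))
    return query


def hive_table_to_pandas(table, limit=1000):
    """Read a Hive table through Spark SQL and return a Pandas DataFrame."""
    # Imported lazily so build_select() can be used without Spark installed.
    from pyspark.sql import SparkSession

    # Assumption: the kernel can reach the Hive metastore. Some notebook
    # kernels already provide a session; getOrCreate() reuses it if one exists.
    spark = (SparkSession.builder
             .appName("hive-to-pandas")
             .enableHiveSupport()
             .getOrCreate())
    # toPandas() collects all selected rows to the driver, so keep the limit small.
    return spark.sql(build_select(table, limit)).toPandas()


if __name__ == "__main__":
    # "default.sample_07" is a placeholder; substitute a table that exists.
    df = hive_table_to_pandas("default.sample_07", limit=100)
    print(df.head())
```

Because toPandas() pulls the result into the driver's memory, it is usually safer to filter or limit in the SQL before converting.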

Cloudera Hadoop Data Encryption at rest Notes

In Cloudera Hadoop, a few components are used to implement data encryption at rest: The Key Management Server (KMS) uses the Key Trustee Server as the underlying keystore instead of the file-based Java KeyStore (JKS) used by the default Hadoop KMS. Cloudera Navigator Key Trustee Server is the actual keystore for the encryption keys …

Tableau Desktop connect to Cloudera Hadoop using Kerberos

Reference: http://website4everything.blogspot.com/2015/04/connecting-tableau-to-hive-server-2.html
The basic steps to connect Tableau to Cloudera Hive or Impala with Kerberos authentication are:
1. Download and install the MIT Kerberos Client for Windows.
2. Configure C:\ProgramData\MIT\Kerberos5\krb5.ini with the Kerberos realm and server details.
3. (Optional) The KRB5CCNAME system environment variable may need to be set at times to a temporary value: FILE:C:\temp\kerberos\krb5cache
…

Kerberos commands

Common Kerberos commands:
1. Change the password of a principal (user):
$ kadmin.local
kadmin.local: cpw <principalname>
Enter password for principal "principalname@REALM.COM":
2. Initialize a Kerberos ticket:
$ kinit <principalname>
To get detailed verbose output, use the options below:
$ KRB5_TRACE=/dev/stdout kinit -V
3. Destroy the current ticket:
$ kdestroy
4. Check the status of the Kerberos KDC:
$ systemctl status kadmin
$ systemctl …
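When scripting around these commands, it can help to check for a usable ticket before doing anything else. The sketch below assumes the MIT Kerberos klist is on the PATH and relies on its -s option, which prints nothing and reports through the exit status whether the credentials cache holds a valid ticket. The function name has_valid_ticket is hypothetical.

```python
import shutil
import subprocess


def has_valid_ticket():
    """Return True if MIT klist reports a readable, unexpired credentials cache."""
    klist = shutil.which("klist")
    if klist is None:
        # No klist on the PATH: treat this the same as having no ticket.
        return False
    # "klist -s" is silent; exit status 0 means a valid ticket is present.
    return subprocess.run([klist, "-s"]).returncode == 0


if __name__ == "__main__":
    if has_valid_ticket():
        print("Kerberos ticket found.")
    else:
        print("No valid ticket; run kinit <principalname> first.")
```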

Access webhdfs using Kerberos from laptop client

The following blog post shows how to access a Kerberized Hadoop cluster from a web browser on a laptop: https://community.hortonworks.com/articles/28537/user-authentication-from-windows-workstation-to-hd.html The steps there mostly work as written, with one change. In step 3, set network.negotiate-auth.gsslib = C:\Program Files\MIT\Kerberos\bin\gssapi64.dll instead of gssapi32.dll, since we mostly use 64-bit Firefox, which does not work with the 32-bit DLL.

Run a Python program to access Hadoop webhdfs with Kerberos enabled

The following Python code makes REST calls to a secure, Kerberos-enabled Hadoop cluster, using the webhdfs REST API to get file data. You first need to run $ kinit userid@REALM to authenticate and obtain the Kerberos ticket for the user. Make sure the Python modules requests and requests_kerberos have been installed. Otherwise install it …
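The excerpt stops before the code itself, so here is a minimal sketch of what such a call can look like rather than the original post's program. It assumes a ticket was already obtained with kinit and that requests and requests_kerberos are installed. The host nn.example.com, port 50070, the file path, and both function names are placeholders.

```python
from urllib.parse import quote


def webhdfs_url(host, port, path, op):
    """Build a WebHDFS REST URL for an absolute HDFS path (hypothetical helper)."""
    return "http://{}:{}/webhdfs/v1{}?op={}".format(host, port, quote(path), op)


def read_hdfs_file(host, port, path):
    """Fetch a file's contents over WebHDFS using the current Kerberos ticket."""
    # Imported lazily so webhdfs_url() works without these packages installed.
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # SPNEGO authentication based on the ticket cache populated by kinit.
    auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    response = requests.get(webhdfs_url(host, port, path, "OPEN"), auth=auth)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Placeholder NameNode host, HTTP port, and file path; adjust for your cluster.
    print(read_hdfs_file("nn.example.com", 50070, "/user/userid/file.txt"))
```

Without a valid ticket the request would typically fail with HTTP 401, which raise_for_status() turns into an exception.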

Run a Java program in Hadoop with Kerberos enabled.

The following steps are needed to run a Java program in Hadoop with Kerberos security enabled:
1. Create a text file named FileCount.java and store it in your home directory, such as /home/userid
2. Copy and paste the code below into the file FileCount.java. Change the Hadoop hostname from quickstart.cloudera:8020 to the correct host.
import java.io.*; import …

Kerberos, SPNEGO and WebHDFS on Hadoop using Chrome browser:

Reference: http://www.ghostar.org/2015/06/google-chrome-spnego-and-webhdfs-on-hadoop/
We want to see whether the Chrome browser can be used to authenticate users with Kerberos and display Hadoop webhdfs REST API data. In the Cloudera Security .pdf manual, follow these steps: Step 9: (Optional) Enable Authentication for HTTP Web Consoles for Hadoop Roles …

Kerberos setup in Cloudera Hadoop

Reference:
http://blog.cloudera.com/blog/2015/03/how-to-quickly-configure-kerberos-for-your-apache-hadoop-cluster/
Cloudera Security manual .pdf – CDH 5.15 on Cloudera Documentation website
http://www.ghostar.org/2015/06/google-chrome-spnego-and-webhdfs-on-hadoop/
https://www.youtube.com/watch?v=4TwU0LwDJAg
Environment: Cloudera CDH 5.15 on CentOS 7, MIT KDC Kerberos
Setting up Kerberos in Cloudera CDH is somewhat tricky. The Cloudera blog post above is a good step-by-step guide to the setup. Also refer to the official Cloudera Security …