Useful SQL query examples

Example SQL queries that may be helpful. This works in Impala SQL to truncate a Unix epoch time (in milliseconds) to 30-minute intervals: for example, 19:15 and 19:25 will show as 19:00, while 19:31 and 19:50 will show as 19:30, etc. SELECT from_timestamp(cast((epochtime div 1800000)*1800 as timestamp) + interval (epochtime % 1000) milliseconds, 'yyyy-MM-dd-HH:mm') as timeat30mininterval, … Continue reading Useful SQL query examples
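The bucketing arithmetic in the Impala query above can be sketched in plain Python to check the expected behaviour (a hypothetical helper, assuming the epoch time is in milliseconds, as the query's `div 1800000` implies):

```python
from datetime import datetime, timezone

def to_30min_bucket(epoch_ms: int) -> datetime:
    """Truncate an epoch time in milliseconds to its 30-minute bucket,
    mirroring the (epochtime div 1800000) * 1800 arithmetic in the
    Impala query (1800000 ms = 1800 s = 30 minutes)."""
    return datetime.fromtimestamp((epoch_ms // 1_800_000) * 1800, tz=timezone.utc)

# 19:15 falls into the 19:00 bucket; 19:31 into the 19:30 bucket.
t1915 = int(datetime(2024, 1, 1, 19, 15, tzinfo=timezone.utc).timestamp() * 1000)
t1931 = int(datetime(2024, 1, 1, 19, 31, tzinfo=timezone.utc).timestamp() * 1000)
print(to_30min_bucket(t1915).strftime("%H:%M"))  # 19:00
print(to_30min_bucket(t1931).strftime("%H:%M"))  # 19:30
```

The `interval (epochtime % 1000) milliseconds` term in the original query only restores the sub-second part for formatting; the integer division above is what does the actual rounding.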

Connect DBeaver SQL Tool to Cloudera Hive/Impala with Kerberos

DBeaver (https://dbeaver.io/) is a powerful, free, open-source SQL editor that can connect to 80+ different databases. The procedures below will enable DBeaver to connect to Cloudera Hive/Impala using Kerberos. Initially I tried to use the Cloudera JDBC connection, but it kept giving a Kerberos error: [Cloudera]ImpalaJDBCDriver Error initialized or created transport for authentication: [Cloudera]ImpalaJDBCDriver Unable … Continue reading Connect DBeaver SQL Tool to Cloudera Hive/Impala with Kerberos

Use Beeline to query Hive table

An example of how to query with Beeline. Log in with your userid on the Linux server: [userxyz]$ beeline beeline> !connect jdbc:hive2://hive-server-hostname:10000/default;principal=hive/_HOST@XYZREALM.COM Error: Could not open client transport with JDBC Uri: : GSS initiate failed (state=08S01,code=0) This error occurs because kinit has not been done, so run $ kinit userxyz beeline> !connect jdbc:hive2://hive-server-hostname:10000/default;principal=hive/_HOST@XYZREALM.COM Connected to: Apache Hive (version 1.1.0-cdh5.16.1) … Continue reading Use Beeline to query Hive table
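The Beeline connect string in the excerpt follows a fixed pattern; a small hypothetical helper (the function name and defaults are illustrative, not from the post) makes the pieces explicit:

```python
def hive_jdbc_url(host: str, port: int = 10000, db: str = "default",
                  principal: str = "hive/_HOST@XYZREALM.COM") -> str:
    """Build a kerberized HiveServer2 JDBC URL of the form used with
    beeline's !connect command. The _HOST placeholder is substituted
    by the Hive client with the server's canonical hostname."""
    return f"jdbc:hive2://{host}:{port}/{db};principal={principal}"

print(hive_jdbc_url("hive-server-hostname"))
# jdbc:hive2://hive-server-hostname:10000/default;principal=hive/_HOST@XYZREALM.COM
```

As the excerpt shows, a valid URL is still not enough without a Kerberos ticket: run `kinit` first, or the GSS initiate step fails.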

Transfer parquet Hive table from one Hadoop cluster to another

EXAMPLE: HOW TO TRANSFER A PARQUET HIVE TABLE FROM ONE CLUSTER TO ANOTHER. First, create a new table with CTAS to combine multiple Hive table parquet files into a single parquet file for ease of transfer from one cluster to another. In the source cluster, create a new table: CREATE TABLE default.mynewtable stored as PARQUET AS … Continue reading Transfer parquet Hive table from one Hadoop cluster to another
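The CTAS (CREATE TABLE ... AS SELECT) step above can be illustrated with an in-memory SQLite database. This is only a sketch of the SQL pattern; the post applies the same idea to a parquet-backed Hive table, where the CTAS also consolidates many small files into one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the existing Hive table (illustrative schema, not from the post).
conn.execute("CREATE TABLE src (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
# CTAS: materialize the query result as a new table in one statement,
# analogous to CREATE TABLE default.mynewtable stored as PARQUET AS SELECT ...
conn.execute("CREATE TABLE mynewtable AS SELECT * FROM src")
count = conn.execute("SELECT COUNT(*) FROM mynewtable").fetchone()[0]
print(count)  # 3
```

SQLite has no STORED AS clause; in Hive/Impala that clause is what controls the file format of the new table.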

Connect Excel to Cloudera Hive/Impala

The procedure below will help you connect Microsoft Excel to Cloudera Impala or Hive using the ODBC driver. First, download and install the MIT Kerberos Client for Windows from Kerberos for Windows Release 4.1 - current release. Make sure you get the Kerberos userid/password from the Cloudera administrator and that you are able to log in and get a … Continue reading Connect Excel to Cloudera Hive/Impala

Tableau Server connect to Cloudera Hive with MIT Kerberos

We know Tableau Desktop works with MIT Kerberos on Windows to connect to Cloudera Hive/Impala. But there is some confusing information on Tableau support sites about whether Tableau SERVER can work with MIT Kerberos in a Windows environment. There is a note that Kerberos delegation requires Active Directory and that MIT Kerberos is not supported. But let … Continue reading Tableau Server connect to Cloudera Hive with MIT Kerberos

Run any ad-hoc SQL query in Power BI desktop

It is not clearly documented how to run an arbitrary SQL query in Power BI Desktop, but it is definitely possible and easy: first click Edit Queries in the top ribbon, then go to Advanced Editor and type in the SQL query as shown in the picture below. … Continue reading Run any ad-hoc SQL query in Power BI desktop

Connect Microsoft Power BI desktop to Cloudera Impala or Hive with Kerberos

Microsoft Power BI Desktop is free and can successfully connect to a Cloudera Impala or Hive database with Kerberos security enabled. The blog below only shows the Impala driver, but you can use the same procedure with the Hive driver as well. The basic steps are: install the MIT Kerberos client for Windows and make sure you … Continue reading Connect Microsoft Power BI desktop to Cloudera Impala or Hive with Kerberos

Use Pandas in Jupyter PySpark3 kernel to query Hive table

The following Python code will read a Hive table and convert it to a Pandas DataFrame so you can use Pandas to process the rows. NOTE: Be careful when copying/pasting the code below; the double quotes need to be retyped, as they get changed and give a syntax error. -------------------------------------------------------------------------------------------------------------- import pandas as pd from pyspark import SparkConf, SparkContext … Continue reading Use Pandas in Jupyter PySpark3 kernel to query Hive table

Cloudera Hadoop Data Encryption at rest Notes

In Cloudera Hadoop there are a few components that are used to implement data encryption at rest. The Key Management Server (KMS) uses the Key Trustee Server as the underlying keystore instead of the file-based Java KeyStore (JKS) used by the default Hadoop KMS. Cloudera Navigator Key Trustee Server is the actual keystore for the encryption keys … Continue reading Cloudera Hadoop Data Encryption at rest Notes