Install Python pip on RHEL/CentOS 7

When you first try to install pip, you may get an error like the one below:

    # python --version
    Python 2.7.5
    [root]# yum install python-pip
    No package python-pip available.
    Error: Nothing to do

We need to install epel-release first:

    [root]# yum install epel-release
    Installed: epel-release.noarch 0:7-11

Next, install pip:

    [root]# yum install python-pip
    Installed: python2-pip.noarch 0:8.1.2-10.el7 …

Errors and debugging

In Jython, if you get a syntax error like the one below, check whether you used a capitalized keyword such as "If" instead of "if" in your code:

    javax.script.ScriptException: SyntaxError: no viable alternative at input 'xyz' in <script> at line number 116 at column number 15
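One way to hunt for this kind of typo before running a script is to scan it for names that would be keywords apart from their capitalization. The sketch below is written for Python 3 (CPython), not Jython, and uses the standard `tokenize` and `keyword` modules; the function name `find_miscased_keywords` is hypothetical. It only flags likely culprits and will not explain every "no viable alternative" error.

```python
import io
import keyword
import tokenize


def find_miscased_keywords(source):
    """Return (line, column, word) for names that are keywords apart from case.

    Line numbers are 1-based and columns are 0-based, as reported by tokenize.
    Strings and comments are skipped because they are not NAME tokens.
    """
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        word = tok.string
        # "If" is not a keyword but "if" is, so "If" is probably a typo.
        # True/False/None are real keywords and are therefore not flagged.
        if word not in keyword.kwlist and word.lower() in keyword.kwlist:
            hits.append((tok.start[0], tok.start[1], word))
    return hits


script = "x = 5\nIf x > 1:\n    y = 'big'\n"
print(find_miscased_keywords(script))  # [(2, 0, 'If')]
```

The reported line and column can then be compared with the line and column numbers in the ScriptException message.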

Useful SQL query examples

Some example SQL queries that may be helpful:

The query below works in Impala SQL to convert a Unix epoch time in milliseconds to 30-minute intervals. For example, times 19:15 and 19:25 will show as 19:00, and times 19:31 and 19:50 will show as 19:30.

    SELECT from_timestamp(cast((epochtime div 1800000)*1800 as timestamp) + interval (epochtime % 1000) milliseconds, 'yyyy-MM-dd-HH:mm') as timeat30mininterval, …
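To sanity-check the query's output, the same bucketing arithmetic can be reproduced outside Impala. This Python sketch assumes `epochtime` is in milliseconds and that the resulting timestamps are read as UTC; the function name `to_30min_bucket` is hypothetical. Integer division by 1,800,000 counts whole 30-minute intervals, and multiplying by 1800 turns that count back into seconds.

```python
from datetime import datetime, timezone

BUCKET_MS = 30 * 60 * 1000  # 1,800,000 ms = 30 minutes


def to_30min_bucket(epoch_ms):
    """Floor an epoch time in milliseconds to the start of its 30-minute bucket (UTC)."""
    bucket_start_s = (epoch_ms // BUCKET_MS) * 1800
    # %Y-%m-%d-%H:%M mirrors Impala's 'yyyy-MM-dd-HH:mm' pattern.
    return datetime.fromtimestamp(bucket_start_s, tz=timezone.utc).strftime("%Y-%m-%d-%H:%M")


print(to_30min_bucket(1609528500000))  # 2021-01-01 19:15 UTC -> 2021-01-01-19:00
```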

StreamSets renew JWT token to call API

Many JWT tokens expire hourly and must be renewed before they can be passed in an API call. StreamSets' automatic renewal of JWT tokens may not work, so here is another way to renew them.

STEPS:
PIPELINE-1: A separate, continuously running pipeline periodically renews the JWT token and stores it in a text file.
PIPELINE-2: The …
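A renewal pipeline needs some way to decide when the stored token is about to expire. A minimal sketch, assuming the token carries the standard `exp` claim (seconds since the epoch): decode the JWT payload without verifying the signature and compare `exp` with the current time. The function names, the 300-second margin, the demo token, and the file name `jwt_token.txt` are all hypothetical, not part of StreamSets.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the 'exp' claim (epoch seconds) from a JWT. Does NOT verify the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]


def needs_renewal(token, margin_s=300, now=None):
    """True if the token expires within margin_s seconds of now."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now <= margin_s


# In a real setup the token would come from the file written by PIPELINE-1, e.g.:
# token = open("jwt_token.txt").read().strip()   # hypothetical file name

def _b64url(obj):
    """Encode a dict as an unpadded base64url JSON segment (demo only)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Unsigned demo token with a made-up expiry at epoch second 1000.
demo_token = _b64url({"alg": "none"}) + "." + _b64url({"exp": 1000}) + "."
print(needs_renewal(demo_token, margin_s=300, now=800))  # True
```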

Connect DBeaver SQL Tool to Cloudera Hive/Impala with Kerberos

DBeaver (https://dbeaver.io/) is a powerful, free, open-source SQL editor that can connect to 80+ different databases. The procedure below will enable DBeaver to connect to Cloudera Hive/Impala using Kerberos. I initially tried to use the Cloudera JDBC connection, but it kept giving a Kerberos error: [Cloudera]ImpalaJDBCDriver Error initialized or created transport for authentication: [Cloudera]ImpalaJDBCDriver Unable …

Connect Excel to Cloudera Hive/Impala

The procedure below will help you connect Microsoft Excel to Cloudera Impala or Hive using the ODBC driver. First, download and install the MIT Kerberos client for Windows from "Kerberos for Windows Release 4.1 - current release". Make sure you get the Kerberos user ID/password from the Cloudera administrator and that you are able to log in and get a …

Run any ad-hoc SQL query in Power BI desktop

It is not clearly documented how to run an arbitrary SQL query in Power BI Desktop, but it is easy to do. First click Edit Queries in the top ribbon, then open the Advanced Editor and type in the SQL query as shown in the picture below. …

Connect Microsoft Power BI desktop to Cloudera Impala or Hive with Kerberos

Microsoft Power BI Desktop is free and can connect to a Cloudera Impala or Hive database with Kerberos security enabled. This post only shows the Impala driver, but you can use the same procedure with the Hive driver. The basic steps are: install the MIT Kerberos client for Windows and make sure you …

Anaconda Python notes

Some notes on the Anaconda Python package manager:

Reference: https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c

- Conda is the main installer for the Anaconda packages.
- Conda can be used to create multiple environments with different Python or other package versions.
- The Anaconda packages are installed under /<some path>/Anaconda3/pkgs and other sub-directories.
- Inside a new Conda installation, the root environment is activated by default, so you …
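With several Conda environments around, it helps to confirm which one a script or notebook is actually running in. A small Python sketch, assuming the environment was activated with `conda activate` (which sets the `CONDA_DEFAULT_ENV` variable); the function name `current_environment` is hypothetical. `sys.prefix` points at the root of the active environment's installation.

```python
import os
import sys


def current_environment():
    """Report which Python interpreter and Conda environment this process runs in."""
    return {
        "python": sys.version.split()[0],
        "prefix": sys.prefix,
        # Set by `conda activate`; None when no Conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in current_environment().items():
    print(f"{key}: {value}")
```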

Use Pandas in Jupyter PySpark3 kernel to query Hive table

The following Python code reads a Hive table and converts it to a Pandas DataFrame so you can use Pandas to process the rows.

NOTE: Be careful when you copy/paste the code below: the double quotes may get changed and must be retyped, otherwise they cause a syntax error.

    import pandas as pd
    from pyspark import SparkConf, SparkContext
    …
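Instead of retyping the quotes by hand, the typographic ("smart") quotes that blog pages substitute can be mapped back to plain ASCII quotes before running the pasted code. A minimal Python sketch; the function name `straighten_quotes` and the sample line with `my_table` are hypothetical. Note that it also rewrites apostrophes inside comments and strings, which is usually harmless.

```python
# Map typographic quotes back to the ASCII quotes Python expects.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(code):
    """Return code with smart quotes replaced by straight ASCII quotes."""
    return code.translate(SMART_QUOTES)


pasted = "df = spark.sql(\u201cSELECT * FROM my_table\u201d)"
print(straighten_quotes(pasted))  # df = spark.sql("SELECT * FROM my_table")
```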