Install Kudu in Cloudera CDH 5.16

Reference: Apache Kudu is a columnar storage engine in the Hadoop ecosystem that provides row-level insert, update, and delete capabilities for Impala tables. It stores data outside of HDFS in tablet files on the Hadoop datanodes. It is useful for fast IoT data storage, because data can be queried as soon as it is inserted into the table, unlike HDFS Hive …

Use pyodbc with Cloudera Impala ODBC and Kerberos

I initially tried the Python impyla package to connect to Cloudera Impala but ran into various errors and dependency issues. Also, 2 of 3 queries would hang or return errors. So I next tried pyodbc to connect to Impala. Linux System Requirements: The Cloudera ODBC Driver for Impala is recommended for Impala versions 2.8 through 3.3, and …
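As a rough illustration of the pyodbc approach, the sketch below builds a Kerberos connection string and runs a query. It is not the post's exact procedure. The driver name, host, and realm are hypothetical placeholders, and the keys (`AuthMech=1`, `KrbFQDN`, `KrbServiceName`, `KrbRealm`) are assumed to match the installed Cloudera ODBC driver's documentation. It also assumes a valid Kerberos ticket already exists (for example from `kinit`).

```python
def build_impala_conn_str(host, port=21050, krb_realm=None, krb_fqdn=None,
                          krb_service="impala",
                          driver="Cloudera ODBC Driver for Impala"):
    """Build an ODBC connection string for Kerberos authentication.

    The driver name is a placeholder: use the name registered in
    odbcinst.ini (or the full path to the driver .so file).
    """
    parts = {
        "Driver": driver,
        "Host": host,
        "Port": str(port),
        "AuthMech": "1",  # assumed: 1 selects Kerberos in the Cloudera driver
        "KrbFQDN": krb_fqdn or host,
        "KrbServiceName": krb_service,
    }
    if krb_realm:
        parts["KrbRealm"] = krb_realm
    return ";".join(f"{key}={value}" for key, value in parts.items())


def run_query(conn_str, sql):
    """Run one query and return all rows; needs pyodbc and a Kerberos ticket."""
    import pyodbc  # imported here so the string builder works without pyodbc

    # Impala has no transactions, so autocommit avoids driver errors.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

A call such as `run_query(build_impala_conn_str("impalad.example.com", krb_realm="EXAMPLE.COM"), "SELECT 1")` would then exercise the connection, with the host and realm replaced by real values.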

Python code examples

JSON PARSING EXAMPLE-1:

    import json

    # some JSON:
    x = '{"name": "John", "devlist": [{"name": "kiny", "age": 30, "city": "New York"}]}'

    # parse x:
    y = json.loads(x)

    # the result is a Python dictionary:
    print(y["name"])
    print(y["devlist"][0]["name"])

RESULT:

    John
    kiny

JSON PARSING EXAMPLE-2:

    import json

    # some JSON:
    loradevice = \
    '[{"devtype":"dlab","deviceeui": ["eui1", "eui2"]}, \
      {"devtype":"adenu","deviceeui": ["eui3", "eui4"]}]'
    # …

Use Windows VScode to edit Linux files.

VS Code is one of the best code editors, with lots of add-on packages. Although you can install VS Code on Linux and use X Windows to edit there, most people run VS Code on Windows and would like to edit Linux files from Windows within VS Code. There is a simple way to do this using the …

Install Python pip on RHEL/Centos 7

When you initially try to install pip you may get an error as below:

    # python --version
    Python 2.7.5
    [root]# yum install python-pip
    No package python-pip available.
    Error: Nothing to do

We need to install epel-release first:

    [root]# yum install epel-release
    Installed: epel-release.noarch 0:7-11

Next install pip:

    [root]# yum install python-pip
    Installed: python2-pip.noarch 0:8.1.2-10.el7 …

Useful SQL query examples

Example SQL queries which may be helpful:

This works in Impala SQL to round a Unix epoch time (in milliseconds) down to 30-minute intervals. For example, 19:15 and 19:25 will show as 19:00, and 19:31 and 19:50 will show as 19:30.

    SELECT from_timestamp(cast((epochtime div 1800000)*1800 as timestamp) + interval (epochtime % 1000) milliseconds, 'yyyy-MM-dd-HH:mm') as timeat30mininterval, …
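To check the 30-minute bucketing outside Impala, the same arithmetic can be sketched in Python. This is only an illustration of the integer-division step. The function name is made up, and it assumes the epoch value is in milliseconds and that the result is formatted in UTC.

```python
from datetime import datetime, timezone

HALF_HOUR_MS = 30 * 60 * 1000  # 1800000, the same divisor as in the SQL


def floor_to_30min(epoch_ms):
    """Round an epoch time in milliseconds down to its 30-minute bucket."""
    # Integer division drops the remainder, multiplying by 1800 gives seconds.
    bucket_start_s = (epoch_ms // HALF_HOUR_MS) * 1800
    bucket_start = datetime.fromtimestamp(bucket_start_s, tz=timezone.utc)
    return bucket_start.strftime("%Y-%m-%d-%H:%M")


# Build a sample timestamp for 19:15 UTC and bucket it.
sample_ms = int(datetime(2020, 1, 1, 19, 15, tzinfo=timezone.utc).timestamp() * 1000)
print(floor_to_30min(sample_ms))
```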

Streamsets renew JWT token to call api

Many JWT tokens expire hourly and need to be renewed before they can be passed in an API call. StreamSets auto-renewal of JWT tokens may not work, so here is another way to renew them. To simulate any API call and see what it sends to the API server, you can use the website: STEPS: …
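Separately from the StreamSets steps, it can help to know when a token is about to expire. The sketch below reads the standard `exp` claim from a JWT payload and reports whether renewal is due. The function names and the five-minute margin are illustrative assumptions, and the code does not verify the token's signature.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the 'exp' claim (seconds since epoch) of a JWT, or None."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def needs_renewal(token, margin_s=300, now=None):
    """True if the token expires within margin_s seconds (or already has)."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim, so nothing to renew on a timer
    current = time.time() if now is None else now
    return current >= exp - margin_s
```

A scheduler or pre-request step could call `needs_renewal(token)` and request a fresh token only when it returns `True`.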

Connect DBeaver SQL Tool to Cloudera Hive/Impala with Kerberos

DBeaver is a powerful, free, open-source SQL editor tool that can connect to 80+ different databases. The procedure below enables DBeaver to connect to Cloudera Hive/Impala using Kerberos. I initially tried to use the Cloudera JDBC connection, but it kept giving a Kerberos error: [Cloudera]ImpalaJDBCDriver Error initialized or created transport for authentication: [Cloudera]ImpalaJDBCDriver Unable …

Connect Excel to Cloudera Hive/Impala

The procedure below will help you connect Microsoft Excel to Cloudera Impala or Hive using the ODBC driver. First download and install the MIT Kerberos Client for Windows from "Kerberos for Windows Release 4.1 - current release". Make sure you get the Kerberos userid/password from the Cloudera administrator and that you are able to log in and get a …