Use pyodbc with Cloudera Impala ODBC and Kerberos

I initially tried the Python impyla package to connect to Cloudera Impala but ran into various errors and dependency issues. In addition, 2 of 3 queries would hang or return errors, so I next tried pyodbc to connect to Impala. Linux System Requirements: The Cloudera ODBC Driver for Impala is recommended for Impala versions 2.8 through 3.3, and …
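As a rough sketch of what a pyodbc connection to Impala with Kerberos can look like: the driver path, host, and realm below are placeholders, and the connection-string keys (AuthMech, KrbRealm, KrbFQDN, KrbServiceName) are assumed from the Cloudera ODBC driver's configuration options, so check them against the install guide for your driver version. A valid Kerberos ticket (from kinit) is assumed to exist before connecting.

```python
def impala_connection_string(
    host,
    realm,
    port=21050,
    # Placeholder: a common Linux install location, adjust to your system
    driver="/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so",
):
    """Build a DSN-less ODBC connection string for Impala with Kerberos."""
    parts = {
        "Driver": driver,
        "Host": host,
        "Port": port,
        "AuthMech": 1,  # assumed: 1 selects Kerberos in the Cloudera driver
        "KrbRealm": realm,
        "KrbFQDN": host,
        "KrbServiceName": "impala",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def query_impala(conn_str, sql):
    """Run one query and return all rows (needs pyodbc and the ODBC driver)."""
    import pyodbc  # imported here so the string helper works without pyodbc

    # autocommit=True because Impala does not support transactions
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


print(impala_connection_string("impalad.example.com", "EXAMPLE.COM"))
```

With the driver installed, query_impala(conn_str, "SELECT 1") would be a quick way to confirm that the connection works.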

Python code examples

JSON PARSING EXAMPLE-1:

import json

# some JSON:
x = '{ "name": "John", "devlist": [ {"name":"kiny", "age":30,"city":"New York"}]}'

# parse x:
y = json.loads(x)

# the result is a Python dictionary:
print(y["name"])
print(y["devlist"][0]["name"])

-----------------------RESULT-----------------------------
John
kiny

JSON PARSING EXAMPLE-2:

import json

# some JSON:
loradevice = \
'[{"devtype":"dlab","deviceeui": ["eui1", "eui2"]}, \
  {"devtype":"adenu","deviceeui": ["eui3", "eui4"]}]'
# …
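The second example is cut off before the list is parsed. One possible way to continue it is sketched below; the loop and the eui_to_type lookup table are assumptions made for illustration, not the original post's code.

```python
import json

# Same JSON text as EXAMPLE-2: a list of device groups
loradevice = (
    '[{"devtype":"dlab","deviceeui": ["eui1", "eui2"]},'
    ' {"devtype":"adenu","deviceeui": ["eui3", "eui4"]}]'
)

# json.loads returns a Python list here because the top-level JSON value is an array
devices = json.loads(loradevice)

# Hypothetical lookup table: device EUI -> device type
eui_to_type = {}
for group in devices:
    for eui in group["deviceeui"]:
        eui_to_type[eui] = group["devtype"]

print(devices[1]["devtype"])   # adenu
print(eui_to_type["eui2"])     # dlab
```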

Use Windows VScode to edit Linux files.

VScode is one of the best code editors, with lots of add-on packages. Although you can install VScode on Linux and use X Windows to edit there, many people run VScode on Windows and would like to edit Linux files from Windows within VScode. There is a simple way to do this using the …

Install Python pip on RHEL/Centos 7

When you initially try to install pip, you may get an error like the one below:

# python --version
Python 2.7.5
[root]# yum install python-pip
No package python-pip available.
Error: Nothing to do

We need to install epel-release first:

[root]# yum install epel-release
Installed: epel-release.noarch 0:7-11

Next install pip:

[root]# yum install python-pip
Installed: python2-pip.noarch 0:8.1.2-10.el7 …

Streamsets renew JWT token to call api

Many JWT tokens expire hourly and must be renewed before they can be passed in an API call. Streamsets' automatic renewal of JWT tokens may not work, so here is another way to renew them.

STEPS:
PIPELINE-1: A separate, continuously running pipeline periodically renews the JWT token and stores it in a text file.
PIPELINE-2: The …
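The renew-and-store idea behind PIPELINE-1 can be illustrated outside Streamsets with a small standalone sketch. It is not the pipeline configuration from the post: the function names, the 300-second safety margin, and the atomic file swap are all assumptions, and the step that actually requests a new token from the issuer is left out because it depends on the API.

```python
import base64
import json
import os
import tempfile
import time


def jwt_expiry(token):
    """Return the 'exp' claim (seconds since the epoch) without verifying the token."""
    payload_b64 = token.split(".")[1]
    # base64url payloads usually drop '=' padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]


def needs_renewal(token, now=None, margin_seconds=300):
    """True if the token expires within margin_seconds (hypothetical default)."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now <= margin_seconds


def store_token(path, token):
    """Write the token to a temp file, then swap it into place atomically,
    so a reader (such as PIPELINE-2) never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    os.replace(tmp_path, path)
```

A renewal loop would call needs_renewal on the stored token at a fixed interval and, when it returns True, fetch a new token and pass it to store_token.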

Anaconda Python notes

Some notes on the Anaconda Python package manager:

Reference: https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c

Conda is the main installer for the Anaconda packages.
Conda can be used to create multiple environments with different Python or other package versions.
The Anaconda packages are installed under /<some path>/Anaconda3/pkgs and other sub-directories.
Inside a new Conda installation, the root environment is activated by default, so you …

Use Pandas in Jupyter PySpark3 kernel to query Hive table

The following Python code reads a Hive table and converts it to a Pandas dataframe so you can use Pandas to process the rows.

NOTE: Be careful when copying and pasting the code below: the double quotes get changed and need to be retyped, otherwise they cause a syntax error.

--------------------------------------------------------------------------------------------------------------
import pandas as pd
from pyspark import SparkConf, SparkContext
…

Run a Python program to access Hadoop webhdfs and Hive with Kerberos enabled

The following Python code makes REST calls to a secure, Kerberos-enabled Hadoop cluster, using the webhdfs REST API to get file data:

You first need to run $ kinit userid@REALM to authenticate and obtain the Kerberos ticket for the user.
Make sure the Python modules requests and requests_kerberos have been installed. Otherwise install them, for example: …
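A minimal sketch of such a webhdfs call is shown below. It is not the post's own code: the host name and port 50070 are placeholders, and the function names are made up for illustration. It assumes that a Kerberos ticket already exists (from kinit) and that requests and requests_kerberos are installed.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op="OPEN", scheme="http"):
    """Build a WebHDFS REST URL: scheme://host:port/webhdfs/v1/<path>?op=<OP>."""
    path = quote(hdfs_path.lstrip("/"))
    return f"{scheme}://{host}:{port}/webhdfs/v1/{path}?{urlencode({'op': op})}"


def read_hdfs_file(host, port, hdfs_path):
    """Fetch a file's contents; requires a valid Kerberos ticket (run kinit first)."""
    # Imported here so the URL helper works even without these modules installed
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    response = requests.get(webhdfs_url(host, port, hdfs_path), auth=auth, timeout=30)
    response.raise_for_status()
    return response.text


# Placeholder host and port, for illustration only
print(webhdfs_url("namenode.example.com", 50070, "/tmp/data.txt"))
```

Passing op="LISTSTATUS" to webhdfs_url builds the URL for listing a directory instead of opening a file.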