Use pyodbc with Cloudera Impala ODBC and Kerberos

I first tried the Python impyla package to connect to Cloudera Impala, but ran into various errors and dependency issues. In addition, 2 out of 3 queries would hang or fail. So I tried pyodbc next.

Linux System Requirements:

The Cloudera ODBC Driver for Impala is recommended for Impala versions 2.8 through 3.3, and CDH versions 5.11 through 5.16 and 6.0 through 6.3.

First, download the Cloudera Impala ODBC driver for 64-bit Linux from the Cloudera website. I used the latest RHEL 64-bit version of the driver: https://www.cloudera.com/downloads/connectors/impala/odbc/2-6-10.html

If you are using RHEL or CentOS, run the following command:

(base) [root@downloads]# yum --nogpgcheck localinstall ClouderaImpalaODBC-2.6.10.1010-1.x86_64.rpm

Installed: Complete!

The Cloudera ODBC Driver for Impala files are installed in the /opt/cloudera/impalaodbc directory.

Next, configure the ODBC driver:

To create a Data Source Name (DSN) on a non-Windows machine, add the following to /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini:


[Driver]
ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/
LogLevel=0
LogPath=

Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
Host=your-impalahost-or-ipaddr
Port=21050
AuthMech=1
KrbRealm=YOUR.KERBEROS.REALM
KrbFQDN=your-impalahost-or-ipaddr-or-fqdn
KrbServiceName=impala

Add the following lines to /etc/odbc.ini:

[ODBC Data Sources]
your-odbc-dsn=Cloudera ODBC Driver for Impala 64-bit

[your-odbc-dsn]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
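Before testing the connection, it can help to confirm that the DSN is actually defined. The following is a minimal sketch, not part of the original setup: it parses odbc.ini-style text with Python's standard configparser and returns the Driver path for a given DSN. The helper name dsn_driver and the SAMPLE text are hypothetical, and it assumes the file is simple enough for configparser to read.

```python
import configparser
from typing import Optional


def dsn_driver(ini_text: str, dsn: str) -> Optional[str]:
    """Return the Driver path configured for `dsn`, or None if it is missing."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case as written (Driver, Host, ...)
    parser.read_string(ini_text)
    if not parser.has_section(dsn):
        return None
    return parser.get(dsn, "Driver", fallback=None)


# Hypothetical sample mirroring the odbc.ini entries above.
SAMPLE = """
[ODBC Data Sources]
your-odbc-dsn=Cloudera ODBC Driver for Impala 64-bit

[your-odbc-dsn]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
"""

if __name__ == "__main__":
    print(dsn_driver(SAMPLE, "your-odbc-dsn"))
```

To check the real file, pass the contents of /etc/odbc.ini instead of SAMPLE, for example open("/etc/odbc.ini").read().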

Next, test the Impala connection:

$ kinit your-kerberos-userid
$ isql -v your-odbc-dsn

+---------------------------------------+
| Connected!                             |
|                                        |
| sql-statement                          |
| help [tablename]                       |
| quit                                   |
|                                        |
+---------------------------------------+
SQL>

This shows that the connection to Impala is successful.

Now run a pyodbc script:

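import pyodbc

# Connect through the DSN defined in /etc/odbc.ini.
# autocommit=True is needed because the driver does not support transactions.
conn = pyodbc.connect('DSN=your-odbc-dsn', autocommit=True)

# Run a small query and print the rows it returns.
crsr = conn.cursor()
crsr.execute('select * from mydb.mytable limit 5;')
print(crsr.fetchall())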
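fetchall() returns plain rows without column names. As an optional sketch, the helper below (rows_as_dicts is a hypothetical name, not part of pyodbc) pairs each value with its column name using the standard DB-API cursor.description attribute. It uses sqlite3 purely as a stand-in so the example runs without an Impala cluster; with pyodbc you would pass crsr after calling execute().

```python
import sqlite3  # stand-in database so the sketch runs without Impala
from typing import Any, Dict, List


def rows_as_dicts(cursor) -> List[Dict[str, Any]]:
    """Fetch all remaining rows and key each value by its column name."""
    # cursor.description holds one tuple per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("create table mytable (id integer, name text)")
    cur.execute("insert into mytable values (1, 'a')")
    cur.execute("select * from mytable limit 5")
    print(rows_as_dicts(cur))
```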

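Note: in a Python 3 script, pass autocommit=True to pyodbc.connect(). Otherwise the connection fails with an HYC00 error:

[HYC00] [Cloudera][ODBC] (11470) Transactions are not supported.

Here cnxnstr is your connection string, for example 'DSN=your-odbc-dsn':

conn = pyodbc.connect(cnxnstr, autocommit=True)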
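The connect call above takes a connection string. As a sketch of what it might contain when connecting without a DSN, the following joins the same keys used in the ini files into one string. This assumes the driver accepts these keys in a connection string the same way it does in the ini files; build_connection_string is a hypothetical helper, the values are the placeholders from this post, and the pyodbc call is left commented out since it needs a reachable cluster.

```python
from typing import Mapping


def build_connection_string(params: Mapping[str, object]) -> str:
    """Join key/value pairs into an ODBC 'Key=Value;Key=Value' string."""
    return ";".join(f"{key}={value}" for key, value in params.items())


# Placeholder values copied from the ini files earlier in this post.
cnxnstr = build_connection_string({
    "Driver": "/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so",
    "Host": "your-impalahost-or-ipaddr",
    "Port": 21050,
    "AuthMech": 1,
    "KrbRealm": "YOUR.KERBEROS.REALM",
    "KrbFQDN": "your-impalahost-or-ipaddr-or-fqdn",
    "KrbServiceName": "impala",
})

# With pyodbc installed and a valid Kerberos ticket:
# import pyodbc
# conn = pyodbc.connect(cnxnstr, autocommit=True)
```

The helper does no escaping, so values that contain semicolons or braces would need extra handling.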

If you get the error below, you need a valid Kerberos ticket. Run kinit and try again:

$ kinit your-kerberos-userid

Error: ('HY000', '[HY000] [Cloudera][DriverSupport] (1110) Unexpected response received from server. Please ensure the server host and port specified for the connection are correct and confirm if SSL should be enabled for the connection. (1110) (SQLDriverConnect)')
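Since a missing ticket surfaces as a fairly misleading error, a script can check for one before connecting. This is a minimal sketch, assuming MIT Kerberos klist is installed, where the -s flag runs silently and exits with a nonzero status if the credentials cache is missing or expired. The function name has_valid_ticket is hypothetical, and the command is a parameter so it can be swapped out.

```python
import subprocess
import sys
from typing import Sequence


def has_valid_ticket(command: Sequence[str] = ("klist", "-s")) -> bool:
    """Return True when the ticket check command exits with status 0."""
    try:
        result = subprocess.run(list(command), check=False)
    except FileNotFoundError:
        # The command (klist by default) is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if has_valid_ticket():
        print("Kerberos ticket found; a connection attempt can proceed.")
    else:
        print("No valid Kerberos ticket; run: kinit your-kerberos-userid")
```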

ENJOY!!!
