Run a Python program to access Hadoop webhdfs and Hive with Kerberos enabled

The following Python code makes REST calls to a secure, Kerberos-enabled Hadoop cluster, using the WebHDFS REST API to get file status data:

  1. First run $ kinit userid@REALM to authenticate and obtain a Kerberos ticket for the user.
  2. Make sure the Python modules requests and requests_kerberos are installed. If they are not, install them, for example:

# pip install requests

# pip install requests-kerberos

3. Put the code below in a file and run it, for example: $ python webhdfsexample.py

# start of python code


import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED

# Authenticate with the Kerberos ticket obtained earlier via kinit
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
# LISTSTATUS lists the contents of /tmp; replace "namenode" with your NameNode host
webhdfs_url = "http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS"
headers = {'X-Requested-By': 'someuser'}
# verify=False disables TLS certificate checks; it only has an effect on https URLs
response = requests.get(webhdfs_url, headers=headers, auth=kerberos_auth, verify=False)

print("webhdfs response statuscode=", response.status_code)
print("webhdfs response responsetext=", response.text)

# end of python code

4. After running it, you should get results like the following:

webhdfs response statuscode= 200
webhdfs response responsetext= {"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"childrenNum":7,"fileId":26479,"group":"group","length":0,"modificationTime":1532544496983,"owner":"userid","pathSuffix":"staging","permission":"700","replication":0,"storagePolicy":0,"type":"DIRECTORY"},

]}}
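
Since the response body is JSON, it can be parsed instead of printed raw. Here is a minimal sketch, assuming the LISTSTATUS layout shown in the output above. The helper name summarize_listing is hypothetical, and the sample string only stands in for response.text so the snippet runs without a cluster:

```python
import json


def summarize_listing(response_text):
    """Return (pathSuffix, type, owner) for each entry in a LISTSTATUS reply."""
    payload = json.loads(response_text)
    statuses = payload["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["owner"]) for s in statuses]


# Sample reply mirroring the output above; in the real script pass response.text.
sample = (
    '{"FileStatuses":{"FileStatus":['
    '{"accessTime":0,"blockSize":0,"childrenNum":7,"fileId":26479,'
    '"group":"group","length":0,"modificationTime":1532544496983,'
    '"owner":"userid","pathSuffix":"staging","permission":"700",'
    '"replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}}'
)

for name, kind, owner in summarize_listing(sample):
    print(name, kind, owner)
```

In the script above, the same helper could be called as summarize_listing(response.text) once response.status_code is 200.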

ANOTHER EXAMPLE: USE PYTHON TO ACCESS HIVE TABLE WITH KERBEROS ENABLED USING PYHIVE

Make sure PyHive is installed: $ pip install pyhive

First make sure you have successfully obtained a Kerberos ticket in a Linux terminal; otherwise it won't work.

Run $ kinit userid@REALM to authenticate and obtain a Kerberos ticket for the user, then check the ticket with:

$ klist
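
The same ticket check can also be done from Python before connecting. This is a sketch that assumes the MIT Kerberos klist, whose -s option prints nothing and exits with status 0 only when a readable, unexpired credentials cache exists. The helper has_valid_ticket is hypothetical, and its runner parameter exists only so the check can be exercised without Kerberos installed:

```python
import subprocess


def has_valid_ticket(runner=subprocess.run):
    """Return True if `klist -s` reports a usable Kerberos ticket."""
    try:
        result = runner(["klist", "-s"])
    except FileNotFoundError:
        # klist is not installed or not on PATH
        return False
    return result.returncode == 0


if not has_valid_ticket():
    print("No valid Kerberos ticket found; run kinit userid@REALM first.")
```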

Put the below script in a text file and run it with python:

from pyhive import hive
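# Connect using the Kerberos ticket from kinit; replace the host name with your HiveServer2 host
conn = hive.connect(host='hive-server-hostname', port=10000, auth='KERBEROS', kerberos_service_name='hive')

cursor = conn.cursor()
cursor.execute('SELECT * FROM mydbname.mytablenamexyz')

# print(cursor.fetchone())

# Fetch and print the first three rows
print(cursor.fetchmany(size=3))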

This will produce a result like:

(a,b,c),

(x,y,z),

(1,2,3)

etc.
