The following Python code makes REST calls to a secure, Kerberos-enabled Hadoop cluster, using the WebHDFS REST API to get file data:
1. First run $ kinit userid@REALM to authenticate and obtain a Kerberos ticket for the user.
2. Make sure the Python modules requests and requests_kerberos are installed. If they are not, install them, for example:
# pip install requests
# pip install requests-kerberos
3. Put the code below in a file and run it, for example: $ python webhdfsexample.py
# start of python code
import http
import requests
import json
from requests_kerberos import HTTPKerberosAuth, REQUIRED
kerberos_auth = HTTPKerberosAuth(mutual_authentication=REQUIRED, sanitize_mutual_error_response=False)
webhdfs_url = "http://namenode:50070/webhdfs/v1/tmp?op=LISTSTATUS"
headers = { 'X-Requested-By': 'someuser'}
response = requests.get(webhdfs_url, headers=headers, auth=kerberos_auth, verify=False)
print ("webhdfs response statuscode=", response.status_code)
print ("webhdfs response responsetext=", response.text)
# end of python code
4. After running it, you should get results like the following:
webhdfs response statuscode= 200
webhdfs response responsetext= {"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"childrenNum":7,"fileId":26479,"group":"group","length":0,"modificationTime":1532544496983,"owner":"userid","pathSuffix":"staging","permission":"700","replication":0,"storagePolicy":0,"type":"DIRECTORY"},
]}}
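The listing call above can be generalized to other WebHDFS operations, and the JSON body can be parsed instead of printed as raw text. The following is a minimal sketch and not part of the original script: the helper names build_webhdfs_url and summarize_liststatus are hypothetical, the host and port are the same placeholders used above, /tmp/data.csv is an invented file path, and the sample body is a trimmed copy of the response shown above. OPEN is the WebHDFS operation for reading a file's contents.

```python
import json
from urllib.parse import quote, urlencode


def build_webhdfs_url(path, op, host="namenode", port=50070, **params):
    """Build a WebHDFS URL for an absolute HDFS path.

    host and port are placeholders; use your own NameNode address.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def summarize_liststatus(response_text):
    """Return (pathSuffix, type, owner) tuples from a LISTSTATUS JSON body."""
    payload = json.loads(response_text)
    statuses = payload["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["owner"]) for s in statuses]


# Trimmed stand-in for response.text from the script above.
sample_body = (
    '{"FileStatuses":{"FileStatus":['
    '{"owner":"userid","pathSuffix":"staging","type":"DIRECTORY"}'
    ']}}'
)

print(build_webhdfs_url("/tmp", "LISTSTATUS"))
print(build_webhdfs_url("/tmp/data.csv", "OPEN"))  # hypothetical file path
for name, kind, owner in summarize_liststatus(sample_body):
    print(name, kind, owner)
```

With a live cluster, the same requests.get call from the script could be pointed at the OPEN URL to fetch file contents, and summarize_liststatus(response.text) would replace the raw print of the listing.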
ANOTHER EXAMPLE: USE PYTHON TO ACCESS HIVE TABLE WITH KERBEROS ENABLED USING PYHIVE
Make sure pyhive is installed: $ pip install pyhive
First make sure you have successfully obtained a Kerberos ticket in a Linux terminal, otherwise it won't work:
Run $ kinit userid@REALM to authenticate and obtain a Kerberos ticket for the user, then confirm the ticket with:
$ klist
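The ticket check can also be done from Python before connecting, so the script fails early with a clear message. This is a minimal sketch that assumes the klist command is on the PATH and supports the -s (silent) flag, as MIT Kerberos does: with -s, klist prints nothing and exits with status 0 only when the credentials cache holds a valid ticket. The function name has_valid_ticket is hypothetical.

```python
import shutil
import subprocess


def has_valid_ticket(klist_cmd="klist"):
    """Return True if `klist -s` reports a readable, unexpired ticket cache."""
    if shutil.which(klist_cmd) is None:
        # The Kerberos client tools are not installed or not on the PATH.
        return False
    result = subprocess.run([klist_cmd, "-s"])
    return result.returncode == 0


if has_valid_ticket():
    print("Kerberos ticket found")
else:
    print("No valid Kerberos ticket; run kinit userid@REALM first")
```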
Put the script below in a text file and run it with Python:
from pyhive import hive
conn = hive.connect(host='hive-server-hostname', port=10000, auth='KERBEROS', kerberos_service_name='hive')
cursor = conn.cursor()
cursor.execute('SELECT * FROM mydbname.mytablenamexyz')
# print(cursor.fetchone())
# fetchmany returns a list of row tuples; print it so the rows are visible when run as a script
print(cursor.fetchmany(size=3))
This will produce a result like:
(a,b,c),
(x,y,z),
(1,2,3)
etc.
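The rows come back as plain tuples, so it can help to pair them with column names. A minimal sketch, assuming the cursor follows the Python DB-API convention in which cursor.description is a sequence of 7-item tuples whose first item is the column name. The function name rows_to_dicts and the sample columns col1, col2, col3 are hypothetical, and the stand-in data below replaces a live cursor so the example runs without a cluster.

```python
def rows_to_dicts(description, rows):
    """Pair each row tuple with the column names from cursor.description."""
    names = [column[0] for column in description]
    return [dict(zip(names, row)) for row in rows]


# Stand-in for cursor.description and cursor.fetchmany(size=3).
sample_description = [
    ("col1", "STRING_TYPE", None, None, None, None, True),
    ("col2", "STRING_TYPE", None, None, None, None, True),
    ("col3", "STRING_TYPE", None, None, None, None, True),
]
sample_rows = [("a", "b", "c"), ("x", "y", "z")]

for record in rows_to_dicts(sample_description, sample_rows):
    print(record)
```

With a live connection, the call would be rows_to_dicts(cursor.description, cursor.fetchmany(size=3)). Depending on the Hive configuration, the column names may come back prefixed with the table name.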