Install Elasticsearch on Cloudera Hadoop

Environment:

Cloudera CDH 5.12.x

There are three ways to make the ES-Hadoop jar available to Hive so that it can connect to Elasticsearch.

The first is to pass the jar on the command line when starting Hive:

hive -hiveconf hive.aux.jars.path=/opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar;

The second option is to open a Hive session and run the following command before anything else:


ADD JAR /opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar;

The problem with both of these approaches is that you have to give Hive the full path to the Elasticsearch jar every single time. The third option avoids this: copy elasticsearch-hadoop-hive-<eshadoopversion>.jar into the same directory on every node in your cluster. In my case I copied it to the /usr/lib/hive/lib directory by running the following command:

sudo cp /opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar /usr/lib/hive/lib/.
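Since the jar has to sit at the same path on every node, the copy can be scripted. The following is only a sketch: the host names are placeholders, distribute_jar is a hypothetical helper, and it assumes passwordless SSH and sudo on each node. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: copy the ES-Hadoop Hive jar into the same directory on every node.
# Assumptions: passwordless SSH and sudo on each host; host names are supplied by you.

JAR=/opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar
DEST=/usr/lib/hive/lib

# Print (DRY_RUN=1, the default) or run the copy commands for each host given.
distribute_jar() {
  local host name
  name=$(basename "$JAR")
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show what would be executed without touching the host.
      echo "scp $JAR $host:/tmp/ && ssh $host sudo cp /tmp/$name $DEST/"
    else
      # Stage the jar in /tmp, then move it into place with sudo.
      scp "$JAR" "$host:/tmp/" &&
        ssh "$host" "sudo cp /tmp/$name $DEST/"
    fi
  done
}
```

For example, `distribute_jar node1.example.com node2.example.com` prints the commands for two hypothetical hosts; setting `DRY_RUN=0` in front of the call performs the copy.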

Then, in Cloudera Manager, set the Hive Auxiliary JARs Directory property (hive.aux.jars.path) to /usr/lib/hive/lib.
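Because the property points at a directory, it may be worth confirming on each node that the jar is really there. A minimal sketch, assuming the directory used above; has_es_hive_jar is a hypothetical helper name, not part of Hive or Cloudera Manager.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a directory contains the ES-Hadoop Hive jar.
# The default directory is the one used in this post; pass another as the first argument.

has_es_hive_jar() {
  local dir="${1:-/usr/lib/hive/lib}"
  local jar
  for jar in "$dir"/elasticsearch-hadoop-hive-*.jar; do
    # The glob stays unexpanded when nothing matches, so test for a real file.
    if [ -f "$jar" ]; then
      echo "found: $jar"
      return 0
    fi
  done
  echo "no elasticsearch-hadoop-hive jar in $dir" >&2
  return 1
}
```

Calling `has_es_hive_jar` with no argument checks /usr/lib/hive/lib and returns a non-zero status if the jar is missing.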
