Install Elasticsearch on Cloudera Hadoop

Environment:

Cloudera CDH 5.12.x

There are three ways to make the ES-Hadoop jar available to Hive so that Hive can connect to Elasticsearch.

The first is to pass the ES-Hadoop jar on the command line when you start Hive:

hive -hiveconf hive.aux.jars.path=/opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar

The second option is to open a Hive session and run the following command before anything else:

ADD JAR /opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar;

The problem with both approaches is that you have to give Hive the full path to the Elasticsearch jar in every session. The third option avoids this: copy elasticsearch-hadoop-hive-<eshadoopversion>.jar into the same directory on every node in your cluster. In my case I copied it to the /usr/lib/hive/lib directory with the following command:

sudo cp /opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar /usr/lib/hive/lib/.
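The cp command above only covers the node you run it on. Below is a minimal sketch of repeating it across the cluster. It assumes passwordless SSH to every node and a sudo that does not prompt for a password. The host names (node1, node2), the distribute_jar and run helpers, and the DRY_RUN switch are hypothetical, not part of Hive or Cloudera Manager.

```shell
#!/usr/bin/env bash
# Sketch: copy the ES-Hadoop Hive jar into the same directory on every node.
# Assumptions (hypothetical, adjust for your cluster): passwordless SSH to
# each host and sudo that does not prompt for a password.
set -euo pipefail

ES_JAR="/opt/elastic/elasticsearch-hadoop-2.4.3/dist/elasticsearch-hadoop-hive-2.4.3.jar"
TARGET_DIR="/usr/lib/hive/lib"

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute_jar JAR DIR HOST...: stage JAR in /tmp on each HOST with scp,
# then copy it into DIR with sudo and delete the staged file.
distribute_jar() {
  local jar="$1" dir="$2"
  shift 2
  local name="${jar##*/}"   # file name without the directory part
  local host
  for host in "$@"; do
    run scp "$jar" "${host}:/tmp/${name}"
    run ssh "$host" "sudo cp /tmp/${name} ${dir}/ && rm -f /tmp/${name}"
  done
}
```

If you add a call such as `DRY_RUN=1 distribute_jar "$ES_JAR" "$TARGET_DIR" node1 node2` at the end of the script, it prints the scp and ssh commands without running them. Drop `DRY_RUN=1` to execute them. The jar is staged in /tmp first because a plain scp usually cannot write to /usr/lib/hive/lib. That is why the final copy runs through sudo on the remote side.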

Then, in Cloudera Manager, set the Hive Auxiliary JARs Directory property (hive.aux.jars.path) to /usr/lib/hive/lib and restart the Hive service so the change takes effect.
