Elasticsearch notes

Elasticsearch vs RDBMS concepts: you can (roughly) think of an Elasticsearch index as an RDBMS database.

MySQL => Databases => Tables => Rows => Columns
Elasticsearch => Indices (databases) => Types (tables) => Documents (rows) with Properties (columns)

An Elasticsearch cluster can contain multiple indices (databases), which in turn contain multiple types (tables). These types hold multiple documents (rows), and each document has properties (columns). Newer versions move away from this model: indices created in Elasticsearch 6.x can hold only one mapping type, and types are deprecated from 7.0. An ES mapping …
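To make the analogy concrete, here is a minimal Python sketch that only builds (does not send) two REST requests in the Elasticsearch 6.x style: creating an index whose mapping declares a type's properties, and storing one row as a document. The helper names and the `shop`/`customers` names are hypothetical. In 6.x the type name still appears in the URL path; 7.x and later use `_doc` in its place.

```python
import json

# Hypothetical helpers; the request shapes follow the Elasticsearch 6.x REST API,
# where the mapping type ("table") is part of the URL path.

def mapping_request(index, doc_type, properties):
    """Create-index request: the index (database) gets a mapping for one type (table)
    whose properties play the role of columns."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return "PUT", "/" + index, json.dumps(body)

def index_row_request(index, doc_type, doc_id, row):
    """Index request: one row stored as one document with an explicit id."""
    return "PUT", "/{}/{}/{}".format(index, doc_type, doc_id), json.dumps(row)

if __name__ == "__main__":
    print(mapping_request("shop", "customers",
                          {"name": {"type": "text"}, "age": {"type": "integer"}}))
    print(index_row_request("shop", "customers", 1, {"name": "Alice", "age": 30}))
```

Each tuple (method, path, JSON body) can be handed to any HTTP client, for example `curl -X PUT` against the cluster.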

Connect Hadoop to Elasticsearch using Talend

(BLOG IN PROGRESS - INCOMPLETE) This post shows how to update an Elasticsearch index with data from an HDFS file using the Talend Open Studio for Big Data ETL tool. First, create a new job in Talend Studio, for example HDFStoESindexjob. Then drag the following components into the Design area and connect them:

tHDFSConnection_1 --OnSubjobOk--> tHDFSInput_1 --row1 (Main)--> tWriteJSONField_2 --row2 (Main)--> tRESTClient_1

3. Talend …
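The job's data flow can be sketched in plain Python to show what each component contributes. This is an illustration under assumptions, not Talend's generated code: the column names, the `;` delimiter and the `shop/customers` index/type are hypothetical, and the request is only built, not sent.

```python
import json

# Hypothetical schema and delimiter standing in for the tHDFSInput_1 settings.
SCHEMA = ["id", "name", "city"]
DELIMITER = ";"

def row_to_json(line, schema=SCHEMA, delimiter=DELIMITER):
    """Plays the role of tWriteJSONField: one delimited HDFS line -> one JSON document.
    All values stay strings here; a typed Talend schema would convert them."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(schema):
        raise ValueError("expected {} fields, got {}".format(len(schema), len(values)))
    return json.dumps(dict(zip(schema, values)))

def rest_request(index, doc_type, json_doc):
    """Plays the role of tRESTClient: POST to /index/type so Elasticsearch
    (6.x style URL) assigns the document id itself."""
    return "POST", "/{}/{}".format(index, doc_type), json_doc

if __name__ == "__main__":
    for line in ["1;Alice;Paris\n", "2;Bob;Oslo\n"]:  # stand-in for lines read from HDFS
        print(rest_request("shop", "customers", row_to_json(line)))
```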

Connect Elasticsearch to Cloudera Hadoop using ES-Hadoop

[CAUTION: The ES-Hadoop jars currently cause errors with Cloudera CDH, and Hue throws errors saying multiple jars were found, so the process below does not work. Use these instructions at your own risk; I have not yet found a solution.] Environment: Cloudera CDH 5.12.x, elasticsearch-hadoop-6.2.1 …
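If the connection is made through Hive (Hue suggests Hive is in use, but that is an assumption), ES-Hadoop exposes an index to Hive through its storage handler, `org.elasticsearch.hadoop.hive.EsStorageHandler`, configured with `es.*` table properties. The hypothetical Python helper below only renders that DDL; the table, columns, `shop/customers` resource and `localhost` host are placeholders, and the elasticsearch-hadoop jar must already be available to Hive (for example via `ADD JAR`).

```python
# Hypothetical helper: renders Hive DDL for an external table backed by an
# Elasticsearch index through ES-Hadoop's Hive storage handler.

def es_hive_ddl(table, columns, resource, nodes, port=9200):
    cols = ",\n  ".join("{} {}".format(name, hive_type) for name, hive_type in columns)
    props = {
        "es.resource": resource,  # target index/type, e.g. "shop/customers"
        "es.nodes": nodes,        # Elasticsearch host(s), comma-separated
        "es.port": str(port),     # REST port of those hosts
    }
    tblprops = ",\n  ".join("'{}' = '{}'".format(k, v) for k, v in props.items())
    return (
        "CREATE EXTERNAL TABLE {} (\n  {}\n)\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        "TBLPROPERTIES (\n  {}\n);"
    ).format(table, cols, tblprops)

if __name__ == "__main__":
    print(es_hive_ddl("customers_es", [("id", "INT"), ("name", "STRING")],
                      "shop/customers", "localhost"))
```

Running the printed statement in Hue or Beeline would create a Hive table whose SELECTs read from, and INSERTs write to, the Elasticsearch index.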