Talend Open Studio for Big Data is a powerful open-source ETL tool. You can download it and use it to move data to and from Hadoop, including both HDFS and Hive. Talend install steps: download the free Talend Open Studio for Big Data from https://www.talend.com/products/big-data/big-data-open-studio/ The download file location is set to c:\temp … Continue reading Use Talend Open Studio for Big Data to ETL to Hadoop
[CAUTION: The ES-Hadoop jars currently cause errors with Cloudera CDH and Hue, which report that multiple jars were found, so the process below does not work. Use these instructions at your own risk; they may not work, and no solution has been found yet.] Environment: Cloudera CDH 5.12.x, elasticsearch-hadoop-6.2.1 … Continue reading Connect ElasticSearch to Cloudera Hadoop using ES-Hadoop.
Open-source BI / ETL tools:
Talend = ETL tool, leader in Gartner Magic Quadrant
Streamsets = ETL tool
Hue = Hadoop analytics server
Jupyter Notebook = data science BI tool
Pentaho = desktop and server version
KNIME = data science, leader in Gartner Magic Quadrant 2017, desktop version
PowerBI = desktop free version
Oracle SQL Developer … Continue reading Business Intelligence, ETL and Data Science tools
First, download and install the popular free tool Oracle SQL Developer for Windows from the Oracle website. For a good overview of connecting Oracle SQL Developer to Hadoop Hive, read this blog post: https://blogs.oracle.com/bigdataconnectors/move-data-between-apache-hadoop-and-oracle-database-with-sql-developer Note: when configuring the Cloudera Hive JDBC drivers, download the 64-bit JDBC driver for Windows from the following page: https://www.cloudera.com/downloads/connectors/hive/jdbc/2-5-19.html In the Tools->Preferences->Database->Third party … Continue reading Query Cloudera Hadoop Hive using Oracle SQL Developer.