Using StreamSets for ETL to/from Hadoop

[Blog in progress - incomplete] This blog shows some examples of doing ETL to or from Hadoop.

USE CASE #1: Use Sqoop commands inside StreamSets to copy data from an RDBMS to Hadoop
https://www.youtube.com/watch?v=k8VbTR77l8M
https://streamsets.com/tutorials/

USE CASE #2: Use Kafka with Kerberos and Sentry with StreamSets on Cloudera
The following steps need to be completed …


StreamSets: install Oracle JDBC driver in External Library for CDH

This blog shows how to install the Oracle JDBC driver in the StreamSets External Library on a Cloudera Hadoop system.

Environment: Cloudera CDH 5.12, StreamSets 3.1.2

TASK: Update the Oracle JDBC driver inside StreamSets
https://streamsets.com/documentation/datacollector/latest/help/#datacollector/UserGuide/Configuration/ExternalLibs.html#concept_pdv_qlw_ft

Step 1. Set Up an External Directory

Setting Up for Cloudera Manager
In Cloudera Manager, select the StreamSets service and then …