Copy HDFS file to Azure Synapse ADLS2 using Nifi
The NiFi pipeline for copying HDFS files to an Azure Synapse ADLS2 folder is shown below. It was tested with Cloudera CDP 7.1.7 and Azure Synapse ADLS2. GetHDFSFileInfo generates a list of both directories and files. Keep in mind that the GetHDFSFileInfo processor does not maintain any state, so every time it executes it …
Category: ETL
Migrating Streamsets SDC to Apache Nifi Install steps
StreamSets Data Collector (SDC) stopped being open source after version 4.x in mid-2021, so we need a similar dataflow ETL tool to replace it. After looking at Apache Airflow and Apache NiFi, two of the most popular open-source ETL tools, NiFi appears to be the most similar to StreamSets SDC. Airflow is more a …
Streamsets renew JWT token to call api
Many JWT tokens expire hourly and must be renewed before they can be passed in an API call. StreamSets' automatic renewal of JWT tokens may not work, so here is another way to renew them. To simulate an API call and see what it sends to the API server, you can use the website http://httpbin.org/ STEPS: …
TALEND ETL Examples
Consuming REST APIs with Talend Open Studio 6.2.1 medium.com/@stevenbeeckman/consuming-rest-apis-with-talend-open-studio-6-2-1-147d0de15c35
Using Streamsets for ETL to/from Hadoop
[blog in progress - incomplete] This blog will show some examples of doing ETL to or from Hadoop. USE CASE #1: Use Sqoop commands inside StreamSets to copy data from an RDBMS to Hadoop https://www.youtube.com/watch?v=k8VbTR77l8M https://streamsets.com/tutorials/ How to use JDBC Quey Consumer with a Date Offset column? https://community.streamsets.com/how-to-44/how-to-use-jdbc-quey-consumer-with-a-date-offset-column-368 (posted by AkshayJadhav, StreamSets employee) Issue: Some customers have an …
Streamsets install Oracle JDBC driver in External Library for CDH
This blog shows how to install the Oracle JDBC driver in the StreamSets external library on a Cloudera Hadoop system. Environment: Cloudera CDH 5.12, StreamSets 3.1.2. TASK: Update the Oracle JDBC driver inside StreamSets https://streamsets.com/documentation/datacollector/latest/help/#datacollector/UserGuide/Configuration/ExternalLibs.html#concept_pdv_qlw_ft Step 1. Set Up an External Directory. Setting Up for Cloudera Manager: In Cloudera Manager, select the StreamSets service and then …
Streamsets install using Cloudera Manager
Environment: CDH 5.12, STREAMSETS-3.1.2.0.jar. Follow the install instructions at this link: https://streamsets.com/documentation/datacollector/latest/help/index.html#datacollector/UserGuide/Installation/CMInstall-Overview.html#concept_nb5_c3m_25 Installation with Cloudera Manager: To install Data Collector through Cloudera Manager, perform the following steps: 1. Install the StreamSets custom service descriptor (CSD). 2. (Optional) Manually install the parcel and checksum files. This is typically only needed when the Cloudera Manager Server does not have internet access. 3. Download, …