Follow the install instructions in the link below:
Installation with Cloudera Manager
To install Data Collector through Cloudera Manager, perform the following steps:
- Install the StreamSets custom service descriptor (CSD).
- (Optional.) Manually install the parcel and checksum files. Typically only needed when the Cloudera Manager Server does not have internet access.
- Download, distribute, and activate the StreamSets parcel.
- Configure the StreamSets service.
Afterwards, you can configure the Data Collector if necessary.
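If you take the optional manual route, check that the parcel matches its checksum file before you copy both into the Cloudera Manager Server's local parcel repository (by default /opt/cloudera/parcel-repo/). Here is a minimal sketch. It assumes the checksum file sits next to the parcel as `<parcel>.sha` and that its first field is the SHA-1 hex digest. `verify_parcel` is a hypothetical helper, not part of Cloudera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a parcel's SHA-1 digest with its .sha file.
# Assumption: the checksum file is "<parcel>.sha" and its first
# whitespace-separated field is the SHA-1 hex digest.
verify_parcel() {
  local parcel="$1"
  local sha_file="${parcel}.sha"
  local expected actual

  [ -f "$parcel" ] || { echo "missing parcel: $parcel" >&2; return 2; }
  [ -f "$sha_file" ] || { echo "missing checksum file: $sha_file" >&2; return 2; }

  # Expected digest: first field of the first line, lowercased.
  expected="$(awk 'NR==1 {print tolower($1)}' "$sha_file")"
  # sha1sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha1sum "$parcel" | awk '{print $1}')"

  if [ "$expected" = "$actual" ]; then
    echo "OK: $parcel"
  else
    echo "MISMATCH: $parcel (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

For example, `verify_parcel /path/to/STREAMSETS_DATACOLLECTOR-<version>.parcel` prints `OK: ...` when the digests match. It returns non-zero on a mismatch or a missing file, so you can chain it in front of the copy with `&&`.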
Step 1. Install the StreamSets Custom Service Descriptor
- Use the following URL to download the CSD from the StreamSets website: https://streamsets.com/opensource
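- Move the CSD into the Cloudera Manager CSD directory, set its owner and permissions, and restart the Cloudera Manager Server:
$ sudo mv STREAMSETS-<version>.jar /opt/cloudera/csd/
$ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/STREAMSETS-<version>.jar && sudo chmod 644 /opt/cloudera/csd/STREAMSETS-<version>.jar
$ sudo /etc/init.d/cloudera-scm-server restart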
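If Cloudera Manager doesn't list the StreamSets service later, wrong ownership or mode on the CSD jar is an easy thing to rule out. This is a quick sketch using GNU `stat`. `file_owner_mode` and `csd_ok` are hypothetical helper names, and the expected value is the owner and mode set above.

```shell
#!/usr/bin/env bash
# Print "owner:group mode" for a file, e.g. "cloudera-scm:cloudera-scm 644".
# Uses GNU stat's -c format option; BSD/macOS stat takes different flags.
file_owner_mode() {
  stat -c '%U:%G %a' "$1"
}

# Succeed only if the jar has the expected owner, group and mode.
# The second argument overrides the expected "owner:group mode" string.
csd_ok() {
  local jar="$1"
  local want="${2:-cloudera-scm:cloudera-scm 644}"
  [ "$(file_owner_mode "$jar")" = "$want" ]
}
```

For example: `csd_ok /opt/cloudera/csd/STREAMSETS-<version>.jar && echo "CSD looks good"`.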
In Cloudera Manager, restart the Cloudera Management Service: to the right of Cloudera Management Service, click the Menu icon and select Restart.
When prompted "Are you sure you want to run the Restart command on the service Cloudera Management Service?", click Restart.
Step 3. Distribute and Activate the StreamSets Parcel
- To view the list of available parcels, in the menu bar, click the Parcels icon.
The StreamSets parcel displays in the list of available parcels. If it doesn’t display, click Check for New Parcels.
[Note: I ran into an issue here: the STREAMSETS_DATACOLLECTOR <version> parcel did not show up in the parcel list at first, and /var/log/cloudera-scm-server/cloudera-scm-server.log contained this error:
2018-03-26 16:37:10,286 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (7 skipped) Unable to retrieve remote parcel repository manifest
java.util.concurrent.ExecutionException: java.net.ConnectException: Received fatal alert: protocol_version to https://archives.streamsets.com/datacollector/latest/parcel/manifest.json
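The cause appeared to be the Java version or an SSL/TLS mismatch. A protocol_version alert means the server rejected the TLS version the client offered, which is consistent with the JDK 7 runtime described in Step 4 (JDK 7 enables only TLS 1.0 on the client side by default). As a workaround, I changed the parcel repository URL from https:// to http:// (http://archives.streamsets.com/datacollector/latest/parcel/). After this change, the STREAMSETS_DATACOLLECTOR <version> parcel showed up in the parcel list.]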
- To download the StreamSets parcel to the local repository, click Download.
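As the log line above shows, Cloudera Manager fetches manifest.json from the root of each remote parcel repository. You can repeat that fetch with curl from the Cloudera Manager Server host. curl uses its own TLS library rather than the JVM's, so if curl succeeds over https:// but Cloudera Manager fails, the problem most likely lies in the Java runtime's TLS settings and not in the server. Here is a minimal sketch. `manifest_url` and `check_repo` are hypothetical helper names, and `check_repo` needs network access.

```shell
#!/usr/bin/env bash
# Build the manifest URL that Cloudera Manager requests for a repository URL.
manifest_url() {
  local repo="${1%/}"   # strip one trailing slash so we don't emit "//"
  printf '%s/manifest.json\n' "$repo"
}

# Print the HTTP status code returned for a repository's manifest.
# On connection or TLS failures curl exits non-zero and the code is 000.
check_repo() {
  curl -sS -o /dev/null -w '%{http_code}\n' "$(manifest_url "$1")"
}
```

Compare `check_repo https://archives.streamsets.com/datacollector/latest/parcel/` with `check_repo http://archives.streamsets.com/datacollector/latest/parcel/`. A 200 from both means the repository is reachable from that host.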
Step 4. Configure the StreamSets Service
To run Data Collector in cluster streaming mode, colocate Data Collector on a node with the Spark Gateway role. To run Data Collector in cluster batch mode, colocate Data Collector on a node with the YARN Gateway role.
To write to HDFS, colocate Data Collector on a node with the HDFS Gateway role. Similarly, to write to HBase or Hive, colocate Data Collector on nodes with the HBase or Hive Gateway roles, respectively.
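Before you pick hosts, a rough heuristic is to check whether a host already has the client configuration that each Gateway role deploys. This sketch assumes the usual CDH layout, /etc/<name>/conf, where HDFS and YARN gateways share /etc/hadoop/conf. The authoritative answer is still the host's role list in Cloudera Manager. `gateway_conf_name` and `has_gateway_conf` are hypothetical helpers, and the use-case labels are made up for illustration.

```shell
#!/usr/bin/env bash
# Map a Data Collector use case to the client-config directory name that the
# matching Gateway role deploys (assumed layout: /etc/<name>/conf).
gateway_conf_name() {
  case "$1" in
    cluster-streaming) echo spark ;;   # Spark Gateway
    cluster-batch|hdfs) echo hadoop ;; # YARN / HDFS Gateway
    hbase) echo hbase ;;               # HBase Gateway
    hive) echo hive ;;                 # Hive Gateway
    *) echo "unknown use case: $1" >&2; return 1 ;;
  esac
}

# Succeed if the host appears to have the client config for the use case.
# The optional second argument is a root directory (default /), handy for
# checking a copied filesystem tree.
has_gateway_conf() {
  local name root="${2:-/}"
  name="$(gateway_conf_name "$1")" || return 1
  [ -d "${root%/}/etc/${name}/conf" ]
}
```

On a candidate host, `has_gateway_conf cluster-streaming && echo "Spark client config present"` gives a first hint before you assign the StreamSets role there.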
- In Cloudera Manager, click the menu for the cluster you want to use, then click Add a Service.
- In the Service Types list, select StreamSets, then click Continue.
- To select the hosts where you want to install StreamSets, on the Customize Role Assignments for StreamSets page, click Select Hosts to open a list of available hosts.
- Select one or more hosts, click OK, and then click Continue.
The Review Changes page displays the Data and Resource directories for the Data Collector.
- Optionally change the directories, then click Continue.
The First Run Command page displays status updates as Cloudera Manager starts Data Collector on the selected hosts.
- Click Continue, then click Finish.
[NOTE: This step failed with a Cloudera Manager error saying that JDK 7 is no longer supported and JDK 8 is required:
Mon Mar 26 17:48:58 EDT 2018: Prepending content from /opt/cloudera/parcels/STREAMSETS_DATACOLLECTOR-<version>/libexec/sdc-env.sh to /var/run/cloudera-scm-agent/process/809-streamsets-DATACOLLECTOR/sdc-env.sh
ERROR: Detected JDK7 that is no longer supported. Please upgrade to JDK8.]
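To see in advance which Java version a host would run, you can parse the output of `java -version`. This sketch is not the check that sdc-env.sh performs. Also, the `java` on your PATH may not be the JDK that the Cloudera Manager Agent uses for the service, so the helpers take the binary path as an optional argument. `java_major`, `java_version_string` and `java_is_8_or_later` are hypothetical names.

```shell
#!/usr/bin/env bash
# Extract the major version from a Java version string:
# "1.7.0_80" -> 7, "1.8.0_162" -> 8, "11.0.2" -> 11.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;  # legacy "1.N" scheme
    *)   echo "${v%%[.+-]*}" ;;           # "N.x.y" scheme (Java 9+)
  esac
}

# Read the quoted version from `java -version`, which writes to stderr,
# e.g. 1.8.0_162 from: java version "1.8.0_162"
java_version_string() {
  "${1:-java}" -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Succeed if the given java binary (default: java on PATH) is JDK 8 or later.
java_is_8_or_later() {
  local v
  v="$(java_version_string "${1:-java}")"
  [ -n "$v" ] && [ "$(java_major "$v")" -ge 8 ]
}
```

For example, `java_is_8_or_later /usr/java/jdk1.8.0_162/bin/java` (an illustrative path; use your JDK's actual location) succeeds for the JDK 8 build mentioned below and fails for a JDK 7 binary.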
After upgrading to JDK 1.8u162, the StreamSets service started successfully.
See my other blog post for how to upgrade Cloudera CDH from JDK 1.7 to JDK 1.8.
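Once the service shows as started in Cloudera Manager, you can confirm that the Data Collector web UI answers on a host. This sketch assumes the default HTTP port 18630 and plain HTTP. If you enabled HTTPS or changed the port, adjust accordingly. `sdc_url` and `sdc_status` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build the Data Collector UI URL for a host; the port defaults to 18630.
sdc_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-18630}"
}

# Print the HTTP status code returned by the Data Collector UI on a host.
# 000 means curl could not connect at all.
sdc_status() {
  curl -sS -o /dev/null -w '%{http_code}\n' "$(sdc_url "$1" "$2")"
}
```

`sdc_status <sdc-host>` printing any 2xx or 3xx code (the UI may redirect to its login page) suggests the web server is up. You can then log in at the URL that `sdc_url <sdc-host>` prints.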
Step 5 (Optional). Change the Admin Password for the StreamSets Login
After installation, StreamSets has default credentials such as admin/admin and guest/guest. There is no menu option in StreamSets to change user passwords, so this has to be done in Cloudera Manager. Go to Cloudera Manager->StreamSets->Configuration.
Search for "password" in the search box to see the list of users and their MD5-hashed passwords.
Next, log in to the Linux host and generate the MD5 hash of the new password:
$ echo -n password123 | md5sum
Ignore the trailing hyphen (md5sum prints "-" as the file name when it reads from standard input). Copy the hash and, in Cloudera Manager->StreamSets->Configuration, replace the existing hash for that user with the new one. Save the change, then restart (or redeploy) the StreamSets service from Cloudera Manager. You can then log in to StreamSets with the new password.
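To avoid copying the hyphen by hand, and to keep the new password out of your shell history, you can wrap the hashing step in a couple of small functions. `md5_hex` and `prompt_md5` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the MD5 hex digest of a string, without md5sum's trailing " -".
# printf '%s' adds no newline; hashing "secret\n" would give a different digest.
md5_hex() {
  printf '%s' "$1" | md5sum | awk '{print $1}'
}

# Prompt for a new password without echoing it, then print only its digest.
# bash's read -p writes the prompt to stderr, so stdout carries just the hash.
prompt_md5() {
  local pw
  read -r -s -p 'New password: ' pw
  echo >&2   # finish the prompt line
  md5_hex "$pw"
}
```

`printf '%s'` does the same job as `echo -n` above, but it behaves the same in every POSIX shell, whereas some `/bin/sh` implementations print `-n` literally. Run `prompt_md5`, type the password, and paste the single hash it prints into Cloudera Manager.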