Kafka install on Cloudera Hadoop

Below are the steps to install the Kafka parcel in Cloudera Manager.

Cloudera Distribution of Apache Kafka Requirements and Supported Versions:

Cloudera Kafka 2.2.x requires Cloudera Manager 5.9.x or higher and CDH 5.9.x or higher.

General Information Regarding Installation and Upgrade

These are the official instructions: https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_jms_yb1_v5

Cloudera recommends that you deploy Kafka on dedicated hosts that are not used for other cluster roles.

Click the Parcels icon at the top right of Cloudera Manager. If Kafka 2.2.x does not appear in the list of parcels, add the parcel URL to the list of remote repositories:

  1. Find the parcel for the version of Kafka you want to use on the Cloudera Distribution of Apache Kafka Versions page.
  2. This walkthrough uses http://archive.cloudera.com/kafka/parcels/2.2.0/, because the 3.3.0 parcel is not supported on CDH 5.12.x. Copy this parcel repository link.
  3. On the Cloudera Manager Parcels page, click Configuration.
  4. In the field Remote Parcel Repository URLs, click + next to an existing parcel URL to add a new field.
  5. Paste the parcel repository link.
  6. Save your changes.
  7. On the Cloudera Manager Parcels page, download the Kafka parcel, distribute the parcel to the hosts in your cluster, and then activate the parcel. After you activate the Kafka parcel, Cloudera Manager prompts you to restart the cluster.
  8. Add the Kafka service to your cluster using Cloudera Manager -> Add Service.
  9. Select HDFS, Sentry, and ZooKeeper as dependencies when prompted.
  10. Next, download, distribute, and activate the parcel if you have not already done so in step 7.
  11. Add the Kafka service in Cloudera Manager.
  12. Select the hosts for the Kafka services.
  13. Enter the Destination Broker List and the Source Broker List, including the port:

        Property                  Value             Configuration key
        Destination Broker List   myhost.com:9092   bootstrap.servers
        Source Broker List        myhost.com:9092   source.bootstrap.servers

Please note that both server names must be FQDNs that your DNS (or hosts file) can resolve; otherwise you’ll get other errors. The trailing port number is also mandatory!
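A quick format check can catch a malformed broker list before the wizard does. The sketch below is only an illustration: the function names is_valid_broker and check_broker_list are hypothetical, and the pattern merely checks for a dotted host name followed by a colon and a 1-5 digit port. It does not verify that the name actually resolves.

```shell
# Hypothetical helper: returns 0 if $1 looks like "fqdn:port"
# (at least one dot in the host name, then a colon and a 1-5 digit port).
is_valid_broker() {
  printf '%s\n' "$1" |
    grep -Eq '^([A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}:[0-9]{1,5}$'
}

# Hypothetical helper: checks every entry of a comma-separated broker list.
# Prints "ok", or reports the first entry that fails the format check.
check_broker_list() {
  old_ifs=$IFS
  IFS=,
  for entry in $1; do
    if ! is_valid_broker "$entry"; then
      IFS=$old_ifs
      echo "invalid broker entry: $entry"
      return 1
    fi
  done
  IFS=$old_ifs
  echo "ok"
}
```

For example, check_broker_list "myhost.com:9092" prints ok, while check_broker_list "myhost.com" reports the entry because the port is missing. On Linux, getent hosts myhost.com is one way to confirm that the name resolves.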

This Kafka parcel appears to have a bug that prevents the new service from starting. The community post below describes the workaround, and its relevant steps follow the link.

https://community.cloudera.com/t5/Cloudera-Manager-Installation/adding-a-Kafka-service-failed/td-p/40526

3) Click “Continue”. The service will NOT start (error). Do not navigate away from that screen.

4) Open Cloudera Manager in another browser tab or window. You should now see “Kafka” in the list of services (red, but it should be there). Click the Kafka service and then “Configure”.

5) Search for the “java heap space” configuration property. The default Java heap size should be 50 MB. Set it to at least 256 MB; the default is simply not enough.

6) Now search for the “whitelist” configuration property. In the field, put in “(?!x)x” (without the quotation marks). That is a regular expression that does not match anything. A whitelist is apparently mandatory for the MirrorMaker service to start, and assuming you don’t want to replicate any topics remotely right now, an expression that matches no topic means nothing gets replicated.

7) Save the changes and go back to the original configuration screen in the first browser tab or window. Click “Retry”, or exit that screen and manually restart the Kafka service in Cloudera Manager.

After this the Kafka service should start successfully.
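To see why the whitelist value from step 6 replicates nothing, the pattern can be tried against a few topic names. This is a rough sketch with assumptions: it uses GNU grep with PCRE support (-P) as a stand-in for the Java regex engine that MirrorMaker uses, it assumes the whitelist is matched against the whole topic name, and count_whitelisted is a hypothetical helper name.

```shell
# Hypothetical helper: counts how many of the given topic names match the
# pattern as a whole name. Assumes GNU grep with PCRE support (-P);
# -x requires the whole line to match, -c prints the number of matches.
count_whitelisted() {
  pattern=$1
  shift
  printf '%s\n' "$@" | grep -cxP -- "$pattern" || true
}

# "(?!x)x" demands an "x" at a position where the lookahead has just
# asserted that the next character is not "x", so it can never match.
count_whitelisted '(?!x)x' x xyztopic1 topic1    # prints 0
count_whitelisted '.*' x xyztopic1 topic1        # prints 3
```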

 

Next, try some Kafka commands.

The Kafka command-line tools are located in /usr/bin. Log in as root to a server where a Kafka broker is running:

  • kafka-topics: create, alter, list, and describe topics.

$ /usr/bin/kafka-topics --create --zookeeper kafkahost.com:2181 --replication-factor 1 --partitions 1 --topic xyztopic1

17/12/18 10:09:34 INFO zkclient.ZkClient: zookeeper state changed (SyncConnected)
17/12/18 10:09:34 INFO admin.AdminUtils$: Topic creation {"version":1,"partitions":{"0":[112]}}
Created topic "xyztopic1".
17/12/18 10:09:34 INFO zkclient.ZkEventThread: Terminate ZkClient event thread.
17/12/18 10:09:34 INFO zookeeper.ZooKeeper: Session: 0x2602cabe1904fcd closed
17/12/18 10:09:34 INFO zookeeper.ClientCnxn: EventThread shut down

$ kafka-topics --zookeeper kafkahost.com:2181 --list
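The create and list commands above cover two of the four operations. Describe and alter can be sketched the same way. The wrapper functions describe_topic and add_partitions and the variables KAFKA_TOPICS and ZK are hypothetical names introduced here for illustration; the --describe, --alter, and --partitions options belong to the same ZooKeeper-based kafka-topics tool used above.

```shell
# Path to the tool; /usr/bin/kafka-topics is the location given in this post.
KAFKA_TOPICS="${KAFKA_TOPICS:-/usr/bin/kafka-topics}"
# ZooKeeper connect string; kafkahost.com:2181 is the placeholder used above.
ZK="${ZK:-kafkahost.com:2181}"

# Show partitions, leaders, replicas, and in-sync replicas for one topic.
describe_topic() {
  "$KAFKA_TOPICS" --describe --zookeeper "$ZK" --topic "$1"
}

# Raise the partition count of a topic ($2 is the new total).
# Kafka only allows the partition count to be increased, not decreased.
add_partitions() {
  "$KAFKA_TOPICS" --alter --zookeeper "$ZK" --topic "$1" --partitions "$2"
}

# Example calls (they need a reachable ZooKeeper, so they are commented out):
# describe_topic xyztopic1
# add_partitions xyztopic1 2
```

Keeping the tool path in a variable makes it easy to switch between /usr/bin and the parcel directory under /opt/cloudera/parcels.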

$ /opt/cloudera/parcels/KAFKA-2.2.0-1.2.2.0.p0.68/lib/kafka/bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list

$ /opt/cloudera/parcels/KAFKA-2.2.0-1.2.2.0.p0.68/lib/kafka/bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group consumergroup123

TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID
topic1   0          121             121             0    consumergroup123_xyz
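The LAG column is the difference between LOG-END-OFFSET and CURRENT-OFFSET for each partition, so a group that has caught up shows 0, as above. For a group with many partitions it can help to sum the column. The following is a minimal sketch, assuming the column order shown in the header above; total_lag is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kafka-consumer-groups --describe` output on stdin
# and prints the sum of the LAG column (5th field). Lines whose 5th field is
# not a plain number, such as the header, are skipped.
total_lag() {
  awk 'NF >= 5 && $5 ~ /^[0-9]+$/ { sum += $5 } END { print sum + 0 }'
}

# Example (needs a running cluster, so it is commented out):
# kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group consumergroup123 | total_lag
```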

NOTE: Kafka 0.11.0.0 added support for manipulating the offsets of a consumer group through the kafka-consumer-groups command-line tool. On earlier versions, --reset-offsets fails with: Exception in thread "main" joptsimple.UnrecognizedOptionException: reset-offsets is not a recognized option
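For reference, an offset reset on a 0.11.0.0 or later installation might look like the sketch below. Treat it as an assumption-laden example rather than a tested recipe for this cluster: reset_to_earliest, KAFKA_CONSUMER_GROUPS, and BROKERS are hypothetical names, myhost.com:9092 is the placeholder broker from the install steps, and to my knowledge the reset only works through --bootstrap-server and only while the consumer group has no active members.

```shell
# Tool name or full path; override it to point at the parcel directory if needed.
KAFKA_CONSUMER_GROUPS="${KAFKA_CONSUMER_GROUPS:-kafka-consumer-groups}"
# Broker list in host:port form; myhost.com:9092 is a placeholder.
BROKERS="${BROKERS:-myhost.com:9092}"

# Rewind the committed offsets of group $1 on topic $2 to the earliest
# available offset. --execute applies the change.
reset_to_earliest() {
  "$KAFKA_CONSUMER_GROUPS" --bootstrap-server "$BROKERS" \
    --group "$1" --topic "$2" --reset-offsets --to-earliest --execute
}

# Example call (needs Kafka 0.11.0.0+ and a stopped consumer group):
# reset_to_earliest consumergroup123 topic1
```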

 

References:

https://www.rittmanmead.com/blog/2015/03/creating-real-time-search-dashboards-using-apache-solr-hue-flume-and-morphlines/

https://www.cloudera.com/documentation/kafka/latest/topics/kafka_installing.html#concept_jms_yb1_v5

https://www.confluent.io/blog/stream-data-platform-1/

https://developer.ibm.com/opentech/2017/05/31/kafka-acls-in-practice/

 

 
