The following steps install Cloudera Search, which is based on Apache Solr.
Cloudera CDH 5.12.x
Deploying Cloudera Search
Cloudera Search (powered by Apache Solr) is included in CDH 5. If you have installed CDH 5.0 or higher, you do not need to perform any additional actions to install Search.
When you deploy Cloudera Search, SolrCloud partitions your data set into multiple indexes and processes, and uses ZooKeeper to simplify management, which results in a cluster of coordinating Apache Solr servers.
For Cloudera Manager installations, if you have not yet added the Solr service to your cluster, do so now from the Cloudera Manager->Home->Add Service dropdown. The Add a Service wizard automatically configures and initializes the Solr service.
On the next screen in CMgr, select the hosts where you want to deploy the Solr servers. I selected two hosts that were also running ZooKeeper. On the following screen, keep the defaults: ZooKeeper Znode /solr and HDFS Data Directory /solr.
However, while running the Add Service wizard, I got this error: “Sentry requires that authentication be turned on for Solr.”
I clicked Back, and this time removed the Sentry dependency, keeping only HDFS and ZooKeeper. In another browser window, I also went to Cloudera Manager->Solr->Configuration and set Sentry Service to None. After that I retried the installation, and it completed successfully.
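Once the service is running, you can check from a cluster host that the ZooKeeper znode and the HDFS data directory kept as defaults in the wizard (/solr for both) actually exist. This is a minimal sketch that assumes the CDH client tools zookeeper-client and hdfs are on the PATH; run and check_solr_paths are hypothetical helper names, and zk1.example.com:2181 is a placeholder for one of your ZooKeeper servers. With DRY_RUN=1 the helper only prints the commands.

```shell
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical check of the two defaults kept in the Add Service wizard.
# $1: a ZooKeeper server as host:port (zk1.example.com:2181 is a placeholder).
check_solr_paths() {
  local zk="${1:-zk1.example.com:2181}"
  run zookeeper-client -server "$zk" ls /solr   # ZooKeeper Znode: /solr
  run hdfs dfs -ls /solr                        # HDFS Data Directory: /solr
}

# Example: check_solr_paths zk1.example.com:2181
```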
Next, in CMgr go to Hue->Configuration->Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini and remove search from the app blacklist (the app_blacklist setting).
Log in to Hue and click Query->Dashboard. At this point I got the following error:
HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded with url: /solr/admin/cores?user.name=hue&doAs=hive&wt=json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fba0805bad0>: Failed to establish a new connection: [Errno 113] No route to host',))
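Errno 113 (No route to host) means Hue could not open a TCP connection to the host and port in the URL. Before changing any configuration, you can check from the Hue host which hosts actually answer on Solr's default port, 8983. This is a minimal sketch using bash's built-in /dev/tcp redirection; port_open and check_solr_hosts are hypothetical helpers, and solr1.example.com is a placeholder for one of your Solr hosts.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# A child bash opens file descriptor 3 on /dev/tcp/<host>/<port>; timeout caps
# the attempt at 3 seconds so an unreachable host does not hang the check.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Hypothetical helper: report Solr's default port (8983) for each host argument.
check_solr_hosts() {
  local host
  for host in "$@"; do
    if port_open "$host" 8983; then
      echo "$host:8983 reachable"
    else
      echo "$host:8983 NOT reachable"
    fi
  done
}

# Example (run on the Hue host): check_solr_hosts localhost solr1.example.com
```

If localhost is not reachable but one of the Solr hosts is, Hue needs to be pointed at that host instead of its default.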
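To resolve this issue, go to Cloudera Manager->Hue->Configuration and search for solr. The URL in the error shows that Hue was trying to reach Solr at localhost:8983, its default, rather than one of the hosts where the Solr servers were deployed.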
Under Solr Service->Hue (Service-Wide), select Solr if no option is selected (None).
This automatically updates the corresponding Solr settings in hue.ini.
Click Restart in CMgr. After the restart, Hue->Query->Dashboard opens and shows a new message:
It seems there is nothing to search on..
What about creating a new index?
This indicates that Hue is now successfully configured with Solr search.
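To confirm from the command line which Solr address Hue is now using, you can print the solr_url value from the [search] section of the hue.ini that Hue actually reads. This is a sketch: hue_solr_url is a hypothetical helper, and the location of the hue.ini that Cloudera Manager generates is an assumption (typically a per-process directory under /var/run/cloudera-scm-agent/process/), so adjust the path for your cluster.

```shell
# Hypothetical helper: print solr_url from the [search] section of a hue.ini.
# $1: path to hue.ini (its location depends on your installation).
hue_solr_url() {
  awk '
    # Any section header: remember whether it is exactly [search].
    /^[ \t]*\[/ { in_search = ($0 ~ /^[ \t]*\[search\][ \t]*$/); next }
    # Inside [search]: strip everything up to "=" and print the value, then stop.
    in_search && /^[ \t]*solr_url[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      print
      exit
    }
  ' "$1"
}

# Example: hue_solr_url /path/to/hue.ini
```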
A little about SolrCores and Collections
On a single instance, Solr has something called a SolrCore, which is essentially a single index. If you want multiple indexes, you create multiple SolrCores. With SolrCloud, a single index can span multiple Solr instances, so one index can be made up of multiple SolrCores on different machines. All of the SolrCores that make up one logical index are together called a collection. A collection is essentially a single index that spans many SolrCores, both for index scaling (shards split the index across machines) and for redundancy (replicas keep extra copies of each shard).
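Creating a collection with solrctl makes the core/collection relationship concrete: you choose a number of shards (slices of the index) and a replication factor (copies of each shard), and SolrCloud creates shards × replicas SolrCores across the Solr servers. This is a sketch under stated assumptions: it assumes solrctl's -s (shards), -r (replication factor) and -m (max cores per node) options, which you should confirm with solrctl --help on your version; run, create_collection and SOLR_NODES are hypothetical names; and the template config produced by --generate would normally be edited (for example its schema.xml) before it is uploaded.

```shell
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical wrapper: create a collection made of shards x replicas SolrCores.
# $1 name, $2 shards, $3 replicas, $4 local directory for the generated config.
create_collection() {
  local name="$1" shards="${2:-2}" replicas="${3:-2}"
  local conf_dir="${4:-$HOME/solr_configs/$1}"
  local nodes="${SOLR_NODES:-2}"   # number of Solr servers (two in this setup)
  # Cores per server, rounded up, so Solr is allowed to place all of them.
  local max_per_node=$(( (shards * replicas + nodes - 1) / nodes ))

  run solrctl instancedir --generate "$conf_dir"         # template config on local disk
  run solrctl instancedir --create "$name" "$conf_dir"   # upload it to ZooKeeper
  run solrctl collection --create "$name" -s "$shards" -r "$replicas" -m "$max_per_node"
}

# Example: create_collection reviews 2 2  (four cores, two per server)
```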
EXAMPLE: Easy indexing of data into Solr with ETL operations
We need to create a new Solr collection from business review data. To start, let’s put the data file somewhere on HDFS so that we can access it.
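If you prefer the command line to Hue's file browser, the upload can be sketched as follows. The file name reviews.csv, the directory /user/hue/data, and the helpers run and upload_csv are hypothetical; the sketch assumes the hdfs client is on the PATH.

```shell
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: copy a local CSV file into an HDFS directory.
# $1 local CSV file, $2 target HDFS directory (created if missing).
upload_csv() {
  local csv="$1" hdfs_dir="${2:-/user/$USER/data}"
  run hdfs dfs -mkdir -p "$hdfs_dir"
  run hdfs dfs -put -f "$csv" "$hdfs_dir/"   # -f overwrites an existing copy
}

# Example: upload_csv reviews.csv /user/hue/data
```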
Click the Hue->Indexes menu option on the left. (Note: in Hue, an index and a collection are mostly the same thing.) Click Add Index.
Give the index a name and enter the location of the .csv file that was uploaded to HDFS using Hue. Hue detects the fields and their data types, and we can create the index by clicking Finish.
After the index is created, it appears on the left under Collections. Click the index, then click Search. This opens a dashboard, and you can build various dashboards from the data in Hue.
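The dashboard is built on Solr queries, so you can also search the new collection directly with curl through Solr's /select handler. This is a minimal sketch assuming an unsecured (non-Kerberos) cluster like the one set up here; run and solr_search are hypothetical helpers, and the base URL http://solr1.example.com:8983/solr, the collection name reviews, and the field name stars are placeholders.

```shell
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: query a collection through Solr's /select handler.
# $1 Solr base URL, $2 collection, $3 query (default: all documents), $4 rows.
solr_search() {
  local base="${1%/}" collection="$2" query="${3:-*:*}" rows="${4:-10}"
  # -G turns the --data-urlencode values into a URL-encoded GET query string.
  run curl -sS -G "$base/$collection/select" \
    --data-urlencode "q=$query" \
    --data-urlencode "rows=$rows" \
    --data-urlencode "wt=json"
}

# Example (stars is a hypothetical field name):
# solr_search http://solr1.example.com:8983/solr reviews 'stars:5' 5
```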
The Solr admin page shows many details about Solr and about the index (collection) you created.
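Much of this information is also available over HTTP. For example, the CoreAdmin STATUS call on a Solr server lists the SolrCores hosted there, which makes the core/collection relationship visible. This is a sketch: the helper names and host are hypothetical, and it assumes SolrCloud names cores <collection>_shardN_replicaM, as Solr 4.x typically does. Each server reports only its own cores, so you would run it against each Solr host.

```shell
# Hypothetical helper: fetch CoreAdmin STATUS JSON from one Solr server.
# $1 Solr base URL, e.g. http://solr1.example.com:8983/solr (placeholder).
solr_core_status() {
  curl -sS "${1%/}/admin/cores?action=STATUS&wt=json"
}

# Hypothetical filter: read that JSON on stdin and print the names of the cores
# belonging to collection $1, assuming <collection>_shardN_replicaM naming.
cores_for_collection() {
  grep -o "\"${1}_shard[0-9]*_replica[0-9]*\"" | tr -d '"' | sort -u
}

# Example: solr_core_status http://solr1.example.com:8983/solr | cores_for_collection reviews
```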
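Next, solrctl on the Solr server can list what has been created. The following command lists the instance directories (configuration sets) stored in ZooKeeper; solrctl collection --list lists the collections themselves:
/root>solrctl instancedir --list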
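To see both kinds of objects side by side, the sketch below lists the configuration sets and the collections with solrctl, and then lists the collections again through Solr's Collections API. It assumes your Solr version supports the Collections API LIST action; run and list_solr_objects are hypothetical helpers, and the base URL is a placeholder.

```shell
# Hypothetical helper: execute a command, or just print it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: show configs and collections known to SolrCloud.
# $1 Solr base URL (placeholder such as http://solr1.example.com:8983/solr).
list_solr_objects() {
  local base="${1%/}"
  run solrctl instancedir --list   # instance directories (config sets) in ZooKeeper
  run solrctl collection --list    # collections
  run curl -sS "$base/admin/collections?action=LIST&wt=json"   # collections via HTTP
}

# Example: list_solr_objects http://solr1.example.com:8983/solr
```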