I recently worked on moving our application search from Sphinx to Elasticsearch. Our Sphinx setup was not optimal (and maybe we did not set it up correctly): in production we had to maintain a separate dummy app on our Sphinx server just to run maintenance on the Sphinx instance, and we had long since decided to separate our Sphinx server from our application servers. I have become an avid fan of Docker and containerization, especially in a micro-service architecture like ours. Basically my opinion is 'if you are not using Docker, you should at least look into it'. Setting up Docker in development involved the following process.
Docker setup in Mac OS X
Installing Docker on Mac is not in the scope of this article, but instructions can be found here. Installation using those instructions is straightforward and worked like a charm with no hiccups, so there is no point in rewriting them here.
Docker setup in Ubuntu
Likewise, the instructions to install Docker on Ubuntu were straightforward, as documented in the Docker documentation.
Mac OS X setup
Pull the latest elastic search image using:
$ docker pull elasticsearch
Run the image, mounting a volume to persist the Elasticsearch index data. This way we can discard our Elasticsearch container at any time without necessarily losing our search index data, since the data will reside on the host machine and not in the container.
$ docker run -d -p 9200:9200 -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch
This should work fine. However, I ran into a documented problem with mounting Docker volumes on Mac OS X: there is a permissions problem when mounting Docker volumes using Docker Machine on Mac. The error you may get will be similar to the one below:
There is a GitHub issue that explores the problem and its solution, which can be found here. The solution involves activating and mounting the shared folder in the virtual machine as NFS. The workaround documented in the issue involves the following steps:
Get the docker-machine-nfs script from here
Then run the command:
$ docker-machine-nfs dev-nfs
When done, open the file /etc/exports and replace -mapall=$uid:$gid with -maproot=0
or just use
Then restart nfsd
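The restart command itself is not shown in the original; on Mac OS X the NFS daemon is typically restarted with:

```shell
$ sudo nfsd restart
```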
And finally run
You should then be able to start your container with:
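This is the same run command as before, mounting the now NFS-shared folder:

```shell
$ docker run -d -p 9200:9200 -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch
```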
And verify with
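The original verification commands are not shown; one way to verify is to check that the container is up and that Elasticsearch responds (assuming your Docker Machine is named default):

```shell
$ docker ps
$ curl "http://$(docker-machine ip default):9200"
```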
Remember that on Mac OS X, Docker runs inside a lightweight Linux VM, so the container is reachable only via the IP of that VM. You can find out the IP of the VM using:
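For example:

```shell
$ docker-machine ip default
```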
(In this case my Virtual Machine is named default)
Let's say the IP is 192.168.99.100; you should be able to visit port 9200 in your browser and get the following output:
Once Elasticsearch is set up, we can move to the Rails application.
There is a Ruby Elasticsearch library that we will be using, 'elasticsearch', which is a wrapper for two separate libraries (elasticsearch-transport and elasticsearch-api).
The elasticsearch-model gem builds on top of the elasticsearch library. The key is to extend any model that you want to search via Elasticsearch with Elasticsearch functionality. To do this we make use of ActiveSupport::Concern so as to pull the Elasticsearch code out of our model code and keep it 'dry'. For example, let us say we have an Account model which we want to expose to Elasticsearch.
In the models/concerns directory, create a file called account_indexer.rb and add the following:
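The original file contents are not shown; a sketch along these lines, using the elasticsearch-model gem, would index the three columns discussed below (the settings values are assumptions):

```ruby
# app/models/concerns/account_indexer.rb -- a sketch, not the original code
module AccountIndexer
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    # Configure the index and its mappings for the searchable columns
    settings index: { number_of_shards: 1 } do
      mappings dynamic: false do
        indexes :number
        indexes :first_name
        indexes :last_name
      end
    end
  end
end
```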
Then in the Account model code you would have
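Something like:

```ruby
# app/models/account.rb -- a sketch
class Account < ActiveRecord::Base
  include AccountIndexer
end
```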
The settings/mappings block is used to configure the index with mappings. In this case, we are interested in indexing 3 columns of the Account model: number, first_name and last_name. You can view the mappings hash with the command:
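In elasticsearch-model, the mappings hash can be inspected with:

```ruby
Account.mappings.to_hash
```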
The defined settings and mappings are used to create an index with the desired configuration.
Commands to create and refresh indexes are provided by the namespaced functions:
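These live under elasticsearch-model's `__elasticsearch__` proxy:

```ruby
Account.__elasticsearch__.create_index!(force: true)  # create (or recreate) the index
Account.__elasticsearch__.refresh_index!              # make recent writes visible to search
```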
You can write rake tasks that perform these actions like so:
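The original tasks are not shown; a sketch might look like this (the task names are assumptions):

```ruby
# lib/tasks/elasticsearch.rake -- a sketch
namespace :elasticsearch do
  desc 'Create the accounts index'
  task create_index: :environment do
    Account.__elasticsearch__.create_index!(force: true)
  end

  desc 'Refresh the accounts index'
  task refresh_index: :environment do
    Account.__elasticsearch__.refresh_index!
  end
end
```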
Before running the commands above, Elasticsearch needs to be bootstrapped in some kind of initializer. Create an elasticsearch.yml file in your config directory with the following:
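The original file is not shown; a minimal version might look like this (the keys and hosts are assumptions):

```yaml
# config/elasticsearch.yml -- a sketch
development:
  host: localhost
  port: 9200

production:
  host: elasticsearch.example.com
  port: 9200
```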
Create an initializer elasticsearch.rb in config/initializers with the following:
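A sketch, assuming host/port keys in config/elasticsearch.yml:

```ruby
# config/initializers/elasticsearch.rb -- a sketch
config = YAML.load_file(Rails.root.join('config', 'elasticsearch.yml'))[Rails.env]

# Set the client used by all models that include Elasticsearch::Model
Elasticsearch::Model.client = Elasticsearch::Client.new(
  host: config['host'],
  port: config['port']
)
```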
This will create an Elasticsearch client that will be used by ALL models in the application.
After creating the index, you could do the initial import of the data using the command:
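The original command is not shown; consistent with the per-record remote calls described next, it might be something along the lines of:

```ruby
# Index each record individually -- one API call per record
Account.find_each { |account| account.__elasticsearch__.index_document }
```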
However, recall that this call will be making remote API calls to the Elasticsearch server. At scale, with thousands of records, this is not optimal. Thankfully, Elasticsearch provides bulk indexing functionality to mitigate this. You could create a module to easily bulk index existing records, like the one below:
With this, for example, a call to bulk index the Account model would be:
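A sketch of such a module (the module and method names are assumptions), using the bulk API exposed by the Elasticsearch client:

```ruby
# app/models/concerns/bulk_indexable.rb -- a sketch, not the original code
module BulkIndexable
  extend ActiveSupport::Concern

  module ClassMethods
    # Send records to Elasticsearch in batches via the bulk API,
    # instead of one API call per record
    def bulk_index(batch_size: 500)
      find_in_batches(batch_size: batch_size) do |batch|
        __elasticsearch__.client.bulk(
          index: index_name,
          body: batch.map { |r| { index: { _id: r.id, data: r.as_indexed_json } } }
        )
      end
    end
  end
end
```

With this sketch, the call would be `Account.bulk_index` (after including BulkIndexable in the model).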
A simple search now takes the form
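With elasticsearch-model, a basic search looks like this ('john' is a placeholder query):

```ruby
response = Account.search('john')
response.results.total  # number of hits
```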
We are using just the basic search functionality of Elasticsearch, as can be seen in the function
You can customize the search by sending in more options using the Elasticsearch DSL, but we will stick with the plain vanilla search; you can look at the search DSL documentation for more. There are many options for retrieving the search results, including the related records, e.g.
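For example, from the elasticsearch-model API:

```ruby
response = Account.search('john')
response.results  # lightweight result objects returned by Elasticsearch
response.records  # the matching Account records, loaded from the database
```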
As the application is used, the indices invariably need to be updated when records are created/edited/deleted. The gem has callbacks that can be invoked automatically by including
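The include in question is presumably elasticsearch-model's callbacks module:

```ruby
include Elasticsearch::Model::Callbacks
```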
in the model (or, in our case, in the concern we are including). However, this will add overhead to CRUD operations, because the indexing operation involves an API call. It is prudent to push this work to a background job using Sidekiq or Resque. Since we use Resque, a background job worker to update indices could look something like this:
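A sketch of such a worker (the class and queue names are assumptions):

```ruby
# app/workers/elasticsearch_indexer.rb -- a sketch, not the original code
class ElasticsearchIndexer
  @queue = :elasticsearch

  def self.perform(operation, record_class, record_id)
    klass = record_class.constantize

    case operation
    when 'index'
      # Index (or re-index) the record's current state
      klass.find(record_id).__elasticsearch__.index_document
    when 'delete'
      # The record is gone from the database, so delete by id directly
      klass.__elasticsearch__.client.delete(index: klass.index_name, id: record_id)
    end
  end
end
```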
Add callbacks to the Account concern that will be invoked when saving or deleting an Account object:
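A sketch, assuming a Resque worker class (here called ElasticsearchIndexer, an assumed name):

```ruby
# Inside the AccountIndexer concern -- a sketch
included do
  after_commit on: [:create, :update] do
    # ElasticsearchIndexer is an assumed Resque worker name
    Resque.enqueue(ElasticsearchIndexer, 'index', self.class.name, id)
  end

  after_commit on: :destroy do
    Resque.enqueue(ElasticsearchIndexer, 'delete', self.class.name, id)
  end
end
```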