Setting up Elasticsearch in Rails using Docker
I recently worked on moving our application's search from Sphinx to Elasticsearch. Our Sphinx setup was not optimal (and maybe we did not set it up correctly). In production, we had to maintain a separate dummy app on our Sphinx server just to run maintenance on the Sphinx instance, since we had long ago decided to separate our Sphinx server from our application servers. I have become an avid fan of Docker and containerization, especially in a micro-service architecture like ours. Basically, my opinion is ‘if you are not using Docker, you should at least look into it’. Setting up Elasticsearch with Docker in development involved the following process.
Docker Setup
Docker setup in Mac OS X
Installing Docker on Mac is outside the scope of this article, but instructions can be found here. Installation using those instructions is straightforward and worked like a charm with no hiccups, so there is no point in rewriting them here.
Docker setup in Ubuntu
Likewise, the instructions for installing Docker on Ubuntu are straightforward, as documented in the Docker documentation.
Elasticsearch setup
Mac OS X setup
Pull the latest Elasticsearch image using:
$ docker pull elasticsearch
Run the image with a mounted volume to persist the Elasticsearch index data. This lets us discard the Elasticsearch container at any time without losing the search index, since the data resides on the host machine and not in the container.
$ docker run -d -p 9200:9200 -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch
This should work fine. However, I ran into a documented problem with mounting Docker volumes on Mac OS X: there is a permissions problem when volumes are mounted through Docker Machine. The error you may get will be similar to the one below:
[2016-03-23 14:26:06,882][warn ][bootstrap] unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade
Exception in thread "main" java.lang.IllegalStateException: Unable to access 'path.data' (/usr/share/elasticsearch/data/elasticsearch)
Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/elasticsearch
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:767)
at org.elasticsearch.bootstrap.Security.ensureDirectoryExists(Security.java:337)
at org.elasticsearch.bootstrap.Security.addPath(Security.java:314)
at org.elasticsearch.bootstrap.Security.addFilePermissions(Security.java:259)
at org.elasticsearch.bootstrap.Security.createPermissions(Security.java:212)
at org.elasticsearch.bootstrap.Security.configure(Security.java:118)
at org.elasticsearch.bootstrap.Bootstrap.setupSecurity(Bootstrap.java:196)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:167)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:285)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details
There is a GitHub issue that explores the problem and its solution, which can be found here. The solution involves activating the shared folder in the virtual machine and mounting it as NFS. The workaround documented in that issue involves the following:
Get the docker-machine-nfs script from here
Then run the command:
$ docker-machine-nfs dev-nfs
When done, open the file /etc/exports and replace -mapall=$uid:$gid with -maproot=0
or, with the updated setting, just use
$ docker-machine-nfs default --shared-folder=/Users --nfs-config="-alldirs -maproot=0"
Then restart nfsd
$ sudo nfsd restart
And finally run
$ eval "$(docker-machine env dev-nfs)"
You should then be able to start your container with:
$ docker run -d -p 9200:9200 --name k2_search -v "$PWD/esdata":/usr/share/elasticsearch/data elasticsearch
And verify with
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
980a09759648 elasticsearch "/docker-entrypoint.s" About an hour ago Up About an hour 0.0.0.0:9200->9200/tcp, 9300/tcp k2_search
Remember that on Mac OS X Docker runs inside a lightweight Linux VM, so the container is reachable only via the IP of the VM. You can find the IP of the VM using:
$ docker-machine ip default
(In this case my virtual machine is named default.)
Let's say the IP is 192.168.99.100. You should then be able to visit http://192.168.99.100:9200 in your browser and get the following output:
{
"name" : "Longneck",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.2.1",
"build_hash" : "d045fc29d1932bce18b2e65ab8b297fbf6cd41a1",
"build_timestamp" : "2016-03-09T09:38:54Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
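The same check can be scripted. The snippet below is a minimal sketch and not part of the original setup: it uses only Ruby's standard library, and the helper names elasticsearch_version and fetch_elasticsearch_version are made up for illustration. The default host is the example VM IP used above; substitute your own.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Extracts the version number from the JSON body returned by the root endpoint.
def elasticsearch_version(body)
  JSON.parse(body).dig('version', 'number')
end

# Fetches the root endpoint and returns the reported version.
# The host and port defaults are assumptions taken from the example above.
def fetch_elasticsearch_version(host = '192.168.99.100', port = 9200)
  response = Net::HTTP.get_response(URI("http://#{host}:#{port}/"))
  elasticsearch_version(response.body)
end

# Parsing a trimmed copy of the response shown above, without touching the network.
sample = '{"name":"Longneck","version":{"number":"2.2.1"},"tagline":"You Know, for Search"}'
puts elasticsearch_version(sample) # prints 2.2.1
```

Calling fetch_elasticsearch_version with no arguments performs the real HTTP request, so it only succeeds while the container is running.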
Once Elasticsearch is set up, we can move on to the Rails application.
Rails setup
Setup
We will be using the Ruby Elasticsearch library ‘elasticsearch’, which is a wrapper for two separate libraries (elasticsearch-transport and elasticsearch-api).
The elasticsearch-model gem builds on top of the elasticsearch library. The key is to extend any model that you want to search with Elasticsearch functionality. To do this we use an ActiveSupport::Concern to pull the Elasticsearch code out of our model code and make it ‘dryer’. For example, let us say we have an Account model which we want to expose to Elasticsearch.
In the models/concerns directory, create a file called account_indexer.rb and add the following:
module AccountIndexer
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    # Index settings and the explicit field mappings
    settings index: { number_of_shards: 1 } do
      mappings dynamic: 'false' do
        indexes :number, analyzer: 'english', index_options: 'offsets'
        indexes :first_name, analyzer: 'english', index_options: 'offsets'
        indexes :last_name, analyzer: 'english', index_options: 'offsets'
      end
    end

    # The document that is sent to Elasticsearch for each record
    def as_indexed_json(options = {})
      as_json(root: false, only: [:number, :first_name, :last_name])
    end

    def self.search(query)
      __elasticsearch__.search(query)
    end
  end
end
Then in the Account model code you would have:
class Account < ActiveRecord::Base
  include AccountIndexer
end
The code
settings index: { number_of_shards: 1 } do
  mappings dynamic: 'false' do
    indexes :number, analyzer: 'english', index_options: 'offsets'
    indexes :first_name, analyzer: 'english', index_options: 'offsets'
    indexes :last_name, analyzer: 'english', index_options: 'offsets'
  end
end
is used to configure the index with mappings. In this case, we are interested in indexing three columns of the Account model: number, first_name and last_name. You can view the mappings hash with the command:
Account.mappings.to_hash
{:account=> {
:dynamic=>"false",
:properties=> {
:number=> {:analyzer=>"english", :index_options=>"offsets", :type=>"string"},
:first_name=> {:analyzer=>"english", :index_options=>"offsets", :type=>"string"},
:last_name=> {:analyzer=>"english", :index_options=>"offsets", :type=>"string"}
}
}
}
Indexing
The defined settings and mappings are used to create an index with the desired configuration. Commands to create and refresh indexes are provided by the namespaced functions:
Account.__elasticsearch__.create_index! force: true
Account.__elasticsearch__.refresh_index!
You can write rake tasks that perform these actions like so:
desc 'bulk index the account model'
task bulk_index_accounts: :environment do
  puts 'creating accounts index...'
  Account.__elasticsearch__.create_index!
end
Before running the commands above, Elasticsearch needs to be bootstrapped in an initializer. Create an elasticsearch.yml file in your config directory with the following:
##
development:
  host: 'http://localhost:9200'
  transport_options:
    request:
      timeout: 5
test:
  host: 'http://localhost:9200'
  transport_options:
    request:
      timeout: 5
production:
  host: 'http://localhost:9200'
  transport_options:
    request:
      timeout: 5
Create an initializer elasticsearch.rb in config/initializers with the following:
# Defaults, used when config/elasticsearch.yml is missing
config = {
  host: 'http://localhost:9200/',
  transport_options: {
    request: { timeout: 5 }
  }
}
# File.exists? was removed in Ruby 3.2; File.exist? works on all versions
config_file = Rails.root.join('config', 'elasticsearch.yml')
if File.exist?(config_file)
  config.merge!(YAML.load_file(config_file)[Rails.env].deep_symbolize_keys)
end
Elasticsearch::Model.client = Elasticsearch::Client.new(config)
This will create an Elasticsearch client that will be used by ALL models in the application.
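One detail of the initializer is worth seeing in isolation: Hash#merge! is a shallow merge, so a transport_options key in the YAML file replaces the whole default transport_options hash rather than being combined with it. The following is a minimal sketch of that behaviour outside Rails. It assumes YAML.safe_load with symbolize_names in place of deep_symbolize_keys, and the client_config helper, the inline file contents and the search.internal host are all hypothetical.

```ruby
require 'yaml'

# Same defaults as in the initializer above.
DEFAULTS = {
  host: 'http://localhost:9200/',
  transport_options: { request: { timeout: 5 } }
}.freeze

# Hypothetical file contents; the production host is made up for illustration.
SAMPLE_CONFIG = <<~CONFIG
  development:
    host: 'http://localhost:9200'
  production:
    host: 'http://search.internal:9200'
    transport_options:
      request:
        timeout: 10
CONFIG

# Returns the defaults overridden by the settings for the given environment.
# Environments missing from the file fall back to the defaults unchanged.
def client_config(env, yaml_text = SAMPLE_CONFIG)
  per_env = YAML.safe_load(yaml_text, symbolize_names: true)
  DEFAULTS.merge(per_env.fetch(env.to_sym, {}))
end

p client_config('production')
p client_config('test')
```

Here the development entry keeps the default timeout only because it does not mention transport_options at all.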
After creating the index, you could do the initial import of the data using the command:
Account.import
However, recall that this call makes remote API calls to the Elasticsearch server, which is not optimal at scale with thousands of records. Thankfully, Elasticsearch provides bulk indexing functionality to mitigate this. You could create a module like the one below to bulk index existing records:
module Search
  module BulkIndexer
    # Walk the table in batches and send each batch in one bulk request
    def self.import(klass)
      klass.constantize.find_in_batches do |batch_objects|
        bulk_index(klass, batch_objects)
      end
    end

    # Build one bulk "index" action per record
    def self.prepare_records(records)
      records.map do |record|
        { index: { _id: record.id, data: record.as_indexed_json } }
      end
    end

    def self.bulk_index(klass, records)
      klass.constantize.__elasticsearch__.client.bulk({
        index: klass.constantize.__elasticsearch__.index_name,
        type: klass.constantize.__elasticsearch__.document_type,
        body: prepare_records(records)
      })
    end
  end
end
With this in place, a call to bulk index the Account model would be:
Search::BulkIndexer.import('Account')
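To see what ends up in the body of each bulk request, here is a minimal sketch that runs prepare_records without Rails or a running Elasticsearch server. FakeAccount is a hypothetical stand-in for the Account model, the sample data is invented, and each_slice plays the role of find_in_batches.

```ruby
require 'json'

# Hypothetical stand-in for an ActiveRecord Account; for illustration only.
FakeAccount = Struct.new(:id, :number, :first_name, :last_name) do
  def as_indexed_json(_options = {})
    { 'number' => number, 'first_name' => first_name, 'last_name' => last_name }
  end
end

# Same shape as Search::BulkIndexer.prepare_records above:
# one "index" action per record, carrying the document under :data.
def prepare_records(records)
  records.map do |record|
    { index: { _id: record.id, data: record.as_indexed_json } }
  end
end

accounts = [
  FakeAccount.new(1, 'A-100', 'Ada', 'Lovelace'),
  FakeAccount.new(2, 'A-101', 'Alan', 'Turing')
]

# Each slice corresponds to the body of one bulk request.
accounts.each_slice(1) do |batch|
  puts JSON.generate(prepare_records(batch))
end
```

With a batch size of one this sends as many requests as there are records; the point of batching is to use a much larger slice so that many records travel in a single request.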
Searching
A simple search now takes the form
Account.search('query_string')
We are using just the basic search of Elasticsearch, as can be seen in the function
def self.search(query)
  __elasticsearch__.search(query)
end
You can customize the search by passing in more options using the Elasticsearch DSL, but we will stick with the plain vanilla search here; see the search DSL documentation for more. There are many ways of retrieving the search results, including the related records, e.g.
records = Account.search('query_string').records
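As an illustration of going beyond a plain query string, the sketch below builds a multi_match query over the three indexed fields as a plain Ruby hash. The account_query helper and the size default are assumptions made for this example, not something the application above defines.

```ruby
require 'json'

# Builds a search body that matches the text against all three indexed fields.
# The helper name and the default size are hypothetical.
def account_query(text, size: 10)
  {
    query: {
      multi_match: {
        query: text,
        fields: %w[number first_name last_name]
      }
    },
    size: size
  }
end

# Print the JSON that would be sent as the search request body.
puts JSON.pretty_generate(account_query('jane'))

# With the gem loaded, the hash could be passed in place of the query string:
#   Account.search(account_query('jane')).records
```

Keeping the query in a small builder method like this makes it easy to unit test the request body without a running Elasticsearch server.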
Updating indices
As the application is used, the indices inevitably need to be updated when records are created, edited or deleted. The gem has callbacks that can be enabled automatically by including
include Elasticsearch::Model::Callbacks
in the model (or in our case in the concern we are including). However, this adds overhead to CRUD operations because each indexing operation involves an API call. It is prudent to hand this off to a background job using Sidekiq or Resque. Since we use Resque, a background job worker to update indices could look something like this:
module Search
  class Indexer
    @queue = :indexer

    # Enqueue the work so that the request does not wait on Elasticsearch
    def async_index_operation(operation, record_id, klass)
      Resque.enqueue(Search::Indexer, operation, record_id, klass)
    rescue => ex
      indexing_logger.error "E [#{Time.now.utc.iso8601}] Error: Error in Enqueueing #{klass} #{operation} operation: #{ex.message}"
    end

    # Entry point called by the Resque worker
    def self.perform(operation, record_id, klass)
      @es_client = Elasticsearch::Model.client
      index_object(operation, record_id, klass) if @es_client
    end

    def self.index_object(operation, record_id, klass)
      case operation.to_s
      when /index/
        record = klass.constantize.find(record_id)
        @es_client.index(index: ActiveSupport::Inflector.pluralize(klass.downcase),
                         type: klass.downcase,
                         id: record_id,
                         body: record.as_indexed_json) if record
      when /delete/
        @es_client.delete(index: ActiveSupport::Inflector.pluralize(klass.downcase),
                          type: klass.downcase,
                          id: record_id)
      else raise ArgumentError, "Unknown operation '#{operation}'"
      end
    end

    def indexing_logger
      @indexing_logger ||= Logger.new("#{Rails.root}/log/indexer-job.log", 'monthly')
    end
  end
end
Add callbacks to the Account concern that will be invoked on saving or deleting an Account object:
after_save { Search::Indexer.new.async_index_operation('index', self.id, self.class.to_s) }
after_destroy { Search::Indexer.new.async_index_operation('delete', self.id, self.class.to_s) }