
Archonic

Swapping Elasticsearch for Meilisearch in Rails feat. Docker

A wise move for apps with simple search needs

Elasticsearch is a comprehensive and highly configurable search engine and storage system that can serve a multitude of app concerns. In this article we're only going to compare its search engine capabilities within the context of a Dockerized Ruby on Rails app. If your app needs specifically weighted attribute boosting, results that improve with machine learning, mature highly available sharding, or multi-index searching, Elasticsearch is still what you want.

If your search needs sit somewhere between pg_search/ransack and Elasticsearch, Meilisearch is a new contender which is blazing fast (<50ms), much more resource efficient, has a sensible default configuration, a first-party Ruby library and Rails gem, and an admin panel for trying out searches before fully integrating with your app. With full-text search, synonyms, typo tolerance, stop words and customizable relevancy rules, Meilisearch has enough features to satisfy most applications — and that's before their v1.0 release 👏. Multi-index searching is also on the roadmap.

Part Zero: But Why?

Why go through the pain of switching? Performance and resource efficiency!

First, let's compare Elasticsearch and Meilisearch on the item you're probably here to learn about: resource usage. Memory in the cloud is expensive and Elasticsearch is a known memory hog. On my fairly low-traffic Rails app, it's using 3.5GB. That's 2.7GB more than the next-highest container, which is our Rails web workers running malloc instead of jemalloc (a topic for a different article!).

So how much more efficient is Meilisearch? Let’s get a baseline with Elasticsearch first. We’ll be using this movie database with ~32k rows.

I have to note here that Elasticsearch took a lot more time to set up. It initially refused to start because it needed more memory-mapped areas than the OS would allow it to allocate; that limit needed to be raised with sysctl -w vm.max_map_count=262144. Then the JSON file needed a fair amount of manipulation, because the bulk JSON API expects you to specify the index action for every row. This wasn't evident in the documentation, and an ancient StackOverflow answer came to my rescue.

```shell
docker network create elastic
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.2.3
curl --location --request POST 'https://localhost:9200/movies/_bulk/' \
--header 'Content-Type: application/x-ndjson' \
--header 'Authorization: Basic ---' \
--data-binary '@movies.json'
```
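If you're curious what that JSON manipulation boils down to: interleaving an action line with every document. Here's a minimal sketch in Ruby — the `to_bulk_ndjson` helper is my own name, not part of any library:

```ruby
require 'json'

# Elasticsearch's _bulk endpoint expects NDJSON: an action line
# (e.g. {"index": {}}) preceding every document, each on its own line.
def to_bulk_ndjson(json_array)
  docs = JSON.parse(json_array)
  docs.flat_map { |doc| [{ index: {} }.to_json, doc.to_json] }.join("\n") + "\n"
end

# e.g. File.write('movies_bulk.json', to_bulk_ndjson(File.read('movies.json')))
puts to_bulk_ndjson('[{"id": 2, "title": "Ariel"}]')
# prints:
# {"index":{}}
# {"id":2,"title":"Ariel"}
```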

docker stats reports that Elasticsearch is using 5.2GB of memory. Adding the movies to the index did not increase this — it uses 5.2GB by default with no data. You can of course set ES_JAVA_OPTS to bring that down, but even small apps then risk container evictions due to memory pressure. This was the main motivator for me to check out Meilisearch.
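For reference, the heap cap goes in through the ES_JAVA_OPTS environment variable; the 1GB values here are illustrative, not a recommendation:

```shell
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 \
  -e ES_JAVA_OPTS="-Xms1g -Xmx1g" \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.2.3
```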

Now let's do the same thing with Meilisearch. It was quite a bit easier to set up and the bulk import was a breeze.

```shell
docker run --rm -p 7700:7700 -v "$(pwd)/meili_data:/meili_data" getmeili/meilisearch
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
  --header 'content-type: application/json' \
  --data-binary @movies.json
```

After letting Meilisearch run for a few minutes, its memory usage actually halved, down to 96.7MB.

Now let's run a simple comparison benchmark. We'll run 100 iterations of q=batman&limit=10 for Meilisearch and q=batman&size=10 for Elasticsearch.
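Here's roughly what a timing harness for this looks like — a sketch rather than my exact script; the `bench` helper is my own name, and the usage lines assume the containers above are running:

```ruby
require 'net/http'

# Time N identical requests and report average and peak latency in ms.
def bench(label, iterations: 100)
  times = Array.new(iterations) do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000.0
  end
  puts format('%s: %.2fms average, %.0fms peak', label, times.sum / times.size, times.max)
  times
end

# Usage, assuming a local Meilisearch with the movies index loaded:
# ms = URI('http://127.0.0.1:7700/indexes/movies/search?q=batman&limit=10')
# bench('Meilisearch') { Net::HTTP.get_response(ms) }
```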

Elasticsearch: 9.68ms average, 15ms peak.
Meilisearch: 5.17ms average, 11ms peak.

Meilisearch used 54.8x less memory and was 46.6% faster than Elasticsearch with the same data and the same queries.

That’s a lot faster and a lot easier to host.

The image is also 36MB instead of 1.2GB — nice. Note that this is specifically a comparison of default configurations. What's more, Meilisearch has an interface at localhost:7700, so we don't even need to open Postman to poke around (sorry, no filtering or sorting in the admin interface at the moment).

Convinced? Ok, read on and I'll show you what switching from Elasticsearch to Meilisearch looked like for a real production app — ScribeHub. We also moved from Ankane's excellent Searchkick gem to the first-party meilisearch-rails gem, and I'll show you those changes as well.

Part One: DevOps

Begin by replacing your Elasticsearch container with a Meilisearch container in your docker-compose.yml:

```yaml
meilisearch:
  image: getmeili/meilisearch:v0.27.0
  user: root
  ports:
    - "7700:7700"
  volumes:
    - "meili:/meili_data/"
  env_file:
    - .msenv
...
volumes:
  meili:
```

The first big difference is authentication. Meilisearch supports a direct front-end integration which doesn't even touch Rails (neat!). That means that if a master key is set, Meilisearch will generate default keys with specific permissions on startup. If you're just trying MS out locally, I recommend not setting a master key so that unauthenticated requests are allowed. If you intend to ship to production, I'd recommend setting the master key so you understand how it works well before launch. We won't be going into front-end-only implementations in this article — we're just going to focus on the ES-to-MS migration.

Something that almost made me give up right at the beginning: the MS service will roll the keys if there is any change to its environment file. I kept dropping the default admin key into a common .env file, which would roll the keys again, and I would get auth errors when trying to reindex. It's supposed to roll the keys only when the master key changes, but since any change to the env file rolls them, you should give the MS service a separate env file. I called it .msenv, as you can see above. I've also seen it roll the keys with no change to its own env file, but that turned out to be the result of not mounting the /meili_data directory.

If you’re setting a master key, run SecureRandom.hex 32 from a Rails console and drop that into MEILI_MASTER_KEY in your .msenv file. You can also set the host and turn off anonymous analytics while you’re at it, which I personally think should default to disabled. Here’s my example .msenv:

```
# WARNING
# Every time any change is made to this file, Meilisearch will regenerate keys.
# That will invalidate current keys and make you sad.
MEILISEARCH_HOST=http://meilisearch:7700
MEILI_MASTER_KEY=<YOUR MASTER KEY>
MEILI_NO_ANALYTICS=true
```

Run docker-compose up and you should see this in the MS startup output:

A Master Key has been set. Requests to Meilisearch won’t be authorized unless you provide an authentication key.

Now we’ll need to fetch the default admin API key. Here’s the curl request to fetch keys. I recommend saving the query in Postman or Insomnia so you don’t have to keep looking it up in the future.

```shell
curl --location --request GET 'http://localhost:7700/keys' \
--header 'Authorization: Bearer <YOUR MASTER KEY>'
```

Drop the default admin API key into MEILISEARCH_API_KEY in your Rails .env file, and set MEILISEARCH_HOST to the same value you used in .msenv so that it's available on the Rails side as well. Time to write your Meilisearch initializer! You can tune timeouts and retries while you're at it.

```ruby
MeiliSearch::Rails.configuration = {
  meilisearch_host: ENV['MEILISEARCH_HOST'],
  meilisearch_api_key: ENV['MEILISEARCH_API_KEY'],
  timeout: 1,
  max_retries: 2
}
```

Restart everything to pick up the environment changes and you should now have the permissions needed to reindex a model. But first we need a model to reindex.

Part Deux: Rails Integration

This is where my path and yours may differ, but I'll provide an example model integration. Because ScribeHub has many searchable resources, I wrote a concern, schema_searchable.rb:

```ruby
module SchemaSearchable
  extend ActiveSupport::Concern

  included do
    include MeiliSearch::Rails
    extend Pagy::Meilisearch
  end

  module ClassMethods
    def trigger_sidekiq_job(record, remove)
      MeilisearchEnqueueWorker.perform_async(record.class.name, record.id, remove)
    end
  end
end
```

This DRYed up more code when we were on Elasticsearch, but I'll take all the code reduction I can get. Now you can drop include SchemaSearchable into any searchable model. Here's an example of the additions to our GlossaryTerm model:

```ruby
include SchemaSearchable
after_touch :index!

meilisearch enqueue: :trigger_sidekiq_job, per_environment: true, primary_id: :ms_id do
  attributes [:account_id, :id, :term, :definition, :updated]
  attribute :updated do
    updated_at.to_i
  end
  filterable_attributes [:account_id]
end

def ms_id
  "gt_#{account_id}_#{id}"
end
```

Note that Meilisearch does not have a data type for Ruby or Rails datetime objects, so we convert updated_at to Unix epoch with to_i. after_touch :index! keeps your index up to date when the model changes. per_environment: true ensures you're not polluting your development indexes with test data. enqueue runs index updates in the background via the method defined in schema_searchable.rb — but we still need that worker. Here is meilisearch_enqueue_worker.rb:

```ruby
class MeilisearchEnqueueWorker
  include Sidekiq::Worker

  def perform(klass, record_id, remove)
    if remove
      klass.constantize.index.delete_document(record_id)
    else
      klass.constantize.find(record_id).index!
    end
  end
end
```

If you're able to start a fresh Rails console and run Model.reindex! without error, then you're ready to edit the index action in your controller. Right now, using pagy's search method without creating an N+1 query means we need both pagy_meilisearch and pagy_search, like so:

```ruby
def index
  @pagy, @glossary_terms = pagy_meilisearch(
    GlossaryTerm.includes(GlossaryTerm.search_includes).pagy_search(
      params[:q],
      **{
        filter: "account_id = #{current_account.id}"
      }
    )
  )
end
```

The search_includes method on GlossaryTerm is just a list of associations needed to avoid N+1 queries. I like keeping that in the model:

```ruby
def self.search_includes
  %i(
    user
  )
end
```

Assembling the filter can get tricky compared to Elasticsearch, since it's a string instead of a hash, but it lets you assemble the logic with as many ANDs and ORs as your heart desires. For things like filtering by tags with AND logic, you'll need to do something like this:

```ruby
filter = "discarded=false"
if @conditions.key?(:tags)
  @conditions[:tags].each do |tag|
    filter += " AND tags='#{tag}'"
  end
end
```

In this case @conditions is a hash which is populated by processing the query to extract things like tags and sort order. The documentation has some helpful notes about combining logic.
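The extraction itself isn't shown above, so here's one way it could work — a hedged sketch where the `parse_query` name and the `tag:` prefix syntax are purely my own inventions:

```ruby
# Split a raw query into search terms and a conditions hash,
# pulling out tokens of the form tag:<value> (illustrative syntax).
def parse_query(raw)
  conditions = {}
  terms = []
  raw.to_s.split.each do |token|
    if token.start_with?('tag:')
      (conditions[:tags] ||= []) << token.delete_prefix('tag:')
    else
      terms << token
    end
  end
  [terms.join(' '), conditions]
end

q, conditions = parse_query('batman tag:dc tag:movies')
# q == "batman", conditions == { tags: ["dc", "movies"] }
```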

Fixing up the tests should be all that remains, and it's pretty much just swapping index for index! and search_index.delete for clear_index!. It was very cool seeing the tests pass again after such minimal fixes.
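For illustration, that change in an RSpec suite might look something like this — the hook placement and describe block are assumptions, so adapt to your own setup:

```ruby
RSpec.describe GlossaryTerm do
  before do
    GlossaryTerm.clear_index! # was: GlossaryTerm.search_index.delete
    GlossaryTerm.reindex!     # repopulate the index before each search example
  end

  # ... search expectations unchanged ...
end
```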

Hope you enjoyed! We certainly did here at ScribeHub and we eagerly await multi-index searching 😉.

Top comments (3)

Paweł Świątkowski

Funny thing: I use Elasticsearch a lot, but mainly not for search. It's a huge beast with a large overhead (also on the devops side), so for use cases like simple search, it's nice to see an alternative. I will try Meilisearch for sure.

However, it would be interesting to see how the comparison looks for larger datasets. Because let's be honest, 32k records does not even justify leaving pg_search ;)

Archonic (Author)

People have built amazing things with the ELK stack for sure. The AI threat detection stuff is especially impressive.

Does pg_search have typo tolerance or synonyms? It certainly can't be beat in terms of simple devops. I look forward to one day writing an app that doesn't need more than pg_search.

Paweł Świątkowski

Good points. You can do fuzzy searching to some extent in PostgreSQL using trigrams: freecodecamp.org/news/fuzzy-string... But I'm not sure about synonyms.
