
Archonic

Swapping Elasticsearch for Meilisearch in Rails feat. Docker

A wise move for apps with simple search needs

Elasticsearch is a comprehensive and highly configurable search engine and storage system that can serve a multitude of app concerns. In this article we're only going to compare its search engine capabilities within the context of a Dockerized Ruby on Rails app. If your app needs specifically weighted attribute boosting, results that improve with machine learning, mature highly available sharding, or multi-index searching, Elasticsearch is still what you want.

If your search needs sit somewhere between pg_search/ransack and Elasticsearch, Meilisearch is a new contender which is blazing fast (<50ms), much more resource efficient, has a sensible default configuration, a first-party Ruby library and Rails gem, and an admin panel for trying out searches before fully integrating with your app. With full-text search, synonyms, typo tolerance, stop words and customizable relevancy rules, Meilisearch has enough features to satisfy most applications — and that's before their v1.0 release 👏. Multi-index searching is also on the roadmap.

Part Zero: But Why?

Why go through the pain of switching? Performance and resource efficiency!

First, let's compare Elasticsearch and Meilisearch on the item you're probably here to learn about: resource usage. Memory in the cloud is expensive and Elasticsearch is a known memory hog. On my fairly low-traffic Rails app, it's using 3.5GB. That's 2.7GB more than the next-highest container, which is our Rails web workers running malloc instead of jemalloc (a topic for a different article!).

So how much more efficient is Meilisearch? Let’s get a baseline with Elasticsearch first. We’ll be using this movie database with ~32k rows.

I have to note here that Elasticsearch took a lot more time to set up. It initially refused to start because it needed more memory-mapped areas than the OS would allow it to allocate; that limit needed to be raised with sysctl -w vm.max_map_count=262144. Then the JSON file needed a fair amount of manipulation, because the bulk JSON API expects you to specify the index action for every row. This wasn't evident in the documentation, and an ancient StackOverflow answer came to my rescue.

```shell
docker network create elastic
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.2.3
curl --location --request POST 'https://localhost:9200/movies/_bulk/' \
--header 'Content-Type: application/x-ndjson' \
--header 'Authorization: Basic ---' \
--data-binary '@movies.json'
```
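If you're curious what that JSON manipulation boils down to: interleaving an action line with every document. Here's a minimal sketch in Ruby — the `to_bulk_ndjson` helper is my own name, not part of any library:

```ruby
require 'json'

# Elasticsearch's _bulk endpoint expects NDJSON: an action line
# (e.g. {"index": {}}) preceding every document, each on its own line.
def to_bulk_ndjson(json_array)
  docs = JSON.parse(json_array)
  docs.flat_map { |doc| [{ index: {} }.to_json, doc.to_json] }.join("\n") + "\n"
end

# e.g. File.write('movies_bulk.json', to_bulk_ndjson(File.read('movies.json')))
puts to_bulk_ndjson('[{"id": 2, "title": "Ariel"}]')
# prints:
# {"index":{}}
# {"id":2,"title":"Ariel"}
```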

docker stats reports that Elasticsearch is using 5.2GB of memory. Adding the movies to the index did not increase this — it uses 5.2GB by default with no data. You can of course set ES_JAVA_OPTS to bring that down, but even small apps then risk container evictions due to memory pressure. This was the main motivator for me to check out Meilisearch.
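For reference, the heap cap goes in through the ES_JAVA_OPTS environment variable; the 1GB values here are illustrative, not a recommendation:

```shell
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 \
  -e ES_JAVA_OPTS="-Xms1g -Xmx1g" \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.2.3
```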

Now let's do the same thing with Meilisearch. It was quite a bit easier to set up and the bulk import was a breeze.

```shell
docker run --rm -p 7700:7700 -v "$(pwd)/meili_data:/meili_data" getmeili/meilisearch
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
  --header 'content-type: application/json' \
  --data-binary @movies.json
```

After letting Meilisearch run for a few minutes, its memory usage actually halved, down to 96.7MB.

Now let's run a simple comparison benchmark. We'll run 100 iterations of q=batman&limit=10 for Meilisearch and q=batman&size=10 for Elasticsearch.
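Here's roughly what a timing harness for this looks like — a sketch rather than my exact script; the `bench` helper is my own name, and the usage lines assume the containers above are running:

```ruby
require 'net/http'

# Time N identical requests and report average and peak latency in ms.
def bench(label, iterations: 100)
  times = Array.new(iterations) do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000.0
  end
  puts format('%s: %.2fms average, %.0fms peak', label, times.sum / times.size, times.max)
  times
end

# Usage, assuming a local Meilisearch with the movies index loaded:
# ms = URI('http://127.0.0.1:7700/indexes/movies/search?q=batman&limit=10')
# bench('Meilisearch') { Net::HTTP.get_response(ms) }
```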

Elasticsearch: 9.68ms average, 15ms peak.
Meilisearch: 5.17ms average, 11ms peak.

Meilisearch used 54.8x less memory and was 46.6% faster than Elasticsearch with the same data and the same queries.

That’s a lot faster and a lot easier to host.

The image is also 36MB instead of 1.2GB — nice. Note that this is specifically a comparison of default configurations. What's more, Meilisearch has an interface at localhost:7700, so we don't even need to open Postman to poke around (sorry, no filtering or sorting in the admin interface at the moment).

Convinced? Ok, read on and I'll show you what switching from Elasticsearch to Meilisearch looked like for a real production app — ScribeHub. We also moved from Ankane's excellent Searchkick gem to the first-party meilisearch-rails gem, and I'll show you those changes as well.

Part One: DevOps

Begin by replacing your Elasticsearch container with a Meilisearch container in your docker-compose.yml:

```yaml
meilisearch:
  image: getmeili/meilisearch:v0.27.0
  user: root
  ports:
    - "7700:7700"
  volumes:
    - "meili:/meili_data/"
  env_file:
    - .msenv
...
volumes:
  meili:
```

The first big difference is authentication. Meilisearch supports a direct front-end integration which doesn't even touch Rails (neat!). That means that if a master key is set, Meilisearch will generate default keys with specific permissions on startup. If you're just trying MS out locally, I recommend not setting a master key so that unauthenticated requests are allowed. If you intend to ship to production, I'd recommend setting the master key so you understand how it works well before launch. We won't be going into front-end-only implementations in this article — we're just going to focus on the ES-to-MS migration.

Something that almost made me give up right at the beginning: the MS service will roll the keys if there is any change to its environment file. I kept dropping the default admin key into a common .env file, which would roll the keys again, and I would get auth errors when trying to reindex. It's supposed to roll the keys only when the master key changes, but since any change to the env file rolls them, you should give the MS service a separate env file. I called it .msenv, as you can see above. I've also seen it roll the keys with no change to its own env file, but that turned out to be the result of not mounting the /meili_data directory.

If you’re setting a master key, run SecureRandom.hex 32 from a Rails console and drop that into MEILI_MASTER_KEY in your .msenv file. You can also set the host and turn off anonymous analytics while you’re at it, which I personally think should default to disabled. Here’s my example .msenv:

```
# WARNING
# Every time any change is made to this file, Meilisearch will regenerate keys.
# That will invalidate current keys and make you sad.
MEILISEARCH_HOST=http://meilisearch:7700
MEILI_MASTER_KEY=<YOUR MASTER KEY>
MEILI_NO_ANALYTICS=true
```

Run docker-compose up and you should see this in the MS startup output:

A Master Key has been set. Requests to Meilisearch won’t be authorized unless you provide an authentication key.

Now we’ll need to fetch the default admin API key. Here’s the curl request to fetch keys. I recommend saving the query in Postman or Insomnia so you don’t have to keep looking it up in the future.

```shell
curl --location --request GET 'http://localhost:7700/keys' \
--header 'Authorization: Bearer <YOUR MASTER KEY>'
```

Drop the default admin API key into MEILISEARCH_API_KEY in your Rails .env file, and set MEILISEARCH_HOST to the same value you used in .msenv so that it's available on the Rails side as well. Time to write your Meilisearch initializer! You can tune timeouts and retries while you're at it.

```ruby
MeiliSearch::Rails.configuration = {
  meilisearch_host: ENV['MEILISEARCH_HOST'],
  meilisearch_api_key: ENV['MEILISEARCH_API_KEY'],
  timeout: 1,
  max_retries: 2
}
```

Restart everything to pick up the environment changes and you should now have the permissions needed to reindex a model. But first we need a model to reindex.

Part Deux: Rails Integration

This is where my path and yours may differ, but I'll provide an example model integration. Because ScribeHub has many searchable resources, I wrote a concern, schema_searchable.rb:

```ruby
module SchemaSearchable
  extend ActiveSupport::Concern

  included do
    include MeiliSearch::Rails
    extend Pagy::Meilisearch
  end

  module ClassMethods
    def trigger_sidekiq_job(record, remove)
      MeilisearchEnqueueWorker.perform_async(record.class.name, record.id, remove)
    end
  end
end
```

This DRYed up more code when we were on Elasticsearch, but I'll take all the code reduction I can get. Now you can drop include SchemaSearchable into any searchable model. Here's an example of the additions to our GlossaryTerm model:

```ruby
include SchemaSearchable
after_touch :index!

meilisearch enqueue: :trigger_sidekiq_job, per_environment: true, primary_id: :ms_id do
  attributes [:account_id, :id, :term, :definition, :updated]
  attribute :updated do
    updated_at.to_i
  end
  filterable_attributes [:account_id]
end

def ms_id
  "gt_#{account_id}_#{id}"
end
```

Note that Meilisearch does not have a data type for Ruby or Rails datetime objects, so we convert updated_at to Unix epoch with to_i. after_touch :index! keeps your index up to date when the model changes. per_environment: true ensures you're not polluting your development indexes with test data. enqueue runs index updates in the background via the method defined in schema_searchable.rb — but we still need that worker. Here is meilisearch_enqueue_worker.rb:

```ruby
class MeilisearchEnqueueWorker
  include Sidekiq::Worker

  def perform(klass, record_id, remove)
    if remove
      klass.constantize.index.delete_document(record_id)
    else
      klass.constantize.find(record_id).index!
    end
  end
end
```

If you're able to start a fresh Rails console and run Model.reindex! without error, then you're ready to edit the index action in your controller. Right now, using pagy's search method without creating an N+1 query means we need both pagy_meilisearch and pagy_search, like so:

```ruby
def index
  @pagy, @glossary_terms = pagy_meilisearch(
    GlossaryTerm.includes(GlossaryTerm.search_includes).pagy_search(
      params[:q],
      **{
        filter: "account_id = #{current_account.id}"
      }
    )
  )
end
```

The search_includes method on GlossaryTerm is just a list of associations needed to avoid N+1 queries. I like keeping that in the model:

```ruby
def self.search_includes
  %i(
    user
  )
end
```

Assembling the filter can get tricky compared to Elasticsearch, since it's a string instead of a hash, but it lets you assemble the logic with as many ANDs and ORs as your heart desires. For things like filtering by tags with AND logic, you'll need to do something like this:

```ruby
filter = "discarded=false"
if @conditions.key?(:tags)
  @conditions[:tags].each do |tag|
    filter += " AND tags='#{tag}'"
  end
end
```

In this case @conditions is a hash which is populated by processing the query to extract things like tags and sort order. The documentation has some helpful notes about combining logic.
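The extraction itself isn't shown above, so here's one way it could work — a hedged sketch where the `parse_query` name and the `tag:` prefix syntax are purely my own inventions:

```ruby
# Split a raw query into search terms and a conditions hash,
# pulling out tokens of the form tag:<value> (illustrative syntax).
def parse_query(raw)
  conditions = {}
  terms = []
  raw.to_s.split.each do |token|
    if token.start_with?('tag:')
      (conditions[:tags] ||= []) << token.delete_prefix('tag:')
    else
      terms << token
    end
  end
  [terms.join(' '), conditions]
end

q, conditions = parse_query('batman tag:dc tag:movies')
# q == "batman", conditions == { tags: ["dc", "movies"] }
```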

Fixing up the tests should be all that remains, and it's pretty much just swapping index for index! and search_index.delete for clear_index!. It was very cool seeing the tests pass again after such minimal fixes.
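For illustration, that change in an RSpec suite might look something like this — the hook placement and describe block are assumptions, so adapt to your own setup:

```ruby
RSpec.describe GlossaryTerm do
  before do
    GlossaryTerm.clear_index! # was: GlossaryTerm.search_index.delete
    GlossaryTerm.reindex!     # repopulate the index before each search example
  end

  # ... search expectations unchanged ...
end
```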

Hope you enjoyed! We certainly did here at ScribeHub and we eagerly await multi-index searching 😉.

Top comments (3)

Paweł Świątkowski

Funny thing: I use Elasticsearch a lot, but mainly not for search. It's a huge beast with a large overhead (also on the devops side), so for use cases like simple search, it's nice to see an alternative. I will try Meilisearch for sure.

However, it would be interesting to see how the comparison looks for larger datasets. Because let's be honest, 32k records does not even justify leaving pg_search ;)

Archonic (Author)

People have built amazing things with the ELK stack for sure. The AI threat detection stuff is especially impressive.

Does pg_search have typo tolerance or synonyms? It certainly can't be beat in terms of simple devops. I look forward to one day writing an app that doesn't need more than pg_search.

Paweł Świątkowski

Good points. You can do fuzzy searching to some extent in PostgreSQL using trigrams: freecodecamp.org/news/fuzzy-string... But I'm not sure about synonyms.
