Molly Struve (she/her)

Posted on Mar 3, 2020

Getting Started with Elasticsearch and Ruby

#elasticsearch #beginners #ruby #tutorial

Recently, DEV has started the migration from Algolia to Elasticsearch. Since I am often asked what is the best way to get started with Elasticsearch, I figured I would share how we have been making the switch. Hopefully, you can use this post as a template if you decide to implement Elasticsearch in your Rails or Ruby app in the future.

Before I get started I want to preface this post by saying that this article assumes you understand the basics of Elasticsearch. You should be familiar with the terms index, mappings, and documents since we will be covering those. If you need a refresher or want to learn about how Elasticsearch works I highly recommend the Elastic docs!

1) Install Elasticsearch

Well, it isn't quite that easy . Before you start hacking away at your code you need to get Elasticsearch up and running so you can talk to it. There are a million different ways to do this depending on your environment so I am going to point you towards the Installing Elasticsearch docs for getting started.

Many of us at DEV use Macs and ended up installing from archive since the Homebrew install seemed to be broken for the majority of us. Once you have Elasticsearch up and running the next step is to get your code talking to it.

2) Install the Elasticsearch Ruby gem

Related Pull Request

The Elasticsearch ruby gem installs just like any other gem, all you have to do is add a line to your Gemfile.



gem "elasticsearch", "~> 7.4"

One important thing to note is what version of Elasticsearch you are planning on using. The gem versions are numberered to match the Elasticsearch versions. If you are on Elasticsearch version 5 then you will want to use the latest version 5 release of the gem.

Another thing you might notice in the pull request that I reference above is that we also installed the Typhoeus gem.



gem "typhoeus", "~> 1.3.1"

The Elasticsearch gem docs suggest using an HTTP library such as Typhoeus for optimal performance because it supports persistent ("keep-alive") connections.

Once the gem has been successfully installed then you need to create a client within your code to talk to Elasticsearch. We choose to do this through an initializer file, config/initializers/elasticsearch.rb and it looks like this.



require "elasticsearch"

SearchClient = Elasticsearch::Client.new(
  url: ApplicationConfig["ELASTICSEARCH_URL"],
  retry_on_failure: 5,
  request_timeout: 30,
  adapter: :typhoeus,
  log: Rails.env.development?,
)

Let's go over the arguments we are passing in here.

url: (required) We are passing the client a URL param. You communicate to Elasticsearch via HTTP so you need a URL that your client can use to make requests to. In development, by default, this will be http://localhost:9200

The rest of the arguments are optional.

retry_on_failure: The number of times the client will retry before it gives up
request_timeout: Sets the time limit for a request to get a response. Any request that takes over 30 seconds to respond will timeout.
adapter: The HTTP library in ruby we want to use to help us make these requests. As stated above, ideally you want to use Typhoeus because of its support for Keep Alive connections.
log: Determines whether your client is outputting logs for each request you are making.

There are many other options you can pass to your client but these are the basic ones that we use. At this point, some people might be inclined to start writing code to throw things in Elasticsearch. I'm not one of those people.

Whenever I add a new external dependency like a database I like to deploy the interface for using it, in this case, the gem, by itself. This way you can deploy and then jump into a console and make sure everything is hooked up correctly before you start using it in your code. If there are any configuration tweaks that need to be made then you can make those without having to worry about the code breaking.

To validate that you have the cluster hooked up correctly you can jump into a Rails console and issue this command with your new SearchClient:



[1] pry(main)> SearchClient.info
ETHON: Libcurl initialized
ETHON: performed EASY effective_url=http://localhost:9200/ response_code=200 return_code=ok total_time=0.392646
=> {"name"=>"mollys_computer",
 "cluster_name"=>"elasticsearch",
 "cluster_uuid"=>"123abc456",
 "version"=>
  {"number"=>"7.5.2",
   "build_flavor"=>"default",
   "build_type"=>"tar",
   "build_hash"=>"8bec50e1e0ad29dad5653712cf3bb580cd1afcdf",
   "build_date"=>"2020-01-15T12:11:52.313576Z",
   "build_snapshot"=>false,
   "lucene_version"=>"8.3.0",
   "minimum_wire_compatibility_version"=>"6.8.0",
   "minimum_index_compatibility_version"=>"6.0.0-beta1"},
 "tagline"=>"You Know, for Search"}

If you get a 200 response back like the one above then you know everything is configured correctly. With the gem setup correctly the next step is to start using Elasticsearch, and we are going to do that by making our first index!

2) Setting Up the Tag Index

Related Pull Request

For this example, I am going to show you how we set up our very simple Tag index. The capabilities of Elasticsearch are tremendous but I want to keep it simple with this example so you have a good base to get you started.

To start, we need to do a couple of different things. First, we need to create our index.



index_settings = { number_of_shards: 1, number_of_replicas: 0 }
settings = { settings: { index: index_settings } }
SearchClient.indices.create(index: "tag_development", body: settings)

Here, we are creating a simple index with 1 shard and 0 replicas. In development, you will often only have a single node, so keeping indexes to a single shard is usually the way to go. However, in production, depending on your data size and number of requests you are making, you may want more shards for your index.

You can run the above command in a console to see it in action. A successful response will look like this:



[37] pry(main)> SearchClient.indices.create(index: "molly", body: settings)
ETHON: performed EASY effective_url=http://localhost:9200/molly response_code=200 return_code=ok total_time=0.65619
2020-02-24 16:00:54 -0500: PUT http://localhost:9200/molly [status:200, request:0.660s, query:n/a]
{"acknowledged":true,"shards_acknowledged":true,"index":"molly"}

Once your index is created, the next thing you will need to do is define your mappings. This is where you will define the fields you want to search for.

I HIGHLY suggest when you are working with Elasticsearch for integrated search within an application that you set your mapping dynamic value to strict. Setting the value to strict means that if you try to index a field that is not in your mappings Elasticsearch will raise an error. When doing integrated search you want to keep your documents lean and mean and this ensures that you don't end up with any surprise fields from possible indexing bugs.

Below are the mappings for our tags index.



{
  "dynamic": "strict",
  "properties": {
    "id": {
      "type": "keyword" 
    },
    "name": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    },
    "hotness_score": {
      "type": "integer"
    },
    "supported": {
      "type": "boolean"
    },
    "short_summary": {
      "type": "text"
    },
    "rules_html": {
      "type": "text"
    }
  }
}

Before I move on, I want to point out a couple of things here. You probably noticed that we are mapping our id field as a keyword rather than an integer. This is because keywords are optimized for terms queries which is what we will be doing with our ID field. However, for a field like hotness_score, we want to use an integer because we will be searching that using range queries with things like greater or less than.

Another thing you will notice is that name has two types. The text datatype means that we will analyze the field and break it up into tokens to make it easier to full-text search. The keyword datatype is viewed by calling name.raw. Our raw field is storing the name as is, in one complete string. Having two field types allows us to search the tokens of the tag name or the entire name itself.

Ok, now that you understand a little bit about our mappings, lets talk about how we apply them to our newly created index. To keep our linters happy we have the mappings stored in a JSON file and then we import them into our Ruby file like so:



MAPPINGS = JSON.parse(File.read("config/elasticsearch/mappings/tags.json"), symbolize_names: true).freeze

Once we have the mappings set, the next step is to apply them to the new index we just created. You can do this by executing the code below



SearchClient.indices.put_mapping(index: "tags_development", body: MAPPINGS)

If the request is successful you should get a response like this



[38] pry(main)> SearchClient.indices.put_mapping(index: "tags_development", body: MAPPINGS)
ETHON: performed EASY effective_url=http://localhost:9200/tags_development/_mapping response_code=200 return_code=ok total_time=0.079915
2020-02-24 16:45:56 -0500: PUT http://localhost:9200/tag_development/_mapping [status:200, request:0.095s, query:n/a]
2020-02-24 16:45:56 -0500: > {"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"text","fields":{"raw":{"type":"keyword"}}},"hotness_score":{"type":"integer"},"supported":{"type":"boolean"},"short_summary":{"type":"text"},"rules_html":{"type":"text"}}}
2020-02-24 16:45:56 -0500: < {"acknowledged":true}

Even though you got a 200 response back, you might still want to double-check that your index was created correctly. Once again, you can do this in a console like so:



[2] pry(main)> SearchClient.indices.get(index: "tags_development")
ETHON: performed EASY effective_url=http://localhost:9200/tag_development response_code=200 return_code=ok total_time=0.048122
=> {"tags_development"=>
   "mappings"=>
    {"dynamic"=>"strict",
     "properties"=>
      {"hotness_score"=>{"type"=>"integer"},
       "id"=>{"type"=>"keyword"},
       "name"=>{"type"=>"text", "fields"=>{"raw"=>{"type"=>"keyword"}}},
       "rules_html"=>{"type"=>"text"},
       "short_summary"=>{"type"=>"text"},
       "supported"=>{"type"=>"boolean"}}},
   "settings"=>
    {"index"=>
      {"creation_date"=>"1581527116462", "number_of_shards"=>"1", "number_of_replicas"=>"0", "uuid"=>"kO-MGUiFSJObSMY_22mrzg", "version"=>{"created"=>"7050299"}, "provided_name"=>"tag_development"}}}}

Now that we have verified that our index is created and has the proper mappings, it's time to start filling it with data!

3) Indexing a Tag Document

Related Pull Request

Before we can send data to Elasticsearch, we first have to get it in the proper format by serializing it. To handle serializing our ActiveRecord model we use the Fast JSON API serializer.



module Search
  class TagSerializer
    include FastJsonapi::ObjectSerializer

    attributes :id, :name, :hotness_score, :supported, :short_summary, :rules_html
  end
end

Once you have a way to serialize your model data, then all that is left to do is make the request to send it to Elasticsearch. Here is how we do that with our SearchClient:



tag = Tag.find(id)
serialized_data = Search::TagSerializer.new(tag).serializable_hash.dig(:data, :attributes)
SearchClient.index(id: tag.id, index: "tags_development", body: serialized_data)

Here is what a successful response to the index request above will look like:



{"_index"=>"tags_development", "_type"=>"_doc", "_id"=>"39", "_version"=>10, "result"=>"created", "_shards"=>{"total"=>1, "successful"=>1, "failed"=>0}, "_seq_no"=>351, "_primary_term"=>3}

Another way we can validate that our indexing worked correctly, is by asking Elasticsearch for the tag document using a GET request.



SearchClient.get(id: tag.id, index: "tags_development")

The above request will give you a response containing all of your tag data in the _source param of the response hash.



{"_index"=>"tags_development",
 "_type"=>"_doc",
 "_id"=>"39",
 "_version"=>10,
 "_seq_no"=>351,
 "_primary_term"=>3,
 "found"=>true,
 "_source"=>
  {"id"=>39,
   "name"=>"coolbean",
   "hotness_score"=>4,
   "supported"=>false,
   "short_summary"=>nil,
   "rules_html"=>""}}

Now that our index is set up and we have data in it, it's time for the best part.

4) Searching Tags

Related Pull Request

For this search example, I am only going to show you how to set up a query string search. However, search is where Elasticsearch(obviously) really shines, so I highly encourage you to checkout the search docs they have and explore all of the possibilities.

Let's say we want to search for all tags who have a name that starts with "python" AND we want to sort them by hotness_score. Here is how we would do that:



SearchClient.search(
  index: "tags_development",
  body: {
    query: {
      query_string: {
        query: "name:python*",
        analyze_wildcard: true,
        allow_leading_wildcard: false
      }
    },
    sort: { hotness_score: "desc" }
  }
)

This request is running a basic query, python*, on the name field in our index. We have also added a wildcard character, *, to indicate that we want all tags that have a name that starts with python. When you run that query you are going to get a result that looks like this:



=> {"took"=>251,
 "timed_out"=>false,
 "_shards"=>{"total"=>1, "successful"=>1, "skipped"=>0, "failed"=>0},
 "hits"=>
  {"total"=>{"value"=>3, "relation"=>"eq"},
   "max_score"=>nil,
   "hits"=>
    [{"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"10",
      "_score"=>nil,
      "_source"=>{"id"=>10, "name"=>"python", "hotness_score"=>2, "supported"=>true, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[2]},
     {"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"40",
      "_score"=>nil,
      "_source"=>{"id"=>40, "name"=>"PythonBeginners", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[0]},
     {"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"41",
      "_score"=>nil,
      "_source"=>{"id"=>41, "name"=>"PythonExpert", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[0]}]}}

BOOM! We just ran our first Elasticsearch query! The last thing we need to do is dig out the document hits, aka tags, from our response.



results = SearchClient.search(...)

results = search(query_string)

  results.dig("hits", "hits").map { |tag_doc| tag_doc.dig("_source") }

end

=> [{"id"=>10, "name"=>"python", "hotness_score"=>2, "supported"=>true, "short_summary"=>nil, "rules_html"=>nil},

    {"id"=>40, "name"=>"PythonBeginners", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},

    {"id"=>41, "name"=>"PythonExpert", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil}]

Your turn!

Now that you have all of the pieces, it is time for you to go out and start integrating Elasticsearch into your own Ruby or Rails application. Let me know if you have any questions. Happy Searching! 😃

PS I've been on a Schitt's Creek binge lately, your welcome for all the GIFs

Top comments (12)

aRtoo • Mar 3 '20

Nice article ma'am awesome as always. Question though. So after getting the results, you will still have to query in your own database so you can get the full information needed? like the comments of the posts, the owner, users who liked, and some metadata? thank you again. :)

Molly Struve (she/her) • Mar 3 '20

If you want other data associated with tags then yes. Or you store some of that data in Elasticsearch in another index and fetch it that way.

Steven Torrence • Mar 4 '20

Hey, Molly! Thanks for writing such a great article/walkthrough. I saw there are gems that act as a wrapper for elastic search like Searchkick.

How do things like that differ from the implementation you’ve shown above? Are there performance benefits to doing it either way?

Molly Struve (she/her) • Mar 4 '20

The benefit of using the plain ruby wrapper is that you have much more control over what and how you are searching. The more you abstract the Elasticsearch interactions away like with Searchkick the less control you have. The trade-off is that it can be very easy to get up and running quickly with minimal understanding of Elasticsearch itself.

Steven Torrence • Mar 5 '20

I see. Makes total sense. Thanks for the quick reply!

Joe Zack • Mar 3 '20

Yay for Elasticsearch! We run with docker locally and it makes life easier for us, especially for upgrades and it's nice to be able to wipe our volumes to start with a clean slate.

Great write up, as always!

Antonio Juan Querol • Nov 8 '23

Gracias!! for this great tutorial, only one point for the part of installing the elasticsearch iI want to suggest to use testcontainers ruby github.com/testcontainers/testcont...

Cory McDonald • Aug 7 '20

Thank you so much for writing this! I'm leveraging a lot of what Forem is doing for Brave's creators site. Plus the docs written in the repo are great. 😄

Quentin de Quelen • Mar 5 '20

Hi Molly! Thanks for this article. I would like to know what made you decide to switch from Algolia to Elastic? Is that the price? Or just because you love Elastic in general?

Molly Struve (she/her) • Mar 5 '20

Price
More control over creating indexes and queries
Elasticsearch is open-source which makes it accessible to everyone and contributes to our efforts in making the DEV platform completely open-source

Quentin de Quelen • Mar 5 '20

Cool! As expected. I was wondering because I'm kind of in the business.

I'm working on an Algolia open-source alternative. It's free, self-hosted and built-in user-facing search. I'm putting the link here just in case. github.com/meilisearch/meilisearch