DEV Community

Gent Aliti

The one with Elasticsearch and Spring Boot

This article aims to give you an introduction to Elasticsearch.
First we will go through an explanation of the basic concepts, then we will build a simple Spring Boot application to see how easily ES can be integrated using Spring Data Elasticsearch.

As a prerequisite for this post you’ll need Docker, Java and Maven installed.

You know, for search…

Elasticsearch is an open source, distributed search engine that aims to make full-text search easy by hiding the complexities of Apache Lucene behind a simple REST API. Instead of storing data in rows and columns, Elasticsearch stores data as JSON documents.

Near real time

There is a slight latency (about one second with the default refresh interval) between the time you index a document and the time it becomes searchable.

Clusters and Nodes

Elasticsearch is distributed by nature. It can run on multiple nodes (servers) within a cluster. Indexing and searching are done in parallel across the nodes.

Documents

I mentioned earlier that ES stores data in documents. For example, if you’re developing an e-commerce system, you would have a document for each product, one for each rating, one for each comment, etc.

Shards and Replicas

A single index can hold a large amount of data, which may exceed one node’s disk size. This is why ES subdivides your index into multiple pieces called shards. When you create an index, you can define the number of shards that you want. Each shard is in itself a fully functional and independent “index” that can be hosted on any node in the cluster. Each Elasticsearch shard is a Lucene index.
As a failover mechanism in case a shard or node goes offline, Elasticsearch lets you keep one or more copies of your index’s shards, called replica shards.
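
To make shards a bit more concrete: Elasticsearch decides which shard a document goes to by hashing a routing value (the document id by default) and taking the result modulo the number of primary shards. The sketch below only illustrates that idea. It uses Java's String.hashCode as a stand-in for the Murmur3 hash that Elasticsearch really uses, so the shard numbers it prints will not match a real cluster, and the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

class ShardRoutingSketch {

    // Stand-in for Elasticsearch's routing: hash the routing value, then take it
    // modulo the number of primary shards. Real ES uses Murmur3, not hashCode().
    static int shardFor(String routingValue, int primaryShards) {
        return Math.floorMod(routingValue.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 5; // the default for a new index in ES 6.x
        List<String> ids = Arrays.asList("1", "25", "99", "440");
        for (String id : ids) {
            System.out.println("document " + id + " -> shard " + shardFor(id, primaryShards));
        }
    }
}
```

The same id always lands on the same shard, which is how a later lookup by id knows where to go.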

You don’t need to be an expert in sharding, cluster discovery, or dozens of other distributed concepts. Elasticsearch can happily run on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way ;).

Now let’s get our hands dirty

With the basic concepts explained, it’s time to run Elasticsearch and load some sample data.
Run Elasticsearch in a Docker container:

docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.4.3

Load sample data

Download data using this link.

We can use Elasticsearch’s bulk API to load the data:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
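
The bulk API expects newline-delimited JSON: each document takes two lines, an action line such as {"index":{"_id":"1"}} followed by the document source, and the body has to end with a newline. Because the index and type are already in the URL (bank/account), the action line only needs the id. accounts.json already has this shape. As a rough sketch of how such a body could be assembled by hand (the class, the method and the two sample documents are made up for illustration, and the JSON is built with plain strings rather than a JSON library):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BulkBodySketch {

    // Builds a bulk request body: one action line plus one source line per document.
    // The source strings are assumed to be valid single-line JSON already.
    static String bulkBody(Map<String, String> sourcesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : sourcesById.entrySet()) {
            body.append("{\"index\":{\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // Every line, including the last one, must end with a newline.
            body.append(entry.getValue()).append("\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up sample documents, not taken from accounts.json.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"account_number\":1,\"balance\":100}");
        docs.put("2", "{\"account_number\":2,\"balance\":200}");
        System.out.print(bulkBody(docs));
    }
}
```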

Verify that everything is OK:


$ curl -X GET "localhost:9200/_cat/indices?v&pretty"

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  MrlzeXq9S-2Z61zKNLEIIA   5   1       1000            0     95.5kb         95.5kb

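The yellow health status is expected on a single node: the replica shards (rep 1) have no second node to be assigned to. We have now indexed 1000 accounts with the following structure: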

{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}

Query DSL

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries.

You can read more about the Query DSL in the Elasticsearch docs.

Spring Data Elasticsearch

The Spring Data Elasticsearch project provides integration with the Elasticsearch search engine.

You can clone gentaliti/elasticsearch, where I implemented search functionality on top of the data we just loaded into ES.

First add the Spring Data Elasticsearch dependency:

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Then you need to add @EnableElasticsearchRepositories to one of your configuration classes and edit application.properties with your configuration values.

The endpoint accepts two kinds of parameters for searching and filtering: a query and filters.

Querying

For the query parameter we are using MultiMatchQuery, which allows us to search for a given string in all of our fields.

MultiMatchQueryBuilder multiMatchQuery = QueryBuilders.multiMatchQuery(query);
searchQuery.must(multiMatchQuery);

Filtering

We can use the filter parameter to define multiple filters for our data, separated by “;”. This logic is handled in the SearchController:filter method, which converts the filter string into a list of filters.
From that list of filters we then build multiple bool queries and add them to the main bool query.

private void prepareFilters(BoolQueryBuilder searchQuery, List<Filter> filters) {
    if (filters == null) {
        return;
    }
    // Group the filters by field name, e.g. city -> [nicholson, shaft].
    filters.stream().collect(Collectors.groupingBy(Filter::getKey)).forEach((key, values) -> {
        // Values for the same field are alternatives: any of them may match (OR).
        BoolQueryBuilder bool = QueryBuilders.boolQuery();
        values.forEach(value -> bool.should(QueryBuilders.matchQuery(key, value.getValue())));
        // Every field group has to match (AND).
        searchQuery.must(bool);
    });
}

Here I am using “should” (the equivalent of OR) for filters on the same field and “must” (the equivalent of AND) to combine them with filters on other fields.
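
The step that turns the filter string into a list of filters is not shown above. A minimal sketch of what that parsing and grouping could look like, assuming a Filter class with just a key and a value. The class and method names here are placeholders of mine, and the code in the repository may well differ:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterParsingSketch {

    // Hypothetical stand-in for the Filter class used by prepareFilters.
    static final class Filter {
        private final String key;
        private final String value;

        Filter(String key, String value) {
            this.key = key;
            this.value = value;
        }

        String getKey() { return key; }
        String getValue() { return value; }
    }

    // Splits "city:nicholson;city:shaft" on ';' and then on the first ':'.
    // Entries without a key or without a value are skipped rather than rejected.
    static List<Filter> parse(String raw) {
        List<Filter> filters = new ArrayList<>();
        if (raw == null || raw.trim().isEmpty()) {
            return filters;
        }
        for (String part : raw.split(";")) {
            int colon = part.indexOf(':');
            if (colon <= 0 || colon == part.length() - 1) {
                continue;
            }
            filters.add(new Filter(part.substring(0, colon).trim(), part.substring(colon + 1).trim()));
        }
        return filters;
    }

    // Same grouping idea as in prepareFilters: one entry per field, holding all its values.
    static Map<String, List<String>> groupByKey(List<Filter> filters) {
        return filters.stream().collect(Collectors.groupingBy(
                Filter::getKey,
                LinkedHashMap::new,
                Collectors.mapping(Filter::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Filter> filters = parse("city:nicholson;city:shaft;state:nd");
        System.out.println(groupByKey(filters));
    }
}
```

Each key in the resulting map corresponds to one inner bool query, and each value in its list to one “should” clause.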

Let's give it a try

I am not going to cover how to run the project here, as that is explained in the repo’s readme.
After you start the Spring Boot application, you can send HTTP requests to the only endpoint available, where you can search and filter the data.
For example, an HTTP request like this:

curl -X GET \
 'http://localhost:8080/accounts?q=rockwell&filter=city:nicholson;city:shaft'

is the equivalent of writing this query:

{  
   "query":{  
      "bool":{  
         "must":[  
            {  
               "multi_match":{  
                  "query":"rockwell"
               }
            },
            {  
               "bool":{  
                  "should":[  
                     {  
                        "match":{  
                           "city":"nicholson"
                        }
                     },
                     {  
                        "match":{  
                           "city":"shaft"
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

This finds all accounts that contain rockwell in one of their fields and are from Nicholson or Shaft.

There is only one account matching our query:

{
    "content": [
        {
            "id": "99",
            "account_number": 99,
            "balance": 47159,
            "first_name": null,
            "last_name": null,
            "address": "806 Rockwell Place",
            "email": "ratliffheath@zappix.com",
            "city": "Shaft",
            "state": "ND",
            "age": 39
        }
    ],
    "pageable": {
        "sort": {
            "sorted": false,
            "unsorted": true,
            "empty": true
        },
        "offset": 0,
        "pageSize": 10,
        "pageNumber": 0,
        "paged": true,
        "unpaged": false
    },
    "facets": [],
    "aggregations": null,
    "scrollId": null,
    "maxScore": 13.007463,
    "totalElements": 1,
    "totalPages": 1,
    "size": 10,
    "first": true,
    "numberOfElements": 1,
    "last": true,
    "number": 0,
    "sort": {
        "sorted": false,
        "unsorted": true,
        "empty": true
    },
    "empty": false
}
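
To see why this account matches, the must/should logic of the query can be imitated in plain Java. This is only a rough sketch under my own assumptions: it ignores analyzers and scoring, treats a match as "some whitespace-separated word equals the term, ignoring case", and uses made-up class and method names. It relies on the fact that a bool query containing only “should” clauses needs at least one of them to match.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class BoolSemanticsSketch {

    // Very rough stand-in for a match query: true if any lowercased,
    // whitespace-separated word of the field equals the lowercased term.
    static boolean fieldMatches(String fieldValue, String term) {
        if (fieldValue == null) {
            return false;
        }
        String wanted = term.toLowerCase(Locale.ROOT);
        return Arrays.asList(fieldValue.toLowerCase(Locale.ROOT).split("\\s+")).contains(wanted);
    }

    // must(multi_match) AND, for every filtered field, should(match, match, ...).
    static boolean matches(Map<String, String> account, String query, Map<String, List<String>> filters) {
        boolean queryHit = account.values().stream().anyMatch(v -> fieldMatches(v, query));
        boolean filtersHit = filters.entrySet().stream().allMatch(group ->
                group.getValue().stream().anyMatch(v -> fieldMatches(account.get(group.getKey()), v)));
        return queryHit && filtersHit;
    }

    public static void main(String[] args) {
        // Two fields of the account returned above.
        Map<String, String> account = new LinkedHashMap<>();
        account.put("address", "806 Rockwell Place");
        account.put("city", "Shaft");

        Map<String, List<String>> filters = new LinkedHashMap<>();
        filters.put("city", Arrays.asList("nicholson", "shaft"));

        System.out.println(matches(account, "rockwell", filters));
    }
}
```

The address contains rockwell and the city is one of the two allowed values, so both parts of the outer “must” are satisfied.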

Wrapping up

We have covered the basic concepts of Elasticsearch as well as a simple Spring Boot application for querying and filtering the data. This can of course be expanded further, but I think it is a good example of using Spring Data Elasticsearch.

In the next posts I will write more about Query DSL, Mapping and Data Modelling.

Discussion (1)

Alejandro Lafourcade Despaigne

'search(org.springframework.data.elasticsearch.core.query.Query)' is deprecated