Sometimes the need for using an advanced searching mechanism such as ElasticSearch is not mandatory at the beginning of a project, but as it grows with time this need comes to the surface.
In order to satisfy this technical need and without altering the existing project’s architecture, we can implement a logic to synchronize the database data and index it in ElasticSearch and perform complex querying without impacting the performance of the database in production.
Let’s dive into it and explore how we can implement this …
Disclaimer: I will share only the important code snippets in order to keep the reading quick, you can check the link to the dedicated repo for this article on GitHub at the end.
Setting up the project
First of all, let's create a simple spring boot application with a MySQL database.
Tip: you can use Spring Initializr to set up the application's skeleton structure
In order to interface with ElasticSearch we need to do the following :
- Add the data-elasticsearch dependency to the pom.xml
Tip: the version of the engine depends on spring-jpa dependency used within the project (see the compatibility matrix)
2. Add the following properties to the application.properties
- _spring.data.elasticsearch.cluster-name: _the cluster name, CLUSTER_NAME is an environment variable that we use if we're running the application in a docker container
- spring.data.elasticsearch.cluster-nodes: the cluster nodes
- spring.data.elasticsearch.repositories.enabled: to enable repositories so we can use JPA interfaces with ElasticSearch.
3. Define the package's path where the elastic's repositories reside
In our example it's "com.example.springmysqlelastic.repo.elastic".
For demonstration purposes, we're going to create a single table in our database and wrap it in its own index. In a real project, we will have a much complex schema and indexes to create.
- Create a User entity with first name and last name attributes
- Create a model class to represent our User index within ElasticSearch
- Create a user repo and extend it from ElasticsearchRepository
We're done settings up the main components for our app.
Synchronize data with ElasticSearch
In order to synchronize our data from MySQL with the user index in ElasticSearch, we're gonna be using a Scheduler in which we will implement the synchronization logic.
First of all, we need to enable Spring's scheduled task execution capability by adding the EnableScheduling annotation :
After that and since we can't sync all the records from the database each time the scheduler is running, we will add a modification date attribute to our user entity and use the UpdateTimestamp annotation to delegate the updating process of its value to Hibernate each time the entity will be changed.
Next, we need to create our scheduler service.
In this example, the scheduler will run every 3 minutes, this period will change from a project to another depending on the importance of synched data.
Each time the main method of the scheduler is executed, we will sync our users.
In our syncUsers() method, we have two scenarios :
- When the index is empty at the first launch of the application, we will send all the user records to ElasticSearch
- Otherwise, we retrieve only the records that have been changed within the last INTERVAL_IN_MILLISECONDE starting from the time of the execution of the scheduler (as explained in the getModificationDatePredicate _method above)._
We're done with the synchronization, let's move on to the next part.
Querying data from ElasticSearch
We will start by adding a new rest Endpoint to take the search query as input and return the result as a ResultQuery object that will create later on.
After that, we will create a search service where we will build our custom request and retrieve data from ElasticSearch.
In our main function of the service _searchFromQuery() _we do the following :
- First, we build the body of the request, in our case, we will create a multi-index search using the query_string function and add the list of fields where we want the engine to look for the giving query.
The body of the request will look something like this:
The fields of the query-string are a static list in the code, but they can be parametrized in the application to have more control and flexibility.
- Second, we will execute the post request with the body that we created earlier and wrap the response from ElasticSearch in the ResultQuery
Our ResultQuery class contains three main attributes :
- timeTook: ElasticSearch response time in seconds
- numberOfResults: the number of total elements retrieved
- _elements: a _stringified JSON that represents the total hits found.
Testing the application
To run the application you can either do it locally or in a Docker container. (visit the README file for more info).
I added some records to the database and synced them to the user index.
Tip: you can visualize the index content using the ElasticSearch Head extension in Chrome/Mozilla browsers
As an example, let's look for the users that their first names or last names contain _"est" _:
_Voilà, _we've found 2 records that satisfy our criteria in 0.017 ms.
That's all folks!
Learn more by:
- Visiting the Github repo to see the full code of the application.
- Visiting the ElasticSearch documentation for more complex querying syntaxes.
- For Medium fans you can read the article there.
Top comments (0)