Search boxes, browsers, result lists: a huge percentage of apps have some feature for finding detailed information in a database, in some cases handling many requests per minute. That feature can range from looking up a single word to retrieving entire rows of information.
Let's pretend we have a hypothetical user called Bob. Bob wants a Star Wars application that lets him find information about his favorite characters from the Star Wars movies by searching for a complete word. He needs an application with a reasonable response time that consumes data from different sources.
We're talking about a considerable amount of content, including people, planets, starships, and any other interesting topic from those movies. Let's say we are the team behind that development. First of all, let's clarify the original premise with the flow:
Basic flow app
As shown in the diagram below, there is one case with 3 possible scenarios:
- Bob searches for a word, the word is found on the API source, and the result list is returned.
- Bob searches for a word, the word isn't found on the API source but is found on the database source, and the result list is returned.
- Bob searches for a word, but the word isn't found in either the API source or the database source, and a no-content response is returned.
Bob has his desired application, at least in theory: a global search across all possible content from his favorite movies, backed by two different data sources.
Let's bring some code to life to test this design. I'll show every component step by step.
Hands-on code
@RestController("/finder")
public class SearchRestController {
@Autowired
private SearchService searchService;
@GetMapping("/")
public ResponseEntity<List<ItemDTO>> searching(@RequestParam String word) {
Long start = System.currentTimeMillis();
Optional<List<ItemDTO>> results = searchService.getListBy(word);
System.out.println("Searching for " + (System.currentTimeMillis() - start) + " ms");
return ResponseEntity.of(results);
}
}
This is a simple controller with one method, searching, mapped with @GetMapping. It receives a word through the request params, calls the SearchService, and responds with the list of results found.
@Component
public class SearchService {

    @Autowired
    private ApiClient apiClient;
    @Autowired
    private StarWarsRepository starWarsRepository;
    @Autowired
    private ObjectMapper mapper;

    // Tries the API first; if it returns no results, falls back to the database
    public Optional<List<ItemDTO>> getListBy(String word) {
        return apiClient.findBy(word)
                .or(() -> findAllByName(word));
    }

    private Optional<List<ItemDTO>> findAllByName(String word) {
        List<Item> items = starWarsRepository.findAllByName(word);
        if (items.isEmpty()) return Optional.empty();
        List<ItemDTO> itemDTOS = new ArrayList<>();
        items.forEach(item -> itemDTOS.add(mapper.convertValue(item, ItemDTO.class)));
        return Optional.of(itemDTOS);
    }
}
It receives a word from the controller and:
- Calls the API client.
@Component
public class ApiClient {

    @Autowired
    private RestTemplate restTemplate;

    static final String API_URL = "https://swapi.dev/api/people";

    public Optional<List<ItemDTO>> findBy(String word) {
        ResponseEntity<ResultsDTO> responseEntity = restTemplate
                .getForEntity(API_URL.concat("?search=").concat(word), ResultsDTO.class);
        if (Objects.requireNonNull(responseEntity.getBody()).getResults().isEmpty()) return Optional.empty();
        return Optional.of(Objects.requireNonNull(responseEntity.getBody()).getResults());
    }
}
It executes a request to the Star Wars API (SWAPI) with the search term.
- If no result is returned, it calls the database repository.
@Repository
public interface StarWarsRepository extends CrudRepository<Item, Long> {
    List<Item> findAllByName(String word);
}
It searches for matching names in the item table. In this case we use a simple H2 database, since this is a proof of concept; you could implement it with another SQL driver. To fill the database, execute the following script (for example from a data.sql file), adapting it as you need.
insert into item(name, height) values('name1', '121');
insert into item(name, height) values('name2', '121');
insert into item(name, height) values('name3', '121');
insert into item(name, height) values('name4', '121');
insert into item(name, height) values('name5', '121');
The structure of the principal object would be:
@Data
@Entity
public class Item {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;
    private String height;
}
Nothing crazy, just basic fields, with Lombok annotations handling the boilerplate and the JPA annotations mapping it to the database.
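The DTOs used by the service and the API client (ItemDTO and the ResultsDTO wrapper) are not listed here. A minimal sketch could look like the following; the fields are assumptions based on the SWAPI people response and the curl output shown later, and @JsonIgnoreProperties is there because SWAPI returns more fields than we map:

@Data
@JsonIgnoreProperties(ignoreUnknown = true)
public class ItemDTO {
    private String name;
    private String height;
}

@Data
@JsonIgnoreProperties(ignoreUnknown = true)
public class ResultsDTO {
    // SWAPI wraps the matching entries in a "results" array
    private List<ItemDTO> results;
}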
Our pom file declares the following dependencies:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.24</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>3.9.0</version>
</dependency>
- Jedis: a Java client for Redis.
- spring-boot-starter-data-redis: provides easy configuration and access to Redis from Spring apps, offering low-level and high-level abstractions for interacting with the database.
- h2: an in-memory database.
- Lombok: annotation-based code generation that avoids boilerplate code.
- spring-boot-starter-data-jpa: implements the data access layer.
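One wiring note: ApiClient injects a RestTemplate, and Spring Boot does not register one as a bean out of the box (only a RestTemplateBuilder). A minimal configuration class, sketched here under an assumed name, could provide it:

@Configuration
public class RestClientConfig {

    // Expose a RestTemplate bean so ApiClient can @Autowire it;
    // the builder comes from spring-boot-starter-web's auto-configuration
    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }
}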
With those configurations and custom classes, the application now covers the basic design. Let's see the performance of the search process, in terms of time (ms), with the app up and running, in the following two scenarios:
- API

curl http://localhost:8080/?word=luke
{
    "name": "Luke Skywalker",
    "height": "172"
}
Searching for 1237 ms

- Database

curl http://localhost:8080/?word=name1
{
    "name": "name1",
    "height": "121"
}
Searching for 1289 ms
After executing those cases, take a look at the following table:
| Case | API datasource (ms) | Database source (ms) |
|---|---|---|
| Results found (time response) | 1237 | 1289 |
As shown, the database source takes a little more time than the API source. The response time is not a big deal right now, but what could happen when the database gets bigger? Or when there are not just two sources of data (API, database) but many more? Could our app search faster?
The answer is yes, but we need to take into account the corner cases it has right now. In the next implementation, we'll look at one of them.
The problem of searching for the same thing over and over again
What about the case of one or more Bob-like users searching for the same word, sending many requests for the exact same term? Is it worth running the whole search process for every one of them?
Taking this scenario as an example, let's think about how to improve performance: instead of hitting several APIs or data sources when a term has already been searched before, we could hit a common source that stores those previous searches and answers with a much lower response time, something like this:
Search design with a third source feature.
The third data source acts as cache storage, understanding a cache as a temporary store.
A possible new improvement…
The same case as at the beginning, but with a couple of additional steps:
- Bob searches for a word, the word is found on the API source, the result is stored in the third source, and it is returned.
- Bob searches for a word, the word isn't found on the API source but is found on the database source, the result is stored in the third source, and it is returned.
- Bob searches for a word, the word is found on the third source, and the result list is returned.
- Bob searches for a word, but the word isn't found in the API source, the database source, or the third source, and a no-content response is returned.
There are no additional steps when the data has been searched before: it was stored during the first request and is available for subsequent requests, avoiding repetitive searches and improving the app's performance.
It looks simple, but what characteristics must this third data source have?
- Low latency.
- Capable of saving and returning data quickly.
- Searching by word.
- Native integration with our tech stack.
An option that fits these specs well is Spring's cache abstraction, which offers different options for dealing with this type of design, allowing us to store and retrieve recent results with useful configuration and minimal response time.
Making it real with Spring and Redis cache integration
Spring's cache abstraction uses the proxy pattern to intercept the data-request flow, and we add Redis as the third component acting as cache storage, thanks to Redis's nature: easy integration and fast response times.
Redis also:
- Can perform more than 11,000 sets per second and more than 8,000 gets per second.
- Being a schema-less key-value database, it doesn't need a strong definition of the objects to store; values can range from a simple string to a POJO.
- Has a property to expire stored values, making it work as a temporary data store.
- Offers easy index configuration.
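As a small standalone illustration of the key-value and expiration points above, here is a sketch using the Jedis client that is already among our dependencies (host, port, and the key name are placeholders for a local instance):

// Connect to a local Redis instance (placeholders)
try (Jedis jedis = new Jedis("localhost", 6379)) {
    // Store a plain string value under a key...
    jedis.set("search:luke", "Luke Skywalker");
    // ...and let it expire after 60 seconds, so it behaves as a temporary store
    jedis.expire("search:luke", 60);

    System.out.println(jedis.get("search:luke")); // Luke Skywalker
    System.out.println(jedis.ttl("search:luke")); // seconds left before the key expires
}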
Updating our previous design, we have the following:
Redis, representing our cache storage
Now, we have the whole picture, with Redis in it as our third data source.
Let’s update our code
Adding the @EnableCaching annotation to the main class:
@SpringBootApplication
@EnableCaching
public class GrapefruitApplication {
public static void main(String[] args) {
SpringApplication.run(GrapefruitApplication.class, args);
}
}
This annotation is responsible for registering the components related to cache management, like the CacheInterceptor and the other proxies that allow @Cacheable to work. Spring makes the following annotations available:
- @CacheEvict: evicts a mapping based on a key.
- @CachePut: causes the method to be invoked and its result stored in the associated cache, subject to the condition() and unless() expressions.
- @Caching: groups several cache annotations.
- @Cacheable: indicates that the result of invoking a method (or all methods in a class) can be cached.
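As a quick illustration of one of them, a hypothetical method like the following (not part of this app's code) would use @CacheEvict to drop a stale entry, so the next search for that word goes back to the real sources:

// Hypothetical helper: removes the cached entry for a given word
@CacheEvict(value = "itemCache", key = "{#word}")
public void evictWord(String word) {
    // intentionally empty: the annotation performs the eviction
}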
How does @Cacheable work in terms of Redis?
It executes a get by key against the Redis database; if that doesn't return a result, it invokes the method and then executes a put command to store the newly found result. Null results can also be cached by key, but this behavior can be turned off through configuration, as we'll see below.
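To make that flow concrete, here is a hand-written sketch of roughly what the generated proxy does around our service method, using Spring's Cache API directly (it assumes an injected CacheManager and skips the condition and sync handling the real interceptor adds):

@SuppressWarnings("unchecked")
public Optional<List<ItemDTO>> getListByManually(String word) {
    Cache cache = cacheManager.getCache("itemCache");

    // 1. Try the cache first: a GET against Redis by key
    Cache.ValueWrapper cached = cache.get(word);
    if (cached != null) {
        return Optional.of((List<ItemDTO>) cached.get());
    }

    // 2. Cache miss: run the real search, then PUT the result into Redis
    Optional<List<ItemDTO>> result = apiClient.findBy(word)
            .or(() -> findAllByName(word));
    result.ifPresent(list -> cache.put(word, list));
    return result;
}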
Let's see how the service method looks after updating it.
@Cacheable(value = "itemCache",
key = "{#word}", unless="#result == null")
public Optional<List<ItemDTO>> getListBy(String word) {
return apiClient.findBy(word)
.or(() -> findAllByName(word));
}
As mentioned before, the key represents the index of the objects stored in Redis, the value property represents the name of the cache in Redis, and the unless property prevents caching null values.
Along the same lines, there are the RedisCacheManagerBuilderCustomizer and RedisCacheConfiguration beans, which allow us to set the expiration time for cached values, the serialization strategy, and so on.
@Bean
public RedisCacheConfiguration cacheConfiguration() {
    return RedisCacheConfiguration.defaultCacheConfig()
            // cached entries expire after 60 minutes
            .entryTtl(Duration.ofMinutes(60))
            // never store null results in Redis
            .disableCachingNullValues()
            // store values as JSON
            .serializeValuesWith(
                    RedisSerializationContext.SerializationPair.fromSerializer(new GenericJackson2JsonRedisSerializer()));
}

@Bean
public RedisCacheManagerBuilderCustomizer redisCacheManagerBuilderCustomizer() {
    return (builder) -> builder
            .withCacheConfiguration("itemCache", cacheConfiguration());
}
Then we set up the Redis connection in the configuration file (application.yaml):
spring:
datasource:
url: jdbc:h2:mem:starwars
driverClassName: org.h2.Driver
jpa:
defer-datasource-initialization: true
redis:
host: yourHost
port: yourPort
username: yourUsername
password: yourPassword
You could use a local Redis environment or the Redis Labs cloud environment, whose free plan allows 30MB of storage with high availability and is easy to set up. There are references on how to connect to the Redis cloud environment that you may find useful (see the links at the end of this post).
With our application updated with all the configuration above, it only takes a few tests to see how this new feature works.
Let's try a search for the word luke:
curl http://localhost:8080/?word=luke
{
"name": "Luke Skywalker",
"height": "172"
}
Searching for 1237 ms
Let's try again with the same search:
curl http://localhost:8080/?word=luke
{
"name": "Luke Skywalker",
"height": "172"
}
Searching for 175 ms
175 ms! More than a 70% improvement.
Look what we have here! The response time is more than 70 percent lower than the previous result. Of course, the same request had already been executed in a previous step, so this one was resolved through our cache interceptor, which had stored the earlier result list.
Updating the previous comparison table, we get the following:
| Case | API datasource (ms) | Database source (ms) | Redis source (ms) |
|---|---|---|---|
| Results found | 1237 | 1289 | 175 |
If we compare in terms of Big O notation, a previously requested search resolves with an O(1) key lookup, an improvement in both time and steps, because no additional sources are queried when a search has already been requested before.
Think about how this could improve content search, dictionary apps, browsers, and wikis explorer.
Oh yeah! This looks really good, but what about the cons?
Well, yes, there is a disadvantage I would like to mention: this feature cannot be applied when our data sources are constantly changing, because the cache store doesn't hold the latest version of the data, only the last snapshot from when the user requested it. If we're talking about bank accounts and their balances, this could be a problem. Another problematic case is a tracking application for a delivery service, which would not show the most recent state of your order.
Or when you are searching for items in the catalog of an e-commerce site, you need item stock to be up to date so you don't buy things that are out of inventory. An improvement that could fit this type of app is recent searches: suggesting results to the user based on the first letters typed, where the cache key would be those starting letters and the cached value the related item list, as sketched below.
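A rough sketch of that recent-searches idea, reusing the same cache abstraction (the method and the findAllByNameStartingWith repository query are hypothetical additions, not part of the code shown earlier):

// Cache key = the prefix typed so far; cached value = the matching item list
@Cacheable(value = "suggestionCache", key = "#prefix", unless = "#result.isEmpty()")
public List<ItemDTO> suggestBy(String prefix) {
    return starWarsRepository.findAllByNameStartingWith(prefix).stream()
            .map(item -> mapper.convertValue(item, ItemDTO.class))
            .collect(Collectors.toList());
}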
Surely there are a few more disadvantages to a cache store but, like most tools in use today, it is not a golden hammer that fixes every problem; it's a tool for a particular kind of case.
So, summing it up in a nutshell:
- A common search feature that can be found in many apps.
- Identify a way to make it faster.
- Find a third factor, a data source, that helps achieve it.
- Use Redis combined with a well-known framework like Spring.
- Voilà! Run it and see the results.
Here is the app repo.
JesusIgnacio / grapefruit
Redis with Spring cache for fast search results
Note: Yes, I called it Grapefruit; it has nothing to do with the app's objective, but I find it refreshing to name my repos after fruits.
That's all for now. I hope you enjoyed this article as much as I enjoyed writing it, and thank you for taking the time to read it.
If you have any questions, please let me know.
Best to you!
This post is in collaboration with Redis.
- Try Redis Cloud for free
- Redis Developer Hub - tools, guides, and tutorials about Redis
- RedisInsight Desktop GUI