Neural Search? What's That?
In short, neural search is a new approach to retrieving information. Instead of telling a machine a set of rules to understand what data is what, neural search does the same thing with a pre-trained neural network. This means developers don't have to write every little rule, saving time and headaches, while the system trains itself to get better as it goes along. One such company providing an open-source neural search framework is Jina.
Background
Search is big business, and getting bigger every day. Just a few years ago, searching meant typing something into a text box (ah, those heady days of Yahoo! and Altavista). Now search encompasses text, voice, music, photos, videos, products, and so much more. Just before the turn of the millennium there were only 3.5 million Google searches a day. Today (according to the top result for the search term "2020 google searches per day") that figure could be as high as 5 billion and rising, more than a thousand times as many. That's not to mention all the billions of Wikipedia articles, Amazon products, and Spotify playlists searched by millions of people every day from their phones, computers, and virtual assistants.
Just look at the stratospheric growth in Google queries, and that's only until 2012!
In short, search is huge. We're going to look at the reigning champ of search methods, symbolic search, and the plucky upstart contender, neural search.
Note: This article is based on a post by Han Xiao, with his permission. Check out his original post if you want a more technical introduction to neural search.
Symbolic Search: Rules are Rules
Google is a huge general-purpose search engine. Other companies can't just adapt it to their needs and plug it into their systems. Instead, they use frameworks like Elastic and Apache Solr, symbolic search systems that let developers write the rules and create pipelines for searching products, people, messages, or whatever the company needs.
Let's take Shopify for example. They use Elastic to index and search through millions of products across hundreds of categories. This couldn't be done out-of-the-box or with a general-purpose search engine like Google. They have to take Elastic and write specific rules and pipelines to index, filter, sort, and rank products by a variety of criteria, and convert this data into symbols that the system can understand. Hence the name, symbolic search. Here's Greats, a popular Shopify store for sneakers:
You and I know that if you search for red nike sneakers you want, well, red Nike sneakers. Those are just words to a typical search system though. Sure, if you type them in you'll hopefully get what you asked for, but what if those sneakers are tagged as trainers? Or even tagged as scarlet for that matter? In cases like this, a developer needs to write rules:
- Red is a color
- Scarlet is a synonym of red
- Nike is a brand
- Sneakers are a type of footwear
- Another name for sneakers is trainers
Or, expressed in JSON as key-value pairs:
{
"color": "red",
"color_synonyms": ["scarlet"],
"brand": "nike",
"type": "sneaker",
"type_synonyms": ["trainers"],
"category": "footwear"
}
Each of these key-value pairs can be thought of as a symbol, hence the name symbolic search. When a user inputs a search query, the system breaks it down into symbols, and matches these symbols with the symbols from the products in its database.
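To make that matching concrete, here's a toy Python sketch (not from the original post; the product data, word-to-symbol table, and helper functions are all hypothetical) of how a symbolic system might compare query symbols against product symbols. Note how every synonym has to be spelled out by hand:

# Toy symbolic matching: both the query and the product are reduced to
# key-value symbols, then compared key by key.

PRODUCT = {
    "color": "red",
    "brand": "nike",
    "type": "sneaker",
    "category": "footwear",
}

# Hand-written vocabulary mapping raw words to symbols (including synonyms)
WORD_TO_SYMBOL = {
    "red": ("color", "red"),
    "scarlet": ("color", "red"),      # synonym rule
    "blue": ("color", "blue"),
    "nike": ("brand", "nike"),
    "sneakers": ("type", "sneaker"),
    "trainers": ("type", "sneaker"),  # synonym rule
}

def query_to_symbols(query):
    """Break a raw query string into key-value symbols."""
    symbols = {}
    for word in query.lower().split():
        if word in WORD_TO_SYMBOL:
            key, value = WORD_TO_SYMBOL[word]
            symbols[key] = value
    return symbols

def matches(query, product):
    """A product matches if every symbol in the query agrees with it."""
    return all(product.get(key) == value
               for key, value in query_to_symbols(query).items())

print(matches("red nike sneakers", PRODUCT))   # True
print(matches("scarlet trainers", PRODUCT))    # True: mapped to the same symbols
print(matches("blue nike sneakers", PRODUCT))  # False: colors disagree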
But what if a user types nikke instead of nike, or searches shirts (with an s) rather than shirt? There are so many rules in language, and people break them all the time. To get effective symbols (i.e. knowing that nikke really means {"brand": "nike"}), you need to define lots of rules and chain them together in a complex pipeline:
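To give a flavor of what such a pipeline looks like, here's a deliberately crude, hypothetical normalization chain in Python (not from the original post); each step's output feeds the next, which is exactly what makes real pipelines hard to maintain:

# Toy rule pipeline: spelling correction -> synonym mapping -> singularization.
# Real symbolic systems chain many more components than this.

SPELLING_FIXES = {"nikke": "nike", "addidsa": "adidas"}
SYNONYMS = {"scarlet": "red", "trainers": "sneakers"}

def fix_spelling(words):
    return [SPELLING_FIXES.get(w, w) for w in words]

def map_synonyms(words):
    return [SYNONYMS.get(w, w) for w in words]

def singularize(words):
    # Crude plural handling: "shirts" -> "shirt", "sneakers" -> "sneaker"
    return [w[:-1] if w.endswith("s") else w for w in words]

def normalize(query):
    words = query.lower().split()
    for step in (fix_spelling, map_synonyms, singularize):
        words = step(words)  # each step's output is the next step's input
    return words

print(normalize("red nikke sneakers"))  # ['red', 'nike', 'sneaker']
print(normalize("scarlet trainers"))    # ['red', 'sneaker']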
Drawbacks of Symbolic Search
You Have to Explain Every. Little. Thing
Our example search query above was red nike sneaker man. But what if our searcher is British? A Brit would type red nike trainer man. We would have to explain to our system that sneakers and trainers are just the same thing with different names. Or what if someone searches for LV handbag? The system would have to be told that LV stands for Louis Vuitton.
Doing that for every kind of product takes forever, and there are always things that fall through the cracks. And if you want to localize for other languages? You'll have to go through it all over again. That means a lot of hard work, knowledge, and attention to detail.
It's Fragile
Text is complicated: as we explained above, if a user types in red nikke sneaker man, a classic search system has to recognize that they're searching for a red (color) Nike (brand, with corrected spelling) sneaker (type) for men (sub-type). This is done by converting the search string and product details into symbols via the pipeline, and these pipelines can have major issues:
- Every component in the chain has an output that is fed as input into the next component along, so a problem early in the process can break the whole system
- Some components may take inputs from multiple predecessors. That means you have to introduce more mechanisms to stop them blocking each other
- It's difficult to improve overall search quality. Just improving one or two components may lead to no improvement in actual search results
- If you want to search in another language, you have to rewrite all the language-dependent components in the pipeline, increasing maintenance cost
Neural Search: (Pre-)Train, Don't Explain
An easier way would be a search system trained on existing data. If you train a system on enough different scenarios beforehand (i.e. a pre-trained model), it develops a generalized ability to find outputs that match inputs, whether they're flowers, lines from South Park, or Pokémon. You can plug this model directly into your system and start indexing and searching right away.
The code is pretty straightforward. It loads a "Flow", which in turn loads a series of modules to process, index, and query your data:
from jina.flow import Flow

# Build a Flow that adds a containerized encoder pulled from Jina Hub
f = (Flow()
     .add(name='my-encoder', image='jinaai/hub.examples.my_encoder',
          volumes='./abc', yaml_path='hub/examples/my_encoder/my_encoder_ext.yml',
          port_in=55555, port_out=55556))
This way, you don't need to waste hours writing endless rules for your use case. Instead, just include a line in your code to download the model you want from an "app store" (like the upcoming Jina Hub), and get going.
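Under the hood, neural search boils down to comparing dense vectors (embeddings). Here's a minimal, self-contained sketch of embedding-based retrieval; it uses the sentence-transformers library and a model name chosen purely for illustration (neither appears in the original post), and the product list is made up:

# Minimal embedding-based retrieval sketch (assumes: pip install sentence-transformers)
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # example pre-trained model

products = [
    "red Nike sneakers for men",
    "blue Adidas running shoes",
    "black leather Louis Vuitton handbag",
]

# Index step: encode every product description into a normalized vector
product_vecs = model.encode(products, normalize_embeddings=True)

def search(query, top_k=1):
    """Rank products by cosine similarity between query and product embeddings."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = product_vecs @ query_vec  # cosine similarity, since vectors are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [(products[i], float(scores[i])) for i in best]

# No synonym, spelling, or translation rules anywhere, yet queries like these
# typically land on the right product:
print(search("scarlet trainers"))
print(search("red nikke sneaker man"))
print(search("LV handbag"))

The heavy lifting here is done by the pre-trained model, which is the same basic principle a framework like Jina builds on at scale.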
Compared to symbolic search, neural search:
- Removes the fragile pipeline, making the system more resilient and scalable
- Finds a better way to represent the underlying semantics of products and search queries
- Learns as it goes along, so improves over time
Does Neural Search Work Though?
A search āworksā if it understands and returns quality results for:
- Simple queries: Like searching "red", "nike", or "sneakers"
- Compound queries: Like "red nike sneakers"
If it can't even do those, there's no point in checking for fancy things like spell-checking and the ability to work in different languages.
Anyway, less talking, more searching:
🇬🇧 nike
🇩🇪 nike schwarz (different language)
🇬🇧 addidsa (misspelled brand)
🇬🇧 addidsa trosers (misspelled brand and category)
🇬🇧 🇩🇪 kleider flowers (mixed languages)
So, as you can see, neural search does pretty well!
Comparing Symbolic and Neural Search
So, how does neural search compare to the reigning champ that is symbolic search? Let's take a look at the pros and cons of each:
We're not trying to choose between Team Symbolic and Team Neural. Both approaches have their own advantages and complement each other pretty well. So a better question to ask is: which is right for your organization?
Try Neural Search for Yourself
There's no better way to test-drive a technology than by diving in and playing with it. Jina provides pre-trained Docker images and jinabox.js, an easy-to-use front-end for searching text, images, audio, or video. There's no product search example (yet), but you can search for more light-hearted things like your favorite Pokémon or lines from South Park.
Note: Originally posted on Medium