In my last two blog posts (https://dev.to/shannonlal/unlocking-the-power-of-hybrid-search-5bej and https://dev.to/shannonlal/building-blocks-for-hybrid-search-combining-keyword-and-semantic-search-236k) I gave an overview of MongoDB's Vector Search, with the goal of demonstrating hybrid search in Mongo. In this post I am going to present the solution that got hybrid search working with Mongo. It took several attempts, so I will also walk through the strategies that did not work.
Attempting MongoDB Aggregation for Hybrid Search
My initial strategy was to use a single MongoDB aggregation to perform a dual search: one on the text and another on the vectors. The idea was to leverage MongoDB's $search stage to execute a text search followed by a vector search within the same pipeline. Here is the aggregation query that I put together:
[
  // Stage 1: Text-based search on 'description' field
  {
    $search: {
      index: 'text_index',
      text: {
        query: 'searchTerm',
        path: 'description',
        score: { boost: { value: 2 } }
      }
    }
  },
  // Stage 2: Incorporate the vector search based on the embedding
  {
    $search: {
      index: 'vector_index',
      compound: {
        should: [
          {
            vector: {
              path: 'embedding',
              query: [/* your vector embedding here */],
              score: { boost: { value: 1 } }
            }
          }
        ]
      }
    }
  },
  {
    $sort: {
      'score': { $meta: 'textScore' }
    }
  },
  {
    $project: {
      _id: 0, // excluding the id field
      name: 1,
      description: 1,
      textScore: { $meta: 'textScore' },
      vectorScore: { $meta: 'searchScore' }
    }
  }
];
However, MongoDB requires a $search stage to be the first stage of the pipeline it appears in, so a single pipeline cannot chain two of them. As a result, this aggregation pipeline does not work.
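To make the restriction concrete, here is a small plain JavaScript sketch that checks a pipeline array before it is sent to the server. `findMisplacedSearchStages` is a hypothetical helper of my own naming, not part of any MongoDB driver, and it only encodes the rule described above.

```javascript
// Hypothetical helper: returns the positions of $search stages that are
// not the first stage of the pipeline (the rule described above).
function findMisplacedSearchStages(pipeline) {
  const misplaced = [];
  pipeline.forEach((stage, index) => {
    if (index > 0 && Object.prototype.hasOwnProperty.call(stage, '$search')) {
      misplaced.push(index);
    }
  });
  return misplaced;
}

// A cut-down version of the pipeline above: two $search stages in a row.
const attemptedPipeline = [
  { $search: { index: 'text_index', text: { query: 'searchTerm', path: 'description' } } },
  { $search: { index: 'vector_index' } },
  { $project: { _id: 0, name: 1, description: 1 } },
];

console.log(findMisplacedSearchStages(attemptedPipeline)); // [ 1 ]
```

The second $search stage sits at position 1, which is why the pipeline is rejected.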
Crafting a Combined Search Index
The second strategy I tried involved creating a unified search index that could potentially handle both text and vector searches. Below is the index that I tried to create.
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "description": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "embedding": {
        "type": "vector",
        "similarity": "cosine",
        "numDimensions": 512
      }
    }
  }
}
Unfortunately, this approach hit a roadblock: MongoDB does not recognize 'vector' as a valid field type within a search index's mappings, so the index could not be created.
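The approach in the next section relies on two separate indexes instead ('default' for text and 'vector_index' for vectors). As a rough sketch, their definitions might look like the following. The text definition reuses the description mapping from above. The vector definition assumes the Atlas Vector Search index format (a `fields` array with a `vector` entry), so please check it against the Atlas documentation for your cluster version. The constant names are only for illustration.

```javascript
// Assumed definition for the Atlas Search index named 'default' (text only).
const textIndexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      description: { type: 'string', analyzer: 'lucene.standard' },
    },
  },
};

// Assumed definition for the Atlas Vector Search index named 'vector_index'.
// Here 'vector' is an entry in a top-level fields array, not inside mappings.
const vectorIndexDefinition = {
  fields: [
    {
      type: 'vector',
      path: 'embedding',
      numDimensions: 512, // must match the size of the stored embeddings
      similarity: 'cosine',
    },
  ],
};

console.log(JSON.stringify(vectorIndexDefinition, null, 2));
```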
Mongo Union with Aggregation
The final approach was to use $unionWith in a MongoDB aggregation: perform the vector search first, then use the $unionWith stage to perform a text search and combine the two result sets.
The following code is based on my previous blog post on hybrid search. Here is the aggregation pipeline code for hybrid search:
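// embedding (the query vector) and searchTerm (the query string) are
// assumed to be defined by the caller.
const pipeline = [
  // Branch 1: vector search on the 'embedding' field
  {
    $vectorSearch: {
      index: 'vector_index',
      path: 'embedding',
      queryVector: embedding,
      numCandidates: 10,
      limit: 10,
    },
  },
  // Keep the vector similarity score as vs_score
  { $addFields: { vs_score: { $meta: 'vectorSearchScore' } } },
  {
    $project: {
      vs_score: 1,
      _id: 1,
      description: 1,
      name: 1,
    },
  },
  // Branch 2: text search on 'vector_test', appended to the vector results
  {
    $unionWith: {
      coll: 'vector_test',
      pipeline: [
        {
          $search: {
            index: 'default',
            text: { query: searchTerm, path: 'description' },
          },
        },
        { $limit: 10 },
        // Keep the text relevance score as fts_score
        { $addFields: { fts_score: { $meta: 'searchScore' } } },
        {
          $project: {
            fts_score: 1,
            _id: 1,
            description: 1,
            name: 1,
          },
        },
      ],
    },
  },
  // Merge documents returned by both branches into one document per _id
  {
    $group: {
      _id: '$_id',
      vs_score: { $max: '$vs_score' },
      fts_score: { $max: '$fts_score' },
      description: { $first: '$description' },
      name: { $first: '$name' },
    },
  },
  // A document found by only one branch has no score from the other: use 0
  {
    $project: {
      description: 1,
      name: 1,
      vs_score: { $ifNull: ['$vs_score', 0] },
      fts_score: { $ifNull: ['$fts_score', 0] },
    },
  },
  // Combined score: simple sum of the text and vector scores
  {
    $project: {
      description: 1,
      name: 1,
      score: { $add: ['$fts_score', '$vs_score'] },
      _id: 1,
      vs_score: 1,
      fts_score: 1,
    },
  },
  { $sort: { score: -1 } },
  { $limit: 10 },
];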
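If the $group and $ifNull stages are hard to follow, the plain JavaScript sketch below imitates what they compute on two small in-memory arrays. It is only an illustration: `mergeHybridResults` and the sample documents are made up, and it assumes scores are never negative.

```javascript
// Plain JavaScript imitation of the $unionWith, $group, $ifNull and $add stages.
// vectorHits and textHits stand in for the output of the two search branches.
// Assumes scores are non-negative, so 0 is a safe default for a missing score.
function mergeHybridResults(vectorHits, textHits, limit = 10) {
  const byId = new Map();
  // Like $unionWith: vector results first, then text results.
  for (const hit of [...vectorHits, ...textHits]) {
    const merged = byId.get(hit._id) ?? {
      _id: hit._id,
      name: hit.name,               // like $first
      description: hit.description, // like $first
      vs_score: 0,
      fts_score: 0,
    };
    // Like $max followed by $ifNull: a missing score counts as 0.
    merged.vs_score = Math.max(merged.vs_score, hit.vs_score ?? 0);
    merged.fts_score = Math.max(merged.fts_score, hit.fts_score ?? 0);
    byId.set(hit._id, merged);
  }
  return [...byId.values()]
    .map((doc) => ({ ...doc, score: doc.fts_score + doc.vs_score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Made-up sample data: 'a' is found by both branches, 'b' only by vector search.
const sampleVectorHits = [
  { _id: 'a', name: 'Doc A', description: 'first', vs_score: 0.7 },
  { _id: 'b', name: 'Doc B', description: 'second', vs_score: 0.9 },
];
const sampleTextHits = [
  { _id: 'a', name: 'Doc A', description: 'first', fts_score: 0.5 },
];

const mergedHits = mergeHybridResults(sampleVectorHits, sampleTextHits);
console.log(mergedHits.map((doc) => doc._id)); // [ 'a', 'b' ]
```

Document 'a' ends up first because it collects a score from both branches, even though 'b' has the higher vector score on its own.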
The aggregation is a little more complex than I would like, but it seems to do the job. The one thing I would recommend paying attention to is how the combined score is determined. In this approach we simply add the two scores (vs_score and fts_score) together; however, this may not be the best solution for your use case. The scores from my test search are listed below.
Query Results:
Search Term | Combined Score | Text Score | Vector Score |
---|---|---|---|
Car for hire | 1.373 | 0.653 | 0.720 |
Limo Hires | 0.775 | 0.037 | 0.737 |
Electric Scooter | 0.733 | 0.044 | 0.689 |
Bike Share | 0.731 | 0.042 | 0.689 |
Car Dealership | 0.651 | 0.036 | 0.615 |
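As mentioned above, a plain sum may not suit every use case, since one score can dominate the other. One possible variation (not something I have tuned or tested at scale) is to weight each score before adding them. The sketch below builds a replacement for the last $project stage of the pipeline; `weightedScoreStage` is a hypothetical helper and the weights 0.3 and 0.7 are arbitrary example values.

```javascript
// Hypothetical helper: builds a $project stage that multiplies each score
// by a caller-chosen weight before adding them together.
function weightedScoreStage(textWeight, vectorWeight) {
  return {
    $project: {
      _id: 1,
      name: 1,
      description: 1,
      vs_score: 1,
      fts_score: 1,
      score: {
        $add: [
          { $multiply: [textWeight, '$fts_score'] },
          { $multiply: [vectorWeight, '$vs_score'] },
        ],
      },
    },
  };
}

// Example weights only: favour the vector score over the text score.
const weightedStage = weightedScoreStage(0.3, 0.7);
console.log(JSON.stringify(weightedStage.$project.score));
// {"$add":[{"$multiply":[0.3,"$fts_score"]},{"$multiply":[0.7,"$vs_score"]}]}
```

Swapping this stage in for the one that uses a bare $add leaves the rest of the pipeline unchanged, so it is easy to experiment with different weights.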
The Road Ahead
Over the next couple of weeks I am going to load test this to see how the query performs when searching a large number of documents. I would definitely welcome any feedback or comments on how I can improve the query, or on better strategies for getting hybrid search working.
Thanks