
Roberto B.

Implementing Semantic Search with Storyblok and Orama

Are you looking to enhance the search capabilities of your FAQ system to provide more precise and relevant results? Incorporating a semantic search feature might be the solution. This article will guide you through creating a semantic search system using Storyblok and Orama.

To implement the semantic search, we are going to follow this process:

  • extracting the embeddings from our FAQ content from Storyblok;
  • indexing the embeddings in Orama;
  • performing a search based on the user input.

We will build a JavaScript script that implements the process above, using a few libraries as dependencies.

Installing the JavaScript libraries

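We need to install these dependencies:

  • storyblok-js-client, to fetch the FAQ content from Storyblok;
  • @orama/orama, to index the content and run the vector search;
  • @xenova/transformers, to generate the embeddings;
  • prompts, to ask the user for a question on the command line.

To install all of these packages in your JavaScript project, you can use these commands: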

npm i --save storyblok-js-client
npm i --save @orama/orama
npm i --save @xenova/transformers
npm i --save prompts

The source code

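Now, we are going to create a new JavaScript file. Because the code uses import statements and top-level await, the file has to run as an ES module (for example with the .mjs extension, or with "type": "module" in package.json). The next sections will cover all the steps needed to implement the search.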

Importing the libraries

In the JavaScript file, start by importing the packages we are going to use:

import { create, insertMultiple, search } from "@orama/orama";
import { pipeline } from "@xenova/transformers";
import prompts from "prompts";
import StoryblokClient from "storyblok-js-client";

Initializing the Storyblok Client

To access the list of FAQ entries from Storyblok that we want to index in Orama, we have to initialize the StoryblokClient with the access token:

const Storyblok = new StoryblokClient({
    accessToken: "youraccesstoken",
    cache: {
        clear: "auto",
        type: "memory",
    },
});

To retrieve your Storyblok access token, see: https://www.storyblok.com/faq/retrieve-and-generate-access-tokens

Getting the content from Storyblok

To retrieve the content, we have to perform an HTTP API call to the stories endpoint, filtering for specific content (starts_with parameter) and limiting the number of items (per_page parameter):

const response = await Storyblok.get("cdn/stories", {
    starts_with: "faq/",
    per_page: 100,
});
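
If the FAQ folder holds more than 100 stories, a single call will not return them all, since per_page is capped at 100 by the Content Delivery API. Here is a minimal sketch of paging through the results. It assumes a hypothetical helper, fetchAllStories (not part of storyblok-js-client), that receives any function returning one page of stories:

```javascript
// Hypothetical helper (not part of storyblok-js-client): collects every page
// by calling `fetchPage(page)` until a page comes back shorter than `perPage`.
async function fetchAllStories(fetchPage, perPage = 100) {
    const stories = [];
    for (let page = 1; ; page++) {
        const batch = await fetchPage(page);
        stories.push(...batch);
        // A short (or empty) page means there is nothing left to fetch.
        if (batch.length < perPage) break;
    }
    return stories;
}

// Example wiring, assuming the `Storyblok` client initialized earlier:
// const stories = await fetchAllStories(async (page) => {
//     const res = await Storyblok.get("cdn/stories", {
//         starts_with: "faq/",
//         per_page: 100,
//         page,
//     });
//     return res.data.stories;
// });
```

Passing the page fetcher as a function keeps the loop independent of the client, so it can be tried out with fake data before pointing it at a real space.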

Running the model

To index the FAQ questions (the text), we have to generate embeddings: numerical vectors that represent the semantic meaning of the text.
We are going to use the GTE small model: https://huggingface.co/Supabase/gte-small

const pipe = await pipeline("feature-extraction", "Supabase/gte-small");

let output = null;
let embedding = null;

Collecting data for indexes

Looping through the Storyblok content, we are going to fill an array. Each element has the name (the question), the "full slug", and the embedding of the name.

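const arrayInserting = [];
// A for...of loop is used here because forEach does not wait for async
// callbacks, which would leave the array empty when it is indexed later.
for (const element of response.data.stories) {
    // Generate one mean-pooled, normalized embedding per FAQ question.
    output = await pipe(element.name, {
        pooling: "mean",
        normalize: true,
    });
    embedding = Array.from(output.data);
    arrayInserting.push({
        name: element.name,
        full_slug: element.full_slug,
        embedding: embedding,
    });
}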
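
The pooling: "mean" and normalize: true options turn the per-token vectors produced by the model into a single unit-length vector per question. As a rough illustration of that idea, here is a simplified sketch. It is not the library's actual implementation (which, among other things, works on tensors), and the function names meanPool and l2Normalize are hypothetical:

```javascript
// Hypothetical helper: averages a list of equally sized token vectors
// into one vector (mean pooling).
function meanPool(tokenVectors) {
    const dims = tokenVectors[0].length;
    const pooled = new Array(dims).fill(0);
    for (const vec of tokenVectors) {
        for (let i = 0; i < dims; i++) {
            pooled[i] += vec[i] / tokenVectors.length;
        }
    }
    return pooled;
}

// Hypothetical helper: scales a vector so that its Euclidean length is 1.
function l2Normalize(vector) {
    const norm = Math.hypot(...vector);
    return vector.map((value) => value / norm);
}

// Two toy "token vectors" with 2 dimensions (the real model uses 384).
const tokenVectors = [[2, 0], [4, 8]];
const sentenceEmbedding = l2Normalize(meanPool(tokenVectors));
console.log(sentenceEmbedding); // unit-length vector in the direction of [3, 4]
```

Normalizing every embedding to length 1 means that comparing two embeddings depends only on their direction, not on their magnitude.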

Creating Orama indexes

To perform a semantic search, we must create an Orama database with a schema and then insert the content we want indexed.
The embeddings must be stored as vector[384] for the semantic search. The length of the vector depends on the model used; gte-small produces vectors with 384 dimensions.

const db = await create({
    schema: {
        name: "string",
        full_slug: "string",
        embedding: "vector[384]",
    },
});

await insertMultiple(db, arrayInserting);
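
Because the vector size declared in the schema has to match the model output, it can help to check the documents before inserting them. Here is a minimal sketch, assuming a hypothetical helper findDimensionMismatches (this is not an Orama function):

```javascript
// Hypothetical helper (not part of Orama): returns the documents whose
// vector property is missing or does not have the expected length.
function findDimensionMismatches(documents, expectedSize, property = "embedding") {
    return documents.filter(
        (doc) => !Array.isArray(doc[property]) || doc[property].length !== expectedSize,
    );
}

// Toy data with 3-dimensional vectors; for gte-small the size would be 384.
const sampleDocs = [
    { name: "How do I create a space?", embedding: [0.1, 0.2, 0.3] },
    { name: "How do I invite a user?", embedding: [0.1, 0.2] },
];
const mismatches = findDimensionMismatches(sampleDocs, 3);
console.log(mismatches.map((doc) => doc.name));
```

In the script above, the same check could be run as findDimensionMismatches(arrayInserting, 384) right before calling insertMultiple.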

Asking for the user input

const inputUser = await prompts({
    type: "text",
    name: "question",
    message: "Ask me something about Storyblok",
});

const stringToSearch = inputUser.question;

Performing the search

Now stringToSearch holds the text to match against our index. To perform the search, we calculate the embedding of that string with the same model and options used for the FAQ questions, then pass it to Orama's vector search:

output = await pipe(stringToSearch, {
    pooling: "mean",
    normalize: true,
});

embedding = Array.from(output.data);
const results = await search(db, {
    mode: "vector",
    vector: {
        value: embedding,
        property: "embedding",
    },
    similarity: 0.85, // Minimum similarity. Defaults to `0.8`
    includeVectors: true, // Defaults to `false`
    limit: 10, // Defaults to `10`
    offset: 0, // Defaults to `0`
});
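
To get a feel for what the similarity threshold means: as far as the Orama documentation describes it, vector search compares embeddings with cosine similarity, where values close to 1 mean the two vectors point in nearly the same direction. The sketch below shows that measure on toy vectors. The function cosineSimilarity is a hypothetical illustration, not Orama's internal code:

```javascript
// Hypothetical helper: cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // same direction: 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal: 0
console.log(cosineSimilarity([1, 0], [1, 1])); // 45 degrees apart, about 0.707
```

With a threshold of 0.85, a document is expected to be returned only when its embedding is that close in direction to the embedding of the question, so lowering the value should return more, but looser, matches.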

Showing the results

Now the results object holds the search results, so we can loop over results.hits and access the data of each document the search found.

console.log("");
if (results.hits.length === 0) {
    console.log(`I can't find anything about ${stringToSearch}`);
    console.log(
        "Maybe you can add a new entry for the Frequently Asked Questions in Storyblok",
    );
}
if (results.hits.length === 1) {
    console.log(`I found a link for you for: ${stringToSearch}`);
}
if (results.hits.length > 1) {
    console.log(`I found some useful links for you for: ${stringToSearch}`);
}
console.log("");
results.hits.forEach((element) => {
    console.log(` ✨ ${element.document.name}`);
    console.log(`    🔗 https://storyblok.com/${element.document.full_slug}`);
    console.log("");
});

This approach goes beyond keyword (syntactic) search, which only locates text containing the specific words typed by the user. With semantic search, we can find content based on its meaning, even when the wording of the question differs from the wording of the FAQ entry.

Watch the Video

I also created a short video to show the process.
