Roberto B.

Posted on Mar 5

Implementing Semantic Search with Storyblok and Orama

#tutorial #javascript #embeddings #ai

Are you looking to improve the search in your FAQ system so that it returns more precise and relevant results? Adding semantic search, which matches a query by meaning rather than by exact keywords, might be the solution you've been looking for. This article guides you through building a semantic search system with Storyblok and Orama.

To implement the semantic search, we are going to follow this process:

generating embeddings from our FAQ content stored in Storyblok;
indexing the embeddings in Orama;
performing a search based on the user input.

We will implement this process as a JavaScript script that relies on a few libraries.

Installing the JavaScript libraries

We need to install these dependencies:

Storyblok JS Client: for retrieving content from Storyblok (https://www.npmjs.com/package/storyblok-js-client)
Orama SDK: for creating indexes and performing the search (https://www.npmjs.com/package/@orama/orama)
Transformers.js: for running the model that creates embeddings from the text (https://www.npmjs.com/package/@xenova/transformers)
Prompts: for managing the user input (https://www.npmjs.com/package/prompts)

To install all of these packages in your JavaScript project, run these commands:

npm i --save storyblok-js-client
npm i --save @orama/orama
npm i --save @xenova/transformers
npm i --save prompts

The source code

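Now, create a new JavaScript file. Because the code uses import statements and top-level await, it must run as an ES module: give the file an .mjs extension or set "type": "module" in your package.json. The next sections cover all the steps needed to implement the search.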

Importing the libraries

In the JavaScript file, start by importing the packages we are going to use:

import { create, insertMultiple, search } from "@orama/orama";
import { pipeline } from "@xenova/transformers";
import prompts from "prompts";
import StoryblokClient from "storyblok-js-client";

Initializing the Storyblok Client

To retrieve the FAQ entries from Storyblok that we want to index in Orama, we initialize the StoryblokClient with an access token:

const Storyblok = new StoryblokClient({
    accessToken: "youraccesstoken",
    cache: {
        clear: "auto",
        type: "memory",
    },
});

To learn how to retrieve your Storyblok access token, see: https://www.storyblok.com/faq/retrieve-and-generate-access-tokens
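Hard-coding the token is fine for a quick experiment, but you may prefer to keep it out of the source file. A minimal sketch, assuming you export the token in an environment variable with the hypothetical name STORYBLOK_TOKEN (this name is our choice, not something Storyblok defines):

```javascript
// Hypothetical helper: reads the Storyblok access token from an environment
// object (pass process.env in Node.js). STORYBLOK_TOKEN is an assumed name.
function getStoryblokToken(env) {
    const token = env.STORYBLOK_TOKEN;
    if (typeof token !== "string" || token.trim() === "") {
        throw new Error("Missing STORYBLOK_TOKEN environment variable");
    }
    return token.trim();
}

// Usage (Node.js):
// const Storyblok = new StoryblokClient({
//     accessToken: getStoryblokToken(process.env),
//     cache: { clear: "auto", type: "memory" },
// });
```

Failing fast with a clear message is easier to debug than an authentication error from the API later on.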

Getting the content from Storyblok

To retrieve the content, we perform an HTTP API call to the stories endpoint, filtering for the FAQ entries (the starts_with parameter) and limiting the number of items returned (the per_page parameter, where 100 is the maximum):

const response = await Storyblok.get("cdn/stories", {
    starts_with: "faq/",
    per_page: 100,
});
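A single request returns only one page of stories, so an FAQ with more entries than per_page needs several requests using the page parameter. A minimal sketch, assuming a hypothetical fetchPage(page, perPage) function that wraps Storyblok.get() and resolves to the page's stories plus the total count (storyblok-js-client should expose the total as response.total; verify this for your version):

```javascript
// Minimal pagination sketch. fetchPage is a hypothetical async function:
// (page, perPage) => Promise<{ stories, total }>.
async function fetchAllStories(fetchPage, perPage = 100) {
    const all = [];
    let page = 1;
    let total = Infinity;
    while (all.length < total) {
        const { stories, total: reportedTotal } = await fetchPage(page, perPage);
        total = reportedTotal;
        if (stories.length === 0) break; // safety net: stop on an empty page
        all.push(...stories);
        page += 1;
    }
    return all;
}

// Possible wrapper around the client (assumes response.total exists):
// const fetchPage = async (page, perPage) => {
//     const response = await Storyblok.get("cdn/stories", {
//         starts_with: "faq/", per_page: perPage, page,
//     });
//     return { stories: response.data.stories, total: response.total };
// };

// Fake fetcher serving 250 items, used only to exercise the loop.
async function fakeFetchPage(page, perPage) {
    const total = 250;
    const start = (page - 1) * perPage;
    const count = Math.max(0, Math.min(perPage, total - start));
    const stories = Array.from({ length: count }, (_, i) => ({ name: `Q${start + i}` }));
    return { stories, total };
}
```

Recent versions of storyblok-js-client also offer a getAll() method that follows the pages for you; check the README of the version you install before writing your own loop.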

Running the model

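To index the FAQ questions (the text), we have to generate embeddings: numerical vectors that represent the semantic meaning of the text.
We are going to use the gte-small model, which produces 384-dimensional vectors: https://huggingface.co/Supabase/gte-small
The pipeline is loaded once and reused for every text; the output and embedding variables will hold the model output and the resulting vector.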

const pipe = await pipeline("feature-extraction", "Supabase/gte-small");
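The pooling: "mean" and normalize: true options used below turn the model's per-token vectors into one sentence vector of unit length. The following illustrative sketch shows the idea on plain arrays; it is not Transformers.js's actual code, and the real implementation also uses the attention mask to ignore padding tokens:

```javascript
// Mean pooling: average the per-token vectors into a single vector.
function meanPool(tokenVectors) {
    const dim = tokenVectors[0].length;
    const sum = new Array(dim).fill(0);
    for (const vec of tokenVectors) {
        for (let i = 0; i < dim; i++) sum[i] += vec[i];
    }
    return sum.map((v) => v / tokenVectors.length);
}

// L2 normalization: scale the vector so its Euclidean length is 1.
function l2Normalize(vec) {
    const norm = Math.sqrt(vec.reduce((acc, v) => acc + v * v, 0));
    return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

// Three hypothetical 2-dimensional token vectors (gte-small uses 384 dims).
const sentence = l2Normalize(meanPool([[1, 2], [3, 4], [5, 6]]));
console.log(sentence); // → [ 0.6, 0.8 ]
```

Normalizing matters for search: once every vector has length 1, cosine similarity between two vectors is simply their dot product.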

let output = null;
let embedding = null;

Collecting data for indexes

Looping through the stories returned by Storyblok, we fill an array in which each element holds the story name (the FAQ question), its full slug, and the embedding computed from the name.

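const arrayInserting = [];
// Use for...of instead of forEach: forEach does not wait for async callbacks,
// so the array could still be empty when insertMultiple() runs.
for (const element of response.data.stories) {
    // Embed the question text with the same options used later for the query
    const result = await pipe(element.name, {
        pooling: "mean",
        normalize: true,
    });
    arrayInserting.push({
        name: element.name,
        full_slug: element.full_slug,
        embedding: Array.from(result.data),
    });
}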
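A common pitfall when collecting data like this is putting await in front of array.forEach() with an async callback: forEach ignores the promises its callback returns, so the code after it can run before any embedding has been computed. This small, self-contained comparison uses a hypothetical sleep() helper in place of the model call:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// forEach returns undefined, so `await` has nothing to wait for.
async function withForEach(items) {
    const out = [];
    await items.forEach(async (item) => {
        await sleep(10);
        out.push(item);
    });
    return out.length; // read before any callback has pushed
}

// for...of awaits each iteration before starting the next one.
async function withForOf(items) {
    const out = [];
    for (const item of items) {
        await sleep(10);
        out.push(item);
    }
    return out.length;
}

// Promise.all runs the callbacks concurrently and waits for all of them.
async function withPromiseAll(items) {
    const out = await Promise.all(items.map(async (item) => {
        await sleep(10);
        return item;
    }));
    return out.length;
}

// withForEach(["a", "b", "c"]) resolves to 0;
// withForOf and withPromiseAll both resolve to 3.
```

For an FAQ of modest size, the sequential for...of loop is the simplest choice; Promise.all() also preserves the order of results but starts every model call at once, which may not be faster for local inference and uses more memory.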

Creating Orama indexes

To perform a semantic search, we first create an Orama database with a schema describing our documents, then insert the content we want indexed.
The embedding property is declared as vector[384] because gte-small produces 384-dimensional vectors; if you use a different model, set the size to match its output.

const db = await create({
    schema: {
        name: "string",
        full_slug: "string",
        embedding: "vector[384]",
    },
});

await insertMultiple(db, arrayInserting);
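A mismatch between the model's output size and the size declared in the schema is an easy mistake to make when switching models. A minimal sketch of a hypothetical guard you could run before insertMultiple(), assuming 384 dimensions as for gte-small:

```javascript
// Hypothetical guard: returns the slugs of documents whose embedding does not
// match the dimension declared in the Orama schema.
const EMBEDDING_DIM = 384; // assumption: matches Supabase/gte-small output

function findBadEmbeddings(docs, dim = EMBEDDING_DIM) {
    return docs
        .filter((doc) => !Array.isArray(doc.embedding) || doc.embedding.length !== dim)
        .map((doc) => doc.full_slug);
}

// Usage before insertMultiple(db, arrayInserting):
// const bad = findBadEmbeddings(arrayInserting);
// if (bad.length > 0) {
//     throw new Error(`Unexpected embedding size for: ${bad.join(", ")}`);
// }
```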

Asking for the user input

const inputUser = await prompts({
    type: "text",
    name: "question",
    message: "Ask me something about Storyblok",
});

const stringToSearch = inputUser.question;
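If the user cancels the prompt (for example with Ctrl+C) or just presses Enter, inputUser.question may be undefined or empty, and embedding it would not produce a useful search. A minimal sketch of a hypothetical normalizeQuery() helper that cleans up the answer and signals when there is nothing to search:

```javascript
// Hypothetical helper: trims and collapses whitespace, returning null when
// there is nothing meaningful to search for (cancelled or blank input).
function normalizeQuery(answer) {
    if (typeof answer !== "string") return null;
    const query = answer.trim().replace(/\s+/g, " ");
    return query.length > 0 ? query : null;
}

// Usage:
// const stringToSearch = normalizeQuery(inputUser.question);
// if (stringToSearch === null) process.exit(0);
```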

Performing the search

The stringToSearch variable now holds the text to match against our index. To search with Orama, we compute the embedding of this string with the same model and options used for the FAQ entries, then run a vector search on the embedding property:

output = await pipe(stringToSearch, {
    pooling: "mean",
    normalize: true,
});

embedding = Array.from(output.data);
const results = await search(db, {
    mode: "vector",
    vector: {
        value: embedding,
        property: "embedding",
    },
    similarity: 0.85, // Minimum similarity. Defaults to `0.8`
    includeVectors: true, // Defaults to `false`
    limit: 10, // Defaults to `10`
    offset: 0, // Defaults to `0`
});
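To make the similarity, limit, and offset parameters concrete, here is a brute-force sketch of a thresholded vector search. It assumes cosine similarity as the scoring function (for unit-length vectors this equals the dot product); it is only an illustration, and Orama's internal implementation may differ:

```javascript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function vectorSearch(docs, query, { similarity = 0.8, limit = 10, offset = 0 } = {}) {
    return docs
        .map((doc) => ({ document: doc, score: cosineSimilarity(doc.embedding, query) }))
        .filter((hit) => hit.score >= similarity) // drop hits below the threshold
        .sort((a, b) => b.score - a.score)        // best matches first
        .slice(offset, offset + limit);           // pagination
}

// Hypothetical 2-dimensional data (real gte-small vectors have 384 dimensions).
const toyDocs = [
    { name: "How do I publish a story?", embedding: [1, 0] },
    { name: "How can I make content live?", embedding: [0.9, 0.1] },
    { name: "What is a component?", embedding: [0, 1] },
];
const toyHits = vectorSearch(toyDocs, [1, 0], { similarity: 0.85 });
console.log(toyHits.map((h) => h.document.name));
// → [ 'How do I publish a story?', 'How can I make content live?' ]
```

Raising the similarity threshold returns fewer, more confident matches; lowering it returns more results at the risk of including unrelated entries.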

Showing the results

The results object contains a hits array, and each hit exposes the matched entry through its document property, so we can loop over the hits and print each question with its link.

console.log("");
if (results.hits.length === 0) {
    console.log(`I can't find anything about ${stringToSearch}`);
    console.log(
        "Maybe you can add a new entry to the Frequently Asked Questions in Storyblok",
    );
} else if (results.hits.length === 1) {
    console.log(`I found a link for you about: ${stringToSearch}`);
} else {
    console.log(`I found some useful links for you about: ${stringToSearch}`);
}
console.log("");
results.hits.forEach((element) => {
    console.log(` ✨ ${element.document.name}`);
    console.log(`    🔗 https://storyblok.com/${element.document.full_slug}`);
    console.log("");
});

This approach overcomes a key limitation of syntactic (keyword) search, which only finds text containing the specific words of the query. Because the search compares embeddings, it can find FAQ entries that are similar in meaning to the question even when they share none of its words.

Watch the Video

I also created a short video showing the process.
