DEV Community

Pavan Belagatti

Posted on • Originally published at Medium

A Complete Developer Guide to Vector Embeddings!

In this rapidly growing digital age, the way we represent and process information has become increasingly sophisticated, with vector data standing at the forefront of this evolution. Vector data, fundamentally an array of numbers, captures the essence of objects and phenomena across various domains, from natural language in text analysis to the details of images and the tones of audio. Today, we will look at what vector data is and walk through a simple tutorial on storing and retrieving it.

What is Vector Data?

Traditional databases excel at storing data in simple, structured formats, but the challenge comes when the data is unstructured and high-dimensional. In that case, the data is converted into a vector format (in layman's terms, a numerical format), and the result is known as vector data. This is where vector databases shine: they store and query vector data efficiently.


In the context of machine learning and artificial intelligence, vector data often refers to an array or sequence of numbers or elements that represent characteristics or features of objects or entities. These vectors can be used to perform calculations, support machine learning algorithms, or facilitate searches and recommendations.

Conceptually, each vector stored in a vector database is a point in a high-dimensional space, with one dimension per element of the vector. This form of data representation allows for the efficient encoding of features and characteristics, enabling advanced computations, machine learning algorithms, and data analysis techniques. Whether it's powering semantic search engines, facilitating complex geographic information systems (GIS), or driving the latest advancements in artificial intelligence, vector data serves as a cornerstone for modern computational tasks. Its precision and adaptability make it an invaluable resource in our quest to understand and interact with the world around us in more meaningful ways.

Vector Database: Storing Vector Data


Vector data is stored in vector databases, designed to efficiently manage and query vector data representations. These databases are tailored to handle the complexities and unique characteristics of vector data, enabling rapid, flexible, and scalable searches across large datasets. Unlike traditional databases that rely on exact matches or keyword searches, vector databases excel in understanding the semantic context of the data, supporting operations like similarity search, nearest neighbor search, and pattern recognition.

This makes them particularly useful in fields such as machine learning, recommendation systems, and natural language processing, where the ability to find similar items or concepts based on their vector representations is crucial. By leveraging advanced indexing and query mechanisms, vector databases provide a robust infrastructure for applications requiring high-performance analysis and retrieval of vector data, thereby facilitating more nuanced and contextually aware insights.
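
To make the idea of similarity search concrete, here is a minimal Python sketch of a brute-force nearest neighbor lookup using cosine similarity. The item names and three-dimensional vectors are made up for illustration, and a real vector database would typically rely on an index rather than scanning every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, items, k=2):
    """Return the k (name, score) pairs most similar to query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones usually have hundreds of dimensions.
items = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.1],
    "bicycle": [0.0, 0.1, 0.9],
}

top = nearest_neighbors([1.0, 0.0, 0.0], items, k=2)
for name, score in top:
    print(name, round(score, 3))
```

The two drink-related items rank above "bicycle" because their vectors point in nearly the same direction as the query, which is the sense in which vector search finds "similar" items.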

SingleStoreDB's vector database capabilities are designed to serve AI-driven applications, chatbots, image recognition systems and more. With SingleStoreDB, you no longer need to maintain a separate, dedicated vector database for your vector-intensive workloads.

If you want to have an in-depth understanding of vector databases, read my article “A Deep Dive into Vector Databases”.

Tutorial: Storing and Retrieving Vector Data using SingleStore

We will use SingleStore DB as a vector database to store our vector data.
Sign up to SingleStore for free to get started.

Once you sign up, create a workspace.


Next, create a database under the workspace you just created.


Click on ‘Create Database’ to create a new database.

I created my database ‘vectordata’.

Now, let’s create a table inside our newly created database to store vector data.

Open the SQL Editor and select the workspace and database we created.


The table name is reviews. Use the below SQL query to create the table.

CREATE TABLE reviews (
  id INT NOT NULL PRIMARY KEY,
  review TEXT,
  position VECTOR(4),
  category VARCHAR(256)
);

Let's set some user-defined variables for the first row: @_id (id), @pos (vector embedding), @rev (review) and @cat (category).

SET @_id = 1;
SET @pos = '[0.1, 0.19, 0.37, 0.04]';
SET @rev = "The namaste cafe in HSR has a great choco bar";
SET @cat = "Food";

Note: The vector embeddings in @pos are made up for the purposes of this tutorial. You can create real embeddings using embedding models from OpenAI, Cohere or HuggingFace.

Insert into the reviews table.

INSERT INTO reviews VALUES (@_id, @rev, @pos, @cat);

Similarly, let’s add some more vector data.

SET @_id = 2;
SET @pos = '[0.2, 0.13, 0.26, 0.39]';
SET @rev = "The asha cafe in BTM has a great milk bar";
SET @cat = "Food";

INSERT INTO reviews VALUES (@_id, @rev, @pos, @cat);

SET @_id = 3;
SET @pos = '[0.7, 0.38, 0.10, 0.97]';
SET @rev = "The empire resobar in JP Nagar has a great Dosa bar";
SET @cat = "Food";

INSERT INTO reviews VALUES (@_id, @rev, @pos, @cat);

SET @_id = 4;
SET @pos = '[0.2, 0.25, 0.58, 0.49]';
SET @rev = "The MTR in Madivala has a great Upma bar";
SET @cat = "Food";

INSERT INTO reviews VALUES (@_id, @rev, @pos, @cat);

Let’s say our search query is represented by the following vector (for this demo, it is identical to the vector stored for review 3):

SET @query_vec = '[0.7, 0.38, 0.10, 0.97]' :> VECTOR(4);

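Using vector similarity search, we can rank the stored reviews by how closely their vectors match the query. Here, dot_product scores each row against @query_vec, and a higher score indicates a closer match.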

SELECT id, review, category, dot_product(position, @query_vec) AS score
   FROM reviews
   ORDER BY score DESC
   LIMIT 2;


This way, we can easily store and retrieve vector data from SingleStore Database.
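
As a sanity check on the similarity query above, the same ranking can be reproduced outside the database. The following is a minimal Python sketch, not SingleStore's implementation: it copies the four tutorial vectors into a dictionary, computes the dot product of each with the query vector, and keeps the top two. The exact scores in the database may differ slightly in the last digits, assuming it stores vector elements at lower floating-point precision than Python uses.

```python
# Rows mirror the tutorial's reviews table: id -> position vector.
reviews = {
    1: [0.1, 0.19, 0.37, 0.04],
    2: [0.2, 0.13, 0.26, 0.39],
    3: [0.7, 0.38, 0.10, 0.97],
    4: [0.2, 0.25, 0.58, 0.49],
}
query_vec = [0.7, 0.38, 0.10, 0.97]


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Score every row, then keep the two highest (ORDER BY score DESC LIMIT 2).
scores = {rid: dot_product(vec, query_vec) for rid, vec in reviews.items()}
top_two = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]

for rid, score in top_two:
    print(rid, round(score, 4))
```

Review 3 comes first because its vector is identical to the query, followed by review 4. Note that the dot product is sensitive to vector length as well as direction, so it behaves like cosine similarity only when the embeddings are normalized to unit length.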

Let’s see one more tutorial example where you can create vector embeddings from SingleStore Notebook for real text and store the vector data in a SingleStore database.


Complete Notebook code link is here for you to try

Once you create these vector embeddings, storing them in a database is easy.

First, create a database to store these embeddings.


Then, create a table using the SQL editor. Make sure to select the workspace and database you created.

CREATE TABLE IF NOT EXISTS <your table name> (
  text TEXT,
  vector BLOB
);

Next, add these embeddings into the table created.

INSERT INTO <your table name> (text, vector) VALUES ("Hello World", JSON_ARRAY_PACK("<add the embedding values generated>"));

You can see the vector embeddings stored in your database under the table you created.
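
If the embedding lives in a Python list (for example, inside a notebook), the JSON array text passed to JSON_ARRAY_PACK in the INSERT above can be produced with the standard library. The sketch below also illustrates, as an assumption to verify against the SingleStore documentation, what a packed BLOB of this kind looks like if each element is stored as a little-endian 32-bit float. The four-element embedding is a placeholder, not real model output.

```python
import json
import struct

# Hypothetical 4-dimensional embedding standing in for real model output.
embedding = [0.1, 0.19, 0.37, 0.04]

# JSON array text that would go inside JSON_ARRAY_PACK('...') in the INSERT.
json_array = json.dumps(embedding)
print(json_array)

# Assumption: the packed blob holds one little-endian 32-bit float per element.
blob = struct.pack(f"<{len(embedding)}f", *embedding)
print(len(blob), "bytes")

# Unpacking recovers the values at 32-bit float precision.
restored = struct.unpack(f"<{len(embedding)}f", blob)
```

Under that assumption, a vector with n elements occupies 4 × n bytes, which gives a rough way to estimate storage for a table of embeddings.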


In conclusion, vector data plays a pivotal role in the contemporary landscape of data management and analysis, offering an effective means to represent, store, and retrieve complex, high-dimensional data. With advanced databases like SingleStore, you can efficiently manage vector data. This enables organizations to harness the full potential of their unstructured data, facilitating enhanced insights, faster query responses, and more intelligent decision-making processes.

Sign up to SingleStore Database and experience the power of efficient vector data storage today.
