dejanualex for AWS Community Builders

Originally published at dejanualexandru.medium.com

OpenSearch as Vector DB: Supercharge Your LLM

OpenSearch goes beyond interactive log analytics and real-time application monitoring: you can now deploy ML models directly in OpenSearch (for a quick intro to OpenSearch, check OpenSearch for humans).

Amazon OpenSearch Service allows you to deploy a secured OpenSearch cluster in minutes.


Setup:

In this particular case, the OpenSearch 2.7 cluster is backed by r6gd.4xlarge instances. Since we’re not using dedicated ML nodes with NVIDIA® V100 Tensor Core GPUs, we need to change the ml_commons plugin configuration in order to run our model on the Graviton2-based data nodes.

Using Dev Tools, we can run queries directly from the console. The first step is to change the plugin’s only_run_on_ml_node setting to false.

# change the config
PUT _cluster/settings
{
   "persistent":{
     "plugins.ml_commons.only_run_on_ml_node": false
   }
}
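To verify the change took effect, you can read the settings back (the same call appears in the recap at the end):

# get settings, including defaults
GET /_cluster/settings?include_defaults=true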


After updating the plugin configuration, the next step is to upload a pre-trained model using the API (OpenSearch currently supports only the TorchScript and ONNX formats). OpenSearch provides several supported pre-trained models, such as the sentence-transformers family.


Steps:

⚠️ When choosing the sizing for the OpenSearch cluster, make sure the nodes have enough memory for ML inference; otherwise you may run into a CircuitBreakerException.
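You can check memory usage per node and per circuit breaker before and after inference:

# get memory usage per node and breaker
GET _nodes/stats/breaker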

Most deep learning models are larger than 100 MB, which makes it difficult to fit them into a single document, so OpenSearch splits the model file into smaller chunks stored in a model index. Upload the model using the API; in this case, I’ve chosen the pre-trained sentence-transformers model all-MiniLM-L12-v2.
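The upload request registers the model by name, version, and format:

# upload pre-trained model
POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}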

  • After uploading the model, OpenSearch responds with a task_id, which we use to retrieve the model_id (see the calls after this list).

  • After getting the model_id, we load the model from the index into memory: POST /_plugins/_ml/models/<model_id>/_load

  • After the model is loaded successfully, we can use the text_embedding algorithm.
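The intermediate calls look like this: the task status returns the model_id, which the load call then takes.

# get the model_id using the task_id returned by the upload
GET /_plugins/_ml/tasks/<task_id>

# load the model into memory
POST /_plugins/_ml/models/<model_id>/_load

Once the model is loaded, the prediction call below embeds a sentence: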

POST /_plugins/_ml/_predict/text_embedding/lu14l4kB_GAWF5uBi_Ol
{
  "text_docs":[ "sentence to be embedded"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
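The response contains the embedding vector. Below is an illustrative, truncated sketch of its shape (all-MiniLM-L12-v2 produces 384-dimensional embeddings; the data array is elided here):

# illustrative, truncated response
{
  "inference_results": [
    {
      "output": [
        {
          "name": "sentence_embedding",
          "data_type": "FLOAT32",
          "shape": [384],
          "data": [0.0123, -0.0456]
        }
      ]
    }
  ]
}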


That’s it… for an in-depth explanation of what embeddings are, check embedding algorithm and LLM and Vector Databases.
As a quick recap, below are all the steps:



# get settings
GET /_cluster/settings?include_defaults=true

# get memory usage per node and breaker
GET _nodes/stats/breaker

# if you don't use dedicated ML nodes, set only_run_on_ml_node to false
PUT _cluster/settings
{
   "persistent":{
     "plugins.ml_commons.only_run_on_ml_node": false
   }
}

# upload pre-trained model
POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

# get the model_id using the task_id returned by the previous request
GET /_plugins/_ml/tasks/<task_id>

# load the model into memory
POST /_plugins/_ml/models/<model_id>/_load

# use the task_id to get the status of model load
GET /_plugins/_ml/tasks/<task_id>

# embed text
POST /_plugins/_ml/_predict/text_embedding/lu14l4kB_GAWF5uBi_Ol
{
  "text_docs":[ "test to embed here"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
