David Mezzetti for NeuML

Originally published at neuml.hashnode.dev

Similarity search with images

This article is part of a tutorial series on txtai, an AI-powered semantic search platform.

txtai, as the name implies, works with text and AI. But that doesn't mean it can't work with other types of content. For example, an image can be described with words, and that description can be used to compare the image to a query or to other documents. This article shows how images and text can be embedded into the same space to support similarity search.
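
As a quick illustration of that idea, here is a minimal sketch that captions an image with txtai's Caption pipeline and scores the caption against a text query with a text-only embeddings model. The image path, the query string and the model name (sentence-transformers/nli-mpnet-base-v2) are placeholders chosen for illustration, not part of the original example.

from txtai.embeddings import Embeddings
from txtai.pipeline import Caption

# Describe an image with words
caption = Caption()
description = caption("txtai/books.jpg")

# Compare the generated description to a text query using a text-only model
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})
print(embeddings.similarity("reading a good book", [description]))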

Install dependencies

Install txtai and all dependencies. Since this article uses sentence-transformers directly, we need to install the similarity extras package.

pip install txtai[similarity] torchvision ipyplot

# Get test data
wget -N https://github.com/neuml/txtai/releases/download/v3.5.0/tests.tar.gz
tar -xvzf tests.tar.gz

Create an Embeddings model

sentence-transformers has support for the OpenAI CLIP model. This model embeds text and images into the same space, enabling image similarity search. txtai can directly use these models through sentence-transformers. Check out the sentence-transformers documentation for additional examples on how to use this model.
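
For reference, below is a minimal sketch of using the CLIP model through sentence-transformers directly, outside of txtai. The image path and the example text are placeholders; the key point is that encode accepts both text and PIL images and returns vectors in the same space.

from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Load the CLIP model
model = SentenceTransformer("clip-ViT-B-32")

# Encode an image and a text string into the same vector space
image_embedding = model.encode(Image.open("txtai/books.jpg"))
text_embedding = model.encode("a book shelf filled with many books")

# Cosine similarity between the image and the text
print(util.cos_sim(image_embedding, text_embedding))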

This section builds an embeddings index over a series of images.

import glob

from PIL import Image

from txtai.embeddings import Embeddings
from txtai.pipeline import Caption

def images():
  # Create image caption pipeline
  caption = Caption()

  for path in glob.glob('txtai/*jpg'):
    # Add image object along with image metadata
    image = Image.open(path)
    yield (path, {"object": image, "format": image.format, "width": image.width, "height": image.height, "caption": caption(image)}, None)

# Index with content and objects
embeddings = Embeddings({"method": "sentence-transformers", "path": "sentence-transformers/clip-ViT-B-32", "content": True, "objects": "image"})
embeddings.index(images())

Next let's query and see what's available in the index.

embeddings.search("select id, object, format, width, height, caption from txtai")
[{'caption': 'a book shelf filled with many books',
  'format': 'JPEG',
  'height': 682,
  'id': 'txtai/books.jpg',
  'object': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x682 at 0x7F47C64A6450>,
  'width': 1024},
 {'caption': 'a large building with many windows in a city',
  'format': 'JPEG',
  'height': 466,
  'id': 'txtai/buildings.jpg',
  'object': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=700x466 at 0x7F47C64A6CD0>,
  'width': 700},
 {'caption': 'a bird is perched on top of a tree branch',
  'format': 'JPEG',
  'height': 466,
  'id': 'txtai/chop.jpg',
  'object': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=700x466 at 0x7F47C64A6690>,
  'width': 700}]

The query above shows the metadata that was added in addition to the image object. These fields can be retrieved on search and/or used to filter results.
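
For example, metadata fields can be combined with a similar() clause in txtai's SQL query syntax. The query text and width threshold below are arbitrary values chosen for illustration.

# Restrict a similarity query to larger images (illustrative width filter)
embeddings.search("select id, caption, width from txtai where similar('reading a book') and width >= 700")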

Search the index

Now that we have an index, let's search it! This section runs a list of queries against the index and shows the top result for each query. Have to say this is pretty 🔥🔥🔥

import ipyplot
from PIL import Image

images, labels = [], []
for query in ["Walking into the office", "Saturday cleaning the yard", "Working on the latest analysis", "Working on my homework", "Watching an exciting race",
              "The universe is massive", "Time lapse video of traffic", "Relaxing Thanksgiving day"]:
  result = embeddings.search(f"select object from txtai where similar(\"{query}\")", 1)[0]
  images.append(result["object"])
  labels.append(query)

ipyplot.plot_images(images, labels, img_width=425, force_b64=True)

[Image: grid of the top matching image for each query]

Multilingual Support

sentence-transformers also has a multilingual CLIP text model that supports more than 50 languages. This enables running queries in those languages against an image index.

Note that this model only supports text, so images must first be indexed with the CLIP model used above.

import ipyplot
from PIL import Image

from txtai.pipeline import Translation

# Update model at query time to support multilingual queries
embeddings.config["path"] = "sentence-transformers/clip-ViT-B-32-multilingual-v1"
embeddings.model = embeddings.loadvectors()

# Translate queries to German
queries = ["Walking into the office", "Saturday cleaning the yard", "Working on the latest analysis", "Working on my homework", "Watching an exciting race",
           "The universe is massive", "Time lapse video of traffic", "Relaxing Thanksgiving day"]
translate = Translation()
translated = translate(queries, "de")

images, labels = [], []
for x, query in enumerate(translated):
  result = embeddings.search(f"select object from txtai where similar(\"{query}\")", 1)[0]

  images.append(result["object"])
  labels.append("%s<br/>(%s)" % (query, queries[x]))

ipyplot.plot_images(images, labels, img_width=425, force_b64=True)

[Image: grid of the top matching image for each translated query]
