David Mezzetti for NeuML

Posted on • Updated on • Originally published at neuml.hashnode.dev

Generate image captions and detect objects

This article is part of a tutorial series on txtai, an AI-powered semantic search platform.

txtai, as the name implies, works with text and AI. But that doesn't mean it can't work with other types of content. For example, an image can be described with words, and that description can then be used to compare the image to a query or to other documents. This notebook shows how images and text can be embedded into the same space to generate image captions and detect objects.

Install dependencies

Install txtai and all dependencies. We will install the pipeline optional extras package, along with ipyplot for displaying images.

pip install ipyplot txtai[pipeline]

# Get test data
wget -N https://github.com/neuml/txtai/releases/download/v3.5.0/tests.tar.gz
tar -xvzf tests.tar.gz

Create a captions instance

The captions pipeline takes an image or list of images and generates captions. This pipeline works using a combination of an image encoder model and a text model.

from txtai.pipeline import Caption

# Create caption pipeline
caption = Caption()

Generate captions

The example below shows how to generate captions. A list of images is read from a directory, passed to a caption model, and text descriptions are returned.

import glob
import ipyplot

from PIL import Image

# Get list of images
images = glob.glob('txtai/*.jpg')

# Generate captions
captions = caption(images)

# Show image/caption pairs
ipyplot.plot_images([Image.open(image) for image in images], captions, img_width=425, force_b64=True)

Reviewing the captions, they are all generally in the right ballpark but far from perfect. The default model does a decent job, but a more robust model would be needed for a production image captioning deployment.
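Since captions turn images into text, even a simple keyword match lets you search images by their descriptions. Here is a minimal sketch; the filenames and captions below are hypothetical stand-ins for real pipeline output.

```python
# Hypothetical filename -> caption pairs; real values would come from
# the caption pipeline above
captions = {
    "txtai/books.jpg": "a stack of books on a table",
    "txtai/buildings.jpg": "a view of a city skyline",
    "txtai/sign.jpg": "a red stop sign on the side of a road",
}

def search(captions, keyword):
    # Return filenames whose caption mentions the keyword
    return [image for image, text in captions.items()
            if keyword.lower() in text.lower()]

print(search(captions, "books"))
```

This is only keyword matching; embedding the captions with txtai would enable full semantic search over the images.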

Create an objects instance

The objects pipeline takes an image or list of images and generates a list of detected objects. This pipeline works using an object detection model.

from txtai.pipeline import Objects

# Create objects pipeline
objects = Objects()

Detect objects

The example below shows how to detect objects. A list of images is read from a directory, passed to an object detection model, and detected objects are returned.

import glob
import ipyplot

from PIL import Image

# Get list of images
images = glob.glob('txtai/*.jpg')

# Detect objects
detected = objects(images)

# Show image/objects pairs
ipyplot.plot_images([Image.open(image) for image in images], detected, img_width=425, force_b64=True)

Reviewing the detected objects, once again they are all generally in the right ballpark but far from perfect.

This model, or a larger one, may work well for specific use cases where it has high accuracy. For example, the results could be filtered to only accept certain types of objects that have proven to be detected reliably.
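The filtering idea above can be sketched in plain Python. The detections below are hypothetical (label, score) pairs standing in for real model output, and the allowed labels and threshold are arbitrary choices for illustration.

```python
# Hypothetical detections for one image: (label, score) pairs as an
# object detection model might return
detections = [("cat", 0.92), ("dog", 0.40), ("chair", 0.85), ("tv", 0.30)]

# Only accept object types known to be detected reliably, above a
# minimum confidence score
ALLOWED = {"cat", "chair", "person"}
THRESHOLD = 0.5

filtered = [(label, score) for label, score in detections
            if label in ALLOWED and score >= THRESHOLD]

print(filtered)
```

Tuning the allowed labels and threshold against a labeled sample of your own images is the simplest way to pick values that hold up in practice.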

Wrapping up

This notebook introduced image captions and object detection. While the default models for both tasks aren't where we'd like them to be, they provide a good baseline to build on. For certain, targeted use cases where the models excel, they can be used now. This is a fast-evolving area and it is fully expected these models will improve!
