This article is part of a tutorial series on txtai, an AI-powered semantic search platform.
txtai, as the name implies, works with text and AI. But that doesn't mean it can't work with other types of content. For example, an image can be described with words, and that description can be used to compare the image to a query or to other documents. This notebook shows how pipelines that combine vision and text models can generate image captions and detect objects.
Install txtai and all dependencies. Since this notebook uses pipelines, we install the pipeline extras package along with ipyplot for displaying images.
```
pip install ipyplot txtai[pipeline]

# Get test data
wget -N https://github.com/neuml/txtai/releases/download/v3.5.0/tests.tar.gz
tar -xvzf tests.tar.gz
```
The captions pipeline takes an image or list of images and generates captions. This pipeline works using a combination of an image encoder model and a text model.
```python
from txtai.pipeline import Caption

# Create caption pipeline
caption = Caption()
```
The example below shows how to generate captions. A list of images is read from a directory, passed to a caption model, and text descriptions are returned.
```python
import glob

import ipyplot
from PIL import Image

# Get list of images
images = glob.glob('txtai/*jpg')

# Generate captions
captions = caption(images)

# Show image/caption pairs
ipyplot.plot_images([Image.open(image) for image in images], captions, img_width=425, force_b64=True)
```
Reviewing the captions, they are all generally in the right ballpark but far from perfect. The default model does a decent job, but a more robust model would be needed before fully deploying an image captioning system.
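Once images have text captions, they can be compared against text queries like any other document. A real deployment would index the captions with a txtai Embeddings instance; the sketch below uses a simple token-overlap score as a stand-in so it runs without any model downloads, and the file names and captions are made up for illustration.

```python
# Sketch: match images to a text query via their generated captions.
# Token overlap stands in here for a real embeddings similarity score.

def score(query, caption):
    """Fraction of query tokens that appear in the caption."""
    q, c = set(query.lower().split()), set(caption.lower().split())
    return len(q & c) / len(q)

# Hypothetical image -> caption mapping, as produced by the caption pipeline
captions = {
    "beach.jpg": "a sandy beach next to the ocean",
    "dog.jpg": "a dog running through a field",
}

def search(query, captions):
    # Return the image whose caption best matches the query
    return max(captions, key=lambda image: score(query, captions[image]))

print(search("dog in a field", captions))  # dog.jpg
```

Swapping the overlap score for an embeddings index would allow semantic matches where the query and caption share no exact words.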
The objects pipeline takes an image or list of images and generates a list of detected objects. This pipeline works using an object detection model.
```python
from txtai.pipeline import Objects

# Create objects pipeline
objects = Objects()
```
The example below shows how to detect objects. A list of images is read from a directory, passed to an object detection model, and the detected objects are returned.
```python
import glob

import ipyplot
from PIL import Image

# Get list of images
images = glob.glob('txtai/*jpg')

# Detect objects
detected = objects(images)

# Show image/objects pairs
ipyplot.plot_images([Image.open(image) for image in images], detected, img_width=425, force_b64=True)
```
Reviewing the detected objects, once again they are all generally in the right ballpark but far from perfect.
This model, or larger models, may work well for specific use cases where accuracy is high. For example, the results could be filtered to only accept object types that have shown high accuracy.
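The filtering idea above can be sketched in a few lines. This assumes each detection is a (label, score) tuple; the exact output shape depends on how the pipeline is configured, and the trusted label set and threshold here are hypothetical values chosen for illustration.

```python
# Sketch: keep only detections for labels known to be accurate,
# and drop low-confidence results. Labels and threshold are examples.

TRUSTED = {"cat", "dog", "person"}

def filter_objects(detections, trusted=TRUSTED, threshold=0.9):
    """Keep detections whose label is trusted and whose score clears a threshold."""
    return [(label, score) for label, score in detections
            if label in trusted and score >= threshold]

detections = [("cat", 0.98), ("toaster", 0.95), ("dog", 0.42)]
print(filter_objects(detections))  # [('cat', 0.98)]
```

Tuning the trusted set and threshold against a labeled validation set would be the natural next step before relying on this in an application.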
This notebook introduced image captions and object detection. While the default models for both tasks aren't where we'd like them to be, they provide a good baseline to build on. For certain, targeted use cases where the models excel, they can be used now. This is a fast-evolving area and it is fully expected these models will improve!