DEV Community

Cover image for Pictures are a worth a thousand words
David Mezzetti for NeuML

Posted on • Updated on • Originally published at neuml.hashnode.dev

Pictures are a worth a thousand words

One of the hottest🔥 models as of June 2022 is DALL-E mini. There are a number of projects and examples utilizing this model as seen here, here and here. There have even been mainstream news articles covering this model.

This article presents a workflow to build a summary of a webpage and then generate an image of this summary. While there are a number of potential use cases for text to image models, this example is focused on showing the power of workflows and also provides interesting insights into how DALL-E mini "sees" the world.

Install dependencies

Install txtai and all dependencies.

All credit for the DALL-E mini models and code goes to https://github.com/borisdayma/dalle-mini and https://github.com/kuprel/min-dalle.

pip install txtai tika min-dalle ipyplot
Enter fullscreen mode Exit fullscreen mode

Build a DALL-E pipeline

Let's first construct a txtai pipeline that generates images using DALL-E mini.

from min_dalle import MinDalle

from txtai.pipeline import Pipeline

class Dalle(Pipeline):
  def __init__(self):
    self.model = MinDalle(is_mega=False, is_verbose=False)

  def __call__(self, texts, seed, prefix):
    results = []
    for text in texts:
      text = prefix + text
      results.append(self.model.generate_image(text, seed))

    return results
Enter fullscreen mode Exit fullscreen mode

Build DALL-E workflow

Next we'll define a txtai workflow as YAML. This workflow extracts text at a specified URL, builds a summary and then generates an image for the summary text.

This workflow can be run from Python as shown below or as a API service.

from txtai.app import Application

app = Application("""
__main__.Dalle:
summary:
  path: sshleifer/distilbart-cnn-12-6
textractor:
  join: true
  lines: false
  minlength: 100
  paragraphs: true
  sentences: false
workflow:
  draw:
    tasks:
    - action: textractor
      task: url
    - action: summary
      args: [0, 60, 0]
    - action: __main__.Dalle
      args: [1024, "Illustration of "]
""")
Enter fullscreen mode Exit fullscreen mode

Generate webpage summary images

Now that the workflow is up, let's generate some images! The following example generates images for a set of Wikipedia articles. Give this a try for articles, recipes or any other descriptive web page.

It's also fun to generate images for random Wikipedia pages using this url: https://en.wikipedia.org/wiki/Special:Random

import ipyplot

# Build list of Wikipedia article URLs
captions = ["Catholic Church", "Magic Kingdom", "Epcot", "Mountain", "Tundra", "War of 1812", "Chinese Cuisine", "Indian Cuisine", "Italian Cuisine",
            "Romanian cuisine", "Football", "Artificial Intelligence", "Galaxy", "Fjord", "Island"]
urls = [f"https://en.wikipedia.org/wiki/{url.replace(' ', '_')}" for url in captions]

# Run workflow to generate images with DALL-E mini
images = list(app.workflow("draw", urls))

# Plot images
ipyplot.plot_images(images, captions, img_width=256)
Enter fullscreen mode Exit fullscreen mode

Image description

Wrapping up

This article walked through an example on how to build a txtai workflow to generate webpage summary images. DALL-E mini is a fascinating model and it's a lot of fun to "see" how the model works. Give it a try yourself!

Top comments (0)