DEV Community

Furkan Gözükara


Comparison Of Blip2 Captioning Models With 1 Click Windows & RunPod Installer

I recently coded a Gradio app from scratch for the well-known Blip2 captioning models.

1-click auto installers with instructions are posted here: https://www.patreon.com/posts/sota-image-for-2-90744385

That post also has 1-click Windows & RunPod installers, with Gradio interfaces that support batch captioning, for the following vision models: LLaVA (4-bit, 8-bit, 16-bit; 7b, 13b, 34b), Qwen-VL (4-bit, 8-bit, 16-bit), and a Clip_Interrogator Gradio app that supports 115 CLIP vision models in combination with 5 caption models.

All precisions also work on Windows with our special installers.

In my tests, 16-bit mode is the fastest and 8-bit mode is the slowest. 4-bit mode sits in between: slower than 16-bit precision but faster than 8-bit precision.

The full results are below.

Blip2 Models Batch Image Captioning App

The test results are as follows.

During batch processing, images are captioned one at a time, so no images were captioned in parallel.
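The one-image-at-a-time batch processing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the app's actual code: `caption_fn` is a hypothetical placeholder for a function that runs a Blip2 model on a single image, and the helper name `caption_sequentially` is also made up for this example. It finishes captioning each image before starting the next and reports the average seconds/image, the same unit used in the results.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def caption_sequentially(
    image_paths: Iterable[Path],
    caption_fn: Callable[[Path], str],
) -> Tuple[Dict[Path, str], float]:
    """Caption images one at a time; return (captions, average seconds/image).

    `caption_fn` is a hypothetical stand-in for whatever runs the Blip2
    model on one image. It is not part of any specific library.
    """
    captions: Dict[Path, str] = {}
    processed = 0
    start = time.perf_counter()
    for path in image_paths:
        # No parallelism: each image is fully captioned before the next starts.
        captions[path] = caption_fn(path)
        processed += 1
    elapsed = time.perf_counter() - start
    # Average wall-clock time per image; 0.0 if there was nothing to caption.
    seconds_per_image = elapsed / processed if processed else 0.0
    return captions, seconds_per_image


if __name__ == "__main__":
    # Dummy captioner so the sketch runs without downloading any model.
    fake_paths = [Path(f"img_{i}.png") for i in range(3)]
    caps, spi = caption_sequentially(fake_paths, lambda p: f"a photo named {p.stem}")
    print(caps)
    print(f"{spi:.6f} seconds/image")
```

Swapping the dummy lambda for a real model call is where the 16-bit, 8-bit, or 4-bit loading choice would come in; the loop itself stays the same.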

Batch processing speed on RTX A6000:

Salesforce/blip2-opt-6.7b
16-bit precision: 0.32 seconds/image
8-bit precision: 1.7 seconds/image
4-bit precision: 0.65 seconds/image

Salesforce/blip2-flan-t5-xxl
16-bit precision: 0.41 seconds/image
8-bit precision: 1.6 seconds/image
4-bit precision: 0.82 seconds/image

Salesforce/blip2-opt-6.7b-coco
16-bit precision: 0.39 seconds/image
8-bit precision: 2.01 seconds/image
4-bit precision: 0.74 seconds/image
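To compare the nine configurations at a glance, here is a small Python sketch that copies the reported seconds/image numbers, converts them into images per hour for sequential captioning, and checks the precision ordering stated earlier (16-bit fastest, 8-bit slowest). It only does arithmetic on the measured numbers; it does not load or run any model, and the names `SECONDS_PER_IMAGE`, `images_per_hour`, and `precision_ranking` are hypothetical.

```python
from typing import Dict, List

# Seconds per image as reported (RTX A6000, one image at a time).
SECONDS_PER_IMAGE: Dict[str, Dict[str, float]] = {
    "Salesforce/blip2-opt-6.7b": {"16-bit": 0.32, "8-bit": 1.7, "4-bit": 0.65},
    "Salesforce/blip2-flan-t5-xxl": {"16-bit": 0.41, "8-bit": 1.6, "4-bit": 0.82},
    "Salesforce/blip2-opt-6.7b-coco": {"16-bit": 0.39, "8-bit": 2.01, "4-bit": 0.74},
}


def images_per_hour(seconds_per_image: float) -> float:
    """Convert a per-image latency into sequential throughput."""
    return 3600.0 / seconds_per_image


def precision_ranking(speeds: Dict[str, float]) -> List[str]:
    """Return precision names ordered fastest (lowest seconds/image) to slowest."""
    return sorted(speeds, key=speeds.get)


if __name__ == "__main__":
    for model, speeds in SECONDS_PER_IMAGE.items():
        print(model, "fastest to slowest:", precision_ranking(speeds))
        for precision, spi in speeds.items():
            print(f"  {precision}: {images_per_hour(spi):,.0f} images/hour")
```

For example, 0.32 seconds/image works out to about 11,250 images per hour, while 1.7 seconds/image is roughly 2,100 images per hour.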
