Export Segment Anything neural network to ONNX: the missing parts

Andrey Germanov on November 15, 2023

Matt Cummins

Thanks so much for sharing this!

kieunHyeong

Thank you! so much!

david • Edited

Hi, I tried to export the ONNX file for the vit_h model by modifying the line:

sam = sam_model_registry["vit_b"](checkpoint="./sam_vit_b_01ec64.pth")

to

sam = sam_model_registry["vit_h"](checkpoint="./checkpoint/sam_vit_h_4b8939.pth")

It then generated 455 files. Some of them are:

[screenshot of the generated files]

And the encoder ONNX file is only about 1 MB (vs. 350 MB for vit_b).

Did I miss changing anything in the script?

Filip • Edited

Encountered the same issue; it turns out it's an ONNX limitation: models over 2 GB are exported like this, with the weights split into external data files alongside the .onnx file. It's still usable, however; I followed the rest of the tutorial and everything worked. As a workaround, I quantized the split model and got a nice 600 MB encoder in ONNX format; as far as I know, the quality loss should be minimal. You can give this a try:

from onnxruntime.quantization import QuantType
from onnxruntime.quantization.quantize import quantize_dynamic

# Path to the encoder ONNX file; all the other block files must be in the same dir!
input_onnx_encoder_path = "vit_h_encoder.onnx"
onnx_model_quantized_path = "vit_h_encoder_quantized.onnx"

quantize_dynamic(
    model_input=input_onnx_encoder_path,  # pass the variable, not a string literal
    model_output=onnx_model_quantized_path,
    optimize_model=True,
    per_channel=False,
    reduce_range=False,
    weight_type=QuantType.QUInt8,
)

Or simply use one of the smaller SAM models.

david

Thank you!

Bald man

I have a problem and I wonder if you can help me solve it?

[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_151' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,8,256}, requested shape:{1,10,8,32}
  File "C:\Users\ASUS\Desktop\avatarget\TinySAM-main\TinySAM-main\onnx.py", line 157, in <module>
    outputs = decoder.run(None,{
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_151' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,8,256}, requested shape:{1,10,8,32}
Andrey Germanov • Edited

Hi, could you provide more context? If you share your code and the input image that resulted in this error, I can say more.

At first glance, it looks like the input tensor for the SAM decoder (the image embeddings returned by the encoder) has an incorrect shape. It should be (1,256,64,64), but I can confirm this only after seeing the code.
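To rule out a shape problem quickly, a minimal sanity check could look like the sketch below (NumPy only; `check_embeddings` is a hypothetical helper, and `embeddings` stands for whatever your encoder returned):

```python
import numpy as np

def check_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Validate the SAM image embeddings before passing them to the decoder."""
    expected = (1, 256, 64, 64)
    if embeddings.shape != expected:
        raise ValueError(
            f"Decoder expects image embeddings of shape {expected}, "
            f"got {embeddings.shape}"
        )
    return embeddings

# Example with a dummy tensor of the correct shape:
dummy = np.zeros((1, 256, 64, 64), dtype=np.float32)
check_embeddings(dummy)  # passes silently
```

Calling this right before `decoder.run()` turns a cryptic Reshape error into a clear message about which tensor is wrong.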

Bald man

Hello, I have solved this problem. May I ask how SAM supports the input of multiple bounding boxes?

Andrey Germanov • Edited

As far as I know, it's impossible now due to SAM model limitations. You can read about some workarounds here: github.com/facebookresearch/segmen...

However, you can partially emulate this by specifying 4 points that belong to the object and are located close to its corners. So, you can specify 8 points to define two boxes. It's not the same as two real bounding boxes, but better than nothing.

Following the sample provided in this article, to detect both the dog and the cat in the image using the top left and bottom right corner points of these objects, you can use the following code:

# ENCODE PROMPT (two boxes, defined as points that belong to the objects)
input_box = np.array([155, 246, 241, 297, 279, 110, 444, 287]).reshape(4,2)
# Labels are defined as for points that belong to the objects (not as box corner points)
input_labels = np.array([1,1,1,1])

onnx_coord = input_box[None, :, :]
onnx_label = input_labels[None, :].astype(np.float32)

coords = deepcopy(onnx_coord).astype(float)
coords[..., 0] = coords[..., 0] * (resized_width / orig_width)
coords[..., 1] = coords[..., 1] * (resized_height / orig_height)

onnx_coord = coords.astype("float32")

# RUN DECODER TO GET MASK
onnx_mask_input = np.zeros((1, 1, 256, 256), dtype=np.float32)
onnx_has_mask_input = np.zeros(1, dtype=np.float32)

decoder = ort.InferenceSession("vit_b_decoder.onnx")
masks,_,_ = decoder.run(None,{
    "image_embeddings": embeddings,
    "point_coords": onnx_coord,
    "point_labels": onnx_label,
    "mask_input": onnx_mask_input,
    "has_mask_input": onnx_has_mask_input,
    "orig_im_size": np.array([orig_height, orig_width], dtype=np.float32)
})

# POSTPROCESS MASK
mask = masks[0][0]
mask = (mask > 0).astype('uint8')*255

# VISUALIZE MASK
img_mask = Image.fromarray(mask, "L")
img_mask

If you run this in the sam_onnx_inference.ipynb notebook, the output should look like this:

[output image: masks for the dog and the cat]

david • Edited

This is great! Thank you!

One question. In the Encode the prompt section, what does the line input_labels = np.array([2,3]) mean when the input is a bounding box? In the official instructions, I didn't see any label required for box input.

Andrey Germanov • Edited

Each (x,y) coordinate should have a label, so it means that the top left corner of the bounding box has label 2 and the bottom right corner has label 3.
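For illustration, here is a minimal sketch of encoding one box as two labeled corner points, using the dog's box coordinates from this article (NumPy only):

```python
import numpy as np

# A single bounding box (x1, y1, x2, y2) -- the dog from the article's example
box = np.array([155, 246, 241, 297])

# Reshape into two (x, y) points: top-left and bottom-right corners,
# then add a batch dimension, as the decoder expects (1, N, 2)
onnx_coord = box.reshape(2, 2)[None, :, :].astype(np.float32)

# Label 2 marks the top-left corner of a box, label 3 the bottom-right corner
onnx_label = np.array([[2, 3]], dtype=np.float32)

print(onnx_coord.shape)  # (1, 2, 2)
print(onnx_label.shape)  # (1, 2)
```

These two arrays then go into the decoder as `point_coords` and `point_labels`, after the same coordinate rescaling shown earlier in the article.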

david

Thanks!

fragakos

Great work, thanks.

Can the automatic mask generator be exported to ONNX though?

Andrey Germanov

Hello, thank you.

The automatic mask generator is not a separate model that can be exported to ONNX. It's a Python class that runs the same model many times for different points on the image and combines the resulting masks.
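As a rough illustration of what that class does internally, the sketch below builds an evenly spaced point grid (similar in spirit to SamAutomaticMaskGenerator's point sampling) and runs a decoder callable once per point. `build_point_grid`, `masks_for_grid`, and `run_decoder` are hypothetical names; `run_decoder` would be a thin wrapper around the `decoder.run()` call from this article:

```python
import numpy as np

def build_point_grid(n_per_side: int, width: int, height: int) -> np.ndarray:
    """Evenly spaced grid of (x, y) prompt points covering the image."""
    offset = 1 / (2 * n_per_side)
    fractions = np.linspace(offset, 1 - offset, n_per_side)
    xs, ys = np.meshgrid(fractions * width, fractions * height)
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)

def masks_for_grid(points: np.ndarray, run_decoder) -> list:
    """Run the ONNX decoder once per point and collect the masks.

    run_decoder is any callable taking (point_coords, point_labels);
    a real implementation would also score, filter, and deduplicate
    the masks, as SamAutomaticMaskGenerator does.
    """
    masks = []
    for point in points:
        coords = point[None, None, :].astype(np.float32)  # shape (1, 1, 2)
        labels = np.ones((1, 1), dtype=np.float32)        # 1 = foreground point
        masks.append(run_decoder(coords, labels))
    return masks

grid = build_point_grid(4, 1024, 1024)  # 16 prompt points
```

The expensive encoder runs only once per image; only the lightweight decoder is invoked per point, which is why this loop is feasible at all.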

fragakos

Is there a guide on how to use that in the best way?

Andrey Germanov

github.com/facebookresearch/segmen...

Also, the source code of the SamAutomaticMaskGenerator class can help you understand exactly how it works: github.com/facebookresearch/segmen...

fragakos

Yes, I saw that, but how is it done using the ONNX files? Maybe you can make a guide on that too?

fragakos

Is it possible to optimize the encoder for GPU?

Abhinav Kumar

I wanted to ask if the SAM model can take tiled images (like OpenSeadragon images). If yes, can you provide some resources or references on how to apply that?
Thanks,

Andrey Germanov • Edited

SAM does not take images directly; it takes tensors of size 1024x1024x3. The image should be converted to a tensor before being passed to the SAM model. I am not familiar with OpenSeadragon images, but if they can be exported to standard PNG or JPG and then converted to tensors, as described in this article, then yes.
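For reference, a sketch of that conversion (assuming Pillow; the helper name `image_to_sam_tensor` is made up, and the normalization constants are the ones from the SAM source code):

```python
import numpy as np
from PIL import Image

def image_to_sam_tensor(img: Image.Image) -> np.ndarray:
    """Resize so the longest side is 1024, normalize, pad to 1024x1024,
    and return a (1, 3, 1024, 1024) float32 tensor for the SAM encoder."""
    scale = 1024 / max(img.size)
    new_w = int(img.width * scale + 0.5)
    new_h = int(img.height * scale + 0.5)
    resized = img.convert("RGB").resize((new_w, new_h))
    arr = np.array(resized).astype(np.float32)
    # Per-channel normalization constants from the SAM source code
    arr = (arr - np.array([123.675, 116.28, 103.53])) / np.array([58.395, 57.12, 57.375])
    arr = arr.transpose(2, 0, 1)  # HWC -> CHW
    # Pad the shorter side with zeros up to 1024
    padded = np.zeros((3, 1024, 1024), dtype=np.float32)
    padded[:, :new_h, :new_w] = arr
    return padded[None, :, :, :]
```

So as long as you can get an OpenSeadragon tile into a PIL image (or any RGB array), this kind of preprocessing is all that is needed before running the encoder.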

fragakos

I think it's time to try the new SAM 2; we need your guidance! <3