
SAM 2 Is Now Available in FiftyOne!

Author: Prerna Dhareshwar (Machine Learning / Customer Success at Voxel51)

Segment Anything 2 (SAM 2), released on July 29th, 2024, represents a major leap forward in segmentation technology, offering cutting-edge performance on both images and videos. Building on the foundation of the original Segment Anything, which Meta released in April 2023, SAM 2 not only enhances image segmentation but also introduces advanced video capabilities. With SAM 2, users can achieve precise segmentation and tracking in video sequences using simple prompts, such as bounding boxes or points, from a single frame. This enhanced functionality opens up exciting new possibilities for a wide array of video applications.

In this post, you will see how to load SAM 2 models and apply them to both images and videos in FiftyOne.

Using SAM 2 in FiftyOne for Images

FiftyOne makes it easy for AI builders to work with visual data. With SAM 2 in FiftyOne, you can now seamlessly generate segmentation labels and visualize them on your datasets. With just a few simple commands, you can download SAM 2 models from the FiftyOne Model Zoo, a collection of pretrained models, and run inference on your FiftyOne datasets.

To get started, ensure that you have FiftyOne installed:

pip install fiftyone

You also need to install SAM 2 by following the instructions in the segment-anything-2 GitHub repository.
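
The exact steps may change, so defer to the repository’s README, but at the time of writing a local install looks roughly like this:

git clone https://github.com/facebookresearch/segment-anything-2.git
cd segment-anything-2
pip install -e .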

The following code snippet demonstrates how to load a dataset in FiftyOne and provide bounding box prompts to a SAM 2 model to generate segmentations.

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=25, shuffle=True, seed=51
)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="ground_truth",
)

We can now look at our data with the segmentation labels created by SAM 2.

session = fo.launch_app(dataset)


We can see that the predictions of the SAM 2 model, prompted with the bounding box detections, are stored in the segmentations field.
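
If you prefer to inspect the results programmatically rather than in the App, you can, for example, print the predictions on the first sample or count the total number of predicted masks:

# Inspect the SAM 2 predictions on the first sample
print(dataset.first().segmentations)

# Count the total number of predicted instance masks
print(dataset.count("segmentations.detections"))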

You can also prompt with keypoints instead of bounding boxes. To do this, we first filter the quickstart dataset down to images that contain the label person.

 

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")
dataset = dataset.filter_labels("ground_truth", F("label") == "person")

Next, we need keypoints for this dataset, which we can generate with another FiftyOne Model Zoo model.

# Generate some keypoints
model = foz.load_zoo_model("keypoint-rcnn-resnet50-fpn-coco-torch")
dataset.default_skeleton = model.skeleton
dataset.apply_model(model, label_field="gt")  # stores the keypoints in a gt_keypoints field

Let us look at this dataset and the keypoints that were generated.

session = fo.launch_app(dataset)


Now we can run a SAM 2 model on this dataset using the keypoints field gt_keypoints to prompt the model.

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with keypoints
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="gt_keypoints",
)

session = fo.launch_app(dataset)


You can also use SAM 2 to automatically generate masks for the whole image without any prompts!

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=5, shuffle=True, seed=51
)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Automatic segmentation
dataset.apply_model(model, label_field="auto")

session = fo.launch_app(dataset)
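
If the App feels cluttered by the other label fields, you could, for example, restrict the view to just the automatically generated masks; select_fields() keeps the default sample fields plus whichever fields you name:

# Show only the automatically generated masks in the App
session.view = dataset.select_fields("auto")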

Using SAM 2 in FiftyOne for Video

SAM 2’s video segmentation and tracking capabilities make the process of propagating masks from one frame to another seamless. Let’s load a video dataset and only retain the bounding boxes for the first frame.

 

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video", max_samples=2)

# Only retain detections on the first frame of each video
(
    dataset
    .match_frames(F("frame_number") > 1)
    .set_field("frames.detections", None)
    .save()
)

session = fo.launch_app(dataset)


We see that only the first frame retains its annotations. Now we can use these detections as prompts to generate segmentations with SAM 2 on the first frame and propagate them to all frames of the video. It is as simple as calling apply_model on the dataset.

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-video-torch")

# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="frames.detections",  # Can be a Detections or Keypoints field
)

session = fo.launch_app(dataset)

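As a quick sanity check that masks were propagated beyond the first frame, you can compare the total number of frames to the number of frames that received segmentations (assuming the output is stored as frame-level labels in frames.segmentations, as above):

# Total number of frames across the videos
print(dataset.count("frames"))

# Number of frames with propagated SAM 2 masks
print(dataset.count("frames.segmentations"))
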

SAM 2’s segmentation and tracking capabilities in videos are very powerful. In this tutorial we used the sam2_hiera_tiny model, but you can use any of the following models now available in the FiftyOne Model Zoo (you can also list them programmatically, as shown after the lists below):

Image models:

  1. segment-anything-2-hiera-tiny-image-torch
  2. segment-anything-2-hiera-small-image-torch
  3. segment-anything-2-hiera-base-plus-image-torch
  4. segment-anything-2-hiera-large-image-torch

Video models:

  1. segment-anything-2-hiera-tiny-video-torch
  2. segment-anything-2-hiera-small-video-torch
  3. segment-anything-2-hiera-base-plus-video-torch
  4. segment-anything-2-hiera-large-video-torch
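
You can discover these variants (and any that are added later) by listing the zoo models and filtering by name:

import fiftyone.zoo as foz

# List all SAM 2 models currently available in the FiftyOne Model Zoo
sam2_models = [name for name in foz.list_zoo_models() if "segment-anything-2" in name]
print(sam2_models)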

Conclusion & Next Steps

In this tutorial we showed how, with just a few commands, you can download SAM 2 models and run inference on your FiftyOne image or video datasets. If you’d like to learn more, here are a few ways to get started:

  • Join the 3000+ AI builders in the FiftyOne Community Slack. This is the place to ask questions and get answers from fellow developers and scientists working on Visual AI in production.
  • Attend one of our Getting Started Workshops that cover all the topics you need to get up and running with FiftyOne and your datasets and models.
  • Hit up the FiftyOne GitHub repo to find everything you need to use FiftyOne for your Visual AI projects.
