DEV Community

Rerun

Posted on

How to Visualise MediaPipe’s Human Pose Tracking in 2D and 3D with Rerun


Human pose tracking is a computer vision task that focuses on identifying key body locations, analyzing posture, and categorizing movements. At the heart of this technology is a pre-trained machine-learning model that processes the visual input and locates body landmarks in both image coordinates and 3D world coordinates. Applications include human-computer interaction, sports analysis, gaming, virtual reality, augmented reality, and health.

In this example, the MediaPipe Pose Landmark Detection solution is used to detect and track human pose landmarks and to produce segmentation masks for people. Rerun is used to visualize the output of the MediaPipe solution over time, which makes its behavior easy to analyze.


Logging and visualizing with Rerun

The visualizations in this example were created with the following Rerun code.


Timelines

For each processed video frame, all data sent to Rerun is associated with two timelines: time (the frame's timestamp in seconds) and frame_idx (the frame's index).

rr.set_time_seconds("time", bgr_frame.time)
rr.set_time_sequence("frame_idx", bgr_frame.idx)
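The bgr_frame object used above is not defined in this excerpt; it carries a frame index and a timestamp. As a minimal sketch, assuming a constant frame rate, a frame record and a generator that fills in both fields could look like this (VideoFrame and frames_from_fps are hypothetical names, not part of Rerun or MediaPipe):

```python
from dataclasses import dataclass


@dataclass
class VideoFrame:
    """Hypothetical stand-in for the frame object used in the snippet above."""
    data: object  # pixel buffer (e.g. a NumPy array); unused in this sketch
    time: float   # presentation time in seconds -> "time" timeline
    idx: int      # zero-based frame counter -> "frame_idx" timeline


def frames_from_fps(buffers, fps):
    """Attach an index and a timestamp to each buffer, assuming a constant frame rate."""
    for idx, data in enumerate(buffers):
        yield VideoFrame(data=data, time=idx / fps, idx=idx)


frames = list(frames_from_fps(["f0", "f1", "f2"], fps=25.0))
for f in frames:
    print(f.idx, f.time)
```

With a constant frame rate the two timelines carry the same information in different units; for a video with a variable frame rate, using the timestamps reported by the decoder would be the safer choice.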

Video

The input video is logged as a sequence of Image archetypes to the video/rgb entity, with each frame JPEG-compressed at quality 75.

rr.log(
    "video/rgb",
    rr.Image(rgb).compress(jpeg_quality=75)
)
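The rgb array above is derived from the BGR frame: OpenCV decodes video as BGR, while MediaPipe's Pose solution expects RGB input. The usual OpenCV route is cv2.cvtColor with cv2.COLOR_BGR2RGB; the minimal sketch below swaps in plain channel reversal with NumPy instead, assuming an H x W x 3 uint8 frame (bgr_to_rgb is a hypothetical helper):

```python
import numpy as np


def bgr_to_rgb(bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (BGR -> RGB)."""
    if bgr.ndim != 3 or bgr.shape[2] != 3:
        raise ValueError(f"expected an H x W x 3 image, got shape {bgr.shape}")
    # bgr[..., ::-1] is a strided view; ascontiguousarray returns a compact copy
    return np.ascontiguousarray(bgr[..., ::-1])


bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # pure blue in BGR channel order
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])
```

The resulting array is what gets passed to rr.Image(rgb) before compression.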

Segmentation mask

The segmentation result is logged through a combination of two archetypes. The segmentation image itself is logged as a SegmentationImage, which stores a class id for each pixel. The label and color of each class are determined by the AnnotationContext, which is logged with timeless=True because it applies to the whole sequence.

Label mapping

rr.log(
    "video/mask",
    rr.AnnotationContext(
        [
            rr.AnnotationInfo(id=0, label="Background"),
            rr.AnnotationInfo(id=1, label="Person", color=(0, 0, 0)),
        ]
    ),
    timeless=True,
)
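Conceptually, each pixel's class id is resolved through the AnnotationContext to a label and a color. The toy NumPy sketch below models that lookup with a palette table; colorize_mask and the background color are illustrative assumptions, not how the Rerun Viewer renders internally:

```python
import numpy as np

# Illustrative palette mirroring the label mapping above: id -> (label, RGB color).
# The background color is an arbitrary choice made for this sketch.
ANNOTATIONS = {
    0: ("Background", (255, 255, 255)),
    1: ("Person", (0, 0, 0)),
}


def colorize_mask(class_ids: np.ndarray) -> np.ndarray:
    """Map an H x W array of class ids to an H x W x 3 RGB image via a lookup table."""
    lut = np.zeros((max(ANNOTATIONS) + 1, 3), dtype=np.uint8)
    for class_id, (_label, color) in ANNOTATIONS.items():
        lut[class_id] = color
    return lut[class_ids]  # fancy indexing: one table lookup per pixel


mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
colored = colorize_mask(mask)
print(colored[0, 1])
```

Because the mapping is logged once with timeless=True, each frame only needs to carry the compact id image.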

Segmentation image


rr.log(
    "video/mask",
    rr.SegmentationImage(segmentation_mask.astype(np.uint8))
)
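MediaPipe's Pose solution documents its segmentation mask as per-pixel confidences in [0.0, 1.0]. If segmentation_mask holds such floats, a plain astype(np.uint8) truncates every value below 1.0 to class 0. A minimal sketch that thresholds first; mask_to_class_ids is a hypothetical helper and the 0.5 cutoff is an assumption, not a value taken from this example:

```python
import numpy as np


def mask_to_class_ids(segmentation_mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a float person-confidence mask into uint8 class ids (0 = Background, 1 = Person)."""
    return (segmentation_mask > threshold).astype(np.uint8)


probs = np.array([[0.1, 0.6], [0.95, 0.4]], dtype=np.float32)
ids = mask_to_class_ids(probs)
print(ids)
print(probs.astype(np.uint8))  # truncation: only values of exactly 1.0 would become 1
```

The thresholded ids line up with the Background (0) and Person (1) entries of the label mapping.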

Body Pose Points

Logging the body pose landmarks involves specifying the connections between the points, extracting the pose landmark positions, and logging them with the Rerun SDK.

The 2D and 3D points are logged through a combination of two archetypes. First, a timeless AnnotationContext containing a ClassDescription is logged; it maps keypoint ids to labels and specifies which keypoints are connected. Defining these connections makes the viewer draw lines between the connected points automatically. MediaPipe provides the POSE_CONNECTIONS variable, which contains the (from, to) landmark index pairs that define these connections. Second, the actual keypoint positions are logged in 2D and 3D as Points2D and Points3D archetypes, respectively.
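To make the effect of keypoint connections concrete, here is a minimal NumPy sketch of the geometry they imply: each (from, to) index pair selects two keypoints and becomes one line segment. The three-point skeleton and its connections are made up for illustration and are not MediaPipe's POSE_CONNECTIONS; connection_segments is a hypothetical helper:

```python
import numpy as np


def connection_segments(points: np.ndarray, connections) -> np.ndarray:
    """Return an (M, 2, D) array of segments, one per (from, to) index pair."""
    # Sort so the output order is deterministic even if connections is a set
    pairs = np.asarray(sorted(connections), dtype=np.intp)
    return np.stack([points[pairs[:, 0]], points[pairs[:, 1]]], axis=1)


# Hypothetical 2D skeleton: 0 = shoulder, 1 = elbow, 2 = wrist
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
connections = frozenset({(0, 1), (1, 2)})
segments = connection_segments(points, connections)
print(segments.shape)
```

Rerun draws these lines itself from the annotation context, but the same segments are handy for measurements, for example limb lengths via np.linalg.norm(segments[:, 1] - segments[:, 0], axis=-1).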

Label mapping and keypoint connections

rr.log(
    "/",
    rr.AnnotationContext(
        rr.ClassDescription(
            info=rr.AnnotationInfo(id=1, label="Person"),
            keypoint_annotations=[rr.AnnotationInfo(id=lm.value, label=lm.name) for lm in mp_pose.PoseLandmark],
            keypoint_connections=mp_pose.POSE_CONNECTIONS,
        )
    ),
    timeless=True,
)
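The keypoint_annotations comprehension turns every member of the PoseLandmark enum into an id/label pair. The sketch below repeats that pattern with a hypothetical three-member IntEnum standing in for mp_pose.PoseLandmark (MediaPipe's real enum has 33 members), and adds a sanity check, as an assumed good practice, that every connection endpoint is a declared keypoint:

```python
from enum import IntEnum


class PoseLandmark(IntEnum):
    """Hypothetical stand-in for a small subset of mp_pose.PoseLandmark."""
    NOSE = 0
    LEFT_EYE_INNER = 1
    LEFT_EYE = 2


CONNECTIONS = frozenset({(0, 1), (1, 2)})  # hypothetical subset, not all of POSE_CONNECTIONS

# Same shape as the keypoint_annotations list above, as plain (id, label) tuples
keypoint_labels = [(lm.value, lm.name) for lm in PoseLandmark]

# Every connection endpoint should refer to a declared keypoint id
declared = {kp_id for kp_id, _ in keypoint_labels}
undeclared = {i for pair in CONNECTIONS for i in pair} - declared
print(keypoint_labels, undeclared)
```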

2D points

The 2D points are drawn over the video frames, making it easy to relate the detected pose to the image.

rr.log(
    "video/pose/points",
    rr.Points2D(landmark_positions_2d, class_ids=1, keypoint_ids=mp_pose.PoseLandmark)
)
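landmark_positions_2d is not defined in this excerpt. MediaPipe reports 2D pose landmarks with x and y normalized to [0.0, 1.0] by the image width and height, so they need scaling to pixels before they line up with the video frame. A minimal sketch, using a hypothetical Landmark namedtuple in place of MediaPipe's landmark objects and a hypothetical to_pixel_positions helper:

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for a MediaPipe normalized landmark (only x and y are used here)
Landmark = namedtuple("Landmark", ["x", "y"])


def to_pixel_positions(landmarks, image_width: int, image_height: int) -> np.ndarray:
    """Scale normalized (x, y) landmarks to an (N, 2) array of pixel coordinates."""
    normalized = np.array([(lm.x, lm.y) for lm in landmarks], dtype=np.float64).reshape(-1, 2)
    return normalized * np.array([image_width, image_height], dtype=np.float64)


landmarks = [Landmark(0.5, 0.25), Landmark(0.0, 1.0)]
positions = to_pixel_positions(landmarks, image_width=640, image_height=480)
print(positions)
```

Pixel coordinates are what let the points overlay the image logged under the same video/ entity path.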

3D points

The 3D points allow the body posture to be viewed as a 3D model, giving a more complete representation of the human pose.

rr.log(
    "person/pose/points",
    rr.Points3D(landmark_positions_3d, class_ids=1, keypoint_ids=mp_pose.PoseLandmark),
)
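The 3D positions come from MediaPipe's world landmarks, which its Pose solution documents as real-world coordinates in meters with the origin at the center between the hips. A minimal sketch that packs them into an (N, 3) array and checks that origin, assuming a hypothetical WorldLandmark namedtuple and synthetic data (indices 23 and 24 are MediaPipe's LEFT_HIP and RIGHT_HIP):

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for a MediaPipe world landmark (x, y, z in meters)
WorldLandmark = namedtuple("WorldLandmark", ["x", "y", "z"])

LEFT_HIP, RIGHT_HIP = 23, 24  # MediaPipe PoseLandmark indices for the hips


def to_world_positions(landmarks) -> np.ndarray:
    """Pack world landmarks into an (N, 3) float array."""
    return np.array([(lm.x, lm.y, lm.z) for lm in landmarks], dtype=np.float64).reshape(-1, 3)


def hip_center(positions: np.ndarray) -> np.ndarray:
    """Midpoint of the two hip landmarks; expected near (0, 0, 0) for world landmarks."""
    return (positions[LEFT_HIP] + positions[RIGHT_HIP]) / 2.0


# Synthetic pose: 33 landmarks at the origin except the hips, placed symmetrically
landmarks = [WorldLandmark(0.0, 0.0, 0.0)] * 33
landmarks[LEFT_HIP] = WorldLandmark(0.1, 0.0, 0.0)
landmarks[RIGHT_HIP] = WorldLandmark(-0.1, 0.0, 0.0)
positions = to_world_positions(landmarks)
print(positions.shape, hip_center(positions))
```

Since these coordinates are metric and body-centered rather than tied to the image, they are presumably why the 3D points live under the separate person/ entity instead of under video/.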

Join us on GitHub

rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

Build time-aware visualizations of multimodal data

Use the Rerun SDK (available for C++, Python and Rust) to log data like images, tensors, point clouds, and text. Logs are streamed to the Rerun Viewer for live visualization or to file for later use.

A short taste

import numpy as np
import rerun as rr  # pip install rerun-sdk

rr.init("rerun_example_app")

rr.connect()  # Connect to a remote viewer
# rr.spawn()  # Spawn a child process with a viewer and connect
# rr.save("recording.rrd")  # Stream all logs to disk

# Associate subsequent data with 42 on the "frame" timeline
rr.set_time_sequence("frame", 42)

# Illustrative data (not part of the original snippet): 10 random points with random colors
positions = np.random.rand(10, 3)
colors = np.random.randint(0, 256, size=(10, 3), dtype=np.uint8)

# Log colored 3D points to the entity at `path/to/points`
rr.log("path/to/points", rr.Points3D(positions, colors=colors))
