bedbad
Vectorized Py Plot Library

I wrote a plotting lib: https://github.com/bedbad/justpyplot

It has 0 loops in the plotting execution path, not even for connecting N input points with lines of a given thickness, color and style.
It also lets you get your plots straight into your NumPy array without extra copies. You can then use and control them with all the flexibility you need, since you control every basic element of the plot: colors, fonts, grids, etc. Want to just overlay a bunch of dynamically rendered figures on your camera feed? Just set the opacities to 0 and put them where you need.
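To make the overlay idea concrete, here is a minimal sketch in plain NumPy of alpha-blending an RGBA patch onto a camera frame. This is not justpyplot's API: the `overlay` function, the `frame` and `patch` arrays and their sizes are all made up for illustration, and the patch is assumed to fit inside the frame.

```python
import numpy as np

def overlay(frame, patch_rgba, top_left):
    """Alpha-blend an RGBA patch onto an RGB frame, in place, at (x, y) = top_left.

    Hypothetical helper for illustration; assumes the patch fits inside the frame.
    """
    x, y = top_left
    h, w = patch_rgba.shape[:2]
    region = frame[y:y + h, x:x + w].astype(np.float32)       # float copy of the target area
    alpha = patch_rgba[..., 3:4].astype(np.float32) / 255.0   # (h, w, 1); 0 means fully transparent
    blended = alpha * patch_rgba[..., :3] + (1.0 - alpha) * region
    frame[y:y + h, x:x + w] = np.rint(blended).astype(frame.dtype)
    return frame

# A gray stand-in for a camera frame and a red patch that is transparent
# everywhere except one opaque pixel.
frame = np.full((10, 12, 3), 100, dtype=np.uint8)
patch = np.zeros((4, 4, 4), dtype=np.uint8)
patch[..., 0] = 255      # red channel
patch[1, 1, 3] = 255     # single opaque pixel
overlay(frame, patch, top_left=(2, 3))
```

Pixels with opacity 0 leave the frame untouched, so only the drawn parts of a figure show up on top of the feed.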

Here's a rough first demo; I'll update it with a more appealing one:

[Image: justpyplot demo]

The Backstory

I had a pretty important and intense use case: visually debugging applications for computer vision and robotics (the thing has actual human users whom it helps with a disability, so it is important for them, and for me as a developer). I have physically meaningful cogs that need to be measured at different depths of a computer vision pipeline, which evaluates and calibrates non-trivial physical things through a camera and acts on them in real time. I need the plots to see how it behaved in the last few milliseconds while I manually interfered with it. These real-time plots, together with the manipulation, give me a good feel for what's off in the system.

The first thing I tried for my use case were the status quo libraries, mostly just matplotlib. I got really frustrated and fed up trying to adapt it. Not for lack of trying: I spent the better half of a week asking how to adapt it to my use cases. The first issue for me: despite being a plotting library, it doesn't let you actually get a plot, if we consider a plot to be first of all an image.

If you go beyond storing images, an image is supposed to be held in the most ubiquitous Python type for it, an array (NumPy array, torch tensor), which is uniquely fit to hold image data. I can't wrap my mind around the fact that matplotlib just violates first principles of system design, such as being clear about the input, the side effects and the output, so as to maximize output per unit of input while controlling the side effects.

You have to bend over backwards to get the bytes of the plot image:

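import io
import numpy as np
# fig is an existing matplotlib Figure; DPI is the resolution it is rendered at
fig.canvas.draw()
io_buf = io.BytesIO()
fig.savefig(io_buf, format='raw', dpi=DPI)
io_buf.seek(0)
# Reinterpret the raw bytes as a (height, width, channels) uint8 array
img_arr = np.reshape(np.frombuffer(io_buf.getvalue(), dtype=np.uint8), (int(fig.bbox.bounds[3]), int(fig.bbox.bounds[2]), -1))
io_buf.close()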

One full run of this for a single plot takes at least several milliseconds, more usually dozens, and sometimes hundreds of milliseconds. All similar ways of doing it involve the same steps: 1) getting the canvas, 2) making sure the figure is drawn, 3) turning off the interactive-mode display of the plot, which works differently on different OSes, and 4) once you finally have the bits you need, copying and molding them into a NumPy ndarray, which is on you.

That's not the only concern with it; there are others, which have popular Reddit posts:
https://www.reddit.com/r/Python/comments/9fb9i3/am_i_the_only_one_who_hates_matplotlib/
https://www.reddit.com/r/Python/comments/u8j6fn/unpopular_opinion_matplotlib_is_a_bad_library/
https://www.reddit.com/r/bioinformatics/comments/t13510/matplotlib_sucks/

Performance

With the fancy features enabled and N=50 points, justpyplot runs in:

avg 382 µs ± 135 µs, max 962 µs

Plotting itself is roughly 2/3 of that, so it may improve soon.
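If you want to collect numbers in the same avg ± std / max format for your own plotting call, a small harness along these lines would do. It's only a sketch: `time_calls` is a name I made up, the lambda is a stand-in workload rather than a justpyplot call, and this is not necessarily how the figures above were measured.

```python
import statistics
import time

def time_calls(fn, repeats=1000):
    """Return (mean, stdev, max) of per-call wall-clock time in microseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.mean(samples), statistics.stdev(samples), max(samples)

# Stand-in workload; replace the lambda with the plotting call you want to time.
mean_us, std_us, max_us = time_calls(lambda: sum(range(1000)), repeats=200)
print(f"avg {mean_us:.0f} µs ± {std_us:.0f} µs, max {max_us:.0f} µs")
```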

The library is a good showcase of how to do things in a vectorized way when you are just starting with ML Python.
For example, the simplest part is the grid: pxdelta is adjusted by the mod of the grid density, and the grid is simply:

img_array[top_left[1]:bottom_right[1]+1:pxdelta, top_left[0]:bottom_right[0]+1,:] = grid_color

img_array[top_left[1]:bottom_right[1]+1, top_left[0]:bottom_right[0]+1:pxdelta,:] = grid_color
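Here is the same grid trick as a self-contained snippet you can run. The `draw_grid` wrapper, the canvas size and the colors are my own choices for illustration, and `top_left` / `bottom_right` are assumed to be (x, y) pixel coordinates, as the indexing in the two lines above suggests.

```python
import numpy as np

def draw_grid(img_array, top_left, bottom_right, pxdelta, grid_color):
    """Draw grid lines every `pxdelta` pixels inside a rectangle, in place, with no loops."""
    x0, y0 = top_left
    x1, y1 = bottom_right
    # Horizontal lines: every pxdelta-th row, across all columns of the rectangle.
    img_array[y0:y1 + 1:pxdelta, x0:x1 + 1, :] = grid_color
    # Vertical lines: all rows of the rectangle, every pxdelta-th column.
    img_array[y0:y1 + 1, x0:x1 + 1:pxdelta, :] = grid_color
    return img_array

canvas = np.zeros((100, 120, 3), dtype=np.uint8)
draw_grid(canvas, top_left=(10, 20), bottom_right=(110, 90),
          pxdelta=10, grid_color=(128, 128, 128))
```

The step in the slice does all the work: one strided assignment paints every horizontal line, another paints every vertical line, and the color tuple is broadcast across the selected pixels.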

Points (just a scatter plot) and line segments (a connected plot) are more advanced; take a look at the code.
To make the point that nothing stops plotting/rendering from being as performant as ML Python, I went fancy and did a fully vectorized thickness parametrization for the line segments connecting all the points. That is a nontrivial teaser for ML/CV Python enthusiasts. Similar problems come up in geometric ML all the time.
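To give a taste of the problem, here is one possible way to rasterize thick connected segments without a Python loop: compute the distance from every pixel to every segment by broadcasting, then threshold at half the thickness. This is a sketch of the general technique, not the code in justpyplot. `draw_polyline` and its parameters are hypothetical, and the approach uses O(H·W·N) memory, so a real implementation would likely restrict itself to bounding boxes around the segments.

```python
import numpy as np

def draw_polyline(img, points, thickness, color):
    """Rasterize a thick polyline through `points` (N, 2) given as (x, y), in place, loop-free."""
    h, w = img.shape[:2]
    pts = np.asarray(points, dtype=np.float64)
    a, b = pts[:-1], pts[1:]                 # segment starts and ends, each (N-1, 2)
    ab = b - a                               # segment direction vectors, (N-1, 2)
    # Pixel coordinates as (H, W, 1, 2) so they broadcast against all segments.
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.stack([xs, ys], axis=-1)[:, :, None, :].astype(np.float64)
    ap = p - a                               # (H, W, N-1, 2)
    # Projection parameter of each pixel onto each segment, clamped to [0, 1].
    denom = np.maximum((ab * ab).sum(-1), 1e-12)        # guards zero-length segments
    t = np.clip((ap * ab).sum(-1) / denom, 0.0, 1.0)    # (H, W, N-1)
    closest = a + t[..., None] * ab          # nearest point on each segment, (H, W, N-1, 2)
    dist = np.linalg.norm(p - closest, axis=-1)         # (H, W, N-1)
    # A pixel is painted if it lies within half the thickness of any segment.
    mask = (dist <= thickness / 2.0).any(axis=-1)       # (H, W)
    img[mask] = color
    return img

canvas = np.zeros((60, 80, 3), dtype=np.uint8)
draw_polyline(canvas, [(10, 10), (40, 30), (70, 10)], thickness=3, color=(0, 255, 0))
```

Clamping the projection parameter is what turns the infinite-line distance into a segment distance, and it also gives rounded caps and joints for free.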

I want to get a feel for two things: how much of a pain matplotlib is for others, and/or how people handle similar use cases.
Thanks for reading!
