Skip to content

Add export methods for PIL and popular array libraries to the ScreenShot object#480

Merged
BoboTiG merged 26 commits into
BoboTiG:mainfrom
jholveck:feat-array-library-methods
Jun 30, 2026
Merged

Add export methods for PIL and popular array libraries to the ScreenShot object#480
BoboTiG merged 26 commits into
BoboTiG:mainfrom
jholveck:feat-array-library-methods

Conversation

@jholveck

@jholveck jholveck commented Feb 24, 2026

Copy link
Copy Markdown
Contributor

Add mss.ScreenShot.to_pil, to_numpy, to_torch, and to_tensorflow methods to mss.ScreenShot.

This change enhances integration with popular scientific and AI frameworks, simplifying image export, and improving documentation and examples. The most significant changes are the introduction of new export methods for ScreenShot objects, especially .to_pil() and .to_numpy(), across demos and documentation.

This gives users a set of tested and high-performance methods to easily export the pixel data to their preferred framework for processing. We can also update these methods as we improve MSS or as the frameworks add capabilities that we can use, allowing users to automatically gain these advantages without changing their code.

The documentation, demos, and examples are all changed to use the new methods. This simplified a lot of them; it's probably worth revisiting the examples page, possibly removing some of the export examples that are now just one-liners, or expanding on them a little to show usage beyond just the export step.

A new section is added to the usage documentation page, walking through the basic ideas of grabbing a screenshot and exporting the pixel data to another framework.

New tests are added. The CI/CD GitHub workflow will now install PyTorch and TensorFlow when available to test the relevant methods, but these are not added to the mss[tests] optional dependency group.

As an aside: this is the first MSS code change that adds some explicit GPU support. The new to_torch method has a device= parameter, and the new to_tensorflow method will respect the current TensorFlow device placement policy. (This asymmetry is because the two frameworks have very different traditions about device placement.) The change doesn't involve any GPU-specific code, and the initial screenshot's pixel data is still transferred to a CPU buffer, but we now provide a simple way for users to send it back to the GPU for further processing.

Changes proposed in this PR

Fixes #478

  • Tests added/updated
  • Documentation updated
  • Changelog entry added
  • ./check.sh passed

Currently, this shares memory with the .raw bytearray in some cases.
We should consider whether we want to change that to always copy, or
just wait until we have the full shared buffer support.

This implementation always does the array manipulation on the CPU.
For PyTorch or TensorFlow, the user is likely to want to do the array
manipulation on the GPU for performance.  This is very easy to
implement.
@BoboTiG

BoboTiG commented Feb 24, 2026

Copy link
Copy Markdown
Owner

Python 3.9 will be dropped in the next major release, so no need to take care of test failures.

I think we can do a release without this patch, right?

@jholveck

Copy link
Copy Markdown
Contributor Author

Certainly!

Python 3.9 doesn't define TypeAlias, and it's not required in this
case, so we can just remove it.
@jholveck

Copy link
Copy Markdown
Contributor Author

We can still wait to merge this, but the fix for Python 3.9 was simple, so I went ahead and committed it.

@BoboTiG

BoboTiG commented Apr 25, 2026

Copy link
Copy Markdown
Owner

You can revert the Python 3.9 fix commit ;)

@jholveck

jholveck commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

I've updated this a bit, but should probably review it before committing. It's been a while!

jholveck added 2 commits June 24, 2026 14:49
The reverted change was from a temporary test, but I accidentally
included it in the commit.  It's not relevant to the issue at hand.
@jholveck

Copy link
Copy Markdown
Contributor Author

This is on hold pending a resolution to #536, which will affect how we should document these methods and their sharing semantics.

jholveck added 4 commits June 24, 2026 17:42
The docstrings weren't all in Sphinx format, and could use a little
clarification in places.

TensorFlow itself allows NumPy dtypes and DataType enums, so expand
to_tensorflow to accept those too.
The main point of this commit is to be explicit about the memory-sharing
semantics: mostly, that we don't guarantee memory sharing one way or
another.  More to the point, that if they modify the pixel data in one
of the exported objects, it may or may not affect the other exported
objects.

Memory sharing isn't expected to affect most users, since they
are unlikely to access the pixel data with two different exported
objects, but it's best to be explicit.

To have a good place to put the memory-sharing discussion, I added a new
section to the usage.rst file, listing the ways to access the pixel
data.  This is also where I documented the NumPy array interface
protocol, which previously didn't show up in our Sphinx docs.  (It might
have been in the docs for 10.1, which didn't use autodoc.)

I also put a discussion of the alpha channel in this new section: one
thing I keep forgetting is to delete the alpha channel when using
Matplotlib, and wondering why it's blank.

TODO: We probably should document `grab` in the usage.rst file for the
new section to make sense.

In the process, I've also added intersphinx links to libraries like PIL
and NumPy, so that our Sphinx docs will link to them.  While checking
that, I fixed a few other references (like referring to bytes with
:py:type:, when it's indexed under :py:class:).
@jholveck

Copy link
Copy Markdown
Contributor Author

My last commit added a lot to the docs. There's a couple more things I want to add to the docs before closing this, even if they're somewhat out of scope for this PR.

@BoboTiG

BoboTiG commented Jun 26, 2026

Copy link
Copy Markdown
Owner

Great! The documentation is very very good since you both started to work on the project with @halldorfannar 🤟

Go ahead for out-of-scope commits, I'll merge when you're ready.

jholveck added 10 commits June 26, 2026 06:21
Doc fixes.  Speed up converting to a NumPy array in RGB order.  That
sort of thing.
While I was writing this, I realized something else.  We may want to
change the to_torch and to_tensorflow methods to accept device and
stream arguments, and do the channel and layout shuffling on the GPU
instead of the CPU.  This wouldn't require us to natively support CUDA
or anything like that; we'd just be using PyTorch / TensorFlow's
existing CUDA support, which is easy.

This demo still (as before) shows the GPU-focused way to do the
channel and layout shuffling, presented as an alternative to using
to_torch.  A quick test shows about 107 ms / frame (to_torch)
vs. about 62 ms / frame (GPU-focused routine), and that's including
the CNN itself too.  So there's a very significant benefit there.
(That's not quite a fair comparison, since the GPU-focused routine is
exactly what this program needs, but not so it'd be a big difference.)
While improving performance earlier in this PR, I had introduced a
case where to_torch would fail only for the RGB order.  It also seemed
likely that some cases might fail only if using particular
combinations of channel order and layout, so we just test them all.
This adds a device= flag to to_torch, so the user can specify the
destination device.  (TensorFlow has more automatic device management,
and we don't need to do anything explicit.)

We also do all the channel shuffling to the GPU.  For cases when
channel shuffling or dtype conversion requires a copy, this is immensely
faster than the alternative.

I ran a rough test with the cat detector on my 4k monitor (for
everything, including the CNN inference).  On a GPU, there's a
significant performance boost.  However, this is now slower when run
on the CPU.

* Old code, with GPU: 99 ms/frame
* New code, with GPU: 56 ms/frame
* Old code, no GPU: 3164 ms/frame
* New code, no GPU: 3219 ms/frame
* New code, running to_torch(device="cpu") and and then transferring
  to the GPU for inference: 117 ms/frame

It seems the new to_torch is slower when running on the CPU.  I
suspect (but this is a loose guess) that NumPy, which always did the
channel shuffles in the old code, was faster at that.  It might also
be specific to the extra copy that I think this change introduces in
the RGB CHW conversion path in this, due to PyTorch limitations.

I don't think this is a significant problem: PyTorch and TensorFlow
users are almost all going to be using GPUs.  I will still look at
ways to improve the CPU performance on PyTorch, though.
The PyTorch implementation of the channel shuffling operations on the
CPU is significantly slower (like by 5x or so) than the NumPy
implementation.  (It's still much faster on the GPU, were the PyTorch
devs spend their focus, and also have much faster RAM.)

So, if the user requests a CPU tensor, we now do the channel shuffling
work in NumPy.
It's fragile, harder to manage lifetimes, and doesn't win us
measurable gains.
@jholveck

Copy link
Copy Markdown
Contributor Author

Well, that test was weird. Glad you caught it.

All the stages seem to have completed in seconds. So I think this was a transient issue in GitHub's orchestration, not a problem with MSS. Orchestrating CI/CD on macOS is hard.

If it recurs, we can look more at it. But I'm not concerned yet.

jholveck added 3 commits June 29, 2026 18:47
TensorFlow is designed around graphs of ops for a DNN, not eager
channel shuffling and similar operations.  It's much faster to do the
channel shuffles in NumPy, although we still do the dtype conversion
in TensorFlow.
@jholveck

Copy link
Copy Markdown
Contributor Author

I think this is now feature-complete, and ready for @BoboTiG and @halldorfannar 's reviews. Sorry it's such a long PR!

# manipulation library called PIL, or Pillow, that can do that. Let's transfer the raw pixels in
# the ScreenShot object into a PIL Image.
original_image = Image.frombuffer("RGB", screenshot.size, screenshot.bgra, "raw", "BGRX", 0, 1)
original_image = screenshot.to_pil("RGB")

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's so much cleaner that way!

Comment thread docs/source/usage.rst Outdated
Comment thread src/tests/third_party/array_frameworks/test_numpy_method.py
Comment thread src/tests/third_party/array_frameworks/test_tensorflow_method.py
Comment thread src/tests/third_party/array_frameworks/test_torch_method.py
As suggested by @BoboTiG

Also a couple of other tiny improvements in that area I noticed.
@jholveck

Copy link
Copy Markdown
Contributor Author

It seems that one of the macOS tests is stuck again.

Also, I previously said the stages had completed, but was mistaken: this is hanging during PyTest. If the CI/CD logs are complete (that is, not buffering several lines at a time), then it seems like it's now executing src/tests/test_issue_220.py::test_regression.

I'm not sure if this is a change in macOS, Python, or what. It could conceivably be related to my code, but I'm not sure how.

The default is six hours for the whole job.

We've been seeing PyTest freezing during some macOS runs.  Not sure
what's up, but at least we can not wait for a six hour timeout.

Most tests seem to take between two and three minutes, so this seems
like quite a generous timeout.
@BoboTiG

BoboTiG commented Jun 30, 2026

Copy link
Copy Markdown
Owner

This test stressing tkinter, and the OS being macOS, we are not yet ready for all potential suprprises we could hit. Lets see if it happens on a regular basis, for now if retrying manually on timout fixes it, that's good.

@jholveck

jholveck commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

The original issue, #220, was Linux-specific. You also tracked it down to an issue specific to Xlib error handling. Error handling is one big reason I wanted to migrate to XCB: it has a model that's much easier to work with when you have multiple libraries in a single program, let alone multiple threads.

I'll take a bit of a closer look at the issue and see if maybe it's making assumptions about things like event ordering that tkinter doesn't actually promise, and so has a race condition that just didn't happen to be a problem until recently. But if I don't find a way to fix the hang, I also wouldn't feel bad about limiting that test to just Linux.

Edit: if I do pursue changes related to this, it'll be in a separate PR. I just put the timeout in this PR so it didn't keep bugging us as we close it up.

@halldorfannar

Copy link
Copy Markdown
Contributor

Looks good to me!

@jholveck jholveck marked this pull request as ready for review June 30, 2026 19:31
@jholveck

Copy link
Copy Markdown
Contributor Author

Ok! I think there's always more to improve, but I feel like it's at a good point to merge.

@BoboTiG BoboTiG merged commit 5983317 into BoboTiG:main Jun 30, 2026
18 checks passed
@jholveck jholveck deleted the feat-array-library-methods branch July 1, 2026 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add .to_pil, etc methods to ScreenShot

3 participants