Add export methods for PIL and popular array libraries to the ScreenShot object#480
Conversation
Currently, this shares memory with the .raw bytearray in some cases. We should consider whether we want to change that to always copy, or just wait until we have the full shared buffer support. This implementation always does the array manipulation on the CPU. For PyTorch or TensorFlow, the user is likely to want to do the array manipulation on the GPU for performance. This is very easy to implement.
|
Python 3.9 will be dropped in the next major release, so no need to take care of test failures. I think we can do a release without this patch, right? |
|
Certainly! |
Python 3.9 doesn't define TypeAlias, and it's not required in this case, so we can just remove it.
|
We can still wait to merge this, but the fix for Python 3.9 was simple, so I went ahead and committed it. |
|
You can revert the Python 3.9 fix commit ;) |
|
I've updated this a bit, but should probably review it before committing. It's been a while! |
The reverted change was from a temporary test, but I accidentally included it in the commit. It's not relevant to the issue at hand.
|
This is on hold pending a resolution to #536, which will affect how we should document these methods and their sharing semantics. |
The docstrings weren't all in Sphinx format, and could use a little clarification in places. TensorFlow itself allows NumPy dtypes and DataType enums, so expand to_tensorflow to accept those too.
The main point of this commit is to be explicit about the memory-sharing semantics: mostly, that we don't guarantee memory sharing one way or another. More to the point, that if they modify the pixel data in one of the exported objects, it may or may not affect the other exported objects. Memory sharing isn't expected to affect most users, since they are unlikely to access the pixel data with two different exported objects, but it's best to be explicit. To have a good place to put the memory-sharing discussion, I added a new section to the usage.rst file, listing the ways to access the pixel data. This is also where I documented the NumPy array interface protocol, which previously didn't show up in our Sphinx docs. (It might have been in the docs for 10.1, which didn't use autodoc.) I also put a discussion of the alpha channel in this new section: one thing I keep forgetting is to delete the alpha channel when using Matplotlib, and wondering why it's blank. TODO: We probably should document `grab` in the usage.rst file for the new section to make sense. In the process, I've also added intersphinx links to libraries like PIL and NumPy, so that our Sphinx docs will link to them. While checking that, I fixed a few other references (like referring to bytes with :py:type:, when it's indexed under :py:class:).
|
My last commit added a lot to the docs. There's a couple more things I want to add to the docs before closing this, even if they're somewhat out of scope for this PR. |
|
Great! The documentation is very very good since you both started to work on the project with @halldorfannar 🤟 Go ahead for out-of-scope commits, I'll merge when you're ready. |
Doc fixes. Speed up converting to a NumPy array in RGB order. That sort of thing.
While I was writing this, I realized something else. We may want to change the to_torch and to_tensorflow methods to accept device and stream arguments, and do the channel and layout shuffling on the GPU instead of the CPU. This wouldn't require us to natively support CUDA or anything like that; we'd just be using PyTorch / TensorFlow's existing CUDA support, which is easy. This demo still (as before) shows the GPU-focused way to do the channel and layout shuffling, presented as an alternative to using to_torch. A quick test shows about 107 ms / frame (to_torch) vs. about 62 ms / frame (GPU-focused routine), and that's including the CNN itself too. So there's a very significant benefit there. (That's not quite a fair comparison, since the GPU-focused routine is exactly what this program needs, but not so it'd be a big difference.)
While improving performance earlier in this PR, I had introduced a case where to_torch would fail only for the RGB order. It also seemed likely that some cases might fail only if using particular combinations of channel order and layout, so we just test them all.
This adds a device= flag to to_torch, so the user can specify the destination device. (TensorFlow has more automatic device management, and we don't need to do anything explicit.) We also do all the channel shuffling to the GPU. For cases when channel shuffling or dtype conversion requires a copy, this is immensely faster than the alternative. I ran a rough test with the cat detector on my 4k monitor (for everything, including the CNN inference). On a GPU, there's a significant performance boost. However, this is now slower when run on the CPU. * Old code, with GPU: 99 ms/frame * New code, with GPU: 56 ms/frame * Old code, no GPU: 3164 ms/frame * New code, no GPU: 3219 ms/frame * New code, running to_torch(device="cpu") and and then transferring to the GPU for inference: 117 ms/frame It seems the new to_torch is slower when running on the CPU. I suspect (but this is a loose guess) that NumPy, which always did the channel shuffles in the old code, was faster at that. It might also be specific to the extra copy that I think this change introduces in the RGB CHW conversion path in this, due to PyTorch limitations. I don't think this is a significant problem: PyTorch and TensorFlow users are almost all going to be using GPUs. I will still look at ways to improve the CPU performance on PyTorch, though.
The PyTorch implementation of the channel shuffling operations on the CPU is significantly slower (like by 5x or so) than the NumPy implementation. (It's still much faster on the GPU, were the PyTorch devs spend their focus, and also have much faster RAM.) So, if the user requests a CPU tensor, we now do the channel shuffling work in NumPy.
It's fragile, harder to manage lifetimes, and doesn't win us measurable gains.
|
Well, that test was weird. Glad you caught it. All the stages seem to have completed in seconds. So I think this was a transient issue in GitHub's orchestration, not a problem with MSS. Orchestrating CI/CD on macOS is hard. If it recurs, we can look more at it. But I'm not concerned yet. |
TensorFlow is designed around graphs of ops for a DNN, not eager channel shuffling and similar operations. It's much faster to do the channel shuffles in NumPy, although we still do the dtype conversion in TensorFlow.
|
I think this is now feature-complete, and ready for @BoboTiG and @halldorfannar 's reviews. Sorry it's such a long PR! |
| # manipulation library called PIL, or Pillow, that can do that. Let's transfer the raw pixels in | ||
| # the ScreenShot object into a PIL Image. | ||
| original_image = Image.frombuffer("RGB", screenshot.size, screenshot.bgra, "raw", "BGRX", 0, 1) | ||
| original_image = screenshot.to_pil("RGB") |
There was a problem hiding this comment.
That's so much cleaner that way!
As suggested by @BoboTiG Also a couple of other tiny improvements in that area I noticed.
|
It seems that one of the macOS tests is stuck again. Also, I previously said the stages had completed, but was mistaken: this is hanging during PyTest. If the CI/CD logs are complete (that is, not buffering several lines at a time), then it seems like it's now executing I'm not sure if this is a change in macOS, Python, or what. It could conceivably be related to my code, but I'm not sure how. |
The default is six hours for the whole job. We've been seeing PyTest freezing during some macOS runs. Not sure what's up, but at least we can not wait for a six hour timeout. Most tests seem to take between two and three minutes, so this seems like quite a generous timeout.
|
This test stressing tkinter, and the OS being macOS, we are not yet ready for all potential suprprises we could hit. Lets see if it happens on a regular basis, for now if retrying manually on timout fixes it, that's good. |
|
The original issue, #220, was Linux-specific. You also tracked it down to an issue specific to Xlib error handling. Error handling is one big reason I wanted to migrate to XCB: it has a model that's much easier to work with when you have multiple libraries in a single program, let alone multiple threads. I'll take a bit of a closer look at the issue and see if maybe it's making assumptions about things like event ordering that tkinter doesn't actually promise, and so has a race condition that just didn't happen to be a problem until recently. But if I don't find a way to fix the hang, I also wouldn't feel bad about limiting that test to just Linux. Edit: if I do pursue changes related to this, it'll be in a separate PR. I just put the timeout in this PR so it didn't keep bugging us as we close it up. |
|
Looks good to me! |
|
Ok! I think there's always more to improve, but I feel like it's at a good point to merge. |
Add
mss.ScreenShot.to_pil,to_numpy,to_torch, andto_tensorflowmethods tomss.ScreenShot.This change enhances integration with popular scientific and AI frameworks, simplifying image export, and improving documentation and examples. The most significant changes are the introduction of new export methods for
ScreenShotobjects, especially.to_pil()and.to_numpy(), across demos and documentation.This gives users a set of tested and high-performance methods to easily export the pixel data to their preferred framework for processing. We can also update these methods as we improve MSS or as the frameworks add capabilities that we can use, allowing users to automatically gain these advantages without changing their code.
The documentation, demos, and examples are all changed to use the new methods. This simplified a lot of them; it's probably worth revisiting the examples page, possibly removing some of the export examples that are now just one-liners, or expanding on them a little to show usage beyond just the export step.
A new section is added to the usage documentation page, walking through the basic ideas of grabbing a screenshot and exporting the pixel data to another framework.
New tests are added. The CI/CD GitHub workflow will now install PyTorch and TensorFlow when available to test the relevant methods, but these are not added to the
mss[tests]optional dependency group.As an aside: this is the first MSS code change that adds some explicit GPU support. The new
to_torchmethod has adevice=parameter, and the newto_tensorflowmethod will respect the current TensorFlow device placement policy. (This asymmetry is because the two frameworks have very different traditions about device placement.) The change doesn't involve any GPU-specific code, and the initial screenshot's pixel data is still transferred to a CPU buffer, but we now provide a simple way for users to send it back to the GPU for further processing.Changes proposed in this PR
Fixes #478
./check.shpassed