WebGPURenderer: Add compute shader bounds check #33186
Conversation
This is not something that should be automatically done for the user, in my opinion. There's workgroup size, dispatch size, and now I see there's a "count" value? I'm actually a bit confused as to how these are supposed to work together. How is a single-dimensional "count" value supposed to work with the higher-dimensional dispatch and workgroup size values?

Edit: Perhaps this is more of a comment on why this "count" value seems to already exist for compute kernels? There already seem to be so many different ways to do the same thing in TSL, and many of them do not seem to interoperate cleanly.
@gkjohnson what do you think would be a better solution? The workgroups logic is not exposed to the basic user either. As that user I'd expect that …
I think better documentation may be the first step - but everything needed to run and length-check a compute shader is already available.
Can you explain what you mean by this? You can set the workgroup size like so:

```js
// set the workgroup size for the shader
const kernel = fn.computeKernel( [ x, y, z ] );
```

The workgroup size is associated with the shader itself via a compute-kernel function annotation, meaning a new shader needs to be generated when changing it, so it makes sense for this to be associated with the TSL `fn` kernel rather than anywhere else. And it can be accessed via the kernel afterward.

One primary issue with setting "count" on the kernel is that it doesn't actually do what it says. The workgroup size multiplied by the dispatch size is what actually dictates the number of threads that get run, not this variable. Looking at the code, it would seem more appropriate to call this variable a "maxThreadCount", since it will (imo confusingly) truncate the number of threads that was actually requested by the user via the dispatch syntax:

```js
renderer.computeAsync( kernel, [ dispatch_x, dispatch_y, dispatch_z ] );
```

Having a thread count associated with the kernel itself does not make sense, because it's common to use a compute kernel for processing multiple different buffers, meaning you want to issue it with different dispatch counts. So if you had set this "maxThreadCount" value for the kernel and then switched the data being processed to a differently sized buffer, it would not work as expected.

Users should be writing their kernels to be resilient to the data sizes they are processing. It's possible to interrogate the dimensions of the buffers you're writing to and guard against writing outside the bounds of the buffer:

```js
const fn = wgslFn( /* wgsl */`
fn compute( globalId: vec3u ) -> void {
let size = textureDimensions( tex );
if ( globalId.x >= size.x || globalId.y >= size.y ) {
return;
}
textureStore( tex, globalId.xy, vec4( 0, 0, 0, 1 ) );
}
` )( { globalId } );
```

Now the kernel is resilient to the size of the buffer being written to, and you can still take advantage of the performance benefits of a more optimally set workgroup size. If "count" were to truncate my compute shader here, it may be doing so in a way that isn't aligned with my function definition.

--

If you're not interested in tuning the workgroup size for your use case, then you can set the workgroup size to "1" and dispatch based directly on the size of your buffer. Then you have no risk of overrunning the bounds of your data if you don't want to add a guard:

```js
const kernel = fn( { data } ).computeKernel( [ 1, 1, 1 ] );
renderer.compute( kernel, [ data.length, 1, 1 ] );
```

I wasn't aware of this "count" value, but based on the fact that it's inappropriately shaped as a single dimension, this seems vestigial from before 2d or 3d compute kernels were able to be run at all (see #31393). In my opinion this should be removed, or at the least rethought significantly.
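To make the arithmetic concrete, here is a minimal sketch of sizing a dispatch from the buffer length (assuming the `fn` and `data` names from the snippets above):

```js
// pick a workgroup size; total threads = workgroup size * dispatch count
const workgroupSize = 64;
const kernel = fn( { data } ).computeKernel( [ workgroupSize, 1, 1 ] );

// round the dispatch up so every element is covered, and rely on
// an in-shader guard (as above) to skip the excess tail threads
const dispatchX = Math.ceil( data.length / workgroupSize );
renderer.compute( kernel, [ dispatchX, 1, 1 ] );
```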
Sorry that the docs are still not clear; I intend to improve that this year.
@gkjohnson yes, this does not apply when using …

I moved the logic to …
This doesn't really address what I'm saying. These are objectively competing mental models of how to run a compute shader in TSL: the dispatch count being associated with the kernel vs being passed during the compute call. Nor is it even a complete implementation, since it has no support for multi-dimensional kernels. I understand the intent is for these to be used separately, but with no warning of any kind when the two paths are used together, this is absolutely going to cause a user bug at some point - either within a single project, or through inconsistent use between a library and user expectations.

There's value in having a well-defined and consistent way to do things. And again, why do we need so many ways of doing the same thing here? Is there actually a reason this feature should be included now? Or is it just "because it's already implemented"?
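As a hypothetical illustration of the failure mode (the `count` setter below is assumed for the sake of the example, not an exact API):

```js
// a library ships a kernel with a baked-in thread cap
const kernel = fn( { data } ).computeKernel( [ 64, 1, 1 ] );
kernel.count = 1024; // hypothetical: caps execution at 1024 threads

// a user later dispatches it for a 4096-element buffer
renderer.compute( kernel, [ Math.ceil( 4096 / 64 ), 1, 1 ] );
// only the first 1024 threads run; the rest are silently skipped
```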
This is more of a question for @sunag, and imo not related to this fix, since I am not adding a property to …
From simple to complex: you can do simple things in a more complex way if you prefer more control, for example like WebGPU does natively, but that's not the idea of TSL, and that's why we have two main approaches here.
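Roughly, the two approaches as they appear in this thread (the count-based form is sketched from the discussion above, not an exact signature):

```js
// simple: a count drives the whole dispatch
const kernelA = fnA( { data } ).compute( data.length );
renderer.compute( kernelA );

// explicit: choose a workgroup size and dispatch it yourself
const kernelB = fnB( { data } ).computeKernel( [ 64, 1, 1 ] );
renderer.compute( kernelB, [ Math.ceil( data.length / 64 ), 1, 1 ] );
```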
Related issue: Closes #33185
Description
See the issue for details; this PR adds the internal WGSL bounds check.
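The generated code isn't reproduced here; as a rough sketch of the guard's shape (illustrative only, not the exact code added by this PR):

```wgsl
// illustrative sketch: early-out for invocations beyond the
// user-provided count, so excess threads never touch the buffer
if ( globalId.x >= count ) {
	return;
}
```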