feat: Add Motif-Video model and pipelines#13551
waitingcheung wants to merge 88 commits into huggingface:main
Quick ping for visibility. This PR adds Motif-Video (T2V/I2V + new transformer and pipelines). Would appreciate your feedback, especially on dependency/version constraints:
This is currently blocking some diffusers-side integration, so your input would help. A working branch for this integration is available here.
…dance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First frame conditioning with vision encoder
  - Same defaults as T2V pipeline

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
  - Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
  - Support custom dimension lists for flexible normalization
- Update AdaptiveProjectedMixGuidance with same parameter

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)
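The commit message describes `normalization_dims` only in words, so here is a minimal sketch of how such a parameter might be resolved into concrete reduction dimensions. The helper name `resolve_normalization_dims`, the assumed `(batch, channels, frames, height, width)` layout, the choice of height and width as the "spatial" dimensions, and the all-non-batch default are all assumptions for illustration, not the PR's actual implementation.

```python
from typing import List, Optional, Sequence, Union


def resolve_normalization_dims(
    normalization_dims: Optional[Union[str, Sequence[int]]],
    ndim: int,
) -> List[int]:
    """Hypothetical helper: turn a normalization_dims setting into a list of axes."""
    if normalization_dims is None:
        # Assumed default: reduce over every non-batch dimension.
        return list(range(1, ndim))

    if isinstance(normalization_dims, str):
        if normalization_dims != "spatial":
            raise ValueError(f"Unknown normalization_dims: {normalization_dims!r}")
        if ndim != 5:
            raise ValueError("'spatial' normalization expects a 5D tensor")
        # Assumed layout (B, C, F, H, W): reducing over H and W only
        # yields one norm per frame (and per channel).
        return [3, 4]

    resolved = []
    for dim in normalization_dims:
        if not -ndim <= dim < ndim:
            raise ValueError(f"Dimension {dim} is out of range for a {ndim}D tensor")
        # Map negative indices to their non-negative equivalents.
        resolved.append(dim % ndim)
    return resolved
```

The returned list could then be passed as the `dim` argument of a norm reduction such as `torch.linalg.vector_norm(x, dim=dims, keepdim=True)`.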
cd20ffc to
81cce23
I think we can guard the transformers import in the pipeline with something like
We have something like this at the top of the pipeline code to guide users to upgrade transformers:

    # Check transformers version before importing T5Gemma2Encoder
    if not is_transformers_version(">=", "5.1.0"):
        import transformers

        raise ImportError(
            f"MotifVideoPipeline requires transformers>=5.1.0. "
            f"Found: {transformers.__version__}. "
            "Please upgrade transformers: pip install transformers --upgrade"
        )
Then it will cut it.
… into feat/motif-video
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
… into feat/motif-video
Thanks for iterating! I think this PR is close to merge. I also have the following comments:
- The following model offloading tests fail for the I2V pipeline: test_pipeline_level_group_offloading_inference, test_sequential_cpu_offload_forward_pass, and test_sequential_offload_forward_pass_twice. I think the reason is that T5Gemma2Encoder's vision_tower currently doesn't support either block-level or leaf-level offloading. So I think it's fine to skip these tests for now.
- For the _keep_in_fp32_modules issue (#13551 (comment)), I believe there are some bugs in the way GGUF interacts with _keep_in_fp32_modules. I will open a separate PR for this (EDIT: opened at #13697).
I'm not sure what to do about the HF Hub CI test failures or the PR documentation build failure, which look like they are both due to the fact that Motif-Video requires transformers>=5.1.0 for T5Gemma2Encoder. @sayakpaul do you have any ideas?
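Skipping the three offloading tests mentioned above could look roughly like the sketch below. `OffloadTestsStandIn` is a hypothetical stand-in for whatever shared mixin provides these tests, and the test class name is likewise assumed; only the three test method names come from the comment.

```python
import unittest

# Reason taken from the review comment above.
SKIP_REASON = (
    "T5Gemma2Encoder's vision_tower does not currently support "
    "block-level or leaf-level offloading."
)


class OffloadTestsStandIn:
    """Hypothetical stand-in for a shared mixin that defines the offloading tests."""

    def test_pipeline_level_group_offloading_inference(self):
        self.fail("would exercise group offloading")

    def test_sequential_cpu_offload_forward_pass(self):
        self.fail("would exercise sequential CPU offloading")

    def test_sequential_offload_forward_pass_twice(self):
        self.fail("would exercise sequential offloading twice")


class MotifVideoImage2VideoPipelineFastTests(OffloadTestsStandIn, unittest.TestCase):
    # Overriding each inherited test with a skipped no-op keeps the rest of
    # the inherited suite running while documenting why these are disabled.
    @unittest.skip(SKIP_REASON)
    def test_pipeline_level_group_offloading_inference(self):
        pass

    @unittest.skip(SKIP_REASON)
    def test_sequential_cpu_offload_forward_pass(self):
        pass

    @unittest.skip(SKIP_REASON)
    def test_sequential_offload_forward_pass_twice(self):
        pass
```

Keeping the reason string in one constant makes it easy to find and remove the skips once offloading support lands.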
… into feat/motif-video
@dg845 @sayakpaul I’m also open to other suggestions or approaches here if there’s a better way to handle the HF Hub CI and docs build failures in the meantime.
What does this PR do?
This PR adds support for Motif-Video, a text-to-video (T2V) and image-to-video (I2V) diffusion model from Motif Technologies. The implementation includes the transformer architecture, both pipeline variants, guidance configurations, and comprehensive documentation.
Changes
New Files
- src/diffusers/models/transformers/transformer_motif_video.py - MotifVideoTransformer3DModel
- src/diffusers/pipelines/motif_video/pipeline_motif_video.py - Text-to-Video
- src/diffusers/pipelines/motif_video/pipeline_motif_video_image2video.py - Image-to-Video
- src/diffusers/pipelines/motif_video/pipeline_output.py
- tests/pipelines/motif_video/test_motif_video.py
- tests/pipelines/motif_video/test_motif_video_image2video.py
- docs/source/en/api/models/motif_video_transformer_3d.md
- docs/source/en/api/pipelines/motif_video.md
Key Features
Version Requirements
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.