feat: Add Motif-Video model and pipelines by waitingcheung · Pull Request #13551 · huggingface/diffusers

waitingcheung · 2026-04-23T05:54:56Z

What does this PR do?

This PR adds support for Motif-Video, a text-to-video (T2V) and image-to-video (I2V) diffusion model from Motif Technologies. The implementation includes the transformer architecture, both pipeline variants, guidance configurations, and comprehensive documentation.

Changes

New Files

Model: src/diffusers/models/transformers/transformer_motif_video.py - MotifVideoTransformer3DModel
Pipelines:
- src/diffusers/pipelines/motif_video/pipeline_motif_video.py - Text-to-Video
- src/diffusers/pipelines/motif_video/pipeline_motif_video_image2video.py - Image-to-Video
Output: src/diffusers/pipelines/motif_video/pipeline_output.py
Tests:
- tests/pipelines/motif_video/test_motif_video.py
- tests/pipelines/motif_video/test_motif_video_image2video.py
Documentation:
- docs/source/en/api/models/motif_video_transformer_3d.md
- docs/source/en/api/pipelines/motif_video.md

Key Features

Architecture: DiT-based transformer with T5Gemma2Encoder for text encoding
Flow Match: Uses FlowMatchEulerDiscreteScheduler
Guiding: Supports ClassifierFreeGuidance, SkipLayerGuidance, and AdaptiveProjectedGuidance
Video Processing: Wan-style VAE for video encoding/decoding
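
For orientation, the two numerical ideas named in the feature list (classifier-free guidance and a flow-matching Euler update) can be sketched in a few lines of plain Python. This is a toy illustration on flat lists of floats, not the pipeline's code: the function names `cfg_combine` and `euler_flow_step`, the sigma values, and the sample numbers are all made up for the example.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: start at the unconditional prediction and
    move `scale` times the (cond - uncond) difference toward the conditional one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def euler_flow_step(
    sample: List[float], velocity: List[float], sigma: float, sigma_next: float
) -> List[float]:
    """One Euler step of a flow-matching sampler: x_next = x + (sigma_next - sigma) * v."""
    dt = sigma_next - sigma
    return [x + dt * v for x, v in zip(sample, velocity)]


# Hypothetical toy values standing in for latents and model predictions.
latents = [1.0, -2.0, 0.5]
v_cond = [0.2, 0.4, -0.6]
v_uncond = [0.0, 0.2, -0.2]

guided = cfg_combine(v_cond, v_uncond, scale=2.0)
latents = euler_flow_step(latents, guided, sigma=1.0, sigma_next=0.5)
print(latents)
```

With `scale=1.0` the guided prediction equals the conditional one, which is a quick sanity check when wiring a guider into a denoising loop.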

Version Requirements

transformers>=5.1.0 - Required for T5Gemma2Encoder (critical bug fix in PR #43633)
The pipeline includes a version check that raises a clear error with upgrade instructions if the transformers version is too old

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

waitingcheung · 2026-04-23T06:06:59Z

@yiyixuxu @asomoza @sayakpaul

Quick ping for visibility. This PR adds Motif-Video (T2V/I2V + new transformer and pipelines).

Would appreciate your feedback, especially on dependency/version constraints:

transformers>=5.1.0 for T5Gemma2Encoder (currently enforced via an assertion with an upgrade message)
compel requiring transformers<5, which may conflict with diffusers usage

This is currently blocking some diffusers-side integration, so your input would help.

A working branch for this integration is available here.

…dance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First frame conditioning with vision encoder
  - Same defaults as T2V pipeline

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
  - Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
  - Support custom dimension lists for flexible normalization
- Update AdaptiveProjectedMixGuidance with same parameter

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)
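
To make the "per-frame spatial normalization" idea in the commit message more concrete, here is a rough sketch of projected guidance where the projection is computed separately for each frame. It is an illustration under assumptions, not the PR's implementation: each frame is represented as a flat list of floats (height and width already flattened), the names `project` and `apg_per_frame` are hypothetical, which tensor dimensions "spatial" actually covers in the PR is not stated in this thread, and the rescaling and momentum parts of adaptive projected guidance are left out.

```python
import math
from typing import List, Tuple

Frame = List[float]  # one frame's latent values, flattened over height and width


def project(diff: Frame, ref: Frame) -> Tuple[Frame, Frame]:
    """Split `diff` into components parallel and orthogonal to `ref`."""
    norm = math.sqrt(sum(r * r for r in ref))
    if norm == 0.0:
        # No direction to project onto: treat everything as orthogonal.
        return [0.0] * len(diff), list(diff)
    unit = [r / norm for r in ref]
    dot = sum(d * u for d, u in zip(diff, unit))
    parallel = [dot * u for u in unit]
    orthogonal = [d - p for d, p in zip(diff, parallel)]
    return parallel, orthogonal


def apg_per_frame(
    cond: List[Frame], uncond: List[Frame], scale: float, eta: float
) -> List[Frame]:
    """Projected guidance with the projection done independently per frame.

    `eta` down-weights the component of the guidance update that is parallel
    to the conditional prediction; eta=1.0 recovers plain classifier-free guidance.
    """
    guided = []
    for c, u in zip(cond, uncond):
        diff = [ci - ui for ci, ui in zip(c, u)]
        parallel, orthogonal = project(diff, c)
        update = [o + eta * p for o, p in zip(orthogonal, parallel)]
        guided.append([ci + (scale - 1.0) * up for ci, up in zip(c, update)])
    return guided
```

Computing the projection per frame rather than over the whole clip means a frame with a large prediction norm does not dominate the direction used for every other frame, which is one plausible motivation for a per-frame option.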

sayakpaul · 2026-04-23T10:25:30Z

> transformers>=5.1.0 for T5Gemma2Encoder (currently enforced via an assertion with an upgrade message)

I think we can guard the transformers import in the pipeline with something like is_transformers_version("<", "5.1.0")?

compel conflict is fine IMO.

waitingcheung · 2026-04-23T10:29:10Z

> > transformers>=5.1.0 for T5Gemma2Encoder (currently enforced via an assertion with an upgrade message)
>
> I think we can guard the transformers import in the pipeline with something like is_transformers_version("<", "5.1.0")?
>
> compel conflict is fine IMO.

We have something like this at the top of the pipeline code, which tells users to upgrade the transformers package before T5Gemma2Encoder is imported:

# Check transformers version before importing T5Gemma2Encoder
from diffusers.utils import is_transformers_version  # version-comparison helper

if not is_transformers_version(">=", "5.1.0"):
    import transformers

    raise ImportError(
        "MotifVideoPipeline requires transformers>=5.1.0. "
        f"Found: {transformers.__version__}. "
        "Please upgrade transformers: pip install transformers --upgrade"
    )
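
For readers who want to see what such a version gate does without installing diffusers, the same check can be sketched with the standard library only. This is a simplified stand-in, not the helper used in the PR: `parse_release` and `require_min_version` are hypothetical names, and pre-release suffixes such as `rc1` or `.dev0` are ignored, which a real implementation (for example one built on `packaging.version`) would handle properly.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '5.1.0.dev0' -> (5, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_older(found: str, minimum: str) -> bool:
    """True when `found` is strictly older than `minimum` (pre-release tags ignored)."""
    a, b = parse_release(found), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '5.1' and '5.1.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def require_min_version(package: str, found: str, minimum: str) -> None:
    """Raise ImportError with upgrade instructions when `found` is too old."""
    if is_older(found, minimum):
        raise ImportError(
            f"{package}>={minimum} is required, found {found}. "
            f"Please upgrade: pip install --upgrade {package}"
        )


try:
    require_min_version("transformers", "4.57.1", "5.1.0")  # hypothetical installed version
except ImportError as err:
    print(err)
```

Raising `ImportError` at import time, rather than failing later inside the pipeline, keeps the failure close to its cause and gives the user the upgrade command directly.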

sayakpaul · 2026-04-23T10:30:11Z

Then that should cut it.

waitingcheung · 2026-04-28T05:53:25Z

@dg845, @yiyixuxu
I would appreciate your feedback on this PR when you have a moment.

dg845

Thanks for iterating! I think this PR is close to merge. I also have the following comments:

- The following model offloading tests fail for the I2V pipeline: test_pipeline_level_group_offloading_inference, test_sequential_cpu_offload_forward_pass, and test_sequential_offload_forward_pass_twice. I think the reason is that T5Gemma2Encoder's vision_tower currently doesn't support either block-level or leaf-level offloading. So I think it's fine to skip these tests for now.
- For the _keep_in_fp32_modules issue (#13551 (comment)), I believe there are some bugs in the way GGUF interacts with _keep_in_fp32_modules. I will open a separate PR for this (EDIT: opened at #13697).

I'm not sure what to do about the HF Hub CI test failures or the PR documentation build failure, which look like they are both due to the fact that Motif-Video requires transformers>=5.1.0 for T5Gemma2Encoder. @sayakpaul do you have any ideas?
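
The skip suggested above for the three I2V offloading tests can be sketched with the standard `unittest.skip` decorator. The class name `ImageToVideoOffloadTests` and the `run_suite` helper are hypothetical and exist only to make the example self-contained; in the actual test file the methods would presumably override ones inherited from the shared pipeline test mixin.

```python
import unittest

# Reason string taken from the review comment above.
OFFLOAD_SKIP_REASON = (
    "T5Gemma2Encoder's vision_tower does not yet support "
    "block-level or leaf-level offloading"
)


class ImageToVideoOffloadTests(unittest.TestCase):
    """Hypothetical stand-in for the I2V pipeline test class."""

    @unittest.skip(OFFLOAD_SKIP_REASON)
    def test_pipeline_level_group_offloading_inference(self):
        self.fail("not reached while the test is skipped")

    @unittest.skip(OFFLOAD_SKIP_REASON)
    def test_sequential_cpu_offload_forward_pass(self):
        self.fail("not reached while the test is skipped")

    @unittest.skip(OFFLOAD_SKIP_REASON)
    def test_sequential_offload_forward_pass_twice(self):
        self.fail("not reached while the test is skipped")


def run_suite() -> unittest.TestResult:
    """Run the class above and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ImageToVideoOffloadTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    for test, reason in outcome.skipped:
        print(f"skipped {test.id()}: {reason}")
```

Keeping the reason in one shared constant makes it easy to find and remove all three skips once the encoder supports offloading.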

waitingcheung · 2026-05-08T07:49:14Z

@dg845
Thanks for the detailed review and suggestions!
I’ve now skipped the failing model offloading tests for the I2V pipeline as suggested.

@sayakpaul
For the transformers>=5.1.0 compatibility issue, I’ve followed up both on the upstream PR and with the repo author regarding the Transformers 5 integration for Compel:
damian0815/compel#129

I’m also open to other suggestions or approaches here if there’s a better way to handle the HF Hub CI and docs build failures in the meantime.

github-actions bot added the documentation (Improvements or additions to documentation), models, tests, utils, pipelines, guiders, and size/L (PR with diff > 200 LOC) labels on Apr 23, 2026

waitingcheung changed the title ~~Add Motif Video model and pipelines~~ Add Motif-Video model and pipelines Apr 23, 2026

waitingcheung marked this pull request as ready for review April 23, 2026 06:07

waitingcheung changed the title ~~Add Motif-Video model and pipelines~~ feat: Add Motif-Video model and pipelines Apr 23, 2026

waitingcheung force-pushed the feat/motif-video branch from cd20ffc to 81cce23 Compare April 23, 2026 07:08

github-actions bot added the single-file and size/L (PR with diff > 200 LOC) labels and removed the size/L (PR with diff > 200 LOC) label on Apr 23, 2026

sayakpaul requested review from dg845 and yiyixuxu April 23, 2026 10:25

Merge branch 'main' into feat/motif-video

44045b2

github-actions bot added the size/L (PR with diff > 200 LOC) label and removed the size/L (PR with diff > 200 LOC) label on Apr 27, 2026

Merge branch 'main' into feat/motif-video

127810b

github-actions bot added the size/L (PR with diff > 200 LOC) label and removed the size/L (PR with diff > 200 LOC) label on Apr 28, 2026

waitingcheung and others added 2 commits April 29, 2026 02:09

Merge branch 'main' into feat/motif-video

c2f1a14

Merge branch 'feat/motif-video' of github.com:waitingcheung/diffusers into feat/motif-video

e3230cc

waitingcheung added 6 commits May 7, 2026 13:08

Fix test_cpu_offload_forward_pass_twice

b0dbfad

Merge branch 'feat/motif-video' of github.com:waitingcheung/diffusers into feat/motif-video

576c22e

Merge branch 'main' into feat/motif-video

9de497f

Merge branch 'main' into feat/motif-video

175a05d

Merge branch 'main' into feat/motif-video

ebed9ac

Merge branch 'main' into feat/motif-video

ab4273e

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video.py Outdated

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video.py Outdated

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video.py Outdated

waitingcheung and others added 3 commits May 8, 2026 15:44

Update tests/pipelines/motif_video/test_motif_video.py

3ee4218

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

Update tests/pipelines/motif_video/test_motif_video.py

754a547

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

Update tests/pipelines/motif_video/test_motif_video.py

fe938cb

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video_image2video.py Outdated

Update tests/pipelines/motif_video/test_motif_video_image2video.py

0d11cc4

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video_image2video.py Outdated

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video_image2video.py

waitingcheung and others added 4 commits May 8, 2026 06:56

Address test_attention_slicing_forward_pass comment

0fcbd64

Merge branch 'feat/motif-video' of github.com:waitingcheung/diffusers into feat/motif-video

f3325dd

Update tests/pipelines/motif_video/test_motif_video_image2video.py

b33ec8e

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

Update tests/pipelines/motif_video/test_motif_video_image2video.py

65c9dc6

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video_image2video.py

dg845 reviewed May 8, 2026

Comment thread tests/pipelines/motif_video/test_motif_video_image2video.py Outdated

dg845 reviewed May 8, 2026

waitingcheung and others added 4 commits May 8, 2026 16:35

Update tests/pipelines/motif_video/test_motif_video_image2video.py

b9de465

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

Skip I2V test cases

9c38b3d

Merge branch 'feat/motif-video' of github.com:waitingcheung/diffusers into feat/motif-video

0e89d56

Fix style and quality

f2aab3c

dg845 mentioned this pull request May 8, 2026

Fix GGUF to Work Better with modules_to_not_convert / keep_in_fp32_modules #13697

Open

yiyixuxu mentioned this pull request May 8, 2026

[ernie-image] use concrete Mistral3Model / Ministral3ForCausalLM types #13687

Open

6 tasks

waitingcheung wants to merge 88 commits into huggingface:main from waitingcheung:feat/motif-video

4 participants
