
Multi-LoRA SFT support FSDP2 #155

Merged
kevssim merged 20 commits into modelscope:main from kevssim:multilora_fsdp on May 11, 2026

Conversation

@kevssim (Collaborator) commented Apr 14, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Add FSDP2 support for Multi-LoRA SFT.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request implements FSDP2 support for MultiLoraTransformersModel by integrating it into the shared strategy and lazy-wrap lifecycle and introducing sharding-aware parameter access helpers. Review feedback identifies critical bugs in the distributed tensor handling: _write_param_tensor may incorrectly double-shard local data, set_state_dict risks shape mismatches when applying global state to local shards, and get_state_dict returns sharded tensors that could lead to corrupt checkpoints. Furthermore, the model's initialization should be refactored to properly use the parent class, and internal imports should be moved to the module level.
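
A minimal sketch of the gather-before-checkpoint pattern the review calls for, assuming FSDP2 represents sharded parameters as `DTensor`s. The helper name `_read_param_tensor` and the standalone `get_state_dict` wiring here are illustrative assumptions, not the PR's actual code; `DTensor.full_tensor()` is real PyTorch API.

```python
import torch
from torch.distributed.tensor import DTensor  # torch.distributed._tensor on older PyTorch


def _read_param_tensor(param: torch.nn.Parameter) -> torch.Tensor:
    """Return the full (unsharded) tensor behind a possibly FSDP2-sharded parameter."""
    data = param.data
    if isinstance(data, DTensor):
        # Collective gather: every rank participates and receives the complete
        # global tensor, so checkpoints never contain local shards.
        return data.full_tensor()
    return data


def get_state_dict(model: torch.nn.Module) -> dict:
    """Checkpoint-safe state dict: gather each sharded tensor before returning it."""
    return {name: _read_param_tensor(p).cpu() for name, p in model.named_parameters()}
```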

Comment threads:
  • src/twinkle/model/multi_lora.py
  • src/twinkle/model/multi_lora.py (outdated)
  • src/twinkle/model/multi_lora.py
  • src/twinkle/model/transformers/multi_lora_transformers.py
  • src/twinkle/model/multi_lora.py (outdated)
@kevssim changed the title from "Multi-LoRA SFT support FSDP2" to "[WIP] Multi-LoRA SFT support FSDP2" on Apr 16, 2026
@xichengpro (Contributor) commented

I'd love to have this feature! Just curious — why was this PR changed to draft? Any other plans in the works?

@kevssim (Collaborator, Author) commented Apr 21, 2026

> I'd love to have this feature! Just curious — why was this PR changed to draft? Any other plans in the works?

Because it's still a work in progress; once finished, it will be merged into the main branch.

@kevssim marked this pull request as ready for review on May 9, 2026 06:22
@kevssim changed the title from "[WIP] Multi-LoRA SFT support FSDP2" to "Multi-LoRA SFT support FSDP2" on May 9, 2026
@kevssim (Collaborator, Author) commented May 9, 2026

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request implements distributed training support for MultiLora, introducing helper methods for tensor sharding and updating model lifecycle methods to handle distributed contexts. The review identified critical issues where local shards are incorrectly processed as global tensors, potentially leading to corrupted weights during sharding and incomplete state dicts. Feedback emphasizes the need to gather tensors before saving or returning them to ensure compatibility with standard loaders and correct distributed behavior.
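
For the write path, a hedged sketch of the fix direction: treat the incoming tensor as the global weight and re-shard it to match the destination before copying, rather than writing it straight into a local shard. The function names mirror the helpers the review mentions, but the bodies are illustrative assumptions; `distribute_tensor` is real PyTorch API.

```python
import torch
from torch.distributed.tensor import DTensor, distribute_tensor


def _write_param_tensor(param: torch.nn.Parameter, global_tensor: torch.Tensor) -> None:
    """Copy a global tensor into a possibly sharded parameter without double-sharding."""
    dst = param.data
    if isinstance(dst, DTensor):
        # Shard the global tensor exactly as the parameter is sharded, then copy
        # shard-to-shard. Copying the full tensor directly into the local shard
        # is the shape-mismatch / double-sharding bug flagged in the review.
        src = distribute_tensor(global_tensor.to(dst.device), dst.device_mesh, dst.placements)
        dst.to_local().copy_(src.to_local())
    else:
        dst.copy_(global_tensor)


def set_state_dict(model: torch.nn.Module, state: dict) -> None:
    """Apply a global (unsharded) state dict to a possibly sharded model."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in state:
                _write_param_tensor(param, state[name])
```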

Comment threads:
  • src/twinkle/model/multi_lora.py (4 threads)
@modelscope deleted 3 comments on May 11, 2026
Comment thread: src/twinkle/model/transformers/multi_lora_transformers.py
@kevssim merged commit c216f7a into modelscope:main on May 11, 2026
1 of 3 checks passed
