[feat, npu] Liger-Kernel is now available for NPU #9227
zheliuyu wants to merge 1 commit into modelscope:main from
Code Review
This pull request updates the NPU support documentation to reflect that Liger-Kernel is now supported. The changes include updating the status in the support matrix and removing Liger-Kernel from the list of unsupported items. A review comment points out a table alignment issue in the support matrix caused by the change in text length and provides a suggestion to restore visual consistency.
| | | PPO | 已支持 | | ||
| | 性能优化 | FA 等融合算子 | 已支持 | | ||
| | | Liger-Kernel | 暂不支持 | | ||
| | | Liger-Kernel | 已支持 | |
The alignment of the table column is broken. To stay aligned with the other rows (like line 370), "已支持" ("supported") should be followed by three spaces instead of one. "已支持" has 3 characters and "暂不支持" ("not yet supported") has 4, and each CJK character occupies two columns in a fixed-width font, so the shorter status needs two extra spaces of padding on top of the usual one.
| | | Liger-Kernel | 已支持 | | |
| | | Liger-Kernel | 已支持 | |
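The padding arithmetic in this comment can be checked with a short Python sketch. It assumes a renderer where wide East Asian characters take two columns; the helper names `display_width` and `pad_cell` are illustrative and not part of the repository.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate column width: wide/fullwidth characters count as 2, others as 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_cell(text: str, target_width: int) -> str:
    """Right-pad a table cell with spaces until it spans target_width columns."""
    return text + " " * max(0, target_width - display_width(text))


old_status = "暂不支持"  # previous cell text, followed by one space in the table
new_status = "已支持"    # new cell text

# The old cell spanned its text plus one trailing space.
target = display_width(old_status) + 1
padded = pad_cell(new_status, target)
trailing_spaces = len(padded) - len(new_status)
print(f"trailing spaces needed: {trailing_spaces}")
```

Under that width assumption, the new status needs three trailing spaces to fill the same columns as the old cell, which matches the suggestion above.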
Motivation
As the title says, this PR enables the use of Liger-Kernel on NPU. A Qwen3-30B-AB SFT fine-tuning example is provided to illustrate its use.
Experiment results
Visualization
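As a rough numeric companion to the comparison, the final summary values from the two training logs in the Script section can be compared directly. This is a minimal sketch: the dictionary keys and the `compare` helper are illustrative, the numbers are copied from the logs, and a two-step run on a single NPU is only indicative.

```python
# Final summary values copied from the two training logs in this PR.
baseline = {"train_runtime": 494.6, "train_speed": 247.3, "memory_gib": 60.3}  # use_liger_kernel=False
liger = {"train_runtime": 322.2, "train_speed": 161.1, "memory_gib": 60.5}     # use_liger_kernel=True


def compare(base: dict, new: dict) -> tuple[float, float, float]:
    """Return (speedup factor, runtime reduction in percent, memory delta in GiB)."""
    speedup = base["train_runtime"] / new["train_runtime"]
    reduction_pct = (1 - new["train_runtime"] / base["train_runtime"]) * 100
    memory_delta = new["memory_gib"] - base["memory_gib"]
    return speedup, reduction_pct, memory_delta


speedup, reduction_pct, memory_delta = compare(baseline, liger)
print(f"speedup: {speedup:.2f}x")
print(f"runtime reduction: {reduction_pct:.1f}%")
print(f"memory delta: {memory_delta:+.1f} GiB")
```

On these figures, the Liger-Kernel run finishes in roughly two thirds of the baseline runtime with near-identical reported memory.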
Script
This example runs on a single NPU.
qwen3_moe.sh
use_liger_kernel=False
{ 'loss': '7.281', 'grad_norm': '6.169', 'learning_rate': '0.0001', 'token_acc': '0.3935', 'epoch': '0.0261', 'global_step/max_steps': '1/2', 'elapsed_time': '3m 57s', 'remaining_time': '3m 57s', 'memory(GiB)': '60.3', 'train_speed(s/it)': '237' }
{ 'loss': '6.51', 'grad_norm': '5.505', 'learning_rate': '0', 'token_acc': '0.4681', 'epoch': '0.0522', 'global_step/max_steps': '2/2', 'elapsed_time': '8m 14s', 'remaining_time': '0s', 'memory(GiB)': '60.3', 'train_speed(s/it)': '247' }
{ 'train_runtime': '494.6', 'train_samples_per_second': '0.065', 'train_steps_per_second': '0.004', 'train_loss': '6.895', 'epoch': '0.0522', 'global_step/max_steps': '2/2', 'elapsed_time': '8m 15s', 'remaining_time': '0s', 'memory(GiB)': '60.3', 'train_speed(s/it)': '247.3' }
use_liger_kernel=True
{ 'loss': '7.229', 'grad_norm': '6.363', 'learning_rate': '0.0001', 'epoch': '0.0261', 'global_step/max_steps': '1/2', 'elapsed_time': '2m 36s', 'remaining_time': '2m 36s', 'memory(GiB)': '60.5', 'train_speed(s/it)': '155.7' }
{ 'loss': '6.497', 'grad_norm': '5.669', 'learning_rate': '0', 'epoch': '0.0522', 'global_step/max_steps': '2/2', 'elapsed_time': '5m 22s', 'remaining_time': '0s', 'memory(GiB)': '60.5', 'train_speed(s/it)': '160.8' }
{ 'train_runtime': '322.2', 'train_samples_per_second': '0.099', 'train_steps_per_second': '0.006', 'train_loss': '6.863', 'epoch': '0.0522', 'global_step/max_steps': '2/2', 'elapsed_time': '5m 22s', 'remaining_time': '0s', 'memory(GiB)': '60.5', 'train_speed(s/it)': '161.1' }
PR type