[DPO] Drop subprocess inference, add more framework support, doc update by wheresmyhair · Pull Request #970 · OptimalScale/LMFlow

wheresmyhair · 2026-05-21T15:30:37Z

Overview

Support more inference frameworks for DPO, update the trl version, and update the READMEs.

Detailed Description

iterative_dpo_aligner
- choose the inference engine in-process: VLLMInferencer or SGLangInferencer, based on the inference_engine arg;
- fix the DataProto -> text_to_textlist conversion left misaligned by #967 ("[Data] Apply DataProto to vLLM Inference & Align API with SGLang"): n>1 rollouts are repeat-interleaved by prepare_inputs_for_inference and need to be ungrouped via meta_info["actual_n_rollouts"].
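The ungrouping step above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes each prompt is repeated n times consecutively before inference, so the flat output list holds n responses per prompt in order. The function name `ungroup_rollouts` and the `input`/`output` record layout are assumptions made for the example.

```python
from typing import Dict, List


def ungroup_rollouts(
    prompts: List[str],
    flat_outputs: List[str],
    actual_n_rollouts: int,
) -> List[Dict]:
    """Regroup a flat, repeat-interleaved output list into one record per prompt.

    Hypothetical helper: the name and the record layout are illustrative only.
    """
    if actual_n_rollouts < 1:
        raise ValueError("actual_n_rollouts must be >= 1")
    if len(flat_outputs) != len(prompts) * actual_n_rollouts:
        raise ValueError("output count does not match prompts * actual_n_rollouts")

    grouped = []
    for i, prompt in enumerate(prompts):
        # Responses for prompt i occupy one contiguous slice of length n.
        start = i * actual_n_rollouts
        grouped.append(
            {"input": prompt, "output": flat_outputs[start:start + actual_n_rollouts]}
        )
    return grouped
```

Slicing by `actual_n_rollouts` rather than by a configured n matters if the engine ends up producing a different number of rollouts than requested, which is presumably why the value is carried in meta_info.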
vllm_inferencer
- mark MemorySafeVLLMInferencer deprecated with DeprecationWarning;
- scheduled for removal in lmflow 1.1.0.
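A deprecation of this kind is usually emitted from the constructor. The sketch below uses stand-in classes rather than the real LMFlow ones, and the warning text is an assumption; only the class names and the 1.1.0 removal target come from the description above.

```python
import warnings


class VLLMInferencer:
    """Stand-in for the real inferencer; only here to make the sketch runnable."""

    def __init__(self, model_name: str):
        self.model_name = model_name


class MemorySafeVLLMInferencer(VLLMInferencer):
    """Stand-in for the deprecated subprocess-based inferencer."""

    def __init__(self, model_name: str):
        # stacklevel=2 attributes the warning to the caller's line,
        # not to this constructor.
        warnings.warn(
            "MemorySafeVLLMInferencer is deprecated and scheduled for removal "
            "in lmflow 1.1.0.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(model_name)
```

Note that Python hides DeprecationWarning by default outside `__main__` and test runners, so users may need `-W default` to see it.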
auto_pipeline
- relax iterative_dpo_aligner gate from vllm AND trl AND ray to trl AND (vllm OR sglang);
- ray is only needed for the opt-in distributed reward inference path, and will be removed in the future to achieve a ray-less pipeline.
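The relaxed gate can be expressed as a small boolean check. This is a sketch under assumptions: the helper names are hypothetical, and availability is probed with `importlib.util.find_spec`, which may differ from how auto_pipeline actually detects packages.

```python
from importlib.util import find_spec
from typing import Callable


def is_available(package: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    return find_spec(package) is not None


def dpo_aligner_enabled(available: Callable[[str], bool] = is_available) -> bool:
    """New gate: trl AND (vllm OR sglang). ray is no longer required."""
    return available("trl") and (available("vllm") or available("sglang"))
```

Passing the availability probe as a parameter keeps the gate testable on machines where none of the optional packages are installed.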
setup.py
- bump trl 0.8.0 -> trl>=0.11,<0.12;
- add pybase64 to [sglang] and rich to [trl] to work around upstream packaging gaps (sglang.utils eagerly imports pybase64; trl 0.11.x lazy-imports rich).
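For orientation, the extras described above might look roughly like this in an `extras_require` mapping. This is an illustrative excerpt, not the actual setup.py: the trl pin and the pybase64 and rich additions come from the description, while the unpinned `sglang` and `vllm` entries are placeholders.

```python
# Hypothetical excerpt of an extras_require mapping for setuptools.setup().
extras_require = {
    # trl pinned to the 0.11.x series; rich added because trl 0.11.x
    # lazy-imports it without declaring it as a dependency.
    "trl": ["trl>=0.11,<0.12", "rich"],
    # pybase64 added because sglang.utils imports it eagerly.
    "sglang": ["sglang", "pybase64"],  # "sglang" left unpinned as a placeholder
    "vllm": ["vllm"],  # placeholder entry, version constraint unknown
}
```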
README + 5 localized READMEs
- document optional dependency extras and the vllm/sglang environment incompatibility.


wheresmyhair changed the title ~~[DPO] Drop subprocess inference, add SGLang support, doc update~~ [DPO] Drop subprocess inference, Add more framework support, doc update May 22, 2026

wheresmyhair changed the title ~~[DPO] Drop subprocess inference, Add more framework support, doc update~~ [DPO] Drop subprocess inference, add more framework support, doc update May 22, 2026

wheresmyhair closed this May 22, 2026

wheresmyhair wants to merge 1 commit into main from lmflow-dpo
