[DPO] Drop subprocess inference, add more framework support, doc update#970

Closed: wheresmyhair wants to merge 1 commit into main from lmflow-dpo

wheresmyhair (Collaborator) commented May 21, 2026

Overview

Support more inference frameworks for DPO, update the trl version, and update the READMEs.

Detailed Description

  • iterative_dpo_aligner
    • run inference in-process, choosing VLLMInferencer or SGLangInferencer based on the inference_engine argument;
    • fix DataProto -> text_to_textlist conversion left misaligned by [Data] Apply DataProto to vLLM Inference & Align API with SGLang #967 (n>1 rollouts are repeat-interleaved by prepare_inputs_for_inference and need to be ungrouped via meta_info["actual_n_rollouts"]).
  • vllm_inferencer
    • mark MemorySafeVLLMInferencer deprecated with DeprecationWarning;
    • scheduled for removal in lmflow 1.1.0.
  • auto_pipeline
    • relax iterative_dpo_aligner gate from vllm AND trl AND ray to trl AND (vllm OR sglang);
    • ray is only needed for the opt-in distributed reward inference path, and will be removed in the future to achieve a ray-less pipeline.
  • setup.py
    • bump trl 0.8.0 -> trl>=0.11,<0.12;
    • add pybase64 to [sglang] and rich to [trl] to work around upstream packaging gaps (sglang.utils eagerly imports pybase64; trl 0.11.x lazy-imports rich).
  • README + 5 localized READMEs
    • document optional dependency extras and the vllm/sglang environment incompatibility.
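
The ungrouping step in the iterative_dpo_aligner item can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ungroup_rollouts` is a hypothetical helper, plain lists and a plain dict stand in for DataProto and its `meta_info`, and the `{"input": ..., "output": [...]}` instance layout is an assumption about the text_to_textlist format. It only shows the idea that, when prompts are repeat-interleaved n times, every consecutive run of n outputs belongs to the same prompt.

```python
from typing import Dict, List


def ungroup_rollouts(prompts: List[str], outputs: List[str], n: int) -> List[Dict]:
    """Regroup a flat, repeat-interleaved rollout list into one entry per prompt.

    Hypothetical helper: `prompts` and `outputs` are parallel flat lists in which
    each prompt appears n times in a row (p0, p0, ..., p1, p1, ...).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if len(prompts) != len(outputs) or len(outputs) % n != 0:
        raise ValueError("outputs are not aligned with n rollouts per prompt")

    instances = []
    for start in range(0, len(outputs), n):
        # All n entries in this slice share the same prompt.
        instances.append({
            "input": prompts[start],
            "output": outputs[start:start + n],
        })
    return instances


# Stand-in for DataProto.meta_info; only the key name comes from the PR text.
meta_info = {"actual_n_rollouts": 2}
flat_prompts = ["q0", "q0", "q1", "q1"]
flat_outputs = ["a00", "a01", "a10", "a11"]
dataset = {
    "type": "text_to_textlist",
    "instances": ungroup_rollouts(
        flat_prompts, flat_outputs, meta_info["actual_n_rollouts"]
    ),
}
```

Reading n from the metadata rather than from the sampling config means the regrouping follows the repeat count that was actually applied to the inputs.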

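
The relaxed auto_pipeline gate can be expressed as a small boolean check. The sketch below is illustrative only: `is_installed`, `old_gate`, and `new_gate` are hypothetical names, and the actual availability checks in auto_pipeline may be implemented differently. It assumes that a package counts as available when `importlib.util.find_spec` can locate it.

```python
import importlib.util
from typing import Callable


def is_installed(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def old_gate(has: Callable[[str], bool] = is_installed) -> bool:
    # Previous condition: vllm AND trl AND ray.
    return has("vllm") and has("trl") and has("ray")


def new_gate(has: Callable[[str], bool] = is_installed) -> bool:
    # Relaxed condition: trl AND (vllm OR sglang); ray is no longer required.
    return has("trl") and (has("vllm") or has("sglang"))


# Example: an environment with trl and sglang only.
env = {"trl", "sglang"}
sglang_only_old = old_gate(lambda pkg: pkg in env)
sglang_only_new = new_gate(lambda pkg: pkg in env)
```

Passing the lookup as a parameter keeps the gate testable without installing any of the optional packages.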
wheresmyhair changed the title from "[DPO] Drop subprocess inference, add SGLang support, doc update" to "[DPO] Drop subprocess inference, Add more framework support, doc update" on May 22, 2026
wheresmyhair changed the title from "[DPO] Drop subprocess inference, Add more framework support, doc update" to "[DPO] Drop subprocess inference, add more framework support, doc update" on May 22, 2026