feat(openai): route reasoning + tools to the Responses API (refs #785) #786
andrew-woblavobla wants to merge 6 commits into
OpenAI reasoning models (gpt-5.x, o-series) reject `reasoning_effort`
together with function tools on /v1/chat/completions:
"Function tools with reasoning_effort are not supported for gpt-5.5 in
/v1/chat/completions. Please use /v1/responses instead." So
`chat.with_thinking(effort:).with_tools(...)` is impossible for the entire
gpt-5 reasoning family today.
This transparently routes that combo to /v1/responses inside the OpenAI
provider: render_payload sets @openai_responses_mode when thinking && tools,
and completion_url / parse_completion_response branch on it. The default
chat/completions path is unchanged (gated). Translates request (input items,
flat tools, reasoning:{effort:}, text.format) and response (output[] ->
Message/ToolCall/Thinking + usage).
Verified live against gpt-5.5: reasoning (88 reasoning tokens) + a function
tool complete in one turn.
Prototype scope — not yet implemented: Responses streaming (guarded), image
input, reasoning-item round-trip across turns, cassette tests.
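
The request translation described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `ResponsesSketch`, `flatten_tool`, and `render` are hypothetical names, and the message and tool hashes are simplified stand-ins for RubyLLM's own objects. It only shows the two shape changes the commit message names: tools become flat (no nested `function` key) and the effort moves under a top-level `reasoning` key.

```ruby
require 'json'

# Hypothetical helper module for illustration; not RubyLLM's actual API.
module ResponsesSketch
  module_function

  # Chat Completions nests the definition under :function;
  # the Responses API expects the same fields at the top level.
  def flatten_tool(chat_tool)
    fn = chat_tool.fetch(:function)
    { type: 'function', name: fn[:name], description: fn[:description], parameters: fn[:parameters] }
  end

  # Builds a /v1/responses request body from simplified chat-style inputs.
  def render(model:, messages:, tools:, effort: nil)
    payload = {
      model: model,
      input: messages.map { |m| { role: m[:role], content: m[:content] } },
      tools: tools.map { |t| flatten_tool(t) },
      store: false
    }
    # Only send :reasoning when an effort was requested.
    payload[:reasoning] = { effort: effort.to_s } if effort
    payload
  end
end

chat_tool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the weather for a city',
    parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] }
  }
}

payload = ResponsesSketch.render(
  model: 'gpt-5.5',
  messages: [{ role: 'user', content: 'What is the weather in Oslo?' }],
  tools: [chat_tool],
  effort: :high
)
puts JSON.pretty_generate(payload)
```

Tool-call history (`function_call` / `function_call_output` input items) and `text.format` are left out of this sketch to keep it short.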
Chat#render_payload is also called directly as a module function in specs (RubyLLM::Providers::OpenAI::Chat.render_payload). Calling responses_api? from there raised NoMethodError because that helper lives in OpenAI::Responses, which is mixed into the provider *instance*, not the Chat module — breaking 3 schema render_payload specs. Move the thinking+tools -> Responses routing into an OpenAI#render_payload override (instance context, both modules mixed in); the Chat module's render_payload is pure chat/completions again. Gate on instance_of?(OpenAI) so the OpenAI subclasses (Azure/OpenRouter/Mistral/Perplexity/xAI/GPUStack) keep chat/completions — they have no /v1/responses endpoint. Re-verified live: gpt-5.5 + with_thinking + a function tool completes the tool loop with reasoning tokens.
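
A reduced sketch of the fix described above, under stated assumptions: `ChatSketch`, `ResponsesGate`, `OpenAISketch`, and `AzureSketch` are hypothetical stand-ins for the real modules and classes, and the payloads are placeholders. It illustrates why the helper has to be called from instance context (the `module_function` copy of `render_payload` cannot see methods from another module) and how `instance_of?` keeps subclasses on chat/completions.

```ruby
# Hypothetical stand-ins; the names mirror the PR but the bodies are illustrative.
module ChatSketch
  module_function

  # Pure chat/completions rendering, callable as ChatSketch.render_payload(...).
  def render_payload(tools:, thinking:)
    { endpoint: 'chat/completions' }
  end
end

module ResponsesGate
  # True only when both thinking and tools are present.
  def responses_api?(tools:, thinking:)
    !thinking.nil? && !tools.empty?
  end
end

class OpenAISketch
  include ChatSketch
  include ResponsesGate

  # Instance-level override: both modules are mixed in here, so the gate is reachable.
  def render_payload(tools:, thinking:)
    if instance_of?(OpenAISketch) && responses_api?(tools: tools, thinking: thinking)
      { endpoint: 'responses' }
    else
      super
    end
  end
end

# Subclasses fail the instance_of? check and fall through to chat/completions.
class AzureSketch < OpenAISketch; end

puts OpenAISketch.new.render_payload(tools: [:weather], thinking: :high)[:endpoint]
puts AzureSketch.new.render_payload(tools: [:weather], thinking: :high)[:endpoint]
puts ChatSketch.render_payload(tools: [:weather], thinking: :high)[:endpoint]
```

`instance_of?` is used rather than `is_a?` because `is_a?` would also be true for every subclass.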
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #786 | +/- |
|---|---|---|---|
| Coverage | 87.21% | 87.43% | +0.21% |
| Files | 121 | 122 | +1 |
| Lines | 5703 | 5802 | +99 |
| Branches | 1442 | 1478 | +36 |
| Hits | 4974 | 5073 | +99 |
| Misses | 729 | 729 | |

…ion + routing) Adds keyless unit specs for OpenAI::Responses: responses_api? gating, the render_payload routing (thinking+tools -> /v1/responses; subclasses + no-thinking stay on chat/completions), render_responses_payload request shape (input items, flat tools, reasoning effort, instructions, text.format), function_call / function_call_output round-trip, parse_responses_response (message/tool_call/ reasoning/usage + output_text fallback + error body), tool_choice, and the streaming guard. Raises patch coverage flagged by Codecov.
Adds cases for responses_tool_for provider_params deep-merge and the responses_text_content Content/.text and to_s fallbacks — the 4 lines Codecov flagged. responses.rb is now fully covered.
…ches Exercises the last partial branches Codecov flagged: an assistant message with text content rendering an output_text input item, and parse_responses_response returning nil for an empty body.
Covers the 8 partial branches Codecov folded into patch %: render without effort (no :reasoning), tool_prefs choice/parallel_calls, unknown output items, non-output_text / non-summary_text content blocks, empty tool-call arguments, and empty-content message build. responses.rb now 100% line + branch coverage.

Hi, does anyone know how close this is to landing? This issue breaks this library's amazing interface with OpenAI models until it is fixed.

@reeganviljoen My guess is it won't be merged until streaming is supported (in my case that is essential).

The Responses API requires us to add a new layer to RubyLLM's architecture: protocols. It will be done that way.

@zavan streaming has been supported since 1.0

@crmne I was talking specifically about streaming support in this pull request: Known follow-ups

@crmne thanks, I didn't see that comment. Is there any assistance I can provide to help get that working?

Following up here too: Responses support landed on main via the new protocols layer (0875ce2), including the semantic streaming events, so the streaming follow-up discussed in this thread is covered. Thanks for pushing on this @andrew-woblavobla.

What

Transparently routes `with_thinking(effort:)` + tools for OpenAI to `/v1/responses`, the only endpoint that accepts `reasoning` together with function tools for gpt-5.x / o-series. The default `/v1/chat/completions` path is unchanged (gated by `@openai_responses_mode`).

Why

OpenAI reasoning models 400 on `reasoning_effort` + function tools via chat/completions ("use /v1/responses instead"), so `chat.with_thinking(effort:).with_tools(...)` is impossible for the whole gpt-5 reasoning family. Details + repro in #785.

How (auto-route within the OpenAI provider)

- `OpenAI#render_payload` sets `@openai_responses_mode = instance_of?(OpenAI) && responses_api?(tools:, thinking:)` (true only when both are present) and renders a Responses payload; `completion_url` / `parse_completion_response` branch on it. The `Chat` module's `render_payload` stays pure chat/completions, and the `instance_of?` guard keeps subclasses (Azure/OpenRouter/Mistral/Perplexity/xAI/GPUStack) on chat/completions, since they have no `/v1/responses`.
- `OpenAI::Responses` module: request translation covers `input` items (incl. `function_call` / `function_call_output` round-trip), flat `{type:"function",…}` tools, top-level `reasoning:{effort:}`, `text.format` for structured output, and `store:false`; response parsing maps `output[]` → `Message`/`ToolCall`/`Thinking` + usage (incl. `reasoning_tokens`).
- `stream_response` raises a clear error in responses mode (Responses SSE streaming not implemented yet).

Verified

`gpt-5.5` + `with_thinking(:high)` + a function tool completes the tool loop with reasoning tokens (previously a 400).

Known follow-ups

- `input_image` multimodal
- reasoning-item round-trip across turns (`include: ["reasoning.encrypted_content"]`)

I'm open to design changes, e.g. extracting this into a dedicated `:openai_responses` provider rather than auto-routing within `OpenAI`, or any other shape you'd prefer.

Refs #785.
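
For readers unfamiliar with the Responses body, the response-parsing step could look roughly like the sketch below. It is an illustration under assumptions, not the PR's `parse_responses_response`: `parse_responses_body` is a hypothetical name, it returns a plain hash instead of RubyLLM's `Message`/`ToolCall`/`Thinking` objects, and the sample body is hand-written (only the 88 reasoning tokens echo the figure quoted in the PR; the other token counts are made up).

```ruby
require 'json'

# Hypothetical parser for illustration; returns a plain hash rather than RubyLLM objects.
def parse_responses_body(body)
  text = []
  thinking = []
  tool_calls = []

  Array(body['output']).each do |item|
    case item['type']
    when 'message'
      # Keep only output_text blocks; other content block types are ignored.
      Array(item['content']).each { |c| text << c['text'] if c['type'] == 'output_text' }
    when 'reasoning'
      Array(item['summary']).each { |s| thinking << s['text'] if s['type'] == 'summary_text' }
    when 'function_call'
      raw = item['arguments'].to_s
      tool_calls << {
        id: item['call_id'],
        name: item['name'],
        arguments: raw.empty? ? {} : JSON.parse(raw) # arguments arrive as a JSON string
      }
    end
    # Unknown item types fall through and are skipped.
  end

  usage = body['usage'] || {}
  {
    content: text.join,
    thinking: thinking.join("\n"),
    tool_calls: tool_calls,
    input_tokens: usage['input_tokens'],
    output_tokens: usage['output_tokens'],
    reasoning_tokens: usage.dig('output_tokens_details', 'reasoning_tokens')
  }
end

# Hand-written sample body, not a recorded API response.
body = {
  'output' => [
    { 'type' => 'reasoning', 'summary' => [{ 'type' => 'summary_text', 'text' => 'Need the weather tool.' }] },
    { 'type' => 'function_call', 'call_id' => 'call_1', 'name' => 'get_weather',
      'arguments' => JSON.generate({ 'city' => 'Oslo' }) }
  ],
  'usage' => { 'input_tokens' => 40, 'output_tokens' => 120,
               'output_tokens_details' => { 'reasoning_tokens' => 88 } }
}

parsed = parse_responses_body(body)
puts parsed[:tool_calls].first[:name]
puts parsed[:reasoning_tokens]
```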