Python: Harness console for python#6312

Open
westey-m wants to merge 6 commits into microsoft:main from westey-m:harness-console-python

@westey-m
Contributor

@westey-m westey-m commented Jun 3, 2026

Motivation and Context

Having a harness console app is useful for showcasing the agent harness behavior.

#6153

Description

  • Port of the .NET harness console, using Textual as the UI framework.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings June 3, 2026 15:48
@moonbox3 added the documentation and python labels Jun 3, 2026
@github-actions github-actions Bot changed the title Harness console for python Python: Harness console for python Jun 3, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a Python “harness console” sample app (ported from the .NET harness) using the Textual TUI framework, and updates the existing research assistant sample to run inside this console so users can observe streaming, tool calls, planning/approval prompts, and usage.

Changes:

  • Introduces a new python/samples/02-agents/harness/console/ package implementing a Textual UI, agent runner, observers, and slash commands.
  • Updates harness_research.py to launch the new console UI instead of a raw stdin/stdout loop.
  • Adds textual to Python dev dependencies and updates uv.lock.

Reviewed changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| python/uv.lock | Locks new transitive deps introduced by adding textual. |
| python/pyproject.toml | Adds textual to the dev dependency group. |
| python/samples/02-agents/harness/harness_research.py | Switches the research sample to run via the new console UI helpers. |
| python/samples/02-agents/harness/console/__init__.py | Exposes the console’s public API (runner + factories). |
| python/samples/02-agents/harness/console/README.md | Documents console usage, structure, and key concepts. |
| python/samples/02-agents/harness/console/app.py | Implements the main Textual app and UI/state synchronization. |
| python/samples/02-agents/harness/console/app_state.py | Defines core state, enums, output entries, and follow-up actions. |
| python/samples/02-agents/harness/console/agent_runner.py | Orchestrates agent streaming, observers, and follow-up actions. |
| python/samples/02-agents/harness/console/formatters.py | Provides tool-call formatting helpers for display. |
| python/samples/02-agents/harness/console/harness_console.py | Adds the top-level run_agent_async() entrypoint. |
| python/samples/02-agents/harness/console/state_driver.py | Defines the UX state driver protocol and a simple console driver. |
| python/samples/02-agents/harness/console/textual_state_driver.py | Concrete UX driver mutating app state + notifying UI. |
| python/samples/02-agents/harness/console/commands/__init__.py | Exports and builds default slash command handlers. |
| python/samples/02-agents/harness/console/commands/base.py | Defines the CommandHandler ABC (currently has a syntax issue). |
| python/samples/02-agents/harness/console/commands/exit_handler.py | Implements /exit. |
| python/samples/02-agents/harness/console/commands/mode_handler.py | Implements /mode to show/switch agent mode. |
| python/samples/02-agents/harness/console/commands/session_handler.py | Implements /session-export and /session-import. |
| python/samples/02-agents/harness/console/commands/todo_handler.py | Implements /todos display using a TodoProvider. |
| python/samples/02-agents/harness/console/components/__init__.py | Exports Textual UI widgets used by the app. |
| python/samples/02-agents/harness/console/components/agent_status.py | Spinner + usage display widget. |
| python/samples/02-agents/harness/console/components/list_selection.py | Choice list widget with optional custom text input. |
| python/samples/02-agents/harness/console/components/mode_help.py | Mode indicator + help text widget. |
| python/samples/02-agents/harness/console/components/prompt_rule.py | Mode-colored horizontal rules widget. |
| python/samples/02-agents/harness/console/components/scroll_panel.py | Conversation history widget with streaming support. |
| python/samples/02-agents/harness/console/components/text_input.py | Prompt + input field widget with submit message. |
| python/samples/02-agents/harness/console/observers/__init__.py | Observer factories (default + planning-enabled). |
| python/samples/02-agents/harness/console/observers/base.py | Observer base class for lifecycle hooks. |
| python/samples/02-agents/harness/console/observers/error_display.py | Displays error content items. |
| python/samples/02-agents/harness/console/observers/planning_models.py | Pydantic models for structured planning output. |
| python/samples/02-agents/harness/console/observers/planning_output.py | Plan-mode structured output handling and follow-up prompts. |
| python/samples/02-agents/harness/console/observers/reasoning_display.py | Displays reasoning/thinking content items. |
| python/samples/02-agents/harness/console/observers/text_output.py | Streams assistant text chunks to the UI driver. |
| python/samples/02-agents/harness/console/observers/tool_approval.py | Collects approval requests and prompts the user to approve/deny. |
| python/samples/02-agents/harness/console/observers/tool_call_display.py | Displays function/tool calls using formatters. |
| python/samples/02-agents/harness/console/observers/usage_display.py | Displays token usage updates as they arrive. |

Comment thread python/samples/02-agents/harness/console/commands/base.py
Comment thread python/samples/02-agents/harness/console/commands/base.py
Comment thread python/samples/02-agents/harness/console/app.py
Comment thread python/samples/02-agents/harness/console/harness_console.py
Comment thread python/samples/02-agents/harness/console/components/prompt_rule.py Outdated
Comment thread python/samples/02-agents/harness/console/components/scroll_panel.py Outdated

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 3 | Confidence: 86%

✓ Security Reliability

No actionable issues found in this dimension.

✓ Test Coverage

No actionable issues found in this dimension.

✗ Design Approach

I found two design issues in the new console port. First, the app disables all slash-command handling whenever the caller uses the public default session=None path, which means commands like /exit and /session-import get sent to the agent instead of being handled locally. Second, the specialized mode-tool formatter was ported with .NET-style tool names, so it never matches the Python harness provider’s actual mode_set / mode_get tools and falls back to generic rendering.

I found two design-level issues in the new console observers. The reasoning observer is wired to a content shape the framework does not emit, so provider reasoning streams will be silently dropped. The tool-approval observer also offers two persistent-approval choices that are not represented in the approval response it sends back, so those selections behave exactly like one-time approval and will mislead users.

Flagged Issues

  • Reasoning content is silently ignored because the observer checks for content.type == "reasoning", while the framework contract and existing samples use text_reasoning for this stream (_types.py:338-341, _types.py:606-625, anthropic_advanced.py:61-64).
  • The approval UI advertises persistent choices ('Always approve…'), but every non-deny branch sends the same request.to_function_approval_response(approved=True) payload. The framework contract only supports approve/reject at this step, so choosing either 'always' option has no lasting effect and the prompt will recur.
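
To illustrate the first flagged issue, here is a minimal sketch of how a type check against the wrong string drops reasoning items without any error. The `Content` dataclass and `collect_reasoning` helper are hypothetical stand-ins, not the framework's types; only the two type strings ("reasoning" and "text_reasoning") come from the review above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Content:
    """Hypothetical stand-in for a streamed content item (not the framework class)."""

    type: str
    text: str


# Type string the review says the framework uses for reasoning streams.
REASONING_TYPE = "text_reasoning"


def collect_reasoning(items: Iterable[Content], expected_type: str = REASONING_TYPE) -> List[str]:
    """Return the text of every item whose type matches expected_type."""
    return [item.text for item in items if item.type == expected_type]


stream = [
    Content(type="text_reasoning", text="Considering which tool to call"),
    Content(type="text", text="Here is the answer."),
]

# Matching on "text_reasoning" surfaces the reasoning chunk.
shown = collect_reasoning(stream)

# Matching on "reasoning" finds nothing and raises nothing, so the loss is silent.
dropped = collect_reasoning(stream, expected_type="reasoning")
```

Because the mismatched check simply yields an empty list, nothing in the UI or logs signals that reasoning output was skipped.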

Automated review by westey-m's agents

Comment thread python/samples/02-agents/harness/console/observers/reasoning_display.py Outdated
@westey-m westey-m marked this pull request as ready for review June 3, 2026 17:47

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 3 | Confidence: 79%

✓ Security Reliability

The harness console sample enables Rich markup parsing (markup=True) on the RichLog widget that displays all output, including untrusted agent text and agent-controlled function call arguments. This allows a malicious or compromised agent to inject Rich markup (fake error styling, links, invisible text) to deceive the user. Additionally, the streaming truncation logic manipulates RichLog internals (del self.lines[...]) without invalidating the widget's internal _line_cache, which can cause stale rendering during streaming updates.

✓ Test Coverage

No actionable issues found in this dimension.

✓ Design Approach

That makes the console work only for the built-in plan/execute configuration and silently break for agents using custom modes or a non-default mode provider source ID. I found two design-level issues in the new console observers. The planning observer does not actually honor its documented raw-text fallback for schema-invalid JSON, because model_validate_json() can raise ValidationError and that path is uncaught. Separately, the tool approval UI offers two persistent “Always approve...” choices, but every non-deny path is encoded as the same one-shot boolean approval response, so those choices silently do not do what they promise.
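
The raw-text fallback described for the planning observer could look roughly like the sketch below. It assumes Pydantic v2, where `model_validate_json()` raises `ValidationError` for both malformed JSON and schema-invalid JSON. The `Plan` model and `parse_plan_or_raw` helper are hypothetical and do not reflect the actual contents of planning_models.py.

```python
from typing import List, Union

from pydantic import BaseModel, ValidationError


class Plan(BaseModel):
    """Hypothetical plan schema, used only for illustration."""

    title: str
    steps: List[str]


def parse_plan_or_raw(payload: str) -> Union[Plan, str]:
    """Return a validated Plan, or the raw text when validation fails."""
    try:
        return Plan.model_validate_json(payload)
    except ValidationError:
        # Covers malformed JSON and JSON that does not match the schema.
        return payload


valid = parse_plan_or_raw('{"title": "Research", "steps": ["search", "summarize"]}')
schema_invalid = parse_plan_or_raw('{"title": "Research"}')
not_json = parse_plan_or_raw("Here is my plan in plain text.")
```

With the exception caught, a schema-invalid response is shown as plain text instead of propagating out of the observer.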


Automated review by westey-m's agents

**kwargs,
auto_scroll=True, # Automatically scroll to bottom
wrap=True, # Wrap long lines instead of horizontal scroll
markup=True, # Enable Rich markup

Security: markup=True causes all text written to this widget to be parsed for Rich markup. Agent output (streaming text, tool call argument values from formatters.py) is untrusted and flows here without escaping. A malicious agent could inject Rich markup (e.g., [link=https://evil.site]click here[/link], fake [red]ERROR[/red] styling) to mislead the user.

Consider escaping untrusted text with rich.markup.escape() before writing, or set markup=False and pass pre-built rich.text.Text objects for styled content.
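
A minimal sketch of both options, using only `rich.markup.escape` and `rich.text.Text` from Rich. The `untrusted` string is an invented example of agent output; how the result is written to the widget is left out because it depends on the sample's code.

```python
from rich.markup import escape
from rich.text import Text

# Invented example of agent-controlled text containing Rich markup.
untrusted = "[red]ERROR[/red] see [link=https://evil.site]click here[/link]"

# Option 1: escape the text before writing it to a markup-enabled widget.
# The tags are backslash-escaped, so they render literally instead of being applied.
escaped = escape(untrusted)

# Option 2: wrap the text in a Text object. The Text constructor does not
# parse markup, so the brackets are kept as plain characters.
as_text = Text(untrusted)

# Parsing the escaped string as markup gives back the original characters.
round_trip = Text.from_markup(escaped).plain
```

Option 2 pairs naturally with markup=False, since any styling the console itself wants can be attached to the Text object rather than embedded in the string.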

Suggested change
- markup=True, # Enable Rich markup
+ markup=False, # Disable markup parsing of untrusted agent text


choices = [
"Approve this call",
"Always approve this tool (any arguments)",

The two "Always approve…" choices are non-functional. All non-Deny paths call request.to_function_approval_response(approved=True), and the framework's approval response type only carries a boolean plus the current call — there is no persistence mechanism. Selecting either "Always approve" option will still prompt on the next identical approval request.
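
One way the "Always approve" choices could be made to take effect is for the console itself to remember them and skip the prompt on later matching requests. The sketch below is an assumption about a possible fix, not the framework's API: `ApprovalMemory` and its methods are hypothetical, and the per-arguments variant assumes the second choice means "this tool with these exact arguments".

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass
class ApprovalMemory:
    """Hypothetical session-local record of persistent approval choices."""

    tools: Set[str] = field(default_factory=set)
    exact_calls: Set[Tuple[str, str]] = field(default_factory=set)

    @staticmethod
    def _key(tool: str, arguments: Dict[str, Any]) -> Tuple[str, str]:
        # Sort keys so argument order does not change the lookup key.
        return (tool, json.dumps(arguments, sort_keys=True))

    def remember_tool(self, tool: str) -> None:
        """Record 'always approve this tool (any arguments)'."""
        self.tools.add(tool)

    def remember_call(self, tool: str, arguments: Dict[str, Any]) -> None:
        """Record 'always approve this tool with these arguments' (assumed meaning)."""
        self.exact_calls.add(self._key(tool, arguments))

    def is_pre_approved(self, tool: str, arguments: Dict[str, Any]) -> bool:
        """True when an earlier persistent choice already covers this call."""
        return tool in self.tools or self._key(tool, arguments) in self.exact_calls


memory = ApprovalMemory()
memory.remember_tool("web_search")
memory.remember_call("delete_file", {"path": "tmp.txt"})
```

An observer could consult `is_pre_approved` before prompting and, when it returns True, send the same approved=True response automatically. Whether that state should live in the console or in the framework is a design decision for the PR.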

Member

@eavanvalkenburg eavanvalkenburg left a comment

quick first look, will do a deeper dive later

Comment thread python/pyproject.toml
"rich>=13.7.1,<16.0.0",
"tomli==2.4.1",
"prek==0.4.3",
"textual>=6.2.1",
Member

this should not be in here, it should be in a specific pyproject in the sample.

Member

you can also use the inline script notation here for the extra dependency (and the AF ones)
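
For reference, inline script metadata (PEP 723) at the top of the sample might look like the sketch below. The textual constraint is copied from the pyproject.toml hunk above; the "agent-framework" package name is an assumption standing in for "the AF ones", and the `main` body is a placeholder.

```python
# /// script
# dependencies = [
#     "textual>=6.2.1",
#     "agent-framework",  # assumed package name for the AF dependencies
# ]
# ///


def main() -> None:
    # Placeholder: the real sample would build the agent and launch the console here.
    print("dependencies are declared in the script header above")


if __name__ == "__main__":
    main()
```

Tools that understand this header, such as uv via `uv run`, can resolve the listed dependencies for the script without adding textual to the shared dev dependency group.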

        display_text = f"{i + 1}. {option_text}" if i < 9 else f" {option_text}"
        option_list.add_option(Option(display_text, id=str(i)))
except Exception:
    pass
Contributor

Should we narrow this catch? This is the method that renders the actual choices for follow-up questions and tool-approval prompts. A bare except Exception: pass (compounded by contextlib.suppress(Exception) in watch_options at 190-193) means any real failure in add_option/option construction leaves the user in LIST_SELECTION mode with an empty or stale list, no error line, nothing logged, turn wedged with zero signal. The only benign case here is the widget not being mounted yet, which is NoMatches.

Suggested change
-     pass
+ try:
+     option_list = self.query_one("#option-list", OptionList)
+     option_list.clear_options()
+     for i, option_text in enumerate(self.options):
+         display_text = f"{i + 1}. {option_text}" if i < 9 else f" {option_text}"
+         option_list.add_option(Option(display_text, id=str(i)))
+ except NoMatches:
+     pass

(needs from textual.css.query import NoMatches; same swap would fit watch_options / watch_title / watch_allow_custom_text)

Labels

documentation, python

5 participants