Python: Harness console for python #6312
Pull request overview
Adds a Python “harness console” sample app (ported from the .NET harness) using the Textual TUI framework, and updates the existing research assistant sample to run inside this console so users can observe streaming, tool calls, planning/approval prompts, and usage.
Changes:
- Introduces a new `python/samples/02-agents/harness/console/` package implementing a Textual UI, agent runner, observers, and slash commands.
- Updates `harness_research.py` to launch the new console UI instead of a raw stdin/stdout loop.
- Adds `textual` to Python dev dependencies and updates `uv.lock`.
Reviewed changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| python/uv.lock | Locks new transitive deps introduced by adding textual. |
| python/pyproject.toml | Adds textual to the dev dependency group. |
| python/samples/02-agents/harness/harness_research.py | Switches the research sample to run via the new console UI helpers. |
| python/samples/02-agents/harness/console/__init__.py | Exposes the console’s public API (runner + factories). |
| python/samples/02-agents/harness/console/README.md | Documents console usage, structure, and key concepts. |
| python/samples/02-agents/harness/console/app.py | Implements the main Textual app and UI/state synchronization. |
| python/samples/02-agents/harness/console/app_state.py | Defines core state, enums, output entries, and follow-up actions. |
| python/samples/02-agents/harness/console/agent_runner.py | Orchestrates agent streaming, observers, and follow-up actions. |
| python/samples/02-agents/harness/console/formatters.py | Provides tool-call formatting helpers for display. |
| python/samples/02-agents/harness/console/harness_console.py | Adds the top-level run_agent_async() entrypoint. |
| python/samples/02-agents/harness/console/state_driver.py | Defines the UX state driver protocol and a simple console driver. |
| python/samples/02-agents/harness/console/textual_state_driver.py | Concrete UX driver mutating app state + notifying UI. |
| python/samples/02-agents/harness/console/commands/__init__.py | Exports and builds default slash command handlers. |
| python/samples/02-agents/harness/console/commands/base.py | Defines the CommandHandler ABC (currently has a syntax issue). |
| python/samples/02-agents/harness/console/commands/exit_handler.py | Implements /exit. |
| python/samples/02-agents/harness/console/commands/mode_handler.py | Implements /mode to show/switch agent mode. |
| python/samples/02-agents/harness/console/commands/session_handler.py | Implements /session-export and /session-import. |
| python/samples/02-agents/harness/console/commands/todo_handler.py | Implements /todos display using a TodoProvider. |
| python/samples/02-agents/harness/console/components/__init__.py | Exports Textual UI widgets used by the app. |
| python/samples/02-agents/harness/console/components/agent_status.py | Spinner + usage display widget. |
| python/samples/02-agents/harness/console/components/list_selection.py | Choice list widget with optional custom text input. |
| python/samples/02-agents/harness/console/components/mode_help.py | Mode indicator + help text widget. |
| python/samples/02-agents/harness/console/components/prompt_rule.py | Mode-colored horizontal rules widget. |
| python/samples/02-agents/harness/console/components/scroll_panel.py | Conversation history widget with streaming support. |
| python/samples/02-agents/harness/console/components/text_input.py | Prompt + input field widget with submit message. |
| python/samples/02-agents/harness/console/observers/__init__.py | Observer factories (default + planning-enabled). |
| python/samples/02-agents/harness/console/observers/base.py | Observer base class for lifecycle hooks. |
| python/samples/02-agents/harness/console/observers/error_display.py | Displays error content items. |
| python/samples/02-agents/harness/console/observers/planning_models.py | Pydantic models for structured planning output. |
| python/samples/02-agents/harness/console/observers/planning_output.py | Plan-mode structured output handling and follow-up prompts. |
| python/samples/02-agents/harness/console/observers/reasoning_display.py | Displays reasoning/thinking content items. |
| python/samples/02-agents/harness/console/observers/text_output.py | Streams assistant text chunks to the UI driver. |
| python/samples/02-agents/harness/console/observers/tool_approval.py | Collects approval requests and prompts the user to approve/deny. |
| python/samples/02-agents/harness/console/observers/tool_call_display.py | Displays function/tool calls using formatters. |
| python/samples/02-agents/harness/console/observers/usage_display.py | Displays token usage updates as they arrive. |
Automated Code Review
Reviewers: 3 | Confidence: 86%
✓ Security Reliability
No actionable issues found in this dimension.
✓ Test Coverage
No actionable issues found in this dimension.
✗ Design Approach
I found two design issues in the new console port. First, the app disables all slash-command handling whenever the caller uses the public default `session=None` path, which means commands like `/exit` and `/session-import` get sent to the agent instead of being handled locally. Second, the specialized mode-tool formatter was ported with .NET-style tool names, so it never matches the Python harness provider’s actual `mode_set`/`mode_get` tools and falls back to generic rendering.
I found two design-level issues in the new console observers. The reasoning observer is wired to a content shape the framework does not emit, so provider reasoning streams will be silently dropped. The tool-approval observer also offers two persistent-approval choices that are not represented in the approval response it sends back, so those selections behave exactly like one-time approval and will mislead users.
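One way to avoid the `session=None` problem is to route slash commands before any session-dependent logic runs. The sketch below is a minimal illustration, not the console's actual code: `CommandRouter`, `dispatch`, and the handler signature are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CommandRouter:
    """Hypothetical router: maps slash-command names to local handlers."""

    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def dispatch(self, text: str) -> Optional[str]:
        """Return the handler result, or None when the text is meant for the agent."""
        stripped = text.strip()
        if not stripped.startswith("/"):
            return None  # ordinary prompt: forward to the agent
        name, _, argument = stripped[1:].partition(" ")
        handler = self.handlers.get(name)
        if handler is None:
            return f"Unknown command: /{name}"
        return handler(argument.strip())


# The router never looks at a session object, so it behaves the same
# whether or not the caller supplied one.
router = CommandRouter({"exit": lambda _arg: "exiting"})
```

Because `dispatch` returns `None` only for non-command input, the caller can decide to forward text to the agent without checking whether a session exists.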
Flagged Issues
- Reasoning content is silently ignored because the observer checks for `content.type == "reasoning"`, while the framework contract and existing samples use `text_reasoning` for this stream (`_types.py:338-341`, `_types.py:606-625`, `anthropic_advanced.py:61-64`).
- The approval UI advertises persistent choices ('Always approve…'), but every non-deny branch sends the same `request.to_function_approval_response(approved=True)` payload. The framework contract only supports approve/reject at this step, so choosing either 'always' option has no lasting effect and the prompt will recur.
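The first flagged issue comes down to a string mismatch, which a small sketch makes concrete. `Content` here is a hypothetical stand-in for the framework's content item, and `reasoning_text` is an illustrative helper; only the two type strings come from the review.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Content:
    """Hypothetical stand-in for a framework content item."""

    type: str
    text: str = ""


def reasoning_text(contents: Iterable[Content], type_name: str = "text_reasoning") -> List[str]:
    """Collect the text of every item whose type matches type_name."""
    return [item.text for item in contents if item.type == type_name]


# A stream shaped the way the review says the framework emits it.
stream = [Content("text", "Hello"), Content("text_reasoning", "Thinking...")]
```

Filtering the same stream with `type_name="reasoning"` yields an empty list, which is the silent drop the review describes.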
Automated review by westey-m's agents
Automated Code Review
Reviewers: 3 | Confidence: 79%
✓ Security Reliability
The harness console sample enables Rich markup parsing (`markup=True`) on the RichLog widget that displays all output, including untrusted agent text and agent-controlled function call arguments. This allows a malicious or compromised agent to inject Rich markup (fake error styling, links, invisible text) to deceive the user. Additionally, the streaming truncation logic manipulates RichLog internals (`del self.lines[...]`) without invalidating the widget's internal `_line_cache`, which can cause stale rendering during streaming updates.
✓ Test Coverage
No actionable issues found in this dimension.
✓ Design Approach
That makes the console work only for the built-in `plan`/`execute` configuration and silently break for agents using custom modes or a non-default mode provider source ID.
I found two design-level issues in the new console observers. The planning observer does not actually honor its documented raw-text fallback for schema-invalid JSON, because `model_validate_json()` can raise `ValidationError` and that path is uncaught. Separately, the tool approval UI offers two persistent “Always approve...” choices, but every non-deny path is encoded as the same one-shot boolean approval response, so those choices silently do not do what they promise.
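The raw-text fallback can be sketched with the standard library alone. This is a stand-in, not the observer's code: `Plan` and `parse_plan` are hypothetical, and manual checks replace Pydantic validation. With Pydantic, the equivalent would be to catch `pydantic.ValidationError` around `model_validate_json()`.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Plan:
    """Hypothetical stand-in for the structured planning model."""

    steps: List[str]


def parse_plan(raw: str) -> Union[Plan, str]:
    """Return a Plan when raw is schema-valid JSON, otherwise the raw text."""
    try:
        data = json.loads(raw)  # malformed JSON raises a ValueError subclass
        steps = data["steps"]  # wrong shape raises KeyError or TypeError
        if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
            raise ValueError("steps must be a list of strings")
        return Plan(steps=steps)
    except (ValueError, KeyError, TypeError):
        return raw  # fall back to displaying the text as-is
```

The point is that schema failures and syntax failures share one fallback path, so well-formed JSON with the wrong shape no longer escapes as an uncaught exception.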
Automated review by westey-m's agents
    **kwargs,
    auto_scroll=True,  # Automatically scroll to bottom
    wrap=True,  # Wrap long lines instead of horizontal scroll
    markup=True,  # Enable Rich markup
Security: markup=True causes all text written to this widget to be parsed for Rich markup. Agent output (streaming text, tool call argument values from formatters.py) is untrusted and flows here without escaping. A malicious agent could inject Rich markup (e.g., [link=https://evil.site]click here[/link], fake [red]ERROR[/red] styling) to mislead the user.
Consider escaping untrusted text with rich.markup.escape() before writing, or set markup=False and pass pre-built rich.text.Text objects for styled content.
Suggested change:
    - markup=True,  # Enable Rich markup
    + markup=False,  # Disable markup parsing of untrusted agent text
    choices = [
        "Approve this call",
        "Always approve this tool (any arguments)",
The two "Always approve…" choices are non-functional. All non-Deny paths call request.to_function_approval_response(approved=True), and the framework's approval response type only carries a boolean plus the current call — there is no persistence mechanism. Selecting either "Always approve" option will still prompt on the next identical approval request.
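If the "Always approve" options are kept, the persistence has to live in the console, since the approval response carries only a boolean. The following is a rough sketch under that assumption. `Choice` and `ApprovalMemory` are hypothetical names, and the second persistent option (same tool with the same arguments) is a guess at what the unseen choice means.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Set, Tuple


class Choice(Enum):
    APPROVE_ONCE = "once"
    ALWAYS_TOOL = "always_tool"  # any arguments
    ALWAYS_TOOL_AND_ARGS = "always_tool_args"  # assumed meaning of the second option
    DENY = "deny"


@dataclass
class ApprovalMemory:
    """Hypothetical console-side store for persistent approvals."""

    tools: Set[str] = field(default_factory=set)
    calls: Set[Tuple[str, str]] = field(default_factory=set)

    @staticmethod
    def _key(tool: str, arguments: Dict[str, Any]) -> Tuple[str, str]:
        # Sorted JSON makes the key independent of argument order.
        return (tool, json.dumps(arguments, sort_keys=True))

    def is_pre_approved(self, tool: str, arguments: Dict[str, Any]) -> bool:
        return tool in self.tools or self._key(tool, arguments) in self.calls

    def record(self, choice: Choice, tool: str, arguments: Dict[str, Any]) -> bool:
        """Remember persistent choices; return the boolean to send for this call."""
        if choice is Choice.ALWAYS_TOOL:
            self.tools.add(tool)
        elif choice is Choice.ALWAYS_TOOL_AND_ARGS:
            self.calls.add(self._key(tool, arguments))
        return choice is not Choice.DENY
```

The observer would consult `is_pre_approved` before prompting and answer with `approved=True` directly on a hit, so the one-shot response type stays unchanged.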
eavanvalkenburg left a comment
quick first look, will do a deeper dive later
    "rich>=13.7.1,<16.0.0",
    "tomli==2.4.1",
    "prek==0.4.3",
    "textual>=6.2.1",
this should not be in here, it should be in a specific pyproject in the sample.
you can also use the inline script notation here for the extra dependency (and the AF ones)
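The inline script notation is the PEP 723 metadata block, which runners such as `uv run` read to install dependencies for a single script. A minimal sketch of what the top of the sample script might look like follows. Only the `textual>=6.2.1` pin comes from this PR; the `requires-python` bound is an assumption, and the Agent Framework packages are left as a placeholder comment because their exact names are not given here.

```python
# /// script
# requires-python = ">=3.10"  # assumption, not taken from the PR
# dependencies = [
#     "textual>=6.2.1",
#     # plus the Agent Framework packages the sample imports
# ]
# ///
```

With this header in place the dependency travels with the sample, and the shared dev dependency group does not need to change.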
            display_text = f"{i + 1}. {option_text}" if i < 9 else f" {option_text}"
            option_list.add_option(Option(display_text, id=str(i)))
    except Exception:
        pass
Should we narrow this catch? This is the method that renders the actual choices for follow-up questions and tool-approval prompts. A bare except Exception: pass (compounded by contextlib.suppress(Exception) in watch_options at 190-193) means any real failure in add_option/option construction leaves the user in LIST_SELECTION mode with an empty or stale list, no error line, nothing logged, turn wedged with zero signal. The only benign case here is the widget not being mounted yet, which is NoMatches.
Suggested change:
    - pass
    + try:
    +     option_list = self.query_one("#option-list", OptionList)
    +     option_list.clear_options()
    +     for i, option_text in enumerate(self.options):
    +         display_text = f"{i + 1}. {option_text}" if i < 9 else f" {option_text}"
    +         option_list.add_option(Option(display_text, id=str(i)))
    + except NoMatches:
    +     pass
(needs `from textual.css.query import NoMatches`; same swap would fit `watch_options` / `watch_title` / `watch_allow_custom_text`)
Motivation and Context
A harness console app is useful for showcasing the agent harness behavior.
#6153