docs/examples: team-eval starter pack for coordinated multi-agent workflows #1077

Description

@christso

Objective

Add a first-party team-eval example pack that shows how to evaluate coordinated multi-agent workflows in AgentV without requiring users to reverse-engineer the pattern from research notes.

Why this is needed

Current AgentV examples are strong for single-agent and single-test flows, but the same themes keep recurring in frontier benchmarks and discussions of agent teams:

  • coordinated specialist agents
  • judgeable intermediate artifacts
  • role adherence and division of labor
  • end-to-end scoring of the team result

Even before dependency-aware DAG execution lands, AgentV can already demonstrate useful team-eval patterns with existing primitives: multi-turn transcripts, composite evaluators, code graders, tool trajectory checks, and imported session data.
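As a rough illustration of the code-grader and tool-trajectory idea, the sketch below scores role adherence over a multi-role transcript. It is plain Python and does not use AgentV's actual grader interface or transcript schema; the `Turn` type, the role names, the tool names, and the `ALLOWED_TOOLS` policy are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One transcript turn. Hypothetical shape, not AgentV's schema."""
    role: str
    content: str
    tool_calls: list[str] = field(default_factory=list)


# Hypothetical policy: which tools each role is allowed to call.
ALLOWED_TOOLS = {
    "planner": {"search_docs"},
    "implementer": {"read_file", "write_file", "run_tests"},
}


def role_adherence(transcript: list[Turn]) -> dict:
    """Return the fraction of tool calls that stay within each role's allow-list."""
    violations = []
    for i, turn in enumerate(transcript):
        allowed = ALLOWED_TOOLS.get(turn.role, set())
        for tool in turn.tool_calls:
            if tool not in allowed:
                violations.append(f"turn {i}: {turn.role} called {tool}")
    total_calls = sum(len(t.tool_calls) for t in transcript)
    # A transcript with no tool calls cannot violate the policy.
    score = 1.0 if total_calls == 0 else 1.0 - len(violations) / total_calls
    return {"score": score, "violations": violations}


transcript = [
    Turn("planner", "Plan: 1) read config 2) patch parser", ["search_docs"]),
    Turn("implementer", "Patched parser.py", ["read_file", "write_file"]),
    Turn("planner", "Applied a quick fix directly", ["write_file"]),
]
result = role_adherence(transcript)
```

Here the planner's direct `write_file` call in the last turn is the only violation, so one of four tool calls breaks the division of labor. A real example pack would presumably express the same check through AgentV's own evaluator configuration.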

Suggested example coverage

  1. Two-role handoff example — planner -> implementer, scored on final output + role adherence
  2. Team transcript import example — evaluate an existing multi-agent / multi-role transcript offline
  3. Composite team score example — combine outcome quality, tool-use constraints, and collaboration-specific rubric signals
  4. Future-facing note for dependency DAGs — show how richer orchestration examples could layer on top of feat(eval): multi-agent swarm evaluation — dependency-aware eval ordering and cross-agent scoring #331 once available
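The composite team score in item 3 could be sketched as a weighted average of per-dimension signals. This is a minimal sketch under stated assumptions: the signal names, the weights, and the `composite_team_score` helper are hypothetical, and AgentV's composite evaluators may aggregate differently.

```python
def composite_team_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    missing = set(weights) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(signals[name] * w for name, w in weights.items()) / total_weight


# Hypothetical weighting: the final outcome counts most, while tool-use
# constraints and collaboration rubric signals share the remainder.
WEIGHTS = {"outcome_quality": 0.5, "tool_use": 0.25, "collaboration": 0.25}

signals = {
    "outcome_quality": 0.8,  # e.g. from a rubric or LLM judge on the final output
    "tool_use": 1.0,         # e.g. from a tool trajectory check
    "collaboration": 0.6,    # e.g. from a role-adherence or handoff rubric
}
team_score = composite_team_score(signals, WEIGHTS)
```

Keeping the per-dimension signals separate until the final aggregation makes it easier to see whether a low team score comes from the outcome itself or from how the roles collaborated.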

Acceptance signals

Non-goals

Related

Metadata

    Labels

    docs: Improvements or additions to documentation

    Status

    Ready