feat(cli): bundle agentv-dev skills into npm package and add "agentv skills" subcommand

## Objective

Add an `agentv skills` subcommand that serves skill content from inside the CLI tarball, version-matched to the binary. Bundle the contents of `plugins/agentv-dev/skills/**` into the published `agentv` npm package so a single `npm i -g agentv` is sufficient for skill-driven use, without requiring a separate `allagents plugin install` step.

## Why

Today the `agentv` npm package and its skills ship as **two independently-installed artifacts** with no version coupling:

- `apps/cli/package.json` declares `files: ["dist", "README.md"]` — no skill content in the tarball.
- Skills live in `plugins/agentv-dev/skills/**` and are distributed through the marketplace declared in `.claude-plugin/marketplace.json`. The canonical setup in `apps/web/src/content/docs/docs/getting-started/installation.mdx` is:

  ```bash
  npm install -g agentv
  npx allagents plugin marketplace add EntityProcess/agentv
  npx allagents plugin install agentv-dev@agentv
  ```

The CLI version a user installs (e.g. `agentv@4.20`) is not coupled to the skill version their plugin manager resolves. This permits silent drift:

1. **Skill references a CLI command that doesn't exist yet.** User on `agentv@4.20` with `agentv-dev` at `main`, where `agentv-bench/SKILL.md` documents commands introduced in `agentv@4.22`. The agent runs them, the CLI errors with "unknown command", and there's no version-skew signal.
2. **CLI exposes flags the skill never mentions.** User on `agentv@4.25` with a months-old plugin install. Newer flags (`--target copilot-sdk`, etc.) are unknown to the skill — the agent works around them.
3. **Renamed primitives appear inconsistently.** Grader type renames like `llm_judge` → `llm-grader` are normalized in YAML inputs by `grader-parser.ts`, but skill text and JSONL output use different names, which is hard to diagnose without the version context.

The skills also reference each other (`agentv-bench` hands off to `agentv-eval-review` and `agentv-trace-analyst`). A partial mismatch between two skills installed at different times is harder to diagnose than a clean version error.

A second, related limitation: today's skill content is reachable only via a plugin host. Agents running in Cursor, Codex, raw Claude API, or any harness that can run a shell can't load AgentV's skill content without first reaching into a Claude-specific plugin install path.

### Prior art

`vercel-labs/agent-browser` (v0.26.0) ships skill content **inside the CLI npm package** and serves it via a built-in `agent-browser skills` subcommand. The package's `files` array lists `skill-data` and `skills` alongside `bin`. The Rust binary's `cli/src/skills.rs` (~600 LOC including tests) implements `skills list/get/path` with a deterministic resolution order. The discovery stub in `skills/agent-browser/SKILL.md` carries `hidden: true` and its entire body redirects the agent to `agent-browser skills get core`, with the design statement made explicit:

> The CLI serves skill content that always matches the installed version, so instructions never go stale. The content in this stub cannot change between releases, which is why it just points at `skills get core`.

Drift is impossible by construction because skill content and binary live in the same tarball at the same version. `npm i -g agent-browser@latest` updates both atomically. The skills are also harness-portable — anything that can shell out to `agent-browser skills get core` can load them.

Full comparison: https://github.com/agentevals/agentevals-research/blob/main/research/findings/agent-browser/skill-delivery-vs-agentv.md

## Design latitude

### Surface

```bash
agentv skills list
agentv skills get <name>
agentv skills get <name> --full       # also print files under references/ and templates/
agentv skills get --all
agentv skills get <name> --json       # machine-readable
agentv skills path [<name>]
```

`skills get` output is the SKILL.md content (frontmatter included), matching how Claude Code consumes skills today.

### Bundling

1. Bundle skill content under `plugins/agentv-dev/skills/**` into `apps/cli/dist/skills/` at build time (tsup `onSuccess` step, or a `cp` script run before `tsup`). The existing `files: ["dist", "README.md"]` then carries the skills into the published tarball without changes.
2. Implement `agentv skills` under `apps/cli/src/commands/skills/`. Resolution walks from the CLI binary's location to find the bundled `dist/skills/` directory. agent-browser's `find_package_root` in `cli/src/skills.rs` is the model — adapt the npm install layout (binary in `dist/`, skills in `dist/skills/`).
3. Estimated cost: ~300–500 LOC TypeScript plus tests, plus one wiring change in `apps/cli/tsup.config.ts` (or equivalent build script).

### Marketplace plugin

Keep `agentv-dev` in the marketplace for Claude Code users who prefer plugin-based discovery. **Rewrite each plugin SKILL.md to be a discovery stub**: short body, redirect to `agentv skills get <name>`. This is the agent-browser pattern — the stub lives outside the install, so it must not try to teach. Concretely, today's full SKILL.md content moves into the bundled `dist/skills/<name>/SKILL.md`, and the marketplace plugin's SKILL.md becomes a few lines pointing the agent at the CLI subcommand.

### Open design choices

- **Subset vs. all.** Minimum viable bundle is `agentv-bench` (most invoked) plus `agentv-eval-writer`. Bundling all six (`agentv-bench`, `agentv-eval-writer`, `agentv-eval-review`, `agentv-trace-analyst`, `agentv-onboarding`, `agentv-governance`) is also fine — skill markdown is small.
- **Source path.** Today's `plugins/agentv-dev/skills/**` can stay as the source of truth and be copied into `dist/skills/` at build time, or skills can be moved to `apps/cli/skills/` with the marketplace plugin referencing them from there. Either is acceptable; the first preserves the marketplace plugin's current layout.
- **Subcommand naming.** `agentv skills` matches `agent-browser skills`. `agentv skill` (singular) is also fine.
- **`--json`.** agent-browser exposes `--json` for programmatic agents loading skills into their own context machinery. AgentV could ship `--json` from day one or land it as a follow-up — the win from bundling is independent of output format.

## Acceptance signals

- `agentv skills list` returns the same set of skill names that the marketplace plugin currently exposes (or the explicitly-chosen subset, if subsetting).
- `agentv skills get agentv-bench` returns content byte-equivalent to the `agentv-bench` SKILL.md at the same release tag.
- A clean `npm i -g agentv@<release>` is sufficient to drive evals via skills, with no `allagents plugin install` step required.
- The marketplace plugin still installs in Claude Code, but each plugin SKILL.md is a stub redirecting to the CLI subcommand. No duplicate skill bodies in two locations.
- The release pipeline has exactly one source of truth for skill content. CI (or a pre-publish check) verifies bundled skills match source.
- `agentv skills get <name> --json` parses as valid JSON with a stable schema (mirroring agent-browser: `{ "success": true, "data": [{ "name", "content", "files"? }] }`).
- Tests cover: discovery from the install layout, frontmatter parsing including `hidden: true`, `--full` collecting `references/` and `templates/`, and the not-found error path.

## Non-goals

- Does not move `agentic-engineering` or `agentv-claude-trace` out of the marketplace. Those plugins are not tightly coupled to the CLI version and the existing model fits them.
- Not a rewrite of skill content. Skills move location but bodies are unchanged.
- No deprecation of the marketplace path — it stays as a Claude Code surface, just no longer the canonical setup in install docs.
- Not a new skill format or schema. Existing SKILL.md frontmatter (Claude Code skill conventions) is preserved.

## Docs follow-up (in the same PR)

- Update `apps/web/src/content/docs/docs/getting-started/installation.mdx` so the "Canonical Setup" is `npm install -g agentv` alone. The `npx allagents plugin marketplace add ...` block becomes a "Claude Code plugin" section, not the canonical path.
- Update `agentv init`'s `printSkillFirstInstructions()` (in `apps/cli/src/commands/init/index.ts`) to drop the marketplace step from the recommended setup, or reframe it as the Claude-Code-specific path.
- Update each `plugins/agentv-dev/skills/<name>/SKILL.md` to its discovery-stub form.

## Related

- Research findings: https://github.com/agentevals/agentevals-research/blob/main/research/findings/agent-browser/skill-delivery-vs-agentv.md
- Draft proposal (more design notes than this issue body): https://github.com/agentevals/agentevals-research/blob/main/research/proposals/agentv-bundled-skills-subcommand.md
- agent-browser CLI skills implementation: https://github.com/vercel-labs/agent-browser/blob/main/cli/src/skills.rs
- agent-browser discovery stub: https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cli): bundle agentv-dev skills into npm package and add "agentv skills" subcommand #1224

Objective

Why

Prior art

Design latitude

Surface

Bundling

Marketplace plugin

Open design choices

Acceptance signals

Non-goals

Docs follow-up (in the same PR)

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat(cli): bundle agentv-dev skills into npm package and add "agentv skills" subcommand #1224

Description

Objective

Why

Prior art

Design latitude

Surface

Bundling

Marketplace plugin

Open design choices

Acceptance signals

Non-goals

Docs follow-up (in the same PR)

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions