Test Commander / Documentation · Phases 0–13 shipped

End-user documentation

Test Commander is a Claude Code plugin plus a small Python runtime. It turns a project’s requirements, source, specs, recorded API traffic, and exploratory recordings into one committed workspace of structured quality artifacts — then runs that workspace from the terminal, a web console, an API, an MCP server, sandboxes, and a continuous agent. This page is the practical reference: install, the workflow order, the workspace, all 64 commands, governance, and the autonomy modes.

Getting started

Install once, then init per project.

Test Commander supports macOS, Linux, WSL2, and Git Bash (Decision D13). You install the plugin once from its repo, then run /tc:init inside each project you want to test. The workspace lives at .test-commander/ and is committed to git.

install + initialize

# 1. Clone and provision the environment (Python 3.12, PDM, Docker, git, make).
$ git clone https://github.com/NickBaynham/test-commander
$ cd test-commander
$ ./bootstrap.sh          # verify prerequisites; auto-install the safe ones
$ make install            # validate manifests, register the local marketplace,
                          # install the plugin, verify the 20 skills

# 2. Inside YOUR project, initialize the workspace.
$ cd ~/projects/your-app
$ /tc:init                # copy the workspace template into .test-commander/
$ /tc:status              # read-only snapshot of the workspace + per-phase status
$ /tc:next                # ask what to do next for this project

The workflow order — let /tc:next guide you, or follow the phases

the workflow, phase by phase

/tc:init                    # 1.  workspace
/tc:review-requirements     # 2.  requirements quality
/tc:learn-from-docs ...     # 3.  project knowledge ingestion
/tc:create-charter          # 4.  scope an exploratory session
/tc:explore                 #     explore + classify
/tc:test-ideas              #     enrich the seeds
/tc:generate-bdd            # 5.  BDD + traceability
/tc:automation-plan         # 6.  score, then generate
/tc:automate                #     the Playwright suite
/tc:run                     # 7.  execute + collect evidence
/tc:report                  #     publish the quality report
/tc:learn                   # 8.  governed learning loop
/tc:visualize               # 9.  diagrams + infographics
/tc:web-start               # 10. read-only web console
# 10.5 governance · 11 API+MCP · 12 sandboxes · 13 continuous quality

The workspace

One committed directory is the source of truth.

Everything Test Commander produces lands under .test-commander/ as plain Markdown, YAML, and JSON. Every helper is idempotent and byte-deterministic: re-running it against unchanged input produces identical bytes. As a result, the workspace can be reviewed like any other source, in ordinary git diffs.
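To make the byte-determinism claim concrete, here is a minimal Python sketch of one way a helper could write a JSON artifact so that unchanged input never produces a diff. It is an illustration under stated assumptions, not Test Commander's actual code: the function name write_artifact and the example path are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_artifact(path: Path, data: dict) -> bool:
    """Serialize data canonically; return True only if the file changed."""
    # Sorted keys, fixed indentation, and a trailing newline mean the same
    # input always serializes to the same bytes.
    text = json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"
    payload = text.encode("utf-8")
    if path.exists() and path.read_bytes() == payload:
        return False  # unchanged input: leave the file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical artifact path, used only for this demonstration.
        target = Path(tmp) / "traceability" / "coverage.json"
        print(write_artifact(target, {"b": 1, "a": 2}))  # True: first write
        print(write_artifact(target, {"a": 2, "b": 1}))  # False: same bytes
```

The second call passes the same content with a different key order and still changes nothing on disk, which is the property that keeps git diffs quiet.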

.test-commander/ — committed workspace layout

your-app/
└── .test-commander/            # committed to git like any other source
    ├── project.md              # project metadata
    ├── config.yaml             # YOUR domain extensions (D19)
    ├── journal/                # append-only narrative log
    ├── documents/uploaded/     # your requirements + docs (input)
    ├── requirements/           # reviews + open questions
    ├── product-knowledge/      # ingested entities, journeys, rules, impact map
    ├── charters/ · sessions/   # exploratory testing
    ├── exploration-notes/
    ├── test-ideas/             # tc-test-idea/v1 seeds, enriched
    ├── bdd/features/           # Gherkin with @req:/@cs: linkage
    ├── traceability/           # requirement + scenario maps, coverage
    ├── test-data/              # seed JSON, never inlined in test code (D6)
    ├── runs/ · evidence/       # execution records + screenshots/traces
    ├── quality-report/         # current report + committed history
    ├── lessons/                # governed learning inbox
    ├── visuals/                # Mermaid sources + rendered assets
    ├── policy/ · audit/        # governance: permissions, approvals, audit log
    ├── sandbox/                # sandbox config + state
    └── continuous/             # autonomy config + analysis artifacts

Command reference

64 commands across 20 skills.

Each tc-* skill owns a set of /tc:* commands (Decision D1 — every skill is vendored in-repo). Three skills — tc-evidence, tc-governance, and tc-mcp — ship runtime rather than commands. Commands are read-only or proposal-first by default; anything that mutates the workspace or runs code flows through the governed pipeline.

tc-core

Phase 1

Workspace orchestration: initialize, inspect, journal, and recommend.

/tc:init: Copy the workspace template into .test-commander/. Idempotent — existing files preserved.
/tc:status: Print a read-only snapshot: per-bucket file counts, populated counts, per-phase status.
/tc:journal: Append a timestamped narrative entry to today's journal. Append-only.
/tc:next: Read the workspace state and recommend the next /tc:* command for this project.
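The "idempotent, existing files preserved" behavior described for /tc:init can be pictured with a short Python sketch. This is only an assumption about how such a copy could work; the function copy_template and its arguments are hypothetical and are not taken from the plugin's source.

```python
import shutil
from pathlib import Path


def copy_template(template: Path, workspace: Path) -> list[str]:
    """Copy template files into workspace, never overwriting existing files.

    Returns the workspace-relative paths that were created, sorted.
    """
    created = []
    for src in template.rglob("*"):
        if src.is_dir():
            continue
        dest = workspace / src.relative_to(template)
        if dest.exists():
            continue  # preserve whatever the project already has
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        created.append(dest.relative_to(workspace).as_posix())
    return sorted(created)
```

Running it a second time returns an empty list, and a file the team has edited since the first run keeps its edits.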

tc-requirements

Phase 2

Requirements quality: rubric review, INVEST, acceptance criteria, coverage, and test-idea seeds.

/tc:review-requirements: Run the 16-dimension rubric on uploaded requirements; emit a review plus open questions.
/tc:review-user-stories: INVEST review of user stories: independent, negotiable, valuable, estimable, small, testable.
/tc:review-acceptance-criteria: Review acceptance criteria for testability, completeness, and clarity.
/tc:requirements-coverage: Build the requirement coverage map across the workspace.
/tc:requirements-to-tests: Seed a tc-test-idea/v1 file for every requirement (skip-not-overwrite).

tc-knowledge

Phase 3

Project knowledge ingestion: five helpers extract structured artifacts with file:line provenance.

/tc:learn-from-docs: Extract entities, terms, and user journeys from uploaded documents.
/tc:learn-from-specs: Extract endpoints and contracts from API specifications.
/tc:learn-from-code: Extract modules and business rules from source, each with path:line provenance.
/tc:learn-from-api: Extract behavior from recorded API traffic.
/tc:learn-from-tests: Extract existing coverage from the project's current tests.

tc-explore

Phase 4

Charter-based exploratory testing against a recorded Playwright session.

/tc:create-charter: Scope an exploratory session against the project knowledge and risk areas.
/tc:explore: Classify each recorded event into six universal observation types and six anomaly categories.
/tc:session-summary: Synthesize the exploration session summary with a charter-coverage matrix.
/tc:test-ideas: Enrich the Phase-2 test-idea seeds with the exploration findings.

tc-bdd

Phase 5

BDD generation and review with machine-readable traceability tags.

/tc:generate-bdd: Render one Gherkin scenario per enrichment candidate with @req:/@cs: provenance.
/tc:review-bdd: Run a six-category universal BDD quality rubric.

tc-traceability

Phase 5

The cross-cutting map tying requirements to the scenarios that exercise them.

/tc:traceability-map: Rebuild the requirement-level and scenario-level traceability chains.
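As a rough illustration of how machine-readable tags make traceability mechanical, the Python sketch below builds a requirement-to-scenario map from Gherkin text. The tag value format (for example @req:REQ-001) and the function build_trace_map are assumptions made for this example; this page states only that scenarios carry @req: and @cs: tags. The sketch reads @req: tags on scenarios and ignores @cs: tags and feature-level tags.

```python
import re
from collections import defaultdict

# Assumed tag shape: "@req:<id>", where <id> contains no whitespace.
REQ_TAG = re.compile(r"@req:(\S+)")


def build_trace_map(feature_text: str) -> dict[str, list[str]]:
    """Map each requirement id to the names of the scenarios tagged with it."""
    trace = defaultdict(list)
    pending = []  # @req ids seen since the last scenario
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            pending += REQ_TAG.findall(line)
        elif line.startswith("Feature:"):
            pending = []  # this sketch does not inherit feature-level tags
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            name = line.split(":", 1)[1].strip()
            for req in pending:
                trace[req].append(name)
            pending = []
    # Sort keys and values so the map is stable across runs.
    return {req: sorted(names) for req, names in sorted(trace.items())}


FEATURE = """\
Feature: Login

  @req:REQ-001 @cs:CS-010
  Scenario: Valid credentials
    Given a registered user

  @req:REQ-001 @req:REQ-002
  Scenario: Locked account
    Given a locked user
"""

print(build_trace_map(FEATURE))
# {'REQ-001': ['Locked account', 'Valid credentials'], 'REQ-002': ['Locked account']}
```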

tc-build-framework

Phase 6

The lazily-scaffolded Playwright + TypeScript framework.

/tc:build-framework: Scaffold the project-root tests/ tree, playwright.config.ts, and package.json only when first needed (D8).

tc-automation-plan

Phase 6

The strategic gate before any code is generated.

/tc:automation-plan: Score every scenario against a seven-factor suitability rubric: automate / consider / manual.

tc-automate

Phase 6

Generate and mechanically review the automation.

/tc:automate: Generate page objects, fixtures, and specs with @req:/@cs: provenance and fixture-mediated data.
/tc:review-automation: Mechanically review the generated suite for quality and framework compliance.

tc-test-data

Phase 6

The data discipline (D6): nothing inlined in test code.

/tc:generate-test-data: Populate test-data/ seed JSON and a per-area spec consumed via fixtures.

tc-run

Phase 7

Execution and failure triage.

/tc:run: Orchestrate suite execution; capture per-run records and route evidence per config policy.
/tc:analyze-results: Classify failures by responsible layer without weakening assertions or adding sleeps.

tc-quality-report

Phase 7

The quality report and release-readiness gate.

/tc:report: Write the current quality report and snapshot a copy to committed history.
/tc:quality-gate: Evaluate release-readiness against project-defined thresholds; separate facts from interpretation.
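One way to picture a gate that keeps facts apart from interpretation is the small Python sketch below. The metric names, the thresholds, and the function evaluate_gate are hypothetical; the real gate reads project-defined thresholds whose format is not described on this page.

```python
def evaluate_gate(facts: dict[str, float], minimums: dict[str, float]) -> dict:
    """Compare measured facts to minimum thresholds and report both."""
    failures = sorted(
        name
        for name, floor in minimums.items()
        # A metric with no measurement is treated as failing, not as passing.
        if name not in facts or facts[name] < floor
    )
    return {
        "facts": dict(sorted(facts.items())),  # measured values, reported as-is
        "failures": failures,                  # thresholds that were not met
        "interpretation": "ready" if not failures else "not ready",
    }


result = evaluate_gate(
    facts={"pass_rate": 0.97, "requirement_coverage": 0.80},
    minimums={"pass_rate": 0.95, "requirement_coverage": 0.90},
)
print(result["failures"])        # ['requirement_coverage']
print(result["interpretation"])  # not ready
```

Keeping the measured values in their own field means a reader can disagree with the verdict without having to re-derive the numbers.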

tc-evidence

Phase 7

Runtime, not commands: the evidence indexer that routes screenshots, traces, and logs into evidence/ per the config policy.

runtime — no /tc:* commands

tc-learning

Phase 8

The governed learning loop — nothing promoted silently.

/tc:learn: Open and seed the governed lessons inbox.
/tc:learn-from-failures: Derive candidate lessons from test failures.
/tc:learn-from-exploration: Derive candidate lessons from exploration sessions.
/tc:learn-from-feedback: Derive candidate lessons from human feedback.
/tc:review-lessons: Review candidate lessons: accept, reject, or flag for human review.
/tc:promote-lessons: Promote accepted lessons into project guidance — every promotion visible in git diff.

tc-visualize

Phase 9

Visual documentation: eight diagram types, infographics, and a deterministic renderer.

/tc:visualize: Generate the workspace's full diagram set.
/tc:diagram-architecture: Architecture diagram (Mermaid source).
/tc:diagram-coverage: Coverage diagram.
/tc:diagram-flow: User-flow diagram.
/tc:diagram-risk: Risk diagram.
/tc:diagram-sequence: Sequence diagram.
/tc:diagram-state: State diagram.
/tc:diagram-test-strategy: Test-strategy diagram.
/tc:diagram-traceability: Traceability diagram.
/tc:generate-infographic: Build a quality-report infographic.
/tc:render-visuals: Render Mermaid sources to SVG/PNG deterministically (degrades gracefully without the CLI).

tc-web

Phase 10

The read-only web console over the committed workspace.

/tc:web-init: Provision the console config (.web/console.json) inside the workspace. Idempotent.
/tc:web-start: Bring up the console stack (Next.js + FastAPI) on docker compose. Read-only.
/tc:web-sync: Reconcile the SQLite index with the workspace (a clean rebuild).
/tc:web-index-artifacts: Rebuild the index from the workspace into .web/index.db.
/tc:web-export: Export the console view as a deterministic static bundle.

tc-governance

Phase 10.5

Runtime, not commands: the controlled-execution pipeline behind the console's /api/execute — intent → plan → permission policy → approval gate → bounded execution → output validation → audit. Default deny; no backdoor.

runtime — no /tc:* commands

tc-mcp

Phase 11

Runtime, not commands: an expanded Runtime API (apps/api) and a schema-first MCP server (apps/mcp). Both are alternative front-ends to the same governed pipeline; the seven permission levels are enforced server-side.

runtime — no /tc:* commands

tc-sandbox

Phase 12

On-demand, team-accessible environments launched from GitHub Actions, governed and safe-by-default.

/tc:sandbox-init: Write the sandbox config (provider, target, allow-list, private-range block). Skip-not-overwrite.
/tc:sandbox-launch: Launch the sandbox via its provider and persist state. Idempotent; dry-run by default.
/tc:sandbox-status: Report the persisted sandbox state (none / running / stopped).
/tc:sandbox-sync: Push the committed workspace into the sandbox.
/tc:sandbox-stop: Tear the sandbox down. Idempotent.
/tc:sandbox-export: Write a shareable bundle of endpoints, labels, and status.

tc-continuous-quality

Phase 13

The continuous quality agent: watch → analyze → propose → PR, gated by the autonomy mode.

/tc:watch-changes: Detect changed files from a pull-request or push diff.
/tc:impact-analysis: Map changed files to impacted features and requirements (deterministic; never invents impact).
/tc:coverage-gap-analysis: Find impacted features that lack coverage (never invents coverage).
/tc:propose-tests: Propose BDD/automation for the gaps — safe-write; never opens a PR.
/tc:create-test-pr: Open a clearly-labeled PR through the pipeline; gated by the autonomy mode.
/tc:continuous-quality-check: Run the whole loop under the configured autonomy mode.
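The "never invents impact" and "never invents coverage" rules amount to pure lookups: a changed file affects a feature only if a known mapping says so. The Python sketch below shows the idea with glob patterns. The impact-map shape, the feature names, and both function names are hypothetical and serve only this example.

```python
from fnmatch import fnmatchcase


def impacted_features(changed_files: list[str], impact_map: dict[str, list[str]]) -> list[str]:
    """Return the features whose patterns match at least one changed file."""
    hits = set()
    for feature, patterns in impact_map.items():
        if any(fnmatchcase(path, pat) for path in changed_files for pat in patterns):
            hits.add(feature)
    return sorted(hits)  # sorted so the result is deterministic


def coverage_gaps(features: list[str], covered: set[str]) -> list[str]:
    """Return the impacted features that have no known coverage."""
    return sorted(f for f in features if f not in covered)


impact_map = {
    "checkout": ["src/cart/*", "src/payment/*"],
    "login": ["src/auth/*"],
}
changed = ["src/auth/session.py", "README.md"]

impacted = impacted_features(changed, impact_map)
print(impacted)                               # ['login']
print(coverage_gaps(impacted, {"checkout"}))  # ['login']
```

README.md matches no pattern, so it contributes nothing: an unmapped file yields no impact and no guess.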

Governance

One execution path. Default deny. No backdoor.

From Phase 10.5 on, every action above read-only flows through a single controlled-execution pipeline. The web console, the Runtime API, the MCP server, sandboxes, and the continuous agent are all front-ends to it — none can bypass it.

Intent router → Command planner → Permission policy → Approval gate → Bounded execution → Output validation → Audit log

An unsafe request is blocked before the agent is ever reached. A privileged action that is not approved is held — no execution, no change. An approved action is executed in bounds, its diff validated against the plan, and the whole action written to an append-only audit journal. The seven permission levels — read-only, safe-write, code-write, execute-tests, external-network, destructive, admin — are resolved per role and enforced server-side.
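The decision rules in the paragraph above can be sketched in a few lines of Python. This is an illustration, not the pipeline's real code: the Policy shape, the role names, the default approval set, and the functions decide and handle are hypothetical. Only the seven level names and the blocked / held / executed outcomes come from this page.

```python
from dataclasses import dataclass, field

LEVELS = (
    "read-only", "safe-write", "code-write", "execute-tests",
    "external-network", "destructive", "admin",
)


@dataclass
class Policy:
    # role -> permission levels the role may reach (hypothetical shape)
    role_levels: dict[str, set[str]]
    # levels that wait for a human approval (hypothetical default)
    needs_approval: set[str] = field(default_factory=lambda: {"destructive", "admin"})


def decide(policy: Policy, role: str, level: str, approved: bool = False) -> str:
    """Return 'blocked', 'held', or 'execute' for one requested action."""
    if level not in LEVELS:
        return "blocked"  # unknown level: default deny
    if level not in policy.role_levels.get(role, set()):
        return "blocked"  # role cannot reach this level: default deny
    if level in policy.needs_approval and not approved:
        return "held"  # privileged and unapproved: no execution, no change
    return "execute"


def handle(policy: Policy, role: str, level: str, audit: list, approved: bool = False) -> str:
    """Decide, then record the outcome; the audit list is only ever appended to."""
    outcome = decide(policy, role, level, approved)
    audit.append({"role": role, "level": level, "outcome": outcome})
    return outcome


policy = Policy({"tester": {"read-only", "safe-write", "execute-tests"},
                 "lead": set(LEVELS)})
audit = []
print(handle(policy, "tester", "code-write", audit))            # blocked
print(handle(policy, "lead", "destructive", audit))             # held
print(handle(policy, "lead", "destructive", audit, True))       # execute
print(len(audit))                                               # 3
```

Every request is recorded, including the ones that never run, which is what makes the audit trail useful for reviewing denials as well as executions.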

Autonomy modes

The agent’s autonomy is a ceiling, not a license.

The continuous quality agent (Phase 13) runs through the same pipeline. The configured autonomy mode decides which permission levels it may auto-approve; nothing above the mode executes without explicit human approval. The modes are cumulative.

Mode	Name	Auto-approves	Can open PRs
0	read-only-advisor	nothing (read-only)	no
1	assisted-testing	safe-write	no
2	approved-execution	+ execute-tests	no
3	pull-request-automation	+ code-write	yes (labeled)
4	governed-autonomy	+ external-network	yes (labeled)

destructive and admin are never auto-approved at any mode, and the agent never auto-merges. A team starts at Mode 0 (advice only) and raises the mode deliberately.
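Because the modes are cumulative, the table reduces to a small lookup. The Python sketch below encodes it directly; the function names are hypothetical, and treating read-only actions as needing no approval is an assumption drawn from the statement that only actions above read-only flow through the pipeline.

```python
# Permission level each mode adds, copied from the table above.
MODE_ADDS = {
    0: set(),                 # read-only-advisor
    1: {"safe-write"},        # assisted-testing
    2: {"execute-tests"},     # approved-execution
    3: {"code-write"},        # pull-request-automation
    4: {"external-network"},  # governed-autonomy
}
NEVER_AUTO = {"destructive", "admin"}  # never auto-approved at any mode


def auto_approved(mode: int) -> set[str]:
    """Return every level the agent may auto-approve at this mode."""
    if mode not in MODE_ADDS:
        raise ValueError(f"unknown autonomy mode: {mode}")
    levels = set()
    for m in range(mode + 1):  # cumulative: include all lower modes
        levels |= MODE_ADDS[m]
    return levels - NEVER_AUTO


def needs_human(mode: int, level: str) -> bool:
    """True if an action at this level must wait for explicit approval."""
    return level != "read-only" and level not in auto_approved(mode)


def can_open_prs(mode: int) -> bool:
    """Per the table, only modes 3 and 4 may open (labeled) pull requests."""
    return mode >= 3


print(sorted(auto_approved(2)))       # ['execute-tests', 'safe-write']
print(needs_human(4, "destructive"))  # True
print(can_open_prs(2))                # False
```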

Customize

Universal cores, additive project tuning.

Per Decision D19, every shipped rubric, tag taxonomy, and detector uses universal English and software-engineering vocabulary only — the same workflow runs against a banking app, a hospital system, or an internal dashboard. Your domain enters additively.

<workspace>/config.yaml
Extend rubric keyword sets, tag namespaces, and policy overrides. Extensions union with the universal core — you cannot remove a default.
documents/uploaded/
Drop your real requirements and docs as Markdown. The Phase-2/3 helpers find and parse them — the requirements Test Commander reviews are yours.
policy/permissions.yaml + approvals.yaml
Define roles and which permission levels each may reach, and which actions require an approval. The seven levels are the fixed contract; you tune the roles.
continuous/config.yaml
Set the autonomy mode (0–4) and the label applied to agent-opened pull requests.
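The additive rule for config.yaml extensions (extensions union with the universal core, and a default cannot be removed) behaves like a set union. The Python sketch below illustrates it; the rubric names, the keywords, and the function effective_keywords are hypothetical, and the real config.yaml schema is not shown on this page.

```python
def effective_keywords(
    core: dict[str, set[str]],
    extensions: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Union project extensions into the core keyword sets."""
    merged = {}
    for name in sorted(core.keys() | extensions.keys()):
        # A union can only add entries, so no core default can be dropped.
        merged[name] = sorted(core.get(name, set()) | extensions.get(name, set()))
    return merged


core = {"ambiguity": {"should", "may"}}
extensions = {"ambiguity": {"best effort"}, "compliance": {"PCI"}}

print(effective_keywords(core, extensions))
# {'ambiguity': ['best effort', 'may', 'should'], 'compliance': ['PCI']}
```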

The full guides live in the repo.

Each shipped phase has an end-to-end walkthrough under docs/user-guide/ in the test-commander repo. Each walkthrough drives the seeded fixture and shows verbatim output. These are the authoritative references.

Command reference
Every shipped command, indexed by phase, linking each per-command page.
Workspace reference
Per-directory ownership inside .test-commander/.
Customizing for your project
The D19 extension model: config.yaml, tag namespaces, autonomy + sandbox config.
Autonomy levels
The five continuous-quality modes and how the gate maps them to the pipeline.
Governance guide
The controlled-execution pipeline, end to end.
Install guide
Prerequisites and the full make install path across macOS, Linux, WSL2, Git Bash.
