Nick Baynham

Test Commander / Documentation · Phases 0–13 shipped

End-user documentation

Test Commander is a Claude Code plugin plus a small Python runtime. It turns a project’s requirements, source, specs, recorded API traffic, and exploratory recordings into one committed workspace of structured quality artifacts — then runs that workspace from the terminal, a web console, an API, an MCP server, sandboxes, and a continuous agent. This page is the practical reference: install, the workflow order, the workspace, all 64 commands, governance, and the autonomy modes.

Getting started

Install once, then init per project.

Test Commander supports macOS, Linux, WSL2, and Git Bash (Decision D13). You install the plugin once from its repo, then run /tc:init inside each project you want to test. The workspace lives at .test-commander/ and is committed to git.

install + initialize
# 1. Clone and provision the environment (Python 3.12, PDM, Docker, git, make).
$ git clone https://github.com/NickBaynham/test-commander
$ cd test-commander
$ ./bootstrap.sh          # verify prerequisites; auto-install the safe ones
$ make install            # validate manifests, register the local marketplace,
                          # install the plugin, verify the 20 skills

# 2. Inside YOUR project, initialize the workspace.
$ cd ~/projects/your-app
$ /tc:init                # copy the workspace template into .test-commander/
$ /tc:status              # read-only snapshot of the workspace + per-phase status
$ /tc:next                # ask what to do next for this project

The workflow order — let /tc:next guide you, or follow the phases

the workflow, phase by phase
/tc:init                    # 1.  workspace
/tc:review-requirements     # 2.  requirements quality
/tc:learn-from-docs ...     # 3.  project knowledge ingestion
/tc:create-charter          # 4.  scope an exploratory session
/tc:explore                 #     explore + classify
/tc:test-ideas              #     enrich the seeds
/tc:generate-bdd            # 5.  BDD + traceability
/tc:automation-plan         # 6.  score, then generate
/tc:automate                #     the Playwright suite
/tc:run                     # 7.  execute + collect evidence
/tc:report                  #     publish the quality report
/tc:learn                   # 8.  governed learning loop
/tc:visualize               # 9.  diagrams + infographics
/tc:web-start               # 10. read-only web console
# 10.5 governance · 11 API+MCP · 12 sandboxes · 13 continuous quality

The workspace

One committed directory is the source of truth.

Everything Test Commander produces lands under .test-commander/ as plain Markdown, YAML, and JSON. Every helper is idempotent and byte-deterministic: re-running against unchanged input produces identical bytes, so the workspace can be reviewed like any other source, in ordinary git diffs.

.test-commander/ — committed workspace layout
your-app/
└── .test-commander/            # committed to git like any other source
    ├── project.md              # project metadata
    ├── config.yaml             # YOUR domain extensions (D19)
    ├── journal/                # append-only narrative log
    ├── documents/uploaded/     # your requirements + docs (input)
    ├── requirements/           # reviews + open questions
    ├── product-knowledge/      # ingested entities, journeys, rules, impact map
    ├── charters/ · sessions/   # exploratory testing
    ├── exploration-notes/
    ├── test-ideas/             # tc-test-idea/v1 seeds, enriched
    ├── bdd/features/           # Gherkin with @req:/@cs: linkage
    ├── traceability/           # requirement + scenario maps, coverage
    ├── test-data/              # seed JSON, never inlined in test code (D6)
    ├── runs/ · evidence/       # execution records + screenshots/traces
    ├── quality-report/         # current report + committed history
    ├── lessons/                # governed learning inbox
    ├── visuals/                # Mermaid sources + rendered assets
    ├── policy/ · audit/        # governance: permissions, approvals, audit log
    ├── sandbox/                # sandbox config + state
    └── continuous/             # autonomy config + analysis artifacts
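
One way to get the byte-determinism described in this section is to serialize with a fixed key order and line ending, and to skip the write when the bytes already match. The sketch below is illustrative only: render_json and write_deterministic are hypothetical names, not Test Commander's actual helper API.

```python
import json
from pathlib import Path


def render_json(data: dict) -> bytes:
    """Serialize with sorted keys and a fixed layout so equal input gives equal bytes."""
    text = json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def write_deterministic(path: Path, data: dict) -> bool:
    """Write data to path; return True only if the file content changed."""
    payload = render_json(data)
    if path.exists() and path.read_bytes() == payload:
        return False  # unchanged input: leave the file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return True
```

With a helper shaped like this, a second run over the same input reports no change and leaves git with nothing to diff.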

Command reference

64 commands across 20 skills.

Each tc-* skill owns a set of /tc:* commands (Decision D1 — every skill is vendored in-repo). Three skills — tc-evidence, tc-governance, and tc-mcp — ship runtime rather than commands. Commands are read-only or proposal-first by default; anything that mutates the workspace or runs code flows through the governed pipeline.

tc-core

Phase 1

Workspace orchestration: initialize, inspect, journal, and recommend.

/tc:init
Copy the workspace template into .test-commander/. Idempotent — existing files preserved.
/tc:status
Print a read-only snapshot: per-bucket file counts, populated counts, per-phase status.
/tc:journal
Append a timestamped narrative entry to today's journal. Append-only.
/tc:next
Read the workspace state and recommend the next /tc:* command for this project.

tc-requirements

Phase 2

Requirements quality: rubric review, INVEST, acceptance criteria, coverage, and test-idea seeds.

/tc:review-requirements
Run the 16-dimension rubric on uploaded requirements; emit a review plus open questions.
/tc:review-user-stories
INVEST review of user stories: independent, negotiable, valuable, estimable, small, testable.
/tc:review-acceptance-criteria
Review acceptance criteria for testability, completeness, and clarity.
/tc:requirements-coverage
Build the requirement coverage map across the workspace.
/tc:requirements-to-tests
Seed a tc-test-idea/v1 file for every requirement (skip-not-overwrite).
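
The skip-not-overwrite behavior of /tc:requirements-to-tests can be pictured roughly as follows. This is a minimal sketch under assumptions: the function name, the one-file-per-requirement naming scheme, and the seed body are hypothetical, and the real tc-test-idea/v1 fields may differ.

```python
from pathlib import Path


def seed_test_ideas(requirement_ids: list[str], out_dir: Path) -> dict[str, list[str]]:
    """Create one seed file per requirement; never overwrite an existing file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    result: dict[str, list[str]] = {"created": [], "skipped": []}
    for req_id in sorted(requirement_ids):  # sorted for a stable, repeatable order
        target = out_dir / f"{req_id}.yaml"  # hypothetical naming scheme
        if target.exists():
            result["skipped"].append(req_id)  # keep any human edits intact
            continue
        # Hypothetical seed body; only the schema name comes from the documentation.
        body = f"schema: tc-test-idea/v1\nrequirement: {req_id}\nideas: []\n"
        target.write_text(body, encoding="utf-8")
        result["created"].append(req_id)
    return result
```

Running it twice creates nothing new the second time, and a seed edited by hand in between survives unchanged.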

tc-knowledge

Phase 3

Project knowledge ingestion: five helpers extract structured artifacts with file:line provenance.

/tc:learn-from-docs
Extract entities, terms, and user journeys from uploaded documents.
/tc:learn-from-specs
Extract endpoints and contracts from API specifications.
/tc:learn-from-code
Extract modules and business rules from source, each with path:line provenance.
/tc:learn-from-api
Extract behavior from recorded API traffic.
/tc:learn-from-tests
Extract existing coverage from the project's current tests.

tc-explore

Phase 4

Charter-based exploratory testing against a recorded Playwright session.

/tc:create-charter
Scope an exploratory session against the project knowledge and risk areas.
/tc:explore
Classify each recorded event into six universal observation types and six anomaly categories.
/tc:session-summary
Synthesize the exploration session summary with a charter-coverage matrix.
/tc:test-ideas
Enrich the Phase-2 test-idea seeds with the exploration findings.

tc-bdd

Phase 5

BDD generation and review with machine-readable traceability tags.

/tc:generate-bdd
Render one Gherkin scenario per enrichment candidate with @req:/@cs: provenance.
/tc:review-bdd
Run a six-category universal BDD quality rubric.

tc-traceability

Phase 5

The cross-cutting map tying requirements to the scenarios that exercise them.

/tc:traceability-map
Rebuild the requirement and scenario-level traceability chains.
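
Because the @req:/@cs: tags are machine-readable, a requirement-to-scenario map can be rebuilt from the Gherkin text alone. The sketch below shows the idea; the tag ID grammar (for example REQ-001), the sample feature, and the function name are assumptions for illustration, not the shipped implementation.

```python
import re
from collections import defaultdict

# Assumed tag shape: "@req:<id>" or "@cs:<id>"; the real ID grammar may differ.
TAG = re.compile(r"@(req|cs):([\w.-]+)")

# Hypothetical feature file content, used only to exercise the function.
FEATURE = """\
Feature: Login

  @req:REQ-001 @cs:CS-010
  Scenario: Valid credentials
    Given a registered user

  @req:REQ-001 @req:REQ-002
  Scenario: Locked account
    Given a locked user
"""


def requirement_map(feature_text: str) -> dict[str, list[str]]:
    """Map each @req: ID to the titles of the scenarios tagged with it."""
    mapping: dict[str, list[str]] = defaultdict(list)
    pending: list[str] = []  # @req: IDs seen on tag lines above the next scenario
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            pending += [value for kind, value in TAG.findall(line) if kind == "req"]
        elif line.startswith("Feature:"):
            pending = []  # simplification: feature-level tags are ignored
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            title = line.split(":", 1)[1].strip()
            for req_id in pending:
                mapping[req_id].append(title)
            pending = []
    return dict(mapping)
```

A requirement ID that never appears as a key in the result has no scenario exercising it, which is the kind of gap a coverage view would surface.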

tc-build-framework

Phase 6

The lazily-scaffolded Playwright + TypeScript framework.

/tc:build-framework
Scaffold the project-root tests/ tree, playwright.config.ts, and package.json only when first needed (D8).

tc-automation-plan

Phase 6

The strategic gate before any code is generated.

/tc:automation-plan
Score every scenario against a seven-factor suitability rubric: automate / consider / manual.

tc-automate

Phase 6

Generate and mechanically review the automation.

/tc:automate
Generate page objects, fixtures, and specs with @req:/@cs: provenance and fixture-mediated data.
/tc:review-automation
Mechanically review the generated suite for quality and framework compliance.

tc-test-data

Phase 6

The data discipline (D6): nothing inlined in test code.

/tc:generate-test-data
Populate test-data/ seed JSON and a per-area spec consumed via fixtures.

tc-run

Phase 7

Execution and failure triage.

/tc:run
Orchestrate suite execution; capture per-run records and route evidence per config policy.
/tc:analyze-results
Classify failures by responsible layer without weakening assertions or adding sleeps.

tc-quality-report

Phase 7

The quality report and release-readiness gate.

/tc:report
Write the current quality report and snapshot a copy to committed history.
/tc:quality-gate
Evaluate release-readiness against project-defined thresholds; separate facts from interpretation.

tc-evidence

Phase 7

Runtime, not commands: the evidence indexer that routes screenshots, traces, and logs into evidence/ per the config policy.

runtime — no /tc:* commands

tc-learning

Phase 8

The governed learning loop — nothing promoted silently.

/tc:learn
Open and seed the governed lessons inbox.
/tc:learn-from-failures
Derive candidate lessons from test failures.
/tc:learn-from-exploration
Derive candidate lessons from exploration sessions.
/tc:learn-from-feedback
Derive candidate lessons from human feedback.
/tc:review-lessons
Review candidate lessons: accept, reject, or flag for human review.
/tc:promote-lessons
Promote accepted lessons into project guidance — every promotion visible in git diff.

tc-visualize

Phase 9

Visual documentation: eight diagram types, infographics, and a deterministic renderer.

/tc:visualize
Generate the workspace's full diagram set.
/tc:diagram-architecture
Architecture diagram (Mermaid source).
/tc:diagram-coverage
Coverage diagram.
/tc:diagram-flow
User-flow diagram.
/tc:diagram-risk
Risk diagram.
/tc:diagram-sequence
Sequence diagram.
/tc:diagram-state
State diagram.
/tc:diagram-test-strategy
Test-strategy diagram.
/tc:diagram-traceability
Traceability diagram.
/tc:generate-infographic
Build a quality-report infographic.
/tc:render-visuals
Render Mermaid sources to SVG/PNG deterministically (degrades gracefully without the CLI).

tc-web

Phase 10

The read-only web console over the committed workspace.

/tc:web-init
Provision the console config (.web/console.json) inside the workspace. Idempotent.
/tc:web-start
Bring up the console stack (Next.js + FastAPI) on docker compose. Read-only.
/tc:web-sync
Reconcile the SQLite index with the workspace (a clean rebuild).
/tc:web-index-artifacts
Rebuild the index from the workspace into .web/index.db.
/tc:web-export
Export the console view as a deterministic static bundle.

tc-governance

Phase 10.5

Runtime, not commands: the controlled-execution pipeline behind the console's /api/execute — intent → plan → permission policy → approval gate → bounded execution → output validation → audit. Default deny; no backdoor.

runtime — no /tc:* commands

tc-mcp

Phase 11

Runtime, not commands: an expanded Runtime API (apps/api) and a schema-first MCP server (apps/mcp). Both are alternative front-ends to the same governed pipeline; the seven permission levels are enforced server-side.

runtime — no /tc:* commands

tc-sandbox

Phase 12

On-demand, team-accessible environments launched from GitHub Actions, governed and safe-by-default.

/tc:sandbox-init
Write the sandbox config (provider, target, allow-list, private-range block). Skip-not-overwrite.
/tc:sandbox-launch
Launch the sandbox via its provider and persist state. Idempotent; dry-run by default.
/tc:sandbox-status
Report the persisted sandbox state (none / running / stopped).
/tc:sandbox-sync
Push the committed workspace into the sandbox.
/tc:sandbox-stop
Tear the sandbox down. Idempotent.
/tc:sandbox-export
Write a shareable bundle of endpoints, labels, and status.

tc-continuous-quality

Phase 13

The continuous quality agent: watch → analyze → propose → PR, gated by the autonomy mode.

/tc:watch-changes
Detect changed files from a pull-request or push diff.
/tc:impact-analysis
Map changed files to impacted features and requirements (deterministic; never invents impact).
/tc:coverage-gap-analysis
Find impacted features that lack coverage (never invents coverage).
/tc:propose-tests
Propose BDD/automation for the gaps — safe-write; never opens a PR.
/tc:create-test-pr
Open a clearly-labeled PR through the pipeline; gated by the autonomy mode.
/tc:continuous-quality-check
Run the whole loop under the configured autonomy mode.
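
The "never invents impact" rule for /tc:impact-analysis can be illustrated with a lookup that reports unmatched files instead of guessing. This is a sketch under assumptions: the pattern-to-feature map, its contents, and the function name are hypothetical, and the format of the real impact map in product-knowledge/ is not shown on this page.

```python
from fnmatch import fnmatchcase

# Hypothetical impact map: path pattern -> feature names.
IMPACT_MAP = {
    "src/auth/*": ["login", "password-reset"],
    "src/billing/*": ["invoicing"],
}


def analyze_impact(changed_files: list[str], impact_map: dict[str, list[str]]) -> dict:
    """Return the impacted features plus the changed files no pattern covers."""
    impacted: set[str] = set()
    unmapped: list[str] = []
    for path in sorted(changed_files):  # sorted input keeps the output deterministic
        hits = [
            feature
            for pattern, features in impact_map.items()
            if fnmatchcase(path, pattern)
            for feature in features
        ]
        if hits:
            impacted.update(hits)
        else:
            unmapped.append(path)  # report it; do not guess an impact
    return {"impacted": sorted(impacted), "unmapped": unmapped}
```

Keeping unmapped files as a separate list makes the gap visible to a reviewer without attributing the change to any feature.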

Governance

One execution path. Default deny. No backdoor.

From Phase 10.5 on, every action above read-only flows through a single controlled-execution pipeline. The web console, the Runtime API, the MCP server, sandboxes, and the continuous agent are all front-ends to it — none can bypass it.

  1. Intent router
  2. Command planner
  3. Permission policy
  4. Approval gate
  5. Bounded execution
  6. Output validation
  7. Audit log

An unsafe request is blocked before the agent is ever reached. A privileged action that is not approved is held — no execution, no change. An approved action is executed in bounds, its diff validated against the plan, and the whole action written to an append-only audit journal. The seven permission levels — read-only, safe-write, code-write, execute-tests, external-network, destructive, admin — are resolved per role and enforced server-side.
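
The blocked / held / executed outcomes described above can be sketched as a small default-deny check. The seven level names come from this page; the role table, the function name, and the simplification that every level above read-only needs an approval are assumptions (in the real system, roles and approval rules are defined in policy/permissions.yaml and approvals.yaml).

```python
from enum import Enum


class Level(Enum):
    READ_ONLY = "read-only"
    SAFE_WRITE = "safe-write"
    CODE_WRITE = "code-write"
    EXECUTE_TESTS = "execute-tests"
    EXTERNAL_NETWORK = "external-network"
    DESTRUCTIVE = "destructive"
    ADMIN = "admin"


# Hypothetical role table, for illustration only.
ROLE_LEVELS = {
    "viewer": {Level.READ_ONLY},
    "tester": {Level.READ_ONLY, Level.SAFE_WRITE, Level.EXECUTE_TESTS},
}


def decide(role: str, level: Level, approved: bool) -> str:
    """Return 'blocked', 'held', or 'execute' for one requested action."""
    allowed = ROLE_LEVELS.get(role, set())  # unknown role: empty set, so default deny
    if level not in allowed:
        return "blocked"
    if level is not Level.READ_ONLY and not approved:
        return "held"  # privileged and unapproved: no execution, no change
    return "execute"
```

For example, decide("tester", Level.EXECUTE_TESTS, approved=False) returns "held", and any role missing from the table is blocked even for read-only requests.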

Autonomy modes

The agent’s autonomy is a ceiling, not a license.

The continuous quality agent (Phase 13) runs through the same pipeline. The configured autonomy mode decides which permission levels it may auto-approve; nothing above the mode executes without explicit human approval. The modes are cumulative.

Mode  Name                     Auto-approves        Can open PRs
0     read-only-advisor        nothing (read-only)  no
1     assisted-testing         safe-write           no
2     approved-execution       + execute-tests      no
3     pull-request-automation  + code-write         yes (labeled)
4     governed-autonomy        + external-network   yes (labeled)

destructive and admin are never auto-approved at any mode, and the agent never auto-merges. A team starts at Mode 0 (advice only) and raises the mode deliberately.
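
The cumulative rule in the mode table can be written out as a few lines of Python. The level names and mode numbers are taken from this page; the function names and the data layout are illustrative assumptions rather than the agent's actual code.

```python
# Levels each mode adds, per the mode table; modes are cumulative.
MODE_ADDS = {
    0: set(),
    1: {"safe-write"},
    2: {"execute-tests"},
    3: {"code-write"},
    4: {"external-network"},
}
NEVER_AUTO = {"destructive", "admin"}  # never auto-approved at any mode


def auto_approved(mode: int) -> set[str]:
    """Return every permission level the agent may auto-approve at this mode."""
    if mode not in MODE_ADDS:
        raise ValueError(f"unknown autonomy mode: {mode}")
    levels: set[str] = set()
    for m in range(mode + 1):
        levels |= MODE_ADDS[m]
    return levels - NEVER_AUTO


def can_open_pr(mode: int) -> bool:
    """Modes 3 and 4 may open (labeled) pull requests; lower modes may not."""
    return mode >= 3


def needs_human_approval(mode: int, level: str) -> bool:
    """Anything above read-only that the mode does not cover waits for a human."""
    return level != "read-only" and level not in auto_approved(mode)
```

At Mode 2, for instance, the set is safe-write plus execute-tests, so a code-write action still waits for explicit approval.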

Customize

Universal cores, additive project tuning.

Per Decision D19, every shipped rubric, tag taxonomy, and detector uses universal English and software-engineering vocabulary only — the same workflow runs against a banking app, a hospital system, or an internal dashboard. Your domain enters additively.

  • <workspace>/config.yaml

    Extend rubric keyword sets, tag namespaces, and policy overrides. Extensions union with the universal core — you cannot remove a default.

  • documents/uploaded/

    Drop your real requirements and docs as Markdown. The Phase-2/3 helpers find and parse them — the requirements Test Commander reviews are yours.

  • policy/permissions.yaml + approvals.yaml

    Define roles and which permission levels each may reach, and which actions require an approval. The seven levels are the fixed contract; you tune the roles.

  • continuous/config.yaml

    Set the autonomy mode (0–4) and the label applied to agent-opened pull requests.
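
The additive rule for config.yaml (extensions union with the universal core, and a default cannot be removed) amounts to a set union per rubric. The sketch below assumes a simple name-to-keywords mapping; the rubric name, the keywords, and the function name are hypothetical, since the actual config schema is not shown on this page.

```python
# Hypothetical universal core: rubric name -> default keyword set.
UNIVERSAL_CORE = {
    "ambiguity": {"should", "may"},
}


def merge_extensions(
    core: dict[str, set[str]], extensions: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Union project keywords into the core; nothing in the core is ever dropped."""
    merged: dict[str, list[str]] = {}
    for name in sorted(set(core) | set(extensions)):
        combined = set(core.get(name, set())) | set(extensions.get(name, set()))
        merged[name] = sorted(combined)  # sorted so the result is deterministic
    return merged
```

Because the merge only ever adds, there is no input that shrinks a default set, which matches the "you cannot remove a default" rule.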

More documentation

The full guides live in the repo.

Each shipped phase has an end-to-end walkthrough under docs/user-guide/ in the test-commander repo, and each walkthrough drives the seeded fixture and shows verbatim output. These are the authoritative references.
