Flagship project · Phases 0–13 shipped · Project complete

Test Commander

AI-assisted software testing from exploration to automation.

A practical workflow for testers: explore software, identify user flows, design scenarios, generate BDD specs, produce deterministic Playwright automation, and ship quality reports — all from the terminal. Not a replacement for testers. An assistant for them.

Autonomous where safe. Human-governed where it matters.

~/test-commander · workflow demo

$ ./bootstrap.sh && make install
$ /tc:init
$ /tc:review-requirements
$ /tc:learn-from-docs
$ /tc:create-charter --target "Sign-in flow"
$ /tc:explore --charter CH-001
$ /tc:test-ideas --session SESS-20260528-600
$ /tc:generate-bdd
$ /tc:automation-plan
$ /tc:automate
$ /tc:traceability-map

// These commands take a tester from a cold local app through requirements, knowledge, exploration, BDD, and a generated Playwright suite. The agent does the heavy lifting; the tester stays in charge of every decision in between.

Test Commander turns human testing insight into structured automation and quality evidence.

Human judgement: Testers stay in charge of scope, risk, and what “good enough” means.
Agent assistance: Exploration, documentation, drafting, generation, summarization.
Deterministic evidence: Playwright tests, BDD specs, and reports a CI/CD pipeline can rely on.

What it is

A terminal-first testing assistant.

Test Commander is a workflow and framework for modern quality engineering. It treats AI as a structured assistant, not a magic test generator. The tester stays in control; the agent does the heavy lifting.

Exploratory notes
Structured ideas
BDD specifications
Playwright tests
Quality evidence

Why it exists

Most testing teams hit the same wall.

Manual testing knowledge stays trapped in people’s heads. Exploration produces insight but inconsistent documentation. Automation lags behind delivery. Reports are manual, inconsistent, or missing. Teams want to use AI but worry about reliability. Test Commander connects exploration, design, automation, and reporting so the work compounds instead of dispersing.

Manual testing knowledge becomes reusable, not lost.
Exploration produces documentation, not anecdotes.
Automation starts from intent, not screen-recorded clicks.
Reports communicate evidence, not opinions.
AI accelerates the boring parts; humans own the calls.
The same workflow runs locally and in CI/CD.

For manual testers

Your thinking becomes the input.

Think of Test Commander as an assistant that turns your testing instincts into organized artifacts. You decide what matters. The agent helps document, structure, and automate. Here is what that conversion looks like on a real workflow.

You observe

Exploring a shopping-cart flow, you notice friction the team has not catalogued.

The cart count does not always update.
Invalid search terms are handled inconsistently.
Required form fields do not show clear errors.
Login behavior changes after a failed attempt.
Some buttons are hard to locate reliably.

Test Commander produces

Those observations become reusable artifacts in minutes, ready for review.

User-flow documentation for the cart and checkout.
Page-object candidates with stable locators.
Defects and risks logged with severity hints.
BDD scenarios you can review and approve.
Playwright tests and quality-report findings.

The workflow

A loop, not a one-way street.

Seven steps, designed to feed each other. Insight gathered in one cycle seeds the exploration in the next.

01
Explore
Charter-based sessions with structured anomaly capture.
02
Model
Ingest documents, specs, code, recordings, tests.
03
Specify
Reviewed requirements and traceable BDD specs.
04
Automate
Playwright suite generated from scored candidates.
05
Execute
Local + CI runs with evidence captured per run.
06
Report
Quality report with history; release-readiness scoring.
07
Improve
Governed lessons; nothing promoted silently.

01. Explore

Phase 4 (shipped 2026-05-28). /tc:create-charter scopes a session against the project knowledge; /tc:explore classifies every recorded Playwright event into six universal observation types and six universal anomaly categories with a Charter-Coverage matrix; an internal exploration-review sub-mode routes gap signals to requirements/open-questions.md.
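To make the classification step concrete, here is a minimal sketch of how recorded events might be mapped onto anomaly categories. The event-type and category names come from the exploration-note sample further down this page; the `RecordedEvent` shape, the 2,000 ms slow-response threshold, and the rules themselves are assumptions for illustration, not the shipped detector (which covers six categories).

```typescript
// Hypothetical shape of one recorded Playwright event (not Test Commander's schema).
interface RecordedEvent {
  index: number;
  eventType: "page_load" | "click" | "network_request";
  page: string;
  status?: number; // observed HTTP status, when the event has one
  expectedStatus?: number; // documented status from project knowledge, if known
  durationMs?: number;
}

type AnomalyCategory = "auth-mismatch" | "broken-link" | "slow-response";

interface Anomaly {
  category: AnomalyCategory;
  page: string;
  eventIndex: number;
}

// Illustrative threshold; a real project would use its documented load thresholds.
const SLOW_RESPONSE_MS = 2000;

const isAuthRejection = (s: number): boolean => s === 401 || s === 403;
const isSuccess = (s: number): boolean => s >= 200 && s < 300;

function classifyEvent(e: RecordedEvent): Anomaly[] {
  const found: Anomaly[] = [];
  const add = (category: AnomalyCategory): void => {
    found.push({ category, page: e.page, eventIndex: e.index });
  };

  if (e.status !== undefined) {
    // Auth mismatch: docs say "rejected" but the call succeeded, or the reverse.
    if (
      e.expectedStatus !== undefined &&
      ((isAuthRejection(e.expectedStatus) && isSuccess(e.status)) ||
        (isSuccess(e.expectedStatus) && isAuthRejection(e.status)))
    ) {
      add("auth-mismatch");
    }
    if (e.status === 404) add("broken-link");
  }
  if (e.durationMs !== undefined && e.durationMs > SLOW_RESPONSE_MS) {
    add("slow-response");
  }
  return found;
}

// Classify a whole session, preserving event order.
function classifySession(events: RecordedEvent[]): Anomaly[] {
  return events.reduce<Anomaly[]>((acc, e) => acc.concat(classifyEvent(e)), []);
}
```

Keeping each rule a pure function of one event makes the classification reproducible when a recorded session is replayed.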

02. Model

Phase 3 (shipped 2026-05-27). Five /tc:learn-from-* commands extract entities, terms, journeys, endpoints, modules, recorded responses, and test coverage into ten structured product-knowledge artifacts under .test-commander/product-knowledge/ with full path:line provenance. A shared synthesizer rebuilds system-model.md byte-deterministically at the end of every run.
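Byte-deterministic output mostly comes down to rendering from a total order with no clock reads or hash-order dependence. A minimal sketch, assuming a hypothetical `KnowledgeEntry` record that carries its `path:line` source; the heading and bullet format are illustrative, not the shipped synthesizer's.

```typescript
// Hypothetical record produced by one of the learn-from-* helpers.
interface KnowledgeEntry {
  kind: string; // e.g. "entity", "endpoint", "journey"
  name: string;
  path: string; // source file, relative to the project root
  line: number; // 1-based line the claim came from
}

// Total order over entries. Plain < on strings compares UTF-16 code units,
// so the result does not depend on the machine's locale (unlike localeCompare).
function compareEntries(a: KnowledgeEntry, b: KnowledgeEntry): number {
  const keyA = [a.kind, a.name, a.path];
  const keyB = [b.kind, b.name, b.path];
  for (let i = 0; i < keyA.length; i++) {
    if (keyA[i] < keyB[i]) return -1;
    if (keyA[i] > keyB[i]) return 1;
  }
  return a.line - b.line;
}

// One section per kind, one bullet per entry with its path:line provenance.
// Fixed "\n" line endings and a trailing newline: same input, same bytes.
function renderSystemModel(entries: KnowledgeEntry[]): string {
  const sorted = [...entries].sort(compareEntries);
  const out: string[] = ["# System model", ""];
  let currentKind: string | null = null;
  for (const e of sorted) {
    if (e.kind !== currentKind) {
      if (currentKind !== null) out.push("");
      out.push(`## ${e.kind}`, "");
      currentKind = e.kind;
    }
    out.push(`- ${e.name} (${e.path}:${e.line})`);
  }
  return out.join("\n") + "\n";
}
```

Because the input is sorted before rendering, the order in which the helpers happened to emit entries cannot leak into the committed file, so re-runs show up in git diff only when the underlying knowledge changed.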

03. Specify

Phase 2 (shipped 2026-05-27) delivers the requirements layer; Phase 5 (shipped 2026-05-29) turns enriched test ideas into Gherkin. /tc:generate-bdd renders one scenario per enrichment candidate with machine-readable @req:/@cs: provenance, /tc:review-bdd runs a six-category universal rubric, and /tc:traceability-map rebuilds the requirement-level and scenario-level maps that tie each requirement forward to the scenarios that exercise it.

04. Automate

Phase 6 (shipped 2026-05-29) produces the project’s first executable artifacts. /tc:build-framework lazily scaffolds a Playwright + TypeScript framework; /tc:automation-plan scores every scenario against a seven-factor suitability rubric; /tc:automate generates page objects, fixtures, and specs with @req:/@cs: provenance; /tc:review-automation enforces quality; and /tc:generate-test-data keeps data in .test-commander/test-data/ rather than inline in code.
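The scoring gate can be pictured as an average over factor scores with two cut-offs. A minimal sketch: it uses only the five factors named in Stage 5 of the implementation roadmap further down (the other two of the seven are not named on this page), and the 1-to-5 scale, equal weights, and the 3.5 / 2.5 thresholds are all assumptions.

```typescript
// Each factor scored 1 (poor fit for automation) to 5 (good fit).
// Assumption: cost-like factors are inverted, so 5 = cheap to maintain / very stable UI.
interface FactorScores {
  businessCriticality: number;
  repeatability: number;
  determinism: number;
  uiStability: number;
  maintenanceCost: number;
}

type Recommendation = "automate" | "consider" | "manual";

// Hypothetical cut-offs on the average score.
const AUTOMATE_AT = 3.5;
const CONSIDER_AT = 2.5;

function suitability(s: FactorScores): number {
  const values = [s.businessCriticality, s.repeatability, s.determinism, s.uiStability, s.maintenanceCost];
  for (const v of values) {
    if (!Number.isInteger(v) || v < 1 || v > 5) {
      throw new RangeError(`factor score out of range: ${v}`);
    }
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function recommend(s: FactorScores): Recommendation {
  const score = suitability(s);
  if (score >= AUTOMATE_AT) return "automate";
  if (score >= CONSIDER_AT) return "consider";
  return "manual";
}

// Best-first ranking; ties broken by id so the plan is stable across runs.
function rankScenarios(scenarios: { id: string; scores: FactorScores }[]) {
  return scenarios
    .map((sc) => ({ id: sc.id, score: suitability(sc.scores), recommendation: recommend(sc.scores) }))
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}
```

The id tie-breaker matters for the same reason as elsewhere on this page: an unchanged input should regenerate an identical plan.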

05. Execute

Phase 7 (shipped). /tc:run orchestrates suite execution and /tc:analyze-results triages failures; per-run records land in .test-commander/runs/; screenshots, traces, and logs route to .test-commander/evidence/ with the policy defined in config.yaml. The same workflow runs locally and in CI.
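The evidence policy lives in config.yaml, but its keys are not documented on this page. A minimal sketch of how such a policy might route artifacts, assuming a hypothetical per-kind rule (`always`, `on-failure`, `never`) and a hypothetical `<run-id>/<kind>/<file>` layout under .test-commander/evidence/.

```typescript
type ArtifactKind = "screenshot" | "trace" | "log";
type KeepRule = "always" | "on-failure" | "never";

// Hypothetical shape of the evidence section of config.yaml.
type EvidencePolicy = Record<ArtifactKind, KeepRule>;

interface RunArtifact {
  kind: ArtifactKind;
  fileName: string;
  testPassed: boolean;
}

const EVIDENCE_ROOT = ".test-commander/evidence";

// Returns the destination path, or null when the policy says to drop the artifact.
function routeEvidence(runId: string, artifact: RunArtifact, policy: EvidencePolicy): string | null {
  const rule = policy[artifact.kind];
  const keep = rule === "always" || (rule === "on-failure" && !artifact.testPassed);
  if (!keep) return null;
  return `${EVIDENCE_ROOT}/${runId}/${artifact.kind}/${artifact.fileName}`;
}
```

Keeping the decision a pure function of policy and outcome means a local run and a CI run with the same config file route evidence identically.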

06. Report

Phase 7 (shipped). /tc:report writes .test-commander/quality-report/current-quality-report.md and snapshots a copy to history/YYYY-MM-DD-HHmm.md. /tc:quality-gate evaluates release-readiness against project-defined thresholds. Facts, interpretation, and human-review items stay clearly separated.
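Two mechanical pieces of this step are easy to sketch: naming the history snapshot after the `YYYY-MM-DD-HHmm` pattern, and checking run metrics against thresholds. The sketch assumes UTC timestamps and a hypothetical threshold shape (minimum pass rate, maximum open high-severity anomalies); the keys /tc:quality-gate actually reads are not shown on this page.

```typescript
// history/YYYY-MM-DD-HHmm.md from UTC fields (an assumption), so the name
// does not depend on the machine's time zone.
function historySnapshotName(at: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  return (
    `history/${at.getUTCFullYear()}-${pad(at.getUTCMonth() + 1)}-${pad(at.getUTCDate())}` +
    `-${pad(at.getUTCHours())}${pad(at.getUTCMinutes())}.md`
  );
}

interface RunMetrics {
  passed: number;
  failed: number;
  openHighSeverity: number;
}

// Hypothetical project-defined thresholds.
interface GateThresholds {
  minPassRate: number; // 0..1
  maxOpenHighSeverity: number;
}

interface GateResult {
  ready: boolean;
  reasons: string[]; // factual reasons the gate failed; empty when ready
}

function evaluateGate(m: RunMetrics, t: GateThresholds): GateResult {
  const reasons: string[] = [];
  const total = m.passed + m.failed;
  if (total === 0) {
    reasons.push("no tests executed");
  } else if (m.passed / total < t.minPassRate) {
    reasons.push(`pass rate ${((m.passed / total) * 100).toFixed(1)}% below ${(t.minPassRate * 100).toFixed(1)}%`);
  }
  if (m.openHighSeverity > t.maxOpenHighSeverity) {
    reasons.push(`${m.openHighSeverity} open high-severity anomalies (max ${t.maxOpenHighSeverity})`);
  }
  return { ready: reasons.length === 0, reasons };
}
```

Returning every failed check, rather than stopping at the first, keeps the gate's output on the "facts" side of the report.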

07. Improve

Phase 8 (shipped). /tc:learn, /tc:learn-from-failures, /tc:learn-from-exploration, /tc:learn-from-feedback, /tc:review-lessons, and /tc:promote-lessons turn the workspace into a learning loop. Every promotion is visible in git diff — Test Commander never silently rewrites methodology.
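The “nothing promoted silently” rule is essentially a small state machine in which only a reviewed, accepted lesson can be promoted. A minimal sketch; the review outcomes (accepted, rejected, flagged for human review) follow Stage 8 of the adoption roadmap further down, while the state names, the transition table, and the `Lesson` shape are assumptions.

```typescript
type LessonState = "candidate" | "accepted" | "rejected" | "needs-review" | "promoted";

// Allowed transitions. "promoted" is reachable only from "accepted",
// so a lesson cannot skip review on its way into project guidance.
const TRANSITIONS: Record<LessonState, readonly LessonState[]> = {
  candidate: ["accepted", "rejected", "needs-review"],
  "needs-review": ["accepted", "rejected"],
  accepted: ["promoted"],
  rejected: [],
  promoted: [],
};

interface Lesson {
  id: string;
  state: LessonState;
  // Every move records who made it.
  history: { from: LessonState; to: LessonState; by: string }[];
}

function transition(lesson: Lesson, to: LessonState, by: string): Lesson {
  if (!TRANSITIONS[lesson.state].includes(to)) {
    throw new Error(`illegal transition ${lesson.state} -> ${to} for ${lesson.id}`);
  }
  if (by.trim() === "") {
    throw new Error("every transition needs a named reviewer");
  }
  // Return a new object instead of mutating, so callers can diff before and after.
  return { ...lesson, state: to, history: [...lesson.history, { from: lesson.state, to, by }] };
}
```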

The full roadmap

Every phase shipped, foundation first.

Test Commander was built phase by phase, each landing under strict test-driven discipline with its own annotated git tag (phase-0 … phase-13). The foundation (Phases 0–6) established the workspace and the exploration-to-automation pipeline; the later phases (7–13) added execution, reporting, learning, visuals, a governed web console, an API and MCP server, sandboxes, and a continuous quality agent. The whole roadmap is now complete.

Foundation · Phases 0–6

Phase 0 — Repository foundation, plugin scaffold, marketplace registration
Phase 1 — Workspace and artifact model (/tc:init, /tc:status, /tc:journal, /tc:next)
Phase 2 — Requirements quality (16-dimension rubric, INVEST review, acceptance-criteria review, coverage map, seeded test-ideas)
Phase 3 — Project knowledge ingestion (five /tc:learn-from-* helpers, shared synthesizer, ten product-knowledge artifacts)
Phase 4 — Charter-based exploratory testing (charters, recorded-session replay, session summaries, Phase-2 seed enrichment)
Phase 5 — BDD generation and traceability (Gherkin features with @req:/@cs: linkage, six-category review, requirement + scenario maps)
Phase 6 — Lazy Playwright/TypeScript framework, seven-factor automation plan, generated suite + review, test-data discipline (D6)

Now also shipped · Phases 7–13

Phase 7 — Execution, evidence policy, and the quality report with committed history
Phase 8 — Governed continuous learning loop (nothing promoted silently)
Phase 9 — Mermaid diagrams + infographics (eight /tc:diagram-* commands)
Phase 10 — Read-only web console (dashboard, journal, BDD viewer, run history, evidence)
Phase 10.5 — Controlled agent execution: the single governed-execution pipeline
Phase 11 — Runtime API + schema-first MCP server (front-ends to the same pipeline)
Phase 12 — Sandboxed environments launched from GitHub Actions, safe-by-default
Phase 13 — Continuous quality agent with five autonomy modes

What ships

Twenty skills, 64 commands, one workspace per project.

Each tc-* skill is owned in-repo (Decision D1: no community-skill dependencies). The commands route to bundled Python helpers; the workspace lives at .test-commander/ inside the consuming project and is committed to git like any other source artifact. Three skills (tc-evidence, tc-governance, tc-mcp) ship runtime components rather than /tc:* commands. The full per-command reference lives on the documentation page.

tc-core
Phase 1 · shipped
Workspace orchestration. Initialize, inspect, journal, recommend.
- /tc:init
- /tc:status
- /tc:journal
- /tc:next
tc-requirements
Phase 2 · shipped
Requirements quality. 16-dimension rubric, INVEST review, AC review, coverage, seed test-ideas.
- /tc:review-requirements
- /tc:review-user-stories
- /tc:review-acceptance-criteria
- /tc:requirements-coverage
- /tc:requirements-to-tests
tc-knowledge
Phase 3 · shipped
Project knowledge ingestion. Five helpers extract structured artifacts from documents, specs, code, recorded API traffic, and existing tests.
- /tc:learn-from-docs
- /tc:learn-from-specs
- /tc:learn-from-code
- /tc:learn-from-api
- /tc:learn-from-tests
tc-explore
Phase 4 · shipped
Charter-based exploratory testing. Scope a session, replay a recorded Playwright run, synthesize the summary, enrich the Phase-2 test-idea seeds.
- /tc:create-charter
- /tc:explore
- /tc:session-summary
- /tc:test-ideas
tc-bdd
Phase 5 · shipped
BDD generation and review. Render Gherkin from enriched test ideas with @req:/@cs: provenance; run a six-category universal rubric.
- /tc:generate-bdd
- /tc:review-bdd
tc-traceability
Phase 5 · shipped
The cross-cutting map. Rebuild the requirement and scenario-level traceability chains; downstream links resolve as phases populate them.
- /tc:traceability-map
tc-build-framework
Phase 6 · shipped
The lazy framework. Scaffold the project-root tests/ tree, playwright.config.ts, and package.json only when automation first needs them (D8).
- /tc:build-framework
tc-automation-plan
Phase 6 · shipped
The strategic gate. Score every scenario against a seven-factor suitability rubric and rank each as automate, consider, or manual.
- /tc:automation-plan
tc-automate
Phase 6 · shipped
Generation and review. Render page objects, fixtures, and specs with provenance and fixture-mediated data; mechanically review the result.
- /tc:automate
- /tc:review-automation
tc-test-data
Phase 6 · shipped
The data discipline. Populate test-data/ seed JSON and a per-area spec so nothing is inlined in test code (D6).
- /tc:generate-test-data
tc-run
Phase 7 · shipped
Execution and triage. Orchestrate the suite, capture per-run records, and classify failures without weakening assertions.
- /tc:run
- /tc:analyze-results
tc-quality-report
Phase 7 · shipped
The quality report. Write the current report with committed history and evaluate release-readiness against project thresholds.
- /tc:report
- /tc:quality-gate
tc-evidence
Phase 7 · shipped
The evidence indexer. Route screenshots, traces, and logs into .test-commander/evidence/ per the config policy. Runtime; no /tc:* commands.
- evidence indexer
tc-learning
Phase 8 · shipped
The governed learning loop. Capture lessons from failures, exploration, and feedback; review and promote them in visible git diffs.
- /tc:learn
- /tc:learn-from-failures
- /tc:learn-from-exploration
- /tc:learn-from-feedback
- /tc:review-lessons
- /tc:promote-lessons
tc-visualize
Phase 9 · shipped
Visual documentation. Eight diagram types, infographics, and a deterministic renderer turn the workspace into Mermaid sources and rendered assets.
- /tc:visualize
- /tc:diagram-*
- /tc:generate-infographic
- /tc:render-visuals
tc-web
Phase 10 · shipped
The read-only web console. A team-facing viewer over the committed workspace — dashboard, journal, BDD, runs, evidence — that never invents data.
- /tc:web-init
- /tc:web-start
- /tc:web-sync
- /tc:web-index-artifacts
- /tc:web-export
tc-governance
Phase 10.5 · shipped
The controlled-execution pipeline. Intent → plan → policy → approval → bounded execution → validation → audit. Default deny; the single path every action takes. Runtime; no /tc:* commands.
- governance pipeline
tc-mcp
Phase 11 · shipped
Runtime API + MCP server. Alternative front-ends that drive Test Commander over HTTP and the Model Context Protocol — through the same pipeline. Runtime; no /tc:* commands.
- Runtime API
- MCP server
tc-sandbox
Phase 12 · shipped
Sandboxed environments. Launch an on-demand, team-accessible Test Commander environment from GitHub Actions, governed and safe-by-default.
- /tc:sandbox-init
- /tc:sandbox-launch
- /tc:sandbox-status
- /tc:sandbox-sync
- /tc:sandbox-stop
- /tc:sandbox-export
tc-continuous-quality
Phase 13 · shipped
The continuous quality agent. Watch changes, map impact, find coverage gaps, propose tests, and open labeled PRs — gated by five autonomy modes.
- /tc:watch-changes
- /tc:impact-analysis
- /tc:coverage-gap-analysis
- /tc:propose-tests
- /tc:create-test-pr
- /tc:continuous-quality-check

User-guide walkthroughs

One reproducible walkthrough per shipped phase.

Every shipped phase has its own walkthrough under docs/user-guide/ in the test-commander repo. Each one drives the seeded fixture end to end, with verbatim sample output, so a reader can reproduce the result in a tmp workspace.

Phase 1
First workflow walkthrough
From clone to /tc:next: init the workspace, edit project metadata, append a journal entry, ask what to do next.
Phase 2
Reviewing requirements
Upload requirements.md, run the rubric pass, surface mutually exclusive open questions, seed tc-test-idea/v1 files for every REQ.
Phase 3
Building project knowledge
Drive five /tc:learn-from-* helpers against the seeded sample-project fixture; produce ten product-knowledge artifacts with file:line provenance.
Phase 4
Exploring an app
Charter -> explore -> session-summary -> test-ideas: scope an exploration, classify every recorded event into universal observation and anomaly cores, enrich the Phase-2 seeds.
Phase 5
Generating BDD
generate-bdd -> review-bdd -> traceability-map: render Gherkin from enriched test ideas with @req:/@cs: linkage, run the six-category rubric, and rebuild the requirement and scenario-level maps.
Phase 6
Automating a suite
build-framework -> automation-plan -> automate -> review-automation -> generate-test-data: score scenarios, generate a traceable Playwright/TypeScript suite, and keep test data out of the code.
Phase 7
Running tests
run -> analyze-results -> report -> quality-gate: execute the suite, triage failures, write the quality report with committed history, and score release-readiness.
Phase 8
The learning loop
Capture lessons from failures, exploration, and feedback, then review and promote them — every promotion visible in git diff, nothing rewritten silently.
Phase 9
Visuals and infographics
visualize -> the eight diagram-* commands -> generate-infographic -> render-visuals: turn the workspace into Mermaid sources and deterministically rendered assets.
Phase 10
The web console
web-init -> web-start -> web-sync: bring up a read-only, team-facing viewer over the committed workspace. Renders the artifacts; never invents data or runs a command.
Phase 10.5
Governance
The controlled-execution pipeline: how a user request becomes a planned, permission-checked, approved, validated, and audited action — with default deny and no backdoor.
Phase 11
Integrating (API + MCP)
Drive Test Commander from another tool or agent over the Runtime API or the schema-first MCP server — both front-ends to the same governed pipeline.
Phase 12
Sandboxes
sandbox-init -> sandbox-launch -> sandbox-status -> sandbox-stop: launch an on-demand environment from GitHub Actions, allow-listed and private-range-blocked by default.
Phase 13
Continuous quality
watch-changes -> impact-analysis -> coverage-gap-analysis -> propose-tests -> create-test-pr: the watch -> analyze -> propose -> PR loop, gated by the configured autonomy mode.

For a single hands-on tour that walks the workflow as it stood after Phase 4 against one tmp project, read the blog post “Test Commander after Phase 4: a hands-on tour.”

What it produces

Artifacts you can review, automate, and ship.

The output is not just “a test ran.” It is structured artifacts your team can read, edit, version-control, and learn from.

Charter · YAML frontmatter (Phase 4 · shipped)

.test-commander/charters/CH-001.md

---
id: CH-001
mission: Discover whether the Sign-in flow plus workspace-detail asset upload
  behaves correctly under the documented risk conditions.
target: Sign-in flow plus workspace-detail asset upload (POST /workspaces/{id}/assets).
time-box: 60min
risk-areas:
  - Authentication / authorization boundaries
  - Session lifecycle and token leakage
  - Performance under documented load thresholds
acceptance-criteria:
  - Every flow under '...' completes the happy path with documented status codes.
  - Authentication is correctly enforced for every endpoint that should require it.
  - At least one anomaly per universal category is documented or explained away.
created_at: 2026-05-28T18:47:33Z
phase_3_sources:
  - product-knowledge/entities.md
  - product-knowledge/user-journeys.md
  - requirements/open-questions.md
---

Exploration note · table excerpt (Phase 4 · shipped)

.test-commander/exploration-notes/SESS-20260528-600.md

# SESS-20260528-600 - exploration note for CH-001

## Observations

| # | event_type      | Page             | Result |
| - | --------------- | ---------------- | ------ |
| 0 | page_load       | /sign-in         | ok     |
| 4 | click           | /sign-in         |        |
| 5 | network_request | /sign-in         | 201    |
| 8 | network_request | /dashboard       | 200    |

## Anomalies

| Category         | Severity | Page             | Evidence |
| ---------------- | -------- | ---------------- | -------- |
| auth-mismatch    | high     | /workspaces/ws-1 | S-005    |
| broken-link      | medium   | /account/profile | S-004    |
| slow-response    | high     | /dashboard       | S-002    |

Enriched test-idea · Phase-2 seed + Phase-4 enrichment

.test-commander/test-ideas/REQ-005.md

---
schema: tc-test-idea/v1
requirement_id: REQ-005
requirement_title: All API access requires an authenticated user account
status: enriched              # was: status: seed (Phase 2)
phase_4_sessions: [SESS-20260528-600]
phase_2_findings: [completeness, consistency, testability]
candidates:                   # Phase 2 seeded; preserved byte-for-byte
  - id: REQ-005-happy-01
    title: Happy path
    type: positive
generated_by: /tc:requirements-to-tests
---

## Phase 4 enrichment

### SESS-20260528-600

- **CS-600-001** (negative) - Reproduce auth-mismatch on /workspaces/ws-1
- **CS-600-010** (happy)    - Happy path: POST /sessions returns 201
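The enrichment shown above flips `status` from `seed` to `enriched` and adds session-derived scenarios while leaving the Phase-2 `candidates` untouched. A minimal sketch of that merge over an in-memory record; the `TestIdea` and `CandidateScenario` shapes are simplified assumptions, and the real helper works on the markdown file rather than an object.

```typescript
interface SeedCandidate { id: string; title: string; type: string; }
interface CandidateScenario { id: string; kind: string; title: string; }

// Simplified, hypothetical in-memory view of a tc-test-idea/v1 file.
interface TestIdea {
  requirementId: string;
  status: "seed" | "enriched";
  phase4Sessions: string[];
  candidates: readonly SeedCandidate[]; // Phase-2 seeds, never rewritten
  enrichment: Record<string, CandidateScenario[]>; // keyed by session id
}

const byId = (a: CandidateScenario, b: CandidateScenario): number =>
  a.id < b.id ? -1 : a.id > b.id ? 1 : 0;

function enrich(idea: TestIdea, sessionId: string, found: CandidateScenario[]): TestIdea {
  const sessions = idea.phase4Sessions.includes(sessionId)
    ? idea.phase4Sessions
    : [...idea.phase4Sessions, sessionId].sort();
  return {
    ...idea,
    status: "enriched",
    phase4Sessions: sessions,
    candidates: idea.candidates, // same reference: seeds carried over unchanged
    // Re-running a session replaces its block instead of appending a duplicate.
    enrichment: { ...idea.enrichment, [sessionId]: [...found].sort(byId) },
  };
}
```

Replacing the per-session block, rather than appending to it, is what keeps a second run of /tc:test-ideas on the same session a no-op.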

BDD feature · @req:/@cs: linkage (Phase 5 · shipped)

.test-commander/bdd/features/sign-in.feature

@area:sign-in
Feature: Sign-in flow

  @req:REQ-005 @cs:CS-600-010 @smoke
  Scenario: Happy path - authenticated session is created
    Given a registered user on the sign-in page
    When they submit valid credentials
    Then the session is created and the dashboard loads

  @req:REQ-005 @cs:CS-600-001 @regression @anomaly:auth-mismatch
  Scenario: Authorization boundary is enforced
    Given an authenticated user without workspace access
    When they request a protected workspace asset
    Then the request is rejected
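Because `@req:` and `@cs:` are ordinary Gherkin tags, the requirement-to-scenario map can be rebuilt mechanically from feature files like the one above. A minimal line-based sketch; it assumes tags sit on the line or lines directly above each `Scenario:` and ignores Scenario Outlines, Backgrounds, and comments between tags, all of which a full Gherkin parser would handle.

```typescript
interface TraceLink {
  requirement: string;
  candidate: string | null;
  scenario: string;
}

function traceFeature(source: string): TraceLink[] {
  const links: TraceLink[] = [];
  let pendingTags: string[] = [];
  for (const raw of source.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("@")) {
      // Tags may span several lines; collect them until the next keyword line.
      pendingTags = pendingTags.concat(line.split(/\s+/));
      continue;
    }
    const m = /^Scenario:\s*(.+)$/.exec(line);
    if (m) {
      const reqs = pendingTags.filter((t) => t.startsWith("@req:")).map((t) => t.slice("@req:".length));
      const cs = pendingTags.find((t) => t.startsWith("@cs:"));
      for (const requirement of reqs) {
        links.push({ requirement, candidate: cs ? cs.slice("@cs:".length) : null, scenario: m[1] });
      }
    }
    // Any non-blank, non-tag line (Feature:, Scenario:, steps) consumes pending tags.
    if (line !== "") pendingTags = [];
  }
  return links;
}

// Group links into requirement -> scenario titles.
function requirementMap(links: TraceLink[]): Map<string, string[]> {
  const map = new Map<string, string[]>();
  for (const link of links) {
    const list = map.get(link.requirement) ?? [];
    list.push(link.scenario);
    map.set(link.requirement, list);
  }
  return map;
}
```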

Generated spec · provenance + fixture data (Phase 6 · shipped)

tests/e2e/sign-in.spec.ts

import { test, expect } from "../fixtures/sign-in";

// @req:REQ-005 @cs:CS-600-010
test("Happy path - authenticated session is created", async ({
  signInPage,
  data,
}) => {
  await signInPage.goto();
  await signInPage.signIn(data.validUser);
  await expect(signInPage.dashboard).toBeVisible();
});

// Generated by /tc:automate · refine steps inside the preserved region.
// Data flows from .test-commander/test-data/seed/sign-in.json (D6).

Terminal workflow

Seven commands, end to end.

Test Commander is terminal-first by design. A tester who is comfortable with the command line never has to open an IDE to drive the workflow.

Command	Purpose
`./bootstrap.sh`	Verify prereqs (Python 3.12, PDM, Docker, git, make); auto-install the safe ones.
`make install`	Validate plugin manifests, register the local Claude Code marketplace, install the test-commander plugin, verify the twenty shipped skills.
`/tc:init`	Inside a consuming project, copy the 63-file workspace template into .test-commander/. Idempotent — existing files are preserved.
`/tc:status`	Print a snapshot: per-bucket file counts, populated counts (files whose bytes differ from the template), and per-phase status. Read-only.
`/tc:journal append`	Append a timestamped narrative entry to today's journal/YYYY-MM-DD.md. Append-only; never edited in place.
`/tc:next`	Read the workspace state and recommend the next /tc:* command for this project.
`/tc:review-requirements`	Run the 16-dimension rubric on the uploaded requirements.md; emit requirements-review.md plus open questions tagged [<kind>].
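The `/tc:init` and `/tc:status` rows describe two simple rules: copy a template file only if it is missing, and count a file as populated when its bytes differ from the template copy. A minimal sketch over an in-memory `Map` standing in for the file system; the file names are hypothetical and this is not the bundled Python helper.

```typescript
// path -> file contents; stands in for the template directory or .test-commander/.
type Tree = Map<string, string>;

// Copy every template file that does not exist yet; never overwrite.
// Returns the created paths, sorted, so a second run reports nothing.
function initWorkspace(template: Tree, workspace: Tree): string[] {
  const created: string[] = [];
  template.forEach((contents, path) => {
    if (!workspace.has(path)) {
      workspace.set(path, contents);
      created.push(path);
    }
  });
  return created.sort();
}

// A template file counts as populated when it exists and its contents differ from the template.
function countPopulated(template: Tree, workspace: Tree): number {
  let populated = 0;
  template.forEach((contents, path) => {
    const current = workspace.get(path);
    if (current !== undefined && current !== contents) populated++;
  });
  return populated;
}
```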

Who it’s for

Five different audiences, one workflow.

The same artifacts serve manual testers, automation engineers, QA managers, recruiters, and clients — each gets value at a different point in the lifecycle.

Manual testers
Turn exploratory knowledge into reusable evidence.
- Organize exploration into user flows and risks.
- Convert observations into BDD scenarios you can review.
- Participate in AI-assisted work without becoming a developer overnight.
Automation engineers
Start from a model, not a pile of vague tickets.
- Structured flows, locators, and page objects as inputs.
- Spec-first generation that maps cleanly to Playwright.
- Less rework because tests start from real testing intent.
QA managers
A repeatable quality process with explicit guardrails.
- Faster design, clearer coverage, stronger reporting.
- Easier onboarding for new testers.
- A practical way to adopt AI without losing human oversight.
Recruiters and hiring managers
Evidence of modern QA judgement, not just tooling.
- AI-assisted exploration, BDD, Playwright, CI/CD, reporting — connected.
- Demonstrates human-in-the-loop quality systems.
- Pairs automation engineering with strategic communication.
Clients
A practical path to AI-enhanced quality engineering.
- Modernize exploratory testing into a structured automation pipeline.
- Generate readable specs your team can review and approve.
- Stand up regression coverage and quality reports without starting from scratch.

Design principles

The opinions baked into the workflow.

Principle 01
Human-guided, not fully autonomous
The agent helps; the tester owns the quality decision. AI output is never treated as automatically correct.
Principle 02
Universal cores, project-specific tuning
Per Decision D19, every shipped detector uses universal English and software-engineering vocabulary only. Domain awareness enters additively through <workspace>/config.yaml — extensions union with the universal core; you cannot remove a default. The same rubric runs against a banking app, a hospital system, or an internal dashboard.
Principle 03
file:line provenance for every claim
Every entity, business rule, endpoint, anomaly, candidate scenario, or open question Test Commander surfaces is paired with the path:line where it came from. The structured artifacts are indexes, not summaries. You can always answer 'where did this come from' without leaving the workspace.
Principle 04
Byte-deterministic re-runs
Every shipped helper is idempotent. Re-running against unchanged input produces byte-identical output. The workspace is safe to commit to git like any other source artifact — reviews show up as real diffs; nothing flickers on re-run.
Principle 05
Exploration before automation
Automation starts from understanding. Identify what matters first, then encode it.
Principle 06
BDD as the bridge (Phase 5, shipped)
Readable specs connect manual testers, automation engineers, and product stakeholders. The tc-test-idea/v1 schema Phase 2 authors and Phase 4 enriches is the input contract Phase 5 reads — every generated scenario carries @req:/@cs: tags that are the mechanical join key the traceability map parses.
Principle 07
Deterministic tests for CI/CD (Phase 6, shipped)
AI may help generate tests, but CI/CD needs reliable checks. Phase 6 generates and structurally validates a Playwright/TypeScript suite, but never invokes the runner — execution is Phase 7's job. Playwright stays the source of truth.
Principle 08
Separate facts from interpretation
Reports distinguish observed, tested, passed, failed, inferred, and items needing human review. The Phase-7 quality report (shipped) enforces this separation.
Principle 09
One governed execution path
From Phase 10.5 on, every action above read-only flows through a single controlled-execution pipeline — intent, plan, permission policy, approval gate, bounded execution, validation, audit. The web console, the Runtime API, the MCP server, sandboxes, and the continuous agent are all front-ends to it. Default deny; nothing bypasses the gates.
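Principle 09 can be read as a chain of gates in which anything not explicitly allowed is refused. A minimal sketch of that default-deny shape; the action names, the policy table, and the callbacks are assumptions, and the plan step and execution bounds are folded into a single `execute` callback.

```typescript
type Decision = "executed" | "denied" | "not-approved" | "invalid-output";

interface Intent {
  action: string;
  target: string;
  requestedBy: string;
}

// Per-action rule; any action missing from the table is denied (default deny).
type Policy = Record<string, "auto" | "needs-approval">;

interface AuditEntry extends Intent {
  decision: Decision;
}

interface Pipeline {
  policy: Policy;
  approve: (intent: Intent) => boolean; // human approval gate (stubbed here)
  execute: (intent: Intent) => string; // bounded execution, returns its output
  validate: (output: string) => boolean; // output validation
  audit: AuditEntry[]; // every decision is recorded, including denials
}

function runGoverned(p: Pipeline, intent: Intent): Decision {
  const record = (decision: Decision): Decision => {
    p.audit.push({ ...intent, decision });
    return decision;
  };
  // hasOwnProperty guard: inherited keys such as "toString" must not count as policy.
  const rule: "auto" | "needs-approval" | undefined =
    Object.prototype.hasOwnProperty.call(p.policy, intent.action) ? p.policy[intent.action] : undefined;
  if (rule === undefined) return record("denied");
  if (rule === "needs-approval" && !p.approve(intent)) return record("not-approved");
  const output = p.execute(intent);
  return record(p.validate(output) ? "executed" : "invalid-output");
}
```

Whichever front-end builds the `Intent` (console, API, MCP, sandbox, or agent), it reaches execution only through `runGoverned`, which is the single-path property the principle describes.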

Capability roadmap

All fourteen phases shipped.

What the system can do, phase by phase. Every phase is now complete and landed under test-driven discipline with its own annotated git tag. Pair this with the team-adoption maturity model below: the two roadmaps answer different questions.

Phase 0
Shipped
Repo foundation
Bootstrap script, plugin manifest, marketplace registration, skill verifier, link checker, CI scaffold.
Phase 1
Shipped
Workspace + artifact model
tc-core: /tc:init, /tc:status, /tc:journal, /tc:next. 63-file workspace template; per-phase recommendation engine.
Phase 2
Shipped
Requirements + user-story intelligence
tc-requirements: 16-dimension rubric, INVEST review, AC review, coverage map, tc-test-idea/v1 seeds.
Phase 3
Shipped
Project knowledge ingestion
tc-knowledge: five /tc:learn-from-* helpers (docs, specs, code, api, tests) + shared synthesizer; ten product-knowledge artifacts with file:line provenance.
Phase 4
Shipped
Exploratory testing
tc-explore: /tc:create-charter, /tc:explore + internal review sub-mode, /tc:session-summary, /tc:test-ideas enriching Phase-2 seeds.
Phase 5
Shipped
BDD generation + traceability
tc-bdd + tc-traceability: /tc:generate-bdd, /tc:review-bdd, /tc:traceability-map. Reads enriched test-ideas; emits Gherkin features tied to REQ-IDs with @req:/@cs: linkage.
Phase 6
Shipped
Playwright framework + strategic automation
tc-build-framework, tc-automation-plan, tc-automate, tc-test-data: lazy Playwright + TypeScript scaffolding, seven-factor automation scoring, generated suite + review, test data outside test code. The first executable artifacts.
Phase 7
Shipped
Execution + evidence + quality report
tc-run + tc-quality-report + tc-evidence: /tc:run, /tc:analyze-results, /tc:report, /tc:quality-gate. Per-run records; committed quality-report history.
Phase 8
Shipped
Continuous learning
tc-learning: governed lessons inbox; /tc:learn-from-failures, /tc:learn-from-exploration, /tc:learn-from-feedback, /tc:review-lessons, /tc:promote-lessons. Nothing promoted silently.
Phase 9
Shipped
Visual documentation
tc-visualize: eight /tc:diagram-* types, /tc:generate-infographic, and /tc:render-visuals — Mermaid sources plus deterministically rendered SVG/PNG.
Phase 10
Shipped
Web console MVP
tc-web: a read-only, team-facing viewer over the committed workspace — dashboard, journal, BDD viewer, run history, evidence. Renders the artifacts; never invents data.
Phase 10.5
Shipped
Controlled agent execution
tc-governance: the single governed-execution pipeline — intent, plan, permission policy, approval gate, bounded execution, output validation, audit. Default deny; no backdoor.
Phase 11
Shipped
Runtime API + MCP server
tc-mcp: an expanded Runtime API and a schema-first MCP server — alternative front-ends that drive Test Commander through the same governed pipeline. Seven permission levels enforced server-side.
Phase 12
Shipped
Sandboxed testing environment
tc-sandbox: on-demand, team-accessible environments launched from GitHub Actions via a provider abstraction (docker-compose-local MVP). Governed and safe-by-default targeting.
Phase 13 · Current
Shipped
Continuous quality agent
tc-continuous-quality: watch changes, map impact, find coverage gaps, propose tests, and open labeled PRs — gated by five autonomy modes (0 advisor → 4 governed-autonomy). The final phase.

Implementation roadmap

How teams adopt Test Commander.

Adoption does not have to be all-or-nothing. Start with visibility, layer in requirements review, exploration, BDD, and automation, then graduate to a team console and finally to continuous, governed autonomy.

Stage 1
Quality visibility
A shared quality baseline.
Existing requirements, tests, defects, risks, and reports become a single living dashboard. No major process change — just the picture, made visible.
Stage 2
Requirements review
Stories become clearer and more testable.
Test Commander reviews stories before implementation for clarity, missing acceptance criteria, edge cases, data rules, and automation suitability. Quality shifts left through better questions, not more meetings.
Stage 3
Guided exploration
Exploratory testing becomes durable.
A tester points Test Commander at a target environment and explores. Observations, screenshots, risks, bugs, locator candidates, and test data needs join the quality knowledge base instead of someone's notebook.
Stage 4
BDD and test design
Test design becomes traceable to business intent.
Approved test ideas become BDD scenarios tied to requirements and risks. The team can see which stories have coverage and which edge cases are still missing.
Stage 5
Strategic automation
Playwright tests with rationale, not guesswork.
Automation candidates are scored on business criticality, repeatability, determinism, UI stability, and maintenance cost. Only the candidates worth automating become Playwright tests, page objects, fixtures, and test data.
Stage 6
Team web console
A quality command center the whole team sees.
Live dashboard, journal, BDD viewer, run history, evidence gallery, risk register. Testers explore, developers inspect traces, product owners answer questions, managers read the summary — one shared quality story.
Stage 7
Sandboxed workspaces
No-code testing environments on demand.
A pull request spins up a temporary Test Commander workspace with UI, runtime, uploaded docs, target URL, and artifact storage. Open a link, start testing. No local setup. No Playwright install on a tester's laptop.
Stage 8
Continuous self-improvement
The system gets better at helping the team test.
Lessons accumulate from requirements, code, tests, failures, and production defects. Candidate lessons are reviewed, accepted, rejected, or flagged for human review — then promoted into project guidance. The loop is governed, not silent.
Stage 9
Governed autonomy
Continuous monitoring with human-approved change.
Test Commander watches code, requirements, and pipelines. It analyzes impact, proposes coverage, runs approved suites, captures evidence, opens pull requests, and explains itself. Humans still approve the changes that matter.

Autonomy modes

How much should the agent be allowed to do on its own?

Phase 13 ships this as a concrete control: the configured autonomy mode is a ceiling on which permission levels the continuous agent may auto-approve in the governed pipeline. Five modes, cumulative — and destructive / admin never auto-approve at any mode.

Mode 0
Read-only advisor
Reads artifacts, maps impact, finds coverage gaps, and proposes tests. Auto-approves nothing — a pure advisor. The right place to start.
Mode 1
Assisted testing
Auto-approves safe-write work — analysis and proposed artifacts. Anything that writes code, runs tests, or opens a PR still waits for a human.
Mode 2
Approved execution
Adds execute-tests to the auto-approved levels: the agent may run designated suites in safe environments. It still cannot open a pull request.
Recommended default
Mode 3
Pull-request automation
Adds code-write and may open clearly-labeled pull requests — new BDD scenarios, generated tests, refreshed traceability. Humans review and merge.
Mode 4
Governed autonomy
Adds external-network targets. The broadest auto-approval — but destructive and admin actions are never auto-approved at any mode, and nothing auto-merges.
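
The ceiling rule can be written as a single predicate. The level names (safe-write, execute-tests, code-write, external-network, destructive, admin) come from the mode descriptions above. The table and the `canAutoApprove` function are a sketch of the described behavior, not Test Commander's code. Anything that returns `false` would wait for a human.

```typescript
// Level names come from the mode descriptions; the table and function are a sketch, not Test Commander's code.
type PermissionLevel =
  | "safe-write"
  | "execute-tests"
  | "code-write"
  | "external-network"
  | "destructive"
  | "admin";

type AutonomyMode = 0 | 1 | 2 | 3 | 4;

// Each mode adds one level to the auto-approvable set; modes are cumulative.
const ADDED_AT_MODE: Record<AutonomyMode, PermissionLevel[]> = {
  0: [],
  1: ["safe-write"],
  2: ["execute-tests"],
  3: ["code-write"],
  4: ["external-network"],
};

// Never auto-approved, whatever the configured mode.
const NEVER_AUTO: ReadonlySet<PermissionLevel> = new Set<PermissionLevel>(["destructive", "admin"]);

function canAutoApprove(mode: AutonomyMode, level: PermissionLevel): boolean {
  if (NEVER_AUTO.has(level)) return false;
  // Walk every mode up to the configured ceiling and check whether it grants this level.
  for (let m = 0; m <= mode; m++) {
    if (ADDED_AT_MODE[m as AutonomyMode].includes(level)) return true;
  }
  return false;
}

console.log(canAutoApprove(3, "code-write"), canAutoApprove(4, "admin")); // true false
```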

A mature workflow tends to settle at Mode 3. Test Commander runs continuously, but every change to test assets arrives as a clearly-labeled pull request a human can read, accept, or reject — and the agent never auto-merges.

Continuous quality agent mode

A living quality system, not a one-shot script.

Phase 13 ships the continuous quality agent: Test Commander watches the application and the delivery pipeline, responds to change, and produces evidence — continuously, transparently, and under the autonomy-mode approval rules above.

Continuously improving, human-governed quality automation.

When requirements change, code ships, or tests fail, the agent reacts. It analyzes impact, reviews updated stories, identifies coverage gaps, generates candidate scenarios, runs impacted suites, captures evidence, updates reports, and records lessons learned. Automatic observation, automatic analysis, automatic reporting. Human-approved implementation.

Code change detected
Impact analysis
Story and risk review
Coverage gap analysis
Generate candidates
Run impacted suite
Capture evidence
Open PR · learn
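
The loop above can be sketched as an ordered list of stages run one after another. The stage identifiers mirror the eight steps listed. The `StageHandler` signature and the early-stop rule are assumptions: here a stage returns `false` when there is nothing further to do, for example when impact analysis finds no affected area.

```typescript
// Stage names mirror the loop in the text; the handler signature and early stop are assumptions.
type Stage =
  | "detect-change"
  | "impact-analysis"
  | "story-and-risk-review"
  | "coverage-gap-analysis"
  | "generate-candidates"
  | "run-impacted-suite"
  | "capture-evidence"
  | "open-pr-and-learn";

const LOOP: readonly Stage[] = [
  "detect-change",
  "impact-analysis",
  "story-and-risk-review",
  "coverage-gap-analysis",
  "generate-candidates",
  "run-impacted-suite",
  "capture-evidence",
  "open-pr-and-learn",
];

// Returns false to stop the loop early (hypothetical: e.g. nothing impacted).
type StageHandler = (changedFiles: string[]) => boolean;

// Runs stages strictly in order and reports which ones ran.
function runLoop(changedFiles: string[], handlers: Record<Stage, StageHandler>): Stage[] {
  const completed: Stage[] = [];
  for (const stage of LOOP) {
    const keepGoing = handlers[stage](changedFiles);
    completed.push(stage);
    if (!keepGoing) break;
  }
  return completed;
}

// Usage: every stage continues, except impact analysis ignores changes outside checkout code.
const demoHandlers = {} as Record<Stage, StageHandler>;
for (const stage of LOOP) demoHandlers[stage] = () => true;
demoHandlers["impact-analysis"] = (files) => files.some((f) => f.startsWith("src/checkout/"));
console.log(runLoop(["docs/README.md"], demoHandlers).join(" -> ")); // detect-change -> impact-analysis
```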

The same loop runs on pull requests, pushes, nightly schedules, release candidates, and manual dispatches. Read-only analysis happens automatically. Report updates happen automatically. Test execution happens automatically in safe environments. Generated changes arrive as pull requests. Core methodology improvements are proposed, reviewed, and promoted deliberately.
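
The paragraph above amounts to a fixed policy: the trigger decides when the loop runs, while the kind of output decides how it is handled. The sketch below restates that policy as data. The trigger names and action categories are paraphrased from the text, and the `dispatch` function is hypothetical.

```typescript
// Trigger and action names paraphrase the text; the dispatch function is hypothetical.
type Trigger = "pull-request" | "push" | "nightly" | "release-candidate" | "manual-dispatch";
type Action = "read-only-analysis" | "report-update" | "test-execution" | "generated-change" | "methodology-change";
type Disposition = "automatic" | "automatic-in-safe-env" | "pull-request" | "proposal-for-review";

const DISPOSITION: Record<Action, Disposition> = {
  "read-only-analysis": "automatic",
  "report-update": "automatic",
  "test-execution": "automatic-in-safe-env",
  "generated-change": "pull-request",
  "methodology-change": "proposal-for-review",
};

// The trigger does not change the rules: every trigger runs the same loop under the same dispositions.
function dispatch(trigger: Trigger, actions: Action[]): string[] {
  return actions.map((action) => `${trigger}: ${action} -> ${DISPOSITION[action]}`);
}

const planned = dispatch("nightly", ["read-only-analysis", "generated-change"]);
console.log(planned.join("\n"));
// nightly: read-only-analysis -> automatic
// nightly: generated-change -> pull-request
```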

Sample PR comment · continuous-agent output

PR #428 · test-commander comment

Test Commander Analysis

Changed areas:
  - Checkout
  - Saved addresses
  - Payment error handling

Detected risks:
  - Saved address validation behavior changed
  - Payment failure scenario lacks automated coverage

Existing tests:
  - 12 checkout tests passing
  - 2 impacted tests failed
  - 1 flaky test detected

Recommended actions:
  - Clarify expected behavior for expired saved addresses
  - Add BDD scenario for payment timeout
  - Approve generated Playwright test candidate

Artifacts:
  quality report · screenshots · trace · coverage map

The agent says what it changed, what it analyzed, what it found, and what a human should look at next. No surprises, no silent edits.
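
Because the comment is generated from structured findings, it stays consistent from one PR to the next. The following is a minimal sketch of rendering such a comment in the same layout as the sample above. The `Analysis` shape and the `renderComment` helper are assumptions, and empty sections print "none" rather than disappearing.

```typescript
// Hypothetical input shape; the section headings match the sample comment in the text.
interface Analysis {
  changedAreas: string[];
  risks: string[];
  existingTests: string[];
  actions: string[];
  artifacts: string[];
}

// Indented bullet list; an explicit "none" keeps empty sections visible.
function bullets(items: string[]): string {
  return items.length === 0 ? "  - none" : items.map((item) => `  - ${item}`).join("\n");
}

function renderComment(a: Analysis): string {
  return [
    "Test Commander Analysis",
    "",
    "Changed areas:",
    bullets(a.changedAreas),
    "",
    "Detected risks:",
    bullets(a.risks),
    "",
    "Existing tests:",
    bullets(a.existingTests),
    "",
    "Recommended actions:",
    bullets(a.actions),
    "",
    "Artifacts:",
    `  ${a.artifacts.join(" · ")}`,
  ].join("\n");
}

const comment = renderComment({
  changedAreas: ["Checkout", "Saved addresses", "Payment error handling"],
  risks: ["Saved address validation behavior changed", "Payment failure scenario lacks automated coverage"],
  existingTests: ["12 checkout tests passing", "2 impacted tests failed", "1 flaky test detected"],
  actions: ["Clarify expected behavior for expired saved addresses", "Add BDD scenario for payment timeout"],
  artifacts: ["quality report", "screenshots", "trace", "coverage map"],
});
console.log(comment);
```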

What’s next

Hiring for AI-augmented QA, or building it yourself?

Test Commander is part of an ongoing body of work in AI-augmented software quality, Playwright automation, and human-guided agentic testing. If you are hiring — or your team is exploring practical AI-assisted testing — I would like to talk.
