TFactory demos — every lane, end to end

Each demo below is a real, unedited pipeline run on the user’s Claude subscription — not a mock. A developer hands TFactory a finished feature; the five-agent pipeline (Planner → Gen-Functional → Executor → Evaluator → Triager) plans, writes, sandboxes, scores, and triages a test suite, and emits the verdicts a reviewer would see on the PR.

Every demo is a single composite screencast: the Claude Code terminal (top-left), the TFactory portal (top-right), and the live triage report with real verdicts (bottom). Each one shows at least one passing test and one failing test caught by a deliberately seeded bug — proof the grader actually distinguishes good tests from bad.

These seven demos span the full v0.2 lane spine — browser (Playwright), unit (pytest + coverage + 3× stability + mutation), api (httpx against a running service), and polyglot (Python pytest and TypeScript Jest in one run) — to show TFactory tests far more than web pages, and grades them on real signals: it catches implementation bugs and rejects weak tests via mutation.

1 · Greeting generator — browser lane (Playwright)

What it tests: a deployed Vite + React SPA. TFactory writes Playwright tests that drive the UI against the live demo site and assert on the output panel.

Result: 4 accept + 1 reject. AC#5 (“two consecutive Generate clicks must differ”) is rejected — it caught a deliberately seeded memoisation cache bug.

Greeting generator browser-lane demo

Download MP4

2 · Failure → merge / dismiss — the human decision

What it shows: the human-in-the-loop close-out over a run with real failures — the reviewer merges the accepted tests and dismisses the rejected ones. This is the decision a reviewer makes when TFactory flags failing tests, not just passing ones.

Result: 5 accepted (committed) · 2 rejected (dismissed).

Failure to merge/dismiss flow demo

Download MP4

3 · Pricing helper — unit lane (pytest + mutation)

What it tests: a pure Python module (pricing.py). This is the lane a browser demo can’t show — the numeric signals: coverage delta, 3× stability re-runs, and mutate-and-check (mutmut). A test only passes if it kills the mutant.

Result: 4 accept — every generated test runs cleanly, is stable across 3 re-runs, and kills its mutation. No browser involved.

Python unit-lane demo

Download MP4

4 · Message board — fill a form, verify it holds the text

What it tests: a form page — type a name + message, click Post, and check the post list holds exactly what was typed (verbatim text, special characters, the author name). TFactory writes Playwright tests that fill the form and assert on the rendered result.

Result: 3 accept + 1 flag. The flagged test (“two posts both remain visible”) caught a seeded state bug where a new post replaces the previous one.

Form-fill browser-lane demo

Download MP4

5 · KV API gateway — api lane (httpx, no browser)

What it tests: a running REST service (FastAPI key-value gateway). TFactory writes httpx tests — import httpx, read TFACTORY_TARGET_URL, assert on response.status_code and response.json() — and runs them against the live service. Zero Playwright, zero browser. This is the proof that TFactory tests APIs, gateways, and service connections, not just web pages.

Result: 4 accept + 1 flag. The four contract tests pass; the flagged test (“a missing key must return 404”) caught a seeded contract bug — the gateway returns HTTP 200 with a null body instead of 404 — and the Evaluator flags it for human review as a regression guard once the gateway is fixed.

API gateway api-lane (httpx) demo

Download MP4

6 · Shipping brackets — edge-case / boundary hunting

What it tests: a shipping_cost(weight_g) tiered calculator. TFactory writes parametrised boundary tests — the exact weights at the edge of each bracket, where off-by-one bugs hide — and mutation confirms each test actually pins its bracket.

Result: 3 accept + 1 reject. The boundary test at exactly 500 g caught a seeded off-by-one (it charged $10 instead of $5); mutation killed on the good tests and survived on the buggy boundary — TFactory’s evidence the test found a real defect.

Edge-case / boundary unit-lane demo

Download MP4

7 · Polyglot — Python pytest and TypeScript Jest in one run

What it tests: a single project with a Python helper (tax.py) and a TypeScript helper (slugify.ts). From one handoff, TFactory’s Planner fans the spec across two languages and two frameworks — generating pytest tests for Python and Jest tests for TypeScript — and runs each in its own container.

Result: 5 accept + 2 reject, and the two rejects show off two different grading powers in one run:

  • caught a real bug — the TypeScript slugify doesn’t fold accented characters (Cafécaf, not cafe); the Jest ascii-fold test rejects it;
  • rejected a weak test — a passing Python test survived mutation (its -1 constant didn’t pin the boundary), so TFactory rejected it for insufficient assertion quality.

One tool, two languages, real bugs and weak-test detection.

Polyglot (Python pytest + TypeScript Jest) demo

Download MP4

How these were produced

All seven were generated by the /demo command, which drives a scenario end-to-end and refuses to publish a demo until an automated quality gate passes (multi-pane frame · real pipeline run · a pass and a fail in the report · web-embeddable output). Scenario definitions live in tests/fixtures/demo-scenarios/; the production scripts live in scripts/demo/.