v0.9 · five-lane pipeline · reproducible Nix execution · visible screenshot + recording evidence · acceptance-criteria fidelity · MFA-authenticated testing · runs on any LLM

Autonomous tests, AI-graded.

Hand TFactory a finished feature on a branch, from AIFactory, Claude Code, or any other tool, via the MCP control plane or a plain acceptance-criteria file (markdown / Gherkin / EARS). The agent pipeline autonomously plans, writes, sandboxes, scores, and commits the suite, grades every acceptance criterion against a test that actually ran, and posts a triage report to your PR.

Planner → test_plan.json
Gen-Functional → tests/*
Executor → docker sandbox
Evaluator → verdicts.json
Triager → triage_report.md

Part of the Factory family — a governed, verified, observable autonomous software factory. PFactory plans, AIFactory builds, TFactory verifies, CFactory watches over all four. See Why Factory.

How it works

Spec-aware handover

A Claude Code session in your AIFactory repo runs /handover-to-tfactory, or any tool posts acceptance criteria through the MCP control plane. TFactory snapshots the signed contract and the deployed URL, runs five agents, and returns a verdicts report.
Two-layer guardrails

A pre-flight static check confirms that every import resolves. A flake-risk lint then flags reliance on dict-iteration order, calls to time.sleep, and calls to datetime.now made without a frozen clock.
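The two layers can be pictured with a minimal sketch built on the standard-library `ast` module. This is an illustration under stated assumptions, not TFactory's implementation: the function names and the `FLAKY_CALLS` set are hypothetical, and the lint here flags every `datetime.now` call rather than checking whether a clock freeze is active.

```python
import ast
import importlib.util

# Hypothetical rule set: dotted call names treated as flake risks.
FLAKY_CALLS = {"time.sleep", "datetime.now", "datetime.datetime.now"}


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for a Name/Attribute chain, or '' for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def unresolved_imports(source: str) -> list[str]:
    """Layer 1: absolute imports whose top-level module cannot be found."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            if importlib.util.find_spec(name.split(".")[0]) is None:
                missing.append(name)
    return missing


def flake_risks(source: str) -> list[tuple[int, str]]:
    """Layer 2: (line number, call name) for each call on the flake-risk list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in FLAKY_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Both checks work on source text only, so a generated test can be rejected before it is ever executed in the sandbox.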
Five-signal verdicts

Each generated test is scored on five signals: coverage delta, a 3x stability re-run, a mutate-and-check probe, flake-lint promotion, and LLM semantic relevance. A test that still passes after the code under test has been mutated does not ship.
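One way to read the five signals is as inputs to a single decision function. The sketch below is an assumption-laden illustration: the field names, the verdict labels, and the `min_relevance` threshold are hypothetical, and the real Evaluator may weigh the signals differently. The only rule taken from the text is that a test which fails to detect a mutation is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    coverage_delta: float      # change in line coverage from adding the test
    stable_runs: int           # passes out of the 3 stability re-runs
    mutant_killed: bool        # True if the test failed on the mutated code
    flake_lint_hits: int       # flake-risk findings promoted into the verdict
    semantic_relevance: float  # LLM-judged relevance, assumed to lie in [0, 1]


def verdict(s: Signals, min_relevance: float = 0.5) -> str:
    """Combine the five signals into one label (labels are hypothetical)."""
    if not s.mutant_killed:
        return "reject"  # the test still passed on mutated code
    if s.stable_runs < 3 or s.flake_lint_hits > 0:
        return "flaky"
    if s.coverage_delta <= 0 or s.semantic_relevance < min_relevance:
        return "weak"
    return "ship"
```

Checking the mutation signal first makes the hard rule explicit: no combination of the other four signals can rescue a test that lets a mutant through.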
Dry-run by default

Per the no-auto-push policy, the git writer and PR commenter record their commands without executing them. Operators must opt in explicitly before anything is pushed or posted.
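The record-without-executing behaviour can be sketched as a small command runner whose default is to do nothing but log. The class name `CommandRunner` and its `execute` flag are hypothetical; this is a sketch of the pattern, not the actual git writer.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass, field


@dataclass
class CommandRunner:
    execute: bool = False  # dry-run unless the operator opts in
    recorded: list[list[str]] = field(default_factory=list)

    def run(self, argv: list[str]) -> int | None:
        """Record the command; run it only when execution was opted into."""
        self.recorded.append(list(argv))
        if not self.execute:
            return None  # dry-run: nothing left the machine
        return subprocess.run(argv, check=False).returncode
```

With the default settings, `CommandRunner().run(["git", "push", "origin", "tests"])` returns `None` and leaves the command in `recorded` for the operator to review.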

Evidence you can see

A green checkmark is not proof. For interactive acceptance criteria, the browser lane runs in a reproducible per-task Nix toolchain inside an ephemeral Kubernetes Job (RFC-0005 Tier A), drives the real deployed app, and captures a screenshot of the rendered page plus a recording of the test driving it. The Acceptance tab grades each criterion against a test that actually passed — an honest “verified X/Y”, never a blanket “done”:

The Acceptance tab — verified 5/5 acceptance criteria, each linked to its evidence
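The "verified X/Y" figure can be thought of as a count of criteria that have at least one linked test with a passing result. The sketch below assumes hypothetical shapes for the data (a `Criterion` record, a map of test results, and a map from criterion id to test ids); the real verdicts.json layout is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    id: str
    text: str


def acceptance_summary(
    criteria: list[Criterion],
    results: dict[str, str],      # test id -> "passed" / "failed" / "skipped"
    links: dict[str, list[str]],  # criterion id -> ids of tests that cover it
) -> str:
    """Count a criterion as verified only if a linked test actually passed."""
    verified = [
        c for c in criteria
        if any(results.get(test_id) == "passed" for test_id in links.get(c.id, []))
    ]
    return f"verified {len(verified)}/{len(criteria)}"
```

A criterion with no linked test, or with only failed or skipped tests, stays unverified, which is what keeps the summary from collapsing into a blanket "done".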

The Evidence tab renders the captured recordings and screenshots inline, so a reviewer can watch the test execute and look at the page it produced:

The Evidence tab — browser-lane recordings and screenshots

The whole pipeline — Plan, Generate, Execute, Report — is a live view in the portal, and the same evidence appears on the finished task in the CFactory cockpit:

The TFactory pipeline view — Plan, Generate, Execute, Report

Reach anything under test — including behind MFA

Authenticated and 2FA targets

The .tfactory.yml auth schema covers form, API-token, basic-auth, and TOTP two-factor credentials with an ordered login-step flow. For MFA we do not bypass anything: the pipeline provisions a disposable identity provider, owns the OTP secret, generates valid RFC 6238 codes at fill time, captures the authenticated page, and tears the IdP down, so no production credentials are involved.
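Generating a code at fill time needs only the shared secret and the current time. The following is a minimal sketch of the RFC 6238 algorithm with HMAC-SHA1, using the standard library only; the function name `totp` and its parameters are illustrative and not taken from TFactory.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password with HMAC-SHA1."""
    counter = unix_time // step                        # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                            # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The sketch can be checked against the published RFC 6238 test vector: with the ASCII secret `12345678901234567890`, time 59, and 8 digits, the expected code is `94287082`. Because the pipeline owns the secret of its disposable IdP, it can call such a function with `int(time.time())` just before filling the OTP field.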
Credential Broker

Resolve secrets from a vault (Azure Key Vault, AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) or from local sops / age / agenix, and materialise them ephemerally (file mode 0600, wiped per task). Access is gated by an explicit egress opt-in with an honest manifest.
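Ephemeral materialisation can be sketched as a context manager that writes the secret to an owner-only file and removes it when the task ends. The name `ephemeral_secret` is hypothetical, the sketch assumes a POSIX filesystem, and it deletes the file rather than securely overwriting it, so treat it as an outline of the lifecycle only.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def ephemeral_secret(value: bytes):
    """Write a secret to a 0600 file and remove it when the block exits."""
    workdir = tempfile.mkdtemp(prefix="task-secret-")  # directory is owner-only
    path = os.path.join(workdir, "secret")
    # O_EXCL refuses to reuse an existing file; 0o600 limits access to the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(value)
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(workdir)
```

A task would wrap its run in `with ephemeral_secret(token) as path:` and hand `path` to the test process, so the file never outlives the task even when the run fails.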
Kubernetes and SaaS

Supported targets include log-in-once browser sessions (storageState), type: kubernetes port-forward targets, and first-class type: connector SaaS targets (ServiceNow / Salesforce / SAP / MuleSoft).

A governed node in the Factory line

Governed pickup from PFactory

TFactory enqueues governed test targets from PFactory, parses the planned acceptance contract as the test oracle, then generates, runs, and reports back up the spine. The contract — signed, with the deployed URL — travels with the work.
One completion event

The Triager emits a normalized RFC-0001 completion event with a shared correlation_key. The event is delivered at-least-once via a durable outbox and an idempotency key, so the whole line speaks one schema and CFactory watches a single contract.
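At-least-once delivery through an outbox, paired with an idempotency key on the receiving side, can be sketched with `sqlite3`. Everything below is illustrative: the table layout, the payload fields other than `correlation_key`, and the `Consumer` class are assumptions, and the RFC-0001 envelope itself is not reproduced.

```python
import json
import sqlite3
import uuid


def open_outbox() -> sqlite3.Connection:
    """Create an in-memory outbox; a real one would live on durable storage."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE outbox (idempotency_key TEXT PRIMARY KEY, "
        "payload TEXT NOT NULL, delivered INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def enqueue(db: sqlite3.Connection, correlation_key: str, status: str) -> str:
    """Persist the event before any delivery attempt is made."""
    key = str(uuid.uuid4())
    payload = json.dumps({"correlation_key": correlation_key, "status": status})
    with db:
        db.execute(
            "INSERT INTO outbox (idempotency_key, payload) VALUES (?, ?)",
            (key, payload),
        )
    return key


def flush(db: sqlite3.Connection, send) -> int:
    """Try to deliver every pending event; failures stay queued for a retry."""
    rows = db.execute(
        "SELECT idempotency_key, payload FROM outbox WHERE delivered = 0"
    ).fetchall()
    sent = 0
    for key, payload in rows:
        try:
            send(key, json.loads(payload))
        except Exception:
            continue  # not marked delivered, so the next flush sends it again
        with db:
            db.execute(
                "UPDATE outbox SET delivered = 1 WHERE idempotency_key = ?", (key,)
            )
        sent += 1
    return sent


class Consumer:
    """Receiver that drops duplicates by idempotency key."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.events: list[dict] = []

    def receive(self, key: str, event: dict) -> None:
        if key in self.seen:
            return  # duplicate caused by a retry
        self.seen.add(key)
        self.events.append(event)
```

If an acknowledgement is lost after the consumer has processed the event, the sender retries and the consumer sees the same key twice, but records the event once. That pairing is what lets at-least-once transport behave like exactly-once processing.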
In the Backstage catalog

TFactory ships a catalog-info.yaml plus TechDocs and is importable into Backstage, with enriched annotations and an AI-assistant skill descriptor — discoverable alongside the rest of the Factory.

Status by lane

The lane spine is modality-based (Decision 2). Security scanning is delegated to dedicated security pipelines and is out of scope here — TFactory focuses on functional and feature testing.

| Lane | Status | Runtime | Coverage | Evidence captured |
| --- | --- | --- | --- | --- |
| Unit | Active | `tfactory-runner-pytest` + `tfactory-runner-jest` | line (cobertura / lcov) | none |
| Browser | Active | Nix toolchain in a k8s Job (Playwright); host fallback where applicable | n/a (line coverage doesn’t apply when the test drives the browser) | screenshots, video, trace |
| API | Active | per-framework image + HTTP HAR recorder | line where the test exercises framework code | network.har |
| Integration | Active | per-framework image + AppRuntime (multi-service) | line where applicable | network.har, service logs |
| Mutation | Active | `mutmut` (Python) / Stryker (TypeScript); one-mutation-per-run probe in the Evaluator | reported per mutant (killed / survived) | none |

Each subtask’s lane is chosen by the Planner from its (language, framework) via the framework registry; reviewers see the lifecycle phases in the LaneStatusGrid and the per-test evidence in the Triager PR comment. New languages and additional pipelines hook into the same five-lane spine through new FrameworkDescriptors — no lane additions required.
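The registry lookup described above can be sketched as a dictionary keyed by (language, framework). The fields on `FrameworkDescriptor`, the lane labels, and the Go runner name below are hypothetical; only the five lane names and the two runner names for pytest and jest come from the table.

```python
from dataclasses import dataclass

LANES = {"unit", "browser", "api", "integration", "mutation"}


@dataclass(frozen=True)
class FrameworkDescriptor:
    language: str
    framework: str
    lane: str    # must be one of the five existing lanes
    runner: str  # image or toolchain used to execute the tests


REGISTRY: dict[tuple[str, str], FrameworkDescriptor] = {}


def register(descriptor: FrameworkDescriptor) -> None:
    """Add a framework to the registry; new lanes cannot be introduced here."""
    if descriptor.lane not in LANES:
        raise ValueError(f"unknown lane: {descriptor.lane}")
    REGISTRY[(descriptor.language, descriptor.framework)] = descriptor


def lane_for(language: str, framework: str) -> str:
    """Return the lane the Planner would pick for a subtask."""
    return REGISTRY[(language, framework)].lane


register(FrameworkDescriptor("python", "pytest", "unit", "tfactory-runner-pytest"))
register(FrameworkDescriptor("typescript", "jest", "unit", "tfactory-runner-jest"))
```

Supporting a new language then amounts to one more `register(...)` call, for example a Go descriptor that maps to the existing unit lane, with no change to the lane set itself.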

Quickstart

# Clone and bootstrap (NixOS / flake-based)
git clone https://github.com/olafkfreund/TFactory
cd TFactory
nix develop
tfactory-minimal-venv   # creates apps/backend/.venv
tfactory-test           # backend suite, seconds

Full walkthrough in the repo README plus the end-to-end smoke guide for running real scenarios against an AIFactory project.

Documentation

Creating tests

Three ways in — a spec/issue, the portal wizard, or a handover — and every parameter.
Architecture

Directory structure, workspace layout, dataflow.
Design plan

Locked decisions, landscape research, risk register.
Showcase

The pipeline in action, with real captured evidence.
Technical spec

Per-component implementation detail.
Credentials and MFA

The Credential Broker, authenticated targets, and 2FA testing.
Test coverage

The TDD plan: unit / integration / e2e pyramid.
Progress

The live build log: closed tasks and commits.
Changelog

Release notes and history.

Tracking

Epic and sub-issues — github.com/olafkfreund/TFactory/issues
Source — github.com/olafkfreund/TFactory
Sister project (upstream) — github.com/olafkfreund/AIFactory
License — MIT OR GPL-3.0
