v0.2.0 · 5 lanes + evidence capture · Credential Broker — vault-backed cloud auth · runs on any LLM

Autonomous tests, AI-graded.

Hand TFactory a finished feature on a branch — from AIFactory, Claude Code, or any tool, via the MCP control plane or a plain acceptance-criteria file (markdown / Gherkin / EARS). The agent pipeline plans, writes, sandboxes, scores, and commits the suite — autonomously — then posts a triage report to your PR.


The five-agent pipeline

spec → graded tests · v0.2
01 · Planner · test_plan.json
02 · Gen-Functional · tests/*
03 · Executor · docker sandbox
04 · Evaluator · verdicts.json
05 · Triager · triage_report.md

How it works

New since v0.2.0 — connect to anything

Two capabilities make TFactory usable beyond a single laptop and a single model: it can now authenticate to your cloud and run on whatever LLM you already pay for.

Status by lane

v0.2 replaced the v0.1 pipeline-stage decomposition (Functional / SAST / DAST / Fuzz / Mutation) with a modality-based spine per Decision 2. Security scanning is delegated to dedicated security pipelines and is out of scope here — TFactory focuses on functional + feature testing.

| Lane | v0.2.0 status | Runtime | Coverage | Evidence captured |
|---|---|---|---|---|
| Unit | ✅ Active | tfactory-runner-pytest + tfactory-runner-jest | line (cobertura / lcov) | |
| Browser | ✅ Active | tfactory-runner-playwright + AppRuntime (docker-compose + health-poll) | null (per Decision 11: line coverage doesn’t apply when the test drives the browser) | screenshots · video · trace.zip |
| API | ✅ Active | per-framework Docker image + HTTP HAR recorder | line where the test exercises framework code | network.har |
| Integration | ✅ Active | per-framework Docker image + AppRuntime (multi-service compose) | line where applicable | network.har · service logs |
| Mutation | ✅ Active | mutmut (Python) / Stryker (TypeScript); one-mutation-per-run probe in the Evaluator | reported per mutant (killed / survived) | |
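
The Mutation lane reports each mutant as killed or survived. A common way to summarise such results is a mutation score: the fraction of mutants the suite killed. The sketch below is a minimal illustration of that arithmetic. The `MutantResult` type, its fields, and the sample mutant IDs are hypothetical and do not reflect TFactory's actual verdict schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MutantResult:
    """Outcome for one mutant (hypothetical shape, not TFactory's schema)."""
    mutant_id: str
    status: str  # "killed" or "survived"


def mutation_score(results: Iterable[MutantResult]) -> float:
    """Return killed / (killed + survived); 0.0 when there are no mutants."""
    results = list(results)
    if not results:
        return 0.0
    killed = sum(1 for r in results if r.status == "killed")
    return killed / len(results)


# Made-up sample data, for illustration only.
sample = [
    MutantResult("m1", "killed"),
    MutantResult("m2", "survived"),
    MutantResult("m3", "killed"),
    MutantResult("m4", "killed"),
]
print(f"mutation score: {mutation_score(sample):.2f}")  # mutation score: 0.75
```

A surviving mutant is the interesting case for a reviewer: it marks a code change that no test noticed.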

All five lanes are wired and ship with v0.2.0. The Planner chooses each subtask’s lane from its (language, framework) pair via the framework registry. Reviewers see the lifecycle phases (executor_app_running, app_not_healthy, etc.) in the LaneStatusGrid and the per-test evidence in the Triager PR comment.
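
The AppRuntime health-poll listed in the lane table is where phases such as executor_app_running and app_not_healthy plausibly originate. Here is a minimal sketch of that kind of loop, assuming a caller-supplied probe. The function name, parameters, and defaults are hypothetical; only the two phase names come from the text above.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 5,
    delay_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `probe` up to `attempts` times and return a lifecycle phase name."""
    for attempt in range(attempts):
        if probe():
            return "executor_app_running"
        # Wait between attempts, but not after the final failed one.
        if attempt < attempts - 1:
            sleep(delay_s)
    return "app_not_healthy"


# Fake probe that becomes healthy on the third call (illustration only).
responses = iter([False, False, True])
print(wait_until_healthy(lambda: next(responses)))  # executor_app_running
```

Passing `sleep` in as a parameter keeps the loop testable without real delays. A real probe would typically issue an HTTP request against the app's health endpoint.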

The v0.2 design doc enumerates a longer “future-ramp” set (Go / Rust / Ruby support, additional security pipelines via integration). Those hook into the existing 5-lane spine through new FrameworkDescriptors and don’t require lane additions.
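
One way to picture how a new language plugs in without a new lane is a registry keyed by (language, framework), where every entry names one of the five existing lanes. The sketch below is illustrative only. The fields on `FrameworkDescriptor`, the `register` and `lane_for` helpers, and every registry entry (including the Go one) are assumptions, not TFactory's real registry.

```python
from dataclasses import dataclass

# The five lanes named in the status table.
LANES = ("unit", "browser", "api", "integration", "mutation")


@dataclass(frozen=True)
class FrameworkDescriptor:
    """Hypothetical descriptor mapping a (language, framework) pair to a lane."""
    language: str
    framework: str
    lane: str

    def __post_init__(self):
        # A descriptor must reuse an existing lane; it cannot introduce one.
        if self.lane not in LANES:
            raise ValueError(f"unknown lane: {self.lane}")


# Keyed by (language, framework); contents are made up for illustration.
REGISTRY = {}


def register(desc: FrameworkDescriptor) -> None:
    """Add or replace the descriptor for this (language, framework) pair."""
    REGISTRY[(desc.language, desc.framework)] = desc


def lane_for(language: str, framework: str) -> str:
    """Look up the lane for a subtask, as a Planner-like caller might."""
    return REGISTRY[(language, framework)].lane


register(FrameworkDescriptor("python", "pytest", "unit"))
register(FrameworkDescriptor("typescript", "playwright", "browser"))
# Hypothetical future entry: a new language lands on an existing lane.
register(FrameworkDescriptor("go", "testing", "unit"))

print(lane_for("go", "testing"))  # unit
```

Validating the lane in the descriptor itself puts the "no lane additions" rule in the data model, so callers don't have to enforce it.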

Quickstart

# Clone + bootstrap (NixOS / flake-based)
git clone https://github.com/olafkfreund/TFactory
cd TFactory
nix develop
tfactory-minimal-venv   # creates apps/backend/.venv
tfactory-test           # 531 backend tests, ~10s

The repo README has the full walkthrough, and the end-to-end smoke guide covers running real scenarios against an AIFactory project.
