Autonomous tests, AI-graded.
Hand TFactory a finished feature on a branch — from AIFactory, Claude Code, or any tool, via the MCP control plane or a plain acceptance-criteria file (markdown / Gherkin / EARS). The agent pipeline plans, writes, sandboxes, scores, and commits the suite — autonomously — then posts a triage report to your PR.
The five-agent pipeline
spec → graded tests · v0.2
- Planner
- Gen-Functional
- Executor
- Evaluator
- Triager
How it works
-
Spec-aware handover
A Claude Code session in your AIFactory repo runs /handover-to-tfactory. TFactory snapshots the spec + diff, runs four agents, returns a verdicts report.
-
Two-layer guardrails
Pre-flight static-checks every import resolves. Flake-risk lint catches dict-iteration order, time.sleep, datetime.now without freeze.
-
5-signal verdicts
Coverage delta · 3× stability re-run · mutate-and-check probe · flake-lint promotion · LLM semantic relevance. Survived-mutation tests don't ship.
-
Dry-run by default
Per the CLAUDE.md no-auto-push policy: git_writer + gh pr comment record argvs without executing. Operators opt in via env vars.
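To make the flake-risk lint from the guardrails item above concrete, here is a minimal sketch of how such a check could be written with Python's standard `ast` module. It is an illustration only, not TFactory's actual lint: the function name `find_flake_risks` and the rule table are hypothetical, and it flags every `time.sleep` and `datetime.now` call without checking whether time is frozen or looking at dict-iteration order.

```python
import ast

# Hypothetical rule table: (base name, attribute) -> rule label.
# The real TFactory lint may use different rules and identifiers.
RISKY_CALLS = {
    ("time", "sleep"): "time.sleep",
    ("datetime", "now"): "datetime.now",
}


def find_flake_risks(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, rule) pairs for calls that often make tests flaky."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not isinstance(func, ast.Attribute):
            continue
        base = func.value
        # time.sleep(...)          -> base is Name("time")
        # datetime.datetime.now()  -> base is Attribute(attr="datetime")
        base_name = base.id if isinstance(base, ast.Name) else getattr(base, "attr", None)
        rule = RISKY_CALLS.get((base_name, func.attr))
        if rule is not None:
            findings.append((node.lineno, rule))
    return sorted(findings)
```

Calling `find_flake_risks` on a test file's source returns the line numbers a reviewer would want to look at. A production lint would also need to recognise time-freezing fixtures before reporting `datetime.now`.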
New since v0.2.0 — connect to anything
Two capabilities make TFactory usable beyond a single laptop and a single model: it can now authenticate to your cloud and run on whatever LLM you already pay for.
-
Credential Broker
Problem: agents need real cloud/K8s/API credentials to test against live services, but secrets must never touch the repo.
Solution: resolve secrets from a vault (Azure KV · AWS Secrets Manager · GCP Secret Manager · HashiCorp Vault) or local sops/age/agenix, materialise them ephemerally (0600, wiped per task), gated by an explicit egress opt-in with an honest manifest.
-
Run on any LLM
Problem: teams are locked to one provider, or can't send code to a managed cloud at all.
Solution: a model-string-driven provider factory — Claude SDK, OpenAI Codex, Gemini CLI, GitHub Copilot CLI, Ollama (local), and any OpenAI-compatible endpoint (vLLM / LM Studio / OpenRouter…). Per-phase routing + an honest data-egress badge for air-gapped / BYO-LLM runs.
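The Credential Broker item above describes secrets that are materialised ephemerally with 0600 permissions and wiped per task. The sketch below shows one way that lifecycle could look in Python on a POSIX system. The helper name `ephemeral_secrets` and its dict-based interface are assumptions made for illustration; vault resolution and the egress opt-in are out of scope here, and "wiped" means the files are deleted, not securely overwritten.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_secrets(secrets: dict[str, str]):
    """Write secrets to a private temp dir for one task, then delete it.

    Hypothetical helper: the real broker's interface is not shown in the source.
    """
    # mkdtemp creates a directory readable only by the current user.
    workdir = Path(tempfile.mkdtemp(prefix="tfactory-secrets-"))
    try:
        paths = {}
        for name, value in secrets.items():
            path = workdir / name
            # O_EXCL refuses to reuse an existing file; 0o600 = owner read/write only.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            with os.fdopen(fd, "w") as handle:
                handle.write(value)
            paths[name] = path
        yield paths
    finally:
        # Runs on success and on failure, so no secret outlives the task.
        shutil.rmtree(workdir, ignore_errors=True)
```

A task would wrap its execution in `with ephemeral_secrets({...}) as paths:` and pass the resulting file paths to the sandbox. The `finally` clause is the design point: cleanup does not depend on the task succeeding.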
Status by lane
v0.2 replaced the v0.1 pipeline-stage decomposition (Functional / SAST / DAST / Fuzz / Mutation) with a modality-based spine per Decision 2. Security scanning is delegated to dedicated security pipelines and is out of scope here — TFactory focuses on functional + feature testing.
| Lane | v0.2.0 status | Runtime | Coverage | Evidence captured |
|---|---|---|---|---|
| Unit | ✅ Active | tfactory-runner-pytest + tfactory-runner-jest | line (cobertura / lcov) | none |
| Browser | ✅ Active | tfactory-runner-playwright + AppRuntime (docker-compose + health-poll) | null (per Decision 11: line coverage doesn’t apply when the test drives the browser) | screenshots · video · trace.zip |
| API | ✅ Active | per-framework Docker image + HTTP HAR recorder | line where the test exercises framework code | network.har |
| Integration | ✅ Active | per-framework Docker image + AppRuntime (multi-service compose) | line where applicable | network.har · service logs |
| Mutation | ✅ Active | mutmut (Python) / Stryker (TypeScript), run as a one-mutation-per-run probe in the Evaluator | reported per mutant (killed / survived) | none |
All five lanes are wired and ship with v0.2.0. Each subtask’s lane is
chosen by the Planner based on its (language, framework) via the
framework registry; reviewers see the lifecycle
phases (executor_app_running, app_not_healthy, etc.) in the
LaneStatusGrid and the per-test evidence in the Triager PR comment.
The v0.2 design doc enumerates a longer “future-ramp” set (Go / Rust /
Ruby support, additional security pipelines via integration) — those
hook into the existing 5-lane spine through new
FrameworkDescriptors and don’t require lane additions.
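The lane selection described above, where a (language, framework) pair is looked up in a registry and new languages arrive as new descriptors, can be sketched as follows. Only the name FrameworkDescriptor, the five lane names, and the runner names for pytest and Playwright come from this page; the field layout, the `FrameworkRegistry` class, the specific lane assignments, and the Go entry are hypothetical and shown only to illustrate the shape of the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    UNIT = "unit"
    BROWSER = "browser"
    API = "api"
    INTEGRATION = "integration"
    MUTATION = "mutation"


@dataclass(frozen=True)
class FrameworkDescriptor:
    # Field names are assumptions; the real descriptor may carry more detail.
    language: str
    framework: str
    lane: Lane
    runner: str


class FrameworkRegistry:
    """Hypothetical registry keyed by (language, framework)."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], FrameworkDescriptor] = {}

    def register(self, descriptor: FrameworkDescriptor) -> None:
        self._by_key[(descriptor.language, descriptor.framework)] = descriptor

    def lane_for(self, language: str, framework: str) -> Lane:
        try:
            return self._by_key[(language, framework)].lane
        except KeyError:
            raise LookupError(f"no descriptor for ({language}, {framework})") from None


registry = FrameworkRegistry()
registry.register(FrameworkDescriptor("python", "pytest", Lane.UNIT, "tfactory-runner-pytest"))
registry.register(
    FrameworkDescriptor("typescript", "playwright", Lane.BROWSER, "tfactory-runner-playwright")
)
# A future-ramp language reuses an existing lane; the runner name is made up.
registry.register(FrameworkDescriptor("go", "go-test", Lane.UNIT, "hypothetical-go-runner"))
```

Under these assumptions, adding Go support is a single `register` call against an existing lane, which is the property the paragraph above claims for the 5-lane spine.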
Quickstart
# Clone + bootstrap (NixOS / flake-based)
git clone https://github.com/olafkfreund/TFactory
cd TFactory
nix develop
tfactory-minimal-venv # creates apps/backend/.venv
tfactory-test # 531 backend tests, ~10s
Full walkthrough in the repo README plus the end-to-end smoke guide for running real scenarios against an AIFactory project.
Documentation
-
🏗️
Architecture
Directory structure, workspace layout, dataflow.
-
🧭
Design plan
10 locked decisions, landscape research, risk register.
-
📜
Spec
Agent OS spec: overview, user stories, deliverables.
-
🔧
Technical spec
Per-component implementation detail.
-
🧪
Test coverage
TDD plan: unit / integration / e2e pyramid.
-
📋
Tasks
All 12 MVP tasks, dependency graph, GitHub issues.
-
📈
Progress
Live build log, closed tasks + commits.
-
⚡
Changelog
v0.2.0 release notes (16 task summaries), v0.1.0-mvp history, sharp edges.
Tracking
- Epic + sub-issues → github.com/olafkfreund/TFactory/issues
- Source → github.com/olafkfreund/TFactory
- Sister project (upstream) → github.com/olafkfreund/AIFactory
- License → MIT OR GPL-3.0