Technical Specification

Historical record — v0.1.0-mvp technical spec (2026-05-28). Describes the v0.1 Functional-lane MVP architecture. The shipped product moved to the v0.2 modality spine (unit / browser / api / integration / mutation); see Architecture for current state.

Spec: TFactory MVP — Walking Skeleton (Functional Lane, Python)
Parent: ../spec.md
Source design: /home/olafkfreund/.claude/plans/virtual-cooking-bumblebee.md

Architecture summary

Six-agent pipeline mirroring the canonical Planner → Generator → Executor → Evaluator → Triager pattern. The Evaluator is structurally separate from the Generator so that generated tests are never validated by the agent that wrote them (the research behind the design requires this separation). At MVP only the functional lane is enabled ("lit"); the SAST / DAST / fuzz / mutation generators are not implemented, but the lane-tagged plan and dispatcher already account for them, so the work in phases 2 through 5 is additive.
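As a rough illustration of how a lane-tagged plan keeps later phases additive, the sketch below routes subtasks by lane and defers anything whose lane has no generator yet. The `Subtask` fields, the `dispatch` function, and the lane identifiers are hypothetical; the spec does not define the schema of test_plan.json.

```python
from dataclasses import dataclass

# Lanes named in this spec; identifiers are assumed, not taken from the code.
ALL_LANES = ("functional", "sast", "dast", "fuzz", "mutation")
# Only the functional lane has a generator at MVP.
LIT_LANES = frozenset({"functional"})


@dataclass(frozen=True)
class Subtask:
    """Hypothetical shape of one lane-tagged entry in test_plan.json."""
    subtask_id: str
    lane: str
    target: str


def dispatch(subtasks):
    """Split subtasks into (runnable, deferred) by whether their lane is lit."""
    runnable, deferred = [], []
    for task in subtasks:
        if task.lane not in ALL_LANES:
            raise ValueError(f"unknown lane: {task.lane!r}")
        (runnable if task.lane in LIT_LANES else deferred).append(task)
    return runnable, deferred


plan = [
    Subtask("s1", "functional", "auth/login.py"),
    Subtask("s2", "mutation", "auth/login.py"),
]
runnable, deferred = dispatch(plan)
```

Under this shape, lighting a new lane would only mean adding its name to `LIT_LANES` and registering a generator; the plan format itself would not change.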

AIFactory finished branch
   |
   v
/handover-to-tfactory (Claude Code skill)
   -> mcp__tfactory__task_create_and_run
       {project_id, spec_id, branch, base_ref}
   |
   v
TFactory MCP server (stdio)
   -> POST /tasks (FastAPI backend on :3102)
   |
   v
Worker pulls task, creates ~/.tfactory/workspaces/.../{new spec_id}/
   snapshots AIFactory spec dir read-only into context/
   computes diff (base_ref..branch) into context/diff.patch
   runs project_analyzer -> context/project_analysis.json
   |
   v
Planner agent (Claude Agent SDK)
   -> emits test_plan.json (functional subtasks only at MVP)
   |
   v
Per-lane Generator (only Gen-Functional lit at MVP)
   -> generates pytest files into tests/functional/
   -> pre-flight static check (imports + methods resolve)
   -> flake-risk lint
   -> retries via planner replan on hallucination
   |
   v
Executor (shared)
   -> docker run --rm --network=none \
        -v <repo>:/work:ro \
        -v <scratch>:/scratch:rw \
        tfactory-runner-python \
        pytest --cov=<target_pkg> /scratch/tests
   -> collects junit.xml + coverage.xml + stdout/stderr
   |
   v
Evaluator (separate agent)
   -> coverage delta vs base_ref
   -> flake-lint score
   -> 3x stability re-run
   -> LLM semantic relevance judgment (per test)
   -> mutate-and-check sanity probe
   -> per-test verdict { accept | reject | flag } + rationale
   |
   v
Triager
   -> dedup, rank
   -> render report.md + report.json
   -> git commit accepted tests on AIFactory feature branch
   -> gh pr comment <pr> --body REPORT
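
The Executor step in the diagram can be sketched as a function that assembles the `docker run` argument list. This is a minimal sketch, not the contents of docker_runner.py: `build_runner_argv` and its parameters are hypothetical names, and the absolute-path check is an added assumption (bind mounts with `-v` need absolute host paths).

```python
from pathlib import Path


def build_runner_argv(repo: Path, scratch: Path, target_pkg: str,
                      image: str = "tfactory-runner-python") -> list[str]:
    """Build the sandboxed pytest invocation shown in the pipeline diagram."""
    for path in (repo, scratch):
        if not path.is_absolute():
            raise ValueError(f"mount source must be absolute: {path}")
    return [
        "docker", "run", "--rm",
        "--network=none",               # no network access inside the sandbox
        "-v", f"{repo}:/work:ro",       # repository mounted read-only
        "-v", f"{scratch}:/scratch:rw", # generated tests and outputs
        image,
        "pytest", f"--cov={target_pkg}", "/scratch/tests",
    ]


argv = build_runner_argv(Path("/srv/repo"), Path("/tmp/scratch"), "mypkg")
```

Passing the result to `subprocess.run(argv, capture_output=True, text=True)` would capture stdout and stderr; building a list rather than a shell string avoids quoting problems with unusual paths.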

Inputs / outputs (contracts)

Inbound: mcp__tfactory__task_create_and_run

{
  "project_id": "string",
  "spec_id": "string",
  "branch": "string",
  "base_ref": "string",
  "confirm": true
}
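
A server-side check of this payload might look like the sketch below. `TaskRequest` and `parse_task_request` are hypothetical names, and the rule that `confirm` must be literally `true` is an assumption; the spec shows the field but does not state its semantics.

```python
from dataclasses import dataclass

REQUIRED_STR_FIELDS = ("project_id", "spec_id", "branch", "base_ref")


@dataclass(frozen=True)
class TaskRequest:
    project_id: str
    spec_id: str
    branch: str
    base_ref: str
    confirm: bool


def parse_task_request(payload: dict) -> TaskRequest:
    """Validate the inbound payload and return a typed request."""
    for name in REQUIRED_STR_FIELDS:
        value = payload.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} must be a non-empty string")
    # Assumption: the caller must opt in explicitly with confirm == true.
    if payload.get("confirm") is not True:
        raise ValueError("confirm must be true")
    fields = {name: payload[name] for name in REQUIRED_STR_FIELDS}
    return TaskRequest(confirm=True, **fields)


request = parse_task_request({
    "project_id": "demo",
    "spec_id": "001-login",
    "branch": "feature/login",
    "base_ref": "main",
    "confirm": True,
})
```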

Outbound: report (markdown rendered, JSON stored)

{
  "task_id": "string",
  "spec_id": "string",
  "lane_results": {
    "functional": {
      "tests_generated": 17,
      "tests_accepted": 14,
      "tests_rejected": 2,
      "tests_flagged": 1,
      "coverage_delta_pct": 6.3,
      "flake_warnings": ["tests/functional/test_oauth.py::test_lookup uses dict iteration order"],
      "hallucination_replans": 1,
      "mutate_probe_killed": true
    }
  },
  "git": {
    "commit_sha": "abc1234",
    "files_added": ["tests/functional/test_login.py", "..."],
    "pr_comment_url": "https://github.com/..."
  },
  "phase2_gap_notice": "Mutation gating not yet active; trivial-test risk remains until phase 2."
}
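
A consumer of report.json could sanity-check the lane counters before rendering them. The sketch below assumes that every generated test receives exactly one verdict, so accepted + rejected + flagged should equal generated (as it does in the example above: 14 + 2 + 1 = 17). That invariant and the function names are assumptions, not part of the contract.

```python
def lane_counts_consistent(lane: dict) -> bool:
    """True when the three verdict counters add up to tests_generated."""
    verdict_total = (
        lane["tests_accepted"] + lane["tests_rejected"] + lane["tests_flagged"]
    )
    return verdict_total == lane["tests_generated"]


def summarize(report: dict) -> list[str]:
    """Render one summary line per lane, flagging inconsistent counters."""
    lines = []
    for name, lane in sorted(report["lane_results"].items()):
        status = "ok" if lane_counts_consistent(lane) else "count mismatch"
        lines.append(
            f"{name}: {lane['tests_accepted']}/{lane['tests_generated']} accepted, "
            f"coverage delta {lane['coverage_delta_pct']:+.1f}% ({status})"
        )
    return lines


example_report = {
    "lane_results": {
        "functional": {
            "tests_generated": 17,
            "tests_accepted": 14,
            "tests_rejected": 2,
            "tests_flagged": 1,
            "coverage_delta_pct": 6.3,
        }
    }
}
summary = summarize(example_report)
```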

Component-level technical detail

Hard-fork scaffold

MCP server (apps/backend/mcp_server/tfactory_server.py)

MVP tool surface (subset of AIFactory’s):

.mcp.json is updated to point at scripts/start-tfactory-mcp.sh, which invokes python -m apps.backend.mcp_server.tfactory_server.

Handover skill

TFactory/.claude/skills/handover-to-tfactory/SKILL.md:

Companion skill on the AIFactory side: AIFactory/.claude/skills/handover-to-tfactory/SKILL.md is the user-facing skill. It lives in AIFactory’s repo because the slash command is typed while working in an AIFactory-tracked project, and it calls TFactory’s server over MCP. This is the only AIFactory-side change in the MVP.

Workspace + state model

Planner agent

Gen-Functional agent

Docker executor (apps/backend/tools/runners/docker_runner.py)

Dockerfile (docker/runners/python.Dockerfile)

Evaluator agent

Triager + git side-effects

Portal retheme

Configuration & environment

Dependencies and provenance

Out-of-scope reminders (for clarity)

Open implementation questions to resolve during build

  1. Container runtime: docker vs podman. Rootless podman on NixOS is the user’s natural choice; verify that both work and document the result. Default to docker, since the Python docker SDK is the most widely supported.
  2. Test-dir detection heuristic for projects without a tests/ directory: fall back to creating one, or refuse and report? Recommendation: create tests/ and note it in the report.
  3. Whether to commit tests/_tfactory/REPORT.md to the branch alongside the source tests. Recommendation: yes, commit it; durability and grep-ability outweigh the noise.
  4. How to handle multi-package monorepos at MVP. Recommendation: scope detection to the diff’s directory subtree; reject (with a clear error) if the diff spans multiple top-level packages.
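
The recommendation in question 4 could be prototyped along these lines. This is a sketch under stated assumptions: the changed paths are taken as already extracted (for instance from `git diff --name-only base_ref..branch`), every top-level directory is treated as a package, and root-level files such as README.md are ignored. `diff_scope` and `MultiPackageDiffError` are hypothetical names.

```python
from pathlib import PurePosixPath


class MultiPackageDiffError(ValueError):
    """Raised when a diff touches more than one top-level package."""


def diff_scope(changed_paths):
    """Return the single top-level directory touched by the diff."""
    top_levels = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) > 1:  # assumption: root-level files do not define scope
            top_levels.add(parts[0])
    if not top_levels:
        raise ValueError("diff touches no package directory")
    if len(top_levels) > 1:
        names = ", ".join(sorted(top_levels))
        raise MultiPackageDiffError(
            f"diff spans multiple top-level packages: {names}"
        )
    return top_levels.pop()


scope = diff_scope(["svc/api/login.py", "svc/models.py", "README.md"])
```

A real implementation would likely need a stricter notion of "package" (for example, directories containing a pyproject.toml), which this sketch leaves open.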