CI/CD Redesign: code truth on every PR, site truth on every run
Tonight a practice site went from URL to live through seventeen of the factory's own checkpoints. GitHub's checks contributed nothing to that: every PR shows two red ✗'s, one of them from a gate inspecting artifacts the factory no longer produces, and every merge is an admin override. The redesign gives each kind of truth its own gate and retires the one guarding a world we left.
v1.1, RATIFIED 2026-07-04: Q1 investigated → delete outright; Q2 the cut is on. Fleet layer deferred. Slice 1 executing.
Fleet layer → later (your ruling)
PR CI = code truth only
Site truth = the assembly line, per run
First PR through the new CI = the door work
Today's CI — verified from the logs, not memory
Every PR runs ~10 jobs. Two are standing-red on every PR regardless of content:
cutover-gate re-verifies the legacy pipeline's stored site artifacts for a canary site. Its own log: route universe incomplete, missing crawl substrate, 155 failures, nearly every probe skipped. The world moved to the One Door; this gate still inspects the old loading dock. Its "skip when irrelevant" classifier didn't even fast-pass a config-only PR.
readiness (governance) dies on a network fetch — a curl that retries for two minutes and gives up. An environment dependency, not a code verdict.
Result: merges #1846, #1847, #1848 all went in by admin override. A permanent override culture is gate erosion — the red ✗ stops meaning anything, and one day it hides a real one.
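A minimal sketch of how the "standing-red regardless of content" observation could be checked mechanically. It assumes CI history can be exported as one record per check per PR; the `CheckRun` shape, the `partitionChecks` name, and the sample data are hypothetical illustrations, not the factory's real tooling or real CI history.

```typescript
// Hypothetical record shape: one entry per check per PR.
type Conclusion = "success" | "failure";

interface CheckRun {
  pr: number;             // PR number
  check: string;          // check (job) name
  conclusion: Conclusion; // how the check ended on that PR
}

interface Partition {
  standingRed: string[];   // failed on every observed PR: carries no signal
  honestlyGreen: string[]; // never failed on the observed PRs
  mixed: string[];         // failed on some PRs only: real signal or flakiness
}

function partitionChecks(runs: CheckRun[]): Partition {
  // Group conclusions by check name.
  const byCheck: Record<string, Conclusion[]> = {};
  for (const run of runs) {
    (byCheck[run.check] = byCheck[run.check] || []).push(run.conclusion);
  }

  const result: Partition = { standingRed: [], honestlyGreen: [], mixed: [] };
  for (const check of Object.keys(byCheck)) {
    const conclusions = byCheck[check] || [];
    const failures = conclusions.filter((c) => c === "failure").length;
    if (failures === conclusions.length) {
      result.standingRed.push(check);
    } else if (failures === 0) {
      result.honestlyGreen.push(check);
    } else {
      result.mixed.push(check);
    }
  }
  return result;
}

// Illustrative data only (made-up PR numbers and outcomes).
const sampleRuns: CheckRun[] = [
  { pr: 1, check: "cutover-gate", conclusion: "failure" },
  { pr: 1, check: "readiness", conclusion: "failure" },
  { pr: 1, check: "typecheck", conclusion: "success" },
  { pr: 1, check: "units", conclusion: "success" },
  { pr: 2, check: "cutover-gate", conclusion: "failure" },
  { pr: 2, check: "readiness", conclusion: "failure" },
  { pr: 2, check: "typecheck", conclusion: "success" },
  { pr: 2, check: "units", conclusion: "failure" },
];

const samplePartition = partitionChecks(sampleRuns);
// standingRed:   ["cutover-gate", "readiness"]
// honestlyGreen: ["typecheck"]
// mixed:         ["units"]
```

A check in `standingRed` is red independent of the diff, so requiring it can only force overrides. A check in `mixed` needs a human look before it is required or dropped.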
Go deeper: the exact verified failure chain
validate.yml (40KB) triggers on every PR and on every push to main.
cutover-gate (line 382): GATE_CUTOVER_SITE_ID=fairfax-vet, runs gate-cutover.sh.
Verdict chain: deploy-hygiene:PASS → deploy-completeness:FAIL → 23 probes SKIP → a11y-focus-order:FAIL → capture-integrity:FAIL → serve-time-containment:FAIL.
Also logged: "Integration preservation proof written with blockers: ga4:FAIL, gtm:FAIL" and "compiler packet non-launch-grade: route-universe-incomplete (MISSING_CRAWL_SUBSTRATE, BLOCK_CAPTURE_INCOMPLETE, BLOCK_BUILD_INCOMPLETE, BLOCK_ROUTE_SHRINKAGE)". failures = 155.
Governance: curl --fail --retry 3 --retry-all-errors started at 05:21:44 and exited 1 at 05:23:54; the fetched URL gets pinned down in Slice 2, not guessed.
Meanwhile the real verification moved into the product: 18 assembly-line stages per run, the produced repo's own gates, and the versioning spine's reconciliation gate on every edit. An assembly-line.yml workflow already exists in CI, but it is manual-trigger only and wired to nothing.
The cut — follow one change through the new pipeline
A PR opens. CI asks exactly one question: is the factory code sound? Typecheck (now covering the assembly line itself — unchecked today), units, contracts, form-intake, provenance trust-roots. Fast, deterministic, no network, no stored sites.
The old gate is gone from this path. cutover-gate leaves PR CI — removed, not patched. Its end-to-end value returns as a hermetic fixture-door (Slice 4): the real assembly line run against a tiny committed fixture site, no network. Until then PR CI has no end-to-end — accepted interim, because every real deploy still passes the 18 live stages, and a red gate everyone overrides protects less than a missing gate everyone knows is missing.
Green means green. Branch protection requires exactly the checks that can honestly pass. Merges happen with zero overrides. A deliberate break must go red before we trust the new set (anti-vacuity, proven, not assumed).
A site ships through the assembly line, not through CI. The 18-stage run IS the deploy pipeline — capture integrity, byte-identity, provenance, walk. The homeless pieces get homes here: the walk's browser-environment fix, the content audit, digest-verify.
The first PR through the new CI is the door work itself — landing the One Door + versioning spine onto main through honest greens, not another override.
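The "Green means green" step asks that a gate be shown to go red on a deliberate break before it is trusted. A minimal sketch of that anti-vacuity proof, assuming a gate can be modelled as a function from some input to pass/fail; `Gate`, `proveGateCanFail`, and the toy `BuildOutput` are hypothetical names for illustration, not existing factory code.

```typescript
// A gate returns true (green) or false (red) for a given input.
type Gate<T> = (input: T) => boolean;

interface VacuityReport {
  passesOnSound: boolean; // green on the known-good input
  failsOnBroken: boolean; // red on the deliberately broken input
  trustworthy: boolean;   // both of the above hold
}

// Run the gate against a sound input and a planted break.
// A gate is only trusted if it distinguishes the two.
function proveGateCanFail<T>(gate: Gate<T>, sound: T, broken: T): VacuityReport {
  const passesOnSound = gate(sound);
  const failsOnBroken = !gate(broken);
  return {
    passesOnSound,
    failsOnBroken,
    trustworthy: passesOnSound && failsOnBroken,
  };
}

// Toy stand-in for a typecheck result (hypothetical).
interface BuildOutput {
  typeErrors: number;
}

const realGate: Gate<BuildOutput> = (out) => out.typeErrors === 0;
const vacuousGate: Gate<BuildOutput> = () => true; // can never go red

const soundBuild: BuildOutput = { typeErrors: 0 };
const brokenBuild: BuildOutput = { typeErrors: 1 }; // the planted break

const realReport = proveGateCanFail(realGate, soundBuild, brokenBuild);
// realReport.trustworthy === true
const vacuousReport = proveGateCanFail(vacuousGate, soundBuild, brokenBuild);
// vacuousReport.trustworthy === false
```

The same shape also flags the opposite failure: a gate that is red on the sound input reports `passesOnSound: false`, which is the standing-red case.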
Your instinct ("doubt it but please look"), verified: nothing automatic ships through the legacy path (all legacy workflows are manual-trigger; main pushes deploy nothing); the CI gate's canary artifacts are gone from the repo, so it can only fail; and the one live legacy-class site — digitalempathyvet.com itself, redeployed 2 days ago — is guarded by its own deploy-run gates + proven rollback anchor, not by this job. A scheduled canary would be red-forever on the same missing substrate, so no canary either.
★ Q2 — the cut → RATIFIED
PR CI goes code-truth-only; the hermetic fixture-door lands as Slice 4 with the interim gap accepted as named in Risks below.
Q3 — sequence → unopposed, proceeding
Stop-the-bleeding first (delete cutover-gate + require the green set), then the governance pin-down, then typecheck, then fixture-door, then Site-CD formalization.
Build slices — small diffs, each independently revertable
1 · Stop the bleeding. Delete cutover-gate outright, per the Q1 ruling. Required checks := the honestly-green set.
2 · Governance pin-down. Reproduce the failing curl, identify what it fetches; vendor it or drop it. Proof: green twice on an unchanged PR.
3 · Typecheck slice. CI typecheck finally covers assembly-line.ts. Proof: a planted type error goes red.
4 · Hermetic fixture-door. The assembly line on a committed 3-page fixture, no network — also fixes the walk's browser-resolution break from tonight. Proof: a planted leak fails the job.
5 · Site-CD formalization. assembly-line.yml becomes the canonical per-site run; absorbs the walk env fix, the content audit (independent re-verify first), digest-verify.
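Slice 4's proof ("a planted leak fails the job") can be sketched as follows. This assumes every stage reaches the network only through an injected fetcher, which is an assumption about the design rather than a description of assembly-line.ts; `Fetcher`, `Stage`, `runHermetic`, and the fixture contents are all hypothetical.

```typescript
// Hypothetical seam: the only way a stage may reach the network.
type Fetcher = (url: string) => string;

interface FixturePage {
  path: string;
  html: string;
}

type Stage = (pages: FixturePage[], fetcher: Fetcher) => void;

interface HermeticResult {
  ok: boolean;          // true only if nothing leaked and nothing threw
  leaks: string[];      // every URL the stage tried to fetch
  error: string | null; // message of any error that escaped the stage
}

function runHermetic(stage: Stage, pages: FixturePage[]): HermeticResult {
  const leaks: string[] = [];
  const denyingFetcher: Fetcher = (url) => {
    // Record before throwing, so a stage that swallows the exception
    // still fails the run.
    leaks.push(url);
    throw new Error("network access attempted in hermetic run: " + url);
  };

  let error: string | null = null;
  try {
    stage(pages, denyingFetcher);
  } catch (e) {
    error = e instanceof Error ? e.message : String(e);
  }
  return { ok: leaks.length === 0 && error === null, leaks, error };
}

// A tiny 3-page fixture (made-up content).
const fixture: FixturePage[] = [
  { path: "/", html: "<h1>Home</h1>" },
  { path: "/about", html: "<h1>About</h1>" },
  { path: "/contact", html: "<h1>Contact</h1>" },
];

// A stage that works only from the fixture.
const cleanStage: Stage = (pages) => {
  for (const page of pages) {
    if (page.html.indexOf("<h1>") === -1) {
      throw new Error("missing heading on " + page.path);
    }
  }
};

// The planted leak: a stage that fetches and hides the failure.
const leakyStage: Stage = (_pages, fetcher) => {
  try {
    fetcher("https://example.com/analytics.js");
  } catch (_e) {
    // swallowed on purpose
  }
};

const cleanResult = runHermetic(cleanStage, fixture); // ok: true
const leakyResult = runHermetic(leakyStage, fixture); // ok: false, one leak recorded
```

Recording the attempt before throwing is the design choice that matters here: a guard that only throws can be defeated by a broad catch inside the stage, while the `leaks` list cannot.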
What could go wrong (honest)
Interim end-to-end gap. Between Slice 1 and Slice 4, a factory regression only an end-to-end would catch could merge. The per-run site gates still catch it at the next real deploy; the fixture-door is Slice 4, not "someday."
Governance may guard something real. The pin-down may reveal a dependency worth keeping — then it gets vendored, not deleted. No silent weakening.
Q1 answered wrong. A wrong answer either keeps dead weight or retires a live guard, which is why the call was yours, made with the facts laid out.
No network in PR CI. And no gate ships until it's been shown to go red on a deliberate break — a check that can't fail protects nothing.