Human overview · for understanding
A reusable robot QA tester that looks at real pixels and tells the truth · 2026-06-22
Master summary — the gist in 30 seconds
Input: a proven-but-throwaway live test + a handoff spec. Output: a global skill (engine + judge + live report), a smoke proof, a drop-in ClientsFlow config, and a ready-to-paste retest prompt.
flowchart LR
A[Proven one-off test] --> B[Generalize into a skill]
B --> C[Smoke-prove on a demo page]
C --> D[Wire for ClientsFlow retest]
D --> E[Paste the prompt - run all 23]
Input: a web app + what each screen SHOULD look like. Output: a live HTML report (watchable on localhost) where every screenshot gets a Gemini read + a Claude double-check, tagged honestly.
flowchart TD
S[Take screenshot] --> G[Gemini: 5-sentence read + confidence]
G --> Q{PASS and confident?}
Q -- yes --> T[Trust it - move on]
Q -- no/unsure --> C[Claude reads the pixels + decides]
C --> V[Verdict: OK / fix / blocked]
V --> N[Next step]
N --> S
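The triage loop in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the names `GeminiRead`, `judge_frame`, and `second_look` are hypothetical, and the 0.9 confidence cut-off is an assumption, since the overview does not state the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed cut-off for "confident"; the overview does not give a number.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class GeminiRead:
    """Hypothetical container for the first-pass read of one screenshot."""
    verdict: str       # e.g. "PASS" or "BUG"
    confidence: float  # 0.0 to 1.0
    summary: str       # the short textual read


def judge_frame(
    read: GeminiRead,
    second_look: Callable[[], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, str]:
    """Return (verdict, decided_by) for one frame.

    A confident PASS is trusted as-is. Anything else (a non-PASS verdict,
    or a PASS below the threshold) is escalated to the second reader,
    which is only invoked when needed.
    """
    if read.verdict == "PASS" and read.confidence >= threshold:
        return "OK", "gemini"
    return second_look(), "claude"
```

Passing the second look as a callable keeps the expensive pixel read lazy: it runs only for frames that fail the trust check, which matches the "trusted, no second look" path in the diagram.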
Input: example.com + a deliberately-flawed expectation. Output: Gemini cried 'BUG', but Claude looked at the actual pixels and corrected it to 'expectation was wrong'; a confident clean frame was trusted without a second look.
flowchart LR
subgraph Frame1[Frame 1 - my expectation was off]
A[Gemini: BUG conf 1.0] --> B[Claude reads pixels] --> C[EXPECTATION_WRONG]
end
subgraph Frame2[Frame 2 - clearly fine]
D[Gemini: PASS conf 1.0] --> E[Trusted - no second look]
end
Input: the generic skill (no live-app safety) + the ClientsFlow specifics. Output: a config file (URL, judge rules, selectors) + a hard send-gate that raises an error if a recipient isn't a ZZ test address.
flowchart TD
X[Run wants to send an email] --> Y{Recipient is a ZZ test address?}
Y -- yes --> Z[Send + verify it landed]
Y -- no --> W[SendBlocked - STOP]
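A hard send-gate like the one in the diagram above might look roughly like this. Only the `SendBlocked` name comes from the overview. The rule for what counts as a ZZ test address is not given in the source, so `is_zz_test_address` below uses a placeholder rule (local part starts with "zz") purely for illustration; `guarded_send` and its parameters are likewise hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SendBlocked(Exception):
    """Raised when a run tries to email a recipient that is not a test address."""


def is_zz_test_address(recipient: str) -> bool:
    # PLACEHOLDER RULE (assumption): treat addresses whose local part
    # starts with "zz" as test addresses. Replace with the real rule.
    local, sep, _domain = recipient.strip().lower().partition("@")
    return bool(sep) and local.startswith("zz")


def guarded_send(
    recipient: str,
    send: Callable[[str], T],
    is_test_address: Callable[[str], bool] = is_zz_test_address,
) -> T:
    """Call send(recipient) only if the recipient passes the test-address check.

    The check runs before any side effect, so a blocked send never
    reaches the mail provider.
    """
    if not is_test_address(recipient):
        raise SendBlocked(f"refusing to send to non-test recipient: {recipient!r}")
    return send(recipient)
```

Raising an exception rather than returning a flag means a caller cannot silently ignore the block. The "verify it landed" step from the diagram is not sketched here.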
Input: hundreds of frames over 23 scenarios. Output: a lean run — disk holds the truth, only uncertain frames are 'looked at', and context is compacted at scenario boundaries, never mid-task.
flowchart LR
R[(log.json + frames on disk = truth)] --> A[Read a PNG only if uncertain]
A --> B[Compact at scenario boundary]
B --> C[Resume from first un-judged step]
C --> R
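Resuming from disk could work along these lines. The layout of `log.json` is not described in the overview, so this sketch assumes a JSON list of step records where a judged step carries a non-empty `"verdict"` field; that schema and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional


def first_unjudged(steps: List[dict]) -> Optional[dict]:
    """Return the first step with no verdict yet, or None if all are judged.

    Assumed schema: each step is a dict, and a judged step has a
    non-empty "verdict" value.
    """
    for step in steps:
        if not step.get("verdict"):
            return step
    return None


def resume_point(log_path: str) -> Optional[dict]:
    """Load the on-disk log and find where the run should pick up again."""
    steps = json.loads(Path(log_path).read_text(encoding="utf-8"))
    return first_unjudged(steps)
```

Because the resume point is derived from the file alone, nothing has to be carried in context across a compaction: after a scenario boundary, the run re-reads the log and continues.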
Input: the finalized retest prompt + config. Output (when you run it): a master report of all 23 scenarios with honest tags, plus a separate list of any new product findings.
timeline
title From here
Run it : paste prompt in a new chat : walk Scenario 1 to 23
Watch for : live-call may be BLOCKED : not a bug, an honest tag
Then : review the master report : triage new product findings