wizenheimer / canary · ★ Featured · worked examples

Canary

QA a described user flow against a live web app the way Canary does: the agent drives a real (headless) browser through small, intent-named steps, and captures evidence at every step — a screenshot, console messages, and network activity — plus a Playwright…

agent: claude-code · model: claude-sonnet-4-6 · snapshot: prism-playwright · eval: programmatic · 6 steps · v1.0.0

Deploy Canary to your jetty.io account

One-click installs this runbook into a collection on your Jetty account. You can run it from the Spot dashboard, schedule it, or pipe inputs in via the API.

Run time: 3–5 mins

Headline output: report.html · replay.py

Runs on Jetty's managed sandbox. No setup. Free for your first 10 runs.

Worked examples · 3

Real runs, real outputs.

qa · crud-flow · eval ✓

TodoMVC — add & complete a todo

Add two todos and complete one. 7/7 steps pass, with real assertions on the counter (2 → 1) and the completed state. Self-contained report + replay.py.

report.html · replay.py · steps.json · validation_report.json

claude-sonnet-4-6 · 3 min

qa · auth-cart-flow · eval ✓

SauceDemo — login & add to cart

Log in to a demo store, add an item, assert the cart badge. 7/7 steps pass — login lands on the inventory page and the cart badge shows 1.

report.html · replay.py · steps.json · validation_report.json

claude-sonnet-4-6 · 3 min

qa · auth-flow · eval ✓

the-internet — login & logout

report.html · replay.py · steps.json · validation_report.json

claude-sonnet-4-6 · 3 min

The shape of the run

6 steps · start to finish.

Step 1

Environment Setup

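# Create the output tree for per-step screenshots.
mkdir -p "{{results_dir}}/screenshots"
# Install Playwright only if the import (or the version lookup) fails.
python -c "import playwright; print('playwright', playwright.__version__)" || python -m pip install --quiet playwright
# Fetch Chromium; errors are suppressed so a pre-provisioned sandbox does not abort the run.
python -m playwright install chromium 2>/dev/null || true
# Fail fast if target_url is empty or its template placeholder was never filled in.
SITE="{{target_url}}"
[ -n "$SITE" ] && [ "$SITE" != "{{target_url}}" ] || { echo "ERROR: no target_url provided"; exit 1; }
echo "QA target: $SITE"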

Step 2

Explore, then Drive the Flow

First observe the target: fetch the page, note the real selectors for the elements the flow touches (don't guess — read the DOM). Then translate the plain-language {{flow}} into small, intent-named…
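
The idea of small, intent-named steps that each leave evidence behind can be sketched as below. This is only an illustration, not Canary's actual harness: `StepResult`, `run_step`, and the field names are hypothetical, and `page` is assumed to be a Playwright sync-API `Page` (only its `screenshot(path=...)` method is used here).

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass
class StepResult:
    # Hypothetical record shape; the real steps.json fields may differ.
    name: str
    status: str
    screenshot: str
    console_errors: List[str] = field(default_factory=list)
    error: str = ""


def run_step(page, index: int, name: str, action: Callable, shots_dir: Path,
             console_errors: List[str]) -> StepResult:
    """Run one intent-named step and capture evidence whether it passes or fails."""
    seen_before = len(console_errors)
    status, error = "pass", ""
    try:
        action(page)  # performs the interaction and raises if its check fails
    except Exception as exc:
        status, error = "fail", str(exc)
    # Screenshot after the action so a failure is visible in the evidence too.
    shot = shots_dir / f"{index:02d}-{name}.png"
    page.screenshot(path=str(shot))
    # Only the console errors that appeared during this step belong to it.
    return StepResult(name, status, str(shot), console_errors[seen_before:], error)
```

With Playwright, the shared `console_errors` list could be filled by a listener such as `page.on("console", lambda m: console_errors.append(m.text) if m.type == "error" else None)`, registered once before the first step.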

Step 3

Write the Reusable Replay Script

Write {{results_dir}}/replay.py — a standalone Playwright script (no agent, no harness) that reproduces the exact flow and exits non-zero if any assertion fails. This is the artifact that runs in CI…
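
A replay script along these lines might look like the sketch below. It assumes a TodoMVC-style page, and the selectors (`.new-todo`, `.todo-count`, `.toggle`), the todo titles, and the helper names `run_checks` and `main` are all hypothetical; a real `replay.py` would use the selectors observed in Step 2. The part that matters is the exit code: any failed assertion makes it non-zero.

```python
import sys
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], None]]


def run_checks(checks: List[Check]) -> int:
    """Run every named check; return 0 only if none of them raised."""
    failed = 0
    for name, check in checks:
        try:
            check()
            print(f"PASS {name}")
        except Exception as exc:  # AssertionError from expect(), timeouts, etc.
            failed += 1
            print(f"FAIL {name}: {exc}", file=sys.stderr)
    return 1 if failed else 0


def main(target_url: str) -> int:
    # Imported lazily so run_checks stays importable without Playwright.
    from playwright.sync_api import expect, sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(target_url)

        # Assumed selectors for illustration only; read the real ones from the DOM.
        new_todo = page.locator(".new-todo")
        counter = page.locator(".todo-count")

        def add_two_todos() -> None:
            for title in ("buy milk", "walk dog"):
                new_todo.fill(title)
                new_todo.press("Enter")
            expect(counter).to_contain_text("2")

        def complete_first_todo() -> None:
            page.locator(".toggle").first.check()
            expect(counter).to_contain_text("1")

        code = run_checks([
            ("add two todos", add_two_todos),
            ("complete first todo", complete_first_todo),
        ])
        browser.close()
        return code
```

In a standalone script the file would end with the usual `if __name__ == "__main__": sys.exit(main(sys.argv[1]))`, so a CI job fails whenever a check does. This sketch keeps running after a failed check; stopping at the first failure is an equally reasonable choice.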

Step 4

Build `report.html`

Write a self-contained {{results_dir}}/report.html (no external assets — inline the screenshots as base64). Include: the flow + target URL, the overall verdict (pass/fail), and for each step its…
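
Inlining the screenshots as base64 is what makes the report portable. A minimal sketch using only the standard library is shown here; the step dictionary keys (`name`, `status`, `screenshot`) and the function names are assumptions for illustration, and the markup is deliberately bare.

```python
import base64
import html
from pathlib import Path
from typing import Dict, List


def inline_png(path: Path) -> str:
    """Return a data: URI so the image needs no external file."""
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def build_report(flow: str, target_url: str, steps: List[Dict]) -> str:
    """Render one self-contained HTML page from per-step results."""
    # An empty run is treated as a failure rather than a vacuous pass.
    verdict = "pass" if steps and all(s["status"] == "pass" for s in steps) else "fail"
    sections = []
    for step in steps:
        name = html.escape(step["name"])
        status = html.escape(step["status"])
        src = inline_png(Path(step["screenshot"]))
        sections.append(
            f"<section><h2>{name}: {status}</h2>"
            f'<img alt="{name}" src="{src}"></section>'
        )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<title>QA report</title></head><body>"
        f"<h1>{html.escape(flow)}</h1>"
        f"<p>Target: {html.escape(target_url)}</p>"
        f"<p>Verdict: {verdict}</p>"
        + "".join(sections)
        + "</body></html>"
    )
```

Writing the result is then a single `Path(...).write_text(build_report(...), encoding="utf-8")`. All user-supplied text goes through `html.escape`, since flow descriptions and step names can contain angle brackets.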

Step 5

Evaluate, Validate & Iterate (max 3 rounds)

Status · Criteria
PASS · The flow drove ≥ 2 steps, evidence was captured for each (screenshot + console + the shared trace.zip + network.har), every assertion step ran a real check, report.html and…
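
The PASS criteria that are visible above could be checked programmatically roughly as follows. This is a sketch under stated assumptions: the step keys (`check`, `status`, `screenshot`, `console_errors`), the stage names, and the convention that screenshot paths are relative to the results directory are all hypothetical, and the criteria cut off after "report.html and…" are not covered.

```python
from pathlib import Path
from typing import Dict, List


def validate(steps: List[Dict], results_dir: Path) -> Dict:
    """Check the visible PASS criteria and report them stage by stage."""

    def has_evidence(step: Dict) -> bool:
        # Assumed layout: screenshot paths are relative to results_dir.
        shot = results_dir / step.get("screenshot", "")
        return shot.is_file() and isinstance(step.get("console_errors"), list)

    # Assumption: a step counts as an assertion step if it declares a check.
    assertion_steps = [s for s in steps if s.get("check") is not None]
    stages = {
        "at_least_two_steps": len(steps) >= 2,
        "evidence_per_step": all(has_evidence(s) for s in steps),
        "shared_artifacts": all(
            (results_dir / name).is_file() for name in ("trace.zip", "network.har")
        ),
        # Crude proxy for "ran a real check": a non-blank check that produced
        # a definite outcome instead of being skipped.
        "assertions_ran": all(
            bool(str(s["check"]).strip()) and s.get("status") in ("pass", "fail")
            for s in assertion_steps
        ),
    }
    return {"stages": stages, "overall_passed": all(stages.values())}
```

The returned dictionary can be dumped with `json.dump` to `validation_report.json`; a failing stage tells the agent what to fix before the next of its at most three rounds.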

Step 6

Write Executive Summary

Write {{results_dir}}/summary.md:

Inputs

Target URL · url · required

Where the flow starts.

Flow · text · required

The user journey to QA in plain language, with the checks that must hold (visible text / URL / state / no console error).

Credentials · text

Optional login credentials, e.g. user=...,pass=...
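
For illustration, a `user=...,pass=...` string like the one above could be split into a dictionary as sketched here. The function name and the sample values are hypothetical, and the sketch assumes values contain no commas, which a real password may violate.

```python
from typing import Dict


def parse_credentials(raw: str) -> Dict[str, str]:
    """Split 'key=value,key=value' into a dict; empty input yields {}."""
    creds: Dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        # partition keeps any further '=' characters inside the value.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        creds[key.strip()] = value.strip()
    return creds


# Hypothetical sample values, not real credentials.
print(parse_credentials("user=alice,pass=s3cret"))
# → {'user': 'alice', 'pass': 's3cret'}
```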

Dependencies

playwright (Python) + Chromium · required · Runtime

Required outputs

{{results_dir}}/report.html
Self-contained QA report: per-step status, the inline screenshot of each step, console errors, a network summary, and the overall verdict. Open it, commit it, send it.
{{results_dir}}/replay.py
The reusable Playwright script that reproduces the flow exactly — re-runnable in CI with no agent cost.
{{results_dir}}/steps.json
Structured per-step results: name, action, check, status, screenshot path, console errors.
{{results_dir}}/trace.zip
The Playwright trace for the whole session (open with `playwright show-trace`).
{{results_dir}}/console.log
All browser console messages captured during the run.
{{results_dir}}/network.har
The network HAR for the session.
{{results_dir}}/summary.md
Executive summary: flow, verdict, steps passed/failed, the single most important finding.
{{results_dir}}/validation_report.json
Stage-by-stage validation with `overall_passed`. See Step 5.

Origin

source: github.com
title: Canary — QA harness for Claude Code
attr: high
