Canary
QA a described user flow against a live web app the way Canary does: the agent drives a real (headless) browser through small, intent-named steps, and captures evidence at every step — a screenshot, console messages, and network activity — plus a Playwright…
Runs on Jetty's managed sandbox. No setup. Free for your first 10 runs.
Real runs, real outputs.
TodoMVC — add & complete a todo
Add two todos and complete one. 7/7 steps pass, with real assertions on the counter (2 → 1) and the completed state. Self-contained report + replay.py.
SauceDemo — login & add to cart
Log in to a demo store, add an item, assert the cart badge. 7/7 steps pass — login lands on the inventory page and the cart badge shows 1.
6 steps · start to finish.
- Step 1
Environment Setup
mkdir -p "{{results_dir}}/screenshots"
python -c "import playwright; print('playwright', playwright.__version__)" || python -m pip install --quiet playwright
python -m playwright install chromium 2>/dev/null || true
SITE="{{target_url}}"
[ -n "$SITE" ] && [ "$SITE" != "{{target_url}}" ] || { echo "ERROR: no target_url provided"; exit 1; }
echo "QA target: $SITE"
- Step 2
Explore, then Drive the Flow
First observe the target: fetch the page and note the real selectors for the elements the flow touches (read the DOM rather than guessing). Then translate the plain-language {{flow}} into small, intent-named…
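One way to picture "small, intent-named steps with evidence at every step" is the sketch below. It is a minimal illustration, not Canary's actual implementation: the `StepResult` record, the `run_step` helper, and the screenshot file-naming scheme are all hypothetical. `page` is assumed to be anything with a `screenshot(path=...)` method, such as a Playwright `Page`.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """Evidence recorded for one intent-named step (hypothetical shape)."""
    name: str
    passed: bool
    screenshot: str
    console: List[str] = field(default_factory=list)
    error: Optional[str] = None


def run_step(page, index: int, name: str, action: Callable[[object], None],
             console_log: List[str], shots_dir: Path) -> StepResult:
    """Run one step, then screenshot the page whether or not the step passed."""
    seen = len(console_log)  # console messages emitted before this step
    error = None
    try:
        action(page)
    except Exception as exc:  # also catches AssertionError raised by checks
        error = f"{type(exc).__name__}: {exc}"
    shot = shots_dir / f"{index:02d}-{name}.png"
    page.screenshot(path=str(shot))
    # Only the console messages that arrived during this step are attached.
    return StepResult(name, error is None, str(shot), console_log[seen:], error)
```

With Playwright's sync API, `console_log` could be filled by registering `page.on("console", lambda msg: console_log.append(msg.text))` before the first step runs.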
- Step 3
Write the Reusable Replay Script
Write {{results_dir}}/replay.py, a standalone Playwright script (no agent, no harness) that reproduces the exact flow and exits non-zero if any assertion fails. This is the artifact that runs in CI…
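A rough skeleton of such a script might look like the following, assuming Playwright's sync Python API. The `run_checks` helper and the single placeholder check are hypothetical; a real replay.py would carry the flow's own selectors and assertions.

```python
import sys


def run_checks(checks):
    """Run (name, callable) pairs; return the names of checks that failed."""
    failed = []
    for name, check in checks:
        try:
            check()
            print(f"PASS {name}")
        except AssertionError as exc:
            print(f"FAIL {name}: {exc}")
            failed.append(name)
    return failed


def main(url: str) -> int:
    """Replay the flow and return a process exit code (0 = all checks passed)."""
    # Imported lazily so run_checks stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright, expect

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        checks = [
            # Placeholder: replace with the flow's real steps and assertions.
            ("page body is visible",
             lambda: expect(page.locator("body")).to_be_visible()),
        ]
        failed = run_checks(checks)
        browser.close()
    return 1 if failed else 0
```

Ending the script with `sys.exit(main(sys.argv[1]))` would pass the result to the shell, so a CI job fails whenever any assertion does.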
- Step 4
Build `report.html`
Write a self-contained {{results_dir}}/report.html (no external assets; inline the screenshots as base64). Include: the flow + target URL, the overall verdict (pass/fail), and for each step its…
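Inlining screenshots as base64 can be sketched with the standard library alone. The `build_report` function and the per-step dict shape (`name`, `passed`, `screenshot`) are assumptions made for illustration; only the fields visible in the description above are rendered.

```python
import base64
import html
from pathlib import Path


def data_uri(png_path: Path) -> str:
    """Encode a PNG file as a data: URI so the report needs no external assets."""
    encoded = base64.b64encode(png_path.read_bytes()).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def build_report(flow: str, target_url: str, steps: list) -> str:
    """Return report HTML; steps is a list of dicts (hypothetical shape)."""
    # An empty run counts as a failure rather than a vacuous pass.
    verdict = "PASS" if steps and all(s["passed"] for s in steps) else "FAIL"
    rows = []
    for s in steps:
        name = html.escape(s["name"])
        status = "pass" if s["passed"] else "fail"
        src = data_uri(Path(s["screenshot"]))
        rows.append(
            f"<section><h2>{name} ({status})</h2>"
            f'<img alt="{name}" src="{src}"></section>'
        )
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        "<title>QA report</title></head><body>"
        f"<h1>{verdict}</h1><p>Flow: {html.escape(flow)}</p>"
        f"<p>Target: {html.escape(target_url)}</p>"
        + "".join(rows)
        + "</body></html>"
    )
```

Because every image travels inside the HTML as a `data:` URI, the resulting file can be emailed or attached to a CI run and still render on its own.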
- Step 5
Evaluate, Validate & Iterate (max 3 rounds)
Status · Criteria
PASS · The flow drove ≥ 2 steps, evidence was captured for each (screenshot + console + the shared trace.zip + network.har), every assertion step ran a real check, report.html and…
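The file-level parts of the PASS criteria could be checked mechanically along these lines. This is a sketch under stated assumptions: the `evaluate` function is hypothetical, it assumes trace.zip and network.har sit directly in the results directory with screenshots in a `screenshots` subfolder, and it covers only the step-count and evidence-file criteria visible above.

```python
from pathlib import Path
from typing import List


def evaluate(results_dir: Path, screenshots: List[str]) -> List[str]:
    """Return a list of problems; an empty list means these criteria pass."""
    problems = []
    # One screenshot per step, so the count doubles as the step count.
    if len(screenshots) < 2:
        problems.append(f"flow drove {len(screenshots)} step(s), need at least 2")
    for shot in screenshots:
        if not (results_dir / "screenshots" / shot).is_file():
            problems.append(f"missing screenshot: {shot}")
    # Assumed location: shared evidence files live at the top of results_dir.
    for shared in ("trace.zip", "network.har"):
        if not (results_dir / shared).is_file():
            problems.append(f"missing shared evidence: {shared}")
    return problems
```

A caller could re-run the flow and call `evaluate` again while the list is non-empty, stopping after the third round.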
- Step 6
Write Executive Summary
Write {{results_dir}}/summary.md: