wizenheimer / canary ★ Featured · worked examples

Canary

QA a described user flow against a live web app the way Canary does: the agent drives a real (headless) browser through small, intent-named steps, and captures evidence at every step — a screenshot, console messages, and network activity — plus a Playwright…

agent: claude-code · model: claude-sonnet-4-6 · snapshot: prism-playwright · eval: programmatic · 6 steps · v1.0.0

Deploy Canary to your jetty.io

A one-click install adds this runbook to a collection on your Jetty account. You can run it from the Spot dashboard, schedule it, or pipe inputs in via the API.

Run time: 3–5 mins
Headline output: report.html · replay.py

Runs on Jetty's managed sandbox. No setup. Free for your first 10 runs.

Worked examples · 3

Real runs, real outputs.

The shape of the run

6 steps · start to finish.

  1. Step 1

    Environment Setup

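    # Create the evidence directory up front.
    mkdir -p "{{results_dir}}/screenshots"
    # Install Playwright only if it is not already importable.
    python -c "import playwright; print('playwright', playwright.__version__)" || python -m pip install --quiet playwright
    # Best-effort browser download; errors are suppressed, so a missing Chromium only surfaces later at launch.
    python -m playwright install chromium 2>/dev/null || true
    # Fail fast when target_url is empty or the template placeholder was never filled in.
    SITE="{{target_url}}"
    [ -n "$SITE" ] && [ "$SITE" != "{{target_url}}" ] || { echo "ERROR: no target_url provided"; exit 1; }
    echo "QA target: $SITE"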
    

  2. Step 2

    Explore, then Drive the Flow

    First observe the target: fetch the page, note the real selectors for the elements the flow touches (don't guess — read the DOM). Then translate the plain-language {{flow}} into small, intent-named…
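As a rough illustration of small, intent-named steps with evidence captured at each one, the sketch below may help. The `Step` and `Evidence` classes, the `drive` and `run_flow` helpers, and the screenshot file-naming scheme are all assumptions made for this example, not part of the runbook. It relies only on Playwright's sync API (`page.on`, `page.screenshot`) and imports Playwright lazily so the helpers can be exercised without a browser.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass
class Step:
    """One small, intent-named action in the flow (hypothetical structure)."""
    name: str
    action: Callable  # called with the Playwright page


@dataclass
class Evidence:
    step: str
    screenshot: str
    console: List[str] = field(default_factory=list)
    requests: List[str] = field(default_factory=list)
    error: str = ""


def drive(page, steps, results_dir):
    """Run steps in order, saving a screenshot plus the console and network
    events seen during each step. Stops after the first failing step."""
    shots = Path(results_dir) / "screenshots"
    shots.mkdir(parents=True, exist_ok=True)
    console, requests = [], []
    # Buffer console messages and outgoing requests as the page emits them.
    page.on("console", lambda msg: console.append(f"{msg.type}: {msg.text}"))
    page.on("request", lambda req: requests.append(f"{req.method} {req.url}"))
    collected = []
    for index, step in enumerate(steps, start=1):
        console.clear()
        requests.clear()
        error = ""
        try:
            step.action(page)
        except Exception as exc:  # keep the evidence even when a step fails
            error = str(exc)
        shot = shots / f"{index:02d}-{step.name}.png"
        page.screenshot(path=str(shot))
        collected.append(
            Evidence(step.name, str(shot), list(console), list(requests), error)
        )
        if error:
            break
    return collected


def run_flow(target_url, steps, results_dir):
    """Open a headless Chromium page on target_url and drive the steps."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        try:
            page.goto(target_url)
            return drive(page, steps, results_dir)
        finally:
            browser.close()
```

A step would then be declared with a name that states its intent, for example `Step("open-login-page", lambda page: page.get_by_role("link", name="Log in").click())`, where the role and link name are placeholders for selectors read from the real DOM.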

  3. Step 3

    Write the Reusable Replay Script

    Write {{results_dir}}/replay.py — a standalone Playwright script (no agent, no harness) that reproduces the exact flow and exits non-zero if any assertion fails. This is the artifact that runs in CI…
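One possible skeleton for such a standalone script is sketched here. The `h1` selector and the non-empty-heading check are placeholders standing in for whatever the real flow asserts, and taking the target URL as a command-line argument is an assumption of this example.

```python
import sys


def replay(page, target_url):
    """Reproduce the flow; every assert is a real check on page state.
    The selector and expectation below are placeholders, not the real flow."""
    page.goto(target_url)
    heading = page.locator("h1").first.inner_text()
    assert heading.strip(), "expected a non-empty <h1> on the landing page"


def main(argv):
    """Return a process exit code: 0 on pass, 1 on a failed assertion,
    2 on bad usage. Other exceptions propagate and also exit non-zero."""
    if len(argv) != 2:
        print("usage: replay.py <target_url>", file=sys.stderr)
        return 2
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        try:
            replay(page, argv[1])
        except AssertionError as exc:
            print(f"FAIL: {exc}", file=sys.stderr)
            return 1
        finally:
            browser.close()
    print("PASS")
    return 0
```

To make the file executable on its own, it would end with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard, so a CI job can gate on the exit status.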

  4. Step 4

    Build `report.html`

    Write a self-contained {{results_dir}}/report.html (no external assets — inline the screenshots as base64). Include: the flow + target URL, the overall verdict (pass/fail), and for each step its…
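The base64 inlining can be sketched with the standard library alone. The per-step dictionary shape (`name`, `passed`, `screenshot`) and the page layout are assumptions for this example; the real report includes more per-step detail than is shown here.

```python
import base64
import html
from pathlib import Path


def inline_png(path):
    """Return a data: URI so the report needs no external image files."""
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:image/png;base64,{data}"


def build_report(flow, target_url, steps):
    """Render a single self-contained HTML string.
    steps: list of dicts with 'name', 'passed' and 'screenshot' keys
    (a hypothetical shape chosen for this sketch)."""
    # An empty run counts as a failure rather than a vacuous pass.
    verdict = "pass" if steps and all(s["passed"] for s in steps) else "fail"
    rows = []
    for s in steps:
        name = html.escape(s["name"])
        status = "pass" if s["passed"] else "fail"
        src = inline_png(s["screenshot"])
        rows.append(
            f'<section><h2>{name}: {status}</h2>'
            f'<img alt="{name}" src="{src}"></section>'
        )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Canary report</title></head><body>"
        f"<h1>Verdict: {verdict}</h1>"
        f"<p>Flow: {html.escape(flow)}</p>"
        f"<p>Target: {html.escape(target_url)}</p>"
        + "".join(rows)
        + "</body></html>"
    )
```

Escaping the flow text, URL, and step names with `html.escape` keeps user-supplied strings from breaking the markup, which matters because the flow description is free-form input.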

  5. Step 5

    Evaluate, Validate & Iterate (max 3 rounds)

    Status · Criteria
    PASS · The flow drove ≥ 2 steps, evidence was captured for each (screenshot + console + the shared trace.zip + network.har), every assertion step ran a real check, report.html and…
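A programmatic check of the criteria that are visible above, wrapped in a bounded retry loop, might look like the following sketch. It covers only the listed criteria (the rest of the row is truncated). The step dictionary keys, the placement of `trace.zip` and `network.har` directly under the results directory, and the `"FAIL"` label for a non-passing run are assumptions of this example.

```python
from pathlib import Path

MAX_ROUNDS = 3  # mirrors "max 3 rounds" in the step title


def evaluate(results_dir, steps):
    """Return the list of unmet criteria; an empty list means PASS.
    steps: list of dicts with 'screenshot' and 'console' keys, plus optional
    'is_assertion' and 'checked' flags (hypothetical shape)."""
    root = Path(results_dir)
    problems = []
    if len(steps) < 2:
        problems.append("flow drove fewer than 2 steps")
    for i, s in enumerate(steps, start=1):
        if not Path(s["screenshot"]).is_file():
            problems.append(f"step {i}: missing screenshot")
        if "console" not in s:
            problems.append(f"step {i}: console messages not captured")
        if s.get("is_assertion") and not s.get("checked"):
            problems.append(f"step {i}: assertion ran no real check")
    # trace.zip and network.har are shared across steps, so check them once.
    for shared in ("trace.zip", "network.har"):
        if not (root / shared).is_file():
            problems.append(f"missing {shared}")
    return problems


def iterate(results_dir, run_once):
    """Call run_once() until evaluate() passes, at most MAX_ROUNDS times.
    Returns (status, rounds_used, remaining_problems)."""
    problems = []
    for round_no in range(1, MAX_ROUNDS + 1):
        steps = run_once()
        problems = evaluate(results_dir, steps)
        if not problems:
            return "PASS", round_no, []
    return "FAIL", MAX_ROUNDS, problems
```

Returning the list of unmet criteria, rather than a bare boolean, gives each retry round something concrete to fix.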

  6. Step 6

    Write Executive Summary

    Write {{results_dir}}/summary.md: