← All runbooks
jettyio / posthog-trace-optimizer

PostHog Trace Optimizer

Analyze a project's PostHog LLM-analytics data the way a cost-and-quality engineer would, and hand back a recommendations.md report an engineering team can act on this week. The agent queries PostHog (HogQL over $ai_generation events) for aggregate…

agent claude-codemodel anthropic/claude-sonnet-4.6snapshot python312-uveval rubric11 stepsv1.0.0

Deploy PostHog Trace Optimizer to your jetty.io

One-click installs this runbook into a collection on your Jetty account. You can run it from the Spot dashboard, schedule it, or pipe inputs in via the API.

The shape of the run

11 steps · start to finish.

  1. 1
    Step 1

    Environment Setup & Connectivity Check

    Install deps, write the creds to {{results_dir}}/.env for reproducibility, and prove the connection before doing anything else. If you cannot connect, STOP — this is a live-data task and every downstream step depends on it.

    mkdir -p "{{results_dir}}/data" "{{results_dir}}/figures"
    pip install -q requests litellm pandas matplotlib tabulate
    
    cat > "{{results_dir}}/.env" <<EOF
    POSTHOG_API_KEY=${POSTHOG_API_KEY}
    POSTHOG_PROJECT_ID=${POSTHOG_PROJECT_ID}
    POSTHOG_HOST=${POSTHOG_HOST}
    EOF
    
    [ -n "$POSTHOG_API_KEY" ] && [ -n "$POSTHOG_PROJECT_ID" ] && [ -n "$POSTHOG_HOST" ] \
      || { echo "ERROR: missing PostHog credentials"; exit 1; }
    echo "PostHog host: $POSTHOG_HOST  project: $POSTHOG_PROJECT_ID"
    
    # Connectivity gate — STOP IMMEDIATELY IF THIS FAILS.
    import os, requests
    HOST = os.environ["POSTHOG_HOST"].rstrip("/")
    PID  = os.environ["POSTHOG_PROJECT_ID"]
    KEY  = os.environ["POSTHOG_API_KEY"]
    HEADERS = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}
    
    def hogql(sql: str):
        """Run a HogQL query against the project and return (columns, rows)."""
        r = requests.post(f"{HOST}/api/projects/{PID}/query/",
                          headers=HEADERS, json={"query": {"kind": "HogQLQuery", "query": sql}},
                          timeout=120)
        r.raise_for_status()
        j = r.json()
        return j.get("columns", []), j.get("results", [])
    
    # Auth + sanity: are there any $ai_generation events at all?
    cols, rows = hogql(
        "SELECT count() FROM events WHERE event = '$ai_generation' "
        f"AND timestamp >= now() - INTERVAL {int('{{window_days}}')} DAY")
    n = rows[0][0] if rows else 0
    print(f"PostHog connection OK — {n} $ai_generation events in window")
    assert n and n > 0, "No $ai_generation events found — is LLM analytics instrumented on this project?"
    

    If the request 401/403s, or there are zero $ai_generation events, do not continue: report the connection/instrumentation failure in summary.md and exit. Do not fabricate analysis against data you could not read.

    PostHog $ai_generation property cheat-sheet (used throughout): $ai_model, $ai_provider, $ai_input_tokens, $ai_output_tokens, $ai_total_cost_usd, $ai_latency (seconds), $ai_trace_id, $ai_span_name, $ai_input (prompt JSON), $ai_output_choices (completion JSON), $ai_is_error, $ai_http_status.


  2. 2
    Step 2

    Fetch Live Model Pricing (provider-filtered, cross-check)

    PostHog computes $ai_total_cost_usd itself, but its map can lag new models (cost shows null). Fetch current pricing from LiteLLM, filtered by the litellm_provider field — not by matching model-name…

  3. 3
    Step 3

    Aggregate Metrics — Cost & Volume by Operation and Model

    Use HogQL for cheap aggregates over the window. PostHog's $ai_span_name is the analog of Langfuse's trace name (the logical operation); $ai_trace_id groups generations into one trace. Write…

  4. 4
    Step 4

    Cost-Variance Deep Dive on the Top-N Expensive Operations

    For the {{top_n}} most expensive operations, pull per-trace cost (sum the generations within each $ai_trace_id) and compute the distribution. High variance is where the money leaks. Write…

  5. 5
    Step 5

    Root-Cause the Cost Drivers

    For each outlier trace, pull its generations and test concrete hypotheses. Don't guess — read the actual $ai_generation rows.

  6. 6
    Step 6

    Failure-Mode Detection

    Find the errors and inconsistency the cost view hides. PostHog flags failed calls with $ai_is_error and carries $ai_http_status / $ai_error.

  7. 7
    Step 7

    Qualitative Trace Assessment (manual inspection)

    Aggregates miss the things that actually embarrass a team. Pull {{sample_size}} traces across percentiles (cheapest / median / most expensive / slowest / errored) and read their $ai_input and…

  8. 8
    Step 8

    Rank Recommendations (evidence + WHY + code + measurement)

    Turn findings into at least {{min_recommendations}} ranked recommendations. Each one MUST have all five parts — a recommendation without a measurement plan is an opinion, not an engineering action:

  9. 9
    Step 9

    Factor in Prior Analysis History (if provided)

    If {{analysis_history}} is non-empty, it lists past recommendations and the PRs that acted on them. Use it to make this report a follow-up, not a repeat:

  10. 10
    Step 10

    Write `recommendations.md` (the report) + `summary.md`

    Assemble {{results_dir}}/recommendations.md with this structure:

  11. 11
    Step 11

    Self-Validation Report

    Write {{results_dir}}/validation_report.json: