PostHog Trace Optimizer
Analyze a project's PostHog LLM-analytics data the way a cost-and-quality engineer would, and hand back a recommendations.md report an engineering team can act on this week. The agent queries PostHog (HogQL over $ai_generation events) for aggregate…
11 steps · start to finish.
- 1Step 1
Environment Setup & Connectivity Check
▶Install deps, write the creds to
{{results_dir}}/.envfor reproducibility, and prove the connection before doing anything else. If you cannot connect, STOP — this is a live-data task and every downstream step depends on it.mkdir -p "{{results_dir}}/data" "{{results_dir}}/figures" pip install -q requests litellm pandas matplotlib tabulate cat > "{{results_dir}}/.env" <<EOF POSTHOG_API_KEY=${POSTHOG_API_KEY} POSTHOG_PROJECT_ID=${POSTHOG_PROJECT_ID} POSTHOG_HOST=${POSTHOG_HOST} EOF [ -n "$POSTHOG_API_KEY" ] && [ -n "$POSTHOG_PROJECT_ID" ] && [ -n "$POSTHOG_HOST" ] \ || { echo "ERROR: missing PostHog credentials"; exit 1; } echo "PostHog host: $POSTHOG_HOST project: $POSTHOG_PROJECT_ID"# Connectivity gate — STOP IMMEDIATELY IF THIS FAILS. import os, requests HOST = os.environ["POSTHOG_HOST"].rstrip("/") PID = os.environ["POSTHOG_PROJECT_ID"] KEY = os.environ["POSTHOG_API_KEY"] HEADERS = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"} def hogql(sql: str): """Run a HogQL query against the project and return (columns, rows).""" r = requests.post(f"{HOST}/api/projects/{PID}/query/", headers=HEADERS, json={"query": {"kind": "HogQLQuery", "query": sql}}, timeout=120) r.raise_for_status() j = r.json() return j.get("columns", []), j.get("results", []) # Auth + sanity: are there any $ai_generation events at all? cols, rows = hogql( "SELECT count() FROM events WHERE event = '$ai_generation' " f"AND timestamp >= now() - INTERVAL {int('{{window_days}}')} DAY") n = rows[0][0] if rows else 0 print(f"PostHog connection OK — {n} $ai_generation events in window") assert n and n > 0, "No $ai_generation events found — is LLM analytics instrumented on this project?"If the request 401/403s, or there are zero
$ai_generationevents, do not continue: report the connection/instrumentation failure insummary.mdand exit. Do not fabricate analysis against data you could not read.PostHog
$ai_generationproperty cheat-sheet (used throughout):$ai_model,$ai_provider,$ai_input_tokens,$ai_output_tokens,$ai_total_cost_usd,$ai_latency(seconds),$ai_trace_id,$ai_span_name,$ai_input(prompt JSON),$ai_output_choices(completion JSON),$ai_is_error,$ai_http_status. - 2Step 2
Fetch Live Model Pricing (provider-filtered, cross-check)
▶PostHog computes $ai_total_cost_usd itself, but its map can lag new models (cost shows null). Fetch current pricing from LiteLLM, filtered by the litellm_provider field — not by matching model-name…
- 3Step 3
Aggregate Metrics — Cost & Volume by Operation and Model
▶Use HogQL for cheap aggregates over the window. PostHog's $ai_span_name is the analog of Langfuse's trace name (the logical operation); $ai_trace_id groups generations into one trace. Write…
- 4Step 4
Cost-Variance Deep Dive on the Top-N Expensive Operations
▶For the {{top_n}} most expensive operations, pull per-trace cost (sum the generations within each $ai_trace_id) and compute the distribution. High variance is where the money leaks. Write…
- 5Step 5
Root-Cause the Cost Drivers
▶For each outlier trace, pull its generations and test concrete hypotheses. Don't guess — read the actual $ai_generation rows.
- 6Step 6
Failure-Mode Detection
▶Find the errors and inconsistency the cost view hides. PostHog flags failed calls with $ai_is_error and carries $ai_http_status / $ai_error.
- 7Step 7
Qualitative Trace Assessment (manual inspection)
▶Aggregates miss the things that actually embarrass a team. Pull {{sample_size}} traces across percentiles (cheapest / median / most expensive / slowest / errored) and read their $ai_input and…
- 8Step 8
Rank Recommendations (evidence + WHY + code + measurement)
▶Turn findings into at least {{min_recommendations}} ranked recommendations. Each one MUST have all five parts — a recommendation without a measurement plan is an opinion, not an engineering action:
- 9Step 9
Factor in Prior Analysis History (if provided)
▶If {{analysis_history}} is non-empty, it lists past recommendations and the PRs that acted on them. Use it to make this report a follow-up, not a repeat:
- 10Step 10
Write `recommendations.md` (the report) + `summary.md`
▶Assemble {{results_dir}}/recommendations.md with this structure:
- 11Step 11
Self-Validation Report
▶Write {{results_dir}}/validation_report.json: