jettyio / posthog-trace-optimizer

PostHog Trace Optimizer

Analyze a project's PostHog LLM-analytics data the way a cost-and-quality engineer would, and hand back a recommendations.md report an engineering team can act on this week. The agent queries PostHog (HogQL over $ai_generation events) for aggregate…

agent claude-codemodel anthropic/claude-sonnet-4.6snapshot python312-uveval rubric11 stepsv1.0.0

Deploy PostHog Trace Optimizer to your jetty.io

One-click installs this runbook into a collection on your Jetty account. You can run it from the Spot dashboard, schedule it, or pipe inputs in via the API.

Deploy on jetty.io →View source

The shape of the run

11 steps · start to finish.

Step 1

Environment Setup & Connectivity Check

▶

Install deps, write the creds to {{results_dir}}/.env for reproducibility, and prove the connection before doing anything else. If you cannot connect, STOP — this is a live-data task and every downstream step depends on it.

mkdir -p "{{results_dir}}/data" "{{results_dir}}/figures"
pip install -q requests litellm pandas matplotlib tabulate

cat > "{{results_dir}}/.env" <<EOF
POSTHOG_API_KEY=${POSTHOG_API_KEY}
POSTHOG_PROJECT_ID=${POSTHOG_PROJECT_ID}
POSTHOG_HOST=${POSTHOG_HOST}
EOF

[ -n "$POSTHOG_API_KEY" ] && [ -n "$POSTHOG_PROJECT_ID" ] && [ -n "$POSTHOG_HOST" ] \
  || { echo "ERROR: missing PostHog credentials"; exit 1; }
echo "PostHog host: $POSTHOG_HOST  project: $POSTHOG_PROJECT_ID"

# Connectivity gate — STOP IMMEDIATELY IF THIS FAILS.
import os, requests
HOST = os.environ["POSTHOG_HOST"].rstrip("/")
PID  = os.environ["POSTHOG_PROJECT_ID"]
KEY  = os.environ["POSTHOG_API_KEY"]
HEADERS = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}

def hogql(sql: str):
    """Run a HogQL query against the project and return (columns, rows)."""
    r = requests.post(f"{HOST}/api/projects/{PID}/query/",
                      headers=HEADERS, json={"query": {"kind": "HogQLQuery", "query": sql}},
                      timeout=120)
    r.raise_for_status()
    j = r.json()
    return j.get("columns", []), j.get("results", [])

# Auth + sanity: are there any $ai_generation events at all?
cols, rows = hogql(
    "SELECT count() FROM events WHERE event = '$ai_generation' "
    f"AND timestamp >= now() - INTERVAL {int('{{window_days}}')} DAY")
n = rows[0][0] if rows else 0
print(f"PostHog connection OK — {n} $ai_generation events in window")
assert n and n > 0, "No $ai_generation events found — is LLM analytics instrumented on this project?"

If the request 401/403s, or there are zero $ai_generation events, do not continue: report the connection/instrumentation failure in summary.md and exit. Do not fabricate analysis against data you could not read.

PostHog $ai_generation property cheat-sheet (used throughout): $ai_model, $ai_provider, $ai_input_tokens, $ai_output_tokens, $ai_total_cost_usd, $ai_latency (seconds), $ai_trace_id, $ai_span_name, $ai_input (prompt JSON), $ai_output_choices (completion JSON), $ai_is_error, $ai_http_status.

2
Step 2
Fetch Live Model Pricing (provider-filtered, cross-check)
▶
PostHog computes $ai_total_cost_usd itself, but its map can lag new models (cost shows null). Fetch current pricing from LiteLLM, filtered by the litellm_provider field — not by matching model-name…
3
Step 3
Aggregate Metrics — Cost & Volume by Operation and Model
▶
Use HogQL for cheap aggregates over the window. PostHog's $ai_span_name is the analog of Langfuse's trace name (the logical operation); $ai_trace_id groups generations into one trace. Write…
4
Step 4
Cost-Variance Deep Dive on the Top-N Expensive Operations
▶
For the {{top_n}} most expensive operations, pull per-trace cost (sum the generations within each $ai_trace_id) and compute the distribution. High variance is where the money leaks. Write…
5
Step 5
Root-Cause the Cost Drivers
▶
For each outlier trace, pull its generations and test concrete hypotheses. Don't guess — read the actual $ai_generation rows.
6
Step 6
Failure-Mode Detection
▶
Find the errors and inconsistency the cost view hides. PostHog flags failed calls with $ai_is_error and carries $ai_http_status / $ai_error.
7
Step 7
Qualitative Trace Assessment (manual inspection)
▶
Aggregates miss the things that actually embarrass a team. Pull {{sample_size}} traces across percentiles (cheapest / median / most expensive / slowest / errored) and read their $ai_input and…
8
Step 8
Rank Recommendations (evidence + WHY + code + measurement)
▶
Turn findings into at least {{min_recommendations}} ranked recommendations. Each one MUST have all five parts — a recommendation without a measurement plan is an opinion, not an engineering action:
9
Step 9
Factor in Prior Analysis History (if provided)
▶
If {{analysis_history}} is non-empty, it lists past recommendations and the PRs that acted on them. Use it to make this report a follow-up, not a repeat:
10
Step 10
Write `recommendations.md` (the report) + `summary.md`
▶
Assemble {{results_dir}}/recommendations.md with this structure:
11
Step 11
Self-Validation Report
▶
Write {{results_dir}}/validation_report.json:

Parameters

Results directory

default: `{{results_dir}}`

`/app/results`

Analysis window (days)

default: `{{window_days}}`

`30`

Top-N expensive operations

default: `{{top_n}}`

`5`

Qualitative sample size

default: `{{sample_size}}`

`15`

Min recommendations

default: `{{min_recommendations}}`

`3`

Analysis history

default: `{{analysis_history}}`

*(optional)*

Dependencies

POSTHOG_API_KEY / POSTHOG_PROJECT_ID / POSTHOG_HOST · required · Secret
requests (pip) · required · Runtime
litellm (pip) · required · Runtime
pandas (pip) · required · Runtime
matplotlib (pip) · optional · Runtime

Required outputs

{{results_dir}}/recommendations.md
The headline report: executive summary, cost analysis tables, qualitative findings, ≥3 ranked recommendations (each with evidence trace IDs, WHY, code example, measurement plan), and an implementation roadmap.
{{results_dir}}/data/metrics_by_span.csv
Per-operation aggregates (`$ai_span_name`: count, total cost, total tokens, p95 latency) for the analysis window.
{{results_dir}}/data/metrics_by_model.csv
Per-model aggregates (`$ai_model`: count, cost, tokens, error rate).
{{results_dir}}/data/cost_variance.json
Per-expensive-operation cost distribution (min/mean/median/max/std/p95) and flagged outlier trace IDs.
{{results_dir}}/data/qualitative_samples.json
The sampled traces inspected manually, with trace IDs, input/output excerpts, and the abnormality tagged on each.
{{results_dir}}/data/model_pricing.json
Current model pricing fetched from LiteLLM (provider-filtered) — used to cross-check PostHog's computed costs.
{{results_dir}}/summary.md
One-screen executive summary: window, generations analyzed, total cost, top 3 findings, total projected monthly savings.
{{results_dir}}/validation_report.json
Stage-by-stage self-validation with `overall_passed`. See Step 11.

Origin

source: github.com
title: Langfuse Optimizer — Terminal Bench 2.0 task (PostHog sink variant)
attr: high

Original →

PostHog Trace Optimizer

Deploy PostHog Trace Optimizer to your jetty.io

11 steps · start to finish.

Environment Setup & Connectivity Check

Fetch Live Model Pricing (provider-filtered, cross-check)

Aggregate Metrics — Cost & Volume by Operation and Model

Cost-Variance Deep Dive on the Top-N Expensive Operations

Root-Cause the Cost Drivers

Failure-Mode Detection

Qualitative Trace Assessment (manual inspection)

Rank Recommendations (evidence + WHY + code + measurement)

Factor in Prior Analysis History (if provided)

Write `recommendations.md` (the report) + `summary.md`

Self-Validation Report