Langfuse Trace Optimizer
Analyze a project's Langfuse trace data the way a cost-and-quality engineer would, and hand back a recommendations.md report that an engineering team can act on this week. The agent connects to Langfuse, pulls aggregate metrics and individual traces, finds…
11 steps · start to finish.
- Step 1
Environment Setup & Connectivity Check
Install deps, write the creds to `{{results_dir}}/.env` for reproducibility, and prove the connection before doing anything else. If you cannot connect, STOP: this is a live-data task and every downstream step depends on it.

```bash
mkdir -p "{{results_dir}}/data" "{{results_dir}}/figures"
pip install -q langfuse litellm pandas matplotlib tabulate

# Fail fast before writing anything: all three credentials must be set.
[ -n "$LANGFUSE_PUBLIC_KEY" ] && [ -n "$LANGFUSE_SECRET_KEY" ] && [ -n "$LANGFUSE_HOST" ] \
  || { echo "ERROR: missing Langfuse credentials" >&2; exit 1; }

# Persist creds for reproducibility (never echo the secret values into logs).
cat > "{{results_dir}}/.env" <<EOF
LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}
LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY}
LANGFUSE_HOST=${LANGFUSE_HOST}
EOF
chmod 600 "{{results_dir}}/.env"  # the file holds a secret key
echo "Langfuse host: $LANGFUSE_HOST"
```

```python
# Connectivity gate: STOP IMMEDIATELY IF THIS FAILS.
from langfuse import get_client
langfuse = get_client()  # reads LANGFUSE_PUBLIC_KEY / _SECRET_KEY / _HOST from the environment
try:
    ok = langfuse.auth_check()
except Exception as exc:  # bad host, network error, etc.
    raise SystemExit(f"Langfuse auth_check() raised: {exc}")
if not ok:  # explicit exit, not assert: asserts are stripped under `python -O`
    raise SystemExit("Langfuse auth_check() failed: bad keys or host")
print("Langfuse connection OK")
```

If `auth_check()` raises or returns false, do not continue: report the connection failure in `summary.md` and exit. Do not fabricate analysis against data you could not read.
- Step 2
Fetch Live Model Pricing (provider-filtered)
Accurate cost analysis needs current pricing. Fetch it from LiteLLM and filter by the `litellm_provider` field, not by matching model-name substrings, so new families (gpt-4.1, claude-4, gemini-2) are…
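The provider filter could look like the minimal sketch below. It assumes the pricing map has the shape of `litellm.model_cost`: a model name mapped to a dict with `litellm_provider`, `input_cost_per_token`, and `output_cost_per_token`. The function name `filter_pricing` and the per-1M output keys are illustrative and are not part of LiteLLM.

```python
def filter_pricing(model_cost: dict, providers: set) -> dict:
    """Return {model: {"input_per_1m": $, "output_per_1m": $}} for the given providers.

    Assumes `model_cost` is shaped like litellm.model_cost:
    {model_name: {"litellm_provider": str, "input_cost_per_token": float, ...}}.
    """
    table = {}
    for model, spec in model_cost.items():
        if not isinstance(spec, dict):
            continue
        # Match on the provider field, never on substrings of the model name,
        # so new model families are picked up without code changes.
        if spec.get("litellm_provider") not in providers:
            continue
        inp = spec.get("input_cost_per_token")
        out = spec.get("output_cost_per_token")
        if inp is None or out is None:
            continue  # no per-token price (e.g. image or audio models)
        table[model] = {"input_per_1m": inp * 1_000_000, "output_per_1m": out * 1_000_000}
    return table
```

With LiteLLM installed, `filter_pricing(litellm.model_cost, {"openai", "anthropic"})` would return a per-1M-token table for those two providers only. The provider strings must match LiteLLM's spelling exactly.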
- Step 3
Aggregate Metrics: Cost & Volume by Trace and Model
Use the Metrics API (cheap aggregates, not row-by-row listing) for the full N-day analysis window and for two shorter sub-windows, the last 3 and last 7 days (3d/7d/Nd). Write metrics_by_trace.csv and metrics_by_model.csv.
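One way to produce the three windows is to build one Metrics API query per window. This sketch assumes the query JSON shape (`view`, `dimensions`, `metrics`, `filters`, `fromTimestamp`, `toTimestamp`) and the `count` and `totalCost` measures, based on our reading of Langfuse's Metrics API docs. Treat those names as assumptions and verify them against your server version. `build_metrics_queries` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_metrics_queries(days: int, view: str, dimension: str,
                          now: Optional[datetime] = None) -> dict:
    """Return {"3d": json, "7d": json, "<days>d": json} Metrics API query strings.

    The query shape is an assumption; check it against the Langfuse Metrics API docs.
    """
    now = now or datetime.now(timezone.utc)
    queries = {}
    for label, span in (("3d", 3), ("7d", 7), (f"{days}d", days)):
        query = {
            "view": view,                          # e.g. "traces" or "observations"
            "dimensions": [{"field": dimension}],  # group-by column, e.g. trace "name"
            "metrics": [
                {"measure": "count", "aggregation": "count"},
                {"measure": "totalCost", "aggregation": "sum"},
            ],
            "filters": [],
            "fromTimestamp": (now - timedelta(days=span)).isoformat(),
            "toTimestamp": now.isoformat(),
        }
        queries[label] = json.dumps(query)
    return queries
```

Each string would then go to the SDK's metrics endpoint, which we understand to be exposed as `langfuse.api.metrics.metrics(query=...)` in the Python SDK. The returned rows would be written to metrics_by_trace.csv. For metrics_by_model.csv, the same helper would target the observations view grouped by the model field. That field name is not given here and should come from the docs.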
- Step 4
Cost-Variance Deep Dive on the Top-N Expensive Traces
For the {{top_n}} most expensive trace names, list individual traces and compute the cost distribution. High variance is where the money leaks. Write data/cost_variance.json.
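The sketch below computes the per-name distribution. It assumes each listed trace is a dict with `name` and `totalCost` keys. The statistics it reports (p50/p90/p99, coefficient of variation, max/median ratio) are one reasonable way to quantify variance, not a prescribed set.

```python
import statistics
from collections import defaultdict


def cost_variance(traces, top_n: int) -> dict:
    """Summarize the cost distribution for the top_n trace names by total spend.

    `traces` is assumed to be an iterable of dicts with "name" and "totalCost".
    """
    by_name = defaultdict(list)
    for t in traces:
        if t.get("totalCost") is not None:
            by_name[t["name"]].append(float(t["totalCost"]))

    # Rank names by total spend and keep the top N.
    ranked = sorted(by_name.items(), key=lambda kv: sum(kv[1]), reverse=True)[:top_n]
    report = {}
    for name, costs in ranked:
        p50 = statistics.median(costs)
        mean = statistics.mean(costs)
        if len(costs) >= 2:
            cuts = statistics.quantiles(costs, n=100, method="inclusive")
            p90, p99 = cuts[89], cuts[98]  # cuts[k-1] is the k-th percentile
            cv = statistics.stdev(costs) / mean if mean else 0.0
        else:
            p90 = p99 = costs[0]
            cv = 0.0
        report[name] = {
            "count": len(costs),
            "total": sum(costs),
            "p50": p50, "p90": p90, "p99": p99, "max": max(costs),
            "cv": cv,  # coefficient of variation: stdev / mean
            "max_over_median": max(costs) / p50 if p50 else None,
        }
    return report
```

A high `max_over_median` paired with a low p50 points at a few runaway traces rather than a uniformly expensive flow. The result can be saved with `json.dump(report, fh, indent=2)` into data/cost_variance.json.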
- Step 5
Root-Cause the Cost Drivers
For each expensive or outlier trace, pull its observations and test concrete hypotheses. Don't guess: read the actual generations.
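To keep "test concrete hypotheses" honest, each hypothesis can be written as an explicit check over the trace's generations. This sketch assumes a hypothetical flattened `Generation` record holding the model, the input and output token counts, and a fingerprint of the rendered input (for example, a hash you compute yourself). The two checks and their thresholds are illustrative and do not come from the original guide.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Generation:
    """Hypothetical flattened view of one GENERATION observation."""
    model: str
    input_tokens: int
    output_tokens: int
    prompt_fingerprint: str  # e.g. a hash of the rendered input messages


def cost_hypotheses(gens, input_share_threshold=0.9, repeat_threshold=3):
    """Return human-readable findings for one trace; an empty list means no hit."""
    findings = []
    total_in = sum(g.input_tokens for g in gens)
    total = total_in + sum(g.output_tokens for g in gens)
    # H1: prompt bloat. Input tokens dominate the token count (long context, stuffed history).
    if total and total_in / total >= input_share_threshold:
        findings.append(f"prompt bloat: {total_in / total:.0%} of tokens are input")
    # H2: retry loop. The same model received the same input repeatedly.
    repeats = Counter((g.model, g.prompt_fingerprint) for g in gens)
    for (model, _), n in repeats.items():
        if n >= repeat_threshold:
            findings.append(f"repeated call: {model} got identical input {n}x")
    return findings
```

An empty result does not rule out a cost problem. It only means these two hypotheses failed, so the next step is to read the generations directly.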
- Step 6
Failure-Mode Detection
Find the errors and inconsistencies that the cost view hides. Use the Metrics API with a filter on observation level (e.g. ERROR, WARNING), then catalog the recurring patterns.
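Once the ERROR and WARNING observations are in hand, cataloging patterns mostly means collapsing messages that differ only in IDs or numbers. The sketch below assumes observation dicts carry `level` and `statusMessage`, as in Langfuse's observation JSON. The normalization rules are illustrative.

```python
import re
from collections import Counter

_UUID = re.compile(r"\b[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}\b", re.IGNORECASE)
_NUMBER = re.compile(r"\d+")


def normalize(message: str) -> str:
    """Collapse IDs and numbers so near-identical messages group together."""
    message = _UUID.sub("<id>", message)
    message = _NUMBER.sub("<n>", message)
    return " ".join(message.split())[:200]  # squash whitespace, cap length


def catalog_failures(observations):
    """Count (level, normalized message) pairs for ERROR and WARNING observations."""
    patterns = Counter()
    for obs in observations:
        level = obs.get("level")
        if level in ("ERROR", "WARNING"):
            msg = obs.get("statusMessage") or "<no message>"
            patterns[(level, normalize(msg))] += 1
    return patterns.most_common()
```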
- Step 7
Qualitative Trace Assessment (manual inspection)
Aggregates miss the things that actually embarrass a team. Pull {{sample_size}} traces spread across the cost and latency distributions (cheapest / median / most expensive / slowest), plus errored traces, and read them. Write…
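The sampling rule can be sketched as follows. It assumes traces are dicts with `id`, `totalCost`, and `latency`, plus a hypothetical `has_error` flag you derive beforehand (for example, from observation levels). The {{sample_size}} budget is split roughly evenly across the five buckets, and traces already picked are skipped, so a trace that is both cheapest and slowest is read only once.

```python
def pick_sample(traces, sample_size: int):
    """Return [(bucket, trace_id), ...] spread across cost, latency, and errors.

    `traces`: dicts with "id", "totalCost", "latency", and a hypothetical
    "has_error" flag derived beforehand (e.g. from observation levels).
    """
    per_bucket = max(1, sample_size // 5)
    by_cost = sorted(traces, key=lambda t: t["totalCost"])
    start = max(0, len(by_cost) // 2 - per_bucket // 2)  # window around the median
    buckets = {
        "cheapest": by_cost[:per_bucket],
        "median": by_cost[start:start + per_bucket],
        "most_expensive": by_cost[::-1][:per_bucket],
        "slowest": sorted(traces, key=lambda t: t["latency"], reverse=True)[:per_bucket],
        "errored": [t for t in traces if t.get("has_error")][:per_bucket],
    }
    seen, sample = set(), []
    for bucket, items in buckets.items():
        for t in items:
            if t["id"] not in seen and len(sample) < sample_size:
                seen.add(t["id"])
                sample.append((bucket, t["id"]))
    return sample
```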
- Step 8
Rank Recommendations (evidence + WHY + code + measurement)
Turn findings into at least {{min_recommendations}} ranked recommendations. Each one MUST have all five parts (a recommendation without a measurement plan is an opinion, not an engineering action):
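Below is a sketch of a completeness gate followed by ranking. It models only the four parts named in the step title, because the full list of five parts is not recoverable here. The `est_monthly_savings_usd` ranking key is a hypothetical choice; rank by whatever impact estimate your findings actually support.

```python
from dataclasses import dataclass

REQUIRED = ("evidence", "why", "code", "measurement")  # parts named in the step title


@dataclass
class Recommendation:
    title: str
    evidence: str      # trace IDs and metrics that show the problem
    why: str           # the causal mechanism, not just the symptom
    code: str          # the concrete change to make
    measurement: str   # metric, window, and target that will confirm the fix
    est_monthly_savings_usd: float = 0.0  # hypothetical ranking key


def rank(recs, minimum: int):
    """Reject incomplete or too-few recommendations, then sort by estimated impact."""
    incomplete = [r.title for r in recs
                  if not all(getattr(r, part).strip() for part in REQUIRED)]
    if incomplete:
        raise ValueError(f"missing required parts: {incomplete}")
    if len(recs) < minimum:
        raise ValueError(f"need at least {minimum} recommendations, got {len(recs)}")
    return sorted(recs, key=lambda r: r.est_monthly_savings_usd, reverse=True)
```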
- Step 9
Factor in Prior Analysis History (if provided)
If {{analysis_history}} is non-empty, it lists past recommendations and the PRs that acted on them. Use it to make this report a follow-up, not a repeat:
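One way to turn the history into follow-up framing is to key every current finding and every past recommendation the same way, then compare the two. The format of {{analysis_history}} is not given here, so the schema below (a `key` per finding and an optional `pr` per history entry) is hypothetical, as is the `tag_against_history` helper.

```python
def tag_against_history(findings, history):
    """Label each current finding as new, a follow-up on an acted-on item, or a repeat.

    findings: dicts with a "key"; history: dicts with "key" and optional "pr".
    Both schemas are hypothetical.
    """
    past = {h["key"]: h for h in history}
    tagged = []
    for finding in findings:
        prior = past.get(finding["key"])
        if prior is None:
            status = "new"
        elif prior.get("pr"):
            # Already acted on: the report should measure whether the PR worked.
            status = f"follow-up: measure impact of {prior['pr']}"
        else:
            # Recommended before but never acted on: escalate rather than restate.
            status = "repeat: previously recommended, not acted on"
        tagged.append({**finding, "history_status": status})
    return tagged
```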
- Step 10
Write `recommendations.md` (the report) + `summary.md`
Assemble {{results_dir}}/recommendations.md with this structure:
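The required section layout is not recoverable from this page. This sketch therefore covers only one piece the report is likely to need: rendering CSV-style rows (for example, from metrics_by_trace.csv) as a Markdown table using the standard library alone. `markdown_table` is a hypothetical helper. pandas' `DataFrame.to_markdown()`, which relies on the `tabulate` package installed in Step 1, would do the same job.

```python
def markdown_table(rows, columns) -> str:
    """Render a list of dicts as a GitHub-flavored Markdown table."""
    def cell(value) -> str:
        if isinstance(value, float):
            return f"{value:,.4f}"  # consistent precision for cost columns
        return str(value).replace("|", "\\|")  # escape pipes inside cells

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(cell(row.get(c, "")) for c in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])
```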
- Step 11
Self-Validation Report
Write {{results_dir}}/validation_report.json:
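This page does not list the required fields of validation_report.json. As a minimal sketch, the check below records whether each expected artifact exists and is non-empty. `build_validation_report`, `write_validation_report`, and the shape of their output are hypothetical, and you supply the file list.

```python
import json
from pathlib import Path


def build_validation_report(results_dir, expected_files) -> dict:
    """Check that each expected artifact exists under results_dir and is non-empty."""
    root = Path(results_dir)
    checks = []
    for rel in expected_files:
        path = root / rel
        ok = path.is_file() and path.stat().st_size > 0
        checks.append({"file": rel, "exists_and_nonempty": ok})
    return {"passed": all(c["exists_and_nonempty"] for c in checks), "checks": checks}


def write_validation_report(results_dir, expected_files) -> dict:
    """Build the report and save it as validation_report.json in results_dir."""
    report = build_validation_report(results_dir, expected_files)
    Path(results_dir, "validation_report.json").write_text(json.dumps(report, indent=2))
    return report
```

For example, `write_validation_report(results_dir, ["recommendations.md", "summary.md", "data/cost_variance.json"])`. Add the metrics CSVs using whatever paths Step 3 actually wrote.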