Skill Creator
This runbook guides an AI agent through the complete lifecycle of creating, testing, iteratively improving, and packaging AI agent skills. Starting from intent capture and research, the agent drafts a SKILL.md, writes test cases, and runs parallel evaluations with and without the skill.
11 steps · start to finish.
Step 1: Environment Setup

Verify the environment has all required tools and the skill path is valid.
```shell
# Check Python is available
python3 --version || { echo "ERROR: python3 not found"; exit 1; }

# Check claude CLI (needed for description optimization)
command -v claude >/dev/null 2>&1 && echo "claude CLI present" || echo "WARN: claude CLI not found (description optimization will be unavailable)"

# Verify skill path exists or create it
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"
mkdir -p "$SKILL_PATH/evals"
mkdir -p "${SKILL_NAME}-workspace"
mkdir -p /app/results

echo "Setup complete. Skill path: $SKILL_PATH"
```

Step 2: Capture Intent
Understand the user's intent before writing anything. If the current conversation already contains a workflow the user wants to capture (e.g., they say "turn this into a skill"), extract answers from history first.
Step 3: Interview and Research

Before writing the SKILL.md, proactively gather:
Step 4: Write the SKILL.md

Based on the user interview, produce `<skill-path>/SKILL.md` with this structure:
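As a concrete sketch of that structure (the section names below are an assumption, not a fixed schema), a skeleton SKILL.md can be generated and then filled in from the interview:

```shell
# Hypothetical SKILL.md skeleton -- adapt the sections to the skill at hand.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

cat > "$SKILL_PATH/SKILL.md" <<EOF
---
name: $SKILL_NAME
description: One sentence describing when this skill should trigger.
---

# $SKILL_NAME

## When to use
Describe the situations where the agent should apply this skill.

## Instructions
Step-by-step instructions distilled from the user interview.

## Examples
Concrete input/output examples that anchor the instructions.
EOF

echo "Wrote $SKILL_PATH/SKILL.md"
```

The frontmatter `description` matters most: it is what the agent matches against when deciding whether to load the skill.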
Step 5: Write Test Cases

After drafting the skill, create 2–3 realistic test prompts — the kind of message a real user would actually send. Share them with the user for confirmation before running.
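One way to keep test prompts reviewable and re-runnable is to store each as a small file under `evals/` (the file layout and `Prompt:`/`Expected:` fields here are an assumption, not a required format):

```shell
# Sketch: one file per test case under <skill-path>/evals/.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH/evals"

cat > "$SKILL_PATH/evals/case-1.md" <<'EOF'
Prompt: <realistic user message exercising the skill's main path>
Expected: <what a good response should contain>
EOF

cat > "$SKILL_PATH/evals/case-2.md" <<'EOF'
Prompt: <a harder or edge-case variant of the same task>
Expected: <what a good response should contain>
EOF

ls "$SKILL_PATH/evals"
```

The placeholder angle-bracket text is filled in by hand (or by the agent) before showing the cases to the user.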
Step 6: Run & Evaluate (max 5 iterations)

This is one continuous sequence — do not stop partway through.
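A minimal sketch of one evaluation pass over the test cases, assuming the `claude` CLI's print mode (`claude -p`) is available; prepending the skill body to the prompt is a simplified stand-in for real skill loading, and without the CLI each case is recorded as skipped rather than failing:

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
RESULTS_DIR="${RESULTS_DIR:-./results}"   # the runbook uses /app/results; relative here for portability
mkdir -p "$RESULTS_DIR"

for case in "$SKILL_PATH"/evals/*.md; do
  [ -e "$case" ] || continue
  name="$(basename "$case" .md)"
  if command -v claude >/dev/null 2>&1; then
    # With-skill run: skill body prepended to the test prompt.
    claude -p "$(cat "$SKILL_PATH/SKILL.md" "$case")" > "$RESULTS_DIR/$name.with-skill.txt"
    # Baseline run: the prompt alone, for side-by-side comparison.
    claude -p "$(cat "$case")" > "$RESULTS_DIR/$name.baseline.txt"
  else
    echo "skipped: claude CLI not available" > "$RESULTS_DIR/$name.skipped.txt"
  fi
done
ls "$RESULTS_DIR"
```

Keeping paired `with-skill` / `baseline` outputs per case is what makes the later comparison and review steps possible.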
Step 7: Iterate on Errors (max 3 rounds)

After reading feedback, improve the skill and re-run:
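To keep iteration rounds auditable, one option (a sketch, not part of the runbook's required flow) is to snapshot the skill directory before each round so changes can be diffed afterwards:

```shell
# Snapshot the skill before editing; my-skill.roundN is a hypothetical naming scheme.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

round=1
while [ -d "$SKILL_PATH.round$round" ]; do round=$((round + 1)); done
cp -r "$SKILL_PATH" "$SKILL_PATH.round$round"
echo "Snapshot: $SKILL_PATH.round$round"

# After editing SKILL.md, review what changed this round:
# diff -ru "$SKILL_PATH.round$round" "$SKILL_PATH"
```

If a round makes results worse, restoring the snapshot is a one-line `cp -r` back.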
Step 8: Advanced — Blind Comparison (Optional)

For rigorous comparison between two skill versions, use the blind comparison system in `agents/comparator.md` and `agents/analyzer.md`. An independent agent evaluates both outputs without knowing which is which. This is optional and most users won't need it; the human review loop is sufficient for typical skills.
Step 9: Description Optimization

After finalizing the skill body, offer to optimize the description for better triggering accuracy.
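A sketch of what that offer can run, again assuming the `claude` CLI's print mode; the prompt wording is illustrative, and if the CLI is absent the step degrades gracefully:

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

if command -v claude >/dev/null 2>&1 && [ -f "$SKILL_PATH/SKILL.md" ]; then
  # Ask for a candidate description; a human still reviews it before it replaces the original.
  claude -p "Rewrite this skill's YAML description so it triggers on the right user requests. Return only the new description. $(cat "$SKILL_PATH/SKILL.md")" \
    > "$SKILL_PATH/description.optimized.txt"
  echo "Candidate description: $SKILL_PATH/description.optimized.txt"
else
  echo "claude CLI not found; keeping the existing description"
fi
```

Writing the candidate to a side file rather than editing SKILL.md in place keeps the user in the loop on the final wording.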
Step 10: Package and Present

If the `present_files` tool is available:
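If no presentation tool is available, a plain tarball of the skill directory works for sharing (the `.skill.tgz` name is just a convention assumed here):

```shell
# Bundle the skill directory for sharing or upload.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

tar -czf "$SKILL_NAME.skill.tgz" -C "$(dirname "$SKILL_PATH")" "$(basename "$SKILL_PATH")"
echo "Packaged: $SKILL_NAME.skill.tgz"
```

The `-C` flag archives the directory by its base name, so the tarball extracts to `my-skill/` regardless of where it was built.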
Step 11: Environment-Specific Adaptations

- **No subagents**: run test cases sequentially, executing SKILL.md instructions inline. Skip baseline runs.
- **No browser**: skip `generate_review.py`. Present results directly in conversation; ask for feedback inline.
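These adaptations can be decided up front by probing the environment once and recording the result (the `env-report.txt` file name is an assumption for this sketch):

```shell
# Probe the environment and record which adaptations apply.
: > env-report.txt

if command -v claude >/dev/null 2>&1; then
  echo "claude CLI: available" >> env-report.txt
else
  echo "claude CLI: missing -> run test cases inline, skip baseline runs" >> env-report.txt
fi

if command -v python3 >/dev/null 2>&1; then
  echo "python3: available" >> env-report.txt
else
  echo "python3: missing -> skip generate_review.py, collect feedback in conversation" >> env-report.txt
fi

cat env-report.txt
```

Running this once at the start of Step 1 means later steps can branch on the report instead of re-checking tools each time.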