anthropics / skill-creator

Skill Creator

This runbook guides an AI agent through the complete lifecycle of creating, testing, iteratively improving, and packaging AI agent skills. Starting from intent capture and research, the agent drafts a SKILL.md, writes test cases, runs parallel evaluations with and without the skill, iterates on the results, and packages the finished skill.

agent: claude-code · model: claude-sonnet-4-6 · snapshot: python312-uv · eval: programmatic · 11 steps · v1.0.0

Deploy Skill Creator to your jetty.io

One click installs this runbook into a collection on your Jetty account. You can run it from the Spot dashboard, schedule it, or pipe inputs in via the API.

The shape of the run

11 steps · start to finish.

  1. Step 1

    Environment Setup

    Verify the environment has all required tools and the skill path is valid.

    # Check Python is available
    python3 --version || { echo "ERROR: python3 not found"; exit 1; }
    
    # Check claude CLI (needed for description optimization)
    command -v claude >/dev/null 2>&1 && echo "claude CLI present" || echo "WARN: claude CLI not found (description optimization will be unavailable)"
    
    # Verify skill path exists or create it
    SKILL_NAME="${SKILL_NAME:-my-skill}"
    SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
    mkdir -p "$SKILL_PATH"
    mkdir -p "$SKILL_PATH/evals"
    mkdir -p "${SKILL_NAME}-workspace"
    mkdir -p /app/results
    
    echo "Setup complete. Skill path: $SKILL_PATH"
    

  2. Step 2

    Capture Intent

    Understand the user's intent before writing anything. If the current conversation already contains a workflow the user wants to capture (e.g., they say "turn this into a skill"), extract answers from history first.

  3. Step 3

    Interview and Research

    Before writing the SKILL.md, proactively gather:
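
    A minimal research pass might look like the sketch below. The directories, file names, and the notes file are assumptions for illustration, not part of the runbook.

```shell
# Hypothetical research pass: look for prior art before drafting the skill.
SKILL_NAME="${SKILL_NAME:-my-skill}"

# Collect existing documentation that might already describe the workflow.
find . -maxdepth 2 \( -name "README*" -o -name "*.md" \) 2>/dev/null | head -20

# Record findings in a scratch file to summarize during the interview.
NOTES="${SKILL_NAME}-workspace/research-notes.md"
mkdir -p "$(dirname "$NOTES")"
printf '# Research notes for %s\n' "$SKILL_NAME" > "$NOTES"
echo "Notes started at $NOTES"
```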

  4. Step 4

    Write the SKILL.md

    Based on the user interview, produce `<skill-path>/SKILL.md` with this structure:
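
    The structure itself is elided above; a plausible skeleton, written from the shell, follows the standard SKILL.md shape: YAML frontmatter with `name` and `description`, then a markdown body. The section headings below are illustrative, not prescribed by the runbook.

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

# Write a skeleton SKILL.md: YAML frontmatter, then a markdown body.
cat > "$SKILL_PATH/SKILL.md" <<'EOF'
---
name: my-skill
description: One-line summary of what the skill does and when to use it.
---

# My Skill

## When to use
Describe the triggering situations.

## Instructions
Step-by-step guidance for the agent.

## Examples
Concrete input/output pairs.
EOF

echo "Wrote $SKILL_PATH/SKILL.md"
```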

  5. Step 5

    Write Test Cases

    After drafting the skill, create 2–3 realistic test prompts — the kind of message a real user would actually send. Share them with the user for confirmation before running.
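
    One way to record the confirmed prompts is a small JSON file under `evals/` (the file name and schema here are assumptions, since the runbook doesn't specify a format):

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH/evals"

# Placeholder prompts; replace with messages a real user would actually send.
cat > "$SKILL_PATH/evals/test_cases.json" <<'EOF'
[
  {"id": "case-1", "prompt": "First realistic user request here."},
  {"id": "case-2", "prompt": "Second realistic user request here."},
  {"id": "case-3", "prompt": "An edge-case request here."}
]
EOF

# Sanity-check that the file is valid JSON before sharing with the user.
python3 -m json.tool "$SKILL_PATH/evals/test_cases.json" >/dev/null && echo "test cases are valid JSON"
```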

  6. Step 6

    Run & Evaluate (max 5 iterations)

    This is one continuous sequence — do not stop partway through.
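
    A sketch of the with/without-skill runs, assuming the `claude` CLI's print mode (`claude -p`). Injecting the skill by prepending SKILL.md to the prompt is an assumption here, not the runbook's prescribed mechanism; output paths are illustrative.

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
RESULTS="${RESULTS_DIR:-results}"   # Step 1 uses /app/results; a relative default keeps this portable
mkdir -p "$RESULTS"

run_case() {  # run_case <id> <prompt> <with_skill: yes|no>
  local id="$1"
  local prompt="$2"
  local with_skill="$3"
  local input="$prompt"
  if [ "$with_skill" = "yes" ] && [ -f "$SKILL_PATH/SKILL.md" ]; then
    # Prepend the skill text so the model sees it before the user request.
    input="$(cat "$SKILL_PATH/SKILL.md"; printf '\n\n%s' "$prompt")"
  fi
  claude -p "$input" > "$RESULTS/${id}-skill-${with_skill}.txt" 2>&1 &
}

if command -v claude >/dev/null 2>&1; then
  # Launch skill and baseline runs in parallel, then wait for all of them.
  run_case case-1 "First realistic user request here." yes
  run_case case-1 "First realistic user request here." no
  wait
  echo "Runs complete; outputs in $RESULTS"
else
  echo "claude CLI not found; skipping runs"
fi
```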

  7. Step 7

    Iterate on Errors (max 3 rounds)

    After reading feedback, improve the skill and re-run:
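
    The round cap can be enforced mechanically. In this sketch the failure count is a placeholder standing in for the eval re-run; the loop stops after three rounds or as soon as a round passes cleanly:

```shell
MAX_ROUNDS=3
round=1
while [ "$round" -le "$MAX_ROUNDS" ]; do
  echo "Improvement round $round of $MAX_ROUNDS"
  # ... apply feedback to SKILL.md here, then re-run the evals from Step 6 ...
  failures=0   # placeholder: count of failing cases from the re-run
  if [ "$failures" -eq 0 ]; then
    echo "All cases pass; stopping early."
    break
  fi
  round=$((round + 1))
done
```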

  8. Step 8

    Advanced — Blind Comparison (Optional)

    For rigorous comparison between two skill versions, use the blind comparison system in `agents/comparator.md` and `agents/analyzer.md`. An independent agent evaluates both outputs without knowing which is which. This is optional and most users won't need it — the human review loop is sufficient.

  9. Step 9

    Description Optimization

    After finalizing the skill body, offer to optimize the description for better triggering accuracy.
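
    One hedged way to do this from the shell: pull the current description out of the frontmatter and ask `claude -p` for a rewrite. The demo frontmatter and the prompt wording are assumptions; in a real run SKILL.md already exists from Step 4.

```shell
SKILL_PATH="${SKILL_PATH:-./my-skill}"
mkdir -p "$SKILL_PATH"
# Demo frontmatter so this sketch is self-contained; skip if SKILL.md exists.
[ -f "$SKILL_PATH/SKILL.md" ] || printf -- '---\nname: my-skill\ndescription: Converts CSV files to JSON.\n---\n' > "$SKILL_PATH/SKILL.md"

# Extract the current description line from the YAML frontmatter.
desc="$(sed -n 's/^description:[[:space:]]*//p' "$SKILL_PATH/SKILL.md" | head -1)"
echo "Current description: $desc"

if command -v claude >/dev/null 2>&1; then
  claude -p "Rewrite this skill description so an agent can reliably decide when to trigger it. Reply with one line only: $desc"
else
  echo "claude CLI not found; skipping description optimization"
fi
```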

  10. Step 10

    Package and Present

    If the `present_files` tool is available:
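
    When `present_files` is not available, a plain archive works as a fallback. The archive names below are assumptions, not part of the runbook:

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
DIST="${SKILL_NAME}.zip"

mkdir -p "$SKILL_PATH"   # no-op if Step 1 already created it
if command -v zip >/dev/null 2>&1; then
  zip -r "$DIST" "$SKILL_PATH" -x "*.pyc"
else
  # Fall back to tar when zip is unavailable.
  tar -czf "${SKILL_NAME}.tar.gz" "$SKILL_PATH"
  DIST="${SKILL_NAME}.tar.gz"
fi
echo "Packaged skill at $DIST"
```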

  11. Step 11

    Environment-Specific Adaptations

    - **No subagents**: run test cases sequentially, executing SKILL.md instructions inline. Skip baseline runs.
    - **No browser**: skip `generate_review.py`. Present results directly in conversation; ask for feedback inline.