Skill Creator
This runbook guides an AI agent through the complete lifecycle of creating, testing, iteratively improving, and packaging AI agent skills. Starting from intent capture and research, the agent drafts a SKILL.md, writes test cases, and runs parallel evaluations with and without the skill.
11 steps · start to finish.
Step 1: Environment Setup

Verify the environment has all required tools and the skill path is valid.
```shell
# Check Python is available
python3 --version || { echo "ERROR: python3 not found"; exit 1; }

# Check claude CLI (needed for description optimization)
command -v claude >/dev/null 2>&1 && echo "claude CLI present" || echo "WARN: claude CLI not found (description optimization will be unavailable)"

# Verify skill path exists or create it
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"
mkdir -p "$SKILL_PATH/evals"
mkdir -p "${SKILL_NAME}-workspace"
mkdir -p /app/results

echo "Setup complete. Skill path: $SKILL_PATH"
```

Step 2: Capture Intent
Understand the user's intent before writing anything. If the current conversation already contains a workflow the user wants to capture (e.g., they say "turn this into a skill"), extract answers from history first.
Step 3: Interview and Research

Before writing the SKILL.md, proactively gather:
Step 4: Write the SKILL.md

Based on the user interview, produce `<skill-path>/SKILL.md` with this structure:
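As a concrete sketch of that structure (the section names below are an assumption, not a fixed schema), a skeleton SKILL.md can be generated and then filled in from the interview:

```shell
# Hypothetical SKILL.md skeleton -- adapt the sections to the skill at hand.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

cat > "$SKILL_PATH/SKILL.md" <<EOF
---
name: $SKILL_NAME
description: One sentence describing when this skill should trigger.
---

# $SKILL_NAME

## When to use
Describe the situations where the agent should apply this skill.

## Instructions
Step-by-step instructions distilled from the user interview.

## Examples
Concrete input/output examples that anchor the instructions.
EOF

echo "Wrote $SKILL_PATH/SKILL.md"
```

The frontmatter `description` matters most: it is what the agent matches against when deciding whether to load the skill.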
Step 5: Write Test Cases

After drafting the skill, create 2–3 realistic test prompts — the kind of message a real user would actually send. Share them with the user for confirmation before running.
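One way to keep test prompts reviewable and re-runnable is to store each as a small file under `evals/` (the file layout and `Prompt:`/`Expected:` fields here are an assumption, not a required format):

```shell
# Sketch: one file per test case under <skill-path>/evals/.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH/evals"

cat > "$SKILL_PATH/evals/case-1.md" <<'EOF'
Prompt: <realistic user message exercising the skill's main path>
Expected: <what a good response should contain>
EOF

cat > "$SKILL_PATH/evals/case-2.md" <<'EOF'
Prompt: <a harder or edge-case variant of the same task>
Expected: <what a good response should contain>
EOF

ls "$SKILL_PATH/evals"
```

The placeholder angle-bracket text is filled in by hand (or by the agent) before showing the cases to the user.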
Step 6: Run & Evaluate (max 5 iterations)

This is one continuous sequence — do not stop partway through.
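A minimal sketch of one evaluation pass over the test cases, assuming the `claude` CLI's print mode (`claude -p`) is available; prepending the skill body to the prompt is a simplified stand-in for real skill loading, and without the CLI each case is recorded as skipped rather than failing:

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
RESULTS_DIR="${RESULTS_DIR:-./results}"   # the runbook uses /app/results; relative here for portability
mkdir -p "$RESULTS_DIR"

for case in "$SKILL_PATH"/evals/*.md; do
  [ -e "$case" ] || continue
  name="$(basename "$case" .md)"
  if command -v claude >/dev/null 2>&1; then
    # With-skill run: skill body prepended to the test prompt.
    claude -p "$(cat "$SKILL_PATH/SKILL.md" "$case")" > "$RESULTS_DIR/$name.with-skill.txt"
    # Baseline run: the prompt alone, for side-by-side comparison.
    claude -p "$(cat "$case")" > "$RESULTS_DIR/$name.baseline.txt"
  else
    echo "skipped: claude CLI not available" > "$RESULTS_DIR/$name.skipped.txt"
  fi
done
ls "$RESULTS_DIR"
```

Keeping paired `with-skill` / `baseline` outputs per case is what makes the later comparison and review steps possible.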
Step 7: Iterate on Errors (max 3 rounds)

After reading feedback, improve the skill and re-run:
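To keep iteration rounds auditable, one option (a sketch, not part of the runbook's required flow) is to snapshot the skill directory before each round so changes can be diffed afterwards:

```shell
# Snapshot the skill before editing; my-skill.roundN is a hypothetical naming scheme.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

round=1
while [ -d "$SKILL_PATH.round$round" ]; do round=$((round + 1)); done
cp -r "$SKILL_PATH" "$SKILL_PATH.round$round"
echo "Snapshot: $SKILL_PATH.round$round"

# After editing SKILL.md, review what changed this round:
# diff -ru "$SKILL_PATH.round$round" "$SKILL_PATH"
```

If a round makes results worse, restoring the snapshot is a one-line `cp -r` back.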
Step 8: Advanced — Blind Comparison (Optional)

For rigorous comparison between two skill versions, use the blind comparison system in `agents/comparator.md` and `agents/analyzer.md`. An independent agent evaluates both outputs without knowing which is which. This is optional and most users won't need it; the human review loop is sufficient for typical skills.
Step 9: Description Optimization

After finalizing the skill body, offer to optimize the description for better triggering accuracy.
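A sketch of what that offer can run, again assuming the `claude` CLI's print mode; the prompt wording is illustrative, and if the CLI is absent the step degrades gracefully:

```shell
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

if command -v claude >/dev/null 2>&1 && [ -f "$SKILL_PATH/SKILL.md" ]; then
  # Ask for a candidate description; a human still reviews it before it replaces the original.
  claude -p "Rewrite this skill's YAML description so it triggers on the right user requests. Return only the new description. $(cat "$SKILL_PATH/SKILL.md")" \
    > "$SKILL_PATH/description.optimized.txt"
  echo "Candidate description: $SKILL_PATH/description.optimized.txt"
else
  echo "claude CLI not found; keeping the existing description"
fi
```

Writing the candidate to a side file rather than editing SKILL.md in place keeps the user in the loop on the final wording.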
Step 10: Package and Present

If the `present_files` tool is available:
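If no presentation tool is available, a plain tarball of the skill directory works for sharing (the `.skill.tgz` name is just a convention assumed here):

```shell
# Bundle the skill directory for sharing or upload.
SKILL_NAME="${SKILL_NAME:-my-skill}"
SKILL_PATH="${SKILL_PATH:-./$SKILL_NAME}"
mkdir -p "$SKILL_PATH"

tar -czf "$SKILL_NAME.skill.tgz" -C "$(dirname "$SKILL_PATH")" "$(basename "$SKILL_PATH")"
echo "Packaged: $SKILL_NAME.skill.tgz"
```

The `-C` flag archives the directory by its base name, so the tarball extracts to `my-skill/` regardless of where it was built.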
Step 11: Environment-Specific Adaptations

- **No subagents**: run test cases sequentially, executing SKILL.md instructions inline. Skip baseline runs.
- **No browser**: skip `generate_review.py`. Present results directly in conversation; ask for feedback inline.
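These adaptations can be decided up front by probing the environment once and recording the result (the `env-report.txt` file name is an assumption for this sketch):

```shell
# Probe the environment and record which adaptations apply.
: > env-report.txt

if command -v claude >/dev/null 2>&1; then
  echo "claude CLI: available" >> env-report.txt
else
  echo "claude CLI: missing -> run test cases inline, skip baseline runs" >> env-report.txt
fi

if command -v python3 >/dev/null 2>&1; then
  echo "python3: available" >> env-report.txt
else
  echo "python3: missing -> skip generate_review.py, collect feedback in conversation" >> env-report.txt
fi

cat env-report.txt
```

Running this once at the start of Step 1 means later steps can branch on the report instead of re-checking tools each time.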