# Skill Authoring Best Practices

For field definitions and description format, see `./standard-fields.md`. For quality dimensions, see `./quality-dimensions.md`.

## Core Philosophy: Outcome-Based Authoring

Skills should describe **what to achieve**, not **how to achieve it**. The LLM is capable of figuring out the approach — it needs to know the goal, the constraints, and the why.

**The test for every instruction:** Would removing this cause the LLM to produce a worse outcome? If the LLM would do it anyway — or if it's just spelling out mechanical steps — cut it.

### Outcome vs Prescriptive

| Prescriptive (avoid) | Outcome-based (prefer) |
|---|---|
| "Step 1: Ask about goals. Step 2: Ask about constraints. Step 3: Summarize and confirm." | "Ensure the user's vision is fully captured — goals, constraints, and edge cases — before proceeding." |
| "Load config. Read user_name. Read communication_language. Greet the user by name in their language." | "Load available config and greet the user appropriately." |
| "Create a file. Write the header. Write section 1. Write section 2. Save." | "Produce a report covering X, Y, and Z." |

The prescriptive versions miss requirements the author didn't think of. The outcome-based versions let the LLM adapt to the actual situation.

### Why This Works

- **Why over what** — When you explain why something matters, the LLM adapts to novel situations. When you just say what to do, it follows blindly even when it shouldn't.
- **Context enables judgment** — Give domain knowledge, constraints, and goals. The LLM figures out the approach. It's better at adapting to messy reality than any script you could write.
- **Prescriptive steps create brittleness** — When reality doesn't match the script, the LLM either follows the wrong script or gets confused. Outcomes let it adapt.
- **Every instruction should carry its weight** — If the LLM would do it anyway, the instruction is noise. If the LLM wouldn't know to do it without being told, that's signal.

### When Prescriptive Is Right

Reserve exact steps for **fragile operations** where getting it wrong has consequences — script invocations, exact file paths, specific CLI commands, API calls with precise parameters. These need low freedom because there's one right way to do them.

| Freedom | When | Example |
|---------|------|---------|
| **High** (outcomes) | Multiple valid approaches, LLM judgment adds value | "Ensure the user's requirements are complete" |
| **Medium** (guided) | Preferred approach exists, some variation OK | "Present findings in a structured report with an executive summary" |
| **Low** (exact) | Fragile, one right way, consequences for deviation | `python3 scripts/scan-path-standards.py {skill-path}` |

## Patterns

These are patterns that naturally emerge from outcome-based thinking. Apply them when they fit — they're not a checklist.

### Soft Gate Elicitation

At natural transitions, invite contribution without demanding it: "Anything else, or shall we move on?" Users almost always remember one more thing when given a graceful exit ramp. This produces richer artifacts than rigid section-by-section questioning.

### Intent-Before-Ingestion

Understand why the user is here before scanning documents or project context. Intent gives you the relevance filter — without it, scanning is noise.

### Capture-Don't-Interrupt

When users provide information beyond the current scope, capture it for later rather than redirecting. Users in creative flow share their best insights unprompted — interrupting loses them.

### Dual-Output: Human Artifact + LLM Distillate

Artifact-producing skills can output both a polished human-facing document and a token-efficient distillate for downstream LLM consumption. The distillate captures overflow, rejected ideas, and detail that doesn't belong in the human doc but has value for the next workflow. Always optional.

### Parallel Review Lenses

Before finalizing significant artifacts, fan out reviewers with different perspectives — skeptic, opportunity spotter, domain-specific lens. If subagents aren't available, do a single critical self-review pass. Multiple perspectives catch blind spots no single reviewer would.

### Three-Mode Architecture (Guided / Yolo / Headless)

Consider whether the skill benefits from multiple execution modes:

| Mode | When | Behavior |
|------|------|----------|
| **Guided** | Default | Conversational discovery with soft gates |
| **Yolo** | "just draft it" | Ingest everything, draft complete artifact, then refine |
| **Headless** | `--headless` / `-H` | Complete the task without user input, using sensible defaults |

Not all skills need all three. But considering them during design prevents locking into a single interaction model.

### Graceful Degradation

Every subagent-dependent feature should have a fallback path. A skill that hard-fails without subagents is fragile — one that falls back to sequential processing works everywhere.

### Verifiable Intermediate Outputs

For complex tasks with consequences: plan → validate → execute → verify. Create a verifiable plan before executing, validate with scripts where possible. Catches errors early and makes the work reversible.

## Writing Guidelines

- **Consistent terminology** — one term per concept, stick to it
- **Third person** in descriptions — "Processes files" not "I help process files"
- **Descriptive file names** — `form_validation_rules.md` not `doc2.md`
- **Forward slashes** in all paths — cross-platform
- **One level deep** for reference files — SKILL.md → reference.md, never chains
- **TOC for long files** — >100 lines

## Anti-Patterns

| Anti-Pattern | Fix |
|---|---|
| Numbered steps for things the LLM would figure out | Describe the outcome and why it matters |
| Explaining how to load config (the mechanic) | List the config keys and their defaults (the outcome) |
| Prescribing exact greeting/menu format | "Greet the user and present capabilities" |
| Spelling out headless mode in detail | "If headless, complete without user input" |
| Too many options upfront | One default with escape hatch |
| Deep reference nesting (A→B→C) | Keep references 1 level from SKILL.md |
| Inconsistent terminology | Choose one term per concept |
| Scripts that classify meaning via regex | Intelligence belongs in prompts, not scripts |

## Scripts in Skills

- **Execute vs reference** — "Run `analyze.py`" (execute) vs "See `analyze.py` for the algorithm" (read)
- **Document constants** — explain why `TIMEOUT = 30`, not just what
- **PEP 723 for Python** — self-contained with inline dependency declarations
- **MCP tools** — use fully qualified names: `ServerName:tool_name`