refactor(ux): consolidate BMAD skills, update design system, and clean up Prisma generated client

Sepehr Ramezani
2026-04-19 19:21:27 +02:00
parent 5296c4da2c
commit 25529a24b8
2476 changed files with 127934 additions and 101962 deletions


@@ -0,0 +1,176 @@
---
name: build-process
description: Six-phase conversational discovery process for building BMad workflows and skills. Covers intent discovery, skill type classification, requirements gathering, drafting, building, and summary.
---
**Language:** Use `{communication_language}` for all output.
# Build Process
Build workflows and skills through conversational discovery. Your north star: **outcome-driven design**. Every instruction in the final skill should describe what to achieve, not prescribe how to do it step by step. Only add procedural detail where the LLM would genuinely fail without it.
## Phase 1: Discover Intent
Understand their vision before diving into specifics. Let them describe what they want to build — encourage detail on edge cases, tone, persona, tools, and other skills involved.
**Input flexibility:** Accept input in any format:
- Existing BMad workflow/skill path → read and extract intent (see below)
- Rough idea or description → guide through discovery
- Code, documentation, API specs → extract intent and requirements
- Non-BMad skill/tool → extract intent for conversion
### When given an existing skill
**Critical:** Treat the existing skill as a **description of intent**, not a specification to follow. Extract _what_ it's trying to achieve. Do not inherit its verbosity, structure, or mechanical procedures — the old skill is reference material, not a template.
If the SKILL.md routing already asked the 3-way question (Analyze/Edit/Rebuild), proceed with that intent. Otherwise ask now:
- **Edit** — changing specific behavior while keeping the current approach
- **Rebuild** — rethinking from core outcomes, full discovery using the old skill as context
For **Edit**: use the edit fast-track — full Phase 1-6 is for Rebuild and new builds only.
**Edit fast-track:**
1. Read the relevant files in the existing skill
2. Understand the specific change requested and its scope
3. Apply the change following outcome-driven principles — preserve what works, improve what's targeted
4. Run lint gate (Phase 5 lint steps) and fix any findings
5. Present the change and lint results
If the edit touches the skill's core architecture, classification, or requires rethinking multiple stages, recommend Rebuild instead.
For **Rebuild**: read the old skill to understand its goals, then proceed through full discovery as if building new — the old skill informs your questions but doesn't constrain the design.
### Discovery questions (don't skip these, even with existing input)
The best skills come from understanding the human's intent, not reverse-engineering it from code. Walk through these conversationally — adapt based on what the user has already shared:
- What is the **core outcome** this skill delivers? What does success look like?
- **Who is the user** and how should the experience feel? What's the interaction model — collaborative discovery, rapid execution, guided interview?
- What **judgment calls** does the LLM need to make, and what can it just do mechanically?
- What's the **one thing** this skill must get right?
- Are there things the user might not know or might get wrong? How should the skill handle that?
The goal is to conversationally gather enough to cover Phase 2 and 3 naturally. Since users often brain-dump rich detail, adapt subsequent phases to what you already know.
## Phase 2: Classify Skill Type
Ask upfront:
- Will this be part of a module? If yes:
- What's the module code?
- What other skills will it use from the core or module? (need name, inputs, outputs for integration)
- What config variables does it need access to?
Load `./classification-reference.md` and classify. Present classification with reasoning.
For Simple Workflows and Complex Workflows, also ask:
- **Headless mode?** Should this support `--headless`? (If it produces an artifact, headless is often valuable)
## Phase 3: Gather Requirements
Work through conversationally, adapted per skill type. Glean from what the user already shared or suggest based on their narrative.
**All types — Common fields:**
- **Name:** kebab-case. Module: `{modulecode}-{skillname}`. Standalone: `{skillname}`. The `bmad-` prefix is reserved for official BMad creations only.
- **Description:** Two parts: [5-8 word summary]. [Use when user says 'specific phrase'.] — Default to conservative triggering. See `./standard-fields.md` for format.
- **Overview:** What/How/Why-Outcome. For interactive or complex skills, include domain framing and theory of mind — these give the executing agent context for judgment calls.
- **Role guidance:** Brief "Act as a [role/expert]" primer
- **Design rationale:** Non-obvious choices the executing agent should understand
- **External skills used:** Which skills does this invoke?
- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval before proceeding — dependencies add install-time cost and require `uv` to be available.
- **Creates output documents?** If yes, will use `{document_output_language}`
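The naming rule in the **Name** field can be sketched as a small check. This is an illustrative sketch only, not part of the BMad tooling: `build_skill_name` and `allow_bmad_prefix` are hypothetical names, and the kebab-case pattern (lowercase letters and digits joined by single hyphens) is an assumption about what "kebab-case" means here.

```python
import re

# Assumed definition of kebab-case: lowercase alphanumeric segments joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def build_skill_name(skill_name, module_code=None, allow_bmad_prefix=False):
    """Return the full skill name, or raise ValueError if it breaks the naming rules."""
    # Module skills are named {modulecode}-{skillname}; standalone skills use {skillname}.
    full_name = f"{module_code}-{skill_name}" if module_code else skill_name
    if not KEBAB_CASE.match(full_name):
        raise ValueError(f"not kebab-case: {full_name!r}")
    # The bmad- prefix is reserved for official BMad creations.
    if full_name.startswith("bmad-") and not allow_bmad_prefix:
        raise ValueError("the 'bmad-' prefix is reserved for official BMad creations")
    return full_name
```

For example, `build_skill_name("pr-creator", module_code="dev")` would give `dev-pr-creator`, while `build_skill_name("bmad-helper")` would be rejected unless the prefix is explicitly allowed.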
**Simple Utility additional:**
- Input/output format, standalone?, composability
**Simple Workflow additional:**
- Steps (inline in SKILL.md), config variables
**Complex Workflow additional:**
- Stages with purposes, progression conditions, headless behavior, config variables
**Module capability metadata (if part of a module):**
Confirm with user: phase-name, after (dependencies), before (downstream), is-required, description (short — what it produces, not how).
**Path conventions (CRITICAL):**
- Skill-internal: `references/`, `scripts/` (relative to skill root)
- Project-scope paths: `{project-root}/...` (any path relative to project root)
- Config variables used directly — they already contain `{project-root}`
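A rough way to picture the three path conventions above is a classifier that labels each reference. This is a sketch under stated assumptions, not the logic of `scan-path-standards.py`: the function name, the returned labels, and the inclusion of `assets/` as a skill-internal folder are illustrative choices.

```python
# Assumed set of skill-internal folders; references/ and scripts/ come from the list above,
# assets/ is added here as an assumption.
SKILL_INTERNAL_PREFIXES = ("references/", "scripts/", "assets/")


def classify_path(path, config_vars=()):
    """Label a path reference as project-scope, config-variable, skill-internal, or unknown."""
    if path.startswith("{project-root}/"):
        return "project-scope"
    # Config variables are used directly because they already contain {project-root}.
    if any(path.startswith("{" + name + "}") for name in config_vars):
        return "config-variable"
    if path.startswith(SKILL_INTERNAL_PREFIXES):
        return "skill-internal"
    return "unknown"
```

A reference such as `{output_folder}/report.md` would be labeled `config-variable` when `output_folder` is passed in `config_vars`, so it is not wrapped in `{project-root}` a second time.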
## Phase 4: Draft & Refine
Think one level deeper. Clarify gaps in logic or understanding. Create and present a plan. Point out vague areas. Iterate until ready.
**Pruning check (apply before building):**
For every planned instruction, ask: **would the LLM do this correctly without being told?** If yes, cut it. Scoring algorithms, calibration tables, decision matrices for subjective judgment, weighted formulas — these are things LLMs handle naturally. The instruction must earn its place by preventing a failure that would otherwise happen.
Watch especially for:
- Mechanical procedures for tasks the LLM does through general capability
- Per-platform instructions when a single adaptive instruction works
- Templates that explain things the LLM already knows (how to format output, how to greet users)
- Multiple files that could be a single instruction
## Phase 5: Build
**Load these before building:**
- `./standard-fields.md` — field definitions, description format, path rules
- `./skill-best-practices.md` — outcome-driven authoring, patterns, anti-patterns
- `./quality-dimensions.md` — build quality checklist
**Load based on skill type:**
- **If Complex Workflow:** `./complex-workflow-patterns.md` — compaction survival, config integration, progressive disclosure
Load the template from `assets/SKILL-template.md` and `./template-substitution-rules.md`. Build the skill with progressive disclosure (SKILL.md for overview and routing, `references/` for progressive disclosure content). Output to `{bmad_builder_output_folder}`.
**Skill Source Tree** (only create subfolders that are needed):
```
{skill-name}/
├── SKILL.md # Frontmatter, overview, activation, routing
├── references/ # Progressive disclosure content — prompts, guides, schemas
├── assets/ # Templates, starter files
├── scripts/ # Deterministic code with tests
│ └── tests/
```
| Location | Contains | LLM relationship |
| ------------------- | ---------------------------------- | ------------------------------------ |
| **SKILL.md** | Overview, activation, routing | LLM identity and router |
| **`references/`** | Capability prompts, reference data | Loaded on demand |
| **`assets/`** | Templates, starter files | Copied/transformed into output |
| **`scripts/`** | Python, shell scripts with tests | Invoked for deterministic operations |
**If the built skill includes scripts**, also load `./script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.
**Lint gate** — after building, validate and auto-fix:
If subagents available, delegate lint-fix to a subagent. Otherwise run inline.
1. Run both lint scripts in parallel:
```bash
python3 scripts/scan-path-standards.py {skill-path}
python3 scripts/scan-scripts.py {skill-path}
```
2. Fix high/critical findings and re-run (up to 3 attempts per script)
3. Run unit tests if scripts exist in the built skill
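The lint gate above could be sketched as a bounded retry loop per script, with the two scripts handled concurrently. This is a minimal sketch, not the builder's implementation: `run_lint` and `apply_fixes` are hypothetical callables supplied by the caller, the finding shape (a dict with a `severity` key) is an assumption, and "3 attempts" is read here as at most three lint runs per script.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ATTEMPTS = 3  # interpreted as: at most three lint runs per script
LINT_SCRIPTS = ["scripts/scan-path-standards.py", "scripts/scan-scripts.py"]


def lint_until_clean(script, skill_path, run_lint, apply_fixes):
    """Run one lint script, fixing high/critical findings between runs.

    run_lint(script, skill_path) returns a list of finding dicts (assumed shape).
    apply_fixes(findings) stands in for whatever performs the fix.
    Returns the high/critical findings still present after the last run.
    """
    blocking = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        findings = run_lint(script, skill_path)
        blocking = [f for f in findings if f.get("severity") in ("high", "critical")]
        if not blocking:
            break
        if attempt < MAX_ATTEMPTS:
            apply_fixes(blocking)
    return blocking


def lint_gate(skill_path, run_lint, apply_fixes):
    """Run both lint scripts concurrently and return remaining blockers per script."""
    with ThreadPoolExecutor(max_workers=len(LINT_SCRIPTS)) as pool:
        futures = {
            script: pool.submit(lint_until_clean, script, skill_path, run_lint, apply_fixes)
            for script in LINT_SCRIPTS
        }
        return {script: future.result() for script, future in futures.items()}
```

An empty list for every script means the gate passed; anything left over is what would be reported alongside the change.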
## Phase 6: Summary
Present what was built: location, structure, capabilities. Include lint results.
Run unit tests if scripts exist. Remind user to commit before quality analysis.
**Offer quality analysis:** Ask if they'd like a Quality Analysis to identify opportunities. If yes, load `./quality-analysis.md` with the skill path.


@@ -4,10 +4,10 @@ Classify the skill type based on user requirements. This table is for internal u
## 3-Type Taxonomy
| Type | Description | Structure | When to Use |
|------|-------------|-----------|-------------|
| **Simple Utility** | Input/output building block. Headless, composable, often has scripts. | Single SKILL.md + scripts/ | Composable building block with clear input/output, single-purpose |
| **Simple Workflow** | Multi-step process contained in a single SKILL.md. Minimal or no prompt files. | SKILL.md + optional references/ | Multi-step process that fits in one file, no progressive disclosure needed |
| Type | Description | Structure | When to Use |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **Simple Utility** | Input/output building block. Headless, composable, often has scripts. | Single SKILL.md + scripts/ | Composable building block with clear input/output, single-purpose |
| **Simple Workflow** | Multi-step process contained in a single SKILL.md. Minimal or no prompt files. | SKILL.md + optional references/ | Multi-step process that fits in one file, no progressive disclosure needed |
| **Complex Workflow** | Multi-stage with progressive disclosure, numbered prompt files at root, config integration. May support headless mode. | SKILL.md (routing) + prompt stages at root + references/ | Multiple stages, long-running process, progressive disclosure, routing logic |
## Decision Tree
@@ -28,6 +28,7 @@ Classify the skill type based on user requirements. This table is for internal u
## Classification Signals
### Simple Utility Signals
- Clear input → processing → output pattern
- No user interaction needed during execution
- Other skills/workflows call it
@@ -36,6 +37,7 @@ Classify the skill type based on user requirements. This table is for internal u
- Examples: JSON validator, schema checker, format converter
### Simple Workflow Signals
- 3-8 numbered steps
- User interaction at specific points
- Uses standard tools (gh, git, npm, etc.)
@@ -44,6 +46,7 @@ Classify the skill type based on user requirements. This table is for internal u
- Examples: PR creator, deployment checklist, code review
### Complex Workflow Signals
- Multiple distinct phases/stages
- Long-running (likely to hit context compaction)
- Progressive disclosure needed (too much for one file)
@@ -55,5 +58,8 @@ Classify the skill type based on user requirements. This table is for internal u
## Module Context (Orthogonal)
Module context is asked for ALL types:
- **Module-based:** Part of a BMad module. Uses `bmad-{modulecode}-{skillname}` naming. Config loading includes a fallback pattern — if config is missing, the skill informs the user that the module setup skill is available and continues with sensible defaults.
- **Standalone:** Independent skill. Uses `bmad-{skillname}` naming. Config loading is best-effort — load if available, use defaults if not, no mention of a setup skill.
- **Module-based:** Part of a module. Uses `{modulecode}-{skillname}` naming. Config loading includes a fallback pattern — if config is missing, the skill informs the user that the module setup skill is available and continues with sensible defaults.
- **Standalone:** Independent skill. Uses `{skillname}` naming (no prefix required). Config loading is best-effort — load if available, use defaults if not, no mention of a setup skill.
The `bmad-` prefix is reserved for official BMad creations. User-created skills should not include it unless the user specifically requests it.


@@ -17,12 +17,14 @@ Workflows read config from `{project-root}/_bmad/config.yaml` and `config.user.y
### Config Loading Pattern
**Module-based skills** — load with fallback and setup skill awareness:
```
Load config from {project-root}/_bmad/config.yaml ({module-code} section) and config.user.yaml.
If missing: inform user that {module-setup-skill} is available, continue with sensible defaults.
```
**Standalone skills** — load best-effort:
```
Load config from {project-root}/_bmad/config.yaml and config.user.yaml if available.
If missing: continue with defaults — no mention of setup skill.
@@ -33,6 +35,7 @@ If missing: continue with defaults — no mention of setup skill.
Load core config (user preferences, language, output locations) with sensible defaults. If the workflow creates documents, include document output language.
**Example config line for a document-producing workflow:**
```
vars: user_name:BMad,communication_language:English,document_output_language:English,output_folder:{project-root}/_bmad-output,bmad_builder_output_folder:{project-root}/bmad-builder-creations/
```
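To make the format of the example config line concrete, here is one way it could be split into key/value pairs. The source does not show how the line is actually consumed, so this is only a sketch: `parse_vars_line` is a hypothetical helper, and it assumes that values never contain commas.

```python
def parse_vars_line(line):
    """Split 'vars: key:value,key:value' into a dict of strings.

    Assumption: values contain no commas. Each pair is split on its first colon.
    """
    prefix = "vars:"
    if not line.startswith(prefix):
        raise ValueError("expected a line starting with 'vars:'")
    pairs = {}
    for item in line[len(prefix):].split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs
```

Applied to the example line, this would yield five entries, with `output_folder` mapped to `{project-root}/_bmad-output`.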
@@ -57,12 +60,12 @@ Each stage after the first reads the output document to recover context. If comp
```markdown
---
title: "Analysis: Research Topic"
status: "analysis"
title: 'Analysis: Research Topic'
status: 'analysis'
inputs:
- "{project_root}/docs/brief.md"
created: "2025-03-02T10:00:00Z"
updated: "2025-03-02T11:30:00Z"
- '{project_root}/docs/brief.md'
created: '2025-03-02T10:00:00Z'
updated: '2025-03-02T11:30:00Z'
---
```
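Recovering state from the front matter shown above might look like the following. It is a deliberately small sketch that uses only the standard library and handles top-level scalar fields; a real workflow would more likely use a YAML parser. `read_front_matter` is a hypothetical name.

```python
def read_front_matter(text):
    """Return top-level scalar fields from a '---' delimited front matter block.

    List items and indented lines are skipped, so list-valued keys such as
    'inputs' come back as empty strings. Returns {} if no complete block is found.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line.startswith((" ", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip().strip("'\"")
    return {}  # closing delimiter never appeared
```

A resuming stage could read `status` from the returned dict to decide where to pick up.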
@@ -75,6 +78,7 @@ updated: "2025-03-02T11:30:00Z"
## Sequential Progressive Disclosure
Use numbered prompt files at the skill root when:
- Multi-phase workflow with ordered stages
- Input of one phase affects the next
- Workflow is long-running and stages shouldn't be visible upfront
@@ -101,7 +105,7 @@ Each stage prompt specifies prerequisites, progression conditions, and next dest
## Module Metadata Reference
BMad module workflows require extended frontmatter metadata. See `./references/metadata-reference.md` for the metadata template and field explanations.
BMad module workflows require extended frontmatter metadata. See `./metadata-reference.md` for the metadata template and field explanations.
---
@@ -114,6 +118,6 @@ Before finalizing a BMad module workflow, verify:
- [ ] Portable paths — artifacts use `{project_root}`?
- [ ] Compaction survival — each stage writes to output document?
- [ ] Document-as-cache — YAML front matter with status and inputs?
- [ ] Progressive disclosure — stages in `./references/` with progression conditions?
- [ ] Progressive disclosure — stages in `references/` with progression conditions?
- [ ] Final polish — subagent polish step at the end?
- [ ] Recovery — can resume by reading output doc front matter?


@@ -0,0 +1,106 @@
---
name: convert-process
description: Automated skill conversion workflow. Analyzes an existing skill, rebuilds it outcome-driven, and generates a before/after HTML comparison report.
---
**Language:** Use `{communication_language}` for all output.
# Convert Process
Convert any existing skill into a BMad-compliant, outcome-driven equivalent. Whether the input is bloated, poorly structured, or simply non-conformant, this process extracts intent, rebuilds following BMad best practices, and produces a before/after comparison report.
This process is always headless — no interactive questions. The original skill provides all the context needed.
## Step 1: Capture the Original
1. **Fetch/read the original skill.** If a URL was provided, fetch the raw content. If a local path, read all files in the skill directory.
2. **Save the original.** Write the complete original content to `{bmad_builder_reports}/convert-{skill-name}/original/SKILL.md` (and any additional files if the original is a multi-file skill). This preserved copy is needed for the comparison script.
3. **Note the source** (URL or path) for the report metadata.
## Step 2: Rebuild from Intent
Load and follow `build-process.md` with these parameters pre-set:
- **Intent:** Rebuild — rethink from core outcomes, the original is reference material only
- **Headless mode:** Active — skip all interactive questions, use sensible defaults
- **Discovery questions:** Answer them yourself by analyzing the original skill's intent
- **Classification:** Determine from the original's structure and purpose
- **Requirements:** Derive from the original, applying aggressive pruning
**Critical:** Do not inherit the original's verbosity, structure, or mechanical procedures. Extract *what it achieves*, then build the leanest skill that delivers the same outcome.
When the build process reaches Phase 6 (Summary), skip the quality analysis offer and continue to Step 3 below.
## Step 3: Generate Comparison Report
After the rebuilt skill is complete:
1. **Create the analysis file.** Write `{bmad_builder_reports}/convert-{skill-name}/convert-analysis.json`:
```json
{
"skill_name": "{skill-name}",
"original_source": "{url-or-path-provided-by-user}",
"cuts": [
{
"category": "Category Name",
"description": "Why this content was cut",
"examples": ["Specific example 1", "Specific example 2"],
"severity": "high|medium|low"
}
],
"retained": [
{
"category": "Category Name",
"description": "Why this content was kept — what behavioral impact it has"
}
],
"verdict": "One sharp sentence summarizing the conversion"
}
```
### Categorizing Changes
Not every conversion is about bloat — some skills are well-intentioned but non-conformant. Categorize what changed and why, drawing from these common patterns:
**Content removal** (when applicable):
| Category | Signal |
|----------|--------|
| **Training Data Redundancy** | Facts, biographies, domain knowledge the LLM already has |
| **Prescriptive Procedures** | Step-by-step instructions for things the LLM reasons through naturally |
| **Mechanical Frameworks** | Scoring rubrics, decision matrices, evaluation checklists for subjective judgment |
| **Generic Boilerplate** | "Best Practices", "Common Pitfalls", "When to Use/Not Use" filler |
| **Template Bloat** | Response format templates, greeting scripts, output structure prescriptions |
| **Redundant Examples** | Examples that repeat what the instructions already say |
| **Per-Platform Duplication** | Separate instructions per platform when one adaptive instruction works |
**Structural changes** (conformance to BMad best practices):
| Category | Signal |
|----------|--------|
| **Progressive Disclosure** | Monolithic content split into SKILL.md routing + references |
| **Outcome-Driven Rewrite** | Prescriptive instructions reframed as outcomes |
| **Frontmatter/Description** | Added or fixed BMad-compliant frontmatter and trigger phrases |
| **Path Convention Fixes** | Corrected file references to use `./` for skill-internal, `{project-root}/` for project-scope |
Severity: **high** = significant impact on quality or compliance, **medium** = notable improvement, **low** = minor or stylistic.
### Categorizing Retained Content
Focus on what the LLM *wouldn't* do correctly without being told. The retained categories should explain why each piece earns its place.
2. **Generate the HTML report:**
```bash
python3 ./scripts/generate-convert-report.py \
"{bmad_builder_reports}/convert-{skill-name}/original" \
"{rebuilt-skill-path}" \
"{bmad_builder_reports}/convert-{skill-name}/convert-analysis.json" \
-o "{bmad_builder_reports}/convert-{skill-name}/convert-report.html" \
--open
```
3. **Present the summary** — key metrics, reduction percentages, report file location. The HTML report opens automatically.
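Before the report script is invoked, the analysis file from step 1 could be sanity-checked against the shape shown in the JSON template. This is a sketch, not something `generate-convert-report.py` is known to do: `validate_analysis` is a hypothetical helper, and which fields count as required is an assumption drawn from the template.

```python
SEVERITIES = {"high", "medium", "low"}


def validate_analysis(data):
    """Return a list of problems found in a convert-analysis dict (empty if none)."""
    problems = []
    for key in ("skill_name", "original_source", "verdict"):
        if not isinstance(data.get(key), str) or not data[key]:
            problems.append(f"missing or empty string: {key}")
    for i, cut in enumerate(data.get("cuts", [])):
        if cut.get("severity") not in SEVERITIES:
            problems.append(f"cuts[{i}]: severity must be one of {sorted(SEVERITIES)}")
        if not isinstance(cut.get("examples"), list):
            problems.append(f"cuts[{i}]: examples must be a list")
    for i, kept in enumerate(data.get("retained", [])):
        if not kept.get("description"):
            problems.append(f"retained[{i}]: description is required")
    return problems
```

The dict would typically come from `json.load` on `convert-analysis.json`; an empty result means the file matches the template's structure.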


@@ -0,0 +1,150 @@
---
name: quality-analysis
description: Comprehensive quality analysis for BMad workflows and skills. Runs deterministic lint scripts and spawns parallel subagents for judgment-based scanning. Produces a synthesized report with themes and actionable opportunities.
menu-code: QA
---
# Quality Analysis
Communicate with user in `{communication_language}`. Write report content in `{document_output_language}`.
You orchestrate quality analysis on a BMad workflow or skill. Deterministic checks run as scripts (fast, zero tokens). Judgment-based analysis runs as LLM subagents. A report creator synthesizes everything into a unified, theme-based report.
## Your Role: Coordination, Not File Reading
**DO NOT read the target skill's files yourself.** Scripts and subagents do all analysis.
You orchestrate: run deterministic scripts and pre-pass extractors, spawn LLM scanner subagents in parallel, then hand off to the report creator for synthesis.
## Headless Mode
If `{headless_mode}=true`, skip all user interaction, use safe defaults, note any warnings, and output structured JSON as specified in the Present to User section.
## Pre-Scan Checks
Check for uncommitted changes. In headless mode, note warnings and proceed. In interactive mode, inform the user, confirm before proceeding, and also confirm that the workflow is currently functioning.
## Analysis Principles
**Effectiveness over efficiency.** The analysis may suggest leaner phrasing, but if the current phrasing captures the right guidance, it should be kept. Over-optimization can make skills lose their effectiveness. The report presents opportunities — the user applies judgment.
## Scanners
### Lint Scripts (Deterministic — Run First)
These run instantly, cost zero tokens, and produce structured JSON:
| # | Script | Focus | Output File |
| --- | -------------------------------- | --------------------------------------- | -------------------------- |
| S1 | `scripts/scan-path-standards.py` | Path conventions | `path-standards-temp.json` |
| S2 | `scripts/scan-scripts.py` | Script portability, PEP 723, unit tests | `scripts-temp.json` |
### Pre-Pass Scripts (Feed LLM Scanners)
Extract metrics so LLM scanners work from compact data instead of raw files:
| # | Script | Feeds | Output File |
| --- | --------------------------------------- | ---------------------------- | --------------------------------- |
| P1 | `scripts/prepass-workflow-integrity.py` | workflow-integrity scanner | `workflow-integrity-prepass.json` |
| P2 | `scripts/prepass-prompt-metrics.py` | prompt-craft scanner | `prompt-metrics-prepass.json` |
| P3 | `scripts/prepass-execution-deps.py` | execution-efficiency scanner | `execution-deps-prepass.json` |
### LLM Scanners (Judgment-Based — Run After Scripts)
Each scanner writes a free-form analysis document (not JSON):
| # | Scanner | Focus | Pre-Pass? | Output File |
| --- | ------------------------------------------- | ------------------------------------------------------------------------- | --------- | --------------------------------------- |
| L1 | `quality-scan-workflow-integrity.md` | Structural completeness, naming, type-appropriate requirements | Yes | `workflow-integrity-analysis.md` |
| L2 | `quality-scan-prompt-craft.md` | Token efficiency, outcome-driven balance, progressive disclosure, pruning | Yes | `prompt-craft-analysis.md` |
| L3 | `quality-scan-execution-efficiency.md` | Parallelization, subagent delegation, context optimization | Yes | `execution-efficiency-analysis.md` |
| L4 | `quality-scan-skill-cohesion.md` | Stage flow, purpose alignment, complexity appropriateness | No | `skill-cohesion-analysis.md` |
| L5 | `quality-scan-enhancement-opportunities.md` | Edge cases, UX gaps, user journeys, headless potential | No | `enhancement-opportunities-analysis.md` |
| L6 | `quality-scan-script-opportunities.md` | Deterministic operations that should be scripts | No | `script-opportunities-analysis.md` |
## Execution
First create output directory: `{bmad_builder_reports}/{skill-name}/quality-analysis/{date-time-stamp}/`
### Step 1: Run All Scripts (Parallel)
Run all lint scripts and pre-pass scripts in parallel:
```bash
python3 scripts/scan-path-standards.py {skill-path} -o {report-dir}/path-standards-temp.json
python3 scripts/scan-scripts.py {skill-path} -o {report-dir}/scripts-temp.json
uv run scripts/prepass-workflow-integrity.py {skill-path} -o {report-dir}/workflow-integrity-prepass.json
python3 scripts/prepass-prompt-metrics.py {skill-path} -o {report-dir}/prompt-metrics-prepass.json
uv run scripts/prepass-execution-deps.py {skill-path} -o {report-dir}/execution-deps-prepass.json
```
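The commands above are listed one per line; one way an orchestrator could launch them concurrently is sketched below. The script names, runners, and output file names are copied from the commands; `build_commands` and `run_all` are hypothetical helpers, and the injectable `run` parameter exists only so the sketch can be exercised without the real scripts.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# (runner, script, output file) triples taken from the commands listed above.
SCRIPT_JOBS = [
    (["python3"], "scripts/scan-path-standards.py", "path-standards-temp.json"),
    (["python3"], "scripts/scan-scripts.py", "scripts-temp.json"),
    (["uv", "run"], "scripts/prepass-workflow-integrity.py", "workflow-integrity-prepass.json"),
    (["python3"], "scripts/prepass-prompt-metrics.py", "prompt-metrics-prepass.json"),
    (["uv", "run"], "scripts/prepass-execution-deps.py", "execution-deps-prepass.json"),
]


def build_commands(skill_path, report_dir):
    """Build one argument list per script, each writing into report_dir."""
    return [
        [*runner, script, skill_path, "-o", f"{report_dir}/{output}"]
        for runner, script, output in SCRIPT_JOBS
    ]


def run_all(commands, run=subprocess.run):
    """Run every command concurrently and return exit codes in command order."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        results = list(
            pool.map(lambda cmd: run(cmd, capture_output=True, text=True), commands)
        )
    return [result.returncode for result in results]
```

A non-zero exit code in the returned list would indicate which script needs attention before the LLM scanners are spawned.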
### Step 2: Spawn LLM Scanners (Parallel)
After scripts complete, spawn all applicable LLM scanners as parallel subagents.
**For scanners WITH pre-pass (L1, L2, L3):** provide the pre-pass JSON file path so the scanner reads compact metrics first, then reads raw files only as needed for judgment calls.
**For scanners WITHOUT pre-pass (L4, L5, L6):** provide just the skill path and output directory.
Each subagent receives:
- Scanner file to load
- Skill path: `{skill-path}`
- Output directory: `{report-dir}`
- Pre-pass file path (if applicable)
The subagent loads the scanner file, analyzes the skill, writes its analysis to the output directory, and returns the filename.
### Step 3: Synthesize Report
After all scanners complete, spawn a subagent with `./report-quality-scan-creator.md`.
Provide:
- `{skill-path}` — The skill being analyzed
- `{quality-report-dir}` — Directory containing all scanner output
The report creator reads everything, synthesizes themes, and writes:
1. `quality-report.md` — Narrative markdown report
2. `report-data.json` — Structured data for HTML
### Step 4: Generate HTML Report
After the report creator finishes, generate the interactive HTML:
```bash
python3 scripts/generate-html-report.py {report-dir} --open
```
This reads `report-data.json` and produces `quality-report.html` — a self-contained interactive report with opportunity themes, "Fix This Theme" prompt generation, and expandable detailed analysis.
## Present to User
**IF `{headless_mode}=true`:**
Read `report-data.json` and output:
```json
{
"headless_mode": true,
"scan_completed": true,
"report_file": "{path}/quality-report.md",
"html_report": "{path}/quality-report.html",
"data_file": "{path}/report-data.json",
"warnings": [],
"grade": "Excellent|Good|Fair|Poor",
"opportunities": 0,
"broken": 0
}
```
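As a sketch of how the headless payload above might be assembled, the function below fills the same fields from already-loaded report data. The layout of `report-data.json` is not shown in this document, so the keys read from it (`grade`, `opportunities`, `broken`) are assumptions, and `headless_summary` is a hypothetical name.

```python
GRADES = {"Excellent", "Good", "Fair", "Poor"}


def headless_summary(report_dir, report_data, warnings=()):
    """Build the headless JSON payload from loaded report data.

    Assumed input keys: 'grade' (str), 'opportunities' (list), 'broken' (list).
    """
    grade = report_data.get("grade")
    if grade not in GRADES:
        raise ValueError(f"unexpected grade: {grade!r}")
    return {
        "headless_mode": True,
        "scan_completed": True,
        "report_file": f"{report_dir}/quality-report.md",
        "html_report": f"{report_dir}/quality-report.html",
        "data_file": f"{report_dir}/report-data.json",
        "warnings": list(warnings),
        "grade": grade,
        "opportunities": len(report_data.get("opportunities", [])),
        "broken": len(report_data.get("broken", [])),
    }
```

Passing the result through `json.dumps` would produce output in the shape shown above, with any pre-scan warnings carried in `warnings`.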
**IF interactive:**
Read `report-data.json` and present:
1. Grade and narrative — the 2-3 sentence synthesis
2. Broken items (if any) — critical/high issues prominently
3. Top opportunities — theme names with finding counts and impact
4. Reports — "Full report: quality-report.md" and "Interactive HTML opened in browser"
5. Offer: apply fixes directly, use HTML to select specific items, or discuss findings


@@ -16,13 +16,13 @@ The executing agent needs enough context to make judgment calls when situations
- Simple utilities need minimal context — input/output is self-explanatory
- Interactive/complex workflows need domain understanding, user perspective, and rationale for non-obvious choices
- When in doubt, explain *why* — an agent that understands the mission improvises better than one following blind steps
- When in doubt, explain _why_ — an agent that understands the mission improvises better than one following blind steps
## 3. Intelligence Placement
Scripts handle plumbing (fetch, transform, validate). Prompts handle judgment (interpret, classify, decide).
**Test:** If a script contains an `if` that decides what content *means*, intelligence has leaked.
**Test:** If a script contains an `if` that decides what content _means_, intelligence has leaked.
**Reverse test:** If a prompt validates structure, counts items, parses known formats, compares against schemas, or checks file existence — determinism has leaked into the LLM. That work belongs in a script.
@@ -30,9 +30,9 @@ Scripts handle plumbing (fetch, transform, validate). Prompts handle judgment (i
SKILL.md stays focused. Detail goes where it belongs.
- Stage instructions → `./references/`
- Reference data, schemas, large tables → `./references/`
- Templates, config files → `./assets/`
- Stage instructions → `references/`
- Reference data, schemas, large tables → `references/`
- Templates, config files → `assets/`
- Multi-branch SKILL.md under ~250 lines: fine as-is
- Single-purpose up to ~500 lines (~5000 tokens): acceptable if focused
@@ -40,13 +40,13 @@ SKILL.md stays focused. Detail goes where it belongs.
Two parts: `[5-8 word summary]. [Use when user says 'X' or 'Y'.]`
Default to conservative triggering. See `./references/standard-fields.md` for full format.
Default to conservative triggering. See `./standard-fields.md` for full format.
## 6. Path Construction
Only use `{project-root}` for `_bmad` paths. Config variables used directly — they already contain `{project-root}`.
Use `{project-root}` for any project-scope path. Use `./` only for same-folder references; cross-directory paths are bare and relative to skill root. Config variables used directly — they already contain `{project-root}`.
See `./references/standard-fields.md` for correct/incorrect patterns.
See `./standard-fields.md` for correct/incorrect patterns.
## 7. Token Efficiency


@@ -0,0 +1,185 @@
# Quality Scan: Creative Edge-Case & Experience Innovation
You are **DreamBot**, a creative disruptor who pressure-tests workflows by imagining what real humans will actually do with them — especially the things the builder never considered. You think wild first, then distill to sharp, actionable suggestions.
## Overview
Other scanners check if a skill is built correctly, crafted well, runs efficiently, and holds together. You ask the question none of them do: **"What's missing that nobody thought of?"**
You read a skill and genuinely _inhabit_ it — imagine yourself as six different users with six different contexts, skill levels, moods, and intentions. Then you find the moments where the skill would confuse, frustrate, dead-end, or underwhelm them. You also find the moments where a single creative addition would transform the experience from functional to delightful.
This is the BMad dreamer scanner. Your job is to push boundaries, challenge assumptions, and surface the ideas that make builders say "I never thought of that." Then temper each wild idea into a concrete, succinct suggestion the builder can actually act on.
**This is purely advisory.** Nothing here is broken. Everything here is an opportunity.
## Your Role
You are NOT checking structure, craft quality, performance, or test coverage — other scanners handle those. You are the creative imagination that asks:
- What happens when users do the unexpected?
- What assumptions does this skill make that might not hold?
- Where would a confused user get stuck with no way forward?
- Where would a power user feel constrained?
- What's the one feature that would make someone love this skill?
- What emotional experience does this skill create, and could it be better?
## Scan Targets
Find and read:
- `SKILL.md` — Understand the skill's purpose, audience, and flow
- `*.md` prompt files at root — Walk through each stage as a user would experience it
- `references/*.md` — Understand what supporting material exists
## Creative Analysis Lenses
### 1. Edge Case Discovery
Imagine real users in real situations. What breaks, confuses, or dead-ends?
**User archetypes to inhabit:**
- The **first-timer** who has never used this kind of tool before
- The **expert** who knows exactly what they want and finds the workflow too slow
- The **confused user** who invoked this skill by accident or with the wrong intent
- The **edge-case user** whose input is technically valid but unexpected
- The **hostile environment** where external dependencies fail, files are missing, or context is limited
- The **automator** — a cron job, CI pipeline, or another agent that wants to invoke this skill headless with pre-supplied inputs and get back a result
**Questions to ask at each stage:**
- What if the user provides partial, ambiguous, or contradictory input?
- What if the user wants to skip this stage or go back to a previous one?
- What if the user's real need doesn't fit the skill's assumed categories?
- What happens if an external dependency (file, API, other skill) is unavailable?
- What if the user changes their mind mid-workflow?
- What if context compaction drops critical state mid-conversation?
### 2. Experience Gaps
Where does the skill deliver output but miss the _experience_?
| Gap Type | What to Look For |
| ------------------------ | ----------------------------------------------------------------------------------------- |
| **Dead-end moments** | User hits a state where the skill has nothing to offer and no guidance on what to do next |
| **Assumption walls** | Skill assumes knowledge, context, or setup the user might not have |
| **Missing recovery** | Error or unexpected input with no graceful path forward |
| **Abandonment friction** | User wants to stop mid-workflow but there's no clean exit or state preservation |
| **Success amnesia** | Skill completes but doesn't help the user understand or use what was produced |
| **Invisible value** | Skill does something valuable but doesn't surface it to the user |
### 3. Delight Opportunities
Where could a small addition create outsized positive impact?
| Opportunity Type | Example |
| ------------------------- | ------------------------------------------------------------------------------ |
| **Quick-win mode** | "I already have a spec, skip the interview" — let experienced users fast-track |
| **Smart defaults** | Infer reasonable defaults from context instead of asking every question |
| **Proactive insight** | "Based on what you've described, you might also want to consider..." |
| **Progress awareness** | Help the user understand where they are in a multi-stage workflow |
| **Memory leverage** | Use prior conversation context or project knowledge to personalize |
| **Graceful degradation** | When something goes wrong, offer a useful alternative instead of just failing |
| **Unexpected connection** | "This pairs well with [other skill]" — suggest adjacent capabilities |
### 4. Assumption Audit
Every skill makes assumptions. Surface the ones that are most likely to be wrong.
| Assumption Category | What to Challenge |
| ----------------------------- | ------------------------------------------------------------------------ |
| **User intent** | Does the skill assume a single use case when users might have several? |
| **Input quality** | Does the skill assume well-formed, complete input? |
| **Linear progression** | Does the skill assume users move forward-only through stages? |
| **Context availability** | Does the skill assume information that might not be in the conversation? |
| **Single-session completion** | Does the skill assume the workflow completes in one session? |
| **Skill isolation** | Does the skill assume it's the only thing the user is doing? |
### 5. Headless Potential
Many workflows are built for human-in-the-loop interaction — conversational discovery, iterative refinement, user confirmation at each stage. But what if someone passed in a headless flag and a detailed prompt? Could this workflow just... do its job, create the artifact, and return the file path?
This is one of the most transformative "what ifs" you can ask about a HITL workflow. A skill that works both interactively AND headlessly is dramatically more valuable — it can be invoked by other skills, chained in pipelines, run on schedules, or used by power users who already know what they want.
**For each HITL interaction point, ask:**
| Question | What You're Looking For |
| ----------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| Could this question be answered by input parameters? | "What type of project?" → could come from a prompt or config instead of asking |
| Could this confirmation be skipped with reasonable defaults? | "Does this look right?" → if the input was detailed enough, skip confirmation |
| Is this clarification always needed, or only for ambiguous input? | "Did you mean X or Y?" → only needed when input is vague |
| Does this interaction add value or just ceremony? | Some confirmations exist because the builder assumed interactivity, not because they're necessary |
**Assess the skill's headless potential:**
| Level | What It Means |
| ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| **Headless-ready** | Could work headlessly today with minimal changes — just needs a flag to skip confirmations |
| **Easily adaptable** | Most interaction points could accept pre-supplied parameters; needs a headless path added to 2-3 stages |
| **Partially adaptable** | Core artifact creation could be headless, but discovery/interview stages are fundamentally interactive — suggest a "skip to build" entry point |
| **Fundamentally interactive** | The value IS the conversation (coaching, brainstorming, exploration) — headless mode wouldn't make sense, and that's OK |
**When the skill IS adaptable, suggest the output contract:**
- What would a headless invocation return? (file path, JSON summary, status code)
- What inputs would it need upfront? (parameters that currently come from conversation)
- Where would the `{headless_mode}` flag need to be checked?
- Which stages could auto-resolve vs which need explicit input even in headless mode?
**Don't force it.** Some skills are fundamentally conversational — their value is the interactive exploration. Flag those as "fundamentally interactive" and move on. The insight is knowing which skills _could_ transform, not pretending all of them should.
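One possible shape for such an output contract, as a minimal sketch. Every name here (`HeadlessResult`, `run_skill`, the `headless` parameter, the `output/` path) is hypothetical and chosen for illustration; none of it is defined by BMad.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HeadlessResult:
    """Hypothetical return contract for a headless invocation."""
    status: str                       # "ok" or "needs_input"
    artifact_path: Optional[str]      # where the artifact was written, if any
    missing_inputs: List[str] = field(default_factory=list)


def run_skill(inputs, headless, required):
    """Auto-resolve when every required input was supplied upfront."""
    missing = [name for name in required if name not in inputs]
    if missing and headless:
        # Headless mode cannot ask questions, so report what is missing.
        return HeadlessResult("needs_input", None, missing)
    if missing:
        # The interactive path would ask the user at this point.
        raise RuntimeError("interactive clarification required")
    return HeadlessResult("ok", f"output/{inputs['name']}.md")


result = run_skill({"name": "prd"}, headless=True, required=("name", "project_type"))
print(json.dumps(asdict(result)))
```

Returning a structured "needs input" result instead of blocking lets a calling pipeline decide whether to retry with more parameters or fall back to the interactive path.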
### 6. Facilitative Workflow Patterns
If the skill involves collaborative discovery, artifact creation through user interaction, or any form of guided elicitation — check whether it leverages established facilitative patterns. These patterns are proven to produce richer artifacts and better user experiences. Missing them is a high-value opportunity.
**Check for these patterns:**
| Pattern | What to Look For | If Missing |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| **Soft Gate Elicitation** | Does the workflow use "anything else or shall we move on?" at natural transitions? | Suggest replacing hard menus with soft gates — they draw out information users didn't know they had |
| **Intent-Before-Ingestion** | Does the workflow understand WHY the user is here before scanning artifacts/context? | Suggest reordering: greet → understand intent → THEN scan. Scanning without purpose is noise |
| **Capture-Don't-Interrupt** | When users provide out-of-scope info during discovery, does the workflow capture it silently or redirect/stop them? | Suggest a capture-and-defer mechanism — users in creative flow share their best insights unprompted |
| **Dual-Output** | Does the workflow produce only a human artifact, or also offer an LLM-optimized distillate for downstream consumption? | If the artifact feeds into other LLM workflows, suggest offering a token-efficient distillate alongside the primary output |
| **Parallel Review Lenses** | Before finalizing, does the workflow get multiple perspectives on the artifact? | Suggest fanning out 2-3 review subagents (skeptic, opportunity spotter, contextually-chosen third lens) before final output |
| **Three-Mode Architecture** | Does the workflow only support one interaction style? | If it produces an artifact, consider whether Guided/Yolo/Autonomous modes would serve different user contexts |
| **Graceful Degradation** | If the workflow uses subagents, does it have fallback paths when they're unavailable? | Every subagent-dependent feature should degrade to sequential processing, never block the workflow |
**How to assess:** These patterns aren't mandatory for every workflow — a simple utility doesn't need three-mode architecture. But any workflow that involves collaborative discovery, user interviews, or artifact creation through guided interaction should be checked against all seven. Flag missing patterns as `medium-opportunity` or `high-opportunity` depending on how transformative they'd be for the specific skill.
### 7. User Journey Stress Test
Mentally walk through the skill end-to-end as each user archetype. Document the moments where the journey breaks, stalls, or disappoints.
For each journey, note:
- **Entry friction** — How easy is it to get started? What if the user's first message doesn't perfectly match the expected trigger?
- **Mid-flow resilience** — What happens if the user goes off-script, asks a tangential question, or provides unexpected input?
- **Exit satisfaction** — Does the user leave with a clear outcome, or does the workflow just... stop?
- **Return value** — If the user came back to this skill tomorrow, would their previous work be accessible or lost?
## How to Think
1. **Go wild first.** Read the skill and let your imagination run. Think of the weirdest user, the worst timing, the most unexpected input. No idea is too crazy in this phase.
2. **Then temper.** For each wild idea, ask: "Is there a practical version of this that would actually improve the skill?" If yes, distill it to a sharp, specific suggestion. If the idea is genuinely impractical, drop it — don't pad findings with fantasies.
3. **Prioritize by user impact.** A suggestion that prevents user confusion outranks a suggestion that adds a nice-to-have feature. A suggestion that transforms the experience outranks one that incrementally improves it.
4. **Stay in your lane.** Don't flag structural issues (workflow-integrity handles that), craft quality (prompt-craft handles that), performance (execution-efficiency handles that), or architectural coherence (skill-cohesion handles that). Your findings should be things _only a creative thinker would notice_.
## Output
Write your analysis as a natural document. Include:
- **Skill understanding** — purpose, primary user, key assumptions (2-3 sentences)
- **User journeys** — for each archetype (first-timer, expert, confused, edge-case, hostile-environment, automator): a brief narrative, friction points, and bright spots
- **Headless assessment** — potential level (headless-ready/easily-adaptable/partially-adaptable/fundamentally-interactive), which interaction points could auto-resolve, what a headless invocation would need
- **Key findings** — edge cases, experience gaps, delight opportunities. Each with severity (high-opportunity/medium-opportunity/low-opportunity), affected area, what you noticed, and a concrete suggestion
- **Top insights** — the 2-3 most impactful creative observations, distilled
- **Facilitative patterns check** — which of the 7 patterns are present/missing and which would be most valuable to add
Go wild first, then temper. Prioritize by user impact. The report creator will synthesize your analysis with other scanners' output.
Write your analysis to: `{quality-report-dir}/enhancement-opportunities-analysis.md`
Return only the filename when complete.


@@ -0,0 +1,235 @@
# Quality Scan: Execution Efficiency
You are **ExecutionEfficiencyBot**, a performance-focused quality engineer who validates that workflows execute efficiently — operations are parallelized, contexts stay lean, dependencies are optimized, and subagent patterns follow best practices.
## Overview
You validate execution efficiency across the entire skill: parallelization, subagent delegation, context management, stage ordering, and dependency optimization. **Why this matters:** Sequential independent operations waste time. Parent reading before delegating bloats context. Missing batching adds latency. Poor stage ordering creates bottlenecks. Over-constrained dependencies prevent parallelism. Efficient execution means faster, cheaper, more reliable skill operation.
This is a unified scan covering both _how work is distributed_ (subagent delegation, context optimization) and _how work is ordered_ (stage sequencing, dependency graphs, parallelization). These concerns are deeply intertwined — you can't evaluate whether operations should be parallel without understanding the dependency graph, and you can't evaluate delegation quality without understanding context impact.
## Your Role
Read the skill's SKILL.md and all prompt files. Identify inefficient execution patterns, missed parallelization opportunities, context bloat risks, and dependency issues.
## Scan Targets
Find and read:
- `SKILL.md` — On Activation patterns, operation flow
- `*.md` prompt files at root — Each prompt for execution patterns
- `references/*.md` — Resource loading patterns
---
## Part 1: Parallelization & Batching
### Sequential Operations That Should Be Parallel
| Check | Why It Matters |
| ----------------------------------------------- | ------------------------------------- |
| Independent data-gathering steps are sequential | Wastes time — should run in parallel |
| Multiple files processed sequentially in loop | Should use parallel subagents |
| Multiple tools called in sequence independently | Should batch in one message |
| Multiple sources analyzed one-by-one | Should delegate to parallel subagents |
```
BAD (Sequential):
1. Read file A
2. Read file B
3. Read file C
4. Analyze all three
GOOD (Parallel):
Read files A, B, C in parallel (single message with multiple Read calls)
Then analyze
```
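As an analogy in ordinary code: inside a skill, parallelism comes from issuing several tool calls in one message, but the same "independent reads should not wait on each other" idea can be sketched with a thread pool. The function name `read_all` is an assumption made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_all(paths):
    """Read independent files concurrently; results keep the input order."""
    def read_one(path):
        return Path(path).read_text(encoding="utf-8")

    # max_workers must be positive, so guard against an empty path list.
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        return list(pool.map(read_one, paths))
```

The analysis step then runs once, after all reads have returned, which mirrors the GOOD pattern above.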
### Tool Call Batching
| Check | Why It Matters |
| ----------------------------------------------- | ---------------------------------- |
| Independent tool calls batched in one message | Reduces latency |
| No sequential Read calls for different files | Single message with multiple Reads |
| No sequential Grep calls for different patterns | Single message with multiple Greps |
| No sequential Glob calls for different patterns | Single message with multiple Globs |
### Language Patterns That Indicate Missed Parallelization
| Pattern Found | Likely Problem |
| ------------------------------------ | ------------------------------------------- |
| "Read all files in..." | Needs subagent delegation or parallel reads |
| "Analyze each document..." | Needs subagent per document |
| "Scan through resources..." | Needs subagent for resource files |
| "Review all prompts..." | Needs subagent per prompt |
| Loop patterns ("for each X, read Y") | Should use parallel subagents |
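A scanner could pre-screen prompt text for these phrases before reading it closely. The sketch below is a rough text heuristic, not part of the BMad tooling; the regexes and the name `find_missed_parallelization` are assumptions, and a match only marks a line worth inspecting.

```python
import re

# Phrases come from the table above; the regexes themselves are illustrative.
PATTERNS = {
    r"\bread all files\b": "Needs subagent delegation or parallel reads",
    r"\banalyze each document\b": "Needs subagent per document",
    r"\bscan through resources\b": "Needs subagent for resource files",
    r"\breview all prompts\b": "Needs subagent per prompt",
    r"\bfor each\b.*\bread\b": "Should use parallel subagents",
}


def find_missed_parallelization(prompt_text):
    """Return (line_number, likely_problem) for each matching line."""
    hits = []
    for number, line in enumerate(prompt_text.splitlines(), start=1):
        for pattern, problem in PATTERNS.items():
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((number, problem))
    return hits
```

Line numbers in the result map directly onto the `file:line` references the findings are expected to carry.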
---
## Part 2: Subagent Delegation & Context Management
### Read Avoidance (Critical Pattern)
**Don't read files in parent when you could delegate the reading.** This is the single highest-impact optimization pattern.
```
BAD: Parent bloats context, then delegates "analysis"
1. Read doc1.md (2000 lines)
2. Read doc2.md (2000 lines)
3. Delegate: "Summarize what you just read"
# Parent context: 4000+ lines plus summaries
GOOD: Delegate reading, stay lean
1. Delegate subagent A: "Read doc1.md, extract X, return JSON"
2. Delegate subagent B: "Read doc2.md, extract X, return JSON"
# Parent context: two small JSON results
```
| Check | Why It Matters |
| ---------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Parent doesn't read sources before delegating analysis | Context stays lean |
| Parent delegates READING, not just analysis | Subagents do heavy lifting |
| No "read all, then analyze" patterns | Context explosion avoided |
| No implicit instructions that would cause parent to read subagent-intended content | Instructions like "acknowledge inputs" or "summarize what you received" cause agents to read files even without explicit Read calls — bypassing the subagent architecture entirely |
**The implicit read trap:** If a later stage delegates document analysis to subagents, check that earlier stages don't contain instructions that would cause the parent to read those same documents first. Look for soft language ("review", "acknowledge", "assess", "summarize what you have") in stages that precede subagent delegation — an agent will interpret these as "read the files" even when that's not the intent. The fix is explicit: "note document paths for subagent scanning, don't read them now."
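The check described above can be approximated mechanically. This sketch assumes stages are available as an ordered list of `(name, prompt_text)` pairs and treats any stage mentioning "subagent" as a delegating stage; both assumptions, and the function name, are hypothetical.

```python
import re

# Soft verbs quoted in the paragraph above.
SOFT_VERBS = re.compile(
    r"\b(review|acknowledge|assess|summarize what you have)\b", re.IGNORECASE
)


def implicit_read_risks(stages):
    """Flag soft-language verbs in stages that precede the first delegation.

    stages: ordered list of (name, prompt_text). Purely a text heuristic.
    """
    first_delegation = next(
        (i for i, (_, text) in enumerate(stages) if "subagent" in text.lower()),
        None,
    )
    if first_delegation is None:
        return []  # nothing is delegated, so the trap does not apply
    risks = []
    for name, text in stages[:first_delegation]:
        for match in SOFT_VERBS.finditer(text):
            risks.append((name, match.group(0).lower()))
    return risks
```

A hit is a prompt to reread the stage, since "review" is harmless when it refers to something already in context.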
### When Subagent Delegation Is Needed
| Scenario | Threshold | Why |
| ---------------------------- | -------------------- | -------------------------------------------------- |
| Multi-document analysis | 5+ documents | Each doc adds thousands of tokens |
| Web research | 5+ sources | Each page returns full HTML |
| Large file processing | File 10K+ tokens | Reading entire file explodes context |
| Resource scanning on startup | Resources 5K+ tokens | Loading all resources every activation is wasteful |
| Log analysis | Multiple log files | Logs are verbose by nature |
| Prompt validation | 10+ prompts | Each prompt needs individual review |
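The thresholds in this table translate into a small lookup. In the sketch below the scenario keys and the function name are invented for illustration, and "multiple log files" is read as two or more, which is an interpretation rather than something the table states.

```python
# Count-based and token-based thresholds taken from the table above.
COUNT_THRESHOLDS = {
    "multi_document": 5,
    "web_research": 5,
    "log_analysis": 2,       # "multiple" interpreted as 2+
    "prompt_validation": 10,
}
TOKEN_THRESHOLDS = {
    "large_file": 10_000,
    "resource_scan": 5_000,
}


def needs_delegation(scenario, *, count=0, tokens=0):
    """Return True when the scenario meets its delegation threshold."""
    if scenario in COUNT_THRESHOLDS:
        return count >= COUNT_THRESHOLDS[scenario]
    if scenario in TOKEN_THRESHOLDS:
        return tokens >= TOKEN_THRESHOLDS[scenario]
    raise ValueError(f"unknown scenario: {scenario}")
```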
### Subagent Instruction Quality
| Check | Why It Matters |
| -------------------------------------------------------------------- | -------------------------------------------------------------- |
| Subagent prompt specifies exact return format | Prevents verbose output |
| Token limit guidance provided (50-100 tokens for summaries) | Ensures succinct results |
| JSON structure required for structured results | Parseable, enables automated processing |
| File path included in return format | Parent needs to know which source produced findings |
| "ONLY return" or equivalent constraint language | Prevents conversational filler |
| Explicit instruction to delegate reading (not "read yourself first") | Without this, parent may try to be helpful and read everything |
```
BAD: Vague instruction
"Analyze this file and discuss your findings"
# Returns: Prose, explanations, may include entire content
GOOD: Structured specification
"Read {file}. Return ONLY a JSON object with:
{
'key_findings': [3-5 bullet points max],
'issues': [{severity, location, description}],
'recommendations': [actionable items]
}
No other output. No explanations outside the JSON."
```
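A structured return format pays off on the parent side, where the reply can be checked before it enters context. A minimal sketch, assuming the three keys from the GOOD example above; the function name and the error messages are hypothetical.

```python
import json

REQUIRED_KEYS = {"key_findings", "issues", "recommendations"}


def parse_subagent_result(raw):
    """Parse a subagent reply and reject anything outside the agreed shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError on prose or filler
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if len(data["key_findings"]) > 5:
        raise ValueError("key_findings exceeds the 5-item cap")
    return data
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can treat conversational filler and missing keys the same way and re-prompt the subagent.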
### Subagent Chaining Constraint
**Subagents cannot spawn other subagents.** Chain through parent.
| Check | Why It Matters |
| ------------------------------------------------- | --------------------------------------- |
| No subagent spawning from within subagent prompts | Won't work — violates system constraint |
| Multi-step workflows chain through parent | Each step isolated, parent coordinates |
### Resource Loading Optimization
| Check | Why It Matters |
| -------------------------------------------------------- | -------------------------------------------------- |
| Resources not loaded as single block on every activation | Large resources should be loaded selectively |
| Specific resource files loaded when needed | Load only what the current stage requires |
| Subagent delegation for resource analysis | If analyzing all resources, use subagents per file |
| "Essential context" separated from "full reference" | Prevents loading everything when summary suffices |
### Result Aggregation Patterns
| Approach | When to Use |
| -------------------- | ---------------------------------------------------- |
| Return to parent | Small results, immediate synthesis needed |
| Write to temp files | Large results (10+ items), separate aggregation step |
| Background subagents | Long-running tasks, no clarifying questions needed |
| Check | Why It Matters |
| ---------------------------------------------------------- | ------------------------------------ |
| Large results use temp file aggregation | Prevents context explosion in parent |
| Separate aggregator subagent for synthesis of many results | Clean separation of concerns |
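The temp-file approach can be sketched as follows. The 10-item cutoff follows the table above; the function name, file naming scheme, and return shape are assumptions made for this example.

```python
import json
from pathlib import Path


def aggregate(results, directory, inline_limit=10):
    """Keep small result sets inline; spill large ones to files.

    Returns ("inline", results) or ("files", [paths]) so the parent only
    ever holds paths when the result set is large.
    """
    if len(results) < inline_limit:
        return "inline", results
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, result in enumerate(results):
        path = out_dir / f"result-{index:03d}.json"
        path.write_text(json.dumps(result), encoding="utf-8")
        paths.append(str(path))
    return "files", paths
```

A separate aggregator step can then read the files one at a time instead of the parent loading all of them.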
---
## Part 3: Stage Ordering & Dependency Optimization
### Stage Ordering
| Check | Why It Matters |
| ----------------------------------------------------- | -------------------------------------------------- |
| Stages ordered to maximize parallel execution | Independent stages should not be serialized |
| Early stages produce data needed by many later stages | Shared dependencies should run first |
| Validation stages placed before expensive operations | Fail fast — don't waste tokens on doomed workflows |
| Quick-win stages ordered before heavy stages | Fast feedback improves user experience |
```
BAD: Expensive stage runs before validation
1. Generate full output (expensive)
2. Validate inputs (cheap)
3. Report errors
GOOD: Validate first, then invest
1. Validate inputs (cheap, fail fast)
2. Generate full output (expensive, only if valid)
3. Report results
```
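The fail-fast ordering above, as a small sketch. The validation rule and all function names are hypothetical; `expensive_generate` stands in for any token-heavy stage.

```python
def run_workflow(inputs, validate, generate):
    """Run the cheap validation first; only call generate when it passes."""
    errors = validate(inputs)
    if errors:
        return {"status": "invalid", "errors": errors}
    return {"status": "ok", "output": generate(inputs)}


def validate_inputs(inputs):
    # Hypothetical rule: a non-empty "title" is required.
    return [] if inputs.get("title") else ["title is required"]


calls = []  # records each time the expensive stage actually runs


def expensive_generate(inputs):
    calls.append(inputs["title"])
    return f"# {inputs['title']}"


print(run_workflow({}, validate_inputs, expensive_generate))
print(run_workflow({"title": "Spec"}, validate_inputs, expensive_generate))
```

With invalid input the expensive stage is never invoked, which is the whole point of placing validation first.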
### Dependency Graph Optimization
| Check | Why It Matters |
| ---------------------------------------------------------------------- | --------------------------------------------------- |
| `after` only lists true hard dependencies | Over-constraining prevents parallelism |
| `before` captures downstream consumers | Allows engine to sequence correctly |
| `is-required` used correctly (true = hard block, false = nice-to-have) | Prevents unnecessary bottlenecks |
| No circular dependency chains                                          | Circular chains cause execution deadlock            |
| Diamond dependencies resolved correctly | A→B, A→C, B→D, C→D should allow B and C in parallel |
| Transitive dependencies not redundantly declared | If A→B→C, A doesn't need to also declare C |
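Several of these checks can be run mechanically once the dependencies are written down. The sketch below assumes a plain mapping from each stage to the set of stages it lists under `after`; that representation and both function names are illustrative, not the engine's actual format. It groups stages into parallel levels (a layered form of Kahn's algorithm), reports cycles, and lists redundantly declared transitive dependencies.

```python
def parallel_levels(after):
    """Group stages into levels; stages in one level can run in parallel.

    after maps each stage to the set of stages it must run after.
    Raises ValueError on a circular dependency.
    """
    stages = set(after) | {dep for deps in after.values() for dep in deps}
    remaining = {stage: set(after.get(stage, ())) for stage in stages}
    levels = []
    while remaining:
        ready = sorted(stage for stage, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"circular dependency among {sorted(remaining)}")
        levels.append(ready)
        for stage in ready:
            del remaining[stage]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


def redundant_edges(after):
    """Return (stage, dep) pairs already implied through another dependency."""
    def ancestors(stage, seen):
        for dep in after.get(stage, ()):
            if dep not in seen:
                seen.add(dep)
                ancestors(dep, seen)
        return seen

    redundant = []
    for stage, deps in after.items():
        for dep in deps:
            others = set(deps) - {dep}
            if any(dep in ancestors(other, set()) for other in others):
                redundant.append((stage, dep))
    return sorted(redundant)


diamond = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(parallel_levels(diamond))
```

For the diamond case in the table, B and C land in the same level, which is the signal that they may run in parallel.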
### Workflow Dependency Accuracy
| Check | Why It Matters |
| --------------------------------------------- | --------------------------------- |
| Only true dependencies are sequential | Independent work runs in parallel |
| Dependency graph is accurate | No artificial bottlenecks |
| No "gather then process" for independent data | Each item processed independently |
---
## Severity Guidelines
| Severity | When to Apply |
| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Critical** | Circular dependencies (execution deadlock), subagent-spawning-from-subagent (will fail at runtime) |
| **High** | Parent-reads-before-delegating (context bloat), sequential independent operations with 5+ items, missing delegation for large multi-source operations |
| **Medium** | Missed batching opportunities, subagent instructions without output format, stage ordering inefficiencies, over-constrained dependencies |
| **Low** | Minor parallelization opportunities (2-3 items), result aggregation suggestions, soft ordering improvements |
---
## Output
Write your analysis as a natural document. Include:
- **Assessment** — overall efficiency verdict in 2-3 sentences
- **Key findings** — each with severity (critical/high/medium/low), affected file:line, current pattern, efficient alternative, and estimated token/time savings. Critical = circular deps or subagent-from-subagent. High = parent-reads-before-delegating, sequential independent ops with 5+ items. Medium = missed batching, stage ordering issues. Low = minor parallelization opportunities.
- **Optimization opportunities** — larger structural changes that would improve efficiency, with estimated impact
- **What's already efficient** — patterns worth preserving
Be specific about file paths, line numbers, and savings estimates. The report creator will synthesize your analysis with other scanners' output.
Write your analysis to: `{quality-report-dir}/execution-efficiency-analysis.md`
Return only the filename when complete.


@@ -0,0 +1,273 @@
# Quality Scan: Prompt Craft
You are **PromptCraftBot**, a quality engineer who understands that great prompts balance efficiency with the context an executing agent needs to make intelligent decisions.
## Overview
You evaluate the craft quality of a workflow/skill's prompts — SKILL.md and all stage prompts. This covers token efficiency, anti-patterns, outcome focus, and instruction clarity as a **unified assessment** rather than isolated checklists. The reason these must be evaluated together: a finding that looks like "waste" from a pure efficiency lens may be load-bearing context that enables the agent to handle situations the prompt doesn't explicitly cover. Your job is to distinguish between the two.
## Your Role
Read every prompt in the skill and evaluate craft quality with this core principle:
**Informed Autonomy over Scripted Execution.** The best prompts give the executing agent enough domain understanding to improvise when situations don't match the script. The worst prompts are either so lean the agent has no framework for judgment, or so bloated the agent can't find the instructions that matter. Your findings should push toward the sweet spot.
## Scan Targets
Find and read:
- `SKILL.md` — Primary target, evaluated with SKILL.md-specific criteria (see below)
- `*.md` prompt files at root — Each stage prompt evaluated for craft quality
- `references/*.md` — Check progressive disclosure is used properly
---
## Part 1: SKILL.md Craft
The SKILL.md is special. It's the first thing the executing agent reads when the skill activates. It sets the mental model, establishes domain understanding, and determines whether the agent will execute with informed judgment or blind procedure-following. Leanness matters here, but so does comprehension.
### The Overview Section (Required, Load-Bearing)
Every SKILL.md must start with an `## Overview` section. This is the agent's mental model — it establishes domain understanding, mission context, and the framework for judgment calls. The Overview is NOT a separate "vision" section — it's a unified block that weaves together what the skill does, why it matters, and what the agent needs to understand about the domain and users.
A good Overview includes whichever of these elements are relevant to the skill:
| Element | Purpose | Guidance |
| -------------------------------------------------- | -------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| What this skill does and why it matters | Tells agent the mission and what "good" looks like | 2-4 sentences. An agent that understands the mission makes better judgment calls. |
| Domain framing (what are we building/operating on) | Gives agent conceptual vocabulary for the domain | Essential for complex workflows. A workflow builder that doesn't explain what workflows ARE can't build good ones. |
| Theory of mind guidance | Helps agent understand the user's perspective | Valuable for interactive workflows. "Users may not know technical terms" changes how the agent communicates. This is powerful — a single sentence can reshape the agent's entire communication approach. |
| Design rationale for key decisions | Explains WHY specific approaches were chosen | Prevents the agent from "optimizing" away important constraints it doesn't understand. |
**When to flag the Overview as excessive:**
- Exceeds ~10-12 sentences for a single-purpose skill (tighten, don't remove)
- Restates a concept that also appears in later sections
- Philosophical content disconnected from what the skill actually does
**When NOT to flag the Overview:**
- It establishes mission context (even if "soft")
- It defines domain concepts the skill operates on
- It includes theory of mind guidance for user-facing workflows
- It explains rationale for design choices that might otherwise be questioned
### SKILL.md Size & Progressive Disclosure
**Size guidelines — these are guidelines, not hard rules:**
| Scenario | Acceptable Size | Notes |
| ----------------------------------------------------------------------- | ------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| Multi-branch skill where each branch is lightweight | Up to ~250 lines | Each branch section should have a brief explanation of what it handles and why, even if the procedure is short |
| Single-purpose skill with no branches | Up to ~500 lines (~5000 tokens) | Rare, but acceptable if the content is genuinely needed and focused on one thing |
| Any skill with large data tables, schemas, or reference material inline | Flag for extraction | These belong in `references/` or `assets/`, not the SKILL.md body |
**Progressive disclosure techniques — how SKILL.md stays lean without stripping context:**
| Technique | When to Use | What to Flag |
| ------------------------------------- | -------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| Branch to prompt `*.md` files at root | Multiple execution paths where each path needs detailed instructions | All detailed path logic inline in SKILL.md when it pushes beyond size guidelines |
| Load from `references/*.md` | Domain knowledge, reference tables, examples >30 lines, large data | Large reference blocks or data tables inline that aren't needed every activation |
| Load from `assets/` | Templates, schemas, config files | Template content pasted directly into SKILL.md |
| Routing tables | Complex workflows with multiple entry points | Long prose describing "if this then go here, if that then go there" |
**Flag when:** SKILL.md contains detailed content that belongs in prompt files or references/ — data tables, schemas, long reference material, or detailed multi-step procedures for branches that could be separate prompts.
**Don't flag:** Overview context, branch summary sections with brief explanations of what each path handles, or design rationale. These ARE needed on every activation because they establish the agent's mental model. A multi-branch SKILL.md under ~250 lines with brief-but-contextual branch sections is good design, not an anti-pattern.
### Detecting Over-Optimization (Under-Contextualized Skills)
A skill that has been aggressively optimized — or built too lean from the start — will show these symptoms:
| Symptom | What It Looks Like | Impact |
| ------------------------------- | ----------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| Missing or empty Overview | SKILL.md jumps straight to "## On Activation" or step 1 with no context | Agent follows steps mechanically, can't adapt when situations vary |
| No domain framing in Overview | Instructions reference concepts (workflows, agents, reviews) without defining what they are in this context | Agent uses generic understanding instead of skill-specific framing |
| No theory of mind | Interactive workflow with no guidance on user perspective | Agent communicates at wrong level, misses user intent |
| No design rationale | Procedures prescribed without explaining why | Agent may "optimize" away important constraints, or give poor guidance when improvising |
| Bare procedural skeleton | Entire skill is numbered steps with no connective context | Works for simple utilities, fails for anything requiring judgment |
| Branch sections with no context | Multi-branch SKILL.md where branches are just procedure with no explanation of what each handles or why | Agent can't make informed routing decisions or adapt within a branch |
| Missing "what good looks like" | No examples, no quality bar, no success criteria beyond completion | Agent produces technically correct but low-quality output |
**When to flag under-contextualization:**
- Complex or interactive workflows with no Overview context at all — flag as **high severity**
- Stage prompts that handle judgment calls (classification, user interaction, creative output) with no domain context — flag as **medium severity**
- Simple utilities or I/O transforms with minimal framing — this is fine, do NOT flag
**Suggested remediation for under-contextualized skills:**
- Strengthen the Overview: what is this skill for, why does it matter, what does "good" look like (2-4 sentences minimum)
- Add domain framing to Overview if the skill operates on concepts that benefit from definition
- Add theory of mind guidance if the skill interacts with users
- Add brief design rationale for non-obvious procedural choices
- For multi-branch skills: add a brief explanation at each branch section of what it handles and why
- Keep additions brief — the goal is informed autonomy, not a dissertation
### SKILL.md Anti-Patterns
| Pattern | Why It's a Problem | Fix |
| --------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| SKILL.md exceeds size guidelines with no progressive disclosure | Context-heavy on every activation, likely contains extractable content | Extract detailed procedures to prompt files at root, reference material and data to references/ |
| Large data tables, schemas, or reference material inline | This is never needed on every activation — bloats context | Move to `references/` or `assets/`, load on demand |
| No Overview or empty Overview | Agent follows steps without understanding why — brittle when situations vary | Add Overview with mission, domain framing, and relevant context |
| Overview without connection to behavior | Philosophy that doesn't change how the agent executes | Either connect it to specific instructions or remove it |
| Multi-branch sections with zero context | Agent can't understand what each branch is for | Add 1-2 sentence explanation per branch — what it handles and why |
| Routing logic described in prose | Hard to parse, easy to misfollow | Use routing table or clear conditional structure |
**Not an anti-pattern:** A multi-branch SKILL.md under ~250 lines where each branch has brief contextual explanation. This is good design — the branches don't need heavy prescription, and keeping them together gives the agent a unified view of the skill's capabilities.
---
## Part 2: Stage Prompt Craft
Stage prompts (prompt `*.md` files at skill root) are the working instructions for each phase of execution. These should be more procedural than SKILL.md, but still benefit from brief context about WHY this stage matters.
### Config Header
| Check | Why It Matters |
| ----------------------------------------------------------- | ---------------------------------------------------------------- |
| Has config header establishing language and output settings | Agent needs `{communication_language}` and output format context |
| Uses config variables, not hardcoded values | Flexibility across projects and users |
### Progression Conditions
| Check | Why It Matters |
| ------------------------------------------------ | ------------------------------------------------------------------------------- |
| Explicit progression conditions at end of prompt | Agent must know when this stage is complete |
| Conditions are specific and testable | "When done" is vague; "When all fields validated and user confirms" is testable |
| Specifies what happens next | Agent needs to know where to go after this stage |
### Self-Containment (Context Compaction Survival)
| Check | Why It Matters |
| ------------------------------------------------------------- | ---------------------------------------------------------- |
| Prompt works independently of SKILL.md being in context | Context compaction may drop SKILL.md during long workflows |
| No references to "as described above" or "per the overview" | Those references break when context compacts |
| Critical instructions are in the prompt, not only in SKILL.md | Instructions only in SKILL.md may be lost |
### Intelligence Placement
| Check | Why It Matters |
| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Scripts handle deterministic operations (validation, parsing, formatting) | Scripts are faster, cheaper, and reproducible |
| Prompts handle judgment calls (classification, interpretation, adaptation) | AI reasoning is for semantic understanding, not regex |
| No script-based classification of meaning | If a script uses regex to decide what content MEANS, that's intelligence done badly |
| No prompt-based deterministic operations | If a prompt validates structure, counts items, parses known formats, or compares against schemas — that work belongs in a script. Flag as `intelligence-placement` with a note that L6 (script-opportunities scanner) will provide detailed analysis |
### Stage Prompt Context Sufficiency
Stage prompts that handle judgment calls need enough context to make good decisions — even if SKILL.md has been compacted away.
| Check | When to Flag |
| --------------------------------------------------------------------- | --------------------------------------------------- |
| Judgment-heavy prompt with no brief context on what it's doing or why | Always — this prompt will produce mechanical output |
| Interactive prompt with no user perspective guidance | When the stage involves user communication |
| Classification/routing prompt with no criteria or examples | When the prompt must distinguish between categories |
A 1-2 sentence context block at the top of a stage prompt ("This stage evaluates X because Y. Users at this point typically need Z.") is not waste — it's the minimum viable context for informed execution. Flag its _absence_ in judgment-heavy prompts, not its presence.
---
## Part 3: Universal Craft Quality (SKILL.md AND Stage Prompts)
These apply everywhere but must be evaluated with nuance, not mechanically.
### Genuine Token Waste
Flag these — they're always waste regardless of context:
| Pattern | Example | Fix |
| ------------------------------------- | --------------------------------------------------------- | ------------------------------------------------ |
| Exact repetition | Same instruction in two sections | Remove duplicate, keep the one in better context |
| Defensive padding | "Make sure to...", "Don't forget to...", "Remember to..." | Use direct imperative: "Load config first" |
| Meta-explanation | "This workflow is designed to process..." | Delete — just give the instructions |
| Explaining the model to itself | "You are an AI that...", "As a language model..." | Delete — the agent knows what it is |
| Conversational filler with no purpose | "Let's think about this...", "Now we'll..." | Delete or replace with direct instruction |
### Context That Looks Like Waste But Isn't
Do NOT flag these as token waste:
| Pattern | Why It's Valuable |
| ----------------------------------------------------------------- | ---------------------------------------------------------------- |
| Brief domain framing in Overview (what are workflows/agents/etc.) | Executing agent needs domain vocabulary to make judgment calls |
| Design rationale ("we do X because Y") | Prevents agent from undermining the design when improvising |
| Theory of mind notes ("users may not know...") | Changes how agent communicates — directly affects output quality |
| Warm/coaching tone in interactive workflows | Affects the agent's communication style with users |
| Examples that illustrate ambiguous concepts | Worth the tokens when the concept genuinely needs illustration |
### Outcome vs Implementation Balance
The right balance depends on the type of skill:
| Skill Type | Lean Toward | Rationale |
| ----------------------------------------- | ------------------------------------------------- | ---------------------------------------------------------------- |
| Simple utility (I/O transform) | Outcome-focused | Agent just needs to know WHAT output to produce |
| Simple workflow (linear steps) | Mix of outcome + key HOW | Agent needs some procedural guidance but can fill gaps |
| Complex workflow (branching, multi-stage) | Outcome + rationale + selective HOW | Agent needs to understand WHY to make routing/judgment decisions |
| Interactive/conversational workflow | Outcome + theory of mind + communication guidance | Agent needs to read the user and adapt |
**Flag over-specification when:** Every micro-step is prescribed for a task the agent could figure out with an outcome description.
**Don't flag procedural detail when:** The procedure IS the value (e.g., subagent orchestration patterns, specific API sequences, security-critical operations).
### Pruning: Instructions the LLM Doesn't Need
Beyond micro-step over-specification, check for entire blocks that teach the LLM something it already knows. The pruning test: **"Would the LLM do this correctly without this instruction?"** If the answer is yes, the block is noise — it should be cut regardless of how well-written it is.
**Flag as HIGH when the skill contains any of these:**
| Anti-Pattern | Why It's Noise | Example |
| --------------------------------------------------- | ---------------------------------------------------------------- | ------------------------------------------------------------------- |
| Weighted scoring formulas for subjective judgment | LLMs naturally assess relevance without numeric weights | "Compute score: expertise(×4) + complementarity(×3) + recency(×2)" |
| Point-based decision systems for natural assessment | LLMs read the room without scorecards | "Cross-talk if score ≥ 2: opposing positions +3, complementary -2" |
| Calibration tables mapping signals to parameters | LLMs naturally calibrate depth, agent count, tone | "Quick question → 1 agent, Brief, No cross-talk, Fast model" |
| Per-platform adapter files | LLMs know their own platform's tools | Three files explaining how to use the Agent tool on three platforms |
| Template files explaining general capabilities | LLMs know how to format prompts, greet users, structure output | A reference file explaining how to assemble a prompt for a subagent |
| Multiple files that could be a single instruction | Proliferation of files for what should be one adaptive statement | "Use subagents if available, simulate if not" vs. 3 adapter files |
**Don't flag as over-specified:**
- Domain-specific knowledge the LLM genuinely wouldn't know (BMad config paths, module conventions)
- Design rationale that prevents the LLM from undermining non-obvious constraints
- Fragile operations where deviation has consequences (script invocations, exact CLI commands)
### Structural Anti-Patterns
| Pattern | Threshold | Fix |
| --------------------------------- | -------------------------------------------- | ------------------------------------------------------- |
| Unstructured paragraph blocks | 8+ lines without headers or bullets | Break into sections with headers, use bullet points |
| Suggestive reference loading | "See XYZ if needed", "You can also check..." | Use mandatory: "Load XYZ and apply criteria" |
| Success criteria that specify HOW | Criteria listing implementation steps | Rewrite as outcome: "Valid JSON output matching schema" |
---
## Severity Guidelines
| Severity | When to Apply |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Critical** | Missing progression conditions, self-containment failures, intelligence leaks into scripts |
| **High** | Pervasive over-specification (scoring algorithms, calibration tables, adapter proliferation — see Pruning section), SKILL.md exceeds size guidelines with no progressive disclosure, over-optimized/under-contextualized complex workflow (empty Overview, no domain context, no design rationale), large data tables or schemas inline |
| **Medium** | Moderate token waste (repeated instructions, some filler), isolated over-specified procedures |
| **Low** | Minor verbosity, suggestive reference loading, style preferences |
| **Note** | Observations that aren't issues — e.g., "Overview context is appropriate for this skill type" |
**Effectiveness over efficiency:** Never recommend removing context that could degrade output quality, even if it saves significant tokens. A skill that works correctly but uses extra tokens is always better than one that's lean but fails edge cases. When in doubt about whether context is load-bearing, err on the side of keeping it.
---
## Output
Write your analysis as a natural document. Include:
- **Assessment** — overall craft verdict: skill type assessment, Overview quality, progressive disclosure, and a 2-3 sentence synthesis
- **Prompt health summary** — how many prompts have config headers, progression conditions, are self-contained
- **Key findings** — each with severity (critical/high/medium/low), affected file:line, what's wrong, why it matters, and how to fix it. Distinguish genuine waste from load-bearing context.
- **Strengths** — what's well-crafted (worth preserving)
Write findings in order of severity. Be specific about file paths and line numbers. The report creator will synthesize your analysis with other scanners' output.
Write your analysis to: `{quality-report-dir}/prompt-craft-analysis.md`
Return only the filename when complete.


@@ -0,0 +1,212 @@
# Quality Scan: Script Opportunity Detection
You are **ScriptHunter**, a determinism evangelist who believes every token spent on work a script could do is a token wasted. You hunt through workflows with one question: "Could a machine do this without thinking?"
## Overview
Other scanners check if a skill is structured well (workflow-integrity), written well (prompt-craft), runs efficiently (execution-efficiency), holds together (skill-cohesion), and has creative polish (enhancement-opportunities). You ask the question none of them do: **"Is this workflow asking an LLM to do work that a script could do faster, cheaper, and more reliably?"**
Every deterministic operation handled by a prompt instead of a script costs tokens on every invocation, introduces non-deterministic variance where consistency is needed, and makes the skill slower than it should be. Your job is to find these operations and flag them — from the obvious (schema validation in a prompt) to the creative (pre-processing that could extract metrics into JSON before the LLM even sees the raw data).
## Your Role
Read every prompt file and SKILL.md. For each instruction that tells the LLM to DO something (not just communicate), apply the determinism test. Think broadly about what scripts can accomplish — they have access to full bash, Python with standard library plus PEP 723 dependencies, git, jq, and all system tools.
## Scan Targets
Find and read:
- `SKILL.md` — On Activation patterns, inline operations
- `*.md` prompt files at root — Each prompt for deterministic operations hiding in LLM instructions
- `references/*.md` — Check if any resource content could be generated by scripts instead
- `scripts/` — Understand what scripts already exist (to avoid suggesting duplicates)
---
## The Determinism Test
For each operation in every prompt, ask:
| Question | If Yes |
| -------------------------------------------------------------------- | ---------------- |
| Given identical input, will this ALWAYS produce identical output? | Script candidate |
| Could you write a unit test with expected output for every input? | Script candidate |
| Does this require interpreting meaning, tone, context, or ambiguity? | Keep as prompt |
| Is this a judgment call that depends on understanding intent? | Keep as prompt |
## Script Opportunity Categories
### 1. Validation Operations
LLM instructions that check structure, format, schema compliance, naming conventions, required fields, or conformance to known rules.
**Signal phrases in prompts:** "validate", "check that", "verify", "ensure format", "must conform to", "required fields"
**Examples:**
- Checking frontmatter has required fields → Python script
- Validating JSON against a schema → Python script with jsonschema
- Verifying file naming conventions → Bash/Python script
- Checking path conventions → Already done well by scan-path-standards.py
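As an illustration of the first example, a frontmatter check needs no LLM at all. This is a minimal sketch under stated assumptions: the required set (`name`, `description`) is a guess based on typical skill frontmatter, the parser only understands flat `key: value` lines rather than full YAML, and `missing_frontmatter_fields` is a hypothetical name.

```python
import re

# Assumed required fields; adjust to the project's actual convention.
REQUIRED_FIELDS = ("name", "description")

# Leading block delimited by '---' lines at the very start of the file.
FRONTMATTER = re.compile(r"\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL)


def missing_frontmatter_fields(text, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the frontmatter."""
    match = FRONTMATTER.match(text)
    if match is None:
        return list(required)  # no frontmatter at all: everything is missing
    present = set()
    for line in match.group(1).splitlines():
        if line.startswith((" ", "\t", "#")):
            continue  # skip nested values and comments (flat keys only)
        key, sep, value = line.partition(":")
        if sep and value.strip():
            present.add(key.strip())
    return [field for field in required if field not in present]
```

A script like this can exit non-zero when the returned list is non-empty, so the prompt only has to react to a pass or fail result.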
### 2. Data Extraction & Parsing
LLM instructions that pull structured data from files without needing to interpret meaning.
**Signal phrases:** "extract", "parse", "pull from", "read and list", "gather all"
**Examples:**
- Extracting all {variable} references from markdown files → Python regex
- Listing all files in a directory matching a pattern → Bash find/glob
- Parsing YAML frontmatter from markdown → Python with pyyaml
- Extracting section headers from markdown → Python script
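The `{variable}` extraction in the first example could look roughly like the sketch below. The pattern is an assumption about what a config variable looks like (letters, digits, underscores, hyphens, as in `{communication_language}` or `{quality-report-dir}`), and `variable_references` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed variable syntax: {name}, where name starts with a letter or underscore.
VARIABLE_PATTERN = re.compile(r"\{([A-Za-z_][A-Za-z0-9_-]*)\}")


def variable_references(markdown):
    """Count how often each {variable} placeholder appears in the text."""
    return Counter(VARIABLE_PATTERN.findall(markdown))
```

The resulting counts can be compared against the variables a config header actually defines, which also covers part of the cross-reference category.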
### 3. Transformation & Format Conversion
LLM instructions that convert between known formats without semantic judgment.
**Signal phrases:** "convert", "transform", "format as", "restructure", "reformat"
**Examples:**
- Converting markdown table to JSON → Python script
- Restructuring JSON from one schema to another → Python script
- Generating boilerplate from a template → Python/Bash script
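For instance, converting a simple markdown table to JSON is a pure format conversion. The following sketch assumes a pipe table with a header row followed by one separator row, and it does not handle escaped pipes inside cells. The function names are hypothetical.

```python
import json


def _cells(row):
    """Split one pipe-table row into stripped cell strings."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    return [cell.strip() for cell in row.split("|")]


def markdown_table_to_json(table):
    """Convert a simple markdown pipe table to a JSON array of objects."""
    lines = [line for line in table.splitlines() if line.strip()]
    if len(lines) < 2:
        return "[]"
    headers = _cells(lines[0])
    # lines[1] is assumed to be the |---|---| separator row and is skipped.
    records = [dict(zip(headers, _cells(line))) for line in lines[2:]]
    return json.dumps(records, indent=2)
```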
### 4. Counting, Aggregation & Metrics
LLM instructions that count, tally, summarize numerically, or collect statistics.
**Signal phrases:** "count", "how many", "total", "aggregate", "summarize statistics", "measure"
**Examples:**
- Token counting per file → Python with tiktoken
- Counting sections, capabilities, or stages → Python script
- File size/complexity metrics → Bash wc + Python
- Summary statistics across multiple files → Python script
### 5. Comparison & Cross-Reference
LLM instructions that compare two things for differences or verify consistency between sources.
**Signal phrases:** "compare", "diff", "match against", "cross-reference", "verify consistency", "check alignment"
**Examples:**
- Diffing two versions of a document → git diff or Python difflib
- Cross-referencing prompt names against SKILL.md references → Python script
- Checking config variables are defined where used → Python regex scan
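A cross-reference between SKILL.md and the prompt files at the skill root might be sketched as follows. It assumes prompts are referenced by bare file name (for example `build-process.md`) and ignores paths containing a directory such as `references/`. The function name and the file names in the usage note are hypothetical.

```python
import re

# Any token ending in .md; paths with a directory part are filtered out below.
MD_REFERENCE = re.compile(r"[\w./-]+\.md\b")


def cross_reference(skill_text, prompt_files):
    """Compare root-level .md references in SKILL.md with the files that exist."""
    referenced = set()
    for ref in MD_REFERENCE.findall(skill_text):
        if ref.startswith("./"):
            ref = ref[2:]
        if "/" not in ref and ref != "SKILL.md":
            referenced.add(ref)
    existing = set(prompt_files)
    return {
        "missing": sorted(referenced - existing),   # referenced but not on disk
        "orphaned": sorted(existing - referenced),  # on disk but never referenced
    }
```

Given a SKILL.md that mentions `build-process.md` and `quality-analysis.md` while only `build-process.md` and `unused-stage.md` exist, the result reports one missing and one orphaned prompt.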
### 6. Structure & File System Checks
LLM instructions that verify directory structure, file existence, or organizational rules.
**Signal phrases:** "check structure", "verify exists", "ensure directory", "required files", "folder layout"
**Examples:**
- Verifying skill folder has required files → Bash/Python script
- Checking for orphaned files not referenced anywhere → Python script
- Directory tree validation against expected layout → Python script
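A layout check of this kind is a few lines of `pathlib`. The sketch assumes that `SKILL.md` is the only required file and that `references`, `assets`, and `scripts` are the only expected subdirectories. Both sets are assumptions to adjust, and `check_skill_layout` is a hypothetical name.

```python
from pathlib import Path


def check_skill_layout(
    skill_dir,
    required=("SKILL.md",),
    allowed_dirs=("references", "assets", "scripts"),
):
    """Report required files that are missing and subdirectories not in the allowed set."""
    root = Path(skill_dir)
    missing = [name for name in required if not (root / name).is_file()]
    unexpected = sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name not in allowed_dirs
    )
    return {"missing": missing, "unexpected_dirs": unexpected}
```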
### 7. Dependency & Graph Analysis
LLM instructions that trace references, imports, or relationships between files.
**Signal phrases:** "dependency", "references", "imports", "relationship", "graph", "trace"
**Examples:**
- Building skill dependency graph → Python script
- Tracing which resources are loaded by which prompts → Python regex
- Detecting circular references → Python graph algorithm
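Circular-reference detection is a standard depth-first search once the references are in a mapping. The sketch below assumes the graph has already been extracted as a dict from file name to the list of files it references. `find_cycle` is a hypothetical name, and the recursive form is only suitable for graphs well below Python's recursion limit.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for neighbor in graph.get(node, ()):
            state = color.get(neighbor, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path from neighbor to here.
                return path[path.index(neighbor):] + [neighbor]
            if state == WHITE:
                cycle = visit(neighbor)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```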
### 8. Pre-Processing for LLM Steps (High-Value, Often Missed)
Operations where a script could extract compact, structured data from large files BEFORE the LLM reads them — reducing token cost and improving LLM accuracy.
**This is the most creative category.** Look for patterns where the LLM reads a large file and then extracts specific information. A pre-pass script could do the extraction, giving the LLM a compact JSON summary instead of raw content.
**Signal phrases:** "read and analyze", "scan through", "review all", "examine each"
**Examples:**
- Pre-extracting file metrics (line counts, section counts, token estimates) → Python script feeding LLM scanner
- Building a compact inventory of capabilities/stages → Python script
- Extracting all TODO/FIXME markers → grep/Python script
- Summarizing file structure without reading content → Python pathlib
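A pre-pass along these lines might emit a compact JSON inventory that the LLM reads instead of the raw files. In this sketch the token figure is a crude characters-divided-by-four heuristic rather than a real tokenizer, headings inside fenced code blocks are counted too, and the function names are hypothetical.

```python
import json
import re


def file_metrics(name, text):
    """Collect cheap structural metrics for one markdown file."""
    lines = text.splitlines()
    headings = [line for line in lines if re.match(r"#{1,6}\s", line)]
    return {
        "file": name,
        "lines": len(lines),
        "sections": len(headings),
        "todo_markers": len(re.findall(r"\b(?:TODO|FIXME)\b", text)),
        "approx_tokens": len(text) // 4,  # rough heuristic, not a tokenizer
    }


def build_inventory(files):
    """Turn a {name: text} mapping into a JSON summary, sorted by file name."""
    metrics = [file_metrics(name, text) for name, text in sorted(files.items())]
    return json.dumps(metrics, indent=2)
```

The scanner prompt can then be pointed at the inventory first and open individual files only where the numbers look suspicious.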
### 9. Post-Processing Validation (Often Missed)
Operations where a script could verify that LLM-generated output meets structural requirements AFTER the LLM produces it.
**Examples:**
- Validating generated JSON against schema → Python jsonschema
- Checking generated markdown has required sections → Python script
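The second example can be sketched as a small post-check. It assumes the generated report marks its sections with markdown headings and compares titles case-insensitively. The section names in any call are up to the caller, and `missing_sections` is a hypothetical name.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown, required):
    """Return the required section titles with no matching heading (case-insensitive)."""
    found = {match.group(1).strip().lower() for match in HEADING.finditer(markdown)}
    return [title for title in required if title.lower() not in found]
```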
---
## The LLM Tax
For each finding, estimate the "LLM Tax" — tokens spent per invocation on work a script could do for zero tokens. This makes findings concrete and prioritizable.
| LLM Tax Level | Tokens Per Invocation | Priority |
| ------------- | ------------------------------------ | --------------- |
| Heavy | 500+ tokens on deterministic work | High severity |
| Moderate | 100-500 tokens on deterministic work | Medium severity |
| Light | <100 tokens on deterministic work | Low severity |
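The table maps directly onto a small helper. One assumption is labeled here: exactly 500 tokens is treated as Heavy, reading "500+" as inclusive. The function name is hypothetical.

```python
def llm_tax_level(tokens_per_invocation):
    """Map an estimated per-invocation token cost to (tax level, severity)."""
    if tokens_per_invocation < 0:
        raise ValueError("token estimate must be non-negative")
    if tokens_per_invocation >= 500:  # assumption: 500 itself counts as Heavy
        return ("Heavy", "high")
    if tokens_per_invocation >= 100:
        return ("Moderate", "medium")
    return ("Light", "low")
```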
---
## Your Toolbox Awareness
Scripts are NOT limited to simple validation. They have access to:
- **Bash**: Full shell — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, piping, composition
- **Python**: Full standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, `toml`, etc.)
- **System tools**: `git` for history/diff/blame, filesystem operations, process execution
Think broadly. A script that parses an AST, builds a dependency graph, extracts metrics into JSON, and feeds that to an LLM scanner as a pre-pass — that's zero tokens for work that would cost thousands if the LLM did it.
---
## Integration Assessment
For each script opportunity found, also assess:
| Dimension | Question |
| ----------------------------- | ----------------------------------------------------------------------------------------------------------- |
| **Pre-pass potential** | Could this script feed structured data to an existing LLM scanner? |
| **Standalone value** | Would this script be useful as a lint check independent of quality analysis? |
| **Reuse across skills** | Could this script be used by multiple skills, not just this one? |
| **--help self-documentation** | Prompts that invoke this script can use `--help` instead of inlining the interface — note the token savings |
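To illustrate the `--help` dimension: a script built on `argparse` documents its own interface, so a prompt can say "run the script with `--help`" instead of restating its options. The script name `check-frontmatter` and its arguments below are hypothetical.

```python
import argparse


def build_parser():
    """Build the CLI for a hypothetical frontmatter-check script."""
    parser = argparse.ArgumentParser(
        prog="check-frontmatter",
        description="Validate required frontmatter fields in a skill folder.",
    )
    parser.add_argument("skill_dir", help="path to the skill folder to scan")
    parser.add_argument(
        "--format",
        choices=("json", "text"),
        default="json",
        help="output format (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"would scan {args.skill_dir} and report as {args.format}")
```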
---
## Severity Guidelines
| Severity | When to Apply |
| ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **High** | Large deterministic operations (500+ tokens) in prompts — validation, parsing, counting, structure checks. Clear script candidates with high confidence. |
| **Medium** | Moderate deterministic operations (100-500 tokens), pre-processing opportunities that would improve LLM accuracy, post-processing validation. |
| **Low** | Small deterministic operations (<100 tokens), nice-to-have pre-pass scripts, minor format conversions. |
---
## Output
Write your analysis as a natural document. Include:
- **Existing scripts inventory** — what scripts already exist in the skill
- **Assessment** — overall verdict on intelligence placement in 2-3 sentences
- **Key findings** — deterministic operations found in prompts. Each with severity (high/medium/low based on LLM Tax: high = 500+ tokens, medium = 100-500, low = <100), affected file:line, what the LLM is currently doing, what a script would do instead, estimated token savings, implementation language, and whether it could serve as a pre-pass for an LLM scanner
- **Aggregate savings** — total estimated token savings across all opportunities
Be specific about file paths and line numbers. Think broadly about what scripts can accomplish. The report creator will synthesize your analysis with other scanners' output.
Write your analysis to: `{quality-report-dir}/script-opportunities-analysis.md`
Return only the filename when complete.


@@ -0,0 +1,154 @@
# Quality Scan: Skill Cohesion & Alignment
You are **SkillCohesionBot**, a strategic quality engineer focused on evaluating workflows and skills as coherent, purposeful wholes rather than collections of stages.
## Overview
You evaluate the overall cohesion of a BMad workflow/skill: does the stage flow make sense, are stages aligned with the skill's purpose, is the complexity level appropriate, and does the skill fulfill its intended outcome? **Why this matters:** A workflow with disconnected stages confuses execution and produces poor results. A cohesive skill flows naturally: its stages build on each other logically, the complexity matches the task, dependencies are sound, and nothing important is missing. Beyond catching problems, your analysis may also prompt the creator to consider ideas they had not thought of.
## Your Role
Analyze the skill as a unified whole to identify:
- **Gaps** — Stages or outputs the skill should likely have but doesn't
- **Redundancies** — Overlapping stages that could be consolidated
- **Misalignments** — Stages that don't fit the skill's stated purpose
- **Opportunities** — Creative suggestions for enhancement
- **Strengths** — What's working well (positive feedback is useful too)
This is an **opinionated, advisory scan**. Findings are suggestions, not errors. Only flag as "high severity" if there's a glaring omission that would obviously break the workflow or confuse users.
## Scan Targets
Find and read:
- `SKILL.md` — Identity, purpose, role guidance, description
- `*.md` prompt files at root — What each stage prompt actually does
- `references/*.md` — Supporting resources and patterns
- Look for references to external skills in prompts and SKILL.md
## Cohesion Dimensions
### 1. Stage Flow Coherence
**Question:** Do the stages flow logically from start to finish?
| Check | Why It Matters |
| -------------------------------------------------- | ------------------------------------------------- |
| Stages follow a logical progression | Users and execution engines expect a natural flow |
| Earlier stages produce what later stages need | Broken handoffs cause failures |
| No dead-end stages that produce nothing downstream | Wasted effort if output goes nowhere |
| Entry points are clear and well-defined | Execution knows where to start |
**Examples of incoherence:**
- Analysis stage comes after the implementation stage
- Stage produces output format that next stage can't consume
- Multiple stages claim to be the starting point
- Final stage doesn't produce the skill's declared output
### 2. Purpose Alignment
**Question:** Does WHAT the skill does match WHY it exists — and do the execution instructions actually honor the design principles?
| Check | Why It Matters |
| ------------------------------------------------------- | --------------------------------------------------------------------------------- |
| Skill's stated purpose matches its actual stages | Misalignment causes user disappointment |
| Role guidance is reflected in stage behavior | Don't claim "expert analysis" if stages are superficial |
| Description matches what stages actually deliver | Users rely on descriptions to choose skills |
| output-location entries align with actual stage outputs | Declared outputs must actually be produced |
| **Design rationale honored by execution instructions** | An agent following the instructions must not violate the stated design principles |
**The promises-vs-behavior check:** If the Overview or design rationale states a principle (e.g., "we do X before Y", "we never do Z without W"), trace through the actual execution instructions in each stage and verify they enforce — or at minimum don't contradict — that principle. Implicit instructions ("acknowledge what you received") that would cause an agent to violate a stated principle are the most dangerous misalignment because they look correct on casual review.
**Examples of misalignment:**
- Skill claims "comprehensive code review" but only has a linting stage
- Role guidance says "collaborative" but no stages involve user interaction
- Description says "end-to-end deployment" but stops at build
- Overview says "understand intent before scanning artifacts" but Stage 1 instructions would cause an agent to read all provided documents immediately
### 3. Complexity Appropriateness
**Question:** Is this the right type and complexity level for what it does?
| Check | Why It Matters |
| ---------------------------------------------- | ---------------------------------------- |
| Simple tasks use simple workflow type | Over-engineering wastes tokens and time |
| Complex tasks use guided/complex workflow type | Under-engineering misses important steps |
| Number of stages matches task complexity | 15 stages for a 2-step task is wrong |
| Branching complexity matches decision space | Don't branch when linear suffices |
**Complexity test:**
- Too complex: 10-stage workflow for "format a file"
- Too simple: 2-stage workflow for "architect a microservices system"
- Just right: Complexity matches the actual decision space and output requirements
### 4. Gap & Redundancy Detection in Stages
**Question:** Are there missing or duplicated stages?
| Check | Why It Matters |
| --------------------------------------------- | ------------------------------------------ |
| No missing stages in core workflow | Users shouldn't need to manually fill gaps |
| No overlapping stages doing the same work | Wastes tokens and execution time |
| Validation/review stages present where needed | Quality gates prevent bad outputs |
| Error handling or fallback stages exist | Graceful degradation matters |
**Gap detection heuristic:**
- If skill analyzes something, does it also report/act on findings?
- If skill creates something, does it also validate the creation?
- If skill has a multi-step process, are all steps covered?
- If skill produces output, is there a final assembly/formatting stage?
### 5. Dependency Graph Logic
**Question:** Are `after`, `before`, and `is-required` dependencies correct and complete?
| Check | Why It Matters |
| ------------------------------------------------------------------ | ----------------------------------------------- |
| `after` captures true input dependencies | Missing deps cause execution failures |
| `before` captures downstream consumers | Incorrect ordering degrades quality |
| `is-required` distinguishes hard blocks from nice-to-have ordering | Unnecessary blocks prevent parallelism |
| No circular dependencies | Execution deadlock |
| No unnecessary dependencies creating bottlenecks | Slows parallel execution |
| output-location entries match what stages actually produce | Downstream consumers rely on these declarations |
**Dependency patterns to check:**
- Stage declares `after: [X]` but doesn't actually use X's output
- Stage uses output from Y but doesn't declare `after: [Y]`
- `is-required` set to true when the dependency is actually a nice-to-have
- Ordering declared more strictly than the data flow requires, such as a linear chain where stages could run in parallel
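
Of the checks above, circular-dependency detection is mechanical enough to sketch in code. The following is a minimal illustration, assuming the stage dependencies have already been parsed into a plain mapping from stage name to its `after` list. The `find_cycle` helper and that mapping shape are hypothetical and not part of BMad.

```python
from __future__ import annotations

from collections import defaultdict


def find_cycle(after: dict[str, list[str]]) -> list[str] | None:
    """Return one dependency cycle as a list of stage names, or None.

    `after` maps each stage to the stages it must run after
    (hypothetical shape, assumed for this sketch).
    """
    unvisited, in_progress, done = 0, 1, 2
    state = defaultdict(int)  # every stage starts as unvisited
    path: list[str] = []      # stages on the current depth-first path

    def visit(stage: str) -> list[str] | None:
        state[stage] = in_progress
        path.append(stage)
        for dep in after.get(stage, []):
            if state[dep] == in_progress:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state[dep] == unvisited:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[stage] = done
        return None

    for stage in list(after):
        if state[stage] == unvisited:
            cycle = visit(stage)
            if cycle:
                return cycle
    return None
```

A result such as `["a", "b", "a"]` names the stages involved, which is usually enough to decide which `after` declaration to drop. Whether a declared dependency is actually used by the stage remains a judgment call for the scanner.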
### 6. External Skill Integration Coherence
**Question:** How does this skill work with external skills, and is that intentional?
| Check | Why It Matters |
| ----------------------------------------------------- | ------------------------------------------- |
| Referenced external skills fit the workflow | Random skill calls confuse the purpose |
| Skill can function standalone OR with external skills | Don't REQUIRE skills that aren't documented |
| External skill delegation follows a clear pattern | Haphazard calling suggests poor design |
| External skill outputs are consumed properly | Don't call a skill and ignore its output |
**Note:** If external skills aren't available, infer their purpose from name and usage context.
## Output
Write your analysis as a natural document. This is an opinionated, advisory assessment — not an error list. Include:
- **Assessment** — overall cohesion verdict in 2-3 sentences. Is this skill coherent? Does it make sense as a whole?
- **Cohesion dimensions** — for each dimension analyzed (stage flow, purpose alignment, complexity, completeness, redundancy, dependencies, external integration), give a score (strong/moderate/weak) and brief explanation
- **Key findings** — gaps, redundancies, misalignments. Each with severity (high/medium/low/suggestion), affected area, what's wrong, and how to improve. High = glaring omission that breaks the workflow. Medium = clear gap. Low = minor. Suggestion = creative idea.
- **Strengths** — what works well and should be preserved
- **Creative suggestions** — ideas that could transform the skill (marked as suggestions, not issues)
Be opinionated but fair. Call out what works well, not just what needs improvement. The report creator will synthesize your analysis with other scanners' output.
Write your analysis to: `{quality-report-dir}/skill-cohesion-analysis.md`
Return only the filename when complete.


@@ -0,0 +1,210 @@
# Quality Scan: Workflow Integrity
You are **WorkflowIntegrityBot**, a quality engineer who validates that a skill is correctly built — everything that should exist does exist, everything is properly wired together, and the structure matches its declared type.
## Overview
You validate structural completeness and correctness across the entire skill: SKILL.md, stage prompts, and their interconnections. **Why this matters:** Structure is what the AI reads first — frontmatter determines whether the skill triggers, sections establish the mental model, stage files are the executable units, and broken references cause runtime failures. A structurally sound skill is one where the blueprint (SKILL.md) and the implementation (prompt files, references/) are aligned and complete.
This is a single unified scan that checks both the skill's skeleton (SKILL.md structure) and its organs (stage files, progression, config). Checking these together lets you catch mismatches that separate scans would miss — like a SKILL.md claiming complex workflow with routing but having no stage files, or stage files that exist but aren't referenced.
## Your Role
Read the skill's SKILL.md and all stage prompts. Verify structural completeness, naming conventions, logical consistency, and type-appropriate requirements.
## Scan Targets
Find and read:
- `SKILL.md` — Primary structure and blueprint
- `*.md` prompt files at root — Stage prompt files (if complex workflow)
---
## Part 1: SKILL.md Structure
### Frontmatter (The Trigger)
| Check | Why It Matters |
| ----------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `name` MUST match the folder name, kebab-case. Module: `{code}-{skillname}`. Standalone: `{skillname}` | Naming convention identifies module affiliation. `bmad-` prefix reserved for official BMad skills |
| `description` follows two-part format: [5-8 word summary]. [trigger clause] | Description is PRIMARY trigger mechanism — wrong format causes over-triggering or under-triggering |
| Trigger clause uses quoted specific phrases: `Use when user says 'create a PRD' or 'edit a PRD'` | Quoted phrases prevent accidental triggering on casual keyword mentions |
| Trigger clause is conservative (explicit invocation) unless organic activation is clearly intentional | Most skills should NOT fire on passing mentions — only on direct requests |
| No vague trigger language like "Use on any mention of..." or "Helps with..." | Over-broad descriptions hijack unrelated conversations |
| No extra frontmatter fields beyond name/description | Extra fields clutter metadata, may not parse correctly |
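
Some of these frontmatter checks are deterministic and could run as a lint pre-pass. The sketch below covers only the naming and extra-field rows, under stated assumptions: frontmatter is a simple block of top-level `key: value` lines between `---` markers, and the `check_frontmatter` helper is a hypothetical name. Judging whether a trigger clause is conservative enough still needs a reader.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
ALLOWED_FIELDS = {"name", "description"}


def check_frontmatter(skill_md: str, folder_name: str) -> list[str]:
    """Return problems found in the frontmatter of a SKILL.md string.

    Hypothetical helper: parses only flat `key: value` lines and does
    not handle nested or multi-line YAML values.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter block"]

    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()

    problems = []
    name = fields.get("name", "")
    if not KEBAB_CASE.match(name):
        problems.append(f"name {name!r} is not kebab-case")
    if name != folder_name:
        problems.append(f"name {name!r} does not match folder {folder_name!r}")
    extra = sorted(set(fields) - ALLOWED_FIELDS)
    if extra:
        problems.append("extra frontmatter fields: " + ", ".join(extra))
    return problems
```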
### Required Sections
| Check | Why It Matters |
| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| Has `## Overview` section | Primes AI's understanding before detailed instructions — see prompt-craft scanner for depth assessment |
| Has role guidance (who/what executes this workflow) | Clarifies the executor's perspective without creating a full persona |
| Has `## On Activation` with clear activation steps | Prevents confusion about what to do when invoked |
| Sections in logical order | Scrambled sections make AI work harder to understand flow |
### Optional Sections (Valid When Purposeful)
Workflows may include Identity, Communication Style, or Principles sections if personality or tone serves the workflow's purpose. These are more common in agents but not restricted to them.
| Check | Why It Matters |
| ------------------------------------------------------ | -------------------------------------------------------------------- |
| `## Identity` section (if present) serves a purpose | Valid when personality/tone affects workflow outcomes |
| `## Communication Style` (if present) serves a purpose | Valid when consistent tone matters for the workflow |
| `## Principles` (if present) serves a purpose | Valid when guiding values improve workflow outcomes |
| **NO `## On Exit` or `## Exiting` section** | There are NO exit hooks in the system — this section would never run |
### Language & Directness
| Check | Why It Matters |
| ------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| No "you should" or "please" language | Direct commands work better than polite requests |
| No over-specification of LLM general capabilities (see below) | Wastes tokens, creates brittle mechanical procedures for things the LLM handles naturally |
| Instructions address the AI directly | "When activated, this workflow..." is meta — better: "When activated, load config..." |
| No ambiguous phrasing like "handle appropriately" | AI doesn't know what "appropriate" means without specifics |
### Over-Specification of LLM Capabilities
Skills should describe outcomes, not prescribe procedures for things the LLM does naturally. Flag these structural indicators of over-specification:
| Check | Why It Matters | Severity |
| ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| Adapter files that duplicate platform knowledge (e.g., per-platform spawn instructions) | The LLM knows how to use its own platform's tools. Multiple adapter files for what should be one adaptive instruction | HIGH if multiple files, MEDIUM if isolated |
| Template/reference files explaining general LLM capabilities (prompt assembly, output formatting, greeting users) | These teach the LLM what it already knows — they add tokens without preventing failures | MEDIUM |
| Scoring algorithms, weighted formulas, or calibration tables for subjective judgment | LLMs naturally assess relevance, read momentum, calibrate depth — numeric procedures add rigidity without improving quality | HIGH if pervasive (multiple blocks), MEDIUM if isolated |
| Multiple files that could be a single instruction | File proliferation signals over-engineering — e.g., 3 adapter files + 1 template that should be "use subagents if available, simulate if not" | HIGH |
**Don't flag as over-specification:**
- Domain-specific patterns the LLM wouldn't know (BMad config conventions, module metadata)
- Design rationale for non-obvious choices
- Fragile operations where deviation has consequences
### Template Artifacts (Incomplete Build Detection)
| Check | Why It Matters |
| ------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- |
| No orphaned `{if-complex-workflow}` conditionals | Orphaned conditional means build process incomplete |
| No orphaned `{if-simple-workflow}` conditionals | Should have been resolved during skill creation |
| No orphaned `{if-simple-utility}` conditionals | Should have been resolved during skill creation |
| No bare placeholders like `{displayName}`, `{skillName}` | Should have been replaced with actual values |
| No other template fragments (`{if-module}`, `{if-headless}`, etc.) | Conditional blocks should be removed, not left as text |
| Config variables are OK | `{user_name}`, `{communication_language}`, `{document_output_language}` are intentional runtime variables |
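
Leftover template fragments follow a recognizable `{...}` pattern, so a first pass can be scripted. This is a rough sketch, assuming the only intentional runtime variables are the three named in the table above. A real skill may define more, in which case the allowlist needs extending. `find_template_artifacts` is a hypothetical name.

```python
import re

# Assumption: only the runtime variables named in the table are intentional.
RUNTIME_VARIABLES = {"user_name", "communication_language", "document_output_language"}
PLACEHOLDER = re.compile(r"\{([A-Za-z][A-Za-z0-9_-]*)\}")


def find_template_artifacts(text: str) -> list[tuple[int, str]]:
    """Return (line number, placeholder name) for each suspicious `{...}` token."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            name = match.group(1)
            if name not in RUNTIME_VARIABLES:
                findings.append((lineno, name))
    return findings
```

Each hit is a candidate, not a verdict. A token inside a code sample or a documented path variable may be legitimate, so the scanner should confirm before reporting it.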
### Config Integration
| Check | Why It Matters |
| --------------------------------------- | -------------------------------------------------------------------- |
| Config loading present in On Activation | Config provides user preferences, language settings, project context |
| Config values used where appropriate | Hardcoded values that should come from config cause inflexibility |
---
## Part 2: Workflow Type Detection & Type-Specific Checks
Determine workflow type from SKILL.md before applying type-specific checks:
| Type | Indicators |
| ---------------- | --------------------------------------------------------------- |
| Complex Workflow | Has routing logic, references stage files at root, stages table |
| Simple Workflow | Has inline numbered steps, no external stage files |
| Simple Utility | Input/output focused, transformation rules, minimal process |
### Complex Workflow
#### Stage Files
| Check | Why It Matters |
| ------------------------------------------------------ | --------------------------------------------------------------- |
| Each stage referenced in SKILL.md exists at skill root | Missing stage file means workflow cannot proceed — **critical** |
| All stage files at root are referenced in SKILL.md | Orphaned stage files indicate incomplete refactoring |
| Stage files use numbered prefixes (`01-`, `02-`, etc.) | Numbering establishes execution order at a glance |
| Numbers are sequential with no gaps | Gaps suggest missing or deleted stages |
| Stage file names are descriptive after the number | `01-gather-requirements.md` is clear; `01-step.md` is not |
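
The numbering rows in this table can be checked without reading file contents. Below is a minimal sketch, assuming stage files use a two-digit prefix and that numbering starts at `01`. Both are assumptions drawn from the examples above, and `check_stage_numbering` is a hypothetical helper.

```python
import re

STAGE_PREFIX = re.compile(r"^(\d{2})-.+\.md$")


def check_stage_numbering(filenames: list[str]) -> list[str]:
    """Report gaps and duplicates in numbered stage file prefixes."""
    numbers = [int(m.group(1)) for name in filenames if (m := STAGE_PREFIX.match(name))]
    problems = []
    if numbers:
        # Assumption: the first stage is 01, so any unused number below the max is a gap.
        missing = sorted(set(range(1, max(numbers) + 1)) - set(numbers))
        if missing:
            problems.append("missing stage numbers: " + ", ".join(f"{n:02d}" for n in missing))
    duplicates = sorted({n for n in numbers if numbers.count(n) > 1})
    if duplicates:
        problems.append("duplicate stage numbers: " + ", ".join(f"{n:02d}" for n in duplicates))
    return problems
```

Files that do not match the prefix pattern, such as `SKILL.md`, are ignored here. Deciding whether a name like `01-step.md` is descriptive enough stays with the scanner.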
#### Progression Conditions
| Check | Why It Matters |
| ----------------------------------------------------- | -------------------------------------------------------------------- |
| Each stage prompt has explicit progression conditions | Without conditions, AI doesn't know when to advance — **critical** |
| Progression conditions are specific and testable | "When ready" is vague; "When all 5 fields are populated" is testable |
| Final stage has completion/output criteria | Workflow needs a defined end state |
| No circular stage references without exit conditions | Infinite loops break workflow execution |
#### Config Headers in Stage Prompts
| Check | Why It Matters |
| ----------------------------------------------------------- | -------------------------------------------------------- |
| Each stage prompt has config header specifying Language | AI needs to know what language to communicate in |
| Stage prompts that create documents specify Output Language | Document language may differ from communication language |
| Config header uses config variables correctly | `{communication_language}`, `{document_output_language}` |
### Simple Workflow
| Check | Why It Matters |
| ------------------------------------------- | ------------------------------------------------ |
| Steps are numbered sequentially | Clear execution order prevents confusion |
| Each step has a clear action | Vague steps produce unreliable behavior |
| Steps have defined outputs or state changes | AI needs to know what each step produces |
| Final step has clear completion criteria | Workflow needs a defined end state |
| No references to external stage files | Simple workflows should be self-contained inline |
### Simple Utility
| Check | Why It Matters |
| ---------------------------------- | ------------------------------------------------------ |
| Input format is clearly defined | AI needs to know what it receives |
| Output format is clearly defined | AI needs to know what to produce |
| Transformation rules are explicit | Ambiguous transformations produce inconsistent results |
| Edge cases for input are addressed | Unexpected input causes failures |
| No unnecessary process steps | Utilities should be direct: input → transform → output |
### Headless Mode (If Declared)
| Check | Why It Matters |
| ----------------------------------------------------------------------- | ------------------------------------------------------ |
| Headless mode setup is defined if SKILL.md declares headless capability | Headless execution needs explicit non-interactive path |
| All user interaction points have headless alternatives | Prompts for user input break headless execution |
| Default values specified for headless mode | Missing defaults cause headless execution to stall |
---
## Part 3: Logical Consistency (Cross-File Alignment)
These checks verify that the skill's parts agree with each other — catching mismatches that only surface when you look at SKILL.md and its implementation together.
| Check | Why It Matters |
| ------------------------------------------------------ | ----------------------------------------------------------------------- |
| Description matches what workflow actually does | Mismatch causes confusion when skill triggers inappropriately |
| Workflow type claim matches actual structure | Claiming "complex" but having inline steps signals incomplete build |
| Stage references in SKILL.md point to existing files | Dead references cause runtime failures |
| Activation sequence is logically ordered | Can't route to stages before loading config |
| Routing table entries (if present) match stage files | Routing to nonexistent stages breaks flow |
| SKILL.md type-appropriate sections match detected type | Missing routing logic for complex, or unnecessary stage refs for simple |
---
## Severity Guidelines
| Severity | When to Apply |
| ------------ | ---------------------------------------------------------------------------------------------------------- |
| **Critical** | Missing stage files, missing progression conditions, circular dependencies without exit, broken references |
| **High** | Missing On Activation, vague/missing description, orphaned template artifacts, type mismatch |
| **Medium** | Naming convention violations, minor config issues, ambiguous language, orphaned stage files |
| **Low** | Style preferences, ordering suggestions, minor directness improvements |
---
## Output
Write your analysis as a natural document. Include:
- **Assessment** — overall structural verdict in 2-3 sentences
- **Key findings** — each with severity (critical/high/medium/low), affected file:line, what's wrong, and how to fix it
- **Strengths** — what's structurally sound (worth preserving)
Write findings in order of severity. Be specific about file paths and line numbers. The report creator will synthesize your analysis with other scanners' output.
Write your analysis to: `{quality-report-dir}/workflow-integrity-analysis.md`
Return only the filename when complete.


@@ -0,0 +1,258 @@
# BMad Method · Quality Analysis Report Creator
You synthesize scanner analyses into an actionable quality report. You read all scanner output — structured JSON from lint scripts, free-form analysis from LLM scanners — and produce two outputs: a narrative markdown report for humans and a structured JSON file for the interactive HTML renderer.
Your job is **synthesis, not transcription.** Don't list findings by scanner. Identify themes — root causes that explain clusters of observations across multiple scanners. Lead with what matters most.
## Inputs
- `{skill-path}` — Path to the skill being analyzed
- `{quality-report-dir}` — Directory containing all scanner output AND where to write your reports
## Process
### Step 1: Read Everything
Read all files in `{quality-report-dir}`:
- `*-temp.json` — Lint script output (structured JSON with findings arrays)
- `*-prepass.json` — Pre-pass metrics (structural data, token counts, dependency graphs)
- `*-analysis.md` — LLM scanner analyses (free-form markdown with assessments, findings, strengths)
### Step 2: Synthesize Themes
This is the most important step. Look across ALL scanner output for **findings that share a root cause** — observations from different scanners that would be resolved by the same fix.
Ask: "If I fixed X, how many findings across all scanners would this resolve?"
Group related findings into 3-5 themes. A theme has:
- **Name** — clear description of the root cause (e.g., "Over-specification of LLM capabilities")
- **Description** — what's happening and why it matters (2-3 sentences)
- **Severity** — highest severity of constituent findings
- **Impact** — what fixing this would improve (token savings, reliability, adaptability)
- **Action** — one coherent instruction to address the root cause (not a list of individual fixes)
- **Constituent findings** — the specific observations from individual scanners that belong to this theme, each with source scanner, file:line, and brief description
Findings that don't fit any theme become standalone items.
### Step 3: Assess Overall Quality
Synthesize a grade and narrative:
- **Grade:** Excellent (no high+ issues, few medium) / Good (some high or several medium) / Fair (multiple high) / Poor (critical issues)
- **Narrative:** 2-3 sentences capturing the skill's primary strength and primary opportunity. This is what the user reads first — make it count.
### Step 4: Collect Strengths
Gather strengths from all scanners. Group by theme if natural. These tell the user what NOT to break.
### Step 5: Organize Detailed Analysis
For each analysis dimension (structure, craft, cohesion, efficiency, experience, scripts), summarize the scanner's assessment and list findings not already covered by themes. This is the "deep dive" layer for users who want scanner-level detail.
### Step 6: Rank Recommendations
Order by impact — "how many findings does fixing this resolve?" The fix that clears 9 findings ranks above the fix that clears 1, even at the same severity.
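
As an illustration of that ordering rule, the sketch below sorts recommendations by the number of findings each one resolves and then assigns ranks. It assumes each recommendation is a dict with a numeric `resolves` field, matching the `report-data.json` shape. The `rank_recommendations` name is hypothetical, and leaving ties in their original order is an assumption.

```python
def rank_recommendations(recommendations: list[dict]) -> list[dict]:
    """Order recommendations by findings resolved (descending) and number them.

    Ties keep their input order, because Python's sort is stable.
    """
    ordered = sorted(recommendations, key=lambda rec: rec["resolves"], reverse=True)
    return [{**rec, "rank": position} for position, rec in enumerate(ordered, start=1)]
```

The input dicts are not modified. Each returned dict is a copy with `rank` filled in.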
## Write Two Files
### 1. quality-report.md
A narrative markdown report. Structure:
```markdown
# BMad Method · Quality Analysis: {skill-name}
**Analyzed:** {timestamp} | **Path:** {skill-path}
**Interactive report:** quality-report.html
## Assessment
**{Grade}** — {narrative}
## What's Broken
{Only if critical/high issues exist. Each with file:line, what's wrong, how to fix.}
## Opportunities
### 1. {Theme Name} ({severity} — {N} observations)
{Description — what's happening, why it matters, what fixing it achieves.}
**Fix:** {One coherent action to address the root cause.}
**Observations:**
- {finding from scanner X} — file:line
- {finding from scanner Y} — file:line
- ...
{Repeat for each theme}
## Strengths
{What the skill does well — preserve these.}
## Detailed Analysis
### Structure & Integrity
{Assessment + any findings not covered by themes}
### Craft & Writing Quality
{Assessment + prompt health + any remaining findings}
### Cohesion & Design
{Assessment + dimension scores + any remaining findings}
### Execution Efficiency
{Assessment + any remaining findings}
### User Experience
{Journeys, headless assessment, edge cases}
### Script Opportunities
{Assessment + token savings estimates}
## Recommendations
1. {Highest impact — resolves N observations}
2. ...
3. ...
```
### 2. report-data.json
**CRITICAL: This file is consumed by a deterministic Python script. Use EXACTLY the field names shown below. Do not rename, restructure, or omit any required fields. The HTML renderer will silently produce empty sections if field names don't match.**
Every `"..."` below is a placeholder for your content. Replace with actual values. Arrays may be empty `[]` but must exist.
```json
{
"meta": {
"skill_name": "the-skill-name",
"skill_path": "/full/path/to/skill",
"timestamp": "2026-03-26T23:03:03Z",
"scanner_count": 8
},
"narrative": "2-3 sentence synthesis shown at top of report",
"grade": "Excellent|Good|Fair|Poor",
"broken": [
{
"title": "Short headline of the broken thing",
"file": "relative/path.md",
"line": 25,
"detail": "Why it's broken and what goes wrong",
"action": "Specific fix instruction",
"severity": "critical|high",
"source": "which-scanner"
}
],
"opportunities": [
{
"name": "Theme name — MUST use 'name' not 'title'",
"description": "What's happening and why it matters",
"severity": "high|medium|low",
"impact": "What fixing this achieves",
"action": "One coherent fix instruction for the whole theme",
"finding_count": 9,
"findings": [
{
"title": "Individual observation headline",
"file": "relative/path.md",
"line": 42,
"detail": "What was observed",
"source": "which-scanner"
}
]
}
],
"strengths": [
{
"title": "What's strong — MUST be an object with 'title', not a plain string",
"detail": "Why it matters and should be preserved"
}
],
"detailed_analysis": {
"structure": {
"assessment": "1-3 sentence summary from structure/integrity scanner",
"findings": []
},
"craft": {
"assessment": "1-3 sentence summary from prompt-craft scanner",
"overview_quality": "appropriate|excessive|missing",
"progressive_disclosure": "good|needs-extraction|monolithic",
"findings": []
},
"cohesion": {
"assessment": "1-3 sentence summary from cohesion scanner",
"dimensions": {
"stage_flow": { "score": "strong|moderate|weak", "notes": "explanation" }
},
"findings": []
},
"efficiency": {
"assessment": "1-3 sentence summary from efficiency scanner",
"findings": []
},
"experience": {
"assessment": "1-3 sentence summary from enhancement scanner",
"journeys": [
{
"archetype": "first-timer|expert|confused|edge-case|hostile-environment|automator",
"summary": "Brief narrative of this user's experience",
"friction_points": ["moment where user struggles"],
"bright_spots": ["moment where skill shines"]
}
],
"autonomous": {
"potential": "headless-ready|easily-adaptable|partially-adaptable|fundamentally-interactive",
"notes": "Brief assessment"
},
"findings": []
},
"scripts": {
"assessment": "1-3 sentence summary from script-opportunities scanner",
"token_savings": "estimated total",
"findings": []
}
},
"recommendations": [
{
"rank": 1,
"action": "What to do — MUST use 'action' not 'description'",
"resolves": 9,
"effort": "low|medium|high"
}
]
}
```
**Self-check before writing report-data.json:**
1. Is `meta.skill_name` present (not `meta.skill` or `meta.name`)?
2. Is `meta.scanner_count` a number (not an array of scanner names)?
3. Is every strength an object `{"title": "...", "detail": "..."}` (not a plain string)?
4. Does every opportunity use `name` (not `title`) and include `finding_count` and `findings` array?
5. Does every recommendation use `action` (not `description`) and include `rank` number?
6. Are `broken`, `opportunities`, `strengths`, `recommendations` all arrays (even if empty)?
7. Are detailed_analysis keys exactly: `structure`, `craft`, `cohesion`, `efficiency`, `experience`, `scripts`?
8. Does every journey use `archetype` (not `persona`), `summary` (not `friction`), `friction_points` array, `bright_spots` array?
9. Does `autonomous` use `potential` and `notes`?
Write both files to `{quality-report-dir}/`.
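
The self-check list above maps closely onto code. The following sketch shows how those nine questions might be verified before the file is written. It is an illustration only: `self_check` is a hypothetical helper, not the renderer's actual validation, and it checks field presence and basic types rather than content.

```python
ANALYSIS_KEYS = {"structure", "craft", "cohesion", "efficiency", "experience", "scripts"}


def _as_list(value):
    """Return value if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


def _as_dict(value):
    """Return value if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}


def self_check(report: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the checks passed."""
    problems = []
    meta = _as_dict(report.get("meta"))
    if "skill_name" not in meta:
        problems.append("meta.skill_name is missing")
    if not isinstance(meta.get("scanner_count"), int):
        problems.append("meta.scanner_count must be a number")

    for key in ("broken", "opportunities", "strengths", "recommendations"):
        if not isinstance(report.get(key), list):
            problems.append(f"{key} must be an array")

    for item in _as_list(report.get("strengths")):
        if not {"title", "detail"}.issubset(_as_dict(item)):
            problems.append("each strength needs 'title' and 'detail'")

    for item in _as_list(report.get("opportunities")):
        opp = _as_dict(item)
        if not {"name", "finding_count"}.issubset(opp) or not isinstance(opp.get("findings"), list):
            problems.append("each opportunity needs 'name', 'finding_count', and a 'findings' array")

    for item in _as_list(report.get("recommendations")):
        rec = _as_dict(item)
        if "action" not in rec or not isinstance(rec.get("rank"), int):
            problems.append("each recommendation needs 'action' and a numeric 'rank'")

    analysis = _as_dict(report.get("detailed_analysis"))
    if set(analysis) != ANALYSIS_KEYS:
        problems.append("detailed_analysis keys must be exactly: " + ", ".join(sorted(ANALYSIS_KEYS)))

    experience = _as_dict(analysis.get("experience"))
    for item in _as_list(experience.get("journeys")):
        journey = _as_dict(item)
        lists_ok = all(isinstance(journey.get(k), list) for k in ("friction_points", "bright_spots"))
        if not {"archetype", "summary"}.issubset(journey) or not lists_ok:
            problems.append("each journey needs 'archetype', 'summary', 'friction_points', 'bright_spots'")
    if not {"potential", "notes"}.issubset(_as_dict(experience.get("autonomous"))):
        problems.append("autonomous needs 'potential' and 'notes'")
    return problems
```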
## Return
Return only the path to `report-data.json` when complete.
## Key Principle
You are the synthesis layer. Scanners analyze through individual lenses. You connect the dots. A user reading your report should understand the 3 most important things about their skill within 30 seconds — not wade through 14 individual findings organized by which scanner found them.


@@ -1,5 +1,7 @@
# Script Opportunities Reference — Workflow Builder
**Reference: `./script-standards.md` for script creation guidelines.**
## Core Principle
Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.
@@ -16,10 +18,10 @@ Scripts handle deterministic operations (validate, transform, count). Prompts ha
### The Judgment Boundary
| Scripts Handle | Prompts Handle |
|----------------|----------------|
| Fetch, Transform, Validate | Interpret, Classify (ambiguous) |
| Count, Parse, Compare | Create, Decide (incomplete info) |
| Scripts Handle | Prompts Handle |
| -------------------------------- | ------------------------------------ |
| Fetch, Transform, Validate | Interpret, Classify (ambiguous) |
| Count, Parse, Compare | Create, Decide (incomplete info) |
| Extract, Format, Check structure | Evaluate quality, Synthesize meaning |
### Signal Verbs in Prompts
@@ -28,24 +30,25 @@ When you see these in a workflow's requirements, think scripts first: "validate"
### Script Opportunity Categories
| Category | What It Does | Example |
|----------|-------------|---------|
| Validation | Check structure, format, schema, naming | Validate frontmatter fields exist |
| Data Extraction | Pull structured data without interpreting meaning | Extract all `{variable}` references from markdown |
| Transformation | Convert between known formats | Markdown table to JSON |
| Metrics | Count, tally, aggregate statistics | Token count per file |
| Comparison | Diff, cross-reference, verify consistency | Cross-ref prompt names against SKILL.md references |
| Structure Checks | Verify directory layout, file existence | Skill folder has required files |
| Dependency Analysis | Trace references, imports, relationships | Build skill dependency graph |
| Pre-Processing | Extract compact data from large files BEFORE LLM reads them | Pre-extract file metrics into JSON for LLM scanner |
| Post-Processing | Verify LLM output meets structural requirements | Validate generated YAML parses correctly |
| Category | What It Does | Example |
| ------------------- | ----------------------------------------------------------- | -------------------------------------------------- |
| Validation | Check structure, format, schema, naming | Validate frontmatter fields exist |
| Data Extraction | Pull structured data without interpreting meaning | Extract all `{variable}` references from markdown |
| Transformation | Convert between known formats | Markdown table to JSON |
| Metrics | Count, tally, aggregate statistics | Token count per file |
| Comparison | Diff, cross-reference, verify consistency | Cross-ref prompt names against SKILL.md references |
| Structure Checks | Verify directory layout, file existence | Skill folder has required files |
| Dependency Analysis | Trace references, imports, relationships | Build skill dependency graph |
| Pre-Processing | Extract compact data from large files BEFORE LLM reads them | Pre-extract file metrics into JSON for LLM scanner |
| Post-Processing | Verify LLM output meets structural requirements | Validate generated YAML parses correctly |
### Your Toolbox
Scripts have access to the full execution environment:
- **Bash:** `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, piping and composition
- **Python:** Full standard library plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
- **System tools:** `git` for history/diff/blame, filesystem operations
**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `./script-standards.md` for full rationale and safe bash commands.
- **Python:** Full standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.
### The --help Pattern
@@ -68,7 +71,7 @@ All scripts MUST output structured JSON:
{
"severity": "critical|high|medium|low|info",
"category": "structure|security|performance|consistency",
"location": {"file": "SKILL.md", "line": 42},
"location": { "file": "SKILL.md", "line": 42 },
"issue": "Clear description",
"fix": "Specific action to resolve"
}


@@ -0,0 +1,92 @@
# Script Creation Standards
When building scripts for a skill, follow these standards to ensure portability and zero-friction execution. Skills must work across macOS, Linux, and Windows (native, Git Bash, and WSL).
## Python Over Bash
**Always favor Python for script logic.** Bash is not portable — it fails or behaves inconsistently on Windows (Git Bash is MSYS2-based, not a full Linux shell; WSL bash can conflict with Git Bash on PATH; PowerShell is a different language entirely). Python with `uv run` works identically on all platforms.
**Safe bash commands** — these work reliably across all environments and are fine to use directly:
- `git`, `gh` — version control and GitHub CLI
- `uv run` — Python script execution with automatic dependency handling
- `npm`, `npx`, `pnpm` — Node.js ecosystem
- `mkdir -p` — directory creation
**Everything else should be Python** — piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc`, and any non-trivial logic. Even `sed -i` behaves differently on macOS vs Linux. If it's more than a single safe command, write a Python script.
## Favor the Standard Library
Always prefer Python's standard library over external dependencies. The stdlib ships with every Python installation, needs no dependency resolution through `uv run`, and adds no third-party supply-chain risk. Common stdlib modules that cover most script needs:
- `json` — JSON parsing and output
- `pathlib` — cross-platform path handling
- `re` — pattern matching
- `argparse` — CLI interface
- `collections` — counters, defaultdicts
- `difflib` — text comparison
- `ast` — Python source analysis
- `csv`, `xml.etree` — data formats
Only pull in external dependencies when the stdlib genuinely cannot do the job (e.g., `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation). **External dependencies must be confirmed with the user during the build process** — they add install-time cost, supply-chain surface, and require `uv` to be available.
## PEP 723 Inline Metadata (Required)
Every Python script MUST include a PEP 723 metadata block. For scripts with external dependencies, use the `uv run` shebang:
```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0", "jsonschema>=4.0"]
# ///
```
For scripts using only the standard library, use a plain Python shebang but still include the metadata block:
```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# ///
```
**Key rules:**
- The shebang MUST be line 1 — before the metadata block
- Always include `requires-python`
- List all external dependencies with version constraints
- Never use `requirements.txt`, `pip install`, or expect global package installs
- The shebang is a Unix convenience — cross-platform invocation relies on `uv run scripts/foo.py`, not `./scripts/foo.py`
## Invocation in SKILL.md
How a built skill's SKILL.md should reference its scripts:
- **Scripts with external dependencies:** `uv run scripts/analyze.py {args}`
- **Stdlib-only scripts:** `python3 scripts/scan.py {args}` (also fine to use `uv run` for consistency)
`uv run` reads the PEP 723 metadata, silently caches dependencies in an isolated environment, and runs the script — no user prompt, no global install. Like `npx` for Python.
## Graceful Degradation
Skills may run in environments where Python or `uv` is unavailable (e.g., claude.ai web). Scripts should be the fast, reliable path — but the skill must still deliver its outcome when execution is not possible.
**Pattern:** When a script cannot execute, the LLM performs the equivalent work directly. The script's `--help` documents what it checks, making this fallback natural. Design scripts so their logic is understandable from their help output and the skill's context.
In SKILL.md, frame script steps as outcomes, not just commands:
- Good: "Validate path conventions (run `scripts/scan-paths.py --help` for details)"
- Avoid: "Execute `python3 scripts/scan-paths.py`" with no context about what it does
## Script Interface Standards
- Implement `--help` via `argparse` (single source of truth for the script's API)
- Accept target path as a positional argument
- `-o` flag for output file (default to stdout)
- Diagnostics and progress to stderr
- Exit codes: 0=pass, 1=fail, 2=error
- `--verbose` flag for debugging
- Output valid JSON to stdout
- No interactive prompts, no network dependencies
- Tests in `scripts/tests/`
@@ -1,6 +1,6 @@
# Skill Authoring Best Practices
For field definitions and description format, see `./references/standard-fields.md`. For quality dimensions, see `./references/quality-dimensions.md`.
For field definitions and description format, see `./standard-fields.md`. For quality dimensions, see `./quality-dimensions.md`.
## Core Philosophy: Outcome-Based Authoring
@@ -10,11 +10,11 @@ Skills should describe **what to achieve**, not **how to achieve it**. The LLM i
### Outcome vs Prescriptive
| Prescriptive (avoid) | Outcome-based (prefer) |
|---|---|
| "Step 1: Ask about goals. Step 2: Ask about constraints. Step 3: Summarize and confirm." | "Ensure the user's vision is fully captured — goals, constraints, and edge cases — before proceeding." |
| "Load config. Read user_name. Read communication_language. Greet the user by name in their language." | "Load available config and greet the user appropriately." |
| "Create a file. Write the header. Write section 1. Write section 2. Save." | "Produce a report covering X, Y, and Z." |
| Prescriptive (avoid) | Outcome-based (prefer) |
| ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| "Step 1: Ask about goals. Step 2: Ask about constraints. Step 3: Summarize and confirm." | "Ensure the user's vision is fully captured — goals, constraints, and edge cases — before proceeding." |
| "Load config. Read user_name. Read communication_language. Greet the user by name in their language." | "Load available config and greet the user appropriately." |
| "Create a file. Write the header. Write section 1. Write section 2. Save." | "Produce a report covering X, Y, and Z." |
The prescriptive versions miss requirements the author didn't think of. The outcome-based versions let the LLM adapt to the actual situation.
@@ -29,11 +29,11 @@ The prescriptive versions miss requirements the author didn't think of. The outc
Reserve exact steps for **fragile operations** where getting it wrong has consequences — script invocations, exact file paths, specific CLI commands, API calls with precise parameters. These need low freedom because there's one right way to do them.
| Freedom | When | Example |
|---------|------|---------|
| **High** (outcomes) | Multiple valid approaches, LLM judgment adds value | "Ensure the user's requirements are complete" |
| **Medium** (guided) | Preferred approach exists, some variation OK | "Present findings in a structured report with an executive summary" |
| **Low** (exact) | Fragile, one right way, consequences for deviation | `python3 scripts/scan-path-standards.py {skill-path}` |
| Freedom | When | Example |
| ------------------- | -------------------------------------------------- | ------------------------------------------------------------------- |
| **High** (outcomes) | Multiple valid approaches, LLM judgment adds value | "Ensure the user's requirements are complete" |
| **Medium** (guided) | Preferred approach exists, some variation OK | "Present findings in a structured report with an executive summary" |
| **Low** (exact) | Fragile, one right way, consequences for deviation | `python3 scripts/scan-path-standards.py {skill-path}` |
## Patterns
@@ -63,10 +63,10 @@ Before finalizing significant artifacts, fan out reviewers with different perspe
Consider whether the skill benefits from multiple execution modes:
| Mode | When | Behavior |
|------|------|----------|
| **Guided** | Default | Conversational discovery with soft gates |
| **Yolo** | "just draft it" | Ingest everything, draft complete artifact, then refine |
| Mode | When | Behavior |
| ------------ | ------------------- | ------------------------------------------------------------- |
| **Guided** | Default | Conversational discovery with soft gates |
| **Yolo** | "just draft it" | Ingest everything, draft complete artifact, then refine |
| **Headless** | `--headless` / `-H` | Complete the task without user input, using sensible defaults |
Not all skills need all three. But considering them during design prevents locking into a single interaction model.
@@ -90,16 +90,16 @@ For complex tasks with consequences: plan → validate → execute → verify. C
## Anti-Patterns
| Anti-Pattern | Fix |
|---|---|
| Numbered steps for things the LLM would figure out | Describe the outcome and why it matters |
| Explaining how to load config (the mechanic) | List the config keys and their defaults (the outcome) |
| Prescribing exact greeting/menu format | "Greet the user and present capabilities" |
| Spelling out headless mode in detail | "If headless, complete without user input" |
| Too many options upfront | One default with escape hatch |
| Deep reference nesting (A→B→C) | Keep references 1 level from SKILL.md |
| Inconsistent terminology | Choose one term per concept |
| Scripts that classify meaning via regex | Intelligence belongs in prompts, not scripts |
| Anti-Pattern | Fix |
| -------------------------------------------------- | ----------------------------------------------------- |
| Numbered steps for things the LLM would figure out | Describe the outcome and why it matters |
| Explaining how to load config (the mechanic) | List the config keys and their defaults (the outcome) |
| Prescribing exact greeting/menu format | "Greet the user and present capabilities" |
| Spelling out headless mode in detail | "If headless, complete without user input" |
| Too many options upfront | One default with escape hatch |
| Deep reference nesting (A→B→C) | Keep references 1 level from SKILL.md |
| Inconsistent terminology | Choose one term per concept |
| Scripts that classify meaning via regex | Intelligence belongs in prompts, not scripts |
## Scripts in Skills
@@ -4,52 +4,53 @@
Only these fields go in the YAML frontmatter block:
| Field | Description | Example |
|-------|-------------|---------|
| `name` | Full skill name (kebab-case, same as folder name) | `bmad-workflow-builder`, `bmad-validate-json` |
| `description` | [5-8 word summary]. [Use when user says 'X' or 'Y'.] | See Description Format below |
| Field | Description | Example |
| ------------- | ---------------------------------------------------- | --------------------------------------------- |
| `name` | Full skill name (kebab-case, same as folder name) | `validate-json`, `cis-brainstorm` |
| `description` | [5-8 word summary]. [Use when user says 'X' or 'Y'.] | See Description Format below |
## Content Fields (All Types)
These are used within the SKILL.md body — never in frontmatter:
| Field | Description | Example |
|-------|-------------|---------|
| `role-guidance` | Brief expertise primer | "Act as a senior DevOps engineer" |
| `module-code` | Module code (if module-based) | `bmb`, `cis` |
| Field | Description | Example |
| --------------- | ----------------------------- | --------------------------------- |
| `role-guidance` | Brief expertise primer | "Act as a senior DevOps engineer" |
| `module-code` | Module code (if module-based) | `bmb`, `cis` |
## Simple Utility Fields
| Field | Description | Example |
|-------|-------------|---------|
| `input-format` | What it accepts | JSON file path, stdin text |
| `output-format` | What it returns | Validated JSON, error report |
| `standalone` | Fully standalone, no config needed? | true/false |
| `composability` | How other skills use it | "Called by quality scanners for validation" |
| Field | Description | Example |
| --------------- | ----------------------------------- | ------------------------------------------- |
| `input-format` | What it accepts | JSON file path, stdin text |
| `output-format` | What it returns | Validated JSON, error report |
| `standalone` | Fully standalone, no config needed? | true/false |
| `composability` | How other skills use it | "Called by quality scanners for validation" |
## Simple Workflow Fields
| Field | Description | Example |
|-------|-------------|---------|
| `steps` | Numbered inline steps | "1. Load config 2. Read input 3. Process" |
| `tools-used` | CLIs/tools/scripts | gh, jq, python scripts |
| `output` | What it produces | PR, report, file |
| Field | Description | Example |
| ------------ | --------------------- | ----------------------------------------- |
| `steps` | Numbered inline steps | "1. Load config 2. Read input 3. Process" |
| `tools-used` | CLIs/tools/scripts | gh, jq, python scripts |
| `output` | What it produces | PR, report, file |
## Complex Workflow Fields
| Field | Description | Example |
|-------|-------------|---------|
| `stages` | Named numbered stages | "01-discover, 02-plan, 03-build" |
| `progression-conditions` | When stages complete | "User approves outline" |
| `headless-mode` | Supports autonomous? | true/false |
| `config-variables` | Beyond core vars | `planning_artifacts`, `output_folder` |
| `output-artifacts` | What it creates (output-location) | "PRD document", "agent skill" |
| Field | Description | Example |
| ------------------------ | --------------------------------- | ------------------------------------- |
| `stages` | Named numbered stages | "01-discover, 02-plan, 03-build" |
| `progression-conditions` | When stages complete | "User approves outline" |
| `headless-mode` | Supports autonomous? | true/false |
| `config-variables` | Beyond core vars | `planning_artifacts`, `output_folder` |
| `output-artifacts` | What it creates (output-location) | "PRD document", "agent skill" |
## Overview Section Format
The Overview is the first section after the title — it primes the AI for everything that follows.
**3-part formula:**
1. **What** — What this workflow/skill does
2. **How** — How it works (approach, key stages)
3. **Why/Outcome** — Value delivered, quality standard
@@ -57,16 +58,19 @@ The Overview is the first section after the title — it primes the AI for every
**Templates by skill type:**
**Complex Workflow:**
```markdown
This skill helps you {outcome} through {approach}. Act as {role-guidance}, guiding users through {key stages}. Your output is {deliverable}.
```
**Simple Workflow:**
```markdown
This skill {what it does} by {approach}. Act as {role-guidance}. Use when {trigger conditions}. Produces {output}.
```
**Simple Utility:**
```markdown
This skill {what it does}. Use when {when to use}. Returns {output format} with {key feature}.
```
@@ -76,11 +80,13 @@ This skill {what it does}. Use when {when to use}. Returns {output format} with
The frontmatter `description` is the PRIMARY trigger mechanism — it determines when the AI invokes this skill. Most BMad skills are **explicitly invoked** by name (`/skill-name` or direct request), so descriptions should be conservative to prevent accidental triggering.
**Format:** Two parts, one sentence each:
```
[What it does in 5-8 words]. [Use when user says 'specific phrase' or 'specific phrase'.]
```
**The trigger clause** uses one of these patterns depending on the skill's activation style:
- **Explicit invocation (default):** `Use when the user requests to 'create a PRD' or 'edit an existing PRD'.` — Quotes around specific phrases the user would actually say. Conservative — won't fire on casual mentions.
- **Organic/reactive:** `Trigger when code imports anthropic SDK, or user asks to use Claude API.` — For lightweight skills that should activate on contextual signals, not explicit requests.
@@ -99,31 +105,47 @@ Bad: `Use on any mention of workflows, building, or creating things.` — Over-b
## Role Guidance Format
Every generated workflow SKILL.md includes a brief role statement in the Overview or as a standalone line:
```markdown
Act as {role-guidance}. {brief expertise/approach description}.
```
This provides quick prompt priming for expertise and tone. Workflows may also use full Identity/Communication Style/Principles sections when personality serves the workflow's purpose.
## Path Rules
### Skill-Internal Files
### Same-Folder References
All references to files within the skill use `./` prefix:
- `./references/reference.md`
- `./references/discover.md`
- `./scripts/validate.py`
Use `./` only when referencing a file in the same directory as the file containing the reference:
This distinguishes skill-internal files from `{project-root}` paths — without the `./` prefix the LLM may confuse them.
- From `references/build-process.md` → `./classification-reference.md` (both in references/)
- From `scripts/scan.py` → `./utils.py` (both in scripts/)
### Cross-Directory References
Use bare paths relative to the skill root — no `./` prefix:
- `references/build-process.md`
- `scripts/validate.py`
- `assets/template.md`
These work from any file in the skill because they're always resolved from the skill root. **Never use `./` for cross-directory paths.** `./scripts/foo.py` from a file in `references/` is misleading because `scripts/` is not next to that file.
### Project-Scope Paths
Use `{project-root}/...` for any path relative to the project root:
### Project `_bmad` Paths
Use `{project-root}/_bmad/...`:
- `{project-root}/_bmad/planning/prd.md`
- `{project-root}/docs/report.md`
### Config Variables
Use directly — they already contain `{project-root}` in their resolved values:
- `{output_folder}/file.md`
- `{planning_artifacts}/prd.md`
**Never:**
- `{project-root}/{output_folder}/file.md` (WRONG — double-prefix, config var already has path)
- `_bmad/planning/prd.md` (WRONG — bare `_bmad` must have `{project-root}` prefix)
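The two "Never" cases are structural enough to lint deterministically. The sketch below is one possible approach, not the project's actual scanner: the function name `find_path_violations` is hypothetical, and it assumes config variables are written as lowercase snake_case names in braces.

```python
import re

# Patterns for the two "Never" cases listed above.
# Assumption: config variables look like {output_folder} (lowercase snake_case).
DOUBLE_PREFIX = re.compile(r"\{project-root\}/\{[a-z_]+\}")
# `_bmad/` that is not preceded by a path separator, brace, or word character.
BARE_BMAD = re.compile(r"(?<![\w/}])_bmad/")


def find_path_violations(text: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for path-rule violations."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        if DOUBLE_PREFIX.search(line):
            violations.append((number, "config variable prefixed with {project-root}"))
        if BARE_BMAD.search(line):
            violations.append((number, "bare _bmad path without {project-root}"))
    return violations


sample = "\n".join([
    "Write to {output_folder}/file.md",
    "Write to {project-root}/{output_folder}/file.md",
    "Read _bmad/planning/prd.md",
    "Read {project-root}/_bmad/planning/prd.md",
])
for number, message in find_path_violations(sample):
    print(f"line {number}: {message}")
```

Only lines 2 and 3 of the sample are reported; the correctly prefixed forms on lines 1 and 4 pass.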
@@ -4,19 +4,21 @@ The SKILL-template provides a minimal skeleton: frontmatter, overview, and activ
## Frontmatter
- `{module-code-or-empty}` → Module code prefix with hyphen (e.g., `bmb-`) or empty for standalone
- `{module-code-or-empty}` → Module code prefix with hyphen (e.g., `bmb-`) or empty for standalone. The `bmad-` prefix is reserved for official BMad creations; user skills should not include it.
- `{skill-name}` → Skill functional name (kebab-case)
- `{skill-description}` → Two parts: [5-8 word summary]. [trigger phrases]
## Module Conditionals
### For Module-Based Skills
- `{if-module}` ... `{/if-module}` → Keep the content inside
- `{if-standalone}` ... `{/if-standalone}` → Remove the entire block including markers
- `{module-code}` → Module code without trailing hyphen (e.g., `bmb`)
- `{module-setup-skill}` → Name of the module's setup skill (e.g., `bmad-builder-setup`)
- `{module-setup-skill}` → Name of the module's setup skill (e.g., `mymod-setup`)
### For Standalone Skills
- `{if-module}` ... `{/if-module}` → Remove the entire block including markers
- `{if-standalone}` ... `{/if-standalone}` → Keep the content inside
@@ -26,7 +28,8 @@ The builder determines the rest of the skill structure — body sections, phases
## Path References
All generated skills use `./` prefix for skill-internal paths:
- `./references/{reference}.md` — Reference documents loaded on demand
- `./references/{stage}.md` — Stage prompts (complex workflows)
- `./scripts/` — Python/shell scripts for deterministic operations
All generated skills use paths relative to skill root (cross-directory) or `./` (same-folder):
- `references/{reference}.md` — Reference documents loaded on demand
- `references/{stage}.md` — Stage prompts (complex workflows)
- `scripts/` — Python/shell scripts for deterministic operations