Skills-for-fabric skill-test

install
source · Clone the upstream repo
git clone https://github.com/microsoft/skills-for-fabric
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/microsoft/skills-for-fabric "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.github/skills/skill-test" ~/.claude/skills/microsoft-skills-for-fabric-skill-test && rm -rf "$T"
manifest: .github/skills/skill-test/SKILL.md
source content

Skill Test — skills-for-fabric Evaluation Framework

Manage the end-to-end evaluation framework for skills-for-fabric. This skill routes requests to the correct workflow based on user intent — adding tests, listing tests, running tests, viewing results, generating data, or checking coverage.

When to Use

  • When a contributor wants to add evaluation test cases for a new or existing skill
  • When someone asks to see what tests exist or what results look like
  • When a user wants to run the test suite
  • When reviewing eval metrics or checking which skills lack test coverage

Intent Routing

Parse the user request and route to the appropriate workflow:

| User Intent | Trigger Phrases | Action |
|---|---|---|
| Add evals | "add tests", "add evals", "add evals for missing skills", "create eval plan" | Workflow: Add Evals |
| List tests | "list tests", "list evals", "show me the list of tests", "what tests exist", "show eval plans" | Workflow: List Tests |
| Run tests | "run tests", "run evals", "execute tests", "run the eval suite" | Workflow: Run Tests |
| View results | "show eval results", "test results", "eval results", "executive summary" | Workflow: View Results |
| Generate data | "generate eval data", "generate test data", "create eval datasets" | Workflow: Generate Data |
| View metrics | "eval metrics", "test metrics", "what metrics", "how are tests scored" | Workflow: View Metrics |
| Check coverage | "test coverage", "which skills have tests", "missing tests", "skills without evals" | Workflow: Check Coverage |
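The routing table above can be sketched as a simple substring matcher. The helper below is hypothetical (the skill routes by parsing intent in conversation, not by running shell), and the patterns are abridged from the trigger-phrase column:

```shell
# Hypothetical intent router; patterns abridged from the trigger phrases.
# First match wins, so more specific phrases are listed before generic ones.
route_intent() {
  case "$1" in
    *"add test"*|*"add eval"*|*"create eval plan"*)                       echo "add-evals" ;;
    *"list test"*|*"list eval"*|*"what tests exist"*|*"show eval plans"*) echo "list-tests" ;;
    *"run test"*|*"run eval"*|*"execute tests"*)                          echo "run-tests" ;;
    *"eval results"*|*"test results"*|*"executive summary"*)              echo "view-results" ;;
    *"generate"*)                                                         echo "generate-data" ;;
    *"metrics"*|*"how are tests scored"*)                                 echo "view-metrics" ;;
    *"coverage"*|*"missing tests"*|*"without evals"*)                     echo "check-coverage" ;;
    *)                                                                    echo "unknown" ;;
  esac
}

route_intent "please run tests now"   # prints run-tests
```

Pattern order matters here: "check test coverage" must fall through the results patterns ("test results") to reach the coverage branch.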

Workflow: Add Evals

Follow the instructions in tests/full-eval-tests/README.md § "Adding Evals for New Skills".

Automated Path (Recommended)

Give the agent the prompt:

Add evals for the missing skills

The agent will:

  1. Detect missing skills by comparing installed skills against existing eval plans in tests/full-eval-tests/plan/03-individual-skills/
  2. Generate individual eval plans (plan/03-individual-skills/eval-<skill-name>.md) with 10–12 test cases
  3. Generate combined eval plans (plan/04-combined-skills/eval-<skill>-authoring-plus-consumption.md)
  4. Create golden data in tests/full-eval-tests/evalsets/expected-results/
  5. Update tracking files: plan/00-overview.md, README.md, plan/04-combined-skills/eval-full-pipeline.md
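For one missing skill, the paths created in steps 2–4 follow these patterns. A sketch with a placeholder skill name; the combined-plan name assumes the full skill name is reused, which the real generator may shorten:

```shell
# Hypothetical helper echoing the artifact paths for one missing skill.
plan_paths() {
  base="tests/full-eval-tests"
  echo "$base/plan/03-individual-skills/eval-$1.md"
  echo "$base/plan/04-combined-skills/eval-$1-authoring-plus-consumption.md"
  echo "$base/evalsets/expected-results/"
}

plan_paths "example-skill"
```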

Manual Path

To add evals for a specific skill <new-skill>:

  1. Create tests/full-eval-tests/plan/03-individual-skills/eval-<new-skill>.md using the template in the README
  2. Give each test case a Case ID (unique prefix), Prompt, Expected result, and Pass criteria, and include at least one negative/ambiguous test case
  3. If the skill has an authoring+consumption pair, create tests/full-eval-tests/plan/04-combined-skills/eval-<new-skill>-authoring-plus-consumption.md
  4. Add golden data to tests/full-eval-tests/evalsets/expected-results/
  5. Update plan/00-overview.md, the README.md directory tree, and plan/04-combined-skills/eval-full-pipeline.md

Eval Plan Template

Use the template from tests/full-eval-tests/README.md § "Eval Plan Template". Every eval plan must include:

  • Skill overview (name, category, R/W, purpose)
  • Pre-requisites
  • Numbered test cases (XX-01 through XX-10+) with Prompt / Expected / Pass criteria
  • At least one negative/ambiguous test case as the last case
  • Write Operations table (if the skill writes data)
  • Expected Token Range
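The required sections can be sketched as a file skeleton. The authoritative template lives in tests/full-eval-tests/README.md; the headings below are paraphrased from the bullet list above, not copied from that template:

```shell
# Write an illustrative eval-plan skeleton for a hypothetical read-only
# skill (so the Write Operations table is omitted).
cat > /tmp/eval-example.md <<'EOF'
# Eval Plan: <skill-name>

## Skill Overview
Name: <skill-name> | Category: ... | R/W: Read | Purpose: ...

## Pre-requisites
- ...

## Test Cases

### XX-01
Prompt: ...
Expected: ...
Pass criteria: ...

### XX-10 (negative/ambiguous; always the last case)
Prompt: ...
Expected: the skill is not invoked, or the agent asks for clarification
Pass criteria: ...

## Expected Token Range
...
EOF
```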

Workflow: List Tests

Show the user what eval plans and test cases exist.

Individual Skill Evals

List files in tests/full-eval-tests/plan/03-individual-skills/:

ls tests/full-eval-tests/plan/03-individual-skills/

Combined Skill Evals

List files in tests/full-eval-tests/plan/04-combined-skills/:

ls tests/full-eval-tests/plan/04-combined-skills/

Quick Tests (tests.json)

Show the test cases defined in tests/tests.json — these are the prompt-based tests run by the test runner.

Recommended Execution Order

| Order | Eval Plan | Reason |
|---|---|---|
| 1 | eval-check-updates.md | Verify skills are installed |
| 2 | eval-spark-authoring.md | Create lakehouses and load data |
| 3 | eval-sqldw-authoring.md | Create warehouse tables and load data |
| 4 | eval-eventhouse-authoring.md | Create Eventhouse tables and ingest data |
| 5 | eval-spark-consumption.md | Read back lakehouse data |
| 6 | eval-sqldw-consumption.md | Read back warehouse data |
| 7 | eval-eventhouse-consumption.md | Read back Eventhouse data |
| 8 | eval-medallion.md | End-to-end medallion pipeline |

Workflow: Run Tests

⛔ DO NOT execute tests from this skill. The agent must NEVER run copilot, run-full-tests.ps1, or any eval prompt directly. Instead, tell the user the exact commands to run manually.

When the user asks to run tests, respond only with instructions. Do not execute any commands. Tell the user:

  1. Open a terminal and navigate to the tests/ directory at the repository root:

    cd tests

  2. Run the full test suite:

    .\run-full-tests.ps1

  3. To specify an output directory:

    .\run-full-tests.ps1 -TestFolder C:\temp\eval-run-01

Important

  • The agent must NEVER run tests itself — only provide the user with instructions
  • Tests must be run by the user from inside the tests/ folder
  • The script copies the eval framework to a working folder and launches copilot there

Workflow: View Results

Show the user existing evaluation results.

Detailed Results

Read tests/full-eval-tests/eval-results.md — contains per-skill, per-test-case pass/fail with notes, consistency test results, failure analysis, and skip reasons.

Executive Summary

Read tests/full-eval-tests/executive-summary.md — contains the high-level summary: overall pass rate, results by skill, data consistency scores, failure analysis, and recommendations.

Key Metrics from Latest Run

| Metric | Value |
|---|---|
| Overall pass rate | 94.7% (54/57 executed) |
| Write/Read consistency | 100% (5/5 exact matches) |
| Total test cases | 74 |
| Skipped | 17 |

Workflow: Generate Data

Generate synthetic evaluation datasets using the specifications in tests/full-eval-tests/plan/01-data-generation.md.

Using the Generation Script

python tests/full-eval-tests/evalsets/data-generation/generate.py

Datasets

| Dataset | Rows | Format | Used By |
|---|---|---|---|
| sales_transactions | 100 / 1K / 10K | CSV | SQL DW, Spark |
| customers | 100 | CSV | Join testing |
| products | 50 | CSV | Join testing |
| sensor_readings | 500 | JSON | Spark semi-structured |

Golden Results

Pre-computed expected results are in tests/full-eval-tests/evalsets/expected-results/ and are used to verify consistency.


Workflow: View Metrics

Explain the evaluation metrics defined in tests/full-eval-tests/plan/02-metrics.md.

| Metric | Definition |
|---|---|
| Success Rate | passed / total × 100 — whether the skill executed correctly |
| Token Usage | Input + output tokens consumed per eval prompt |
| Read/Write Consistency | Data written by authoring skill must be exactly retrievable by consumption skill |
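As a sanity check, applying the Success Rate formula to the latest-run numbers from the Key Metrics section (54 passed of 57 executed) reproduces the quoted 94.7%:

```shell
# passed / total × 100, rounded to one decimal place
awk 'BEGIN { printf "%.1f\n", 54 / 57 * 100 }'   # prints 94.7
```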

Grading

| Grade | Criteria |
|---|---|
| PASS | Skill invoked correctly, output matches expected |
| FAIL_INVOCATION | Wrong skill invoked or not invoked |
| FAIL_EXECUTION | Skill invoked but errored |
| FAIL_RESULT | Skill completed but output mismatches |
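One way to read the grades: invocation is checked first, then execution, then output. A hypothetical sketch of that decision order (the real grading rules are defined in plan/02-metrics.md):

```shell
# Hypothetical grader: args are invoked? / executed-without-error? /
# output-matched?, each "yes" or "no". Earlier failures mask later ones.
grade() {
  if   [ "$1" != "yes" ]; then echo "FAIL_INVOCATION"
  elif [ "$2" != "yes" ]; then echo "FAIL_EXECUTION"
  elif [ "$3" != "yes" ]; then echo "FAIL_RESULT"
  else echo "PASS"; fi
}

grade yes yes no   # prints FAIL_RESULT
```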

Pass Thresholds

| Metric | Threshold |
|---|---|
| Success Rate | ≥ 90% per skill |
| Token Usage | Within 2× of baseline |
| Read/Write Consistency | 100% exact match |
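The three thresholds combine into a single pass/fail check. A hypothetical helper, assuming rate and consistency are percentages and token usage is compared against a numeric baseline:

```shell
# Hypothetical threshold gate: exits 0 only if all three thresholds hold.
# $1 success rate (%), $2 tokens used, $3 token baseline, $4 consistency (%)
check_thresholds() {
  awk -v r="$1" -v t="$2" -v b="$3" -v c="$4" \
      'BEGIN { exit !(r >= 90 && t <= 2 * b && c == 100) }'
}

check_thresholds 94.7 1000 800 100 && echo "within thresholds"
```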

Workflow: Check Coverage

Compare installed skills against existing eval plans to identify gaps.

Steps

  1. List all skills from the marketplace/plugin:

    check-updates, spark-authoring-cli, spark-consumption-cli, sqldw-authoring-cli,
    sqldw-consumption-cli, eventhouse-authoring-cli, eventhouse-consumption-cli, e2e-medallion-architecture
    
  2. List existing individual eval plans:

    ls tests/full-eval-tests/plan/03-individual-skills/
    
  3. Compare and report which skills have eval coverage and which are missing.

  4. For missing skills, suggest running the Add Evals workflow.
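The comparison in steps 2 and 3 can be sketched with comm. This version is self-contained (it fabricates a temporary plans directory instead of reading the real one) and assumes plan filenames follow eval-<skill-name>.md exactly; the actual files may shorten names (e.g. eval-spark-authoring.md), so real matching may need a mapping step:

```shell
# Stand-in for tests/full-eval-tests/plan/03-individual-skills/ with two
# covered skills; replace with the real directory in practice.
PLANS=$(mktemp -d)
touch "$PLANS/eval-check-updates.md" "$PLANS/eval-spark-authoring-cli.md"

# All installed skills, sorted (list from step 1).
printf '%s\n' check-updates spark-authoring-cli spark-consumption-cli \
  sqldw-authoring-cli sqldw-consumption-cli eventhouse-authoring-cli \
  eventhouse-consumption-cli e2e-medallion-architecture | sort > /tmp/skills.txt

# Skills that already have an eval plan, derived from the filenames.
ls "$PLANS" | sed -e 's/^eval-//' -e 's/\.md$//' | sort > /tmp/covered.txt

# Skills with no matching eval plan:
comm -23 /tmp/skills.txt /tmp/covered.txt
```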


Must

  • NEVER execute tests, eval prompts, or the test runner script — only provide instructions for the user to run manually
  • Always route "run tests" to the tests/ folder — tell the user to navigate there and run run-full-tests.ps1
  • Follow the eval plan template when creating new eval plans — every test case needs Case ID, Prompt, Expected, Pass criteria
  • Include at least one negative/ambiguous test in every new eval plan
  • Update tracking files when adding evals — plan/00-overview.md, README.md, eval-full-pipeline.md
  • Reference golden data in evalsets/expected-results/ for consistency tests

Prefer

  • Automated eval generation ("add evals for missing skills") over manual creation
  • Reading existing eval plans as templates before creating new ones
  • Running the full suite rather than individual tests for comprehensive coverage
  • Checking the executive summary before diving into detailed results

Avoid

  • Running eval prompts outside the tests/ directory
  • Executing tests, copilot commands, or the run-full-tests.ps1 script — the agent must only tell the user how to run them
  • Creating eval plans without a negative/ambiguous test case
  • Duplicating content already in the README or plan documents — reference them instead
  • Modifying golden result files unless the underlying data generation rules change
  • Skipping the tracking file updates when adding new evals

Examples

Adding Tests

User: "Add tests for the powerbi-consumption-cli skill"

Agent: Creates tests/full-eval-tests/plan/03-individual-skills/eval-powerbi-consumption.md with 10–12 test cases following the template, adds golden data to evalsets/expected-results/, and updates tracking files.

Listing Tests

User: "What tests exist?"

Agent: Lists all eval plans in plan/03-individual-skills/ and plan/04-combined-skills/, plus the quick tests in tests/tests.json, with the recommended execution order.

Running Tests

User: "Run the tests"

Agent:

Tests must be run from the tests/ folder. Navigate there and execute:

cd tests
.\run-full-tests.ps1

Viewing Results

User: "Show me the eval results"

Agent: Reads tests/full-eval-tests/eval-results.md and presents the summary table, highlighting pass rates and any failures.