# skills-for-fabric · spark-consumption-cli

## Install

Clone the upstream repo:

```bash
git clone https://github.com/microsoft/skills-for-fabric
```

Claude Code — install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/microsoft/skills-for-fabric "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/spark-consumption-cli" ~/.claude/skills/microsoft-skills-for-fabric-spark-consumption-cli \
  && rm -rf "$T"
```

Manifest: `skills/spark-consumption-cli/SKILL.md`
## Update Check — ONCE PER SESSION (mandatory)

The first time this skill is used in a session, run the check-updates skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare the local vs. remote `package.json` version.
- Skip if the check was already performed earlier in this session.
## CRITICAL NOTES

- To find workspace details (including its ID) from a workspace name: list all workspaces, then filter with JMESPath.
- To find item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then filter with JMESPath.
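The name-to-ID lookup can be sketched as follows. Here `az` is stubbed with canned JSON so the filtering step runs offline; real invocations use `az rest` with a JMESPath `--query` such as `"value[?displayName=='Finance'].id | [0]"` (the workspace names and IDs below are invented for illustration).

```shell
# Stub standing in for the real Azure CLI: returns a canned workspace listing.
az() {
  echo '{"value":[{"displayName":"Sales","id":"ws-1"},{"displayName":"Finance","id":"ws-2"}]}'
}

# Resolve the ID of the workspace named "Finance" from the listing.
workspaceId=$(az rest --method get --url "$FABRIC_API_URL/workspaces" |
  python3 -c 'import json,sys; print(next(w["id"] for w in json.load(sys.stdin)["value"] if w["displayName"] == "Finance"))')
echo "$workspaceId"
```

The same pattern resolves item IDs: list items of one type in the workspace, then filter on `displayName`.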
# Data Engineering Consumption — CLI Skill

## Table of Contents
| Task | Reference | Notes |
|---|---|---|
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires Storage token, not Fabric token |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — read first; needed to resolve a workspace ID from its name, or an item ID from its name, type, and workspace ID |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | flows and token acquisition |
| Fabric Control-Plane API via `az rest` | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass `--resource`, or auth fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via `curl` | COMMON-CLI.md § OneLake Data Access via curl | Use `curl`, not `az rest` (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | `sqlcmd` (Go): connect, query, CSV export |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | audience, shell escaping, token expiry |
| Quick Reference: `az rest` Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience / CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which token + tool for each service |
| Relationship to SPARK-AUTHORING-CORE.md | SPARK-CONSUMPTION-CORE.md § Relationship to SPARK-AUTHORING-CORE.md | |
| Data Engineering Consumption Capability Matrix | SPARK-CONSUMPTION-CORE.md § Data Engineering Consumption Capability Matrix | |
| OneLake Table APIs (Schema-enabled Lakehouses) | SPARK-CONSUMPTION-CORE.md § OneLake Table APIs (Schema-enabled Lakehouses) | Unity Catalog-compatible metadata; requires Storage token |
| Livy Session Management | SPARK-CONSUMPTION-CORE.md § Livy Session Management | Session creation, states, lifecycle, termination |
| Interactive Data Exploration | SPARK-CONSUMPTION-CORE.md § Interactive Data Exploration | Statement execution, output retrieval, data discovery |
| PySpark Analytics Patterns | SPARK-CONSUMPTION-CORE.md § PySpark Analytics Patterns | Cross-lakehouse 3-part naming, performance optimization |
| Must/Prefer/Avoid | SKILL.md § Must/Prefer/Avoid | MUST DO / AVOID / PREFER checklists |
| Quick Start | SKILL.md § Quick Start | CLI-specific Livy session setup and data exploration |
| Key Fabric Patterns | SKILL.md § Key Fabric Patterns | Spark pattern quick-reference table |
| Session Cleanup | SKILL.md § Session Cleanup | Clean up idle Livy sessions via CLI |
## Must/Prefer/Avoid

### MUST DO
- Check for existing idle sessions before creating new ones
- Use dynamic workspace/lakehouse discovery
- Follow API patterns from COMMON-CLI.md
### PREFER
- `sqldw-consumption-cli` for simple lakehouse queries — row counts, SELECT, schema exploration, filtering, and aggregation on lakehouse Delta tables should use the SQL Endpoint via `sqlcmd`, not Spark. Only use this skill when the user explicitly requests PySpark, DataFrames, or Spark-specific features.
- SQL Endpoint for Delta tables
- Livy for unstructured/JSON data or complex Python analytics
- Session reuse over creation
### AVOID
- Hardcoded workspace IDs
- Creating unnecessary sessions
- Large result sets without LIMIT
## Quick Start

### Environment Setup

Apply the environment detection pattern from COMMON-CORE.md § Environment Detection Pattern to set:
- `$FABRIC_API_BASE` and `$FABRIC_RESOURCE_SCOPE`
- `$FABRIC_API_URL`
- `$LIVY_API_PATH` for Livy operations

Authentication: use token acquisition from COMMON-CLI.md § Environment Detection and API Configuration.
### Workspace & Item Discovery
Preferred: Use COMMON-CLI.md item discovery patterns (Finding things in Fabric) to find workspaces and items by name.
Fallback (when workspace is already known):
```bash
# List workspaces
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces" \
  --query "value[].{name:displayName, id:id}" --output table
read -p "Workspace ID: " workspaceId

# List lakehouses in workspace
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/items?type=Lakehouse" \
  --query "value[].{name:displayName, id:id}" --output table
read -p "Lakehouse ID: " lakehouseId
```
### Session Management
```bash
# Check for existing idle session (avoid resource waste)
sessionId=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" \
  --query "sessions[?state=='idle'][0].id" --output tsv)

# Create if none available - FORCE STARTER POOL USAGE
if [[ -z "$sessionId" ]]; then
  cat > /tmp/body.json << 'EOF'
{
  "name": "analysis",
  "driverMemory": "56g",
  "driverCores": 8,
  "executorMemory": "56g",
  "executorCores": 8,
  "conf": {
    "spark.dynamicAllocation.enabled": "true",
    "spark.fabric.pool.name": "Starter Pool"
  }
}
EOF
  sessionId=$(az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
    --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" \
    --body @/tmp/body.json --query "id" --output tsv)

  echo "⏳ Waiting for starter pool session to be ready..."
  # With starter pools, this should be 3-5 seconds
  timeout=30  # Reduced from 90s since starter pools are fast
  while [ $timeout -gt 0 ]; do
    state=$(az rest --resource "$FABRIC_RESOURCE_SCOPE" \
      --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId" \
      --query "state" --output tsv)
    if [[ "$state" == "idle" ]]; then
      echo "✅ Session ready in starter pool!"
      break
    fi
    echo "  Session state: $state (${timeout}s remaining)"
    sleep 3
    timeout=$((timeout - 3))
  done
fi
```
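The control flow of the readiness wait can be checked offline by stubbing out the session-state call. The state sequence below is invented, and `sleep` is dropped to keep the illustration fast; only the loop/timeout logic matches the pattern above.

```shell
# Stubbed state sequence: the "session" reports 'starting' twice, then 'idle'.
states="starting starting idle"
timeout=30
ready=""
while [ $timeout -gt 0 ]; do
  set -- $states                   # pop the next stubbed state
  state=$1
  states="${states#$1}"; states="${states# }"
  if [ "$state" = "idle" ]; then
    ready="yes"                    # session reached the usable state
    break
  fi
  timeout=$((timeout - 3))         # same 3s-step countdown as the real loop
done
echo "$ready"
```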
### Data Exploration (Fabric-Specific Patterns)
```bash
# Execute statement (LLM knows Python/Spark syntax)
cat > /tmp/body.json << 'EOF'
{
  "code": "spark.sql(\"SHOW TABLES\").show(); df = spark.table(\"your_table\"); df.describe().show()",
  "kind": "pyspark"
}
EOF
az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" \
  --body @/tmp/body.json
```
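After POSTing a statement, its result is retrieved by polling the statement resource until `state` is `available` and reading `output.data["text/plain"]`. A sketch of that parsing step, with `fetch_statement` as a stub returning a canned Livy-style response (the payload values are invented):

```shell
# Stub standing in for GET .../statements/{statementId}.
fetch_statement() {
  echo '{"id":0,"state":"available","output":{"status":"ok","data":{"text/plain":"| 42|"}}}'
}

resp=$(fetch_statement)
state=$(printf '%s' "$resp" | python3 -c 'import json,sys; print(json.load(sys.stdin)["state"])')
if [ "$state" = "available" ]; then
  # Extract the rendered text output of the statement.
  result=$(printf '%s' "$resp" | python3 -c 'import json,sys; print(json.load(sys.stdin)["output"]["data"]["text/plain"])')
fi
echo "$result"
```

In a real session, repeat the fetch with a short sleep while `state` is still `waiting` or `running`.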
## Key Fabric Patterns

| Pattern | Code | Use Case |
|---|---|---|
| Table Discovery | `spark.sql("SHOW TABLES").show()` | List available tables |
| Cross-Lakehouse | `spark.table("workspace.lakehouse.table")` | Query across workspaces |
| Delta Features | `DESCRIBE HISTORY`, `VERSION AS OF` | Time travel, versioning |
| Schema Evolution | `df.printSchema()` | Understand structure |
## Session Cleanup
```bash
# Clean up idle sessions (optional)
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" \
  --query "sessions[?state=='idle'].id" --output tsv |
xargs -I {} az rest --method delete --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/{}"
```
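The semantics of the `sessions[?state=='idle'].id` filter can be sanity-checked offline by stubbing `az` with canned JSON (the session IDs below are invented):

```shell
# Stub standing in for the real Azure CLI: returns a canned session listing.
az() {
  echo '{"sessions":[{"id":"s1","state":"idle"},{"id":"s2","state":"busy"},{"id":"s3","state":"idle"}]}'
}

# Keep only the IDs of idle sessions — the same selection the JMESPath query performs.
idleIds=$(az rest --method get --url "$FABRIC_API_URL/.../sessions" |
  python3 -c 'import json,sys; print(" ".join(s["id"] for s in json.load(sys.stdin)["sessions"] if s["state"] == "idle"))')
echo "$idleIds"
```

Only idle sessions are deleted; busy sessions are left untouched.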
Focus: this skill provides Fabric-specific REST API patterns. The LLM already knows Python/Spark syntax — the focus here is Fabric integration, session management, and API endpoints.