Marketplace gasp-diagnostics
System diagnostics using GASP (General AI Specialized Process monitor). Use when user asks about Linux system performance, requests system checks, mentions GASP, asks to diagnose hosts, or says things like "check my system" or "what's wrong with [hostname]". Can actively fetch GASP metrics from hosts via HTTP or interpret provided JSON output.
Repository: https://github.com/aiskillstore/marketplace

Install:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/acceleratedindustries/gasp-diagnostics" ~/.claude/skills/aiskillstore-marketplace-gasp-diagnostics && rm -rf "$T"
```
`skills/acceleratedindustries/gasp-diagnostics/SKILL.md`

GASP Diagnostics
Enables comprehensive Linux system diagnostics using GASP's AI-optimized monitoring output. Actively fetches metrics from hosts and provides intelligent analysis with context-aware interpretation.
Fetching GASP Metrics
When user mentions a host or requests a system check:
- Fetch the metrics endpoint: `web_fetch("http://{hostname}:8080/metrics")`
- Hostname formats supported:
  - mDNS/local: `accelerated.local`, `hyperion.local`
  - DNS names: `proxmox1`, `dev-server`, `workstation`
  - IP addresses: `192.168.1.100`
- Default port: 8080 (unless the user specifies otherwise)
- Error handling:
  - Host unreachable: inform the user, suggest checking whether GASP is running
  - Port closed/refused: suggest `systemctl status gasp` on the host
  - JSON parse error: GASP may not be installed, or the endpoint is wrong
  - Timeout: network issue or host down
- Multi-host queries: if the user mentions multiple hosts, fetch each in sequence and compare
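The fetch-and-error-handling flow above can be sketched in Python. This is a minimal sketch, not part of GASP itself; the endpoint shape comes from this skill, and `metrics_url`/`fetch_gasp_metrics` are illustrative names:

```python
import json
import urllib.error
import urllib.request


def metrics_url(host: str, port: int = 8080) -> str:
    """Build the GASP metrics endpoint URL (default port 8080)."""
    return f"http://{host}:{port}/metrics"


def fetch_gasp_metrics(host: str, port: int = 8080, timeout: float = 5.0):
    """Fetch and parse GASP metrics.

    Returns (metrics_dict, None) on success, or (None, advice) mapping each
    failure mode to the suggestion described above.
    """
    try:
        with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
            return json.load(resp), None
    except urllib.error.URLError:
        # Host unreachable, or port closed/refused
        return None, f"Cannot reach {host}:{port} - check `systemctl status gasp` on the host"
    except TimeoutError:
        return None, "Timeout - network issue or host down"
    except json.JSONDecodeError:
        return None, "Response is not valid JSON - GASP may not be installed, or wrong endpoint"
```

For multiple hosts, calling `fetch_gasp_metrics` once per host and collecting the results mirrors the fetch-each-in-sequence step.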
Quick Diagnosis Workflow
For any system check request:
- Fetch metrics from the specified host(s)
- Check summary first: look at `summary.health` and `summary.concerns[]`
- Identify issues using the metric correlations below
- Report findings with severity and specific recommendations
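The summary-first step could look like this (a sketch; only `summary.health` and `summary.concerns[]` are fields taken from the GASP output described here):

```python
def quick_triage(metrics: dict) -> str:
    """Summarize GASP output: check summary.health first, then pre-analyzed concerns."""
    summary = metrics.get("summary", {})
    health = summary.get("health", "unknown")
    concerns = summary.get("concerns", [])
    if health == "healthy" and not concerns:
        return "Healthy - no action needed"
    lines = [f"Health: {health}"]
    lines += [f"- concern: {c}" for c in concerns]
    return "\n".join(lines)
```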
Trigger Examples
These user messages should trigger this skill and active fetching:
- "Check hyperion for me"
- "What's going on with accelerated.local?"
- "Is proxmox1 having issues?"
- "Compare hyperion and proxmox1"
- "Why is my system slow?" (fetch localhost)
- "Diagnose 192.168.1.50"
- "Check all my proxmox nodes"
Metric Interpretation
Health Summary
- `summary.health`: Quick assessment
  - "healthy": No action needed
  - "degraded": Issues present but not critical
  - "critical": Immediate attention required
- `summary.concerns[]`: Pre-analyzed issues to investigate first
- `summary.recent_changes[]`: Context for current state
CPU Analysis
Load ratio = `load_avg_1m / cores`:
- < 0.7: Normal usage
- 0.7-1.0: Busy but healthy
- 1.0-2.0: Saturated (may cause slowness)
- > 2.0: Severe overload
Key indicators:
- `trend`: "increasing" is concerning even if current load is acceptable
- `baseline_load`: delta from baseline is more important than the absolute value
- `top_processes[]`: check for unexpected CPU hogs
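The load-ratio thresholds above translate directly into code; a minimal sketch:

```python
def load_status(load_avg_1m: float, cores: int) -> str:
    """Classify CPU saturation from the 1-minute load average vs core count."""
    ratio = load_avg_1m / cores
    if ratio < 0.7:
        return "normal"
    if ratio <= 1.0:
        return "busy but healthy"
    if ratio <= 2.0:
        return "saturated"
    return "severe overload"
```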
Memory Analysis
Red flags (priority order):
- `oom_kills_recent > 0`: CRITICAL - the system killed processes; find the memory hog immediately
- `swap_used_mb > 0`: performance degradation in progress
- `pressure_pct > 5%`: system struggling with memory contention
- `usage_percent > 90%`: getting close to limits
Important: Linux uses memory for cache, so a high `usage_percent` alone is normal. Focus on pressure and swap.
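Applying the red flags in priority order might look like this (field names are from this document; the flat `mem` dict layout is an assumption):

```python
def memory_red_flags(mem: dict) -> list[str]:
    """Return memory issues in the priority order described above."""
    flags = []
    if mem.get("oom_kills_recent", 0) > 0:
        flags.append("CRITICAL: recent OOM kills - find the memory hog immediately")
    if mem.get("swap_used_mb", 0) > 0:
        flags.append("swapping: performance degradation in progress")
    if mem.get("pressure_pct", 0) > 5:
        flags.append("memory pressure above 5%")
    if mem.get("usage_percent", 0) > 90:
        # High usage alone can be page cache; only a concern alongside pressure/swap
        flags.append("usage above 90% - check whether it is just cache")
    return flags
```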
Disk I/O
Saturation indicators:
- `io_wait_ms > 10`: significant disk bottleneck
- `queue_depth` consistently high: disk can't keep up
- High `read_iops` or `write_iops` with slow response: disk performance issue
Storage capacity:
- `usage_percent > 90%`: running out of space
- `usage_percent > 95%`: critical - will cause failures soon
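A sketch combining the I/O and capacity thresholds (the flat `disk` dict layout is assumed):

```python
def disk_status(disk: dict) -> list[str]:
    """Flag I/O saturation and capacity issues per the thresholds above."""
    flags = []
    if disk.get("io_wait_ms", 0) > 10:
        flags.append("significant disk bottleneck (io_wait)")
    usage = disk.get("usage_percent", 0)
    if usage > 95:
        flags.append("CRITICAL: >95% full - failures imminent")
    elif usage > 90:
        flags.append("running out of space (>90%)")
    return flags
```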
Network
- `rx_bytes_per_sec` / `tx_bytes_per_sec`: check for unexpected traffic spikes
- `errors > 0` or `drops > 0`: network hardware/configuration issue
- Large number of `time_wait` connections: may indicate a connection leak
Process Intelligence
- `zombie > 0`: process management bug (usually benign but indicates an issue)
- Processes in `D` state: stuck in uninterruptible sleep (disk or kernel issue)
- `new_since_last[]`: check for unexpected process spawning
Systemd Services
- `units_failed > 0`: check the `failed_units[]` array
- `recent_restarts[]`: may indicate instability
Log Summary
- `errors_last_interval`: an elevated error rate indicates problems
- `message_rate_per_min`: spikes suggest a logging storm or a serious issue
- Review `recent_errors[]` for specific problems
Desktop Metrics (when present)
- `gpu.utilization_pct` vs CPU: identify GPU-bound vs CPU-bound workloads
- `gpu.temperature_c > 85`: thermal throttling likely
- `active_window`: provides context for resource usage
Common System Patterns
Development Workstation (Expected)
- High memory usage from IDEs, browsers
- Firefox/Chrome often in top memory consumers
- Docker daemon using CPU/memory
- VSCode, JetBrains IDEs in top processes
- Baseline load: 10-30% of cores
Container Host (Expected)
- Elevated baseline load (many processes)
- dockerd/containerd in top processes
- 50-70% memory usage normal
- Many processes in top list
Proxmox/Virtualization Host (Expected)
- Baseline load proportional to VM count
- Consistent low-level resource usage
- ~2GB overhead for Proxmox itself
- Multiple QEMU/KVM processes
GPU Workload (Expected)
- High GPU utilization with lower CPU
- Significant GPU memory usage
- Common for: rendering, ML inference, gaming
Multi-Host Analysis
When checking multiple hosts:
- Fetch all hosts first (parallel thinking)
- Compare baselines: Identify outliers
- Look for correlations: Network event vs individual host issue
- Check recent_changes: Migrations, deployments, package updates
- Identify the odd one out: Which host differs from the pattern?
Example analysis pattern:
```
Host 1: Load 2.1/8 cores (26%), normal
Host 2: Load 7.8/8 cores (97%), ATTENTION NEEDED  ← outlier
Host 3: Load 1.9/8 cores (24%), normal
```

Focus on Host 2 - investigate `top_processes[]`.
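The outlier hunt above can be expressed as a short pass over per-host load ratios (the input shape is illustrative, not GASP's actual multi-host format):

```python
def find_outliers(hosts: dict[str, tuple[float, int]], threshold: float = 0.7) -> list[str]:
    """Given {host: (load_avg_1m, cores)}, return hosts whose load ratio
    exceeds the threshold - the 'odd one out' to investigate first."""
    return [h for h, (load, cores) in hosts.items() if load / cores > threshold]
```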
Diagnosis Strategies
"System is slow"
- Check load ratio (CPU saturation?)
- Check io_wait (disk bottleneck?)
- Check memory pressure (swapping?)
- Identify top consumer in relevant category
- Assess if consumption is expected for that process
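The first three checks of this list can be sketched as a single triage pass (the `cpu`/`disk`/`memory` nesting is an assumed layout; field names come from this skill):

```python
def slow_system_checks(m: dict) -> list[str]:
    """Walk the 'system is slow' checklist: CPU saturation, disk wait, memory pressure."""
    findings = []
    cpu, disk, mem = m.get("cpu", {}), m.get("disk", {}), m.get("memory", {})
    cores = cpu.get("cores", 1)
    if cpu.get("load_avg_1m", 0) / cores > 1.0:
        findings.append("CPU saturated - check top_processes[]")
    if disk.get("io_wait_ms", 0) > 10:
        findings.append("disk bottleneck - check queue_depth and IOPS")
    if mem.get("pressure_pct", 0) > 5 or mem.get("swap_used_mb", 0) > 0:
        findings.append("memory pressure/swapping - find top memory consumer")
    return findings
```

Whatever this pass flags, the remaining steps (identify the top consumer, judge whether its consumption is expected) still need human-style interpretation.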
"High memory usage"
- First: Check pressure_pct (real issue or just cache?)
- Check swap_used_mb (actual problem?)
- Find top memory consumers
- Check process uptime (leak or normal?)
- Compare to baseline (delta more important than absolute)
"Unexpected behavior"
- Check recent_changes for clues
- Review systemd failed units
- Check recent_errors in logs
- Look for new processes since last snapshot
- Compare current metrics to baseline
Reporting Guidelines
When reporting findings:
- Start with verdict: "Healthy", "Issue found", "Critical problem"
- Be specific: Name the process/service causing issues
- Provide context: Is this expected for this host type?
- Give actionable recommendations: What should user do?
- Include relevant metrics: Back up findings with data
Good example:
"Issue found on accelerated.local: Memory pressure at 8.2%. The postgres container started swapping 2 hours ago and is now using 12GB RAM (up from 4GB baseline). This likely indicates a query leak. Recommend checking recent queries and restarting the container."
Bad example:
"Memory usage is high. You might want to look into it."
Advanced Diagnostics
For complex issues or when initial analysis is unclear, consult:
- references/diagnostic-workflows.md - Detailed diagnostic procedures
- references/common-patterns.md - Infrastructure-specific patterns
Using with Provided JSON
If user pastes GASP JSON instead of requesting a fetch:
- Analyze the provided JSON using all guidance above
- Don't attempt to fetch (data already provided)
- Apply same interpretation and reporting guidelines