claude-skill-registry-data: LVMS Analyzer
Analyzes LVMS must-gather data to diagnose storage issues.
Install by cloning the registry:

```bash
git clone https://github.com/majiayu000/claude-skill-registry-data
```

Or copy the skill directly into your skills directory:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/lvms-analyzer" ~/.claude/skills/majiayu000-claude-skill-registry-data-lvms-analyzer && rm -rf "$T"
```

data/lvms-analyzer/SKILL.md

LVMS Analyzer Skill
This skill provides detailed guidance for analyzing LVMS (Logical Volume Manager Storage) must-gather data to identify and troubleshoot storage issues.
When to Use This Skill
Use this skill when:
- Analyzing LVMS must-gather data offline
- Diagnosing PVCs stuck in Pending state
- Investigating LVMCluster readiness issues
- Troubleshooting volume group creation failures
- Debugging TopoLVM CSI driver problems
- Checking operator health in LVMS namespace
This skill is automatically invoked by the `/lvms:analyze` command when working with must-gather data.
Prerequisites
Required:
- LVMS must-gather directory extracted and accessible
- Must-gather contains the LVMS namespace directory:
  - `namespaces/openshift-lvm-storage/` (newer versions), OR
  - `namespaces/openshift-storage/` (older versions)
- Python 3.6 or higher installed
- PyYAML library: `pip install pyyaml`
Namespace Compatibility:
- The LVMS namespace changed from `openshift-storage` to `openshift-lvm-storage` in recent versions
- The analysis script automatically detects which namespace is present
- Both namespaces are fully supported for backward compatibility
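That auto-detection amounts to probing both directories in order. A minimal sketch (illustrative only; `find_lvms_namespace` and the preference order are assumptions, not the script's actual code):

```python
import os

# Candidate namespace directories, newer naming first
# (the order of preference is an assumption for illustration).
LVMS_NAMESPACES = ["openshift-lvm-storage", "openshift-storage"]

def find_lvms_namespace(must_gather):
    """Return the first LVMS namespace directory present under namespaces/, or None."""
    for ns in LVMS_NAMESPACES:
        if os.path.isdir(os.path.join(must_gather, "namespaces", ns)):
            return ns
    return None
```

If both directories somehow exist, this sketch prefers the newer name; the shipped script may resolve that differently.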
Must-Gather Structure:
```
must-gather/
└── registry-{image-registry}-lvms-must-gather-{version}-sha256-{hash}/
    ├── cluster-scoped-resources/
    │   ├── core/
    │   │   └── persistentvolumes/
    │   │       └── pvc-*.yaml                 # Individual PV files
    │   ├── storage.k8s.io/
    │   │   └── storageclasses/
    │   │       ├── lvms-vg1.yaml
    │   │       └── lvms-vg1-immediate.yaml
    │   └── security.openshift.io/
    │       └── securitycontextconstraints/
    │           └── lvms-vgmanager.yaml
    ├── namespaces/
    │   └── openshift-lvm-storage/             # or openshift-storage for older versions
    │       ├── oc_output/                     # IMPORTANT: Primary location for LVMS resources
    │       │   ├── lvmcluster.yaml            # Full LVMCluster resource with status
    │       │   ├── lvmcluster                 # Text output (oc describe)
    │       │   ├── lvmvolumegroup             # Text output
    │       │   ├── lvmvolumegroupnodestatus   # Text output
    │       │   ├── logicalvolume              # Text output
    │       │   ├── pods                       # Text output (oc get pods)
    │       │   └── events                     # Text output
    │       ├── pods/
    │       │   ├── lvms-operator-{hash}/
    │       │   │   └── lvms-operator-{hash}.yaml
    │       │   └── vg-manager-{hash}/
    │       │       └── vg-manager-{hash}.yaml
    │       └── apps/                          # May contain deployments/daemonsets
    └── ...
```
Key Note: LVMS resources are primarily in the `oc_output/` directory, with `lvmcluster.yaml` being the most important file: it contains the full cluster and node status.
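Because `lvmcluster.yaml` carries the full status block, a quick health probe needs nothing beyond PyYAML. A hedged sketch (field names follow the LVMCluster `status` layout shown in the examples later in this document; `lvmcluster_health` is a hypothetical helper, not part of the shipped script):

```python
import yaml  # PyYAML, listed under Prerequisites

def lvmcluster_health(path):
    """Return (state, ready, list of condition types whose status is not True)."""
    with open(path) as f:
        status = (yaml.safe_load(f) or {}).get("status", {})
    failed = [c.get("type") for c in status.get("conditions", [])
              if c.get("status") != "True"]
    return status.get("state"), status.get("ready"), failed
```

Anything in the returned `failed` list (e.g. `VolumeGroupsReady`) points at where to dig next.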
Implementation Steps
Step 1: Validate Must-Gather Path
Before running analysis, verify the must-gather directory structure:
```bash
# Check if the LVMS namespace directory exists (try both namespaces)
ls {must-gather-path}/namespaces/openshift-lvm-storage 2>/dev/null || \
  ls {must-gather-path}/namespaces/openshift-storage

# Verify required resource directories
ls {must-gather-path}/cluster-scoped-resources/core/persistentvolumes
```
Namespace Detection: The analysis script automatically detects which namespace is present:
- Newer LVMS versions use `openshift-lvm-storage`
- Older LVMS versions use `openshift-storage`
- The script will inform you which namespace was detected
Common Issue: User provides the parent directory instead of the subdirectory
- Must-gather extracts to a directory like `must-gather.local.12345/`
- Inside is a subdirectory like `registry-ci-openshift-org-origin-4-18.../`
- Always use the subdirectory (the one containing `cluster-scoped-resources/` and `namespaces/`)
Handling:
```bash
# If the user provides the parent directory, try to find the correct subdirectory
if [ ! -d "{path}/namespaces/openshift-lvm-storage" ] && \
   [ ! -d "{path}/namespaces/openshift-storage" ]; then
  # Try to find either namespace
  find {path} -type d \( -name "openshift-lvm-storage" -o -name "openshift-storage" \) -path "*/namespaces/*"
  # Suggest the correct path to the user
fi
```
Step 2: Run Analysis Script
Use the Python analysis script for structured analysis:
```bash
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  {must-gather-path}
```
Script Location:
- Always use: `plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py`
- Use the relative path from the repository root
- The script is part of the LVMS plugin
Component-Specific Analysis:
For focused analysis on specific components:
```bash
# Analyze only storage/PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  {must-gather-path} --component storage

# Analyze only operator health
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  {must-gather-path} --component operator

# Analyze only volume groups
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  {must-gather-path} --component volumes

# Analyze only pod logs
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  {must-gather-path} --component logs
```
Step 3: Interpret Analysis Results
The script provides structured output across several sections:
1. LVMCluster Status
Key fields to check:
- `state`: should be "Ready"
- `ready`: should be true
- `conditions`: all should have status "True"
  - ResourcesAvailable: resources deployed successfully
  - VolumeGroupsReady: VGs created on all nodes
Example healthy output:
```
LVMCluster: lvmcluster-sample
✓ State: Ready
✓ Ready: true
Conditions:
  ✓ ResourcesAvailable: True
  ✓ VolumeGroupsReady: True
```
Example unhealthy output (real case from must-gather):
```
LVMCluster: my-lvmcluster
❌ State: Degraded
❌ Ready: false
Conditions:
  ✓ ResourcesAvailable: True
      Reason: ResourcesAvailable
      Message: Reconciliation is complete and all the resources are available
  ❌ VolumeGroupsReady: False
      Reason: VGsDegraded
      Message: One or more VGs are degraded
```
2. Volume Group Status
Checks volume group creation per node and device availability:
Example output (real case from must-gather):
```
Volume Group/Device Class: vg1
Nodes: 3

Node: ocpnode1.ocpiopex.growipx.com
⚠ Status: Progressing
Devices: /dev/mapper/3600a098038315048302b586c38397562, /dev/mapper/mpatha
Excluded devices: 24 device(s)
  - /dev/sdb: /dev/sdb has children block devices and could not be considered
  - /dev/sdb4: /dev/sdb4 has an invalid filesystem signature (xfs) and cannot be used
  - /dev/mapper/3600a098038315047433f586c53477272: has an invalid filesystem signature (xfs)
  ... and 21 more excluded devices

Node: ocpnode2.ocpiopex.growipx.com
❌ Status: Degraded
Reason: failed to create/extend volume group vg1: failed to extend volume group vg1:
  WARNING: VG name vg0 is used by VGs VVnkhP-khYQ-blyc-2TNo-d3cv-b6di-4RbSyY and EUV3xv-ft6q-39xK-J3ki-rglf-9H44-rVIHIq.
  Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
  Physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' is already in volume group 'vg0'
  Unable to add physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' to volume group 'vg0'
  ... (truncated, see LVMCluster status for full details)
Devices: /dev/mapper/mpatha
```
This real example shows a common LVMS issue: duplicate volume group names preventing VG extension.
3. Storage (PVC/PV) Status
Lists pending or failed PVCs:
Example output:
```
Pending PVCs:
  database/postgres-data
  ❌ Status: Pending (10m)
  Storage Class: lvms-vg1
  Requested: 100Gi
  Recent Events:
    ⚠ ProvisioningFailed: no node has enough free space
```
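Conceptually, this check is just a filter over parsed PVC objects for phase `Pending` (an illustrative sketch over already-loaded resources; the script's real file handling and field access may differ):

```python
def pending_pvcs(pvcs):
    """Return (namespace/name, storage class, requested size) for each Pending PVC."""
    out = []
    for pvc in pvcs:
        if pvc.get("status", {}).get("phase") == "Pending":
            meta = pvc.get("metadata", {})
            spec = pvc.get("spec", {})
            out.append((
                f"{meta.get('namespace')}/{meta.get('name')}",
                spec.get("storageClassName"),
                spec.get("resources", {}).get("requests", {}).get("storage"),
            ))
    return out
```

Each tuple maps directly onto the namespace/name, storage class, and requested size shown in the report.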
4. Operator Health
Checks LVMS operator pods, deployments, and daemonsets:
Example issues:
```
❌ vg-manager-abc123 (worker-0)
   Status: CrashLoopBackOff
   Restarts: 15
   Error: volume group "vg1" not found
```
5. Pod Logs
Extracts and analyzes error/warning messages from pod logs:
Example output (from real must-gather):
```
═══════════════════════════════════════════════════════════
POD LOGS ANALYSIS
═══════════════════════════════════════════════════════════

Pod: vg-manager-nz4pc
Unique errors/warnings: 1
❌ 2025-10-28T10:47:28Z: Reconciler error
   Controller: lvmvolumegroup
   Error Details: failed to create/extend volume group vg1: failed to extend volume group vg1:
     WARNING: VG name vg0 is used by VGs WsNJwk-DK3q-tSHg-zvQJ-imF1-SdRv-8oh4e0 ...
     Cannot use /dev/dm-10: device is too small (pv_min_size)
     Command requires all devices to be found.

Pod: lvms-operator-65df9f4dbb-92jwl
Unique errors/warnings: 1
❌ 2025-10-28T10:52:48Z: failed to validate device class setup
   Controller: lvmcluster
   Error: VG vg1 on node Degraded is not in ready state (ocpnode1.ocpiopex.growipx.com)
```
Key Points:
- Logs are parsed from JSON format
- Errors are deduplicated (same error repeated in reconciliation loops)
- Shows unique error messages with first occurrence timestamp
- Provides additional context not visible in resource status
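The parse-and-deduplicate step can be sketched as follows (illustrative; the `level`, `ts`, and `msg` field names are assumptions about the structured log format, not confirmed from the script):

```python
import json

def unique_errors(log_lines):
    """Parse JSON log lines, keep error/warning entries, dedupe by message,
    and record the timestamp of each message's first occurrence."""
    first_seen = {}
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if entry.get("level") not in ("error", "warning"):
            continue
        first_seen.setdefault(entry.get("msg", ""), entry.get("ts"))
    return first_seen
```

A reconciler error repeated every loop iteration thus collapses to one entry with its earliest timestamp.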
Step 4: Analyze Root Causes
Connect related issues to identify root causes:
Common Pattern 1: Device Filesystem Conflict
```
Chain of failures:
1. Device /dev/sdb has existing ext4 filesystem
2. vg-manager cannot create volume group
3. Volume group missing on node
4. PVCs stuck in Pending

Root cause: Device not properly wiped before LVMS use
```
Common Pattern 2: Insufficient Capacity
```
Chain of failures:
1. Thin pool at 95% capacity
2. No free space for new volumes
3. PVCs stuck in Pending

Root cause: Insufficient storage capacity or old volumes not cleaned up
```
Common Pattern 3: Node-Specific Failures
```
Chain of failures:
1. Volume group missing on specific node
2. TopoLVM CSI driver not functional on that node
3. PVCs with node affinity to that node stuck Pending

Root cause: Node-specific device configuration issue
```
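The node-specific pattern is essentially a join on the node name between VG status and pending-PVC affinity. A toy sketch, assuming the per-node summaries have already been extracted from the must-gather:

```python
def node_specific_root_causes(vg_status_by_node, nodes_with_pending_pvcs):
    """Flag nodes where a non-Ready VG lines up with Pending PVCs pinned to that node."""
    findings = []
    for node, status in vg_status_by_node.items():
        if status != "Ready" and node in nodes_with_pending_pvcs:
            findings.append(f"{node}: VG is {status}; PVCs with affinity to this "
                            "node are likely blocked here")
    return findings
```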
Step 5: Generate Remediation Plan
Based on analysis results, provide prioritized recommendations:
CRITICAL Issues (Fix Immediately):

- Device Conflicts:

  ```bash
  # Clean the device on the affected node
  oc debug node/{node-name}
  chroot /host
  wipefs -a /dev/{device}

  # Restart vg-manager to recreate the VG
  oc delete pod -n openshift-lvm-storage -l app.kubernetes.io/component=vg-manager
  ```

- Pod Crashes:

  ```bash
  # After fixing the underlying issue, restart failed pods
  oc delete pod -n openshift-lvm-storage {pod-name}
  ```

- LVMCluster Not Ready:

  ```bash
  # Review and fix the device configuration
  oc edit lvmcluster -n openshift-lvm-storage
  # Ensure devices match actual available devices
  ```
WARNING Issues (Address Soon):

- Capacity Issues:

  ```bash
  # Check logical volume usage
  oc debug node/{node} -- chroot /host lvs --units g
  # Remove unused volumes or expand the thin pool
  ```

- Partial Node Coverage:

  ```bash
  # Investigate why daemonsets are not on all nodes
  oc get nodes --show-labels
  oc describe daemonset -n openshift-lvm-storage
  ```
Step 6: Provide Next Steps
Always provide clear next steps:
- Review logs (if available in the must-gather):
  - Operator logs: `namespaces/openshift-lvm-storage/pods/lvms-operator-*/logs/`
  - VG-manager logs: `namespaces/openshift-lvm-storage/pods/vg-manager-*/logs/`
  - TopoLVM logs: `namespaces/openshift-lvm-storage/pods/topolvm-*/logs/`

- Verify fixes (if the cluster is accessible):

  ```bash
  # After implementing fixes, verify:
  oc get lvmcluster -n openshift-lvm-storage
  oc get lvmvolumegroup -A
  oc get pvc -A | grep Pending
  ```

- Re-collect must-gather (if making changes):

  ```bash
  oc adm must-gather --image=quay.io/lvms_dev/lvms-must-gather:latest
  ```
Error Handling
Script Execution Errors
Script not found:
```bash
# Verify the script exists
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py

# Ensure it's executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
```
Python dependencies missing:
```bash
# Install PyYAML
pip install pyyaml
# Or use pip3
pip3 install pyyaml
```
Invalid YAML in must-gather:
- Script handles YAML parsing errors gracefully
- Reports which files failed to parse
- Continues analysis with available data
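"Gracefully" here means catching parse errors per file instead of aborting the whole run. A minimal sketch of that pattern (not the script's actual code):

```python
import yaml  # PyYAML

def load_yaml_files(paths):
    """Parse each YAML file, keeping successes and failures in separate maps."""
    parsed, failed = {}, {}
    for path in paths:
        try:
            with open(path) as f:
                parsed[path] = yaml.safe_load(f)
        except (OSError, yaml.YAMLError) as exc:
            failed[path] = str(exc)  # reported later; analysis continues
    return parsed, failed
```

The `failed` map is what lets the script report which files could not be parsed while still analyzing the rest.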
Must-Gather Issues
Missing directories:
- Script validates required directories exist
- Reports missing components
- Provides guidance on what's missing
Incomplete must-gather:
- If critical resources missing, script reports what it can analyze
- Suggests re-collecting must-gather
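The validation pass reduces to an existence check over the directories the analysis needs (a sketch; the exact required set here is an assumption based on the structure described above):

```python
import os

# Paths expected under the must-gather root (assumed minimal set).
REQUIRED_DIRS = [
    "cluster-scoped-resources/core/persistentvolumes",
    "namespaces",
]

def missing_components(must_gather):
    """Return the expected subdirectories absent from the must-gather."""
    return [d for d in REQUIRED_DIRS
            if not os.path.isdir(os.path.join(must_gather, d))]
```

A non-empty result is what drives the "reports what it can analyze" and "re-collect must-gather" suggestions.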
Examples
Example 1: Full Analysis
```bash
# Run a comprehensive analysis
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  ./must-gather/registry-ci-openshift-org-origin-4-18.../
```
Output:
```
═══════════════════════════════════════════════════════════
LVMCLUSTER STATUS
═══════════════════════════════════════════════════════════
LVMCluster: lvmcluster-sample
❌ State: Failed
❌ Ready: false
...

═══════════════════════════════════════════════════════════
LVMS ANALYSIS SUMMARY
═══════════════════════════════════════════════════════════
❌ CRITICAL ISSUES: 3
  - LVMCluster not Ready (state: Failed)
  - Volume group vg1 not created on worker-0
  - 3 PVCs stuck in Pending state
```
Example 2: Storage-Only Analysis
```bash
# Focus on PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  ./must-gather/... --component storage
```
Analyzes only:
- PVC/PV status
- Storage class configuration
- Volume provisioning issues
Example 3: Operator Health Check
```bash
# Check operator components
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
  ./must-gather/... --component operator
```
Analyzes only:
- LVMCluster resource
- Deployments and daemonsets
- Pod status and crashes
Best Practices
- Always validate the path first:
  - Check for the `namespaces/openshift-lvm-storage/` directory
  - Use the correct subdirectory, not the parent
- Run full analysis first:
  - Get the overall health picture
  - Then drill down with component-specific analysis if needed
- Correlate issues:
  - Look for patterns across components
  - Connect pod failures to VG issues to PVC problems
- Check timestamps:
  - Events and pod restarts have timestamps
  - They help you understand the sequence of failures
- Provide actionable output:
  - Don't just list issues
  - Explain root causes
  - Give specific remediation steps
  - Include verification commands
- Reference documentation:
  - Link to the LVMS troubleshooting guide
  - Point to relevant sections in the must-gather logs