# Awesome-omni-skill Node Tuning Helper Scripts

Generate Tuned manifests and evaluate node tuning snapshots.

## Install

Clone the upstream repo:

```shell
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Claude Code: install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && \
  git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && \
  mkdir -p ~/.claude/skills && \
  cp -r "$T/skills/development/node-tuning-helper-scripts-majiayu000" \
    ~/.claude/skills/diegosouzapw-awesome-omni-skill-node-tuning-helper-scripts && \
  rm -rf "$T"
```

Manifest: `skills/development/node-tuning-helper-scripts-majiayu000/SKILL.md`
# Node Tuning Helper Scripts

Detailed instructions for invoking the helper utilities that back `/node-tuning` commands:

- `generate_tuned_profile.py` renders `tuned.openshift.io/v1` Tuned manifests.
- `analyze_node_tuning.py` inspects live nodes or sosreports for tuning gaps.
## When to Use These Scripts
- Translate structured command inputs into Tuned manifests for the Node Tuning Operator.
- Iterate on generated YAML outside the assistant or integrate the generator into automation.
- Analyze CPU isolation, IRQ affinity, huge pages, sysctl values, and networking counters from live clusters or archived sosreports.
## Prerequisites

- Python 3.8 or newer (`python3 --version`).
- Repository checkout so the scripts under `plugins/node-tuning/skills/scripts/` are accessible.
- Optional: `oc` CLI when validating or applying manifests.
- Optional: Extracted sosreport directory when running the analysis script offline.
- Optional (remote analysis): `oc` CLI access plus a valid `KUBECONFIG` when capturing `/proc`/`/sys` or a sosreport via `oc debug node/<name>`. The sosreport workflow pulls the `registry.redhat.io/rhel9/support-tools` image (override with `--toolbox-image` or `TOOLBOX_IMAGE`) and requires registry access. HTTP(S) proxy env vars from the host are forwarded automatically when present, but using a proxy is optional.
## Script: generate_tuned_profile.py

### Implementation Steps

1. **Collect Inputs**
   - `--profile-name`: Tuned resource name.
   - `--summary`: `[main]` section summary.
   - Repeatable options: `--include`, `--main-option`, `--variable`, `--sysctl`, `--section` (`SECTION:KEY=VALUE`).
   - Target selectors: `--machine-config-label key=value`, `--match-label key[=value]`.
   - Optional: `--priority` (default 20), `--namespace`, `--output`, `--dry-run`.
   - Use `--list-nodes`/`--node-selector` to inspect nodes and `--label-node NODE:KEY[=VALUE]` (plus `--overwrite-labels`) to tag machines.
2. **Inspect or Label Nodes (optional)**

   ```shell
   # List all worker nodes
   python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
     --list-nodes \
     --node-selector "node-role.kubernetes.io/worker" \
     --skip-manifest

   # Label a specific node for the worker-hp pool
   python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
     --label-node ip-10-0-1-23.ec2.internal:node-role.kubernetes.io/worker-hp= \
     --overwrite-labels \
     --skip-manifest
   ```

3. **Render the Manifest**

   ```shell
   python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
     --profile-name "$PROFILE" \
     --summary "$SUMMARY" \
     --sysctl net.core.netdev_max_backlog=16384 \
     --match-label tuned.openshift.io/custom-net \
     --output .work/node-tuning/$PROFILE/tuned.yaml
   ```

   - Omit `--output` to write `<profile-name>.yaml` in the current directory.
   - Add `--dry-run` to print the manifest to stdout.
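As a rough mental model of what the render step produces, here is a hedged Python sketch of the `tuned.openshift.io/v1` manifest shape the generator is described as emitting. The helper function, its defaults (including the `openshift-cluster-node-tuning-operator` namespace), and the field layout are illustrative assumptions, not the script's actual code.

```python
# Illustrative sketch (NOT the script's implementation) of the Tuned
# manifest structure rendered by generate_tuned_profile.py.
def build_tuned_manifest(profile_name, summary, sysctls, match_label,
                         priority=20,
                         namespace="openshift-cluster-node-tuning-operator"):
    # TuneD profile payload: an INI-style [main]/[sysctl] document.
    data = ["[main]", f"summary={summary}", "", "[sysctl]"]
    data += [f"{key}={value}" for key, value in sysctls.items()]
    return {
        "apiVersion": "tuned.openshift.io/v1",
        "kind": "Tuned",
        "metadata": {"name": profile_name, "namespace": namespace},
        "spec": {
            "profile": [{"name": profile_name, "data": "\n".join(data)}],
            "recommend": [{
                "match": [{"label": match_label}],
                "priority": priority,
                "profile": profile_name,
            }],
        },
    }

manifest = build_tuned_manifest(
    "custom-net", "Bump netdev backlog",
    {"net.core.netdev_max_backlog": "16384"},
    "tuned.openshift.io/custom-net")
```

Serializing this dict to YAML yields roughly what `--output` writes; the real script may order fields or name internals differently.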
4. **Review Output**
   - Inspect the generated YAML for accuracy.
   - Optionally format with `yq` or open in an editor for readability.
5. **Validate and Apply**
   - Dry-run: `oc apply --dry-run=client -f <manifest>`.
   - Apply: `oc apply -f <manifest>`.
## Error Handling

- Missing required options raise `ValueError` with descriptive messages.
- The script exits non-zero when no target selectors (`--machine-config-label` or `--match-label`) are supplied.
- Invalid key/value or section inputs identify the failing argument explicitly.
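To illustrate the `SECTION:KEY=VALUE` validation described above, here is a hedged Python sketch; `parse_section_option` is an illustrative name, not the script's actual API, but the error behavior (a `ValueError` naming the failing argument) matches the description.

```python
# Sketch of validating a --section argument of the form SECTION:KEY=VALUE.
def parse_section_option(raw):
    section, colon, rest = raw.partition(":")
    if not colon or not section:
        raise ValueError(f"--section expects SECTION:KEY=VALUE, got {raw!r}")
    key, eq, value = rest.partition("=")
    if not eq or not key:
        raise ValueError(f"--section expects KEY=VALUE after {section!r}, got {raw!r}")
    return section, key, value
```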
## Examples

```shell
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name realtime-worker \
  --summary "Realtime tuned profile" \
  --include openshift-node --include realtime \
  --variable isolated_cores=1 \
  --section 'bootloader:cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}' \
  --machine-config-label machineconfiguration.openshift.io/role=worker-rt \
  --priority 25 \
  --output .work/node-tuning/realtime-worker/tuned.yaml
```

The single quotes around the `--section` value keep the shell from expanding `${not_isolated_cores_expanded}`, which is a TuneD profile variable and must reach the manifest verbatim.

```shell
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name openshift-node-hugepages \
  --summary "Boot time configuration for hugepages" \
  --include openshift-node \
  --section bootloader:cmdline_openshift_node_hugepages="hugepagesz=2M hugepages=50" \
  --machine-config-label machineconfiguration.openshift.io/role=worker-hp \
  --priority 30 \
  --output .work/node-tuning/openshift-node-hugepages/hugepages-tuned-boottime.yaml
```
## Script: analyze_node_tuning.py

### Purpose

Inspect either a live node (`/proc`, `/sys`) or an extracted sosreport snapshot for tuning signals (CPU isolation, IRQ affinity, huge pages, sysctl state, networking counters) and emit actionable recommendations.
### Usage Patterns

- Live node analysis:

  ```shell
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py --format markdown
  ```

- Remote analysis via `oc debug`:

  ```shell
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node worker-rt-0 \
    --kubeconfig ~/.kube/prod \
    --format markdown
  ```

- Collect a sosreport via `oc debug` and analyze locally:

  ```shell
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node worker-rt-0 \
    --toolbox-image registry.example.com/support-tools:latest \
    --sosreport-arg "--case-id=01234567" \
    --sosreport-output .work/node-tuning/sosreports \
    --format json
  ```

- Offline sosreport analysis:

  ```shell
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --sosreport /path/to/sosreport-2025-10-20
  ```

- Automation-friendly JSON:

  ```shell
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --sosreport /path/to/sosreport \
    --format json --output .work/node-tuning/node-analysis.json
  ```
### Implementation Steps

1. **Select data source**
   - Provide `--node <name>` (with optional `--kubeconfig`/`--oc-binary`). By default the helper runs `sosreport` remotely from inside the RHCOS toolbox container (`registry.redhat.io/rhel9/support-tools`). Override the image with `--toolbox-image`, extend the sosreport command with `--sosreport-arg`, or disable the curated OpenShift flags via `--skip-default-sosreport-flags`. Pass `--no-collect-sosreport` to fall back to the direct `/proc` snapshot mode.
   - Provide `--sosreport <dir>` for archived diagnostics; detection finds embedded `proc/` and `sys/`.
   - Omit both switches to query the live filesystem (defaults to `/proc` and `/sys`).
   - Override paths with `--proc-root` or `--sys-root` when the layout differs.
2. **Run analysis**
   - The script parses `cpuinfo`, kernel cmdline parameters (`isolcpus`, `nohz_full`, `tuned.non_isolcpus`), default IRQ affinities, huge page counters, sysctl values (net, vm, kernel), transparent hugepage settings, `netstat`/`sockstat` counters, and `ps` snapshots (when available in sosreport).
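To make the CPU-isolation checks concrete, here is a minimal Python sketch of two of the signals described above: expanding an `isolcpus=` kernel cmdline entry and flagging IRQs whose affinity overlaps the isolated set. The function names are illustrative assumptions, not the script's internals.

```python
def parse_cpu_list(spec):
    """Expand a kernel CPU list like '2-15,18' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def isolated_cpus(cmdline):
    """Return the CPU set named by isolcpus= on a kernel cmdline."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            # isolcpus may carry flags, e.g. isolcpus=managed_irq,domain,2-15;
            # keep only the parts that look like CPU ranges.
            parts = token.split("=", 1)[1].split(",")
            ranges = [p for p in parts if p[:1].isdigit()]
            return parse_cpu_list(",".join(ranges))
    return set()

def irqs_overlapping(isolated, irq_affinities):
    """List IRQ numbers whose affinity set intersects the isolated CPUs."""
    return [irq for irq, cpus in sorted(irq_affinities.items())
            if cpus & isolated]
```

A report line like "4 IRQs overlap isolated CPUs" corresponds to `irqs_overlapping` returning four entries.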
3. **Review the report**
   - Markdown output groups findings by section (System Overview, CPU & Isolation, Huge Pages, Sysctl Highlights, Network Signals, IRQ Affinity, Process Snapshot) and lists recommendations.
   - JSON output contains the same information in structured form for pipelines or dashboards.
4. **Act on recommendations**
   - Apply Tuned profiles, MachineConfig updates, or manual sysctl/irqbalance adjustments.
   - Feed actionable items back into `/node-tuning:generate-tuned-profile` to codify desired state.
## Error Handling

- Missing `proc/` or `sys/` directories trigger descriptive errors.
- Unreadable files are skipped gracefully and noted in observations where relevant.
- Non-numeric sysctl values are flagged for manual investigation.
## Example Output (Markdown excerpt)

```markdown
# Node Tuning Analysis

## System Overview
- Hostname: worker-rt-1
- Kernel: 4.18.0-477.el8
- NUMA nodes: 2
- Kernel cmdline: `BOOT_IMAGE=... isolcpus=2-15 tuned.non_isolcpus=0-1`

## CPU & Isolation
- Logical CPUs: 32
- Physical cores: 16 across 2 socket(s)
- SMT detected: yes
- Isolated CPUs: 2-15

...

## Recommended Actions
- Configure net.core.netdev_max_backlog (>=32768) to accommodate bursty NIC traffic.
- Transparent Hugepages are not disabled (`[never]` not selected). Consider setting to `never` for latency-sensitive workloads.
- 4 IRQs overlap isolated CPUs. Relocate interrupt affinities using tuned profiles or irqbalance.
```
## Follow-up Automation Ideas

- Persist JSON results in `.work/node-tuning/<host>/analysis.json` for historical tracing.
- Gate upgrades by comparing recommendations across nodes.
- Integrate with CI jobs that validate cluster tuning post-change.