Awesome-omni-skill Node Tuning Helper Scripts

Generate Tuned manifests and evaluate node tuning snapshots.

Install

Clone the upstream repo:

    git clone https://github.com/diegosouzapw/awesome-omni-skill

Claude Code: install into `~/.claude/skills/`:

    T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/node-tuning-helper-scripts-majiayu000" ~/.claude/skills/diegosouzapw-awesome-omni-skill-node-tuning-helper-scripts && rm -rf "$T"

Manifest: `skills/development/node-tuning-helper-scripts-majiayu000/SKILL.md`

Source content

Node Tuning Helper Scripts

Detailed instructions for invoking the helper utilities that back the `/node-tuning` commands:

  • `generate_tuned_profile.py` renders Tuned manifests (`tuned.openshift.io/v1`).
  • `analyze_node_tuning.py` inspects live nodes or sosreports for tuning gaps.

When to Use These Scripts

  • Translate structured command inputs into Tuned manifests for the Node Tuning Operator.
  • Iterate on generated YAML outside the assistant or integrate the generator into automation.
  • Analyze CPU isolation, IRQ affinity, huge pages, sysctl values, and networking counters from live clusters or archived sosreports.

Prerequisites

  • Python 3.8 or newer (`python3 --version`).
  • A repository checkout so the scripts under `plugins/node-tuning/skills/scripts/` are accessible.
  • Optional: the `oc` CLI when validating or applying manifests.
  • Optional: an extracted sosreport directory when running the analysis script offline.
  • Optional (remote analysis): `oc` CLI access plus a valid `KUBECONFIG` when capturing `/proc` / `/sys` data or a sosreport via `oc debug node/<name>`. The sosreport workflow pulls the `registry.redhat.io/rhel9/support-tools` image (override with `--toolbox-image` or `TOOLBOX_IMAGE`) and requires registry access. HTTP(S) proxy environment variables from the host are forwarded automatically when present, but using a proxy is optional.
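
For remote analysis, the environment can be staged up front. A minimal sketch; the kubeconfig path, mirror image, proxy, and node name below are placeholder assumptions:

    # Point oc at the target cluster (placeholder path)
    export KUBECONFIG=~/.kube/prod

    # Optional: override the toolbox image, e.g. with a mirrored copy
    export TOOLBOX_IMAGE=registry.example.com/rhel9/support-tools:latest

    # Optional: proxy settings; forwarded automatically when present
    export HTTPS_PROXY=http://proxy.example.com:3128

    # Sanity-check access before invoking the helpers
    oc whoami && oc get node worker-rt-0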

Script: `generate_tuned_profile.py`

Implementation Steps

  1. Collect Inputs

    • `--profile-name`: Tuned resource name.
    • `--summary`: `[main]` section summary.
    • Repeatable options: `--include`, `--main-option`, `--variable`, `--sysctl`, `--section` (`SECTION:KEY=VALUE`).
    • Target selectors: `--machine-config-label key=value`, `--match-label key[=value]`.
    • Optional: `--priority` (default 20), `--namespace`, `--output`, `--dry-run`.
    • Use `--list-nodes` / `--node-selector` to inspect nodes and `--label-node NODE:KEY[=VALUE]` (plus `--overwrite-labels`) to tag machines.
  2. Inspect or Label Nodes (optional)

    # List all worker nodes
    python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py --list-nodes --node-selector "node-role.kubernetes.io/worker" --skip-manifest
    
    # Label a specific node for the worker-hp pool
    python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
      --label-node ip-10-0-1-23.ec2.internal:node-role.kubernetes.io/worker-hp= \
      --overwrite-labels \
      --skip-manifest
    
  3. Render the Manifest

    python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
      --profile-name "$PROFILE" \
      --summary "$SUMMARY" \
      --sysctl net.core.netdev_max_backlog=16384 \
      --match-label tuned.openshift.io/custom-net \
      --output .work/node-tuning/$PROFILE/tuned.yaml
    
    • Omit `--output` to write `<profile-name>.yaml` in the current directory.
    • Add `--dry-run` to print the manifest to stdout.
  4. Review Output

    • Inspect the generated YAML for accuracy.
    • Optionally format with `yq` or open in an editor for readability.
  5. Validate and Apply

    • Dry-run: `oc apply --dry-run=client -f <manifest>`.
    • Apply: `oc apply -f <manifest>`.
    • An end-to-end sketch follows this list.
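
Putting steps 3–5 together, a minimal end-to-end sketch (the profile name and sysctl value reuse the step 3 example):

    PROFILE=custom-net
    OUT=.work/node-tuning/$PROFILE/tuned.yaml

    python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
      --profile-name "$PROFILE" \
      --summary "Custom networking profile" \
      --sysctl net.core.netdev_max_backlog=16384 \
      --match-label tuned.openshift.io/custom-net \
      --output "$OUT"

    # Validate client-side, then apply once the YAML looks right
    oc apply --dry-run=client -f "$OUT"
    oc apply -f "$OUT"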

Error Handling

  • Missing required options raise `ValueError` with descriptive messages.
  • The script exits non-zero when no target selector (`--machine-config-label` or `--match-label`) is supplied.
  • Invalid key/value or section inputs identify the failing argument explicitly.
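
Since failures surface as a non-zero exit status, wrappers can guard on it directly. A minimal sketch, assuming `$PROFILE`, `$SUMMARY`, and `$OUT` are set as in the steps above:

    if ! python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
        --profile-name "$PROFILE" \
        --summary "$SUMMARY" \
        --match-label tuned.openshift.io/custom-net \
        --output "$OUT"; then
      echo "manifest generation failed; check the reported argument" >&2
      exit 1
    fi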

Examples

python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name realtime-worker \
  --summary "Realtime tuned profile" \
  --include openshift-node --include realtime \
  --variable isolated_cores=1 \
  --section 'bootloader:cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}' \
  --machine-config-label machineconfiguration.openshift.io/role=worker-rt \
  --priority 25 \
  --output .work/node-tuning/realtime-worker/tuned.yaml

python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name openshift-node-hugepages \
  --summary "Boot time configuration for hugepages" \
  --include openshift-node \
  --section bootloader:cmdline_openshift_node_hugepages="hugepagesz=2M hugepages=50" \
  --machine-config-label machineconfiguration.openshift.io/role=worker-hp \
  --priority 30 \
  --output .work/node-tuning/openshift-node-hugepages/hugepages-tuned-boottime.yaml
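
For orientation, the hugepages invocation above should render a Tuned resource of roughly the following shape (a sketch based on the `tuned.openshift.io/v1` API; exact field order and the default namespace depend on the script). Piping it through a client-side dry run is a quick structural check:

    cat <<'EOF' | oc apply --dry-run=client -f -
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: openshift-node-hugepages
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - name: openshift-node-hugepages
        data: |
          [main]
          summary=Boot time configuration for hugepages
          include=openshift-node
          [bootloader]
          cmdline_openshift_node_hugepages=hugepagesz=2M hugepages=50
      recommend:
      - machineConfigLabels:
          machineconfiguration.openshift.io/role: worker-hp
        priority: 30
        profile: openshift-node-hugepages
    EOF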

Script: `analyze_node_tuning.py`

Purpose

Inspect either a live node (`/proc`, `/sys`) or an extracted sosreport snapshot for tuning signals (CPU isolation, IRQ affinity, huge pages, sysctl state, networking counters) and emit actionable recommendations.

Usage Patterns

  • Live node analysis
    python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py --format markdown
    
  • Remote analysis via oc debug
    python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
      --node worker-rt-0 \
      --kubeconfig ~/.kube/prod \
      --format markdown
    
  • Collect sosreport via oc debug and analyze locally
    python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
      --node worker-rt-0 \
      --toolbox-image registry.example.com/support-tools:latest \
      --sosreport-arg "--case-id=01234567" \
      --sosreport-output .work/node-tuning/sosreports \
      --format json
    
  • Offline sosreport analysis
    python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
      --sosreport /path/to/sosreport-2025-10-20
    
  • Automation-friendly JSON
    python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
      --sosreport /path/to/sosreport \
      --format json --output .work/node-tuning/node-analysis.json
    

Implementation Steps

  1. Select data source
    • Provide `--node <name>` (with optional `--kubeconfig` / `--oc-binary`). By default the helper runs `sosreport` remotely from inside the RHCOS toolbox container (`registry.redhat.io/rhel9/support-tools`). Override the image with `--toolbox-image`, extend the sosreport command with `--sosreport-arg`, or disable the curated OpenShift flags via `--skip-default-sosreport-flags`. Pass `--no-collect-sosreport` to fall back to the direct `/proc` snapshot mode.
    • Provide `--sosreport <dir>` for archived diagnostics; detection finds the embedded `proc/` and `sys/` directories.
    • Omit both switches to query the live filesystem (defaults to `/proc` and `/sys`).
    • Override paths with `--proc-root` or `--sys-root` when the layout differs.
  2. Run analysis
    • The script parses `cpuinfo`, kernel cmdline parameters (`isolcpus`, `nohz_full`, `tuned.non_isolcpus`), default IRQ affinities, huge page counters, sysctl values (net, vm, kernel), transparent hugepage settings, `netstat`/`sockstat` counters, and `ps` snapshots (when available in a sosreport).
  3. Review the report
    • Markdown output groups findings by section (System Overview, CPU & Isolation, Huge Pages, Sysctl Highlights, Network Signals, IRQ Affinity, Process Snapshot) and lists recommendations.
    • JSON output contains the same information in structured form for pipelines or dashboards; see the sketch after this list.
  4. Act on recommendations
    • Apply Tuned profiles, MachineConfig updates, or manual sysctl/irqbalance adjustments.
    • Feed actionable items back into `/node-tuning:generate-tuned-profile` to codify the desired state.
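
Once the JSON report exists on disk, recommendations can be pulled out for pipelines. A minimal sketch, assuming the report mirrors the markdown sections with a top-level list of recommendation strings (the `recommendations` key is a guess; check the actual schema):

    python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
      --sosreport /path/to/sosreport \
      --format json --output .work/node-tuning/node-analysis.json

    # Print one recommendation per line (key name assumed)
    jq -r '.recommendations[]?' .work/node-tuning/node-analysis.json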

Error Handling

  • Missing `proc/` or `sys/` directories trigger descriptive errors.
  • Unreadable files are skipped gracefully and noted in observations where relevant.
  • Non-numeric sysctl values are flagged for manual investigation.

Example Output (Markdown excerpt)

# Node Tuning Analysis

## System Overview
- Hostname: worker-rt-1
- Kernel: 4.18.0-477.el8
- NUMA nodes: 2
- Kernel cmdline: `BOOT_IMAGE=... isolcpus=2-15 tuned.non_isolcpus=0-1`

## CPU & Isolation
- Logical CPUs: 32
- Physical cores: 16 across 2 socket(s)
- SMT detected: yes
- Isolated CPUs: 2-15
...

## Recommended Actions
- Configure net.core.netdev_max_backlog (>=32768) to accommodate bursty NIC traffic.
- Transparent Hugepages are not disabled (`[never]` not selected). Consider setting to `never` for latency-sensitive workloads.
- 4 IRQs overlap isolated CPUs. Relocate interrupt affinities using tuned profiles or irqbalance.

Follow-up Automation Ideas

  • Persist JSON results in `.work/node-tuning/<host>/analysis.json` for historical tracing.
  • Gate upgrades by comparing recommendations across nodes.
  • Integrate with CI jobs that validate cluster tuning post-change.
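
A starting point for the per-node tracking above: loop over nodes, store one report each, and diff the recommendation lists (the node names and the `recommendations` key are assumptions):

    for NODE in worker-rt-0 worker-rt-1; do
      mkdir -p ".work/node-tuning/$NODE"
      python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
        --node "$NODE" \
        --format json \
        --output ".work/node-tuning/$NODE/analysis.json"
    done

    # Compare recommendations across the two nodes (key name assumed)
    diff <(jq -r '.recommendations[]?' .work/node-tuning/worker-rt-0/analysis.json) \
         <(jq -r '.recommendations[]?' .work/node-tuning/worker-rt-1/analysis.json)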