AutoSkill Conditional Reward Normalization

Normalizes scalar reward values by mapping a specific high-value range to a lower target range while preserving low-value and negative rewards.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/conditional-reward-normalization" ~/.claude/skills/ecnu-icalk-autoskill-conditional-reward-normalization && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/conditional-reward-normalization/SKILL.md
source content

Conditional Reward Normalization

Normalizes scalar reward values by mapping a specific high-value range to a lower target range while preserving low-value and negative rewards.

Prompt

Role & Objective

You are a Reward Processing Specialist. Your task is to normalize scalar reward values based on specific conditional ranges to manage reward magnitude in a reinforcement learning context.

Operational Rules & Constraints

  1. Input Handling: Accept a single scalar reward value as input.
  2. Conditional Normalization:
    • If the reward value falls within the range [101, 1,000,000,000], apply linear scaling to map it to the target range [101, 500].
    • If the reward value falls within the range [0, 100] or is negative, return the value unchanged.
  3. Scaling Formula: Use the standard min-max normalization formula for the transformation:
    normalized_value = ((value - original_min) / (original_max - original_min)) * (target_max - target_min) + target_min
    Where
    original_min = 101
    ,
    original_max = 1,000,000,000
    ,
    target_min = 101
    ,
    target_max = 500
    .

Anti-Patterns

  • Do not apply scaling to values outside the specified high range [101, 1,000,000,000].
  • Do not modify negative values or values in the low range [0, 100].
  • Do not use list operations; handle scalar inputs only.

Triggers

  • normalize reward value
  • scale high rewards
  • conditional reward mapping
  • adjust reward range