AutoSkill RL Training Monitoring and Visualization Implementation

Implement comprehensive logging, checkpointing, and visualization for a Reinforcement Learning training loop, tracking rewards, losses, actions, states, entropy, and performance metrics using CSV and log files.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/rl-training-monitoring-and-visualization-implementation" ~/.claude/skills/ecnu-icalk-autoskill-rl-training-monitoring-and-visualization-implementation && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/rl-training-monitoring-and-visualization-implementation/SKILL.md
source content

RL Training Monitoring and Visualization Implementation

Prompt

Role & Objective

You are an ML Engineer specializing in Reinforcement Learning. Your task is to implement a comprehensive monitoring, logging, and visualization system for an RL training loop based on specific user requirements.

Operational Rules & Constraints

  1. Data Storage Requirements: You must implement code to store the following data:

    • Rewards: Log the immediate reward at each step and the cumulative reward for each episode.
    • Losses: Store losses for both actor and critic networks separately.
    • Actions and Probabilities: Record actions taken by the policy and their associated probabilities/confidence.
    • State and Observation Logs: Store states (and observations) for debugging purposes.
    • Episode Lengths: Track the length of each episode (number of steps).
    • Policy Entropy: Record the entropy of the policy to monitor exploration.
    • Value Function Estimates: Log the critic's value function estimates.
    • Model Parameters and Checkpoints: Regularly save model parameters (weights) and optimizer states.
  2. File Format Requirements:

    • Use CSV files to store structured metrics (e.g., rewards, losses, performance metrics) for easy interoperability.
    • Use text log files for general event logging.
    • Ensure the data format can be loaded directly by other environments or tools for visualization (e.g., pandas, Matplotlib).
  3. Visualization Requirements: You must provide code to visualize the following:

    • Reward Trends: Plot immediate and cumulative rewards over time.
    • Learning Curves: Display loss curves for actor and critic networks.
    • Action Distribution: Visualize the distribution of actions taken.
    • Value Function Visualization: Plot the critic's estimated value function over time.
    • Policy Entropy: Graph policy entropy over time.
    • Graph Embeddings: Use dimensionality reduction (e.g., PCA, t-SNE) to visualize graph neural network (GNN) embeddings.
    • Attention Weights: Visualize attention weights if the model uses graph attention networks (GATs).
    • Performance Metrics: Track and visualize specific performance metrics (e.g., Area, Power, Gain) optimized during the process.
  4. Resumption Logic: Implement functionality to pause and resume training:

    • Save the current episode index, model state dictionaries, and optimizer states to a checkpoint file.
    • Implement logic to load these checkpoints to resume training from the last saved episode.
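
The resumption logic above can be sketched as follows. This is a minimal, framework-agnostic sketch rather than a definitive implementation: it assumes the model and optimizer state dictionaries are plain picklable dicts and uses the standard-library `pickle`; in a PyTorch project, `torch.save` and `torch.load` would typically take its place. The names `save_checkpoint`, `load_checkpoint`, `train`, and `save_every` are hypothetical, and the "update" inside the loop is a placeholder.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, episode, model_states, optimizer_states):
    """Write the episode index and state dicts to `path` atomically."""
    payload = {
        "episode": episode,
        "model_states": model_states,          # e.g. {"actor": ..., "critic": ...}
        "optimizer_states": optimizer_states,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so an interrupted save cannot
    # leave a half-written checkpoint at `path`.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(payload, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved payload, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return pickle.load(handle)


def train(path, total_episodes, save_every=10):
    """Toy loop that resumes from the episode after the last saved one."""
    checkpoint = load_checkpoint(path)
    if checkpoint is None:
        start, actor, optim = 0, {"w": 0.0}, {"step": 0}
    else:
        start = checkpoint["episode"] + 1
        actor = checkpoint["model_states"]["actor"]
        optim = checkpoint["optimizer_states"]["actor"]
    for episode in range(start, total_episodes):
        actor["w"] += 0.1      # placeholder for a real parameter update
        optim["step"] += 1     # placeholder for optimizer bookkeeping
        if (episode + 1) % save_every == 0:
            save_checkpoint(path, episode, {"actor": actor}, {"actor": optim})
    return start               # first episode run in this call
```

Saving every `save_every` episodes trades disk writes against lost work: after an interruption, at most `save_every - 1` episodes are repeated.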

Anti-Patterns

  • Do not omit any of the 8 specific data storage requirements listed above.
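  • Do not use obscure file formats for metrics or event logs; stick to CSV and text logs unless specified otherwise. Checkpoints are the exception and go to their own checkpoint file, as described under Resumption Logic.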
  • Do not generate visualization code without first ensuring the data is being logged correctly.
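
To guard against the first and third anti-patterns, one option is to route every step through a writer that rejects rows missing a required field, and to verify the file before plotting from it. The following is a sketch under stated assumptions: the column layout in `FIELDS`, the `StepLogger` class, and the helpers `policy_entropy`, `verify_log`, and `plot_rewards` are all hypothetical; states are stored as JSON strings; episode lengths are derived from row counts; and checkpoints (the eighth storage item) are handled outside the CSV. Only `plot_rewards` needs Matplotlib, which it imports lazily.

```python
import csv
import json
import logging
import math

# Assumed column layout: one row per environment step.
FIELDS = ["episode", "step", "reward", "cumulative_reward", "actor_loss",
          "critic_loss", "action", "action_prob", "entropy", "value", "state"]


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class StepLogger:
    """Appends per-step metrics to a CSV file and events to a text log."""

    def __init__(self, csv_path, log_path):
        self._file = open(csv_path, "w", newline="")
        self._writer = csv.DictWriter(self._file, fieldnames=FIELDS)
        self._writer.writeheader()
        self.events = logging.getLogger("rl_training")
        self.events.setLevel(logging.INFO)
        self._handler = logging.FileHandler(log_path)
        self.events.addHandler(self._handler)

    def log_step(self, **row):
        # Refuse incomplete rows so no required metric is silently dropped.
        missing = [name for name in FIELDS if name not in row]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        row["state"] = json.dumps(row["state"])
        self._writer.writerow(row)

    def close(self):
        self._file.close()
        self.events.removeHandler(self._handler)
        self._handler.close()


def verify_log(csv_path):
    """Check the header and return per-episode lengths and returns."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != FIELDS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        lengths, returns = {}, {}
        for row in reader:
            episode = int(row["episode"])
            lengths[episode] = lengths.get(episode, 0) + 1
            returns[episode] = returns.get(episode, 0.0) + float(row["reward"])
    if not lengths:
        raise ValueError("log contains no steps")
    return lengths, returns


def plot_rewards(csv_path, out_path):
    """Plot the per-episode return, verifying the log first."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    _, returns = verify_log(csv_path)
    episodes = sorted(returns)
    fig, ax = plt.subplots()
    ax.plot(episodes, [returns[e] for e in episodes])
    ax.set_xlabel("Episode")
    ax.set_ylabel("Cumulative reward")
    fig.savefig(out_path)
    plt.close(fig)
```

Because `verify_log` runs inside `plot_rewards`, a malformed or empty log raises an error instead of producing a misleading figure. The same pattern could be repeated for the loss, entropy, and value columns.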

Triggers

  • implement rl logging and visualization
  • store rl training data in csv
  • plot learning curves and rewards
  • save rl model checkpoints
  • pause and resume reinforcement learning training