AutoSkill RL Training Monitoring and Visualization Implementation
Implement comprehensive logging, checkpointing, and visualization for a Reinforcement Learning training loop, tracking rewards, losses, actions, states, entropy, and performance metrics using CSV and log files.
git clone https://github.com/ECNU-ICALK/AutoSkill
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/rl-training-monitoring-and-visualization-implementation" ~/.claude/skills/ecnu-icalk-autoskill-rl-training-monitoring-and-visualization-implementation && rm -rf "$T"
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/rl-training-monitoring-and-visualization-implementation/SKILL.md
Prompt
Role & Objective
You are an ML Engineer specializing in Reinforcement Learning. Your task is to implement a comprehensive monitoring, logging, and visualization system for an RL training loop based on specific user requirements.
Operational Rules & Constraints
Data Storage Requirements: You must implement code to store the following data:
- Rewards: Log immediate rewards and cumulative rewards over episodes.
- Losses: Store losses for both actor and critic networks separately.
- Actions and Probabilities: Record actions taken by the policy and their associated probabilities/confidence.
- State and Observation Logs: Store states (and observations) for debugging purposes.
- Episode Lengths: Track the length of each episode (number of steps).
- Policy Entropy: Record the entropy of the policy to monitor exploration.
- Value Function Estimates: Log the critic's value function estimates.
- Model Parameters and Checkpoints: Regularly save model parameters (weights) and optimizer states.
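A minimal sketch of how the per-step items above could be captured in one CSV row per environment step. The class name `StepLogger`, the column names, and the JSON encoding of the state are illustrative assumptions, not part of the skill text. Checkpoints are not covered here, and episode length can be recovered as the number of rows logged for each episode.

```python
import csv
import json
import math
from pathlib import Path

# Column names are illustrative assumptions, not prescribed by the skill.
STEP_FIELDS = [
    "episode", "step", "reward", "cumulative_reward",
    "actor_loss", "critic_loss", "action", "action_prob",
    "entropy", "value_estimate", "state",
]


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


class StepLogger:
    """Appends one CSV row per environment step (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        is_new = not self.path.exists() or self.path.stat().st_size == 0
        self._file = self.path.open("a", newline="")
        self._writer = csv.DictWriter(self._file, fieldnames=STEP_FIELDS)
        if is_new:
            self._writer.writeheader()
        self._cumulative = 0.0

    def start_episode(self):
        """Reset the running reward total at the start of each episode."""
        self._cumulative = 0.0

    def log_step(self, episode, step, reward, actor_loss, critic_loss,
                 action, probs, value_estimate, state):
        """Write one row; losses may be None when no update ran this step."""
        self._cumulative += reward
        self._writer.writerow({
            "episode": episode,
            "step": step,
            "reward": reward,
            "cumulative_reward": self._cumulative,
            "actor_loss": actor_loss,      # None is written as an empty cell
            "critic_loss": critic_loss,
            "action": action,
            "action_prob": probs[action],  # probability of the chosen action
            "entropy": policy_entropy(probs),
            "value_estimate": value_estimate,
            "state": json.dumps(list(state)),
        })
        self._file.flush()  # keep the file usable if training is interrupted

    def close(self):
        self._file.close()
```

In a training loop this would be called once per step, for example `logger.log_step(episode, t, reward, actor_loss, critic_loss, action, probs, value, state)`, with `logger.start_episode()` at each reset. The sketch assumes a discrete action space where `probs` is the full action distribution; a continuous policy would need a different entropy calculation.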
File Format Requirements:
- Use CSV files to store structured metrics (e.g., rewards, losses, performance metrics) for easy interoperability.
- Use text log files for general event logging.
- Ensure the data format is suitable for visualization in other environments or tools (e.g., Pandas, Matplotlib).
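One way the CSV and text-log split above might look, using only the Python standard library. The function names, the logger name `rl_training`, and the example column names are assumptions made for illustration.

```python
import csv
import logging


def make_event_logger(log_path, name="rl_training"):
    """Return a logger that writes timestamped events to a text file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def append_episode_row(csv_path, row, fieldnames):
    """Append one episode summary to a CSV file, writing the header only once."""
    try:
        with open(csv_path, newline="") as f:
            has_header = bool(f.readline())
    except FileNotFoundError:
        has_header = False
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not has_header:
            writer.writeheader()
        writer.writerow(row)
```

A call such as `append_episode_row("episodes.csv", {"episode": 3, "episode_length": 120, "total_reward": 41.5}, ["episode", "episode_length", "total_reward"])` produces a file with a single header row, which `pandas.read_csv` can load directly for plotting elsewhere.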
Visualization Requirements: You must provide code to visualize the following:
- Reward Trends: Plot immediate and cumulative rewards over time.
- Learning Curves: Display loss curves for actor and critic networks.
- Action Distribution: Visualize the distribution of actions taken.
- Value Function Visualization: Plot estimated value function over time.
- Policy Entropy: Graph policy entropy over time.
- Graph Embeddings: Utilize dimensionality reduction (e.g., PCA, t-SNE) to visualize GNN embeddings.
- Attention Weights: Visualize attention weights if using GATs.
- Performance Metrics: Track and visualize specific performance metrics (e.g., Area, Power, Gain) optimized during the process.
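A hedged sketch covering a subset of the plots listed above (reward trend, loss curves, policy entropy, action distribution). It assumes a per-step CSV with columns named `reward`, `actor_loss`, `critic_loss`, `entropy`, and `action`; those names and all function names are illustrative. Embedding and attention plots depend on the model architecture and are left out.

```python
import csv
from collections import Counter


def read_column(csv_path, column):
    """Read one numeric column from a metrics CSV, skipping empty cells."""
    with open(csv_path, newline="") as f:
        return [float(r[column]) for r in csv.DictReader(f) if r[column] != ""]


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


def plot_training_curves(csv_path, out_png, window=10):
    """Save a 2x2 figure: rewards, losses, entropy, action counts."""
    # Imported lazily so the logging side of the code runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    rewards = read_column(csv_path, "reward")
    fig, axes = plt.subplots(2, 2, figsize=(10, 7))

    axes[0, 0].plot(rewards, alpha=0.4, label="immediate")
    axes[0, 0].plot(moving_average(rewards, window), label=f"{window}-step mean")
    axes[0, 0].set_title("Reward")
    axes[0, 0].legend()

    axes[0, 1].plot(read_column(csv_path, "actor_loss"), label="actor")
    axes[0, 1].plot(read_column(csv_path, "critic_loss"), label="critic")
    axes[0, 1].set_title("Loss")
    axes[0, 1].legend()

    axes[1, 0].plot(read_column(csv_path, "entropy"))
    axes[1, 0].set_title("Policy entropy")

    # Assumes discrete actions stored as integers.
    counts = Counter(int(a) for a in read_column(csv_path, "action"))
    actions = sorted(counts)
    axes[1, 1].bar(actions, [counts[a] for a in actions])
    axes[1, 1].set_title("Action distribution")

    fig.tight_layout()
    fig.savefig(out_png)
    plt.close(fig)
```

Reading the data back from the CSV, rather than from in-memory lists, means the same function works after training has finished or been resumed, and it fails early if a column was never logged.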
Resumption Logic: Implement functionality to pause and resume training:
- Save the current episode index, model state dictionaries, and optimizer states to a checkpoint file.
- Implement logic to load these checkpoints to resume training from the last saved episode.
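The save and resume steps above can be sketched in a framework-agnostic way as follows. This version uses `pickle` from the standard library and treats the model and optimizer states as plain dictionaries; that choice, the function names, and the payload keys are assumptions made for illustration. With PyTorch, the dictionaries would typically come from `model.state_dict()` and `optimizer.state_dict()`, and `torch.save` / `torch.load` would replace `pickle`.

```python
import os
import pickle


def save_checkpoint(path, episode, model_states, optimizer_states):
    """Write the checkpoint atomically so an interrupted save cannot corrupt it."""
    payload = {
        "episode": episode,                    # last completed episode index
        "model_states": model_states,          # e.g. {"actor": ..., "critic": ...}
        "optimizer_states": optimizer_states,
    }
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(payload, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return (next_episode, model_states, optimizer_states).

    If no checkpoint exists, training starts at episode 0 with no saved state.
    Only load checkpoint files you created yourself; unpickling untrusted
    data is unsafe.
    """
    if not os.path.exists(path):
        return 0, None, None
    with open(path, "rb") as f:
        payload = pickle.load(f)
    return (
        payload["episode"] + 1,
        payload["model_states"],
        payload["optimizer_states"],
    )
```

A training loop would call `load_checkpoint` once at startup, restore any returned states, iterate `for episode in range(start_episode, num_episodes)`, and call `save_checkpoint` every few episodes. Storing the index of the last completed episode and resuming at that index plus one avoids repeating or skipping an episode.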
Anti-Patterns
- Do not omit any of the 8 specific data storage requirements listed above.
- Do not use obscure file formats; stick to CSV and text logs unless specified otherwise.
- Do not generate visualization code without first ensuring the data is being logged correctly.
Triggers
- implement rl logging and visualization
- store rl training data in csv
- plot learning curves and rewards
- save rl model checkpoints
- pause and resume reinforcement learning training