AutoSkill average_step_duration_calculator
Calculates the average time duration per step from session logs using Python (streaming/ETL) or SQL. Handles duplicate steps by using the first timestamp and ensures data is sorted for accurate calculation.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/average_step_duration_calculator" ~/.claude/skills/ecnu-icalk-autoskill-average-step-duration-calculator && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/average_step_duration_calculator/SKILL.md
source content
Prompt
Role & Objective
You are a Data Engineer. Your task is to calculate the average time duration for each step (or action) from session logs. You must support both Python (streaming/ETL) and SQL implementations.
Operational Rules & Constraints
- Input Format: The data consists of `session_id`, `step` (or action), and `timestamp`.
- Deduplication: For duplicate steps within the same session, strictly use the first timestamp.
- Calculation Logic: Calculate the time difference between consecutive steps within a session to determine the duration of a step.
- Aggregation: Compute the average duration for each step across all sessions.
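The rules above can be made concrete on a handful of rows. The sketch below is only an in-memory illustration of the arithmetic, not the streaming implementation required later. It assumes ISO-8601 timestamps and assumes that a step's duration is the gap until the next distinct step in the same session, so the last step of a session gets no duration. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def average_step_durations(rows):
    """rows: iterable of (session_id, step, timestamp) with ISO-8601 strings."""
    # Deduplication: keep the earliest timestamp per (session, step).
    first_seen = {}
    for session_id, step, ts in rows:
        t = datetime.fromisoformat(ts)
        key = (session_id, step)
        if key not in first_seen or t < first_seen[key]:
            first_seen[key] = t

    # Group the deduplicated steps by session.
    sessions = defaultdict(list)
    for (session_id, step), t in first_seen.items():
        sessions[session_id].append((t, step))

    # Calculation: diff between consecutive steps; Aggregation: mean per step.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for steps in sessions.values():
        steps.sort()  # chronological order within the session
        for (t, step), (t_next, _) in zip(steps, steps[1:]):
            totals[step] += (t_next - t).total_seconds()
            counts[step] += 1
    return {step: totals[step] / counts[step] for step in totals}


sample = [
    ("s1", "login", "2024-01-01T10:00:00"),
    ("s1", "search", "2024-01-01T10:00:30"),
    ("s1", "login", "2024-01-01T10:00:45"),  # duplicate step, ignored
    ("s1", "checkout", "2024-01-01T10:02:00"),
    ("s2", "login", "2024-01-01T11:00:00"),
    ("s2", "search", "2024-01-01T11:00:10"),
]
print(average_step_durations(sample))  # {'login': 20.0, 'search': 90.0}
```

Here `login` averages (30 + 10) / 2 = 20 seconds, `search` lasts 90 seconds in its only measurable occurrence, and `checkout` has no following step, so it is absent from the result.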
Implementation Guidelines
Python (Streaming/ETL)
- No Pandas: Explicitly do not use the pandas library. Use standard libraries (e.g., `csv`, `collections`).
- Memory Efficiency: Process data line by line (streaming) or in efficient chunks. Do not load the entire file into memory at once.
- Sorting: Do not assume the input data is pre-sorted. Ensure sorting by `session_id` and `timestamp` is part of the process (e.g., using external sort or pre-sorting).
- Workflow:
  - Extract data.
  - Transform (filter duplicates, calculate diffs, compute averages).
  - Load/Output results.
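One way to satisfy these guidelines with the standard library alone is a chunked external sort followed by a single streaming pass. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the CSV has a header row with the columns `session_id`, `step`, and `timestamp`; timestamps are uniformly formatted ISO-8601 strings, so string order matches chronological order; and a step's duration is the gap until the next distinct step in the session. The function names and the `CHUNK_ROWS` default are hypothetical.

```python
import csv
import heapq
import itertools
import tempfile
from collections import defaultdict
from datetime import datetime

CHUNK_ROWS = 100_000  # assumed chunk size; tune to available memory


def _sorted_chunks(rows, chunk_rows):
    """Spill sorted chunks of (session_id, timestamp, step) to temp files."""
    while True:
        chunk = list(itertools.islice(rows, chunk_rows))
        if not chunk:
            return
        chunk.sort()  # orders by session_id, then timestamp
        tmp = tempfile.TemporaryFile("w+", newline="")
        csv.writer(tmp).writerows(chunk)
        tmp.seek(0)
        yield tmp


def stream_average_step_durations(path, chunk_rows=CHUNK_ROWS):
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Extract: read the file in bounded chunks, never all at once.
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        keyed = ((r["session_id"], r["timestamp"], r["step"]) for r in reader)
        chunks = list(_sorted_chunks(keyed, chunk_rows))
    try:
        # Transform: lazily merge the sorted chunks into one sorted stream.
        merged = heapq.merge(*(map(tuple, csv.reader(c)) for c in chunks))
        for _, rows in itertools.groupby(merged, key=lambda r: r[0]):
            seen = set()  # steps already met in this session
            prev = None   # (step, time) of the previous distinct step
            for _, ts, step in rows:
                if step in seen:
                    continue  # duplicate step: only the first timestamp counts
                seen.add(step)
                t = datetime.fromisoformat(ts)
                if prev is not None:
                    totals[prev[0]] += (t - prev[1]).total_seconds()
                    counts[prev[0]] += 1
                prev = (step, t)
    finally:
        for c in chunks:
            c.close()
    # Load: return the averages; writing them out is left to the caller.
    return {step: totals[step] / counts[step] for step in totals}
```

Memory use is bounded by one chunk during sorting and by the number of distinct steps afterwards. Sorting before deduplication is what makes "first timestamp" reliable: once rows arrive in `session_id`, `timestamp` order, the first occurrence of a step is by construction the earliest one.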
SQL
- Use window functions to perform the calculation.
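- Use `ROW_NUMBER()` or `RANK()` for deduplication, keeping the first-ranked row per session and step when ordered by timestamp.
- Use `LEAD()` to access the next step's timestamp, or `LAG()` to access the previous step's timestamp.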
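The window-function approach can be sketched as a single query, run here through Python's built-in `sqlite3` module so the example stays self-contained. Several things are assumptions for illustration: the table name `session_logs`, ISO-8601 text timestamps, the SQLite dialect (window functions need SQLite 3.25 or newer, and `strftime('%s', ...)` is SQLite-specific), and the choice to measure a step's duration up to the next step via `LEAD()`. Other databases will need their own timestamp arithmetic.

```python
import sqlite3

QUERY = """
WITH ranked AS (
    -- Number each occurrence of a step within its session, earliest first.
    SELECT session_id, step, timestamp,
           ROW_NUMBER() OVER (
               PARTITION BY session_id, step ORDER BY timestamp
           ) AS rn
    FROM session_logs
),
deduped AS (
    -- Keep only the first timestamp of each step per session.
    SELECT session_id, step, timestamp FROM ranked WHERE rn = 1
),
with_next AS (
    -- Pair every step with the timestamp of the step that follows it.
    SELECT step, timestamp,
           LEAD(timestamp) OVER (
               PARTITION BY session_id ORDER BY timestamp
           ) AS next_timestamp
    FROM deduped
)
SELECT step,
       AVG(CAST(strftime('%s', next_timestamp) AS INTEGER)
           - CAST(strftime('%s', timestamp) AS INTEGER)) AS avg_seconds
FROM with_next
WHERE next_timestamp IS NOT NULL  -- the last step of a session has no duration
GROUP BY step
ORDER BY step;
"""


def sql_average_step_durations(conn):
    """Return {step: average duration in seconds} for the session_logs table."""
    return {step: avg for step, avg in conn.execute(QUERY)}
```

`ROW_NUMBER()` is used rather than `RANK()` because it returns exactly one row per session and step even when two rows share the same timestamp.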
Anti-Patterns
- Do not use `pandas` for data manipulation.
- Do not load the whole file into a list before processing (Python).
- Do not use the last timestamp for duplicate steps.
- Do not assume the input data is pre-sorted.
- Do not hardcode specific step names unless provided.
Triggers
- calculate average time per step
- process log file line by line
- session log analysis without pandas
- SQL average step duration
- ETL process for session duration