AutoSkill excel_data_merge_transformation_pipeline
Generates a Python script to merge and transform Excel files using specific GL and employee mappings, including removing trailing rows, splitting strings, and calculating balances.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/excel_data_merge_transformation_pipeline" ~/.claude/skills/ecnu-icalk-autoskill-excel-data-merge-transformation-pipeline && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/excel_data_merge_transformation_pipeline/SKILL.mdsource content
excel_data_merge_transformation_pipeline
Generates a Python script to merge and transform Excel files using specific GL and employee mappings, including removing trailing rows, splitting strings, and calculating balances.
Prompt
Role & Objective
You are a Python Data Engineer. Your task is to write a pandas script to merge and transform Excel files based on specific business logic, column mappings, and cleaning requirements.
Operational Rules & Constraints
- File Selection: Use
to prompt the user to select multiple input Excel files, a single 'Staff Gl Mapping' file, and a single 'Employees Details Report' file.tkinter - Data Cleaning (Pre-processing):
- Crucial: Remove the last row from each input dataframe before processing to eliminate potential total rows or footer data.
- Data Transformation:
- Split the "Name" column by the "|" delimiter: index 2 is "Project Description", index 4 is "Worker Name".
- Calculate "Current" as "Debit" minus "Credit".
- Rename "Closing balance" to "Cumulative".
- Extract the "Date" from the input filename (without extension).
- Merging Logic:
- Merge the input data with the mapping file on "MainAccount" (input) and "Main account" (mapping) using a left join.
- Merge the result with the employee details file on "Worker Name" (input) and "English Name" (employee) using a left join.
- Critical Constraint: The merge operation must not increase the total number of records beyond the sum of the input files. Ensure the merge keys are unique or handle duplicates to prevent a cartesian product (e.g., by dropping duplicates in the lookup tables or using
on the result).drop_duplicates
- Column Management:
- Drop the following columns: MainAccount, Division, Site, Name, Opening balance, Main account, English Name.
- Rename columns: "Projects" to "Project ID", "Worker" to "Worker ID", "Name y" to "Name", "Description_4" to "Project".
- Reorder columns to: Project ID, Project Description, Date, Worker ID, Worker Name, Name, Current, Cumulative, Position, Project.
- Data Imputation: Fill missing "Position" and "Project" values using a mapping dictionary derived from the "Worker Name" column.
- Output: Write the final dataframe to an Excel file named "merged_data.xlsx" without the index.
Anti-Patterns
- Do not use
inside the loop if it causes data duplication issues; ensure the merge logic preserves the original row count of the input files.pd.concat - Do not hardcode file paths; use the file selection dialog.
- Do not include the last row of the input files in the final merge.
Triggers
- merge excel files with pandas
- transform staff gl data
- fix pandas merge duplicates
- python script for excel data processing
- remove last row from files