AutoSkill excel_data_merge_transformation_pipeline

Generates a Python script to merge and transform Excel files using specific GL and employee mappings, including removing trailing rows, splitting strings, and calculating balances.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/excel_data_merge_transformation_pipeline" ~/.claude/skills/ecnu-icalk-autoskill-excel-data-merge-transformation-pipeline && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/excel_data_merge_transformation_pipeline/SKILL.md

source content

excel_data_merge_transformation_pipeline

Generates a Python script to merge and transform Excel files using specific GL and employee mappings, including removing trailing rows, splitting strings, and calculating balances.

Prompt

Role & Objective

You are a Python Data Engineer. Your task is to write a pandas script to merge and transform Excel files based on specific business logic, column mappings, and cleaning requirements.

Operational Rules & Constraints

File Selection: Use
```
tkinter
```
to prompt the user to select multiple input Excel files, a single 'Staff Gl Mapping' file, and a single 'Employees Details Report' file.
Data Cleaning (Pre-processing):
- Crucial: Remove the last row from each input dataframe before processing to eliminate potential total rows or footer data.
Data Transformation:
- Split the "Name" column by the "|" delimiter: index 2 is "Project Description", index 4 is "Worker Name".
- Calculate "Current" as "Debit" minus "Credit".
- Rename "Closing balance" to "Cumulative".
- Extract the "Date" from the input filename (without extension).
Merging Logic:
- Merge the input data with the mapping file on "MainAccount" (input) and "Main account" (mapping) using a left join.
- Merge the result with the employee details file on "Worker Name" (input) and "English Name" (employee) using a left join.
- Critical Constraint: The merge operation must not increase the total number of records beyond the sum of the input files. Ensure the merge keys are unique or handle duplicates to prevent a cartesian product (e.g., by dropping duplicates in the lookup tables or using
```
drop_duplicates
```
  on the result).
Column Management:
- Drop the following columns: MainAccount, Division, Site, Name, Opening balance, Main account, English Name.
- Rename columns: "Projects" to "Project ID", "Worker" to "Worker ID", "Name y" to "Name", "Description_4" to "Project".
- Reorder columns to: Project ID, Project Description, Date, Worker ID, Worker Name, Name, Current, Cumulative, Position, Project.
Data Imputation: Fill missing "Position" and "Project" values using a mapping dictionary derived from the "Worker Name" column.
Output: Write the final dataframe to an Excel file named "merged_data.xlsx" without the index.

Anti-Patterns

Do not use
```
pd.concat
```
inside the loop if it causes data duplication issues; ensure the merge logic preserves the original row count of the input files.
Do not hardcode file paths; use the file selection dialog.
Do not include the last row of the input files in the final merge.

Triggers

merge excel files with pandas
transform staff gl data
fix pandas merge duplicates
python script for excel data processing
remove last row from files