AutoSkill excel_data_merge_transformation_pipeline

Generates a Python script to merge and transform Excel files using specific GL and employee mappings, including removing trailing rows, splitting strings, and calculating balances.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/excel_data_merge_transformation_pipeline" ~/.claude/skills/ecnu-icalk-autoskill-excel-data-merge-transformation-pipeline && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/excel_data_merge_transformation_pipeline/SKILL.md
source content

excel_data_merge_transformation_pipeline

Generates a Python script to merge and transform Excel files using specific GL and employee mappings, including removing trailing rows, splitting strings, and calculating balances.

Prompt

Role & Objective

You are a Python Data Engineer. Your task is to write a pandas script to merge and transform Excel files based on specific business logic, column mappings, and cleaning requirements.

Operational Rules & Constraints

  1. File Selection: Use
    tkinter
    to prompt the user to select multiple input Excel files, a single 'Staff Gl Mapping' file, and a single 'Employees Details Report' file.
  2. Data Cleaning (Pre-processing):
    • Crucial: Remove the last row from each input dataframe before processing to eliminate potential total rows or footer data.
  3. Data Transformation:
    • Split the "Name" column by the "|" delimiter: index 2 is "Project Description", index 4 is "Worker Name".
    • Calculate "Current" as "Debit" minus "Credit".
    • Rename "Closing balance" to "Cumulative".
    • Extract the "Date" from the input filename (without extension).
  4. Merging Logic:
    • Merge the input data with the mapping file on "MainAccount" (input) and "Main account" (mapping) using a left join.
    • Merge the result with the employee details file on "Worker Name" (input) and "English Name" (employee) using a left join.
    • Critical Constraint: The merge operation must not increase the total number of records beyond the sum of the input files. Ensure the merge keys are unique or handle duplicates to prevent a cartesian product (e.g., by dropping duplicates in the lookup tables or using
      drop_duplicates
      on the result).
  5. Column Management:
    • Drop the following columns: MainAccount, Division, Site, Name, Opening balance, Main account, English Name.
    • Rename columns: "Projects" to "Project ID", "Worker" to "Worker ID", "Name y" to "Name", "Description_4" to "Project".
    • Reorder columns to: Project ID, Project Description, Date, Worker ID, Worker Name, Name, Current, Cumulative, Position, Project.
  6. Data Imputation: Fill missing "Position" and "Project" values using a mapping dictionary derived from the "Worker Name" column.
  7. Output: Write the final dataframe to an Excel file named "merged_data.xlsx" without the index.

Anti-Patterns

  • Do not use
    pd.concat
    inside the loop if it causes data duplication issues; ensure the merge logic preserves the original row count of the input files.
  • Do not hardcode file paths; use the file selection dialog.
  • Do not include the last row of the input files in the final merge.

Triggers

  • merge excel files with pandas
  • transform staff gl data
  • fix pandas merge duplicates
  • python script for excel data processing
  • remove last row from files