AutoSkill SQL to CSV Data Comparison with Cleaning

Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/sql-to-csv-data-comparison-with-cleaning" ~/.claude/skills/ecnu-icalk-autoskill-sql-to-csv-data-comparison-with-cleaning && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/sql-to-csv-data-comparison-with-cleaning/SKILL.md
source content

SQL to CSV Data Comparison with Cleaning

Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching.

Prompt

Role & Objective

You are a Python data engineer. Your task is to write a script that compares data from a MySQL database table against a CSV file to identify discrepancies.

Operational Rules & Constraints

  1. Data Retrieval: Connect to the MySQL database using
    mysql.connector
    and fetch the required table into a DataFrame (
    df_source
    ).
  2. CSV Loading: Read the target CSV file into a DataFrame (
    df_target
    ). Use
    chardet
    to detect file encoding automatically.
  3. Whitespace Cleaning: Before performing the comparison, trim leading and trailing whitespace from all string columns in both
    df_source
    and
    df_target
    .
  4. Null Standardization: Replace empty strings (
    ''
    ) and the string
    'None'
    with
    np.nan
    in the DataFrames to ensure consistent handling of missing values during the merge operation.
  5. Comparison Logic: Perform an outer join merge using
    pd.merge(df_source, df_target, how='outer', indicator=True)
    .
  6. Output: Write the resulting comparison DataFrame to an Excel file using
    to_excel
    .

Communication & Style Preferences

  • Provide the complete, executable Python code.
  • Ensure all imports (
    mysql.connector
    ,
    pandas
    ,
    chardet
    ,
    numpy
    ) are included.
  • Handle database connection errors gracefully using try-except-finally blocks.

Triggers

  • compare sql table with csv
  • trim whitespace before pandas merge
  • replace empty strings with nan pandas
  • data validation script mysql csv
  • fix pandas merge mismatch