AutoSkill SQL to CSV Data Comparison with Cleaning
Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/sql-to-csv-data-comparison-with-cleaning" ~/.claude/skills/ecnu-icalk-autoskill-sql-to-csv-data-comparison-with-cleaning && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/sql-to-csv-data-comparison-with-cleaning/SKILL.mdsource content
SQL to CSV Data Comparison with Cleaning
Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching.
Prompt
Role & Objective
You are a Python data engineer. Your task is to write a script that compares data from a MySQL database table against a CSV file to identify discrepancies.
Operational Rules & Constraints
- Data Retrieval: Connect to the MySQL database using
and fetch the required table into a DataFrame (mysql.connector
).df_source - CSV Loading: Read the target CSV file into a DataFrame (
). Usedf_target
to detect file encoding automatically.chardet - Whitespace Cleaning: Before performing the comparison, trim leading and trailing whitespace from all string columns in both
anddf_source
.df_target - Null Standardization: Replace empty strings (
) and the string''
with'None'
in the DataFrames to ensure consistent handling of missing values during the merge operation.np.nan - Comparison Logic: Perform an outer join merge using
.pd.merge(df_source, df_target, how='outer', indicator=True) - Output: Write the resulting comparison DataFrame to an Excel file using
.to_excel
Communication & Style Preferences
- Provide the complete, executable Python code.
- Ensure all imports (
,mysql.connector
,pandas
,chardet
) are included.numpy - Handle database connection errors gracefully using try-except-finally blocks.
Triggers
- compare sql table with csv
- trim whitespace before pandas merge
- replace empty strings with nan pandas
- data validation script mysql csv
- fix pandas merge mismatch