Skilllibrary csv-ready
Generate well-formed CSV files with correct encoding, delimiter selection, field escaping, consistent headers, and streaming for large datasets. Use when producing CSV output, converting data to CSV format, fixing CSV encoding issues, or validating CSV structure. Do not use for Excel-specific features (prefer xlsx-generation) or parsing CSV input from external sources.
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/15-docs-artifacts-media/csv-ready" ~/.claude/skills/merceralex397-collab-skilllibrary-csv-ready && rm -rf "$T"
manifest:
15-docs-artifacts-media/csv-ready/SKILL.mdsource content
Purpose
Generate well-formed CSV files with correct encoding, proper delimiter selection, robust field escaping, consistent header naming conventions, and streaming support for large datasets. This skill ensures CSV output is portable across Excel, Google Sheets, database import tools, and downstream data pipelines.
When to use this skill
- The task requires producing a
file from structured or semi-structured data.csv - Data must be exported in a tabular format consumable by spreadsheets or ETL pipelines
- CSV encoding issues need diagnosis or repair (BOM, UTF-8 vs Latin-1, mojibake)
- Header normalization is needed (snake_case, deduplication, type suffixes)
- A large dataset (>100K rows) needs streamed output instead of in-memory generation
- Existing CSV output has escaping bugs (unquoted commas, embedded newlines, bare quotes)
Do not use this skill when
- The output requires Excel-specific features like formulas, cell formatting, or multiple sheets — prefer
xlsx-generation - The task is parsing or reading CSV input from external sources rather than generating output
- The data target is JSON, Parquet, or another non-CSV structured format
- Image or binary data is involved in the output
Operating procedure
- Identify the target consumer — determine whether the CSV will be consumed by Excel, a database COPY command, a Python pandas pipeline, or a web download. This drives encoding and delimiter choices.
- Select encoding — default to UTF-8 with BOM (
) for Excel compatibility. Use plain UTF-8 for programmatic consumers. Use Latin-1 only when explicitly required by a legacy system.utf-8-sig - Choose the delimiter — use comma (
) as default. Switch to tab (,
) for fields that frequently contain commas (addresses, descriptions). Switch to pipe (\t
) only for legacy mainframe integrations. Document the choice in the file header or metadata.| - Normalize headers — convert all column names to
, strip leading/trailing whitespace, replace spaces and special characters with underscores, and deduplicate by appendingsnake_case
,_2
suffixes._3 - Apply field escaping — quote any field containing the delimiter character, double-quote characters, or newlines. Use RFC 4180 double-quote escaping (embed
for literal quotes). Never use backslash escaping."" - Handle null and empty values — represent SQL NULL as an empty unquoted field. Represent an actual empty string as
. Document the convention in output metadata."" - Validate row consistency — confirm every row has exactly the same number of fields as the header row. Reject or quarantine rows with field-count mismatches and log the line numbers.
- Stream large datasets — for datasets exceeding 50K rows, write rows incrementally using
(Python), a streaming writer, or chunked output. Never build the entire file in memory.csv.writer - Add line-ending consistency — use CRLF (
) for Windows/Excel targets, LF (\r\n
) for Unix pipelines. Do not mix line endings within a single file.\n - Validate the output — open the generated file in a secondary parser (e.g.,
round-trip) to confirm field counts, encoding, and escaping are correct. Spot-check the first 5 and last 5 rows.csv.reader - Document the file — include a sidecar
or inline comment row (if supported by the consumer) specifying: encoding, delimiter, null convention, row count, and generation timestamp..meta
Decision rules
- When in doubt about delimiter, default to comma with proper quoting over switching delimiters.
- Prefer RFC 4180 compliance over custom escaping schemes.
- If the consumer is unknown, produce UTF-8-BOM + comma + CRLF as the safest combination.
- If any field contains binary data or multi-line HTML, flag it for a different format (JSON or XLSX) rather than forcing it into CSV.
- For datasets with >50 columns, consider whether CSV is the right format — suggest Parquet or XLSX if column count makes CSV unwieldy.
Output requirements
- CSV file — well-formed file passing RFC 4180 validation
- Header row — normalized snake_case column names as the first row
- Encoding declaration — explicit encoding stated in filename convention or sidecar metadata
- Row count summary — total rows written (excluding header), logged to stdout or metadata
- Validation report — confirmation that round-trip parsing succeeded with zero field-count errors
References
- RFC 4180: Common Format and MIME Type for CSV Files — https://tools.ietf.org/html/rfc4180
- Python
module documentation — https://docs.python.org/3/library/csv.htmlcsv - W3C CSV on the Web recommendations — https://www.w3.org/TR/tabular-data-primer/
Related skills
— when Excel-specific features (formulas, formatting, sheets) are neededxlsx-generation
— when source data comes from HTML or PDF tablestable-extraction
— when upstream input is unstructured text or documentsdocument-to-structured-data
Anti-patterns
- Skipping quoting because "the data looks clean" — always apply RFC 4180 quoting rules regardless of apparent data content.
- Using
instead of a CSV library — this silently breaks on fields containing commas, quotes, or newlines.str.join(",") - Mixing encodings — concatenating UTF-8 and Latin-1 sources without re-encoding produces mojibake.
- Hardcoding LF line endings for Excel consumers — Excel on Windows requires CRLF for reliable column detection.
- Omitting BOM for UTF-8 Excel files — without BOM, Excel may misinterpret UTF-8 as ANSI encoding.
Failure handling
- If encoding detection fails, default to UTF-8 and log a warning listing the byte sequences that triggered the failure.
- If a row has a field-count mismatch, quarantine the row to a separate
file with the original line number and write valid rows to the primary output..errors.csv - If the dataset exceeds available memory during generation, switch to chunked streaming and retry.
- If the target consumer rejects the file, inspect delimiter detection by opening in a hex editor and checking for mixed line endings or BOM issues.
- If the task requires features beyond CSV capabilities (merged cells, formulas), redirect to
.xlsx-generation