Genmz-shop file-converter
Convert, merge, split, and compress files across formats — documents, images, audio, and more.
git clone https://github.com/Darsh20009/genmz-shop
T=$(mktemp -d) && git clone --depth=1 https://github.com/Darsh20009/genmz-shop "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.local/secondary_skills/file-converter" ~/.claude/skills/darsh20009-genmz-shop-file-converter && rm -rf "$T"
File Converter
Convert between data, document, image, audio formats, and ZIP archives. One-liners for each conversion pair.
Tool Map
| Domain | Tool | Install |
|---|---|---|
| CSV/JSON/Excel/Parquet | `pandas` | `pip install pandas openpyxl pyarrow` |
| YAML | `pyyaml` | `pip install pyyaml` |
| XML ↔ dict | `xmltodict` | `pip install xmltodict` |
| Any doc format ↔ any | `pandoc` (CLI) | `apt install pandoc` or `pip install pypandoc_binary` |
| Markdown → HTML | `markdown` | `pip install markdown` |
| HTML → Markdown | `markdownify` | `pip install markdownify` |
| .docx read/write | `python-docx` | `pip install python-docx` |
| PDF → text/tables | `pdfplumber` | `pip install pdfplumber` |
| PDF → images | `pdf2image` | `pip install pdf2image` + `apt install poppler-utils` |
| PDF manipulation | `pypdf` | `pip install pypdf` |
| Images | `Pillow` | `pip install Pillow` |
| SVG → PNG | `cairosvg` | `pip install cairosvg` |
| HEIC → JPG | `pillow-heif` | `pip install pillow-heif` |
| Audio formats | `pydub` | `pip install pydub` + `apt install ffmpeg` |
| EPUB ↔ other | `pandoc` or `ebooklib` | `pip install ebooklib` |
| HTML → PDF | `weasyprint` | `pip install weasyprint` |
| GIF creation | `Pillow` or `imageio` | `pip install imageio[ffmpeg]` |
| PDF → SVG | `pdf2image` + `potrace` or `pymupdf` | `pip install pymupdf` |
| ZIP archives | `zipfile` (stdlib) | built-in, no install needed |
File Input Handling
When a user wants to convert a file that can't be attached directly in the chat (e.g., `.heic`, `.flac`, `.epub`, `.psd`, `.m4a`, `.wma`, `.parquet`), ask them to upload it to the project's file system. Uploaded files typically appear in `attached_assets/` or the project root. Always check both locations. If the file isn't found, ask the user where they saved it.
Common unsupported-in-chat but convertible formats: `.heic`, `.avif`, `.webp`, `.flac`, `.ogg`, `.m4a`, `.wma`, `.aiff`, `.epub`, `.parquet`, `.psd`, `.svg`, `.zip`
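The lookup described above can be sketched with the standard library. The function name `find_uploaded_file` and its parameters are illustrative assumptions, not part of the skill; only the two search locations come from the text.

```python
from pathlib import Path
from typing import Optional

# Locations to check, in order; both come from the guidance above.
SEARCH_DIRS = ("attached_assets", ".")


def find_uploaded_file(filename: str, project_root: str = ".") -> Optional[Path]:
    """Return the first match in attached_assets/ or the project root, else None.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    root = Path(project_root)
    for directory in SEARCH_DIRS:
        candidate = root / directory / filename
        if candidate.is_file():
            return candidate
    return None  # not found: ask the user where they saved the file
```

A `None` result is the cue to ask the user for the file's location rather than guessing.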
Data Formats
```python
import json

import pandas as pd
import xmltodict
import yaml

# --- CSV ↔ JSON ---
pd.read_csv("in.csv").to_json("out.json", orient="records", indent=2)
pd.read_json("in.json").to_csv("out.csv", index=False)

# --- CSV → Excel / Excel → CSV ---
pd.read_csv("in.csv").to_excel("out.xlsx", index=False, engine="openpyxl")
pd.read_excel("in.xlsx", sheet_name="Sheet1").to_csv("out.csv", index=False)
# All sheets: pd.read_excel("in.xlsx", sheet_name=None) → dict of DataFrames

# --- CSV → Parquet (columnar, compressed) ---
pd.read_csv("in.csv").to_parquet("out.parquet", engine="pyarrow", compression="snappy")

# --- YAML ↔ JSON ---
data = yaml.safe_load(open("in.yaml"))  # ALWAYS safe_load, never load()
json.dump(data, open("out.json", "w"), indent=2)
yaml.safe_dump(json.load(open("in.json")), open("out.yaml", "w"), sort_keys=False)

# --- XML ↔ JSON ---
data = xmltodict.parse(open("in.xml").read())
json.dump(data, open("out.json", "w"), indent=2)
open("out.xml", "w").write(xmltodict.unparse(data, pretty=True))

# --- JSONL (one JSON object per line) ---
pd.read_json("in.jsonl", lines=True).to_csv("out.csv", index=False)
```
Encoding gotchas:
- `pd.read_csv("f.csv", encoding="utf-8-sig")` strips the BOM that Excel inserts
- Auto-detect: `import chardet; enc = chardet.detect(open("f.csv","rb").read())["encoding"]`
- CSV delimiter sniffing: `pd.read_csv("f.csv", sep=None, engine="python")`
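For cases where pandas is not available, the BOM and delimiter points above can be approximated with the standard library alone. This is a minimal sketch, not the pandas behaviour itself: `read_csv_rows` is a hypothetical helper, and the candidate delimiter set passed to `csv.Sniffer` is an assumption.

```python
import csv
import io


def read_csv_rows(raw: bytes) -> list[dict]:
    """Decode CSV bytes, dropping a UTF-8 BOM if present, and sniff the delimiter."""
    text = raw.decode("utf-8-sig")  # utf-8-sig removes a leading BOM
    # Candidate delimiters are an assumption; extend the string if needed.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


# Semicolon-delimited sample with the BOM that Excel writes
sample = b"\xef\xbb\xbfname;qty\nwidget;3\ngadget;5\n"
rows = read_csv_rows(sample)
print(rows[0])
```

Without the `utf-8-sig` codec the first header would carry a stray `\ufeff` character, which is what makes lookups by column name fail silently.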
Nested JSON → flat CSV:
```python
pd.json_normalize(data, sep=".").to_csv("out.csv", index=False)  # {"a":{"b":1}} → column "a.b"
```
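To make the column-naming rule concrete, here is a small standard-library sketch of the same flattening idea. It is not how `json_normalize` is implemented; `flatten` is a hypothetical helper that only mirrors the dotted-key naming for nested dicts.

```python
def flatten(obj: dict, sep: str = ".", prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))  # recurse into nested dicts
        else:
            flat[name] = value  # scalars and lists are kept as-is
    return flat


print(flatten({"a": {"b": 1}, "c": 2}))  # {'a.b': 1, 'c': 2}
```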
Document Formats — pandoc is the Swiss Army knife
```bash
# Markdown → PDF (requires LaTeX: apt install texlive-xetex)
pandoc input.md -o output.pdf --pdf-engine=xelatex

# Markdown → DOCX
pandoc input.md -o output.docx

# DOCX → Markdown (extracts images to ./media/)
pandoc input.docx -o output.md --extract-media=.

# HTML → Markdown
pandoc input.html -o output.md -t gfm

# Any → Any (pandoc supports ~40 formats)
pandoc -f docx -t rst input.docx -o output.rst
```
```python
# From Python
import pypandoc

pypandoc.convert_file("in.md", "docx", outputfile="out.docx")
```
Without pandoc (pure Python):
```python
# Markdown → HTML
import markdown

html = markdown.markdown(open("in.md").read(), extensions=["tables", "fenced_code", "toc"])

# HTML → Markdown
from markdownify import markdownify

md = markdownify(html, heading_style="ATX")  # ATX = "#" headers, not underlines
```
PDF Operations
```python
# --- Extract text + tables ---
import pdfplumber

with pdfplumber.open("in.pdf") as pdf:
    text = "\n".join(p.extract_text() or "" for p in pdf.pages)
    tables = pdf.pages[0].extract_tables()  # list of tables, each a list of rows

# --- PDF → images (one PNG per page) ---
from pdf2image import convert_from_path

for i, img in enumerate(convert_from_path("in.pdf", dpi=200)):
    img.save(f"page_{i+1}.png")

# --- Merge / split / rotate ---
from pypdf import PdfReader, PdfWriter

writer = PdfWriter()
for path in ["a.pdf", "b.pdf"]:
    for page in PdfReader(path).pages:
        writer.add_page(page)  # to rotate, call page.rotate(90) before adding
writer.write("merged.pdf")

# Extract pages 2–5 (zero-based slice 1:5)
reader = PdfReader("in.pdf")
writer = PdfWriter()
for p in reader.pages[1:5]:
    writer.add_page(p)
writer.write("pages_2-5.pdf")
```
PDF gotchas:
- `pdf2image` needs `poppler-utils` installed system-wide (not a pip package)
- Scanned PDFs have no text layer, so pdfplumber's `extract_text()` returns `None` or an empty string. Run `pytesseract` OCR on the pdf2image output instead.
- `PyPDF2` is deprecated → use `pypdf` (its maintained successor, with a largely identical API)
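The page-extraction snippet above maps pages 2–5 to the slice `1:5`, which is an easy place for an off-by-one error. The following sketch makes that conversion explicit; `page_slice` is a hypothetical helper name, and the list of strings merely stands in for `reader.pages`.

```python
def page_slice(first: int, last: int) -> slice:
    """Map an inclusive, one-based page range to a zero-based slice."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return slice(first - 1, last)


# Stand-in for reader.pages; the same slice object works on any sequence
pages = ["p1", "p2", "p3", "p4", "p5", "p6"]
print(pages[page_slice(2, 5)])  # ['p2', 'p3', 'p4', 'p5']
```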
Image Formats
```python
from PIL import Image

# --- Basic conversion ---
Image.open("in.png").convert("RGB").save("out.jpg", quality=90)
# convert("RGB") is REQUIRED: JPEG can't store an alpha channel, so saving RGBA raises OSError

# --- WebP (best web format) ---
Image.open("in.jpg").save("out.webp", quality=85, method=6)  # method 0-6, 6 = best compression

# --- AVIF (smallest; native in Pillow 11.3+, older versions need pillow-avif-plugin) ---
Image.open("in.jpg").save("out.avif", quality=75)

# --- HEIC (iPhone photos) → JPG ---
from pillow_heif import register_heif_opener

register_heif_opener()
Image.open("in.heic").convert("RGB").save("out.jpg", quality=90)

# --- SVG → PNG ---
import cairosvg

cairosvg.svg2png(url="in.svg", write_to="out.png", output_width=1024)

# --- Batch convert directory ---
from pathlib import Path

for p in Path("imgs").glob("*.png"):
    Image.open(p).convert("RGB").save(p.with_suffix(".jpg"), quality=85)
```
Image gotchas:
- PNG → JPG: must `convert("RGB")` first or transparency crashes the save
- `quality` for PNG is meaningless (lossless); use `optimize=True, compress_level=9`
- Pillow can't open `.svg` natively; use `cairosvg` or `svglib`
- GIF → MP4 is a video operation: `ffmpeg -i in.gif -pix_fmt yuv420p out.mp4`
Audio Formats
```python
from pydub import AudioSegment

# --- MP3 ↔ WAV ---
AudioSegment.from_mp3("in.mp3").export("out.wav", format="wav")
AudioSegment.from_wav("in.wav").export("out.mp3", format="mp3", bitrate="192k")

# --- FLAC → MP3 ---
AudioSegment.from_file("in.flac", format="flac").export("out.mp3", format="mp3", bitrate="320k")

# --- OGG → MP3 ---
AudioSegment.from_ogg("in.ogg").export("out.mp3", format="mp3", bitrate="192k")

# --- M4A / AAC → MP3 ---
AudioSegment.from_file("in.m4a", format="m4a").export("out.mp3", format="mp3", bitrate="256k")

# --- Any → Any (pydub supports mp3, wav, ogg, flac, m4a, aac, wma, aiff) ---
AudioSegment.from_file("in.wma", format="wma").export("out.flac", format="flac")

# --- Trim audio (first 30 seconds) ---
audio = AudioSegment.from_file("in.mp3")
audio[:30000].export("first_30s.mp3", format="mp3")  # slice positions are in milliseconds

# --- Merge / concatenate ---
combined = AudioSegment.from_file("a.mp3") + AudioSegment.from_file("b.mp3")
combined.export("merged.mp3", format="mp3")

# --- Adjust volume ---
audio = AudioSegment.from_file("in.mp3")
louder = audio + 6   # +6 dB
quieter = audio - 6  # -6 dB
louder.export("louder.mp3", format="mp3")

# --- Get audio info ---
audio = AudioSegment.from_file("in.mp3")
print(f"Duration: {len(audio)/1000:.1f}s, Channels: {audio.channels}, "
      f"Sample rate: {audio.frame_rate}Hz, Sample width: {audio.sample_width*8}bit")

# --- Batch convert directory ---
from pathlib import Path

for p in Path("audio").glob("*.wav"):
    AudioSegment.from_wav(str(p)).export(p.with_suffix(".mp3"), format="mp3", bitrate="192k")
```
Audio gotchas:
- `pydub` requires `ffmpeg` installed system-wide for non-WAV formats
- Bitrate options: "128k" (small/low quality), "192k" (balanced), "256k" (high), "320k" (max for MP3)
- WAV files are uncompressed: expect CD-quality WAV to be roughly 4x to 11x larger than MP3, depending on the MP3 bitrate
- For sample rate conversion: `audio.set_frame_rate(44100).export("out.wav", format="wav")`
- Mono to stereo: `audio.set_channels(2)` / Stereo to mono: `audio.set_channels(1)`
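The WAV-versus-MP3 size gap follows directly from the bitrates, so it can be estimated before converting. This is a back-of-the-envelope sketch that assumes CD-quality PCM (44.1 kHz, stereo, 16-bit) and constant-bitrate MP3, and ignores container headers; the function names are illustrative.

```python
def wav_bytes(seconds: float, sample_rate: int = 44100,
              channels: int = 2, sample_width_bytes: int = 2) -> int:
    """Approximate size of the uncompressed PCM payload."""
    return int(seconds * sample_rate * channels * sample_width_bytes)


def mp3_bytes(seconds: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate MP3 stream."""
    return int(seconds * bitrate_kbps * 1000 / 8)


three_minutes = 180
for kbps in (128, 192, 256, 320):
    ratio = wav_bytes(three_minutes) / mp3_bytes(three_minutes, kbps)
    print(f"{kbps}k: WAV is about {ratio:.1f}x larger")
```

For a three-minute track this gives about 31.8 MB of PCM against 2.9 MB at 128k, so the ratio shrinks as the MP3 bitrate rises.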
ZIP Archives
```python
import shutil
import zipfile
from pathlib import Path

# --- Create ZIP from files ---
with zipfile.ZipFile("archive.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("file1.txt")
    zf.write("file2.csv")
    zf.write("images/photo.jpg")

# --- Create ZIP from entire directory ---
shutil.make_archive("archive", "zip", root_dir="my_folder")  # creates archive.zip

# --- Extract all ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    zf.extractall("output_dir")

# --- Extract single file ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    zf.extract("file1.txt", "output_dir")

# --- List contents without extracting ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    for info in zf.infolist():
        print(f"{info.filename} {info.file_size:,} bytes {info.compress_size:,} compressed")

# --- Read file from ZIP without extracting ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    content = zf.read("file1.txt").decode("utf-8")

# --- Add files to existing ZIP (pass ZIP_DEFLATED, or new entries are stored uncompressed) ---
with zipfile.ZipFile("archive.zip", "a", zipfile.ZIP_DEFLATED) as zf:
    zf.write("new_file.txt")

# --- Create ZIP with password (stdlib zipfile can only read encrypted ZIPs; pyzipper writes them) ---
# pip install pyzipper
import pyzipper

with pyzipper.AESZipFile("secure.zip", "w", compression=pyzipper.ZIP_DEFLATED,
                         encryption=pyzipper.WZ_AES) as zf:
    zf.setpassword(b"my_password")
    zf.write("secret.txt")

# --- Batch: ZIP all PDFs in a directory ---
with zipfile.ZipFile("all_pdfs.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for p in Path(".").glob("**/*.pdf"):
        zf.write(p)
```
ZIP gotchas:
- `zipfile` is in Python's standard library, so no install is needed
- Always use `ZIP_DEFLATED` compression (default is `ZIP_STORED` = no compression)
- For password-protected ZIPs, stdlib `zipfile` can only read (not write); use `pyzipper` for encrypted writes
- Max file size in standard ZIP is 4 GB; use `allowZip64=True` (default in Python 3) for larger files
- `shutil.make_archive` is the simplest way to ZIP an entire directory tree
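The effect of the default `ZIP_STORED` mode is easy to check in memory. The sketch below writes the same payload with both modes and compares archive sizes; `zip_size` is a hypothetical helper and the repetitive payload is chosen only because it compresses well.

```python
import io
import zipfile


def zip_size(payload: bytes, compression: int) -> int:
    """Write payload into an in-memory ZIP and return the archive size in bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression) as zf:
        zf.writestr("data.txt", payload)
    return len(buffer.getvalue())


payload = b"the same line of text, repeated\n" * 2000
stored = zip_size(payload, zipfile.ZIP_STORED)
deflated = zip_size(payload, zipfile.ZIP_DEFLATED)
print(f"stored: {stored:,} bytes, deflated: {deflated:,} bytes")
```

Already-compressed inputs such as JPEG or MP3 will show little or no gain, so the difference depends heavily on the content.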
EPUB Formats
```python
# --- EPUB → other formats (via pandoc) ---
# pandoc is the easiest way to convert EPUB
import pypandoc

# EPUB → Markdown
pypandoc.convert_file("in.epub", "markdown", outputfile="out.md")
# EPUB → HTML
pypandoc.convert_file("in.epub", "html", outputfile="out.html")
# EPUB → DOCX
pypandoc.convert_file("in.epub", "docx", outputfile="out.docx")
# EPUB → plain text
pypandoc.convert_file("in.epub", "plain", outputfile="out.txt")

# --- Other formats → EPUB ---
# Markdown → EPUB
pypandoc.convert_file("in.md", "epub", outputfile="out.epub",
                      extra_args=["--metadata", "title=My Book"])
# HTML → EPUB
pypandoc.convert_file("in.html", "epub", outputfile="out.epub",
                      extra_args=["--metadata", "title=My Book"])
# DOCX → EPUB
pypandoc.convert_file("in.docx", "epub", outputfile="out.epub")
```
```bash
# --- CLI equivalents ---
pandoc in.epub -o out.md
pandoc in.epub -o out.pdf --pdf-engine=xelatex
pandoc in.md -o out.epub --metadata title="My Book"
pandoc in.html -o out.epub --metadata title="My Book" --epub-cover-image=cover.jpg
```
```python
# --- Pure Python: read/write EPUB with ebooklib ---
import ebooklib
from ebooklib import epub

# Read EPUB
book = epub.read_epub("in.epub")
for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):  # ITEM_DOCUMENT (9) = HTML chapters
    print(item.get_name())
    html_content = item.get_content().decode("utf-8")

# Create EPUB from scratch
book = epub.EpubBook()
book.set_identifier("id123")
book.set_title("My Book")
book.set_language("en")
book.add_author("Author Name")

ch1 = epub.EpubHtml(title="Chapter 1", file_name="ch1.xhtml", lang="en")
ch1.content = "<h1>Chapter 1</h1><p>Hello world.</p>"
book.add_item(ch1)

book.toc = [epub.Link("ch1.xhtml", "Chapter 1", "ch1")]
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())
book.spine = ["nav", ch1]
epub.write_epub("out.epub", book)
```
EPUB gotchas:
- `pandoc` is the simplest for format-to-format EPUB conversion
- Always add `--metadata title="..."` when creating EPUB, since readers require a title
- EPUB is essentially a ZIP of HTML files; `ebooklib` gives you fine-grained control
- For EPUB → PDF, pandoc needs a LaTeX engine (`texlive-xetex`)
- Cover images: use `--epub-cover-image=cover.jpg` with pandoc
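Because an EPUB is a ZIP container, a quick look inside needs nothing beyond the standard library. The sketch below lists the HTML/XHTML members by file extension; `list_epub_documents` is a hypothetical helper, and the extension filter is a heuristic (a full reader would follow the OPF manifest instead).

```python
import zipfile


def list_epub_documents(path: str) -> list[str]:
    """Return the names of HTML/XHTML files stored inside an EPUB archive."""
    with zipfile.ZipFile(path) as zf:
        return [
            name for name in zf.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        ]
```

Calling `list_epub_documents("in.epub")` returns the member paths, which can then be read with `ZipFile.read()` as shown in the ZIP section.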
HTML to PDF
```python
# --- weasyprint (best CSS support, no browser needed) ---
from weasyprint import CSS, HTML

# Simple file conversion
HTML("in.html").write_pdf("out.pdf")

# From URL
HTML("https://example.com").write_pdf("page.pdf")

# From HTML string
HTML(string="<h1>Hello</h1><p>World</p>").write_pdf("out.pdf")

# With custom CSS
HTML("in.html").write_pdf("out.pdf", stylesheets=["custom.css"])

# With page size and margins
HTML("in.html").write_pdf("out.pdf", stylesheets=[
    CSS(string="@page { size: A4; margin: 2cm; }")
])

# Landscape orientation
HTML("in.html").write_pdf("out.pdf", stylesheets=[
    CSS(string="@page { size: A4 landscape; margin: 1.5cm; }")
])
```
```bash
# --- CLI alternatives ---
# pandoc (simpler, less CSS fidelity)
pandoc in.html -o out.pdf --pdf-engine=xelatex

# weasyprint CLI
weasyprint in.html out.pdf
weasyprint https://example.com page.pdf
```
HTML to PDF gotchas:
- `weasyprint` supports CSS3 including `@page` rules and, in recent releases, flexbox and grid, which makes it the best fit for styled documents
- `weasyprint` does NOT run JavaScript; for JS-heavy pages, use `playwright` or `pyppeteer` instead
- For JS-rendered pages: `playwright` → `page.pdf()` is the most reliable option
- pandoc HTML → PDF goes through LaTeX, so complex CSS layouts may not render correctly
- HTML with relative image paths: pass `base_url`, e.g. `HTML(filename="in.html", base_url=".")`, so the paths resolve. This matters most with `HTML(string=...)`, which has no file location to fall back on.
GIF Creation
```python
from pathlib import Path

import imageio.v3 as iio
from PIL import Image

# --- Images → animated GIF (Pillow) ---
frames = [Image.open(f"frame_{i}.png") for i in range(10)]
frames[0].save("out.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)  # duration in ms, loop=0 means infinite

# --- Images → GIF with optimization ---
frames = [Image.open(f"frame_{i}.png").convert("RGBA") for i in range(10)]
frames[0].save("out.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0, optimize=True)

# --- Directory of images → GIF ---
frame_paths = sorted(Path("frames").glob("*.png"))
frames = [Image.open(p) for p in frame_paths]
frames[0].save("out.gif", save_all=True, append_images=frames[1:], duration=100, loop=0)

# --- GIF → individual frames ---
gif = Image.open("in.gif")
for i in range(gif.n_frames):
    gif.seek(i)
    gif.save(f"frame_{i}.png")

# --- Resize GIF ---
gif = Image.open("in.gif")
resized_frames = []
for i in range(gif.n_frames):
    gif.seek(i)
    resized_frames.append(gif.copy().resize((320, 240), Image.Resampling.LANCZOS))
resized_frames[0].save("small.gif", save_all=True, append_images=resized_frames[1:],
                       duration=gif.info.get("duration", 100), loop=0)

# --- Video → GIF (imageio; the pyav plugin needs: pip install av) ---
frames = iio.imread("in.mp4", plugin="pyav")
iio.imwrite("out.gif", frames, plugin="pillow", duration=40, loop=0)

# --- GIF → MP4 (ffmpeg CLI, much smaller file) ---
# ffmpeg -i in.gif -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" out.mp4
```
GIF gotchas:
- GIF is limited to 256 colors per frame, so complex images lose quality
- Use `optimize=True` to reduce file size, but large GIFs are still huge compared to MP4
- `duration` is per-frame in milliseconds (100ms = 10 FPS, 40ms = 25 FPS)
- `loop=0` means infinite loop; for a GIF that plays once and stops, omit `loop` (viewers differ in how they treat `loop=1`)
- For video → GIF, consider downscaling first, because full-resolution GIFs are enormous
- Pillow GIF output doesn't support transparency well; use `imageio` for better results
- For best quality: create GIF from video with ffmpeg: `ffmpeg -i in.mp4 -vf "fps=15,scale=480:-1" out.gif`
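The `duration` arithmetic above is simple enough to wrap in a helper. This sketch assumes the caller thinks in frames per second; it rounds to a multiple of 10 ms because the GIF format stores frame delays in hundredths of a second. The function name is illustrative.

```python
def fps_to_duration_ms(fps: float) -> int:
    """Convert frames per second to a per-frame GIF duration in milliseconds.

    The result is a multiple of 10 ms, matching GIF's 1/100 s delay resolution.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(100 / fps) * 10


for fps in (10, 15, 25):
    print(fps, "FPS ->", fps_to_duration_ms(fps), "ms per frame")
```

The returned value can be passed straight to the `duration=` argument used in the snippets above.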
PDF to SVG
```python
# --- PyMuPDF (fitz): best quality, vector-preserving ---
import fitz  # pip install pymupdf

# Single page
doc = fitz.open("in.pdf")
page = doc[0]
svg_text = page.get_svg_image()
with open("page_1.svg", "w") as f:
    f.write(svg_text)

# All pages
doc = fitz.open("in.pdf")
for i, page in enumerate(doc):
    svg_text = page.get_svg_image()
    with open(f"page_{i+1}.svg", "w") as f:
        f.write(svg_text)
doc.close()

# With custom resolution (matrix scales the output)
doc = fitz.open("in.pdf")
page = doc[0]
mat = fitz.Matrix(2, 2)  # 2x scale for higher detail
svg_text = page.get_svg_image(matrix=mat)
with open("page_hires.svg", "w") as f:
    f.write(svg_text)
```
```bash
# --- CLI alternatives ---
# pdf2svg (if installed)
pdf2svg in.pdf out.svg 1  # page number

# Inkscape CLI
inkscape in.pdf --export-type=svg --export-filename=out.svg

# pdftocairo (from poppler-utils)
pdftocairo -svg in.pdf out.svg
```
PDF to SVG gotchas:
- `pymupdf` (imported as `fitz`) produces true vector SVGs. By default text is emitted as outline paths; pass `text_as_path=False` to `get_svg_image()` to keep it as selectable text.
- Scanned PDFs produce SVGs with embedded raster images (no vector data to extract)
- Large PDFs with complex graphics produce very large SVG files
- `pdf2svg` CLI tool is simple but must be installed separately (`apt install pdf2svg`)
- For rasterized SVG (simpler but not truly vector): render PDF to PNG first, then embed in SVG
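The rasterized route in the last point can be sketched as follows. The PNG bytes are assumed to come from an earlier render step (for example the pdf2image output saved as PNG); `wrap_png_in_svg` is a hypothetical helper that only builds the SVG wrapper around them.

```python
import base64


def wrap_png_in_svg(png_bytes: bytes, width: int, height: int) -> str:
    """Embed PNG data in an SVG document as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'width="{width}" height="{height}" viewBox="0 0 {width} {height}">'
        f'<image width="{width}" height="{height}" '
        f'xlink:href="data:image/png;base64,{encoded}"/>'
        "</svg>"
    )
```

The result scales like any SVG but stays pixel-based, so it will blur when enlarged beyond the render resolution.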
Validation
Always verify output:
```python
import json

import pandas as pd
from PIL import Image

# Row count parity
assert len(pd.read_csv("out.csv")) == len(pd.read_json("in.json"))

# JSON well-formed
json.load(open("out.json"))

# Image opens
Image.open("out.jpg").verify()
```
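Where pandas is not installed, the row-count check can be approximated with the standard library. This sketch assumes the JSON file holds a top-level list of records (the `orient="records"` layout used earlier); `csv_json_row_parity` is a hypothetical helper name.

```python
import csv
import json


def csv_json_row_parity(csv_path: str, json_path: str) -> bool:
    """Return True when the CSV data rows match the number of JSON records."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        csv_rows = sum(1 for _ in csv.DictReader(f))  # header row is not counted
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)  # raises if the JSON is malformed
    return isinstance(records, list) and csv_rows == len(records)
```

A `False` result means the conversion dropped or duplicated rows, or that the JSON is not a flat list of records.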