Genmz-shop file-converter

Convert, merge, split, and compress files across formats — documents, images, audio, and more.

install

source · Clone the upstream repo

git clone https://github.com/Darsh20009/genmz-shop

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Darsh20009/genmz-shop "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.local/secondary_skills/file-converter" ~/.claude/skills/darsh20009-genmz-shop-file-converter && rm -rf "$T"

manifest: .local/secondary_skills/file-converter/SKILL.md

source content

File Converter

Convert between data, document, image, and audio formats, and create or extract ZIP archives. One-liners for each conversion pair.

Tool Map

| Domain | Tool | Install |
|---|---|---|
| CSV/JSON/Excel/Parquet | `pandas` | `pip install pandas openpyxl pyarrow` |
| YAML | `pyyaml` | `pip install pyyaml` |
| XML ↔ dict | `xmltodict` | `pip install xmltodict` |
| Any doc format ↔ any | pandoc (CLI) | `apt install pandoc`; `pip install pypandoc_binary` |
| Markdown → HTML | `markdown` | `pip install markdown` |
| HTML → Markdown | `markdownify` | `pip install markdownify` |
| .docx read/write | `python-docx` | `pip install python-docx` |
| PDF → text/tables | `pdfplumber` | `pip install pdfplumber` |
| PDF → images | `pdf2image` | `pip install pdf2image`; `apt install poppler-utils` |
| PDF manipulation | `pypdf` | `pip install pypdf` |
| Images | `Pillow` | `pip install Pillow` |
| SVG → PNG | `cairosvg` | `pip install cairosvg` |
| HEIC → JPG | `pillow-heif` | `pip install pillow-heif` |
| Audio formats | `pydub` | `pip install pydub`; `apt install ffmpeg` |
| EPUB ↔ other | pandoc or `ebooklib` | `pip install ebooklib` |
| HTML → PDF | `weasyprint` | `pip install weasyprint` |
| GIF creation | `Pillow`, `imageio` | `pip install imageio[ffmpeg]` |
| PDF → SVG | `pdf2image`, `potrace`, `pymupdf` | `pip install pymupdf` |
| ZIP archives | `zipfile` (stdlib) | built-in, no install needed |

File Input Handling

When a user wants to convert a file that can't be attached directly in the chat (e.g., `.heic`, `.flac`, `.epub`, `.psd`, `.m4a`, `.wma`, `.parquet`), ask them to upload it to the project's file system. Uploaded files typically appear in `attached_assets/` or the project root. Always check both locations. If the file isn't found, ask the user where they saved it.

Common unsupported-in-chat but convertible formats: `.heic`, `.avif`, `.webp`, `.flac`, `.ogg`, `.m4a`, `.wma`, `.aiff`, `.epub`, `.parquet`, `.psd`, `.svg`, `.zip`
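The "check both locations" step can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the helper name `find_uploaded` and its `roots` parameter are hypothetical, and the default roots simply mirror the two locations named above.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_uploaded(name: str, roots: Iterable[str] = ("attached_assets", ".")) -> Optional[Path]:
    """Return the first existing file called `name` under the given roots, or None.

    `find_uploaded` and `roots` are illustrative names, not part of any library.
    """
    for root in roots:
        candidate = Path(root) / name
        if candidate.is_file():
            return candidate
    return None  # caller should ask the user where the file was saved
```

A `None` result is the cue to ask the user for the file's location rather than guessing.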

Data Formats


import pandas as pd, json, yaml, xmltodict

# --- CSV ↔ JSON ---

pd.read_csv("in.csv").to_json("out.json", orient="records", indent=2)

pd.read_json("in.json").to_csv("out.csv", index=False)

# --- CSV → Excel / Excel → CSV ---

pd.read_csv("in.csv").to_excel("out.xlsx", index=False, engine="openpyxl")

pd.read_excel("in.xlsx", sheet_name="Sheet1").to_csv("out.csv", index=False)

# All sheets: pd.read_excel("in.xlsx", sheet_name=None) → dict of DataFrames

# --- CSV → Parquet (columnar, compressed) ---

pd.read_csv("in.csv").to_parquet("out.parquet", engine="pyarrow", compression="snappy")

# --- YAML ↔ JSON ---

data = yaml.safe_load(open("in.yaml"))  # ALWAYS safe_load, never load()

json.dump(data, open("out.json", "w"), indent=2)

yaml.safe_dump(json.load(open("in.json")), open("out.yaml", "w"), sort_keys=False)

# --- XML ↔ JSON ---

data = xmltodict.parse(open("in.xml").read())

json.dump(data, open("out.json", "w"), indent=2)

open("out.xml", "w").write(xmltodict.unparse(data, pretty=True))

# --- JSONL (one JSON object per line) ---

pd.read_json("in.jsonl", lines=True).to_csv("out.csv", index=False)

Encoding gotchas:

- `pd.read_csv("f.csv", encoding="utf-8-sig")` strips the BOM that Excel inserts
- Auto-detect: `import chardet; enc = chardet.detect(open("f.csv","rb").read())["encoding"]`
- CSV delimiter sniffing: `pd.read_csv("f.csv", sep=None, engine="python")`

Nested JSON → flat CSV:

pd.json_normalize(data, sep=".").to_csv("out.csv", index=False)  # {"a":{"b":1}} → column "a.b"
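For environments without pandas, the BOM and delimiter gotchas can be approximated with the standard library alone. The sketch below assumes UTF-8 input and a delimiter drawn from a small candidate set; `read_rows` is a hypothetical helper name.

```python
import csv
import io


def read_rows(raw: bytes) -> list[dict]:
    """Decode CSV bytes, drop a UTF-8 BOM if present, and sniff the delimiter."""
    # "utf-8-sig" strips a leading BOM; BOM-free UTF-8 decodes unchanged.
    text = raw.decode("utf-8-sig")
    # Sniff on a sample of the file; the candidate delimiters are an assumption.
    dialect = csv.Sniffer().sniff(text[:2048], delimiters=",;\t|")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


rows = read_rows(b"\xef\xbb\xbfname;age\nAda;36\nLin;41\n")
```

Without the `utf-8-sig` step the first column name would carry an invisible `\ufeff` prefix, which is the usual symptom of an Excel-exported CSV.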

Document Formats — pandoc is the Swiss Army knife


# Markdown → PDF (requires LaTeX: apt install texlive-xetex)

pandoc input.md -o output.pdf --pdf-engine=xelatex

# Markdown → DOCX

pandoc input.md -o output.docx

# DOCX → Markdown (extracts images to ./media/)

pandoc input.docx -o output.md --extract-media=.

# HTML → Markdown

pandoc input.html -o output.md -t gfm

# Any → Any (pandoc supports ~40 formats)

pandoc -f docx -t rst input.docx -o output.rst


# From Python

import pypandoc

pypandoc.convert_file("in.md", "docx", outputfile="out.docx")

Without pandoc (pure Python):


# Markdown → HTML

import markdown

html = markdown.markdown(open("in.md").read(), extensions=["tables", "fenced_code", "toc"])

# HTML → Markdown

from markdownify import markdownify

md = markdownify(html, heading_style="ATX")  # ATX = "#" headers, not underlines

PDF Operations


# --- Extract text + tables ---

import pdfplumber

with pdfplumber.open("in.pdf") as pdf:

    text = "\n".join(p.extract_text() or "" for p in pdf.pages)

    tables = pdf.pages[0].extract_tables()  # list of tables, each a list of rows

# --- PDF → images (one PNG per page) ---

from pdf2image import convert_from_path

for i, img in enumerate(convert_from_path("in.pdf", dpi=200)):

    img.save(f"page_{i+1}.png")

# --- Merge / split / rotate ---

from pypdf import PdfReader, PdfWriter

writer = PdfWriter()

for path in ["a.pdf", "b.pdf"]:

    for page in PdfReader(path).pages:

        writer.add_page(page)

writer.write("merged.pdf")

# Extract pages 2–5 (pages are zero-indexed, so the slice is 1:5)

reader = PdfReader("in.pdf")

writer = PdfWriter()

for p in reader.pages[1:5]:

    writer.add_page(p)

writer.write("pages_2-5.pdf")

PDF gotchas:

- `pdf2image` needs `poppler-utils` installed system-wide (not a pip package)
- Scanned PDFs have no text layer, so pdfplumber's `extract_text()` returns `None` or an empty string. Use `pytesseract` OCR on pdf2image output.
- `PyPDF2` is deprecated → use `pypdf` (its maintained successor, with a largely compatible API)
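The page-extraction example above relies on an off-by-one conversion: human page numbers are 1-based and inclusive, while Python slices are 0-based and end-exclusive. A minimal sketch of that mapping follows; `page_slice` is a hypothetical helper and the `pages` list is only a stand-in for `reader.pages`.

```python
def page_slice(first: int, last: int) -> slice:
    """Map an inclusive, 1-based page range to a 0-based Python slice."""
    if first < 1 or last < first:
        raise ValueError(f"invalid page range {first}-{last}")
    return slice(first - 1, last)


pages = ["p1", "p2", "p3", "p4", "p5", "p6"]  # stand-in for reader.pages
selected = pages[page_slice(2, 5)]            # pages 2 to 5 inclusive
```

Here `page_slice(2, 5)` is the same slice as the literal `[1:5]` used in the extraction example.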

Image Formats


from PIL import Image

# --- Basic conversion ---

Image.open("in.png").convert("RGB").save("out.jpg", quality=90)

# convert("RGB") is REQUIRED: JPEG can't store an alpha channel, so saving an RGBA image raises OSError

# --- WebP (best web format) ---

Image.open("in.jpg").save("out.webp", quality=85, method=6)  # method 0-6, 6 = slowest, best compression

# --- AVIF (smallest, Pillow 11+) ---

Image.open("in.jpg").save("out.avif", quality=75)

# --- HEIC (iPhone photos) → JPG ---

from pillow_heif import register_heif_opener

register_heif_opener()

Image.open("in.heic").convert("RGB").save("out.jpg", quality=90)

# --- SVG → PNG ---

import cairosvg

cairosvg.svg2png(url="in.svg", write_to="out.png", output_width=1024)

# --- Batch convert directory ---

from pathlib import Path

for p in Path("imgs").glob("*.png"):

    Image.open(p).convert("RGB").save(p.with_suffix(".jpg"), quality=85)

Image gotchas:

- PNG → JPG: must `convert("RGB")` first, otherwise an image with transparency fails to save
- `quality` for PNG is meaningless (lossless); use `optimize=True, compress_level=9`
- Pillow can't open `.svg` natively; use `cairosvg` or `svglib`
- GIF → MP4 is a video operation: `ffmpeg -i in.gif -pix_fmt yuv420p out.mp4`

Audio Formats


from pydub import AudioSegment

# --- MP3 ↔ WAV ---

AudioSegment.from_mp3("in.mp3").export("out.wav", format="wav")

AudioSegment.from_wav("in.wav").export("out.mp3", format="mp3", bitrate="192k")

# --- FLAC → MP3 ---

AudioSegment.from_file("in.flac", format="flac").export("out.mp3", format="mp3", bitrate="320k")

# --- OGG → MP3 ---

AudioSegment.from_ogg("in.ogg").export("out.mp3", format="mp3", bitrate="192k")

# --- M4A / AAC → MP3 ---

AudioSegment.from_file("in.m4a", format="m4a").export("out.mp3", format="mp3", bitrate="256k")

# --- Any → Any (pydub supports mp3, wav, ogg, flac, m4a, aac, wma, aiff) ---

AudioSegment.from_file("in.wma", format="wma").export("out.flac", format="flac")

# --- Trim audio (first 30 seconds) ---

audio = AudioSegment.from_file("in.mp3")

audio[:30000].export("first_30s.mp3", format="mp3")  # slice indices are milliseconds

# --- Merge / concatenate ---

combined = AudioSegment.from_file("a.mp3") + AudioSegment.from_file("b.mp3")

combined.export("merged.mp3", format="mp3")

# --- Adjust volume ---

audio = AudioSegment.from_file("in.mp3")

louder = audio + 6  # +6 dB

quieter = audio - 6  # -6 dB

louder.export("louder.mp3", format="mp3")

# --- Get audio info ---

audio = AudioSegment.from_file("in.mp3")

print(f"Duration: {len(audio)/1000:.1f}s, Channels: {audio.channels}, "

      f"Sample rate: {audio.frame_rate}Hz, Sample width: {audio.sample_width*8}bit")

# --- Batch convert directory ---

from pathlib import Path

for p in Path("audio").glob("*.wav"):

    AudioSegment.from_wav(str(p)).export(p.with_suffix(".mp3"), format="mp3", bitrate="192k")

Audio gotchas:

- `pydub` requires `ffmpeg` installed system-wide for non-WAV formats
- Bitrate options: "128k" (small/low quality), "192k" (balanced), "256k" (high), "320k" (max for MP3)
- WAV files are uncompressed; expect roughly 10x larger files than MP3
- Sample rate conversion: `audio.set_frame_rate(44100).export("out.wav", format="wav")`
- Mono to stereo: `audio.set_channels(2)` / Stereo to mono: `audio.set_channels(1)`
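The "roughly 10x" figure can be checked with simple arithmetic. The sketch below assumes CD-quality PCM (44.1 kHz, 16-bit, stereo) and a constant-bitrate MP3, and ignores container headers and tags; both function names are illustrative.

```python
def pcm_wav_bytes(seconds: float, sample_rate: int = 44100,
                  channels: int = 2, sample_width: int = 2) -> int:
    """Approximate size of uncompressed PCM audio (header bytes ignored)."""
    return int(seconds * sample_rate * channels * sample_width)


def mp3_bytes(seconds: float, bitrate_kbps: int = 192) -> int:
    """Approximate size of a constant-bitrate MP3 (tags ignored)."""
    return int(seconds * bitrate_kbps * 1000 / 8)


seconds = 180                       # a three-minute track
wav = pcm_wav_bytes(seconds)        # 31,752,000 bytes
mp3 = mp3_bytes(seconds, 128)       # 2,880,000 bytes
ratio = wav / mp3                   # about 11x at 128k
```

At 192k the same track comes to 4,320,000 bytes, so the ratio drops to about 7x; the 10x rule of thumb sits between those two bitrates.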

ZIP Archives


import zipfile

from pathlib import Path

# --- Create ZIP from files ---

with zipfile.ZipFile("archive.zip", "w", zipfile.ZIP_DEFLATED) as zf:

    zf.write("file1.txt")

    zf.write("file2.csv")

    zf.write("images/photo.jpg")

# --- Create ZIP from entire directory ---

import shutil

shutil.make_archive("archive", "zip", root_dir="my_folder")  # creates archive.zip

# --- Extract all ---

with zipfile.ZipFile("archive.zip", "r") as zf:

    zf.extractall("output_dir")

# --- Extract single file ---

with zipfile.ZipFile("archive.zip", "r") as zf:

    zf.extract("file1.txt", "output_dir")

# --- List contents without extracting ---

with zipfile.ZipFile("archive.zip", "r") as zf:

    for info in zf.infolist():

        print(f"{info.filename} {info.file_size:,} bytes {info.compress_size:,} compressed")

# --- Read file from ZIP without extracting ---

with zipfile.ZipFile("archive.zip", "r") as zf:

    content = zf.read("file1.txt").decode("utf-8")

# --- Add files to existing ZIP ---

with zipfile.ZipFile("archive.zip", "a") as zf:

    zf.write("new_file.txt")

# --- Create ZIP with password (stdlib zipfile can only read encrypted ZIPs; use pyzipper to write) ---

# pip install pyzipper

import pyzipper

with pyzipper.AESZipFile("secure.zip", "w", compression=pyzipper.ZIP_DEFLATED,

                         encryption=pyzipper.WZ_AES) as zf:

    zf.setpassword(b"my_password")

    zf.write("secret.txt")

# --- Batch: ZIP all PDFs in a directory ---

with zipfile.ZipFile("all_pdfs.zip", "w", zipfile.ZIP_DEFLATED) as zf:

    for p in Path(".").glob("**/*.pdf"):

        zf.write(p)

ZIP gotchas:

- `zipfile` is in Python's standard library; no install needed
- Always use `ZIP_DEFLATED` compression (default is `ZIP_STORED` = no compression)
- For password-protected ZIPs, stdlib `zipfile` can only read (not write); use `pyzipper` for encrypted writes
- The classic ZIP format caps files at 4 GB; the ZIP64 extension lifts that limit, and `allowZip64=True` is already the default in Python 3
- `shutil.make_archive` is the simplest way to ZIP an entire directory tree
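The difference between `ZIP_STORED` and `ZIP_DEFLATED` is easy to see on repetitive data. This sketch writes the same payload to two in-memory archives and compares sizes; `zipped_size` is a hypothetical helper and the payload is made up for illustration.

```python
import io
import zipfile


def zipped_size(data: bytes, compression: int) -> int:
    """Size in bytes of an in-memory ZIP holding `data` as a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("data.txt", data)
    return len(buf.getvalue())


payload = b"hello,world\n" * 10_000                   # 120,000 highly repetitive bytes
stored = zipped_size(payload, zipfile.ZIP_STORED)      # payload plus ZIP headers
deflated = zipped_size(payload, zipfile.ZIP_DEFLATED)  # a small fraction of that
```

Already-compressed inputs such as JPEG or MP3 gain very little from `ZIP_DEFLATED`, so the saving depends heavily on what is being archived.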

EPUB Formats


# --- EPUB → other formats (via pandoc) ---

# pandoc is the easiest way to convert EPUB

import pypandoc

# EPUB → Markdown

pypandoc.convert_file("in.epub", "markdown", outputfile="out.md")

# EPUB → HTML

pypandoc.convert_file("in.epub", "html", outputfile="out.html")

# EPUB → DOCX

pypandoc.convert_file("in.epub", "docx", outputfile="out.docx")

# EPUB → plain text

pypandoc.convert_file("in.epub", "plain", outputfile="out.txt")

# --- Other formats → EPUB ---

# Markdown → EPUB

pypandoc.convert_file("in.md", "epub", outputfile="out.epub",

                      extra_args=["--metadata", "title=My Book"])

# HTML → EPUB

pypandoc.convert_file("in.html", "epub", outputfile="out.epub",

                      extra_args=["--metadata", "title=My Book"])

# DOCX → EPUB

pypandoc.convert_file("in.docx", "epub", outputfile="out.epub")


# --- CLI equivalents ---

pandoc in.epub -o out.md

pandoc in.epub -o out.pdf --pdf-engine=xelatex

pandoc in.md -o out.epub --metadata title="My Book"

pandoc in.html -o out.epub --metadata title="My Book" --epub-cover-image=cover.jpg


# --- Pure Python: read/write EPUB with ebooklib ---

from ebooklib import epub

# Read EPUB

book = epub.read_epub("in.epub")

for item in book.get_items_of_type(9):  # 9 = ebooklib.ITEM_DOCUMENT (HTML chapters)

    print(item.get_name())

    html_content = item.get_content().decode("utf-8")

# Create EPUB from scratch

book = epub.EpubBook()

book.set_identifier("id123")

book.set_title("My Book")

book.set_language("en")

book.add_author("Author Name")

ch1 = epub.EpubHtml(title="Chapter 1", file_name="ch1.xhtml", lang="en")

ch1.content = "<h1>Chapter 1</h1><p>Hello world.</p>"

book.add_item(ch1)

book.toc = [epub.Link("ch1.xhtml", "Chapter 1", "ch1")]

book.add_item(epub.EpubNcx())

book.add_item(epub.EpubNav())

book.spine = ["nav", ch1]

epub.write_epub("out.epub", book)

EPUB gotchas:

- `pandoc` is the simplest for format-to-format EPUB conversion
- Always add `--metadata title="..."` when creating EPUB; readers require a title
- EPUB is essentially a ZIP of HTML files; `ebooklib` gives you fine-grained control
- For EPUB → PDF, pandoc needs a LaTeX engine (`texlive-xetex`)
- Cover images: use `--epub-cover-image=cover.jpg` with pandoc
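Because an EPUB is a ZIP container, its chapter files can be listed with the standard library before reaching for `ebooklib`. The sketch below is illustrative only: `list_chapter_files` is a hypothetical helper, and the in-memory archive is a stand-in that mimics the layout, not a valid EPUB.

```python
import io
import zipfile


def list_chapter_files(epub_file) -> list[str]:
    """Return names of HTML/XHTML members inside an EPUB (path or file-like object)."""
    with zipfile.ZipFile(epub_file) as zf:
        return [n for n in zf.namelist()
                if n.lower().endswith((".xhtml", ".html", ".htm"))]


# Tiny stand-in archive; member names are made up for the example.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("OEBPS/ch1.xhtml", "<h1>Chapter 1</h1>")
    zf.writestr("OEBPS/style.css", "h1 { color: black; }")
buf.seek(0)
names = list_chapter_files(buf)
```

This only inspects file names; reading order and metadata live in the package's OPF file, which is where `ebooklib` or pandoc is the better tool.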

HTML to PDF


# --- weasyprint (best CSS support, no browser needed) ---

from weasyprint import HTML

# Simple file conversion

HTML("in.html").write_pdf("out.pdf")

# From URL

HTML("https://example.com").write_pdf("page.pdf")

# From HTML string

HTML(string="<h1>Hello</h1><p>World</p>").write_pdf("out.pdf")

# With custom CSS

HTML("in.html").write_pdf("out.pdf", stylesheets=["custom.css"])

# With page size and margins

from weasyprint import CSS

HTML("in.html").write_pdf("out.pdf", stylesheets=[

    CSS(string="@page { size: A4; margin: 2cm; }")

])

# Landscape orientation

HTML("in.html").write_pdf("out.pdf", stylesheets=[

    CSS(string="@page { size: A4 landscape; margin: 1.5cm; }")

])


# --- CLI alternatives ---

# pandoc (simpler, less CSS fidelity)

pandoc in.html -o out.pdf --pdf-engine=xelatex

# weasyprint CLI

weasyprint in.html out.pdf

weasyprint https://example.com page.pdf

HTML to PDF gotchas:

- `weasyprint` supports CSS3 including flexbox, grid, and `@page` rules; best for styled documents
- `weasyprint` does NOT run JavaScript; for JS-heavy pages, use `playwright` (its `page.pdf()` is the most reliable option) or `pyppeteer` instead
- pandoc HTML → PDF goes through LaTeX, so complex CSS layouts may not render correctly
- Large HTML files with many images: use `HTML(filename="in.html", base_url=".")` so relative image paths resolve
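The page-size and landscape examples differ only in the `@page` rule string, so that string can be built by a small function. This is a sketch with a hypothetical helper name, `page_css`; its output is meant to be passed to `CSS(string=...)` as in the examples above.

```python
def page_css(size: str = "A4", landscape: bool = False, margin: str = "2cm") -> str:
    """Build an @page rule like the ones used in the weasyprint examples."""
    orientation = " landscape" if landscape else ""
    return f"@page {{ size: {size}{orientation}; margin: {margin}; }}"


portrait = page_css()
wide = page_css(landscape=True, margin="1.5cm")
```

Keeping the rule in one place makes it harder for a batch job to mix page sizes by accident.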

GIF Creation


from PIL import Image

import imageio.v3 as iio

from pathlib import Path

# --- Images → animated GIF (Pillow) ---

frames = [Image.open(f"frame_{i}.png") for i in range(10)]

frames[0].save("out.gif", save_all=True, append_images=frames[1:],

               duration=100, loop=0)  # duration in ms, loop=0 means infinite

# --- Images → GIF with optimization ---

frames = [Image.open(f"frame_{i}.png").convert("RGBA") for i in range(10)]

frames[0].save("out.gif", save_all=True, append_images=frames[1:],

               duration=100, loop=0, optimize=True)

# --- Directory of images → GIF ---

frame_paths = sorted(Path("frames").glob("*.png"))

frames = [Image.open(p) for p in frame_paths]

frames[0].save("out.gif", save_all=True, append_images=frames[1:],

               duration=100, loop=0)

# --- GIF → individual frames ---

gif = Image.open("in.gif")

for i in range(gif.n_frames):

    gif.seek(i)

    gif.save(f"frame_{i}.png")

# --- Resize GIF ---

gif = Image.open("in.gif")

resized_frames = []

for i in range(gif.n_frames):

    gif.seek(i)

    resized_frames.append(gif.copy().resize((320, 240), Image.LANCZOS))

resized_frames[0].save("small.gif", save_all=True, append_images=resized_frames[1:],

                       duration=gif.info.get("duration", 100), loop=0)

# --- Video → GIF (imageio; the pyav plugin needs `pip install av`) ---

frames = iio.imread("in.mp4", plugin="pyav")  # loads every frame into memory

iio.imwrite("out.gif", frames, plugin="pillow", duration=40, loop=0)

# --- GIF → MP4 (ffmpeg CLI, much smaller file) ---

# ffmpeg -i in.gif -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" out.mp4

GIF gotchas:

- GIF is limited to 256 colors per frame; complex images lose quality
- Use `optimize=True` to reduce file size, but large GIFs are still huge compared to MP4
- `duration` is per-frame in milliseconds (100ms = 10 FPS, 40ms = 25 FPS)
- `loop=0` means infinite loop; `loop=1` plays once then stops
- For video → GIF, consider downscaling first; full-resolution GIFs are enormous
- Pillow GIF output doesn't support transparency well; use `imageio` for better results
- For best quality: create GIF from video with ffmpeg: `ffmpeg -i in.mp4 -vf "fps=15,scale=480:-1" out.gif`
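The relationship between `duration` and frame rate is plain arithmetic, sketched below. Both function names are hypothetical; the values match the 100 ms = 10 FPS and 40 ms = 25 FPS pairs quoted above.

```python
def frame_duration_ms(fps: float) -> int:
    """Per-frame duration in milliseconds for a target frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(1000 / fps)


def gif_length_seconds(n_frames: int, duration_ms: int) -> float:
    """Total playing time of one loop, in seconds."""
    return n_frames * duration_ms / 1000


ten_fps = frame_duration_ms(10)       # 100
film_fps = frame_duration_ms(25)      # 40
clip = gif_length_seconds(10, 100)    # 1.0 second for ten frames at 10 FPS
```

The result of `frame_duration_ms` is what goes into the `duration=` argument of `save()` or `imwrite()`.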

PDF to SVG


# --- PyMuPDF (fitz) — best quality, vector-preserving ---

import fitz  # pip install pymupdf

# Single page

doc = fitz.open("in.pdf")

page = doc[0]

svg_text = page.get_svg_image()

with open("page_1.svg", "w") as f:

    f.write(svg_text)

# All pages

doc = fitz.open("in.pdf")

for i, page in enumerate(doc):

    svg_text = page.get_svg_image()

    with open(f"page_{i+1}.svg", "w") as f:

        f.write(svg_text)

doc.close()

# With custom resolution (matrix scales the output)

doc = fitz.open("in.pdf")

page = doc[0]

mat = fitz.Matrix(2, 2)  # 2x scale for higher detail

svg_text = page.get_svg_image(matrix=mat)

with open("page_hires.svg", "w") as f:

    f.write(svg_text)


# --- CLI alternatives ---

# pdf2svg (if installed)

pdf2svg in.pdf out.svg 1  # last argument is the page number

# Inkscape CLI

inkscape in.pdf --export-type=svg --export-filename=out.svg

# pdftocairo (from poppler-utils)

pdftocairo -svg in.pdf out.svg

PDF to SVG gotchas:

- `pymupdf` (imported as `fitz`) produces true vector SVGs: text stays as text, paths stay as paths
- Scanned PDFs produce SVGs with embedded raster images (no vector data to extract)
- Large PDFs with complex graphics produce very large SVG files
- The `pdf2svg` CLI tool is simple but must be installed separately (`apt install pdf2svg`)
- For rasterized SVG (simpler but not truly vector): render PDF to PNG first, then embed in SVG
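The rasterized fallback (render to PNG, then embed) needs only a base64 data URI inside an `<image>` element. The sketch below covers the embedding half and assumes the PNG bytes already exist, for example from pdf2image; `wrap_png_in_svg` is a hypothetical helper.

```python
import base64


def wrap_png_in_svg(png_bytes: bytes, width: int, height: int) -> str:
    """Embed PNG bytes in a minimal SVG document via a base64 data URI."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    # Plain `href` is the SVG 2 form; some older renderers may expect xlink:href.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
        f'<image width="{width}" height="{height}" href="data:image/png;base64,{b64}"/>'
        "</svg>"
    )
```

The output scales like any SVG, but it is still a bitmap inside: zooming reveals pixels, and text is not selectable.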

Validation

Always verify output:


# Row count parity

assert len(pd.read_csv("out.csv")) == len(pd.read_json("in.json"))

# JSON well-formed

json.load(open("out.json"))

# Image opens

Image.open("out.jpg").verify()
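The row-count check can also be done without pandas. This is a standard-library sketch of the same parity test, assuming the JSON side is a list of records (as produced by `orient="records"`); `row_counts_match` is a hypothetical helper name.

```python
import csv
import io
import json


def row_counts_match(csv_text: str, json_text: str) -> bool:
    """True when the CSV has as many data rows as the JSON array has records."""
    csv_rows = list(csv.DictReader(io.StringIO(csv_text)))  # header row is not counted
    json_rows = json.loads(json_text)                       # raises ValueError if malformed
    return isinstance(json_rows, list) and len(csv_rows) == len(json_rows)


ok = row_counts_match("a,b\n1,2\n3,4\n", '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]')
```

A count match does not prove the values survived the conversion, so spot-checking a few rows is still worthwhile.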