DevHive-Cli file-converter

Convert files between formats including CSV, JSON, YAML, XML, Markdown, EPUB, image, audio formats, GIF creation, PDF to SVG, HTML to PDF, and ZIP archives.

install

source · Clone the upstream repo

git clone https://github.com/El3tar-cmd/DevHive-Cli

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/El3tar-cmd/DevHive-Cli "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/file-converter" ~/.claude/skills/el3tar-cmd-devhive-cli-file-converter && rm -rf "$T"

manifest: agents/file-converter/SKILL.md

source content

File Converter

Convert between data, document, image, audio formats, and ZIP archives. One-liners for each conversion pair.

Tool Map

Domain	Tool	Install
CSV/JSON/Excel/Parquet	`pandas`	`pip install pandas openpyxl pyarrow`
YAML	`pyyaml`	`pip install pyyaml`
XML ↔ dict	`xmltodict`	`pip install xmltodict`
Any doc format ↔ any	pandoc (CLI)	`apt install pandoc` or `pip install pypandoc_binary`
Markdown → HTML	`markdown`	`pip install markdown`
HTML → Markdown	`markdownify`	`pip install markdownify`
.docx read/write	`python-docx`	`pip install python-docx`
PDF → text/tables	`pdfplumber`	`pip install pdfplumber`
PDF → images	`pdf2image`	`pip install pdf2image` + `apt install poppler-utils`
PDF manipulation	`pypdf`	`pip install pypdf`
Images	`Pillow`	`pip install Pillow`
SVG → PNG	`cairosvg`	`pip install cairosvg`
HEIC → JPG	`pillow-heif`	`pip install pillow-heif`
Audio formats	`pydub`	`pip install pydub` + `apt install ffmpeg`
EPUB ↔ other	pandoc or `ebooklib`	`pip install ebooklib`
HTML → PDF	`weasyprint`	`pip install weasyprint`
GIF creation	`Pillow` or `imageio`	`pip install imageio[ffmpeg]` (add `pip install av` for the `pyav` video plugin)
PDF → SVG	`pdf2image` + `potrace` or `pymupdf`	`pip install pymupdf`
ZIP archives	`zipfile` (stdlib)	built-in, no install needed

Data Formats

import pandas as pd, json, yaml, xmltodict

# --- CSV ↔ JSON ---
pd.read_csv("in.csv").to_json("out.json", orient="records", indent=2)
pd.read_json("in.json").to_csv("out.csv", index=False)

# --- CSV → Excel / Excel → CSV ---
pd.read_csv("in.csv").to_excel("out.xlsx", index=False, engine="openpyxl")
pd.read_excel("in.xlsx", sheet_name="Sheet1").to_csv("out.csv", index=False)

# All sheets: pd.read_excel("in.xlsx", sheet_name=None) → dict of DataFrames

# --- CSV → Parquet (columnar, compressed) ---
pd.read_csv("in.csv").to_parquet("out.parquet", engine="pyarrow", compression="snappy")

# --- YAML ↔ JSON ---
data = yaml.safe_load(open("in.yaml"))          # ALWAYS safe_load, never load()
json.dump(data, open("out.json", "w"), indent=2)
yaml.safe_dump(json.load(open("in.json")), open("out.yaml", "w"), sort_keys=False)

# --- XML ↔ JSON ---
data = xmltodict.parse(open("in.xml").read())
json.dump(data, open("out.json", "w"), indent=2)
open("out.xml", "w").write(xmltodict.unparse(data, pretty=True))

# --- JSONL (one JSON object per line) ---
pd.read_json("in.jsonl", lines=True).to_csv("out.csv", index=False)

Encoding gotchas:

Excel-written CSVs start with a BOM: `pd.read_csv("f.csv", encoding="utf-8-sig")` strips it
Auto-detect the encoding (needs `pip install chardet`): `import chardet; enc = chardet.detect(open("f.csv","rb").read())["encoding"]`
CSV delimiter sniffing: `pd.read_csv("f.csv", sep=None, engine="python")`
Nested JSON → flat CSV: `pd.json_normalize(data, sep=".").to_csv("out.csv", index=False)` turns `{"a":{"b":1}}` into column `a.b`
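
The BOM and delimiter notes can be checked without pandas. The following is a minimal sketch using only the standard library; the sample bytes are invented for illustration. `csv.Sniffer` is the stdlib sniffer that pandas documents as the mechanism behind `sep=None` with the python engine.

```python
import csv
import io

# Hypothetical sample: a CSV as Excel might write it, with a UTF-8 BOM
# up front and semicolons as the delimiter.
raw = b"\xef\xbb\xbfname;city\nAda;London\nLinus;Helsinki\n"

# Plain utf-8 keeps the BOM as an invisible character glued to the first header.
has_bom_with_utf8 = raw.decode("utf-8").startswith("\ufeff")

# utf-8-sig drops it.
text = raw.decode("utf-8-sig")

# Sniff the delimiter from the sample, then parse with the detected dialect.
dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
rows = list(csv.DictReader(io.StringIO(text), dialect=dialect))

print(repr(dialect.delimiter), rows[0])
```

Restricting `delimiters` to a short candidate list keeps the sniffer from picking a letter that happens to repeat consistently across rows.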

Document Formats — pandoc is the Swiss Army knife


# Markdown → PDF (requires LaTeX: apt install texlive-xetex)
pandoc input.md -o output.pdf --pdf-engine=xelatex

# Markdown → DOCX
pandoc input.md -o output.docx

# DOCX → Markdown (extracts images to ./media/)
pandoc input.docx -o output.md --extract-media=.

# HTML → Markdown
pandoc input.html -o output.md -t gfm

# Any → Any (pandoc supports ~40 formats)
pandoc -f docx -t rst input.docx -o output.rst


# From Python
import pypandoc
pypandoc.convert_file("in.md", "docx", outputfile="out.docx")

Without pandoc (pure Python):


# Markdown → HTML
import markdown
html = markdown.markdown(open("in.md").read(), extensions=["tables", "fenced_code", "toc"])

# HTML → Markdown
from markdownify import markdownify
md = markdownify(html, heading_style="ATX")  # ATX = # headers, not underlines

PDF Operations


# --- Extract text + tables ---
import pdfplumber
with pdfplumber.open("in.pdf") as pdf:
    text = "\n".join(p.extract_text() or "" for p in pdf.pages)
    tables = pdf.pages[0].extract_tables()  # list of tables, each a list of rows

# --- PDF → images (one PNG per page) ---
from pdf2image import convert_from_path
for i, img in enumerate(convert_from_path("in.pdf", dpi=200)):
    img.save(f"page_{i+1}.png")

# --- Merge / split / rotate ---
from pypdf import PdfReader, PdfWriter
writer = PdfWriter()
for path in ["a.pdf", "b.pdf"]:
    for page in PdfReader(path).pages:
        writer.add_page(page)
writer.write("merged.pdf")

# Extract pages 2–5
reader = PdfReader("in.pdf")
writer = PdfWriter()
for p in reader.pages[1:5]:
    writer.add_page(p)
writer.write("pages_2-5.pdf")

PDF gotchas:

`pdf2image` needs `poppler-utils` installed system-wide (not a pip package)
Scanned PDFs have no text layer, so pdfplumber's `extract_text()` returns `None` or an empty string. Run `pytesseract` OCR on the pdf2image output instead.
`PyPDF2` is deprecated; use `pypdf`, its maintained successor with a largely compatible API
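
The OCR fallback for scanned PDFs could look roughly like the sketch below. The function names and the `min_chars` threshold are hypothetical. It assumes `pdfplumber`, `pdf2image`, and `pytesseract` are installed, along with the system `tesseract` binary. The third-party imports are kept local so that the page-selection helper can be used and tested on its own.

```python
def pages_needing_ocr(page_texts, min_chars=1):
    """Return 0-based indices of pages whose extracted text is missing or too short."""
    return [
        i for i, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def extract_with_ocr_fallback(pdf_path, dpi=200):
    """Extract text per page, running OCR only on pages without a text layer."""
    # Local imports: these are third-party packages, not part of the stdlib.
    import pdfplumber
    import pytesseract
    from pdf2image import convert_from_path

    with pdfplumber.open(pdf_path) as pdf:
        texts = [page.extract_text() for page in pdf.pages]

    for i in pages_needing_ocr(texts):
        # pdf2image page numbers are 1-based; render just the one page.
        image = convert_from_path(
            pdf_path, dpi=dpi, first_page=i + 1, last_page=i + 1
        )[0]
        texts[i] = pytesseract.image_to_string(image)

    return [text or "" for text in texts]
```

Rendering only the pages that need OCR keeps the slow path short for mixed documents, where most pages already carry a text layer.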

Image Formats

from PIL import Image

# --- Basic conversion ---
Image.open("in.png").convert("RGB").save("out.jpg", quality=90)

# convert("RGB") is REQUIRED: JPEG can't store alpha channel, will raise OSError

# --- WebP (best web format) ---
Image.open("in.jpg").save("out.webp", quality=85, method=6)  # method 0-6, 6=best compression

# --- AVIF (smallest; built into Pillow 11.3+, older versions need pillow-avif-plugin) ---
Image.open("in.jpg").save("out.avif", quality=75)

# --- HEIC (iPhone photos) → JPG ---
from pillow_heif import register_heif_opener
register_heif_opener()
Image.open("in.heic").convert("RGB").save("out.jpg", quality=90)

# --- SVG → PNG ---
import cairosvg
cairosvg.svg2png(url="in.svg", write_to="out.png", output_width=1024)

# --- Batch convert directory ---
from pathlib import Path
for p in Path("imgs").glob("*.png"):
    Image.open(p).convert("RGB").save(p.with_suffix(".jpg"), quality=85)

Image gotchas:

PNG → JPG: call `convert("RGB")` first, or transparency crashes the save
`quality` is meaningless for PNG (lossless); use `optimize=True, compress_level=9`
Pillow can't open `.svg` natively; use `cairosvg` or `svglib`
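
One way to keep these rules in a single place is a small lookup keyed on the output suffix. This is only a sketch: `SAVE_OPTIONS`, `plan_conversion`, and `convert_image` are hypothetical names, and the option values are taken from the examples in this section. Pillow is imported locally, so the planning helper runs without it.

```python
from pathlib import Path

# Save options per output suffix, mirroring the notes above (illustrative defaults).
SAVE_OPTIONS = {
    ".jpg": {"quality": 90},
    ".jpeg": {"quality": 90},
    ".webp": {"quality": 85, "method": 6},
    ".png": {"optimize": True, "compress_level": 9},  # lossless: no quality setting
}

# Formats that cannot store an alpha channel.
NO_ALPHA = {".jpg", ".jpeg"}


def plan_conversion(dst):
    """Return (needs_rgb, save_kwargs) for the destination file name."""
    suffix = Path(dst).suffix.lower()
    return suffix in NO_ALPHA, dict(SAVE_OPTIONS.get(suffix, {}))


def convert_image(src, dst):
    """Convert src to the format implied by dst, dropping alpha only when required."""
    from PIL import Image  # local import: requires `pip install Pillow`

    needs_rgb, options = plan_conversion(dst)
    with Image.open(src) as img:
        if needs_rgb and img.mode != "RGB":
            img = img.convert("RGB")
        img.save(dst, **options)
```

For example, `convert_image("in.png", "out.jpg")` would convert to RGB before saving, while `convert_image("in.jpg", "out.png")` would leave the mode alone.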

GIF → MP4 is a video operation:

ffmpeg -i in.gif -pix_fmt yuv420p out.mp4

Audio Formats

from pydub import AudioSegment

# --- MP3 ↔ WAV ---
AudioSegment.from_mp3("in.mp3").export("out.wav", format="wav")
AudioSegment.from_wav("in.wav").export("out.mp3", format="mp3", bitrate="192k")

# --- FLAC → MP3 ---
AudioSegment.from_file("in.flac", format="flac").export("out.mp3", format="mp3", bitrate="320k")

# --- OGG → MP3 ---
AudioSegment.from_ogg("in.ogg").export("out.mp3", format="mp3", bitrate="192k")

# --- M4A / AAC → MP3 ---
AudioSegment.from_file("in.m4a", format="m4a").export("out.mp3", format="mp3", bitrate="256k")

# --- Any → Any (anything ffmpeg can decode: mp3, wav, ogg, flac, m4a, aac, wma, aiff) ---
AudioSegment.from_file("in.wma").export("out.flac", format="flac")  # no format=: ffmpeg detects the input container

# --- Trim audio (first 30 seconds) ---
audio = AudioSegment.from_file("in.mp3")
audio[:30000].export("first_30s.mp3", format="mp3")  # milliseconds

# --- Merge / concatenate ---
combined = AudioSegment.from_file("a.mp3") + AudioSegment.from_file("b.mp3")
combined.export("merged.mp3", format="mp3")

# --- Adjust volume ---
audio = AudioSegment.from_file("in.mp3")
louder = audio + 6    # +6 dB
quieter = audio - 6   # -6 dB
louder.export("louder.mp3", format="mp3")

# --- Get audio info ---
audio = AudioSegment.from_file("in.mp3")
print(f"Duration: {len(audio)/1000:.1f}s, Channels: {audio.channels}, "
      f"Sample rate: {audio.frame_rate}Hz, Sample width: {audio.sample_width*8}bit")

# --- Batch convert directory ---
from pathlib import Path
for p in Path("audio").glob("*.wav"):
    AudioSegment.from_wav(str(p)).export(p.with_suffix(".mp3"), format="mp3", bitrate="192k")

Audio gotchas:

`pydub` requires `ffmpeg` installed system-wide for non-WAV formats
Bitrate options: "128k" (small/low quality), "192k" (balanced), "256k" (high), "320k" (max for MP3)
WAV files are uncompressed: expect roughly 10x larger files than MP3
Sample rate conversion: `audio.set_frame_rate(44100).export("out.wav", format="wav")`
Mono to stereo: `audio.set_channels(2)`; stereo to mono: `audio.set_channels(1)`
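
The "10x" figure depends on the MP3 bitrate. A back-of-the-envelope sketch, assuming CD-quality PCM (44.1 kHz, 16-bit, stereo) and constant-bitrate MP3, and ignoring headers and metadata:

```python
def wav_bytes(seconds, sample_rate=44100, channels=2, sample_width=2):
    """Approximate PCM payload: one sample_width-byte sample per channel per frame."""
    return seconds * sample_rate * channels * sample_width


def mp3_bytes(seconds, bitrate_kbps):
    """Approximate constant-bitrate MP3 size (kilobits per second -> bytes)."""
    return seconds * bitrate_kbps * 1000 // 8


three_minutes = 180
wav = wav_bytes(three_minutes)
for kbps in (128, 192, 320):
    mp3 = mp3_bytes(three_minutes, kbps)
    print(f"{kbps}k: WAV {wav:,} B vs MP3 {mp3:,} B ({wav / mp3:.1f}x)")
```

Under these assumptions the ratio is about 11x at 128k, roughly 7x at 192k, and a little over 4x at 320k.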

ZIP Archives

import zipfile
from pathlib import Path

# --- Create ZIP from files ---
with zipfile.ZipFile("archive.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("file1.txt")
    zf.write("file2.csv")
    zf.write("images/photo.jpg")

# --- Create ZIP from entire directory ---
import shutil
shutil.make_archive("archive", "zip", root_dir="my_folder")  # creates archive.zip

# --- Extract all ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    zf.extractall("output_dir")

# --- Extract single file ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    zf.extract("file1.txt", "output_dir")

# --- List contents without extracting ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    for info in zf.infolist():
        print(f"{info.filename}  {info.file_size:,} bytes  {info.compress_size:,} compressed")

# --- Read file from ZIP without extracting ---
with zipfile.ZipFile("archive.zip", "r") as zf:
    content = zf.read("file1.txt").decode("utf-8")

# --- Add files to existing ZIP ---
with zipfile.ZipFile("archive.zip", "a", zipfile.ZIP_DEFLATED) as zf:  # without this, new entries are stored uncompressed
    zf.write("new_file.txt")

# --- Create password-protected ZIP (stdlib zipfile can only read these; pyzipper writes them) ---
# pip install pyzipper
import pyzipper
with pyzipper.AESZipFile("secure.zip", "w", compression=pyzipper.ZIP_DEFLATED,
                          encryption=pyzipper.WZ_AES) as zf:
    zf.setpassword(b"my_password")
    zf.write("secret.txt")

# --- Batch: ZIP all PDFs in a directory ---
with zipfile.ZipFile("all_pdfs.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for p in Path(".").glob("**/*.pdf"):
        zf.write(p)

ZIP gotchas:

`zipfile` is in Python's standard library, so no install is needed
Always pass `ZIP_DEFLATED`; the default, `ZIP_STORED`, applies no compression
For password-protected ZIPs, stdlib `zipfile` can only read (not write); use `pyzipper` for encrypted writes
Max file size in standard ZIP is 4 GB; `allowZip64=True` (the default in Python 3) lifts the limit for larger files
`shutil.make_archive` is the simplest way to ZIP an entire directory tree
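
The difference between `ZIP_STORED` and `ZIP_DEFLATED` is easy to see with an in-memory archive. A small sketch, using a made-up repetitive payload (real files will compress less dramatically):

```python
import io
import zipfile

# Illustrative payload: highly repetitive text compresses very well.
payload = b"hello world\n" * 1000


def compressed_size(compression):
    """Write payload into an in-memory ZIP and return the entry's compressed size."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("data.txt", payload)
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("data.txt").compress_size


stored = compressed_size(zipfile.ZIP_STORED)
deflated = compressed_size(zipfile.ZIP_DEFLATED)
print(f"original {len(payload):,} B, stored {stored:,} B, deflated {deflated:,} B")
```

With `ZIP_STORED` the entry is exactly as large as the input, which is why forgetting the compression argument produces archives no smaller than the original files.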

EPUB Formats

# --- EPUB → other formats (via pandoc) ---
# pandoc is the easiest way to convert EPUB
import pypandoc

# EPUB → Markdown
pypandoc.convert_file("in.epub", "markdown", outputfile="out.md")

# EPUB → HTML
pypandoc.convert_file("in.epub", "html", outputfile="out.html")

# EPUB → DOCX
pypandoc.convert_file("in.epub", "docx", outputfile="out.docx")

# EPUB → plain text
pypandoc.convert_file("in.epub", "plain", outputfile="out.txt")

# --- Other formats → EPUB ---
# Markdown → EPUB
pypandoc.convert_file("in.md", "epub", outputfile="out.epub",
                       extra_args=["--metadata", "title=My Book"])

# HTML → EPUB
pypandoc.convert_file("in.html", "epub", outputfile="out.epub",
                       extra_args=["--metadata", "title=My Book"])

# DOCX → EPUB
pypandoc.convert_file("in.docx", "epub", outputfile="out.epub")

# --- CLI equivalents ---
pandoc in.epub -o out.md
pandoc in.epub -o out.pdf --pdf-engine=xelatex
pandoc in.md -o out.epub --metadata title="My Book"
pandoc in.html -o out.epub --metadata title="My Book" --epub-cover-image=cover.jpg

# --- Pure Python: read/write EPUB with ebooklib ---
import ebooklib
from ebooklib import epub

# Read EPUB
book = epub.read_epub("in.epub")
for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):  # HTML chapters
    print(item.get_name())
    html_content = item.get_content().decode("utf-8")

# Create EPUB from scratch
book = epub.EpubBook()
book.set_identifier("id123")
book.set_title("My Book")
book.set_language("en")
book.add_author("Author Name")

ch1 = epub.EpubHtml(title="Chapter 1", file_name="ch1.xhtml", lang="en")
ch1.content = "<h1>Chapter 1</h1><p>Hello world.</p>"
book.add_item(ch1)

book.toc = [epub.Link("ch1.xhtml", "Chapter 1", "ch1")]
book.add_item(epub.EpubNcx())
book.add_item(epub.EpubNav())
book.spine = ["nav", ch1]

epub.write_epub("out.epub", book)

EPUB gotchas:

`pandoc` is the simplest for format-to-format EPUB conversion
Always add `--metadata title="..."` when creating EPUB; readers require a title
EPUB is essentially a ZIP of HTML files; `ebooklib` gives you fine-grained control
For EPUB → PDF, pandoc needs a LaTeX engine (`texlive-xetex`)
Cover images: use `--epub-cover-image=cover.jpg` with pandoc
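
Because an EPUB is a ZIP container, the standard library is enough to peek inside one. The sketch below builds a tiny in-memory archive with just enough structure for the demo (it is not a valid book) and then locates the package document through `META-INF/container.xml`. The helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_demo_epub():
    """Return an in-memory ZIP laid out like an EPUB (demo only, not a valid book)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/ch1.xhtml", "<html><body><h1>Chapter 1</h1></body></html>")
    buf.seek(0)
    return buf


def find_package_path(epub_file):
    """Return the OPF package path declared in META-INF/container.xml."""
    with zipfile.ZipFile(epub_file) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    return root.find("c:rootfiles/c:rootfile", CONTAINER_NS).get("full-path")


def list_chapters(epub_file):
    """List (X)HTML entries in the archive, in ZIP order (not reading order)."""
    with zipfile.ZipFile(epub_file) as zf:
        return [n for n in zf.namelist() if n.endswith((".xhtml", ".html"))]


print(find_package_path(build_demo_epub()), list_chapters(build_demo_epub()))
```

The same two helpers accept a path to a real `.epub` file; reading order still has to come from the spine in the package document, which is where `ebooklib` earns its keep.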

HTML to PDF

# --- weasyprint (best CSS support, no browser needed) ---
from weasyprint import HTML

# Simple file conversion
HTML("in.html").write_pdf("out.pdf")

# From URL
HTML("https://example.com").write_pdf("page.pdf")

# From HTML string
HTML(string="<h1>Hello</h1><p>World</p>").write_pdf("out.pdf")

# With custom CSS
HTML("in.html").write_pdf("out.pdf", stylesheets=["custom.css"])

# With page size and margins
from weasyprint import CSS
HTML("in.html").write_pdf("out.pdf", stylesheets=[
    CSS(string="@page { size: A4; margin: 2cm; }")
])

# Landscape orientation
HTML("in.html").write_pdf("out.pdf", stylesheets=[
    CSS(string="@page { size: A4 landscape; margin: 1.5cm; }")
])

# --- CLI alternatives ---
# pandoc (simpler, less CSS fidelity)
pandoc in.html -o out.pdf --pdf-engine=xelatex

# weasyprint CLI
weasyprint in.html out.pdf
weasyprint https://example.com page.pdf

HTML to PDF gotchas:

`weasyprint` supports `@page` rules and much of modern CSS (flexbox, and grid in recent releases), which makes it the best fit for styled documents
`weasyprint` does NOT run JavaScript; for JS-rendered pages, `playwright` → `page.pdf()` is the most reliable option (`pyppeteer` is an alternative)
pandoc HTML → PDF goes through LaTeX, so complex CSS layouts may not render correctly
HTML with relative image paths: use `HTML(filename="in.html", base_url=".")` so they resolve
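
For the JavaScript case, a Playwright-based conversion might look like the sketch below. It assumes `pip install playwright` followed by `playwright install chromium`. `to_url` and `html_to_pdf` are hypothetical helper names, and the `A4` and `networkidle` settings are illustrative choices rather than requirements.

```python
from pathlib import Path


def to_url(source):
    """Pass URLs through; turn a local path into a file:// URL the browser can load."""
    if source.startswith(("http://", "https://", "file://")):
        return source
    return Path(source).resolve().as_uri()


def html_to_pdf(source, out_path="out.pdf"):
    """Render a page in headless Chromium (scripts included) and print it to PDF."""
    # Local import so to_url() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait until network activity settles so script-rendered content is present.
        page.goto(to_url(source), wait_until="networkidle")
        page.pdf(path=out_path, format="A4", print_background=True)
        browser.close()
```

A call such as `html_to_pdf("in.html")` or `html_to_pdf("https://example.com", "page.pdf")` covers the same two inputs as the weasyprint examples.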

GIF Creation

from PIL import Image
import imageio.v3 as iio
from pathlib import Path

# --- Images → animated GIF (Pillow) ---
frames = [Image.open(f"frame_{i}.png") for i in range(10)]
frames[0].save("out.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)  # duration in ms, loop=0 means infinite

# --- Images → GIF with optimization ---
frames = [Image.open(f"frame_{i}.png").convert("RGBA") for i in range(10)]
frames[0].save("out.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0, optimize=True)

# --- Directory of images → GIF ---
frame_paths = sorted(Path("frames").glob("*.png"))
frames = [Image.open(p) for p in frame_paths]
frames[0].save("out.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)

# --- GIF → individual frames ---
gif = Image.open("in.gif")
for i in range(gif.n_frames):
    gif.seek(i)
    gif.save(f"frame_{i}.png")

# --- Resize GIF ---
gif = Image.open("in.gif")
resized_frames = []
for i in range(gif.n_frames):
    gif.seek(i)
    resized_frames.append(gif.copy().resize((320, 240), Image.LANCZOS))
resized_frames[0].save("small.gif", save_all=True, append_images=resized_frames[1:],
                        duration=gif.info.get("duration", 100), loop=0)

# --- Video → GIF (imageio + PyAV: pip install av) ---
import imageio.v3 as iio
frames = iio.imread("in.mp4", plugin="pyav")  # loads every frame into memory
iio.imwrite("out.gif", frames, plugin="pillow", duration=40, loop=0)

# --- GIF → MP4 (ffmpeg CLI, much smaller file) ---
# ffmpeg -i in.gif -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" out.mp4

GIF gotchas:

GIF is limited to 256 colors per frame — complex images lose quality
Use `optimize=True` to reduce file size, but large GIFs are still huge compared to MP4
`duration` is per-frame in milliseconds (100ms = 10 FPS, 40ms = 25 FPS)
`loop=0` means infinite loop; omit `loop` to play once (viewers can differ on whether `loop=1` plays once or twice)
For video → GIF, consider downscaling first; full-resolution GIFs are enormous
Pillow GIF output doesn't support transparency well; use `imageio` for better results
For best quality, create the GIF from video with ffmpeg: `ffmpeg -i in.mp4 -vf "fps=15,scale=480:-1" out.gif`
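
Converting a target frame rate into a `duration` value is simple arithmetic, with one wrinkle: the GIF format stores frame delays in hundredths of a second, so durations end up on a 10 ms grid. A small sketch with hypothetical helper names:

```python
def gif_frame_duration(fps):
    """Per-frame duration in ms for a target FPS, snapped to GIF's 10 ms grid."""
    centiseconds = max(1, round(100 / fps))
    return centiseconds * 10


def effective_fps(duration_ms):
    """Frame rate that a given per-frame duration actually produces."""
    return 1000 / duration_ms


for fps in (10, 15, 25, 30):
    d = gif_frame_duration(fps)
    print(f"target {fps} FPS -> duration={d} ms -> {effective_fps(d):.1f} FPS")
```

Passing a multiple of 10 as `duration` avoids depending on how a particular encoder rounds or truncates in-between values.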

PDF to SVG

# --- PyMuPDF (fitz) — best quality, vector-preserving ---
import fitz  # pip install pymupdf

# Single page
doc = fitz.open("in.pdf")
page = doc[0]
svg_text = page.get_svg_image()
with open("page_1.svg", "w") as f:
    f.write(svg_text)

# All pages
doc = fitz.open("in.pdf")
for i, page in enumerate(doc):
    svg_text = page.get_svg_image()
    with open(f"page_{i+1}.svg", "w") as f:
        f.write(svg_text)
doc.close()

# With custom resolution (matrix scales the output)
doc = fitz.open("in.pdf")
page = doc[0]
mat = fitz.Matrix(2, 2)  # 2x scale for higher detail
svg_text = page.get_svg_image(matrix=mat)
with open("page_hires.svg", "w") as f:
    f.write(svg_text)

# --- CLI alternatives ---
# pdf2svg (if installed)
pdf2svg in.pdf out.svg 1  # page number

# Inkscape CLI
inkscape in.pdf --export-type=svg --export-filename=out.svg

# pdftocairo (from poppler-utils)
pdftocairo -svg in.pdf out.svg

PDF to SVG gotchas:

`pymupdf` (imported as `fitz`) produces true vector SVGs; paths stay as paths, and text is converted to paths unless you pass `text_as_path=False` to `get_svg_image()`
Scanned PDFs produce SVGs with embedded raster images (no vector data to extract)
Large PDFs with complex graphics produce very large SVG files
The `pdf2svg` CLI tool is simple but must be installed separately (`apt install pdf2svg`)
For rasterized SVG (simpler but not truly vector): render PDF to PNG first, then embed in SVG
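
The rasterized route can be sketched as follows: render the page to PNG, then wrap the bytes in an SVG `<image>` element as a base64 data URI. The helper names are hypothetical. PyMuPDF is imported locally, and the `dpi` argument to `get_pixmap` assumes a reasonably recent PyMuPDF release.

```python
import base64


def wrap_png_in_svg(png_bytes, width, height):
    """Embed PNG bytes in a minimal SVG document as a base64 data URI."""
    data = base64.b64encode(png_bytes).decode("ascii")
    # Plain href is SVG 2; some older renderers expect xlink:href instead.
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
        f'<image width="{width}" height="{height}" href="data:image/png;base64,{data}"/>'
        "</svg>"
    )


def pdf_page_to_raster_svg(pdf_path, page_index=0, dpi=150):
    """Render one PDF page to PNG with PyMuPDF and wrap it in an SVG."""
    import fitz  # local import: requires `pip install pymupdf`

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)
        return wrap_png_in_svg(pix.tobytes("png"), pix.width, pix.height)
```

The result opens anywhere an SVG does, but it scales like a bitmap, so pick the `dpi` for the largest size at which it will be displayed.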

Validation

Always verify output:


import json
import pandas as pd
from PIL import Image

# Row count parity
assert len(pd.read_csv("out.csv")) == len(pd.read_json("in.json"))

# JSON well-formed
json.load(open("out.json"))

# Image opens
Image.open("out.jpg").verify()
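
The row-count check can also be done with the standard library alone, which is handy in environments without pandas. A minimal sketch with hypothetical helper names; it assumes the JSON file holds a list of records (the `orient="records"` layout used earlier) and uses throwaway temp files for the demo.

```python
import csv
import json
import tempfile
from pathlib import Path


def csv_row_count(path):
    """Count data rows (header excluded); utf-8-sig tolerates an Excel BOM."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return sum(1 for _ in csv.DictReader(f))


def json_record_count(path):
    """Count records in a JSON file that holds a top-level list."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level list of records")
    return len(data)


def rows_match(csv_path, json_path):
    """True when both files contain the same number of records."""
    return csv_row_count(csv_path) == json_record_count(json_path)


# Demo on temporary files with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "out.csv"
    json_path = Path(tmp) / "in.json"
    csv_path.write_text("name,city\nAda,London\nLinus,Helsinki\n", encoding="utf-8")
    json_path.write_text(
        json.dumps([{"name": "Ada", "city": "London"},
                    {"name": "Linus", "city": "Helsinki"}]),
        encoding="utf-8",
    )
    demo_csv_rows = csv_row_count(csv_path)
    demo_match = rows_match(csv_path, json_path)

print(demo_csv_rows, demo_match)
```

Row-count parity only catches dropped or duplicated records; spot-checking a few field values is still worthwhile for conversions that change types.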