px-asset-extract

install
source · Clone the upstream repo
git clone https://github.com/JadeLiu-tech/px-asset-extract
Claude Code · Install into ~/.claude/skills/
git clone --depth=1 https://github.com/JadeLiu-tech/px-asset-extract ~/.claude/skills/jadeliu-tech-px-asset-extract-px-asset-extract
manifest: SKILL.md

px-asset-extract: Image Asset Extraction

What It Does

Decomposes images into individual transparent PNG assets with classification and a JSON manifest. The full pipeline runs in 2-6 seconds on CPU with zero ML models:

  1. Background detection — median color from image borders
  2. Foreground mask — Euclidean color distance thresholding
  3. Character bridging — dilation connects letters into words
  4. Connected components — union-find with 8-connectivity
  5. Classification — heuristic typing into 10 categories
  6. Text-line merging — groups word fragments into text lines
  7. Alpha extraction — anti-aliased transparent cropping
  8. Deduplication — removes overlapping and oversized segments
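Steps 1, 2, and 4 above can be sketched with numpy alone. This is a minimal illustration, not the package's actual code: the function names, the 2-pixel border width, and the dictionary-based union-find are assumptions. The 22.0 threshold mirrors the documented --bg-threshold default. Character bridging and the later steps are omitted.

```python
import numpy as np

def estimate_background(img, border=2):
    """Step 1: median RGB color of a thin frame around the image."""
    frame = np.concatenate([
        img[:border].reshape(-1, 3),
        img[-border:].reshape(-1, 3),
        img[:, :border].reshape(-1, 3),
        img[:, -border:].reshape(-1, 3),
    ])
    return np.median(frame, axis=0)

def foreground_mask(img, bg, threshold=22.0):
    """Step 2: pixels whose Euclidean RGB distance from bg exceeds threshold."""
    dist = np.linalg.norm(img.astype(np.float64) - bg, axis=-1)
    return dist > threshold

def label_components(mask):
    """Step 4: union-find labeling with 8-connectivity (0 = background)."""
    h, w = mask.shape
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    ys, xs = np.nonzero(mask)  # row-major order, so earlier neighbors exist
    for y, x in zip(ys.tolist(), xs.tolist()):
        parent[(y, x)] = (y, x)
        # Neighbors already visited in raster order: W, NW, N, NE
        for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx]:
                ra, rb = find((ny, nx)), find((y, x))
                if ra != rb:
                    parent[rb] = ra

    labels = np.zeros((h, w), dtype=np.int32)
    ids = {}
    for p in parent:
        labels[p] = ids.setdefault(find(p), len(ids) + 1)
    return labels

# Tiny synthetic image: white background, one red blob, one isolated blue pixel
img = np.full((8, 8, 3), 255, dtype=np.uint8)
img[1:3, 1:3] = (200, 30, 30)
img[3, 3] = (200, 30, 30)   # touches the square only diagonally
img[6, 6] = (30, 30, 200)

bg = estimate_background(img)
mask = foreground_mask(img, bg)
labels = label_components(mask)
print(int(labels.max()))  # number of components: 2
```

The diagonal pixel joins the red square because 8-connectivity treats corner neighbors as adjacent. With 4-connectivity it would be a separate component.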

When to Use This

| Scenario | Use px-asset-extract? |
|---|---|
| Extract all elements from a slide/poster | Yes. This is the primary use case |
| Get only illustrations, skip text | Yes. Use --types illustration or --exclude-types text |
| Extract specific objects by description | Use with --regions + a grounding model (e.g., Florence-2) |
| Remove background from a single photo | No. Use a background removal model instead |
| Segment a photo scene | No. Use SAM/FastSAM for photographic content |
| Image has textured/photographic background | Limited. Works best on clean/solid backgrounds |

Installation

git clone https://github.com/JadeLiu-tech/px-asset-extract.git
cd px-asset-extract
pip install .

Usage

CLI

# Basic extraction
px-extract <image> -o <output_dir>

# Only extract illustrations and icons
px-extract <image> -o <output_dir> --types illustration,icon

# Extract everything except text and dots
px-extract <image> -o <output_dir> --exclude-types text,dot,line

# Extract from pre-computed bounding boxes (e.g. from px-ground)
px-extract <image> -o <output_dir> --regions regions.json

# Segment only — output JSON, no PNGs
px-extract <image> --segments-only

# Batch processing
px-extract images/*.png -o output/ --batch

# JSON output to stdout
px-extract <image> -o <output_dir> --json --quiet
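When driving the CLI from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch (build_px_extract_cmd is not part of the package). It uses only the flags documented here and does not run anything itself.

```python
import shlex

def build_px_extract_cmd(image, output_dir="assets", types=None,
                         exclude_types=None, regions=None,
                         json_output=False, quiet=False):
    """Assemble a px-extract argv list from the documented flags."""
    cmd = ["px-extract", str(image), "-o", str(output_dir)]
    if types:
        cmd += ["--types", ",".join(types)]
    if exclude_types:
        cmd += ["--exclude-types", ",".join(exclude_types)]
    if regions:
        cmd += ["--regions", str(regions)]
    if json_output:
        cmd.append("--json")
    if quiet:
        cmd.append("--quiet")
    return cmd

cmd = build_px_extract_cmd(
    "slide.png", "out",
    exclude_types=["text", "dot", "line"],
    json_output=True, quiet=True,
)
print(shlex.join(cmd))
# px-extract slide.png -o out --exclude-types text,dot,line --json --quiet
```

With px-extract installed and on PATH, the list could be passed to subprocess.run(cmd, check=True, capture_output=True, text=True). The shape of the JSON written to stdout is not specified in this document, so inspect it before parsing.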

Python API

from px_asset_extract import extract_assets, load_regions

# Full extraction
result = extract_assets("slide.png", output_dir="assets/")
for asset in result.assets:
    print(f"{asset.id}: {asset.label} at ({asset.bbox.x}, {asset.bbox.y}) -> {asset.file_path}")

# Type filtering
result = extract_assets("slide.png", output_dir="icons/", types=["illustration", "icon"])
result = extract_assets("slide.png", output_dir="graphics/", exclude_types=["text", "line", "dot"])

# Pre-computed regions (from grounding model output)
regions = load_regions("grounded.json")
result = extract_assets("slide.png", output_dir="targeted/", regions=regions)

# Combine regions + type filter
result = extract_assets("slide.png", output_dir="charts/", regions=regions, types=["chart"])
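A common follow-up is to bucket the returned assets by type. The sketch below relies only on the id and label attributes used in the examples above. group_by_label is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without the package installed.

```python
from collections import defaultdict
from types import SimpleNamespace

def group_by_label(assets):
    """Map each label to the list of assets carrying it."""
    groups = defaultdict(list)
    for asset in assets:
        groups[asset.label].append(asset)
    return dict(groups)

# Stand-ins for result.assets (assumption: real assets expose .id and .label)
fake_assets = [
    SimpleNamespace(id="asset_000_illustration", label="illustration"),
    SimpleNamespace(id="asset_001_text", label="text"),
    SimpleNamespace(id="asset_002_text", label="text"),
]

groups = group_by_label(fake_assets)
for label, items in groups.items():
    print(label, [a.id for a in items])
```

In real use, pass result.assets in place of fake_assets.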

CLI Options

| Option | Default | Description |
|---|---|---|
| -o, --output | assets | Output directory |
| --bg-threshold | 22.0 | Background color distance (lower = more sensitive) |
| --min-area | 60 | Minimum segment area in pixels |
| --dilation | 2 | Character gap bridging passes |
| --padding | 10 | Extra pixels around each asset |
| --max-coverage | 0.5 | Max fraction of image a segment can cover |
| --types | | Only extract these types (comma-separated) |
| --exclude-types | | Skip these types (comma-separated) |
| --regions | | JSON file with bounding boxes (skips segmentation) |
| --segments-only | | Output segment JSON without extracting PNGs |
| --no-visualization | | Skip visualization image |
| --batch | | Create subdirectories per image |
| --json | | Output results as JSON to stdout |
| --quiet | | Suppress progress messages |
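To make the numeric options concrete, here is a rough sketch of how --min-area, --max-coverage, and --padding might act on one segment. Both helpers are hypothetical. In particular, measuring coverage by bounding-box area and clamping padding at the image edges are assumptions, not documented behavior.

```python
def keep_segment(pixel_area, bbox_w, bbox_h, img_w, img_h,
                 min_area=60, max_coverage=0.5):
    """Drop segments that are too small or cover too much of the image."""
    if pixel_area < min_area:
        return False
    coverage = (bbox_w * bbox_h) / (img_w * img_h)  # assumed: bbox-based
    return coverage <= max_coverage

def pad_bbox(x, y, w, h, img_w, img_h, padding=10):
    """Grow a box by `padding` pixels per side, clamped to the image."""
    x0 = max(0, x - padding)
    y0 = max(0, y - padding)
    x1 = min(img_w, x + w + padding)
    y1 = min(img_h, y + h + padding)
    return x0, y0, x1 - x0, y1 - y0

print(keep_segment(40, 10, 10, 1920, 1080))           # False: below min area
print(keep_segment(120000, 400, 300, 1920, 1080))     # True
print(keep_segment(10**6, 1920, 1000, 1920, 1080))    # False: covers > 50%
print(pad_bbox(5, 5, 20, 20, 100, 100))               # (0, 0, 35, 35)
```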

Output

Each run produces:

  • asset_NNN_<type>.png: individual transparent PNGs
  • manifest.json: positions, types, and metadata for all assets
  • visualization.png: input image with color-coded bounding boxes

Manifest format

{
  "source_image": "slide.png",
  "source_size": {"width": 1920, "height": 1080},
  "background_color": [255, 255, 255],
  "num_assets": 44,
  "assets": [
    {
      "id": "asset_000_illustration",
      "label": "illustration",
      "file": "asset_000_illustration.png",
      "position": {"x": 100, "y": 50, "width": 400, "height": 300},
      "pixel_area": 120000
    }
  ]
}
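Because the manifest is plain JSON, downstream tools can read it with the standard library. The sketch below computes each asset's center point from the position fields shown above. asset_centers is a hypothetical helper, and the inline sample is trimmed to a single asset so the snippet is self-contained.

```python
import json

# Inline sample following the manifest format above (one asset only)
MANIFEST_TEXT = """
{
  "source_image": "slide.png",
  "source_size": {"width": 1920, "height": 1080},
  "background_color": [255, 255, 255],
  "num_assets": 1,
  "assets": [
    {
      "id": "asset_000_illustration",
      "label": "illustration",
      "file": "asset_000_illustration.png",
      "position": {"x": 100, "y": 50, "width": 400, "height": 300},
      "pixel_area": 120000
    }
  ]
}
"""

def asset_centers(manifest):
    """Return {asset id: (center_x, center_y)} in source-image pixels."""
    centers = {}
    for asset in manifest["assets"]:
        pos = asset["position"]
        centers[asset["id"]] = (
            pos["x"] + pos["width"] / 2,
            pos["y"] + pos["height"] / 2,
        )
    return centers

manifest = json.loads(MANIFEST_TEXT)
centers = asset_centers(manifest)
print(centers)  # {'asset_000_illustration': (300.0, 200.0)}
```

For a real run, replace json.loads(MANIFEST_TEXT) with json.load on the manifest.json file in the output directory.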

Regions JSON format (for --regions)

[
  {"x": 100, "y": 50, "width": 400, "height": 300, "label": "chart"},
  {"x1": 600, "y1": 100, "x2": 800, "y2": 300, "label": "logo"}
]

Also supports a {"regions": [...]} wrapper. The label defaults to "region" if omitted.
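The two box styles and the optional wrapper can be reduced to one shape before further processing. The function below is an illustrative sketch of that normalization, not the package's load_regions. The name normalize_regions and the choice of x/y/width/height as the output form are assumptions.

```python
def normalize_regions(data):
    """Convert either box style to x/y/width/height with a default label."""
    # Accept a bare list or the {"regions": [...]} wrapper
    items = data["regions"] if isinstance(data, dict) else data
    normalized = []
    for r in items:
        if "x1" in r:  # corner style: x1, y1, x2, y2
            x, y = r["x1"], r["y1"]
            w, h = r["x2"] - r["x1"], r["y2"] - r["y1"]
        else:          # origin + size style
            x, y, w, h = r["x"], r["y"], r["width"], r["height"]
        normalized.append({
            "x": x, "y": y, "width": w, "height": h,
            "label": r.get("label", "region"),
        })
    return normalized

regions = normalize_regions({"regions": [
    {"x": 100, "y": 50, "width": 400, "height": 300, "label": "chart"},
    {"x1": 600, "y1": 100, "x2": 800, "y2": 300},
]})
print(regions[1])
# {'x': 600, 'y': 100, 'width': 200, 'height': 200, 'label': 'region'}
```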

Asset Types

| Type | Detection Logic |
|---|---|
| text | dark_ratio > 0.4, uniform ink color |
| illustration | Large (>1% image area), colorful |
| icon | Small (<3000px area, <60px max dimension) |
| graphic | Medium-sized, colored |
| line | Thin (min dimension <=5px, extreme aspect ratio) |
| dot | Very small (<150px area, <20px dimension) |
| diagram | Low fill ratio (<0.25) |
| diagram_network | Spans >80% of image, very low fill |
| shadow | Bright (>200), low contrast, low saturation |
| element | Catch-all for unclassified objects |
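The size-based rules in the table can be sketched as a small decision function. This is a simplified illustration under stated assumptions: the rule order, the aspect-ratio cutoff of 8 for "extreme", and the omission of every color- or fill-based type (text, graphic, diagram, diagram_network, shadow) are choices made for the example. The package's real classifier may differ.

```python
def classify_by_size(width, height, pixel_area, image_area):
    """Geometry-only typing using the numeric thresholds from the table."""
    min_dim, max_dim = min(width, height), max(width, height)
    if pixel_area < 150 and max_dim < 20:
        return "dot"
    # Assumed: "extreme aspect ratio" means at least 8:1
    if min_dim <= 5 and max_dim / max(min_dim, 1) >= 8:
        return "line"
    if pixel_area < 3000 and max_dim < 60:
        return "icon"
    if pixel_area > 0.01 * image_area:  # colorfulness check omitted
        return "illustration"
    return "element"

IMAGE_AREA = 1920 * 1080
print(classify_by_size(4, 4, 16, IMAGE_AREA))            # dot
print(classify_by_size(300, 3, 900, IMAGE_AREA))         # line
print(classify_by_size(40, 40, 1200, IMAGE_AREA))        # icon
print(classify_by_size(400, 300, 120000, IMAGE_AREA))    # illustration
print(classify_by_size(100, 80, 5000, IMAGE_AREA))       # element
```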

Performance

| Image type | Assets | Time |
|---|---|---|
| Presentation slide | 22-44 | 2-6s |
| Poster | 11 | 3.9s |
| Scientific diagram | 43 | 4.2s |
| Technical diagram | 42 | 4.5s |
| Data chart | 26 | 4.8s |

Dependencies

Only Pillow and numpy are required. opencv-python is optional, for better alpha edges.