Immich-photo-manager duplicate-report
```shell
# Clone the whole plugin repo
git clone https://github.com/drolosoft/immich-photo-manager

# Or copy just this skill into your Claude skills directory
T=$(mktemp -d) && git clone --depth=1 https://github.com/drolosoft/immich-photo-manager "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/duplicate-report" ~/.claude/skills/drolosoft-immich-photo-manager-duplicate-report && rm -rf "$T"
```

skills/duplicate-report/SKILL.md

Duplicate Report
⚠️ Connection Required — ALWAYS CHECK FIRST
Before doing ANYTHING else in this skill, call `ping` on the Immich MCP server.

- If `ping` succeeds → proceed with the skill normally.
- If `ping` fails or the MCP tools are not available → STOP. Do not continue. Tell the user:

❌ Immich is not connected. This plugin needs a running Immich MCP server to work.
Run /setup-immich-photo-manager to configure your Immich connection. You'll need:
- Your Immich server URL (e.g., `http://192.168.1.100:2283`)
- An Immich API key (how to create one)
- The MCP server configured (see /setup-immich-photo-manager)
Nothing in this plugin will work until the connection is configured.
Do NOT skip this check. Do NOT try to run any other tool first. Always ping, always block if it fails.
Generate a comprehensive duplicate analysis of an Immich photo library. Uses perceptual hashing to find visually identical photos even when they have different checksums (common when photos are exported from Apple Photos and Google Photos).
Why Perceptual Hashing?
When users import the same photo library from multiple sources (Apple Photos export, Google Takeout, manual folder copies), the files are often re-encoded by each platform. This means:
- Checksums differ — same photo, different binary → SHA/MD5 won't match
- Immich's built-in CLIP duplicate detection uses too strict a threshold for re-encoded content
- Filename matching catches only a fraction (filenames often differ across platforms)
Perceptual hashing (pHash) computes a fingerprint based on the visual content of the image, not the binary data. Two re-encoded copies of the same photo produce the same perceptual hash.
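The idea can be illustrated with a toy average hash (aHash) over a small grayscale grid. Real pHash applies a DCT over a larger image, but the principle is the same: the fingerprint tracks visual structure, not bytes. The pixel grids below are made-up data standing in for an original photo and a re-encoded copy.

```python
def average_hash(pixels):
    """Hash a grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

# "Original" photo and a re-encoded copy with slight compression noise:
original = [[200, 200, 10, 10],
            [200, 200, 10, 10],
            [10, 10, 200, 200],
            [10, 10, 200, 200]]
reencoded = [[198, 201, 12, 9],
             [199, 202, 11, 10],
             [11, 9, 197, 203],
             [12, 10, 199, 201]]

# The bytes differ on every pixel, but the visual fingerprint is identical.
assert average_hash(original) == average_hash(reencoded)
```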
Prerequisites
The user's machine needs:
```shell
pip3 install Pillow imagehash pillow-heif --break-system-packages
```

- `Pillow` — image loading
- `imagehash` — perceptual hashing
- `pillow-heif` — HEIC/HEIF support (critical for Apple Photos)
Analysis Workflow
Step 1: Discover Import Sources
Query Immich to identify distinct import sources from asset paths:
```sql
SELECT
  CASE
    WHEN "originalPath" LIKE '%Apple Fotos%' OR "originalPath" LIKE '%Apple Photos%' THEN 'Apple Photos'
    WHEN "originalPath" LIKE '%Google Fotos%' OR "originalPath" LIKE '%Google Photos%' THEN 'Google Photos'
    ELSE split_part("originalPath", '/', 5)  -- or whatever level gives the source folder
  END AS source,
  count(*) AS total
FROM asset
WHERE "deletedAt" IS NULL
GROUP BY source
ORDER BY total DESC;
```
Present the sources to the user and ask which ones to compare.
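The same classification can be done client-side on a list of asset paths pulled from Immich. This is a sketch mirroring the SQL CASE above; the path depth (`level=5`) is an assumption to adjust to the actual library layout, and the sample paths are made up.

```python
from collections import Counter

def classify_source(path, level=5):
    """Map an asset path to an import source, mirroring the SQL CASE above."""
    if 'Apple Fotos' in path or 'Apple Photos' in path:
        return 'Apple Photos'
    if 'Google Fotos' in path or 'Google Photos' in path:
        return 'Google Photos'
    parts = path.split('/')
    # Fall back to the folder at the assumed source level
    return parts[level] if len(parts) > level else 'unknown'

paths = [
    '/usr/src/app/upload/library/Apple Photos/2021/IMG_0001.HEIC',
    '/usr/src/app/upload/library/Google Photos/2021/IMG_0001.JPG',
]
counts = Counter(classify_source(p) for p in paths)
assert counts['Apple Photos'] == 1 and counts['Google Photos'] == 1
```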
Step 2: Run Perceptual Hash Scan
For each source directory, scan all image files and compute 256-bit perceptual hashes:
```python
from pillow_heif import register_heif_opener
register_heif_opener()

from PIL import Image
import imagehash

def compute_phash(filepath):
    with Image.open(filepath) as img:
        if img.mode != 'RGB':
            img = img.convert('RGB')
        return str(imagehash.phash(img, hash_size=16))
```
Key parameters:
- `hash_size=16` → 256-bit hash (high accuracy, very few false positives)
- Use `ThreadPoolExecutor` (NOT `ProcessPoolExecutor` — native HEIF libs deadlock on fork)
- 4 workers is optimal for most machines
- Report progress every 500 files
Expected performance: ~500 files/30 seconds on Apple Silicon, ~200 files/30 seconds on Intel.
Step 3: Compute Overlap
Compare hash sets between sources:
```python
common = set(source_a_hashes.keys()) & set(source_b_hashes.keys())
a_only = set(source_a_hashes.keys()) - set(source_b_hashes.keys())
b_only = set(source_b_hashes.keys()) - set(source_a_hashes.keys())
```
For internal duplicates within a single source:
```python
internal_dupes = sum(len(v) - 1 for v in hashes.values() if len(v) > 1)
```
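A toy run of the set arithmetic above, using made-up hash dicts in the hash → list-of-paths shape produced by the scan step:

```python
# phash -> list of file paths, as produced by the scan step
source_a_hashes = {'h1': ['a/1.jpg'], 'h2': ['a/2.jpg', 'a/2_copy.jpg'], 'h3': ['a/3.jpg']}
source_b_hashes = {'h1': ['b/1.jpg'], 'h4': ['b/4.jpg']}

common = set(source_a_hashes) & set(source_b_hashes)
a_only = set(source_a_hashes) - set(source_b_hashes)
internal_dupes_a = sum(len(v) - 1 for v in source_a_hashes.values() if len(v) > 1)

assert common == {'h1'}          # one cross-source match
assert a_only == {'h2', 'h3'}    # unique to source A, must be kept
assert internal_dupes_a == 1     # 2_copy.jpg is redundant within A
```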
Step 4: Generate Report
Present findings in a structured report:
```
DUPLICATE ANALYSIS REPORT

Library: [total] assets ([photos] photos + [videos] videos)
Sources analyzed: [Source A] ([count] files), [Source B] ([count] files)

CROSS-SOURCE DUPLICATES
[Source A] <-> [Source B] visual matches: [count] ([pct]% overlap)

UNIQUE TO EACH SOURCE
[Source A]-only photos: [count]
[Source B]-only photos: [count]

INTERNAL DUPLICATES
Within [Source A]: [count]
Within [Source B]: [count]

TOTAL REMOVABLE
Cross-source duplicates: [count]
Internal duplicates: [count]
TOTAL: [count] files

RECOMMENDATION
Keep: [Source with better metadata/folder structure]
Remove: [Other source] copies where match exists
Review: [count] [other]-only photos are NOT duplicates — keep them
```
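The headline numbers for this template can be derived directly from the two hash dicts. A sketch, with one caveat: which side's copies count as removable is a policy choice, and this version assumes source B's copies are removed wherever source A has a match.

```python
def report_figures(a_hashes, b_hashes):
    """Headline numbers for the report from two phash -> [paths] dicts."""
    common = set(a_hashes) & set(b_hashes)
    internal = lambda h: sum(len(v) - 1 for v in h.values() if len(v) > 1)
    return {
        # Remove B's copies wherever A has the same visual content
        'cross_source_removable': sum(len(b_hashes[h]) for h in common),
        'internal_removable': internal(a_hashes) + internal(b_hashes),
        'overlap_pct': round(100 * len(common) / max(len(a_hashes), 1), 1),
    }
```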
Step 5: Removal (User-Approved)
NEVER auto-remove. Always:
- Present the report with counts
- Ask user which categories to remove
- Confirm the exact count
- Execute removal in two steps:
  a. Permanent delete from Immich (`DELETE /api/assets` with `force: true`)
  b. Physical file removal from disk (`os.remove()`)
- Log everything to a JSON file for audit
Batch Immich deletions in groups of 100 assets per API call.
Step 6: Verify
After removal, query Immich statistics to confirm the new count and present before/after comparison.
Report Variations
Quick Report (no disk scan)
Uses only Immich database — checksums, filenames, timestamps. Fast but misses re-encoded duplicates.
```sql
-- Exact checksum duplicates
SELECT checksum, count(*)
FROM asset
WHERE "deletedAt" IS NULL
GROUP BY checksum
HAVING count(*) > 1;

-- Filename overlap between sources
SELECT count(*) FROM (
  SELECT "originalFileName" FROM asset WHERE "originalPath" LIKE '%Source A%'
  INTERSECT
  SELECT "originalFileName" FROM asset WHERE "originalPath" LIKE '%Source B%'
) t;
```
Full Report (perceptual hash)
Scans actual files on disk. Catches re-encoded duplicates. Requires filesystem access and Python dependencies. Takes 10-20 minutes for ~40K photos on Apple Silicon.
Year-by-Year Breakdown
Shows which source dominates each year — helps users understand their photo ecosystem history:
```sql
SELECT year, source_a_count, source_b_count,
       CASE WHEN source_a_count > source_b_count THEN 'Source A' ELSE 'Source B' END AS dominant
FROM (
  SELECT extract(year from "localDateTime") AS year,
         count(*) FILTER (WHERE "originalPath" LIKE '%Source A%') AS source_a_count,
         count(*) FILTER (WHERE "originalPath" LIKE '%Source B%') AS source_b_count
  FROM asset
  WHERE "deletedAt" IS NULL
  GROUP BY year
) t
ORDER BY year;
```
Important Notes
- Perceptual hashing has rare false positives — two visually very similar (but different) photos may share a hash. The 256-bit hash size minimizes this, but users should spot-check a few matches before bulk removal.
- Videos are excluded from perceptual hashing — they need a different approach (frame extraction + hashing).
- HEIC support is essential — without
, Apple Photos libraries will have massive error rates (50%+ of files).pillow-heif - ThreadPoolExecutor, not ProcessPoolExecutor — native HEIF libraries deadlock when forked on macOS. Always use threads.
- Background Immich scanning may add new assets during analysis. Note this in the report if the post-cleanup count seems off.