install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/aipoch-ai/image-duplication-detector" ~/.claude/skills/openclaw-skills-image-duplication-detector && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/aipoch-ai/image-duplication-detector" ~/.openclaw/skills/openclaw-skills-image-duplication-detector && rm -rf "$T"
manifest: skills/aipoch-ai/image-duplication-detector/SKILL.md
Image Duplication Detector
ID: 195
Description
Uses computer vision (CV) algorithms to scan every image in a paper manuscript and flag potential duplication or local tampering (traces of Photoshop-style editing).
Usage
    # Scan single PDF file
    python scripts/main.py --input paper.pdf --output report.json

    # Scan image folder
    python scripts/main.py --input ./images/ --output report.json

    # Specify similarity threshold (default 0.85)
    python scripts/main.py --input paper.pdf --threshold 0.90 --output report.json

    # Enable tampering detection
    python scripts/main.py --input paper.pdf --detect-tampering --output report.json

    # Generate visualization report
    python scripts/main.py --input paper.pdf --visualize --output report.json
Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| `--input` | string | - | Yes | Input PDF file or image folder path |
| `--output` | string | report.json | No | Output report path |
| `--threshold` | float | 0.85 | No | Similarity threshold (0-1); higher is stricter |
| `--detect-tampering` | flag | false | No | Enable tampering (Photoshop trace) detection |
| `--visualize` | flag | false | No | Generate visualization comparison images |
|  | string | ./temp | No | Temporary file directory |
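As an illustration of how these options could be parsed, here is a minimal `argparse` sketch. It covers only the five flags that appear in the Usage commands; the helper names `unit_interval` and `build_parser` are hypothetical, and the actual `scripts/main.py` may be organized differently.

```python
import argparse


def unit_interval(text: str) -> float:
    """Parse a float and require it to lie in [0, 1] (hypothetical helper)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"threshold must be in [0, 1], got {value}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the Usage section."""
    parser = argparse.ArgumentParser(description="Scan manuscript images for duplication")
    parser.add_argument("--input", required=True, help="Input PDF file or image folder path")
    parser.add_argument("--output", default="report.json", help="Output report path")
    parser.add_argument("--threshold", type=unit_interval, default=0.85,
                        help="Similarity threshold (0-1), higher is stricter")
    parser.add_argument("--detect-tampering", action="store_true",
                        help="Enable tampering detection")
    parser.add_argument("--visualize", action="store_true",
                        help="Generate visualization comparison images")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--input", "paper.pdf", "--threshold", "0.90"])
print(args.input, args.output, args.threshold, args.detect_tampering, args.visualize)
```

Validating the threshold at parse time means an out-of-range value is rejected before any image is processed.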
Output Format
    {
      "summary": {
        "total_images": 12,
        "duplicates_found": 2,
        "tampering_detected": 1,
        "processing_time": "3.5s"
      },
      "duplicates": [
        {
          "group_id": 1,
          "similarity": 0.98,
          "images": [
            {"page": 2, "index": 1, "path": "..."},
            {"page": 5, "index": 3, "path": "..."}
          ]
        }
      ],
      "tampering": [
        {
          "image": "page_3_img_2.png",
          "suspicious_regions": [
            {"x": 120, "y": 80, "width": 50, "height": 50, "confidence": 0.92}
          ]
        }
      ]
    }
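A report in this shape can be turned into a short list for manual review. The sketch below uses only the field names from the sample report; the function name `summarize_report` is hypothetical and is not part of the skill.

```python
import json

# Same structure as the sample report in this section.
SAMPLE_REPORT = """
{
  "summary": {"total_images": 12, "duplicates_found": 2,
              "tampering_detected": 1, "processing_time": "3.5s"},
  "duplicates": [
    {"group_id": 1, "similarity": 0.98,
     "images": [{"page": 2, "index": 1, "path": "..."},
                {"page": 5, "index": 3, "path": "..."}]}
  ],
  "tampering": [
    {"image": "page_3_img_2.png",
     "suspicious_regions": [
       {"x": 120, "y": 80, "width": 50, "height": 50, "confidence": 0.92}]}
  ]
}
"""


def summarize_report(report: dict) -> list:
    """Return one human-readable line per duplicate group and per suspicious region."""
    lines = []
    for group in report.get("duplicates", []):
        members = ", ".join(
            f"page {img['page']} image {img['index']}" for img in group["images"]
        )
        lines.append(
            f"duplicate group {group['group_id']} "
            f"(similarity {group['similarity']:.2f}): {members}"
        )
    for item in report.get("tampering", []):
        for region in item["suspicious_regions"]:
            lines.append(
                f"{item['image']}: region at ({region['x']}, {region['y']}) "
                f"size {region['width']}x{region['height']}, "
                f"confidence {region['confidence']:.2f}"
            )
    return lines


for line in summarize_report(json.loads(SAMPLE_REPORT)):
    print(line)
```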
Requirements
    opencv-python>=4.8.0
    numpy>=1.24.0
    Pillow>=10.0.0
    PyPDF2>=3.0.0
    pdf2image>=1.16.0
    imagehash>=4.3.0
    scikit-image>=0.21.0
    matplotlib>=3.7.0
Algorithm Details
Duplication Detection
- Perceptual Hashing: Uses pHash, dHash, aHash combination to detect visually similar images
- Feature Matching: ORB feature point matching to verify similarity
- SSIM: Structural similarity index as auxiliary verification
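To make the perceptual hashing step concrete, here is a dependency-free sketch of a difference hash (dHash) with a Hamming-based similarity score. It is an illustration only: the skill itself lists `imagehash` as a dependency, and the function names, the block-average downsampling, and the way the score is compared with the 0.85 threshold are all assumptions, not the skill's actual implementation.

```python
def shrink(gray, out_w, out_h):
    """Downsample a 2D grayscale grid by averaging rectangular blocks."""
    h, w = len(gray), len(gray[0])
    small = []
    for r in range(out_h):
        y0 = r * h // out_h
        y1 = max((r + 1) * h // out_h, y0 + 1)
        row = []
        for c in range(out_w):
            x0 = c * w // out_w
            x1 = max((c + 1) * w // out_w, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(gray, size=8):
    """Return size*size bits: 1 where a cell is brighter than its right neighbour."""
    small = shrink(gray, size + 1, size)
    return [int(row[c] > row[c + 1]) for row in small for c in range(size)]


def hash_similarity(a, b):
    """Fraction of matching bits between two equal-length hashes, in [0, 1]."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Synthetic 32x32 gradient, a uniformly brightened copy, and a mirrored copy.
image = [[x * 4 + y for x in range(32)] for y in range(32)]
brighter = [[v + 20 for v in row] for row in image]
mirrored = [row[::-1] for row in image]

THRESHOLD = 0.85  # default from the Parameters table
print(hash_similarity(dhash(image), dhash(brighter)) >= THRESHOLD)  # brightened copy
print(hash_similarity(dhash(image), dhash(mirrored)) >= THRESHOLD)  # mirrored copy
```

Because dHash encodes only the sign of brightness differences between neighbouring cells, a uniform brightness shift leaves the hash unchanged, while mirroring flips every gradient. This is one reason a hash match is usually confirmed with a second method such as feature matching or SSIM.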
Tampering Detection
- ELA (Error Level Analysis): Detects JPEG compression level inconsistencies
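- Noise Analysis: Flags regions whose noise pattern differs from the rest of the image
- Copy-Move Detection: Finds regions that were copied and pasted within the same image
- Lighting Inconsistency: Checks whether lighting is consistent across regions of the image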
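The copy-move idea can be sketched with exact block matching: slide a window over the image, index each patch by its pixel values, and report patches that occur at two non-overlapping positions. This is a deliberately simplified illustration with a hypothetical function name. It only finds pixel-identical copies, whereas practical detectors use features that tolerate recompression, scaling, or rotation, and the skill's own method is not shown in the source.

```python
from collections import defaultdict


def find_copied_blocks(gray, block=4):
    """Return pairs of (x, y) positions whose block-sized patches are identical."""
    h, w = len(gray), len(gray[0])
    seen = defaultdict(list)
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            patch = tuple(tuple(gray[y + dy][x:x + block]) for dy in range(block))
            values = [v for row in patch for v in row]
            if max(values) == min(values):
                continue  # uniform patches (e.g. blank background) match trivially
            seen[patch].append((x, y))

    pairs = []
    for positions in seen.values():
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                (x1, y1), (x2, y2) = positions[i], positions[j]
                # Keep only patches that do not overlap each other.
                if abs(x1 - x2) >= block or abs(y1 - y2) >= block:
                    pairs.append((positions[i], positions[j]))
    return pairs


# Synthetic 16x16 image in which every pixel value is unique.
image = [[y * 16 + x for x in range(16)] for y in range(16)]

# Simulate a forgery: paste the 4x4 patch at (x=1, y=1) onto (x=10, y=9).
for dy in range(4):
    for dx in range(4):
        image[9 + dy][10 + dx] = image[1 + dy][1 + dx]

print(find_copied_blocks(image))
```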
Example
    from scripts.main import ImageDuplicationDetector

    detector = ImageDuplicationDetector(
        threshold=0.85,
        detect_tampering=True
    )
    results = detector.scan("paper.pdf")
    detector.save_report(results, "report.json")
Notes
- Supports PDF, PNG, JPG, TIFF formats
- For large files, batch processing is recommended
- Tampering detection may produce false positives; manual review is recommended
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access (../)
- Output does not expose sensitive information
- Prompt injection protections in place
- Input file paths validated (no ../ traversal)
- Output directory restricted to workspace
- Script execution in sandboxed environment
- Error messages sanitized (no stack traces exposed)
- Dependencies audited
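The path-related checklist items (no `../` traversal, output restricted to the workspace) could be enforced with a check along these lines. This is a sketch under assumptions: the function name `is_inside_workspace` and the `/tmp/ws` workspace path are hypothetical, and the source does not show how the skill performs this validation.

```python
from pathlib import Path


def is_inside_workspace(candidate: str, workspace: str) -> bool:
    """Return True if candidate, after resolving '..' and symlinks, lies under workspace."""
    root = Path(workspace).resolve()
    target = Path(candidate).resolve()
    try:
        target.relative_to(root)  # raises ValueError when target is outside root
    except ValueError:
        return False
    return True


print(is_inside_workspace("/tmp/ws/out/report.json", "/tmp/ws"))
print(is_inside_workspace("/tmp/ws/../etc/passwd", "/tmp/ws"))
```

Resolving both paths before comparing them is what defeats `../` sequences; comparing the raw strings would not.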
Prerequisites
    # Python dependencies
    pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- Successfully executes main functionality
- Output meets quality standards
- Handles edge cases gracefully
- Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support