ai-chapter-consolidate
Use AI to merge individual page HTML files into a unified chapter document. Creates continuous document format for improved reading experience and semantic consistency.
git clone https://github.com/aiskillstore/marketplace
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/abejitsu/ai-chapter-consolidate" ~/.claude/skills/aiskillstore-marketplace-ai-chapter-consolidate && rm -rf "$T"
skills/abejitsu/ai-chapter-consolidate/SKILL.md

AI Chapter Consolidate Skill
Purpose
This skill uses AI to intelligently merge individual page HTML files into a single, continuous chapter document. Rather than simple concatenation, the AI:
- Removes duplicate headers/footers from continuation pages
- Ensures consistent heading hierarchy across pages
- Maintains semantic structure throughout
- Preserves all content without loss or repetition
- Creates smooth content flow (no page breaks)
The result is a unified chapter document in the continuous format (single page-container, single page-content).
What to Do
1. Collect all page HTML files for chapter
   - Gather `04_page_XX.html` files for all pages in chapter
   - Verify all files exist and are valid
   - Sort by page number (ascending)

2. Extract content from each page
   - Load each HTML file
   - Extract main content from `<main class="page-content">`
   - Preserve semantic classes and structure

3. Prepare consolidation inputs for AI
   - Page 1: Full content including chapter header
   - Pages 2+: Extract content sections, remove chapter header/nav
   - Preserve all text and structure
   - Note any special sections (exhibits, tables, etc.)

4. Invoke AI consolidation
   - Send all page contents to Claude
   - Request merging into single continuous document
   - Specify structural requirements
   - Request heading hierarchy normalization

5. Process AI output
   - Extract consolidated HTML from response
   - Verify structure integrity
   - Ensure all pages represented
   - Check heading hierarchy

6. Save consolidated document
   - Save to: `output/chapter_XX/chapter_artifacts/chapter_XX.html`
   - Create metadata/log file
   - Calculate statistics
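Steps 1 and 2 can be sketched in Python, the language of the `consolidate_chapter.py` wrapper. This is a minimal sketch, not the skill's actual code. `collect_page_files` and `read_pages` are hypothetical helpers, and the glob pattern assumes the `page_artifacts/page_NN/04_page_NN.html` layout shown under Input Files. The sketch only checks that files exist, are non-empty, and are in order. Deeper validity checks are left to the previous gate.

```python
import re
from pathlib import Path

# Matches names like "04_page_16.html" and captures the page number (16).
PAGE_FILE_RE = re.compile(r"_page_(\d+)\.html$")


def collect_page_files(chapter_dir, first_page, last_page):
    """Return page HTML files for first_page..last_page (inclusive) in
    ascending page order. Raises FileNotFoundError if any page is missing."""
    found = {}
    # Hypothetical layout: <chapter_dir>/page_artifacts/page_16/04_page_16.html
    for path in Path(chapter_dir).glob("page_artifacts/page_*/*_page_*.html"):
        match = PAGE_FILE_RE.search(path.name)
        if match:
            found[int(match.group(1))] = path

    expected = range(first_page, last_page + 1)
    missing = [page for page in expected if page not in found]
    if missing:
        raise FileNotFoundError(f"missing page HTML files for pages: {missing}")

    # Iterating the integer range (rather than sorting file names as strings)
    # keeps page_9 ahead of page_10.
    return [found[page] for page in expected]


def read_pages(paths):
    """Load each file as UTF-8 text, rejecting empty files."""
    pages = []
    for path in paths:
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            raise ValueError(f"empty page file: {path}")
        pages.append(text)
    return pages
```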
Input Files
Per-page HTML files (validated by previous gate):
- `output/chapter_XX/page_artifacts/page_16/04_page_16.html` (Chapter opening)
- `output/chapter_XX/page_artifacts/page_17/04_page_17.html` (Continuation)
- `output/chapter_XX/page_artifacts/page_18/04_page_18.html` (Continuation)
- ... (all pages in chapter)
Chapter metadata (from analysis):
- Page range (first and last page of chapter)
- Chapter number
- Chapter title
- Expected page count
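The chapter metadata can be carried in a small record. In this sketch, `ChapterMeta` and its fields are hypothetical names. The key point is that the page range is inclusive, so the expected page count is `last - first + 1`. For pages 16-29 that gives 14, which matches `pages_merged` in the example log. The same values also produce the prompt's title tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChapterMeta:
    """Chapter metadata from the analysis step (field names are illustrative)."""
    number: int
    title: str
    first_page: int
    last_page: int

    @classmethod
    def from_range(cls, number: int, title: str, page_range: str) -> "ChapterMeta":
        # page_range uses the "FIRST-LAST" form seen elsewhere, e.g. "16-29".
        first_str, last_str = page_range.split("-")
        first, last = int(first_str), int(last_str)
        if last < first:
            raise ValueError(f"invalid page range: {page_range!r}")
        return cls(number, title, first, last)

    @property
    def expected_page_count(self) -> int:
        # Inclusive range: 16-29 covers 29 - 16 + 1 = 14 pages.
        return self.last_page - self.first_page + 1

    @property
    def title_tag(self) -> str:
        # The prompt's "Chapter N: Title - Pages X-Y" format.
        return f"Chapter {self.number}: {self.title} - Pages {self.first_page}-{self.last_page}"
```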
AI Consolidation Prompt
The prompt sent to Claude:
````text
You are merging individual page HTML documents into a single, continuous chapter.

INPUT PAGES:

Page 1 (Opening - include chapter header):
[HTML content from page 1]

Page 2 (Continuation):
[HTML content from page 2]

Page 3 (Continuation):
[HTML content from page 3]

... (all pages)

TASK:
Merge these pages into a single HTML document that reads as one continuous chapter.

REQUIREMENTS:

1. Structure:
   - Create single <div class="page-container"> wrapping everything
   - Create single <main class="page-content"> for all content
   - Remove page-break indicators or comments
   - Create truly continuous document (no paginated elements)

2. Chapter Header:
   - Keep chapter header from Page 1 (chapter number, title)
   - Remove chapter headers/titles from continuation pages
   - Keep section navigation if present on Page 1
   - Remove duplicate navigation from other pages

3. Content Preservation:
   - Include ALL text content from all pages
   - Preserve exact wording (no paraphrasing)
   - Maintain all lists, paragraphs, tables
   - Include all semantic classes
   - Keep all HTML structure

4. Heading Hierarchy:
   - Normalize heading levels across merged pages
   - Page 1 h1 = Chapter title (stays as h1)
   - First section in each page = h2 (main sections)
   - Sub-sections = h3 or h4 as needed
   - Ensure no hierarchy jumps (h1 → h3 without h2)
   - Number consecutive headings logically

5. Content Flow:
   - Remove page-specific headers/footers
   - Merge seamlessly so content flows naturally
   - No artificial breaks or transitions
   - Paragraphs continue logically
   - Lists maintain coherence

6. Exhibits and Images:
   - Preserve all tables and figures
   - Keep exhibit titles and captions
   - Include all images with proper paths
   - Maintain table of contents if present

7. CSS Classes:
   - Preserve all semantic classes (section-heading, paragraph, etc.)
   - Keep consistent class usage throughout
   - Ensure classes match chapter opening page style
   - Do not add or remove classes

8. Metadata:
   - Include title tag: "Chapter N: Title - Pages X-Y"
   - Keep meta charset and viewport
   - Link stylesheet: <link rel="stylesheet" href="../../styles/main.css">

OUTPUT:
Return ONLY a single, valid HTML5 document:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Chapter [N]: [Title] - Pages [X-Y]</title>
  <link rel="stylesheet" href="../../styles/main.css">
</head>
<body>
  <div class="page-container">
    <main class="page-content">
      <!-- All content from all pages, merged seamlessly -->
    </main>
  </div>
</body>
</html>
```

VALIDATION:
- Single HTML5 document
- All pages represented
- No page breaks or transitions
- Proper heading hierarchy
- All text preserved
````
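Only the INPUT PAGES portion of the prompt changes from chapter to chapter, so it can be generated from the extracted page contents. This minimal sketch assumes the page HTML has already been extracted (step 2) and is listed in ascending page order. `build_input_pages` and `build_prompt` are hypothetical names. The remainder of the template (TASK through VALIDATION) is passed in unchanged. Sending the prompt to Claude is not shown.

```python
OPENING_LABEL = "Opening - include chapter header"
CONTINUATION_LABEL = "Continuation"


def build_input_pages(page_contents):
    """Render the INPUT PAGES block. The first entry is the opening page;
    every later entry is labelled as a continuation."""
    blocks = []
    for index, html in enumerate(page_contents, start=1):
        label = OPENING_LABEL if index == 1 else CONTINUATION_LABEL
        blocks.append(f"Page {index} ({label}):\n{html.strip()}")
    return "INPUT PAGES:\n\n" + "\n\n".join(blocks)


def build_prompt(page_contents, instructions):
    """Assemble the full prompt; `instructions` is the TASK/REQUIREMENTS/
    OUTPUT/VALIDATION text from the template, passed through verbatim."""
    if not page_contents:
        raise ValueError("no pages to consolidate")
    intro = ("You are merging individual page HTML documents into a single, "
             "continuous chapter.")
    return f"{intro}\n\n{build_input_pages(page_contents)}\n\n{instructions.strip()}\n"
```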
Page Content Extraction Logic

Before sending to AI, extract content strategically:

Page 1 (Opening):
- **Include**: Entire page HTML content
- **Reason**: Contains chapter header, navigation, first section
- **Preserve**: All elements (header, nav, dividers, content)

Pages 2-N (Continuation):
- **Extract**: Only content after chapter header
- **Skip**: Chapter number, chapter title, section navigation
- **Preserve**: Section headings, paragraphs, lists, exhibits
- **Include**: All semantic content sections

Example extraction:

```html
<!-- Page 1: Keep everything -->
<div class="chapter-header">
  <span class="chapter-number">2</span>
  <h1 class="chapter-title">Rights in Real Estate</h1>
</div>
<nav class="section-navigation">...</nav>
<h2 class="section-heading">REAL PROPERTY RIGHTS</h2>
<p class="paragraph">...</p>

<!-- Page 2: Skip header, keep content -->
<!-- <div class="chapter-header">...</div> SKIPPED -->
<!-- <nav class="section-navigation">...</nav> SKIPPED -->
<h4 class="subsection-heading">Physical characteristics.</h4>
<p class="paragraph">...</p>
<ul class="bullet-list">...</ul>

<!-- Page 3: Continue same pattern -->
<h4 class="subsection-heading">Interdependence.</h4>
<p class="paragraph">...</p>
```
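One way to implement this extraction with only the standard library is an `html.parser` subclass. It copies the inner HTML of `<main class="page-content">` and drops whole subtrees by class name. The sketch rests on a few assumptions:
- The class names `chapter-header` and `section-navigation` are taken from the example above.
- `PageContentExtractor` and `extract_page_content` are hypothetical names.
- The input is well-formed, as validated by the previous gate. Nesting is tracked by counting start and end tags, so this matters.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML5.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PageContentExtractor(HTMLParser):
    """Re-emits the inner HTML of <main class="page-content">, dropping any
    element (with its subtree) whose class is in skip_classes. Comments are
    dropped because handle_comment is not overridden."""

    def __init__(self, skip_classes=()):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.skip_classes = set(skip_classes)
        self.main_depth = 0  # > 0 while inside <main class="page-content">
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.parts = []

    def _emit(self, text):
        if self.main_depth and not self.skip_depth:
            self.parts.append(text)

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if not self.main_depth:
            if tag == "main" and "page-content" in classes:
                self.main_depth = 1  # enter <main> without copying the tag
            return
        if self.skip_depth or classes & self.skip_classes:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if tag not in VOID_TAGS:
            self.main_depth += 1
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.main_depth or tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.main_depth -= 1
        self._emit(f"</{tag}>")  # no-op once </main> brings depth to 0

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def extract_page_content(html, is_opening):
    """Page 1 keeps everything; continuation pages lose header and nav."""
    skip = () if is_opening else ("chapter-header", "section-navigation")
    parser = PageContentExtractor(skip)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Because depth is tracked by counting tags, markup that relies on implicitly closed elements (for example, `<p>` without `</p>`) can throw the count off. A full HTML5 parser would be more robust in that case.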
Output File
Consolidated Chapter HTML
Path:
output/chapter_XX/chapter_artifacts/chapter_XX.html
Structure:
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Chapter 2: Rights in Real Estate - Pages 16-29</title>
  <link rel="stylesheet" href="../../styles/main.css">
</head>
<body>
  <div class="page-container">
    <main class="page-content">
      <!-- Chapter header (from page 1) -->
      <div class="chapter-header">...</div>
      <nav class="section-navigation">...</nav>
      <hr class="section-divider">

      <!-- Page 1 content -->
      <h2 class="section-heading">REAL PROPERTY RIGHTS</h2>
      <p class="paragraph">...</p>

      <!-- Page 2 content (seamlessly merged) -->
      <h4 class="subsection-heading">Physical characteristics.</h4>
      <p class="paragraph">...</p>
      <ul class="bullet-list">...</ul>

      <!-- Page 3 content (continuing flow) -->
      <h4 class="subsection-heading">Interdependence.</h4>
      <p class="paragraph">...</p>

      <!-- ... more content from remaining pages ... -->

      <!-- Final page content -->
      <h2 class="section-heading">REGULATIONS AND LICENSING</h2>
      <p class="paragraph">...</p>
    </main>
  </div>
</body>
</html>
```
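Processing the AI output (step 5) starts by pulling the document out of the model's reply. The prompt shows the expected output inside a fenced `html` code block, so a reply may or may not be wrapped in one. The following minimal sketch accepts either form. `extract_consolidated_html` is a hypothetical helper, not part of the skill.

```python
import re

# A fenced html block, if the model wrapped its answer in one.
FENCED_HTML_RE = re.compile(r"```html\s*\n(.*?)```", re.DOTALL)
# Otherwise, everything from <!DOCTYPE html> through the last </html>.
BARE_HTML_RE = re.compile(r"<!DOCTYPE html>.*</html>", re.DOTALL | re.IGNORECASE)


def extract_consolidated_html(response_text):
    """Return the HTML document contained in the model's reply."""
    match = FENCED_HTML_RE.search(response_text)
    if match:
        return match.group(1).strip()
    match = BARE_HTML_RE.search(response_text)
    if match:
        return match.group(0).strip()
    raise ValueError("no HTML document found in AI response")
```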
Consolidation Log
Path:
output/chapter_XX/chapter_artifacts/consolidation_log.json
```json
{
  "chapter": 2,
  "title": "Rights in Real Estate",
  "book_pages": "16-29",
  "pdf_indices": "15-28",
  "consolidated_at": "2025-11-08T14:35:00Z",
  "pages_merged": 14,
  "pages_included": [
    {
      "page": 16,
      "book_page": 17,
      "status": "opening_chapter",
      "content_type": "header_navigation_content"
    },
    {
      "page": 17,
      "book_page": 18,
      "status": "continuation",
      "content_type": "subsections_paragraphs"
    },
    {
      "page": 18,
      "book_page": 19,
      "status": "continuation",
      "content_type": "subsections_paragraphs_list"
    }
    // ... all pages
  ],
  "content_statistics": {
    "total_headings": { "h1": 1, "h2": 4, "h3": 0, "h4": 12 },
    "total_paragraphs": 156,
    "total_lists": 12,
    "total_list_items": 42,
    "total_tables": 3,
    "total_images": 5,
    "total_words": 12547
  },
  "ai_model": "claude-3-5-sonnet-20241022",
  "consolidation_notes": "Successfully merged 14 pages into continuous format"
}
```
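The `content_statistics` block of the log can be computed from the consolidated file. This sketch (hypothetical `content_statistics`) counts:
- every `<p>`, `<ul>`/`<ol>`, `<li>`, `<table>`, and `<img>`
- headings h1 through h4, the levels the log reports
- words, as whitespace-separated tokens in body text (`<head>`, `<script>`, and `<style>` are excluded)

The real skill may count only elements with specific classes such as `paragraph`. That is not stated here, so treat this as an approximation.

```python
from html.parser import HTMLParser

SKIP_TEXT_IN = ("head", "script", "style")


class StatsCollector(HTMLParser):
    """Counts the elements reported under content_statistics."""

    def __init__(self):
        super().__init__()
        self.headings = {"h1": 0, "h2": 0, "h3": 0, "h4": 0}
        self.counts = {"p": 0, "ul": 0, "ol": 0, "li": 0, "table": 0, "img": 0}
        self.words = 0
        self._skip = 0  # depth inside <head>, <script>, or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.headings:
            self.headings[tag] += 1
        elif tag in self.counts:
            self.counts[tag] += 1
        if tag in SKIP_TEXT_IN:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TEXT_IN and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())


def content_statistics(html):
    collector = StatsCollector()
    collector.feed(html)
    collector.close()
    c = collector.counts
    return {
        "total_headings": dict(collector.headings),
        "total_paragraphs": c["p"],
        "total_lists": c["ul"] + c["ol"],
        "total_list_items": c["li"],
        "total_tables": c["table"],
        "total_images": c["img"],
        "total_words": collector.words,
    }
```

The resulting dictionary can be dropped into the log with `json.dump` alongside the other fields.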
Implementation
Execute consolidation via Python wrapper:
```bash
cd Calypso/tools

# Run consolidation
python3 consolidate_chapter.py \
  --chapter 2 \
  --pages 15-28 \
  --output "../output" \
  --mapping "../analysis/page_mapping.json"

# Or invoke directly via Claude API:
# The orchestrator sends the AI prompt with all page contents
```
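The source of `consolidate_chapter.py` is not shown, so the following is only a guess at how its documented flags might be parsed with `argparse`. The example passes `--pages 15-28`, which matches `pdf_indices` in the log rather than `book_pages` (16-29). The `output_path` helper, the defaults for `--output` and `--mapping`, and the zero-padded `chapter_02` directory name are all assumptions. The directory name follows the `chapter_02.html` file name used under Testing.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the flags used in the example above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Merge per-page HTML files into one chapter document.")
    parser.add_argument("--chapter", type=int, required=True,
                        help="chapter number, e.g. 2")
    parser.add_argument("--pages", required=True,
                        help="inclusive page range FIRST-LAST, e.g. 15-28")
    # Defaults below are assumed from the example invocation.
    parser.add_argument("--output", default="../output",
                        help="root output directory")
    parser.add_argument("--mapping", default="../analysis/page_mapping.json",
                        help="page mapping JSON from the analysis step")
    args = parser.parse_args(argv)

    first, _, last = args.pages.partition("-")
    if not (first.isdigit() and last.isdigit()) or int(last) < int(first):
        parser.error(f"--pages must look like FIRST-LAST, got {args.pages!r}")
    args.first_page, args.last_page = int(first), int(last)
    return args


def output_path(args):
    """output/chapter_XX/chapter_artifacts/chapter_XX.html, zero-padded."""
    name = f"chapter_{args.chapter:02d}"
    return Path(args.output) / name / "chapter_artifacts" / f"{name}.html"
```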
Quality Checks
Before passing to next gate:

1. File created
   - `chapter_XX.html` exists
   - File is valid HTML (parseable)
   - File size reasonable (> 50KB typical)

2. Structure validated
   - Single `<div class="page-container">`
   - Single `<main class="page-content">`
   - All tags properly closed
   - No duplicate content

3. Content completeness
   - All pages represented
   - No missing sections
   - Paragraph/heading counts reasonable
   - All text content present

4. Heading hierarchy
   - Starts with h1 (chapter title)
   - h1 count = 1
   - h2 = major sections
   - h3/h4 = subsections
   - No hierarchy jumps

5. Metadata logged
   - Consolidation timestamp recorded
   - Pages merged count documented
   - Content statistics calculated
   - Log file saved
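Several of the structure and heading-hierarchy checks above can be automated. The following sketch uses a hypothetical `check_structure` that counts the wrapper elements and walks heading levels in document order. It does not verify that tags are properly closed, which would need a stricter parser. Applied literally, the no-jumps rule also flags an h2 followed directly by an h4, which is the pattern used in the Chapter 2 example.

```python
from html.parser import HTMLParser


class StructureChecker(HTMLParser):
    """Counts wrapper elements and records heading levels in document order."""

    def __init__(self):
        super().__init__()
        self.page_containers = 0
        self.page_contents = 0
        self.heading_levels = []  # e.g. [1, 2, 4, 4, 2]

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "page-container" in classes:
            self.page_containers += 1
        elif tag == "main" and "page-content" in classes:
            self.page_contents += 1
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))


def check_structure(html):
    """Return a list of problems; an empty list means every check passed."""
    checker = StructureChecker()
    checker.feed(html)
    checker.close()
    problems = []
    if checker.page_containers != 1:
        problems.append(f"expected 1 page-container, found {checker.page_containers}")
    if checker.page_contents != 1:
        problems.append(f"expected 1 page-content, found {checker.page_contents}")

    levels = checker.heading_levels
    if not levels or levels[0] != 1:
        problems.append("document does not start with an h1")
    if levels.count(1) != 1:
        problems.append(f"expected exactly 1 h1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # went deeper by more than one level
            problems.append(f"heading jump h{prev} -> h{cur}")
    return problems
```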
Success Criteria
- ✓ All pages merged into single document
- ✓ Chapter header preserved from page 1
- ✓ Duplicate headers removed from continuation pages
- ✓ Content flows naturally (continuous format)
- ✓ Heading hierarchy is correct
- ✓ All text content preserved
- ✓ Semantic classes maintained
- ✓ Ready for semantic validation
Error Handling
If page HTML is incomplete:
- Note in consolidation log
- Include whatever content is available
- Proceed to validation (validation will catch issues)
If heading hierarchy is ambiguous:
- AI makes best judgment
- Semantic validation gate will refine if needed
- Document decision in log
If content appears duplicated:
- AI deduplicates automatically
- Verify word count is reasonable
- Log any unusual content patterns
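For the word-count check, one rough approach is to compare the consolidated text against the extracted page contents. These should be the step 2 output, so continuation headers are already gone. The 0.9 and 1.05 bounds below are hypothetical tolerances, not values from this skill. A ratio well above 1 suggests duplicated content, and a ratio well below 1 suggests lost content. The helper names are also hypothetical.

```python
import re

HEAD_RE = re.compile(r"<head\b.*?</head>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def word_count(html):
    """Crude visible-word count: drop <head>, replace tags with spaces, split."""
    text = TAG_RE.sub(" ", HEAD_RE.sub(" ", html))
    return len(text.split())


def word_count_ratio(page_contents, consolidated_html):
    """Words in the consolidated document divided by words in all inputs."""
    total_in = sum(word_count(page) for page in page_contents)
    if total_in == 0:
        raise ValueError("input pages contain no words")
    return word_count(consolidated_html) / total_in


def word_count_reasonable(page_contents, consolidated_html, low=0.9, high=1.05):
    # Hypothetical tolerances: above `high` hints at duplication,
    # below `low` hints at dropped text.
    return low <= word_count_ratio(page_contents, consolidated_html) <= high
```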
Next Steps
Once consolidation completes:
- Quality Gate 2 (semantic-validate) checks semantic structure
- Skill 5 (quality-report-generate) generates final report
- Quality Gate 3 (visual-accuracy-check) validates appearance
Design Notes
- This skill is AI-powered (uses probabilistic consolidation)
- Relies on AI's understanding of document structure
- Produces continuous format (no page breaks)
- Merges intelligently (not just concatenation)
- Output will be refined by validation gates
Testing
To test consolidation on Chapter 2:
```bash
# Input: 14 individual page HTML files (pages 16-29)
# Process: AI merges into single continuous chapter
# Output: chapter_02.html (single, unified document)
# Verify:
# - File size is roughly the sum of all pages (less the duplicate
#   headers, navigation, and <head> blocks removed during merging)
# - Content flows logically
# - Heading hierarchy makes sense
# - No duplicate sections
```
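The "no duplicate sections" check can be partly automated by looking for heading text that appears more than once. This is a minimal sketch with a hypothetical `duplicate_headings` that compares heading text case- and whitespace-insensitively. A chapter that legitimately repeats a heading (for example, two "Summary" sections) would still need manual review.

```python
import re
from collections import Counter

# <h1>...</h1> through <h6>...</h6>; the backreference matches the closing level.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def duplicate_headings(html):
    """Return heading texts that occur more than once, normalized to
    lowercase with collapsed whitespace."""
    texts = [" ".join(TAG_RE.sub("", body).split()).lower()
             for _, body in HEADING_RE.findall(html)]
    counts = Counter(text for text in texts if text)
    return sorted(text for text, n in counts.items() if n > 1)
```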