Marketplace html-structure-validate

Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality.

install
source · Clone the upstream repo
git clone https://github.com/aiskillstore/marketplace
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/abejitsu/html-structure-validate" ~/.claude/skills/aiskillstore-marketplace-html-structure-validate && rm -rf "$T"
manifest: skills/abejitsu/html-structure-validate/SKILL.md

HTML Structure Validate Skill

Purpose

This skill is a BLOCKING quality gate that ensures generated HTML meets minimum structural requirements. It is the first deterministic validation of probabilistic AI-generated output.

The skill checks:

  • HTML5 compliance - Proper DOCTYPE, tags
  • Tag closure - All tags properly closed
  • Required elements - Meta tags, stylesheet links
  • Well-formedness - Valid structure

If validation fails, the pipeline STOPS and triggers a hook to notify the user.

This enforces the principle: Python validates, ensuring deterministic quality.

What to Do

  1. Load HTML file to validate

    • Read 04_page_XX.html generated by the AI skill
    • Verify file exists and is readable
    • Confirm file is text (not binary)
  2. Run validation checks

    • Check HTML5 structure compliance
    • Verify tag closure
    • Validate head section
    • Check required CSS link
    • Validate page container structure
  3. Generate validation report

    • Document all checks performed
    • List any errors found
    • Note warnings (non-blocking)
    • Record informational findings
  4. Save validation report as JSON

    • Save to:
      output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json
    • Include timestamp
    • Include all check results
  5. Exit with appropriate code

    • Return 0 if VALID (continue pipeline)
    • Return 1 if INVALID (STOP pipeline, trigger hook)
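
Step 1 above (load the file, confirm it exists, is readable, and is text) can be sketched as follows. This is a minimal illustration, not the code in validate_html.py: the function name load_html is hypothetical, and treating "contains NUL bytes or is not valid UTF-8" as "binary" is an assumption.

```python
from pathlib import Path


def load_html(path):
    """Return the file's text, or raise ValueError if it is missing or not text.

    Hypothetical helper; the binary test (NUL bytes / invalid UTF-8) is an assumption.
    """
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"not a readable file: {p}")
    raw = p.read_bytes()
    if b"\x00" in raw:
        raise ValueError(f"file looks binary (contains NUL bytes): {p}")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"file is not valid UTF-8 text: {p}") from exc
```

A caller would typically catch ValueError, record it as an error in the report, and exit with code 1.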

Input Parameters

html_file: <str>         - Path to 04_page_XX.html
output_dir: <str>        - Directory for validation report
strict_mode: <bool>      - If true, warnings also fail (default: false)
page_number: <int>       - Page number (for reporting)
chapter: <int>           - Chapter number (for reporting)

Validation Checks

Check 1: DOCTYPE Declaration

Requirement: File must start with proper DOCTYPE

<!DOCTYPE html>

Check:

  • File contains <!DOCTYPE html> (case-insensitive)
  • DOCTYPE appears before any tags
  • DOCTYPE is on first line or near beginning

Error if: Missing or incorrect DOCTYPE
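
The DOCTYPE check can be approximated with two regular expressions. This is a sketch under stated assumptions: check_doctype is a hypothetical name, and "before any tags" is interpreted as "before the first `<` followed by a letter", so leading comments and whitespace are tolerated.

```python
import re

# Matches only the HTML5 form; legacy DOCTYPEs with PUBLIC identifiers do not match.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# First real element tag; "<!DOCTYPE" and "<!--" do not match this pattern.
TAG_RE = re.compile(r"<[a-zA-Z]")


def check_doctype(html):
    """Return (ok, details) for the DOCTYPE check (hypothetical helper)."""
    doctype = DOCTYPE_RE.search(html)
    if doctype is None:
        return False, "Missing or incorrect DOCTYPE"
    first_tag = TAG_RE.search(html)
    if first_tag is not None and first_tag.start() < doctype.start():
        return False, "DOCTYPE appears after the first tag"
    return True, "Valid HTML5 DOCTYPE found"
```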

Check 2: HTML Tags

Requirement: Proper <html> opening and closing tags

<html lang="en">
    ...
</html>

Checks:

  • <html> tag present
  • </html> closing tag present
  • Tags are properly paired
  • No unclosed <html> tags

Error if: Missing either tag or improperly paired

Check 3: Head Section

Requirement: Complete <head> section with metadata

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>...</title>
    <link rel="stylesheet" href="../../styles/main.css">
</head>

Checks:

  • <head> and </head> tags present
  • <meta charset="UTF-8"> present
  • <meta name="viewport"> present (warning if missing)
  • <title> tag with content present
  • CSS <link> tag present with href attribute

Error if: Missing charset, title, or CSS link

Warning if: Missing viewport meta tag
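
One way to gather the head-section facts is a small subclass of the standard library's html.parser.HTMLParser. The class and function names below are hypothetical, and the sketch does not verify that </head> is present; that is left to the tag closure check.

```python
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collects the head-section facts that Check 3 asks about (illustrative)."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.charset = ""
        self.has_viewport = False
        self.title = ""
        self.has_stylesheet = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            return
        a = dict(attrs)
        if tag == "meta" and "charset" in a:
            self.charset = a["charset"] or ""
        elif tag == "meta" and (a.get("name") or "").lower() == "viewport":
            self.has_viewport = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link" and (a.get("rel") or "").lower() == "stylesheet":
            # Only a stylesheet link with a non-empty href counts.
            self.has_stylesheet = self.has_stylesheet or bool(a.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_head(html):
    """Return (errors, warnings) for the <head> section."""
    scanner = HeadScanner()
    scanner.feed(html)
    errors, warnings = [], []
    if scanner.charset.lower() != "utf-8":
        errors.append('Missing <meta charset="UTF-8">')
    if not scanner.title.strip():
        errors.append("Missing or empty <title>")
    if not scanner.has_stylesheet:
        errors.append("Missing CSS <link> with href")
    if not scanner.has_viewport:
        warnings.append("Missing viewport meta tag")
    return errors, warnings
```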

Check 4: Body Section

Requirement: Proper <body> tags with content

<body>
    <div class="page-container">
        <main class="page-content">
            ...
        </main>
    </div>
</body>

Checks:

  • <body> and </body> tags present
  • <div class="page-container"> present
  • <main class="page-content"> present inside container
  • Body contains substantial content (> 100 bytes)

Error if: Missing tags or required container divs

Check 5: Tag Closure Validation

Requirement: All tags must be properly closed

Checks for:

  • Unmatched opening tags (e.g., <p> without </p>)
  • Improper nesting (e.g., <p><h2>text</h2></p>)
  • Self-closing tags used correctly (e.g., <br/>, <img/>)
  • Comment blocks properly formatted (<!-- -->)

Validation method:

  • Parse HTML into tree structure
  • Verify all nodes properly matched
  • Check nesting doesn't violate HTML5 rules

Error if: Any unmatched or improperly nested tags
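
The "all nodes properly matched" part of this check is commonly done with a stack of open tags. The sketch below uses html.parser.HTMLParser from the standard library; the names TagBalanceChecker and find_tag_errors are hypothetical. It is deliberately strict (every non-void tag must be closed explicitly) and does not implement HTML5 content-model rules, so it would not flag <p><h2>text</h2></p> by itself.

```python
from html.parser import HTMLParser

# HTML5 void elements: they never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records closure errors (illustrative)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag name, line where it was opened)
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # <br/>-style tags open and close themselves; nothing to track.
        # Simplification: <div/> is also ignored here.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.stack]:
            self.errors.append(f"line {line}: </{tag}> has no matching opening tag")
            return
        # Pop back to the matching tag; anything popped on the way was left open.
        while self.stack:
            name, opened_at = self.stack.pop()
            if name == tag:
                break
            self.errors.append(
                f"line {opened_at}: <{name}> is not closed before </{tag}>")


def find_tag_errors(html):
    """Return a list of human-readable tag closure errors."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    for name, opened_at in checker.stack:
        checker.errors.append(f"line {opened_at}: <{name}> is never closed")
    return checker.errors
```

An empty list means the check passes; any entry would be reported as an error and block the pipeline.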

Check 6: Heading Tags (h1-h6)

Requirement: Valid heading hierarchy

<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>

Checks:

  • All heading tags properly closed
  • First heading should be h1 (warning if not)
  • Heading levels don't skip dramatically (h1 → h4 is suspicious)
  • All headings have text content (not empty)

Error if: Heading tags improperly closed

Warning if: Suspicious hierarchy
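
The hierarchy warnings can be sketched with a regular expression over closed heading pairs. Two points here are assumptions rather than documented behaviour: the max_jump threshold of 2 (so h1 → h4 is flagged but h1 → h3 is not), and reporting empty headings as warnings. The name check_headings is hypothetical.

```python
import re

# Matches a heading whose closing tag has the same level as its opening tag.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
                        re.IGNORECASE | re.DOTALL)


def check_headings(html, max_jump=2):
    """Return a list of heading warnings (empty list means nothing suspicious)."""
    warnings = []
    levels = []
    for match in HEADING_RE.finditer(html):
        level = int(match.group(1))
        # Strip inline markup such as <em> before testing for text content.
        text = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        if not text:
            warnings.append(f"Empty h{level} heading")
        levels.append(level)
    if levels and levels[0] != 1:
        warnings.append(f"First heading is h{levels[0]}, expected h1")
    for prev, cur in zip(levels, levels[1:]):
        if cur - prev > max_jump:
            warnings.append(f"Heading level jumps from h{prev} to h{cur}")
    return warnings
```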

Check 7: Content Structure

Requirement: Meaningful content in page container

Checks:

  • <main class="page-content">
    contains elements
  • Content includes headings or paragraphs
  • No completely empty content area
  • Text nodes or elements present (> 100 words total)

Error if: No content or empty structure
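
Counting elements and words inside <main class="page-content"> might look like the sketch below. The names are hypothetical, nested <main> elements are not handled, and the function only reports measurements; whether a page at or below the 100-word threshold is an error or a warning is left to the caller.

```python
from html.parser import HTMLParser


class MainContentCounter(HTMLParser):
    """Counts elements and words inside <main class="page-content"> (illustrative)."""

    def __init__(self):
        super().__init__()
        self.in_main = False
        self.elements = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if self.in_main:
            self.elements += 1
        elif tag == "main":
            classes = (dict(attrs).get("class") or "").split()
            self.in_main = "page-content" in classes

    def handle_endtag(self, tag):
        if tag == "main":
            self.in_main = False

    def handle_data(self, data):
        if self.in_main:
            self.words += len(data.split())


def measure_content(html, min_words=100):
    """Return element/word counts plus 'empty' and 'substantial' flags."""
    counter = MainContentCounter()
    counter.feed(html)
    counter.close()
    return {
        "elements": counter.elements,
        "words": counter.words,
        "empty": counter.elements == 0 and counter.words == 0,
        "substantial": counter.words > min_words,
    }
```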

Check 8: List Integrity

Requirement: All lists properly structured

Checks for each <ul> or <ol>:

  • List opening and closing tags matched
  • List contains <li> elements
  • All <li> tags properly closed
  • <li> count matches opening/closing pairs
  • No nested <ul> or <ol> improperly closed

Error if: Empty lists or unmatched <li> tags
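
Empty lists and stray <li> elements can be found by keeping a stack of open lists, which also handles nesting. The names below are hypothetical; matching of the closing tags themselves is assumed to be covered by the tag closure check.

```python
from html.parser import HTMLParser


class ListScanner(HTMLParser):
    """Counts <li> items per <ul>/<ol>, including nested lists (illustrative)."""

    def __init__(self):
        super().__init__()
        self.open_lists = []   # stack of [tag, opening line, li count]
        self.errors = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag in ("ul", "ol"):
            self.open_lists.append([tag, line, 0])
        elif tag == "li":
            if self.open_lists:
                self.open_lists[-1][2] += 1   # belongs to the innermost list
            else:
                self.errors.append(f"line {line}: <li> outside of a list")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.open_lists:
            name, line, items = self.open_lists.pop()
            if items == 0:
                self.errors.append(f"line {line}: <{name}> has no <li> items")


def find_list_errors(html):
    """Return a list of list-structure errors."""
    scanner = ListScanner()
    scanner.feed(html)
    scanner.close()
    for name, line, _ in scanner.open_lists:
        scanner.errors.append(f"line {line}: <{name}> is never closed")
    return scanner.errors
```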

Check 9: Image and Link Tags

Requirement: Image and link tags carry their required attributes, and self-closing tags are properly formatted

Checks:

  • All <img> tags have src and alt attributes
  • All <a> tags have valid href attributes
  • Image paths don't have obvious errors (no broken syntax)
  • Self-closing tags use proper syntax

Warning if: Images missing alt text or links missing href
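
The attribute checks reduce to inspecting each <img> and <a> start tag. In this sketch (hypothetical names), alt only has to be present, so alt="" on a decorative image passes, while src and href must be non-empty; those thresholds are assumptions.

```python
from html.parser import HTMLParser


class AttributeScanner(HTMLParser):
    """Records warnings for <img> and <a> tags missing key attributes."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "img":
            if not a.get("src"):
                self.warnings.append(f"line {line}: <img> missing src")
            if "alt" not in a:
                self.warnings.append(f"line {line}: <img> missing alt")
        elif tag == "a" and not a.get("href"):
            self.warnings.append(f"line {line}: <a> missing href")


def find_attribute_warnings(html):
    """Return a list of attribute warnings (non-blocking by default)."""
    scanner = AttributeScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.warnings
```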

Check 10: Table Tags (if present)

Requirement: Proper table structure

Checks:

  • <table>, <tr>, <td>, <th> tags properly nested
  • All rows have consistent column counts
  • Table headers and body properly structured

Error if: Malformed table structure
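
Column consistency can be checked by summing cells per row, honouring colspan. This is a simplified sketch with hypothetical names: rowspan is ignored, so a table that relies on rowspan would be reported as inconsistent.

```python
from html.parser import HTMLParser


class TableScanner(HTMLParser):
    """Counts columns per row for each table, honouring colspan (illustrative)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # stack of (opening line, [column count per row])
        self.errors = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag == "table":
            self.tables.append((line, []))
        elif tag in ("tr", "td", "th") and not self.tables:
            self.errors.append(f"line {line}: <{tag}> outside of a table")
        elif tag == "tr":
            self.tables[-1][1].append(0)
        elif tag in ("td", "th"):
            rows = self.tables[-1][1]
            if not rows:
                self.errors.append(f"line {line}: <{tag}> outside of a row")
                return
            span = dict(attrs).get("colspan") or "1"
            rows[-1] += int(span) if span.isdigit() else 1

    def handle_endtag(self, tag):
        if tag == "table" and self.tables:
            line, rows = self.tables.pop()
            if len(set(rows)) > 1:
                self.errors.append(
                    f"line {line}: rows have inconsistent column counts {rows}")


def find_table_errors(html):
    """Return a list of table-structure errors."""
    scanner = TableScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.errors
```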

Validation Report Format

Output: 06_validation_structure.json

{
  "page": 16,
  "book_page": 17,
  "chapter": 2,
  "validation_type": "structure",
  "validation_timestamp": "2025-11-08T14:34:00Z",
  "overall_status": "PASS",
  "error_count": 0,
  "warning_count": 1,
  "checks_performed": [
    {
      "check_name": "DOCTYPE Declaration",
      "status": "PASS",
      "details": "Valid HTML5 DOCTYPE found"
    },
    {
      "check_name": "HTML Tags",
      "status": "PASS",
      "details": "Proper <html> opening and closing tags"
    },
    {
      "check_name": "Head Section",
      "status": "PASS",
      "details": "All required meta tags and title present"
    },
    {
      "check_name": "Body Section",
      "status": "PASS",
      "details": "Body and content structure valid"
    },
    {
      "check_name": "Tag Closure",
      "status": "PASS",
      "details": "All tags properly matched and closed"
    },
    {
      "check_name": "Heading Hierarchy",
      "status": "WARNING",
      "details": "4 headings found, first heading is h2 rather than h1"
    },
    {
      "check_name": "Content Structure",
      "status": "PASS",
      "details": "Main content area contains 245 words across 3 paragraphs"
    },
    {
      "check_name": "List Integrity",
      "status": "PASS",
      "details": "1 list with 3 items, all properly formed"
    },
    {
      "check_name": "Image Tags",
      "status": "PASS",
      "details": "No images on this page"
    },
    {
      "check_name": "Table Tags",
      "status": "PASS",
      "details": "No tables on this page"
    }
  ],
  "errors": [],
  "warnings": [
    {
      "check": "Heading Hierarchy",
      "message": "First heading is h2, typically should be h1 for page opening",
      "severity": "LOW"
    }
  ],
  "summary": {
    "total_checks": 10,
    "passed": 9,
    "failed": 0,
    "warnings": 1,
    "html_valid": true,
    "tags_matched": true,
    "content_substantial": true
  }
}
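
A report with this shape could be assembled and saved roughly as follows. The helper names build_report and save_report are hypothetical, the "FAIL" status string is an assumption (the sample above only shows passing output), and optional fields such as book_page and the boolean summary flags are omitted for brevity.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_report(page, chapter, checks, errors, warnings):
    """Assemble a report dict from per-check results (illustrative)."""
    failed = sum(1 for c in checks if c["status"] == "FAIL")
    passed = sum(1 for c in checks if c["status"] == "PASS")
    return {
        "page": page,
        "chapter": chapter,
        "validation_type": "structure",
        "validation_timestamp":
            datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "overall_status": "FAIL" if errors else "PASS",
        "error_count": len(errors),
        "warning_count": len(warnings),
        "checks_performed": checks,
        "errors": errors,
        "warnings": warnings,
        "summary": {
            "total_checks": len(checks),
            "passed": passed,
            "failed": failed,
            "warnings": len(warnings),
        },
    }


def save_report(report, output_dir, filename="06_validation_structure.json"):
    """Write the report as indented JSON and return the path written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / filename
    target.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return target
```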

Validation Rules

PASS Criteria

  • DOCTYPE present and valid
  • All required tags (html, head, body, main, div.page-container) present
  • All tags properly closed and matched
  • Title tag with content
  • CSS stylesheet link present
  • Content structure valid
  • No structural errors

FAIL Criteria (BLOCKS PIPELINE)

  • Missing DOCTYPE
  • Missing required tags
  • Unmatched or improperly nested tags
  • Missing title or CSS link
  • Empty content
  • Malformed lists or tables

WARNING (Logged but doesn't block)

  • Missing viewport meta tag
  • First heading is not h1
  • Large heading jumps (h1 → h4)
  • Missing alt text on images
  • Missing href on links
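
Taken together with the strict_mode input parameter, these rules map onto the exit code as in the sketch below. The function name is hypothetical; the behaviour follows the rules stated in this document (errors always block, warnings block only in strict mode).

```python
def gate_exit_code(error_count, warning_count, strict_mode=False):
    """Return 0 to continue the pipeline, 1 to stop it."""
    if error_count > 0:
        return 1          # any FAIL criterion blocks the pipeline
    if strict_mode and warning_count > 0:
        return 1          # in strict mode, warnings also fail
    return 0
```

For the sample report above (0 errors, 1 warning), this returns 0 by default and 1 with strict_mode set to true.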

Implementation: Using Python Script

This validation is performed by the existing validate_html.py tool, run in structure validation mode:

cd Calypso/tools

# Validate single page HTML
python3 validate_html.py \
  ../output/chapter_02/page_artifacts/page_16/04_page_16.html \
  --output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
  --strict-structure

# Exit code:
# 0 = VALID (continue to next skill)
# 1 = INVALID (STOP pipeline)
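
If the gate is driven from Python rather than a shell, the same command can be run with subprocess and the exit code inspected. This is only a sketch: the helper names are hypothetical, the command line mirrors the invocation shown above, and anything else about validate_html.py's interface is not assumed.

```python
import subprocess


def structure_gate_command(html_file, report_file):
    """Build the argument list for the invocation shown above."""
    return [
        "python3", "validate_html.py", html_file,
        "--output-json", report_file,
        "--strict-structure",
    ]


def run_gate(cmd, cwd=None):
    """Run a gate command; True means exit code 0 (continue the pipeline)."""
    result = subprocess.run(cmd, cwd=cwd)
    return result.returncode == 0
```

A pipeline driver would call run_gate(structure_gate_command(html_path, report_path), cwd="Calypso/tools") and stop processing when it returns False.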

Hook Integration

When validation FAILS:

# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
#   - Page number
#   - HTML file path
#   - Validation report path
#   - Error details

# Hook behavior:
# - Log failure with details
# - Save error report
# - Notify user
# - STOP pipeline (no further processing)

Error Recovery

If validation fails:

  1. User reviews validation report
  2. User identifies issue in AI-generated HTML
  3. Options:
    • Fix HTML manually and re-validate
    • Re-run AI generation with improved prompt
    • Review source extraction data for errors
    • Proceed with caution (expert override)

Quality Metrics

Validation provides metrics:

  • Percentage of checks passing
  • Error severity levels
  • Content size (word count, element count)
  • Structure complexity

These metrics feed into final quality reports.

Success Criteria

✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails

Next Steps After PASS

If validation passes:

  1. All pages of chapter processed through this gate
  2. Skill 4 (consolidate pages) merges individual page HTMLs
  3. Quality Gate 2 (semantic validate) checks semantic structure
  4. Continue through validation pipeline

Next Steps After FAIL

If validation fails:

  1. PIPELINE STOPS
  2. Hook validate-structure.sh triggered
  3. User receives error report with details
  4. User must fix issues and retry

Design Notes

  • This is the first deterministic quality gate
  • Uses the proven validate_html.py tool
  • Catches structural issues before semantic analysis
  • Provides clear, actionable error messages
  • Essential for ensuring pipeline reliability

Testing

To test structure validation:

# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html

# Should show: ✓ VALID

# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html

# Should show: ✗ INVALID with specific errors