Marketplace html-structure-validate
Validate HTML5 structure and basic syntax. BLOCKING quality gate - stops pipeline if validation fails. Ensures deterministic output quality.
git clone https://github.com/aiskillstore/marketplace
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/abejitsu/html-structure-validate" ~/.claude/skills/aiskillstore-marketplace-html-structure-validate && rm -rf "$T"
skills/abejitsu/html-structure-validate/SKILL.md
HTML Structure Validate Skill
Purpose
This skill is a BLOCKING quality gate that ensures generated HTML meets minimum structural requirements. It is the first deterministic validation of probabilistic AI-generated output.
The skill checks:
- HTML5 compliance - Proper DOCTYPE, tags
- Tag closure - All tags properly closed
- Required elements - Meta tags, stylesheet links
- Well-formedness - Valid structure
If validation fails, the pipeline STOPS and triggers a hook to notify the user.
This enforces the principle: Python validates, ensuring deterministic quality.
What to Do
1. Load HTML file to validate
   - Read 04_page_XX.html generated by AI skill
   - Verify file exists and is readable
   - Confirm file is text (not binary)
2. Run validation checks
   - Check HTML5 structure compliance
   - Verify tag closure
   - Validate head section
   - Check required CSS link
   - Validate page container structure
3. Generate validation report
   - Document all checks performed
   - List any errors found
   - Note warnings (non-blocking)
   - Record informational findings
4. Save validation report as JSON
   - Save to: output/chapter_XX/page_artifacts/page_YY/06_validation_structure.json
   - Include timestamp
   - Include all check results
5. Exit with appropriate code
   - Return 0 if VALID (continue pipeline)
   - Return 1 if INVALID (STOP pipeline, trigger hook)
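The five steps above can be sketched as a single driver function. This is a minimal illustration, not the actual validate_html.py: the name run_gate, the "Load File" check name, and the shape of each check (a callable that takes the HTML text and returns a dict with check_name, status, and details) are assumptions made for the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def run_gate(html_file, report_path, checks):
    """Load the page, run every check, save a JSON report, return 0 or 1."""
    try:
        # Decoding as UTF-8 doubles as the "text, not binary" test.
        html = Path(html_file).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        results = [{"check_name": "Load File", "status": "FAIL", "details": str(exc)}]
    else:
        results = [check(html) for check in checks]

    errors = [r for r in results if r["status"] == "FAIL"]
    report = {
        "validation_type": "structure",
        "validation_timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "overall_status": "FAIL" if errors else "PASS",
        "error_count": len(errors),
        "checks_performed": results,
    }

    out = Path(report_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, indent=2), encoding="utf-8")

    # 0 lets the pipeline continue, 1 stops it.
    return 1 if errors else 0
```

A caller would typically pass the result straight to sys.exit() so that the pipeline sees the exit code.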
Input Parameters
html_file: <str> - Path to 04_page_XX.html
output_dir: <str> - Directory for validation report
strict_mode: <bool> - If true, warnings also fail (default: false)
page_number: <int> - Page number (for reporting)
chapter: <int> - Chapter number (for reporting)
Validation Checks
Check 1: DOCTYPE Declaration
Requirement: File must start with proper DOCTYPE
<!DOCTYPE html>
Check:
- File contains <!DOCTYPE html> (case-insensitive)
- DOCTYPE appears before any tags
- DOCTYPE is on first line or near beginning
Error if: Missing or incorrect DOCTYPE
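One way this check might look with the standard re module is shown below. The function name check_doctype and the exact detail strings are illustrative assumptions; the result dict follows the check_name/status/details shape used in the validation report.

```python
import re

# Matches <!DOCTYPE html> in any letter case, tolerating extra whitespace.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# Matches the start of any element tag; "<!DOCTYPE" and comments do not match.
TAG_RE = re.compile(r"<[a-zA-Z]")


def check_doctype(html):
    """Return a result dict for the DOCTYPE check."""
    name = "DOCTYPE Declaration"
    doctype = DOCTYPE_RE.search(html)
    if doctype is None:
        return {"check_name": name, "status": "FAIL", "details": "DOCTYPE missing"}

    first_tag = TAG_RE.search(html)
    if first_tag is not None and first_tag.start() < doctype.start():
        return {"check_name": name, "status": "FAIL",
                "details": "DOCTYPE appears after the first tag"}

    return {"check_name": name, "status": "PASS", "details": "Valid HTML5 DOCTYPE found"}
```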
Check 2: HTML Tags
Requirement: Proper <html> opening and closing tags
<html lang="en"> ... </html>
Checks:
- <html> tag present
- </html> closing tag present
- Tags are properly paired
- No unclosed <html> tags
Error if: Missing either tag or improperly paired
Check 3: Head Section
Requirement: Complete <head> section with metadata
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>...</title>
  <link rel="stylesheet" href="../../styles/main.css">
</head>
Checks:
- <head> and </head> tags present
- <meta charset="UTF-8"> present
- <meta name="viewport"> present (warning if missing)
- <title> tag with content present
- CSS <link> tag present with href attribute
Error if: Missing charset, title, or CSS link
Warning if: Missing viewport meta tag
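The head checks could be sketched with the standard library's html.parser as follows. The class HeadScanner and function check_head are hypothetical names, and the sketch only looks at what appears between <head> and </head>; it does not verify that the closing </head> tag exists, which the tag closure check covers.

```python
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Records which required elements appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title_text = ""
        self.found = {"head": False, "charset": False,
                      "viewport": False, "stylesheet": False}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
            self.found["head"] = True
        elif not self.in_head:
            return
        elif tag == "meta" and (attrs.get("charset") or "").lower() == "utf-8":
            self.found["charset"] = True
        elif tag == "meta" and attrs.get("name") == "viewport":
            self.found["viewport"] = True
        elif tag == "title":
            self.in_title = True
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel and attrs.get("href"):
                self.found["stylesheet"] = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_text += data


def check_head(html):
    """Return (errors, warnings) for the head section."""
    scanner = HeadScanner()
    scanner.feed(html)
    scanner.close()
    errors, warnings = [], []
    if not scanner.found["head"]:
        errors.append("Missing <head> section")
    if not scanner.found["charset"]:
        errors.append('Missing <meta charset="UTF-8">')
    if not scanner.title_text.strip():
        errors.append("Missing or empty <title>")
    if not scanner.found["stylesheet"]:
        errors.append("Missing CSS <link> with href")
    if not scanner.found["viewport"]:
        warnings.append("Missing viewport meta tag")
    return errors, warnings
```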
Check 4: Body Section
Requirement: Proper <body> tags with content
<body>
  <div class="page-container">
    <main class="page-content">
      ...
    </main>
  </div>
</body>
Checks:
- <body> and </body> tags present
- <div class="page-container"> present
- <main class="page-content"> present inside container
- Body contains substantial content (> 100 bytes)
Error if: Missing tags or required container divs
Check 5: Tag Closure Validation
Requirement: All tags must be properly closed
Checks for:
- Unmatched opening tags (e.g., <p> without </p>)
- Improper nesting (e.g., <p><h2>text</h2></p>)
- Self-closing tags used correctly (e.g., <br/>, <img/>)
- Comment blocks properly formatted (<!-- -->)
Validation method:
- Parse HTML into tree structure
- Verify all nodes properly matched
- Check nesting doesn't violate HTML5 rules
Error if: Any unmatched or improperly nested tags
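The matching part of this method can be illustrated with a stack: push each opening tag, pop on each closing tag, and report anything that does not line up. The sketch below uses html.parser from the standard library; TagBalanceChecker and find_unbalanced_tags are hypothetical names. It covers tag matching only and does not enforce HTML5 content-model rules, so a case such as a heading nested inside a paragraph would need a separate rule.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML5.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records mismatches."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) for every tag still open
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line = self.getpos()[0]
        if not self.stack:
            self.errors.append(f"line {line}: </{tag}> has no opening tag")
            return
        open_tag, open_line = self.stack.pop()
        if open_tag != tag:
            self.errors.append(
                f"line {line}: </{tag}> closes <{open_tag}> opened on line {open_line}")


def find_unbalanced_tags(html):
    """Return a list of error strings; an empty list means all tags match."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    errors.extend(f"line {line}: <{tag}> is never closed"
                  for tag, line in checker.stack)
    return errors
```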
Check 6: Heading Tags (h1-h6)
Requirement: Valid heading hierarchy
<h1>Chapter Title</h1>
<h2>Section Heading</h2>
<h3>Subsection</h3>
Checks:
- All heading tags properly closed
- First heading should be h1 (warning if not)
- Heading levels don't skip dramatically (h1 → h4 is suspicious)
- All headings have text content (not empty)
Error if: Heading tags improperly closed
Warning if: Suspicious hierarchy
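The hierarchy warnings might be computed as in the sketch below. The source does not define how large a jump counts as "dramatic", so the max_jump threshold of 2 (h1 → h3 allowed, h1 → h4 flagged) is an assumption, as are the function name heading_warnings and the message wording. Empty headings are reported alongside the warnings here for simplicity.

```python
import re

# Captures the level and inner markup of each properly closed heading.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", re.IGNORECASE | re.DOTALL)
INNER_TAG_RE = re.compile(r"<[^>]+>")


def heading_warnings(html, max_jump=2):
    """Return warning strings about heading order and empty headings."""
    headings = [(int(level), INNER_TAG_RE.sub("", body).strip())
                for level, body in HEADING_RE.findall(html)]
    warnings = []
    if headings and headings[0][0] != 1:
        warnings.append(f"First heading is h{headings[0][0]}, expected h1")

    previous = None
    for level, text in headings:
        if not text:
            warnings.append(f"Empty h{level} heading")
        # Only downward jumps (to a deeper level) are suspicious.
        if previous is not None and level - previous > max_jump:
            warnings.append(f"Heading jumps from h{previous} to h{level}")
        previous = level
    return warnings
```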
Check 7: Content Structure
Requirement: Meaningful content in page container
Checks:
- <main class="page-content"> contains elements
- Content includes headings or paragraphs
- No completely empty content area
- Text nodes or elements present (> 100 words total)
Error if: No content or empty structure
Check 8: List Integrity
Requirement: All lists properly structured
Checks for each <ul> or <ol>:
- List opening and closing tags matched
- List contains <li> elements
- All <li> tags properly closed
- <li> count matches opening/closing pairs
- No nested <ul> or <ol> improperly closed
Error if: Empty lists or unmatched <li> tags
Check 9: Image and Link Tags
Requirement: Self-closing tags properly formatted
Checks:
- All <img> tags have src and alt attributes
- All <a> tags have valid href attributes
- Image paths don't have obvious errors (no broken syntax)
- Self-closing tags use proper syntax
Warning if: Images missing alt text or links missing href
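A possible sketch of the attribute checks, again using html.parser, is shown below. AttributeAuditor and audit_attributes are hypothetical names. The sketch checks only that the attributes are present: an empty alt="" is accepted because it is valid for decorative images, and path syntax is not inspected.

```python
from html.parser import HTMLParser


class AttributeAuditor(HTMLParser):
    """Collects warnings for <img> and <a> tags lacking required attributes."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        line = self.getpos()[0]
        if tag == "img":
            for required in ("src", "alt"):
                if required not in names:
                    self.warnings.append(f"line {line}: <img> missing {required}")
        elif tag == "a" and not dict(attrs).get("href"):
            self.warnings.append(f"line {line}: <a> missing href")


def audit_attributes(html):
    """Return warning strings; an empty list means nothing was flagged."""
    auditor = AttributeAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.warnings
```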
Check 10: Table Tags (if present)
Requirement: Proper table structure
Checks:
- <table>, <tr>, <td>, <th> tags properly nested
- All rows have consistent column counts
- Table headers and body properly structured
Error if: Malformed table structure
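The column-count check could be approximated as below. This sketch assumes tables are not nested and ignores rowspan; colspan is counted toward a row's width. RowWidthCollector and inconsistent_tables are hypothetical names.

```python
from html.parser import HTMLParser


class RowWidthCollector(HTMLParser):
    """Records the cell count of every row, grouped by table."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of row widths per <table>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append(0)
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            span = dict(attrs).get("colspan") or "1"
            # Fall back to a width of 1 if colspan is not a plain number.
            self.tables[-1][-1] += int(span) if span.isdigit() else 1


def inconsistent_tables(html):
    """Return the zero-based indexes of tables whose rows differ in width."""
    collector = RowWidthCollector()
    collector.feed(html)
    collector.close()
    return [index for index, widths in enumerate(collector.tables)
            if len(set(widths)) > 1]
```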
Validation Report Format
Output: 06_validation_structure.json
{
  "page": 16,
  "book_page": 17,
  "chapter": 2,
  "validation_type": "structure",
  "validation_timestamp": "2025-11-08T14:34:00Z",
  "overall_status": "PASS",
  "error_count": 0,
  "warning_count": 1,
  "checks_performed": [
    { "check_name": "DOCTYPE Declaration", "status": "PASS", "details": "Valid HTML5 DOCTYPE found" },
    { "check_name": "HTML Tags", "status": "PASS", "details": "Proper <html> opening and closing tags" },
    { "check_name": "Head Section", "status": "PASS", "details": "All required meta tags and title present" },
    { "check_name": "Body Section", "status": "PASS", "details": "Body and content structure valid" },
    { "check_name": "Tag Closure", "status": "PASS", "details": "All tags properly matched and closed" },
    { "check_name": "Heading Hierarchy", "status": "PASS", "details": "4 headings found, proper h1-h4 hierarchy" },
    { "check_name": "Content Structure", "status": "PASS", "details": "Main content area contains 245 words across 3 paragraphs" },
    { "check_name": "List Integrity", "status": "PASS", "details": "1 list with 3 items, all properly formed" },
    { "check_name": "Image Tags", "status": "PASS", "details": "No images on this page" },
    { "check_name": "Table Tags", "status": "PASS", "details": "No tables on this page" }
  ],
  "errors": [],
  "warnings": [
    {
      "check": "Heading Hierarchy",
      "message": "First heading is h2, typically should be h1 for page opening",
      "severity": "LOW"
    }
  ],
  "summary": {
    "total_checks": 10,
    "passed": 9,
    "failed": 0,
    "warnings": 1,
    "html_valid": true,
    "tags_matched": true,
    "content_substantial": true
  }
}
Validation Rules
PASS Criteria
- DOCTYPE present and valid
- All required tags (html, head, body, main, div.page-container) present
- All tags properly closed and matched
- Title tag with content
- CSS stylesheet link present
- Content structure valid
- No structural errors
FAIL Criteria (BLOCKS PIPELINE)
- Missing DOCTYPE
- Missing required tags
- Unmatched or improperly nested tags
- Missing title or CSS link
- Empty content
- Malformed lists or tables
WARNING (Logged but doesn't block)
- Missing viewport meta tag
- First heading is not h1
- Large heading jumps (h1 → h4)
- Missing alt text on images
- Missing href on links
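These rules, together with the strict_mode input parameter (warnings also fail when it is true), reduce to a small decision function. The sketch below is an illustration; the function names overall_status and exit_code are assumptions, not part of validate_html.py.

```python
def overall_status(error_count, warning_count, strict_mode=False):
    """Map error and warning counts to PASS or FAIL."""
    if error_count > 0:
        return "FAIL"
    # In strict mode, warnings are promoted to failures.
    if strict_mode and warning_count > 0:
        return "FAIL"
    return "PASS"


def exit_code(status):
    """0 lets the pipeline continue; 1 stops it and triggers the hook."""
    return 0 if status == "PASS" else 1
```

With the sample report's counts (0 errors, 1 warning), the page passes in the default mode and fails only when strict_mode is enabled.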
Implementation: Using Python Script
This validation is performed by the existing validate_html.py tool, run in structure validation mode:
cd Calypso/tools

# Validate single page HTML
python3 validate_html.py \
  ../output/chapter_02/page_artifacts/page_16/04_page_16.html \
  --output-json ../output/chapter_02/page_artifacts/page_16/06_validation_structure.json \
  --strict-structure

# Exit code:
#   0 = VALID (continue to next skill)
#   1 = INVALID (STOP pipeline)
Hook Integration
When validation FAILS:
# Trigger hook: .claude/hooks/validate-structure.sh
# Receives:
#   - Page number
#   - HTML file path
#   - Validation report path
#   - Error details
# Hook behavior:
#   - Log failure with details
#   - Save error report
#   - Notify user
#   - STOP pipeline (no further processing)
Error Recovery
If validation fails:
- User reviews validation report
- User identifies issue in AI-generated HTML
- Options:
- Fix HTML manually and re-validate
- Re-run AI generation with improved prompt
- Review source extraction data for errors
- Proceed with caution (expert override)
Quality Metrics
Validation provides metrics:
- Percentage of checks passing
- Error severity levels
- Content size (word count, element count)
- Structure complexity
These metrics feed into final quality reports.
Success Criteria
✓ Validation completes successfully
✓ All structural checks pass (0 errors)
✓ Validation report saved in JSON format
✓ Exit code 0 returned (or 1 if invalid)
✓ Clear error messages if validation fails
Next Steps After PASS
If validation passes:
- All pages of chapter processed through this gate
- Skill 4 (consolidate pages) merges individual page HTMLs
- Quality Gate 2 (semantic validate) checks semantic structure
- Continue through validation pipeline
Next Steps After FAIL
If validation fails:
- PIPELINE STOPS
- Hook validate-structure.sh triggered
- User receives error report with details
- User must fix issues and retry
Design Notes
- This is the first deterministic quality gate
- Uses proven validate_html.py tool
- Catches structural issues before semantic analysis
- Provides clear, actionable error messages
- Essential for ensuring pipeline reliability
Testing
To test structure validation:
# Test with known-good HTML
python3 validate_html.py ../output/chapter_01/chapter_01.html
# Should show: ✓ VALID

# Test with invalid HTML (if needed)
python3 validate_html.py broken_html.html
# Should show: ✗ INVALID with specific errors