Claude-skill-registry hierarchical-matching-systems
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/hierarchical-matching-systems" ~/.claude/skills/majiayu000-claude-skill-registry-hierarchical-matching-systems && rm -rf "$T"
skills/data/hierarchical-matching-systems/SKILL.mdHierarchical Matching Systems
This skill provides rigid diagnostic and architectural procedures for hierarchical matching systems. Follow checklists exactly—matching bugs often hide in skipped steps.
Quick Reference
- Algorithm selection: See references/decision-guide.md
- Algorithm details: See references/algorithms.md
1. Problem Classification Checklist
Before any work, classify the problem. Check ALL that apply:
□ TWO-SIDED: Both sides have preferences (students↔schools, workers↔jobs) □ ONE-SIDED: Only one side has preferences (tasks→workers, items→bins) □ HIERARCHICAL: Entities exist at multiple levels (org→dept→team→person) □ WEIGHTED: Matches have costs/scores to optimize □ CONSTRAINED: Hard limits exist (capacity, exclusions, required pairings) □ STABLE: Solution must resist defection (no blocking pairs) □ OPTIMAL: Solution must minimize/maximize objective function □ FUZZY: Entities may partially match (entity resolution, deduplication)
Classification determines algorithm family. Proceed to Section 2 for architecture or Section 3 for debugging.
2. Architecture Procedure
Follow these phases in order when designing a new matching system.
Phase 2.1: Requirements Translation
Convert each business requirement into formal constraints:
| Business Requirement | Constraint Type | Formal Expression |
|---|---|---|
| "Each student gets one school" | Capacity | ` |
| "Schools have seat limits" | Capacity | ` |
| "Siblings must be together" | Coupling | |
| "Student X cannot attend Y" | Exclusion | |
| "Priority for residents" | Preference ordering | |
Checklist:
□ List ALL business requirements □ Classify each as: capacity | coupling | exclusion | ordering | soft preference □ Identify conflicts between requirements (document tradeoffs) □ Distinguish HARD constraints (must satisfy) from SOFT (optimize toward) □ Validate translations with stakeholder examples
Phase 2.2: Algorithm Selection
Use references/decision-guide.md to select algorithm. Verify:
□ Algorithm handles all HARD constraints □ Algorithm can optimize SOFT constraints (or document gaps) □ Complexity acceptable for data size (see references/algorithms.md) □ Stability requirements met (if two-sided) □ Optimality requirements met (if weighted)
Phase 2.3: Data Model Design
Define entities, relationships, and preference representation:
□ Entity schema for each side (attributes, identifiers) □ Preference representation (ranked list | score matrix | pairwise comparisons) □ Constraint encoding (how exclusions/couplings are stored) □ Hierarchy representation (if multi-level: tree | DAG | adjacency list) □ Tie-breaking rules (deterministic ordering for equal preferences)
Phase 2.4: Interface Contracts
Specify inputs, outputs, and invariants:
Input Contract:
□ Preference format and validation rules □ Constraint format and validation rules □ Required vs optional fields □ How missing preferences are handled (reject | default rank | exclude)
Output Contract:
□ Match format (pairs | assignment map | ranked list) □ Unmatched entity handling (explicit list | null matches | error) □ Match metadata (scores, stability proof, constraint satisfaction report)
Invariants:
□ Determinism: same input → same output (document randomness if any) □ Completeness: all entities matched OR explicitly unmatched □ Validity: all matches satisfy hard constraints
Phase 2.5: Testing Strategy
Define validation before implementation:
□ Unit tests for preference parsing and constraint validation □ Property tests: stability, optimality, constraint satisfaction □ Edge cases: empty inputs, single entity, all tied preferences □ Regression tests from known-good examples □ Performance benchmarks at target scale
3. Debugging Procedure
Follow this diagnostic sequence for any matching issue. Do not skip steps.
Phase 3.1: Symptom Classification
Identify the symptom category:
| Symptom | Category | Go To |
|---|---|---|
| Same inputs, different outputs | INSTABILITY | 3.2 |
| Matches violate business rules | CONSTRAINT VIOLATION | 3.3 |
| Matches technically valid but "wrong" | PREFERENCE MISALIGNMENT | 3.4 |
| Errors with nested/hierarchical data | HIERARCHY BUG | 3.5 |
| Poor performance at scale | PERFORMANCE | 3.6 |
Phase 3.2: Instability Diagnosis
Root causes of non-deterministic matches:
□ RANDOMNESS: Check for unseeded RNG in tie-breaking → Fix: Use deterministic tie-breaker (lexicographic ID, timestamp) □ FLOATING POINT: Check score comparisons for floating point issues → Fix: Use epsilon comparison or integer scores □ HASH ORDERING: Check if iteration order depends on hash maps → Fix: Sort keys before iteration □ PARALLEL RACE: Check for concurrent modifications → Fix: Synchronize or use sequential processing □ INPUT ORDERING: Check if algorithm is order-sensitive → Fix: Canonicalize input order (sort by ID)
Verification:
□ Run matching 10x with identical inputs □ Diff all outputs □ If any differ, add logging to identify divergence point
Phase 3.3: Constraint Violation Diagnosis
Diagnostic sequence:
1. □ IDENTIFY: Which specific constraint is violated? → List the violated constraint and the violating match 2. □ TRACE: Where should constraint be enforced? → Map constraint to code location (filter | validation | algorithm step) 3. □ VERIFY ENCODING: Is constraint correctly represented? → Print constraint data structure, verify against requirement 4. □ VERIFY ENFORCEMENT: Is constraint checked at right time? → Add logging before/after enforcement point 5. □ CHECK ORDERING: Is constraint checked before conflicting decisions? → Trace decision sequence, verify constraint checked first 6. □ CHECK COMPLETENESS: Are all instances covered? → Enumerate all entities that should be constrained
Common failure patterns:
| Pattern | Symptom | Fix |
|---|---|---|
| Late enforcement | Valid intermediate state, invalid final | Move check earlier |
| Partial coverage | Some entities constrained, others not | Enumerate all cases |
| Soft vs hard confusion | Constraint violated for "better" match | Reclassify as hard |
| Stale data | Constraint on outdated values | Refresh before check |
Phase 3.4: Preference Misalignment Diagnosis
When matches are valid but don't reflect intended priorities:
1. □ EXTRACT: Get the actual preference data used → Log/print the exact preference structure at match time 2. □ COMPARE: Check against expected preferences → Side-by-side diff with business-stated priorities 3. □ TRACE TRANSFORMATION: Follow preference from input to algorithm → Log at each transformation step (parsing, normalization, scoring) 4. □ CHECK SCORING: Verify score calculation → Manual calculation for 2-3 example cases 5. □ CHECK AGGREGATION: If multi-criteria, verify combination → Test each criterion independently, then combined 6. □ CHECK NORMALIZATION: Verify scale/range handling → Check for min/max, z-score, or rank normalization bugs
Scoring function checklist:
□ Direction correct (higher = better or lower = better, consistently) □ Scale appropriate (no single factor dominating) □ Missing values handled (null → 0? → excluded? → default?) □ Ties handled explicitly (not left to floating point chance) □ Edge cases: extreme values, all same values, single candidate
Phase 3.5: Hierarchy Traversal Diagnosis
For multi-level matching issues:
1. □ VISUALIZE: Draw the hierarchy with the problematic match → Tree diagram showing all levels and the match path 2. □ CHECK INHERITANCE: Do child constraints inherit from parent? → Verify constraint propagation rules 3. □ CHECK AGGREGATION: How do child preferences roll up? → Verify aggregation function (sum | max | weighted | majority) 4. □ CHECK LEVEL INTERACTION: Can matches cross levels? → Document allowed/forbidden cross-level matches 5. □ CHECK TRAVERSAL ORDER: Top-down or bottom-up? → Verify algorithm processes levels in intended order 6. □ CHECK PARTIAL MATCHES: Can a parent match without all children? → Document completeness requirements per level
Common hierarchy bugs:
| Bug | Symptom | Fix |
|---|---|---|
| Missing propagation | Child ignores parent constraint | Add inheritance logic |
| Double counting | Same entity weighted multiple times | Deduplicate in aggregation |
| Level skipping | Match at wrong level | Add level validation |
| Orphan handling | Unattached children cause errors | Define orphan policy |
Phase 3.6: Performance Diagnosis
For scale and speed issues:
1. □ PROFILE: Identify the slow component → Time each phase: input parsing, preference building, matching, output 2. □ COMPLEXITY CHECK: Verify actual vs expected complexity → Log iteration counts, compare to theoretical O(n) 3. □ MEMORY CHECK: Profile memory usage → Watch for preference matrix explosion (n² space) 4. □ ALGORITHM FIT: Verify algorithm appropriate for scale → See references/algorithms.md for complexity comparison 5. □ CACHING: Check for redundant computation → Log cache hits/misses for preference lookups 6. □ BATCH VS STREAMING: Check processing model → Full recomputation vs incremental updates
4. Testing & Validation Procedure
4.1: Correctness Properties
Test these properties for every matching system:
□ DETERMINISM: run(input) = run(input) (10 trials minimum) □ COMPLETENESS: all entities either matched or explicitly unmatched □ VALIDITY: all matches satisfy all hard constraints □ STABILITY (if applicable): no blocking pairs exist □ OPTIMALITY (if applicable): objective function at expected value
4.2: Stability Verification (Two-Sided Matching)
For stable matching, verify no blocking pairs:
# Pseudocode - verify no blocking pair exists for each unmatched_pair (a, b): if a prefers b over current_match(a): if b prefers a over current_match(b): FAIL: blocking pair found (a, b)
□ Enumerate all non-matched pairs □ Check mutual preference for each □ Report any blocking pairs found □ For large instances, sample-check (document coverage)
4.3: Constraint Satisfaction Verification
□ List all hard constraints □ For each match, verify against each constraint □ Generate constraint satisfaction report □ Flag any violations with specific match and constraint
4.4: Edge Case Test Suite
Mandatory test cases:
□ Empty input (no entities on one or both sides) □ Single entity (one-to-one degenerate case) □ All identical preferences (maximum tie scenario) □ Mutually exclusive preferences (everyone wants same thing) □ Impossible constraints (unsatisfiable, should error clearly) □ Maximum capacity (all slots exactly filled) □ Minimum capacity (barely enough slots) □ Self-referential (can entity match itself? test boundary) □ Circular preferences (A→B→C→A)
4.5: Regression Test Maintenance
□ Capture real production cases that revealed bugs □ Minimize to smallest reproducing example □ Document expected behavior explicitly □ Run on every change to matching logic
5. Review Checklist
When reviewing matching system code or design:
5.1: Design Review
□ Problem correctly classified (Section 1) □ Algorithm appropriate for problem class (references/decision-guide.md) □ All business requirements mapped to constraints (Section 2.1) □ Hard vs soft constraints clearly distinguished □ Tie-breaking is deterministic and documented □ Hierarchy semantics defined (if applicable)
5.2: Implementation Review
□ Preference representation matches algorithm requirements □ Constraints enforced at correct point in algorithm □ No hidden randomness (unseeded RNG, hash iteration) □ Floating point comparison handled correctly □ Edge cases handled (empty, single, ties) □ Error messages identify specific constraint violations
5.3: Testing Review
□ All properties from 4.1 tested □ Edge cases from 4.4 covered □ Performance benchmarked at realistic scale □ Regression tests exist for past bugs
Appendix: Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Greedy first-come | Order-dependent, non-optimal | Use proper algorithm |
| Score = sum(all factors) | One factor dominates | Normalize scales |
| Retry until valid | Non-deterministic, slow | Fix constraint order |
| Global preference cache | Stale across updates | Invalidate on change |
| String matching for entities | Case/whitespace bugs | Use canonical IDs |
| Float equality for ties | Non-deterministic | Use epsilon or integer |
| Recursive hierarchy walk | Stack overflow risk | Use iterative with explicit stack |
| N² preference matrix | Memory explosion | Use sparse representation |