Claude-code-minoan linear-a-decipherment
Analyze Linear A inscriptions computationally via Gordon's Semitic hypothesis — sign frequency, co-occurrence stats, consonantal skeleton extraction, Proto-Semitic root comparison, Gordon lexicon lookup, libation formula patterns, and ML training data prep. Triggers on Linear A, Minoan script, GORILA corpus, HT/ZA/PK tablets, Semitic cognates, sign values, Phaistos, Hagia Triada, lashon ha-kretan.
git clone https://github.com/tdimino/claude-code-minoan
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/research/linear-a-decipherment" ~/.claude/skills/tdimino-claude-code-minoan-linear-a-decipherment && rm -rf "$T"
skills/research/linear-a-decipherment/SKILL.mdLinear A Decipherment
Computational pipeline for analyzing Linear A inscriptions against Semitic roots, formalizing Cyrus H. Gordon's five-step decipherment methodology. Built on data from
lashon-ha-kretan (1,701 inscriptions, 60 Gordon readings, 2,871 Proto-Semitic roots).
Base directory:
~/.claude/skills/linear-a-decipherment
Scholarly Disclaimer
All readings are hypothetical. Linear A remains officially undeciphered. Gordon's Semitic hypothesis is one of several competing frameworks. Include this disclaimer on every analytical output.
Confidence Taxonomy
Every proposed reading must be tagged with a confidence level:
| Level | Criteria | Example |
|---|---|---|
| CONFIRMED | Ideographic + phonetic + mathematical confirmation | KU-NI-SU (emmer wheat) |
| PROBABLE | Direct Gordon reading + external attestation | DA-KU-SE-NE (Hurrian name at Nuzi) |
| CANDIDATE | Gordon reading or strong Proto-Semitic match (d < 0.3) | New cognate from distance search |
| SPECULATIVE | Weak phonetic match or single-source evidence | Proto-Semitic match with d > 0.5 |
Reference File Protocol
Route questions to the right reference before answering:
Question about a specific reading or word? → Read references/gordon-lexicon.md → Run: uv run scripts/cognate_search.py "WORD" Question about methodology or approach? → Read references/methodology.md Question about sign values or the syllabary? → Read references/sign-values.md Question about ML/computational approaches? → Read references/ml-approaches.md Question about a specific inscription? → Run: uv run scripts/analyze.py single INSCRIPTION_NAME Question about corpus statistics? → Run: uv run scripts/sign_analysis.py SUBCOMMAND
Data Dependencies
Source data from
lashon-ha-kretan:
| File | Path | Contents |
|---|---|---|
| Inscriptions | | ~1,701 GORILA inscriptions |
| Lexicon | | 60 Gordon + 3 YasharMana + 7 scholarly readings |
| Proto-Semitic | | 2,871 roots |
Extracted data cached in
data/ (generated by corpus_extract.py --all):
— Structured inscriptionsdata/corpus.json
— Gordon + YasharMana lexicondata/gordon.json
— Proto-Semitic rootsdata/semitic_roots.json
— Precomputed cognate scores (built bydata/cognate_cache.json
)cognate_search.py --build-cache
If
data/ files are missing, run extraction first:
uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --all
Workflows
1. Analyze a Single Inscription
Runs Gordon's 5-step pipeline on one inscription:
# Human-readable report uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py single HT88 # JSON output uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py single HT88 --format json
Steps performed: transliteration extraction, segmentation, consonantal skeleton for each word, cognate search (Gordon → YasharMana → Proto-Semitic cache), coverage summary.
2. Search Cognates for a Word
Find Semitic cognates for any Linear A transliteration:
# Full search with table output uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" # Skeleton extraction only uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --skeleton # JSON with top 10 matches uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --top 10 --format json # Skip cache for live Proto-Semitic search uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --no-cache
Pipeline: transliteration → skeleton (k-r-t) → Gordon direct → YasharMana → Proto-Semitic distance.
3. Find Unknown Words (Discovery Mode)
Identify frequently-occurring words with no known reading—best targets for new cognate proposals:
# Top 20 unknown words appearing 3+ times uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode unknowns # More restrictive: top 10 appearing 5+ times uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode unknowns --min-count 5 --top 10
4. Find Promising Inscriptions
Inscriptions with the highest ratio of identified words—best for study:
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode promising --top 15
5. Compare Libation Formulas
Group inscriptions containing the libation formula (JA-SA-SA-RA-ME pattern):
# List all libation inscriptions uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode libation # With skeleton alignment uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode libation --alignment
6. Corpus Statistics
Statistical analysis of sign patterns:
# Sign frequency (top 30) uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py frequency # Word frequency with hapax legomena count uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py words # Sign co-occurrence within words uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py cooccurrence --signs KI,RO,SA # Positional distribution (initial/medial/final) uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py position # Site distribution (HT, ZA, PK, etc.) uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py distribution # JSON output for any subcommand uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py frequency --format json
7. Generate Training Data
Prepare JSONL for ML fine-tuning:
# Preview first 3 entries uv run ~/.claude/skills/linear-a-decipherment/scripts/finetune_prep.py gordon-pairs --preview 3 # Generate full JSONL uv run ~/.claude/skills/linear-a-decipherment/scripts/finetune_prep.py gordon-pairs --output data/gordon_pairs.jsonl
v1 produces 63 chat-format pairs (Gordon + YasharMana). See
references/ml-approaches.md for v2 augmentation strategy.
8. Reverse Root Search (Semitic Root → Corpus Words)
Given a Semitic consonantal root, find all Linear A words in the corpus whose skeletons match:
# Find corpus words matching root KNS (e.g., kiništu "gathering place") uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse kns # Broader search with higher distance tolerance uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse kns --max-dist 0.5 -n 30 # JSON output for programmatic use uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse thm --format json # Search for Baal-related words (b-'-l root) uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse bl # Search for "give" root (y-t-n) uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse ytn
Pipeline: root consonants → weighted Levenshtein against all corpus word skeletons → ranked by distance, annotated with Gordon/YasharMana readings, occurrence counts, sites, and inscriptions.
9. Extract / Rebuild Corpus
Extract structured data from JS source files:
# Extract everything (inscriptions + lexicons + Proto-Semitic roots) uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --all # Inscriptions only, filtered by site uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --site HT # Include Gordon lexicon uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --with-gordon # Build cognate cache (takes ~10 seconds) uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --build-cache
Integration with Other Skills
| Skill | Usage |
|---|---|
| Create RAG collection from |
| Sefaria for Hebrew cognate verification, CDLI for Akkadian parallels |
| Search recent computational decipherment papers |
| Local inference with fine-tuned decipherment models (v2) |
Architecture
~/.claude/skills/linear-a-decipherment/ ├── SKILL.md # This file ├── lib/ # Shared Python library │ ├── __init__.py │ ├── types.py # Frozen dataclasses (Inscription, LexiconEntry, CognateMatch) │ ├── js_parser.py # JS Map → Python dict extraction │ ├── normalization.py # normalize(), lookup_in(), J/Y swap │ ├── skeleton.py # SIGN_DECOMPOSITION, extract_skeleton() │ └── phonetics.py # SEMITIC_DISTANCES, weighted_levenshtein() ├── scripts/ │ ├── corpus_extract.py # JS → JSON extraction │ ├── cognate_search.py # Forward + reverse cognate search + cache builder │ ├── sign_analysis.py # Corpus-wide sign statistics │ ├── analyze.py # Gordon 5-step pipeline (single + batch) │ └── finetune_prep.py # ML training data generation ├── references/ │ ├── gordon-lexicon.md # Complete 60+3+7 entry lexicon tables │ ├── methodology.md # Gordon's methods, 5-step pipeline │ ├── sign-values.md # Sign confidence levels (HIGH/MEDIUM/LOW) │ └── ml-approaches.md # Computational decipherment survey (v2) └── data/ # Generated (not committed) ├── corpus.json # 1,701 inscriptions ├── gordon.json # 60 Gordon + 3 YasharMana + 7 scholarly entries ├── semitic_roots.json # 2,871 Proto-Semitic roots └── cognate_cache.json # Precomputed cognate scores
All scripts use
uv run with PEP 723 inline metadata. Dependencies: stdlib only.