Claude-code-minoan linear-a-decipherment

Analyze Linear A inscriptions computationally via Gordon's Semitic hypothesis — sign frequency, co-occurrence stats, consonantal skeleton extraction, Proto-Semitic root comparison, Gordon lexicon lookup, libation formula patterns, and ML training data prep. Triggers on Linear A, Minoan script, GORILA corpus, HT/ZA/PK tablets, Semitic cognates, sign values, Phaistos, Hagia Triada, lashon ha-kretan.

install
source · Clone the upstream repo
git clone https://github.com/tdimino/claude-code-minoan
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/tdimino/claude-code-minoan "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/research/linear-a-decipherment" ~/.claude/skills/tdimino-claude-code-minoan-linear-a-decipherment && rm -rf "$T"
manifest: skills/research/linear-a-decipherment/SKILL.md
source content

Linear A Decipherment

Computational pipeline for analyzing Linear A inscriptions against Semitic roots, formalizing Cyrus H. Gordon's five-step decipherment methodology. Built on data from

lashon-ha-kretan
(1,701 inscriptions, 60 Gordon readings, 2,871 Proto-Semitic roots).

Base directory:

~/.claude/skills/linear-a-decipherment

Scholarly Disclaimer

All readings are hypothetical. Linear A remains officially undeciphered. Gordon's Semitic hypothesis is one of several competing frameworks. Include this disclaimer on every analytical output.

Confidence Taxonomy

Every proposed reading must be tagged with a confidence level:

LevelCriteriaExample
CONFIRMEDIdeographic + phonetic + mathematical confirmationKU-NI-SU (emmer wheat)
PROBABLEDirect Gordon reading + external attestationDA-KU-SE-NE (Hurrian name at Nuzi)
CANDIDATEGordon reading or strong Proto-Semitic match (d < 0.3)New cognate from distance search
SPECULATIVEWeak phonetic match or single-source evidenceProto-Semitic match with d > 0.5

Reference File Protocol

Route questions to the right reference before answering:

Question about a specific reading or word?
  → Read references/gordon-lexicon.md
  → Run: uv run scripts/cognate_search.py "WORD"

Question about methodology or approach?
  → Read references/methodology.md

Question about sign values or the syllabary?
  → Read references/sign-values.md

Question about ML/computational approaches?
  → Read references/ml-approaches.md

Question about a specific inscription?
  → Run: uv run scripts/analyze.py single INSCRIPTION_NAME

Question about corpus statistics?
  → Run: uv run scripts/sign_analysis.py SUBCOMMAND

Data Dependencies

Source data from

lashon-ha-kretan
:

FilePathContents
Inscriptions
~/Desktop/Programming/lashon-ha-kretan/LinearAInscriptions.js
~1,701 GORILA inscriptions
Lexicon
~/Desktop/Programming/lashon-ha-kretan/semiticLexicon.js
60 Gordon + 3 YasharMana + 7 scholarly readings
Proto-Semitic
~/Desktop/Programming/lashon-ha-kretan/etymology/Semitic.json
2,871 roots

Extracted data cached in

data/
(generated by
corpus_extract.py --all
):

  • data/corpus.json
    — Structured inscriptions
  • data/gordon.json
    — Gordon + YasharMana lexicon
  • data/semitic_roots.json
    — Proto-Semitic roots
  • data/cognate_cache.json
    — Precomputed cognate scores (built by
    cognate_search.py --build-cache
    )

If

data/
files are missing, run extraction first:

uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --all

Workflows

1. Analyze a Single Inscription

Runs Gordon's 5-step pipeline on one inscription:

# Human-readable report
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py single HT88

# JSON output
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py single HT88 --format json

Steps performed: transliteration extraction, segmentation, consonantal skeleton for each word, cognate search (Gordon → YasharMana → Proto-Semitic cache), coverage summary.

2. Search Cognates for a Word

Find Semitic cognates for any Linear A transliteration:

# Full search with table output
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA"

# Skeleton extraction only
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --skeleton

# JSON with top 10 matches
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --top 10 --format json

# Skip cache for live Proto-Semitic search
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py "KI-RE-TA" --no-cache

Pipeline: transliteration → skeleton (k-r-t) → Gordon direct → YasharMana → Proto-Semitic distance.

3. Find Unknown Words (Discovery Mode)

Identify frequently-occurring words with no known reading—best targets for new cognate proposals:

# Top 20 unknown words appearing 3+ times
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode unknowns

# More restrictive: top 10 appearing 5+ times
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode unknowns --min-count 5 --top 10

4. Find Promising Inscriptions

Inscriptions with the highest ratio of identified words—best for study:

uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode promising --top 15

5. Compare Libation Formulas

Group inscriptions containing the libation formula (JA-SA-SA-RA-ME pattern):

# List all libation inscriptions
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode libation

# With skeleton alignment
uv run ~/.claude/skills/linear-a-decipherment/scripts/analyze.py batch --mode libation --alignment

6. Corpus Statistics

Statistical analysis of sign patterns:

# Sign frequency (top 30)
uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py frequency

# Word frequency with hapax legomena count
uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py words

# Sign co-occurrence within words
uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py cooccurrence --signs KI,RO,SA

# Positional distribution (initial/medial/final)
uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py position

# Site distribution (HT, ZA, PK, etc.)
uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py distribution

# JSON output for any subcommand
uv run ~/.claude/skills/linear-a-decipherment/scripts/sign_analysis.py frequency --format json

7. Generate Training Data

Prepare JSONL for ML fine-tuning:

# Preview first 3 entries
uv run ~/.claude/skills/linear-a-decipherment/scripts/finetune_prep.py gordon-pairs --preview 3

# Generate full JSONL
uv run ~/.claude/skills/linear-a-decipherment/scripts/finetune_prep.py gordon-pairs --output data/gordon_pairs.jsonl

v1 produces 63 chat-format pairs (Gordon + YasharMana). See

references/ml-approaches.md
for v2 augmentation strategy.

8. Reverse Root Search (Semitic Root → Corpus Words)

Given a Semitic consonantal root, find all Linear A words in the corpus whose skeletons match:

# Find corpus words matching root KNS (e.g., kiništu "gathering place")
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse kns

# Broader search with higher distance tolerance
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse kns --max-dist 0.5 -n 30

# JSON output for programmatic use
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse thm --format json

# Search for Baal-related words (b-'-l root)
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse bl

# Search for "give" root (y-t-n)
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --reverse ytn

Pipeline: root consonants → weighted Levenshtein against all corpus word skeletons → ranked by distance, annotated with Gordon/YasharMana readings, occurrence counts, sites, and inscriptions.

9. Extract / Rebuild Corpus

Extract structured data from JS source files:

# Extract everything (inscriptions + lexicons + Proto-Semitic roots)
uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --all

# Inscriptions only, filtered by site
uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --site HT

# Include Gordon lexicon
uv run ~/.claude/skills/linear-a-decipherment/scripts/corpus_extract.py --with-gordon

# Build cognate cache (takes ~10 seconds)
uv run ~/.claude/skills/linear-a-decipherment/scripts/cognate_search.py --build-cache

Integration with Other Skills

SkillUsage
rlama
Create
gordon-dossiers
RAG collection from
~/Desktop/minoanmystery-astro/souls/minoan/dossiers/scholarly-sources/gordon/
ancient-near-east-research
Sefaria for Hebrew cognate verification, CDLI for Akkadian parallels
exa-search
Search recent computational decipherment papers
llama-cpp
Local inference with fine-tuned decipherment models (v2)

Architecture

~/.claude/skills/linear-a-decipherment/
├── SKILL.md                    # This file
├── lib/                        # Shared Python library
│   ├── __init__.py
│   ├── types.py                # Frozen dataclasses (Inscription, LexiconEntry, CognateMatch)
│   ├── js_parser.py            # JS Map → Python dict extraction
│   ├── normalization.py        # normalize(), lookup_in(), J/Y swap
│   ├── skeleton.py             # SIGN_DECOMPOSITION, extract_skeleton()
│   └── phonetics.py            # SEMITIC_DISTANCES, weighted_levenshtein()
├── scripts/
│   ├── corpus_extract.py       # JS → JSON extraction
│   ├── cognate_search.py       # Forward + reverse cognate search + cache builder
│   ├── sign_analysis.py        # Corpus-wide sign statistics
│   ├── analyze.py              # Gordon 5-step pipeline (single + batch)
│   └── finetune_prep.py        # ML training data generation
├── references/
│   ├── gordon-lexicon.md       # Complete 60+3+7 entry lexicon tables
│   ├── methodology.md          # Gordon's methods, 5-step pipeline
│   ├── sign-values.md          # Sign confidence levels (HIGH/MEDIUM/LOW)
│   └── ml-approaches.md        # Computational decipherment survey (v2)
└── data/                       # Generated (not committed)
    ├── corpus.json             # 1,701 inscriptions
    ├── gordon.json             # 60 Gordon + 3 YasharMana + 7 scholarly entries
    ├── semitic_roots.json      # 2,871 Proto-Semitic roots
    └── cognate_cache.json      # Precomputed cognate scores

All scripts use

uv run
with PEP 723 inline metadata. Dependencies: stdlib only.