Gsd-skill-creator vocabulary-acquisition
Vocabulary learning strategies and retention science for any language -- frequency-based word selection, spaced repetition systems (Ebbinghaus forgetting curve, Leitner system, SM-2 algorithm), cognate exploitation, word family networks, context-based learning, collocations, depth of word knowledge (form, meaning, use), reading progression from controlled to authentic texts, and productive vs. receptive vocabulary thresholds. Use when building vocabulary plans, optimizing review schedules, selecting learning materials, or diagnosing vocabulary gaps.
git clone https://github.com/Tibsfox/gsd-skill-creator
T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/skills/languages/vocabulary-acquisition" ~/.claude/skills/tibsfox-gsd-skill-creator-vocabulary-acquisition && rm -rf "$T"
examples/skills/languages/vocabulary-acquisition/SKILL.md
Vocabulary Acquisition
Vocabulary is the single strongest predictor of reading comprehension and communicative competence in a second language. Without grammar, very little can be conveyed; without vocabulary, nothing can be conveyed. This skill covers the science and practice of vocabulary learning as a meta-skill applicable to any target language.
Agent affinity: krashen (comprehensible input, incidental acquisition), bruner-l (scaffolding, pedagogical sequencing)
Concept IDs: lang-high-frequency-words, lang-spaced-repetition, lang-cognates-word-families, lang-reading-progression
The Vocabulary Landscape
How Many Words Does a Learner Need?
Research by Nation (2001, 2006) establishes frequency-based thresholds:
| Words Known | Coverage of Running Text | Practical Ability |
|---|---|---|
| 1,000 word families | ~78-80% | Basic survival communication |
| 2,000 word families | ~85-90% | Simple conversation, graded readers |
| 3,000 word families | ~93-95% | Unassisted reading of simplified texts |
| 5,000 word families | ~97-98% | Independent reading of most authentic texts |
| 8,000-9,000 word families | ~98-99% | Near-native reading comprehension |
A word family includes a base word plus all its inflected and derived forms: "nation, national, nationality, nationalize, internationally" = one word family. The 2,000 most frequent word families in any language cover a disproportionate share of everyday text, making frequency-based learning the most efficient strategy.
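The coverage idea behind the table can be checked on any tokenized text. The following is a minimal sketch, not Nation's procedure: it counts surface word forms rather than word families (no lemmatization is attempted), and the function name `coverage_by_rank` and the sample sentence are illustrative assumptions.

```python
from collections import Counter

def coverage_by_rank(tokens, sizes):
    """Return {size: share of running tokens covered by the `size` most frequent forms}."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Frequencies sorted from most to least common.
    ranked = [count for _, count in counts.most_common()]
    result = {}
    for size in sizes:
        result[size] = sum(ranked[:size]) / total if total else 0.0
    return result

# Toy corpus; a real estimate needs a large, representative corpus.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
for size, share in coverage_by_rank(tokens, [1, 3, 5]).items():
    print(f"top {size} forms cover {share:.0%} of the text")
```

Even in this toy sample a handful of frequent forms accounts for most of the running text, which is the effect the thresholds above rely on.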
Receptive vs. Productive Vocabulary
- Receptive (passive): Words you recognize and understand when reading or listening.
- Productive (active): Words you can retrieve and use correctly in speaking or writing.
Receptive vocabulary is almost always larger than productive vocabulary. A learner may recognize 5,000 word families but actively produce only 2,000. Teaching methods that target recognition (reading, listening) build receptive vocabulary faster; production practice (speaking, writing) converts receptive knowledge to productive use.
The Forgetting Curve and Spaced Repetition
Ebbinghaus's Discovery
Hermann Ebbinghaus (1885) demonstrated that memory for newly learned material decays steeply at first and then levels off, a pattern commonly modeled as an exponential curve. Without review, approximately 60% of new material is forgotten within 24 hours, and 80% within a week. The critical insight: a single well-timed review dramatically flattens the forgetting curve.
The Spacing Effect
Reviewing information at increasing intervals produces stronger retention than massed practice (cramming). Each successful retrieval at a longer interval strengthens the memory trace.
Optimal review schedule (simplified):
- First review: 1 day after initial encounter
- Second review: 3 days after first review
- Third review: 7 days after second review
- Fourth review: 14 days after third review
- Subsequent reviews: double the interval each time
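The simplified schedule above can be turned into concrete calendar dates. This is a small sketch under the stated intervals (1, 3, 7, 14 days, then doubling); the function names `review_intervals` and `review_dates` are illustrative, not part of any existing tool.

```python
from datetime import date, timedelta

FIXED_INTERVALS = [1, 3, 7, 14]  # days between reviews, from the simplified schedule

def review_intervals(n_reviews):
    """Return the gap in days before each of the first `n_reviews` reviews."""
    intervals = []
    for i in range(n_reviews):
        if i < len(FIXED_INTERVALS):
            intervals.append(FIXED_INTERVALS[i])
        else:
            intervals.append(intervals[-1] * 2)  # double the previous gap
    return intervals

def review_dates(first_seen, n_reviews):
    """Return the calendar dates of the reviews, each gap counted from the previous review."""
    dates, current = [], first_seen
    for gap in review_intervals(n_reviews):
        current += timedelta(days=gap)
        dates.append(current)
    return dates

print(review_intervals(6))
print(review_dates(date(2024, 1, 1), 4))
```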
Spaced Repetition Systems
Leitner system (physical flashcards). Cards are sorted into boxes. Correct answers advance a card to the next box (longer review interval). Incorrect answers send the card back to Box 1 (immediate re-review). Simple and effective with physical materials.
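The box-moving rule can be sketched in a few lines. The number of boxes and the review gap attached to each box are assumptions chosen for illustration; the description above does not fix them.

```python
from dataclasses import dataclass

# Hypothetical review gaps (in days) per box; real decks choose their own values.
BOX_INTERVALS = {1: 1, 2: 3, 3: 7, 4: 14, 5: 28}
LAST_BOX = max(BOX_INTERVALS)

@dataclass
class Card:
    front: str
    back: str
    box: int = 1  # every new card starts in Box 1

def review(card, correct):
    """Move the card per the Leitner rule and return days until its next review."""
    if correct:
        card.box = min(card.box + 1, LAST_BOX)  # advance, but stay in the last box
    else:
        card.box = 1  # any miss sends the card back to Box 1
    return BOX_INTERVALS[card.box]

card = Card("Hund", "dog")
print(review(card, correct=True), card.box)
print(review(card, correct=False), card.box)
```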
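SM-2 algorithm (Anki and similar). SuperMemo's algorithm computes per-item review intervals from a recall-quality grade (0-5) assigned at each review. Each item carries an ease factor that the grades raise or lower: easy items get exponentially increasing intervals; difficult items are reviewed more frequently, and a failed recall restarts the item's interval sequence. Anki, the most widely used implementation, is based on a modified version of SM-2.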
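A compact sketch of one SM-2 review step follows. It uses the published constants (starting ease 2.5, first intervals of 1 and 6 days, minimum ease 1.3, grades 0-5) but reflects one common reading of the algorithm: the next interval is computed with the ease factor held before the update, and a failed recall leaves the ease unchanged. The names `Sm2State` and `sm2_review` are illustrative, and Anki's scheduler differs in its details.

```python
from dataclasses import dataclass

@dataclass
class Sm2State:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # current gap between reviews
    ease: float = 2.5       # ease factor; SM-2 starts every item at 2.5

def sm2_review(state, quality):
    """Apply one review with a quality grade of 0-5 and return the updated state."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the interval sequence, keep the ease factor.
        return Sm2State(0, 1, state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.ease)
    # Ease rises for grade 5, is unchanged for grade 4, and falls for grade 3.
    ease = state.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Sm2State(state.repetitions + 1, interval, max(1.3, ease))

state = Sm2State()
for grade in (5, 5, 5, 2):
    state = sm2_review(state, grade)
    print(grade, state)
```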
Practical guidance:
- New words: learn at most 10-20 per day. Every new word adds several future reviews, so a higher intake quickly builds a review backlog.
- Review time: 15-30 minutes daily is more effective than 2 hours weekly.
- Encoding quality matters: a word learned through meaningful context retains better than one learned from a decontextualized list.
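The backlog concern can be made concrete with a rough estimate of daily review load. This sketch assumes the simplified schedule (gaps of 1, 3, 7, 14 days, then doubling), a constant number of new words per day starting on day 1, and that every review succeeds; failed reviews would push the real load higher. The function name `daily_review_load` is illustrative.

```python
def daily_review_load(new_per_day, day, max_reviews=12):
    """Estimate reviews due on `day` if every review succeeds and none is skipped."""
    gaps = [1, 3, 7, 14]
    while len(gaps) < max_reviews:
        gaps.append(gaps[-1] * 2)
    # Convert gaps into offsets from the day a batch was first learned.
    offsets, elapsed = [], 0
    for gap in gaps:
        elapsed += gap
        offsets.append(elapsed)
    # A batch learned on day d is due on d + offset; learning starts on day 1,
    # so an offset contributes a batch only when offset < day.
    due_batches = sum(1 for offset in offsets if offset < day)
    return new_per_day * due_batches

for intake in (10, 20, 40):
    print(intake, "new/day ->", daily_review_load(intake, day=120), "reviews on day 120")
```

Under these assumptions the review load scales linearly with daily intake, so doubling the number of new words doubles the daily review time a few months later.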
Cognates and Word Families
Cognate Exploitation
Cognates are words in two languages that share a common etymological origin and similar form: English "telephone" / French "téléphone" / Spanish "teléfono" / German "Telefon."
Cognate density by language pair:
- English-French: ~27% of common English words have recognizable French cognates (Norman conquest legacy)
- English-German: ~20% through shared Germanic roots
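- Spanish-Italian: ~82% lexical similarity -- high, though Spanish-Portuguese is reported higher still (~89%)
- Japanese-Chinese: roughly 60% of Japanese vocabulary is Sino-Japanese, borrowed from or modeled on Chinese; strictly these are loanwords rather than cognates, but the shared characters give learners a similar recognition advantage in writing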
False friends (often loosely called false cognates): Words that look similar but differ in meaning. English "actually" vs. French "actuellement" (currently). English "embarrassed" vs. Spanish "embarazada" (pregnant). Systematic cataloging of false friends prevents interference.
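One way to combine cognate spotting with a false-friend catalog is sketched below. Surface similarity is only a heuristic: it flags candidates for a human to confirm and says nothing about actual etymology. The 0.75 threshold, the function names, and the two-entry catalog (taken from the examples above) are illustrative assumptions.

```python
import unicodedata
from difflib import SequenceMatcher

# Tiny illustrative catalog; a real one would be maintained per language pair.
FALSE_FRIENDS = {("actually", "actuellement"), ("embarrassed", "embarazada")}

def strip_accents(word):
    """Lowercase the word and drop combining accent marks (é -> e)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def classify_pair(l1_word, l2_word, threshold=0.75):
    """Label a word pair as a false friend, a cognate candidate, or an unrelated form."""
    if (l1_word.lower(), l2_word.lower()) in FALSE_FRIENDS:
        return "false friend"  # the catalog overrides surface similarity
    similarity = SequenceMatcher(None, strip_accents(l1_word), strip_accents(l2_word)).ratio()
    return "cognate candidate" if similarity >= threshold else "unrelated form"

for pair in [("telephone", "téléphone"), ("embarrassed", "embarazada"), ("dog", "perro")]:
    print(pair, "->", classify_pair(*pair))
```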
Word Family Networks
Words are not isolated units but members of morphological families:
Root + affixes:
- "construct" -> "construction, constructive, constructively, reconstruct, deconstruct, constructivism"
- Learning one root + productive affixes gives access to the entire family
Semantic fields:
- "kitchen" -> "cook, stove, oven, fridge, counter, recipe, ingredient, chop, stir, boil"
- Teaching words in semantic clusters improves retrieval because related items share memory pathways
Collocations:
- Words that frequently co-occur: "make a decision" (not "do a decision"), "heavy rain" (not "strong rain")
- Collocation knowledge distinguishes fluent from merely accurate language use
- Concordance analysis of native corpora reveals collocational patterns
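A very small version of corpus-based collocation finding is sketched here. It scores adjacent word pairs by pointwise mutual information (PMI), one common association measure among several; the function name, the `min_count` cutoff, and the toy sentence are assumptions, and a real analysis would use a large native corpus.

```python
import math
from collections import Counter

def collocation_scores(tokens, min_count=2):
    """Score adjacent word pairs by PMI, keeping pairs seen at least `min_count` times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens, n_pairs = len(tokens), max(len(tokens) - 1, 1)
    scores = {}
    for (left, right), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI values
        p_pair = count / n_pairs
        p_left, p_right = unigrams[left] / n_tokens, unigrams[right] / n_tokens
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    # Highest-scoring pairs first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

corpus = "heavy rain fell and heavy rain stopped but strong wind and strong wind".split()
print(collocation_scores(corpus))
```

A positive score means the pair occurs together more often than its individual word frequencies would predict.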
Depth of Word Knowledge
Knowing a word is not binary. Nation (2001) identifies nine aspects of word knowledge, grouped under form, meaning, and use, each with a receptive and a productive side:
| Aspect | Receptive | Productive |
|---|---|---|
| Form: spoken | Recognize the word when heard | Pronounce the word correctly |
| Form: written | Recognize the word when read | Spell the word correctly |
| Form: word parts | Recognize root, prefix, suffix | Use word parts to build forms |
| Meaning: form-meaning link | Recall meaning from form | Recall form from meaning |
| Meaning: concepts | Understand the concept behind the word | Use the word to express the concept |
| Meaning: associations | Know related words | Produce appropriate related words |
| Use: grammar | Recognize grammatical patterns | Use in correct grammatical patterns |
| Use: collocations | Recognize natural word combinations | Produce natural combinations |
| Use: register | Know formality constraints | Use at appropriate formality level |
Most learners have shallow knowledge of many words (form + basic meaning) and deep knowledge of few. Deepening word knowledge -- learning collocations, register, and productive use -- is a distinct learning task from adding new words.
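The aspect table can double as a checklist for diagnosing how deeply a word is known. The record below is a hypothetical sketch: the class, the aspect labels, and the simple 18-cell depth score are assumptions for illustration, not an instrument from Nation (2001).

```python
from dataclasses import dataclass, field

# The nine aspects from the table, each with a receptive and a productive side.
ASPECTS = [
    "form:spoken", "form:written", "form:word parts",
    "meaning:form-meaning link", "meaning:concepts", "meaning:associations",
    "use:grammar", "use:collocations", "use:register",
]

@dataclass
class WordKnowledge:
    word: str
    receptive: set = field(default_factory=set)
    productive: set = field(default_factory=set)

    def mark(self, aspect, productive=False):
        """Record that an aspect is known receptively, and optionally productively."""
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        self.receptive.add(aspect)  # assumption: productive control implies recognition
        if productive:
            self.productive.add(aspect)

    def depth(self):
        """Share of the 18 aspect/mode cells marked as known, from 0.0 to 1.0."""
        return (len(self.receptive) + len(self.productive)) / (2 * len(ASPECTS))

    def gaps(self):
        """Aspects not yet under productive control, in table order."""
        return [a for a in ASPECTS if a not in self.productive]

entry = WordKnowledge("decision")
entry.mark("form:written")
entry.mark("meaning:form-meaning link", productive=True)
print(f"{entry.depth():.2f}", entry.gaps()[:3])
```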
Incidental vs. Intentional Learning
Incidental Acquisition
Krashen's Input Hypothesis argues that most vocabulary is acquired incidentally through comprehensible input -- reading and listening to meaningful content where the learner encounters new words in context.
Conditions for successful incidental acquisition:
- The text must be comprehensible overall (95-98% known vocabulary)
- The unknown word must be encountered multiple times (6-20 encounters for initial acquisition)
- Context must provide enough clues to infer meaning
- The learner must notice the word (not skip over it)
Extensive reading is the most powerful vehicle for incidental vocabulary acquisition. Graded readers (controlled vocabulary texts at the learner's level) provide the right density of known-to-unknown words.
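Two of the conditions above, overall coverage and repeated encounters, can be checked mechanically for a candidate text. This is a rough sketch: it matches surface forms against a known-word list rather than word families, its tokenizer simply splits on non-letters, and the function name and default thresholds (95% coverage, 6 encounters) are taken from the lower bounds above as assumptions.

```python
import re
from collections import Counter

def incidental_learning_report(text, known_words, min_coverage=0.95, min_encounters=6):
    """Summarize whether a text suits incidental learning for a given learner."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, any script
    known = {w.lower() for w in known_words}
    unknown = Counter(t for t in tokens if t not in known)
    coverage = 1 - sum(unknown.values()) / len(tokens) if tokens else 0.0
    return {
        "coverage": coverage,
        "comprehensible": coverage >= min_coverage,
        # Unknown words repeated often enough to have a chance of being acquired.
        "well_repeated": sorted(w for w, c in unknown.items() if c >= min_encounters),
        "under_repeated": sorted(w for w, c in unknown.items() if c < min_encounters),
    }

sample = "the cat " * 19 + "zorp"
print(incidental_learning_report(sample, {"the", "cat"}))
```

Words in the `under_repeated` list are candidates for deliberate study or for choosing a longer text in which they recur.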
Intentional Learning
Deliberate study (flashcards, word lists, exercises) is more efficient per unit of time for initial acquisition but less effective for depth and retention than incidental learning in context.
Optimal strategy: Use intentional learning for the first 2,000 high-frequency word families (these need to be acquired quickly for basic comprehension), then shift to extensive reading for incidental acquisition of the next 3,000-6,000 families.
The Reading Progression
Level 1: Controlled Readers (0-1,000 words)
Texts written within a strict vocabulary limit. Every unknown word is glossed or repeated. Purpose: build basic sight vocabulary and decoding confidence.
Level 2: Graded Readers (1,000-2,000 words)
Simplified authentic stories or purpose-written narratives. Unknown word density: 2-5%. Purpose: extensive reading for pleasure and incidental vocabulary acquisition.
Level 3: Simplified Authentic Texts (2,000-3,000 words)
News articles, blog posts, children's literature adapted to reduce low-frequency vocabulary. Purpose: bridge from controlled to authentic input.
Level 4: Authentic Texts with Support (3,000-5,000 words)
Unmodified texts with dictionary access or glossary. Unknown word density: 3-5%. Purpose: develop strategies for handling unknown vocabulary in real texts.
Level 5: Unassisted Authentic Texts (5,000+ words)
Native-audience materials: novels, newspapers, academic texts. Unknown word density: 1-2%. Purpose: native-like reading experience.
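The progression can be expressed as a simple lookup from estimated vocabulary size to a starting level. The sketch below uses the level boundaries listed above and, as an assumption, assigns a learner sitting exactly on a boundary to the higher level; the function name `recommend_level` is illustrative.

```python
from bisect import bisect_right

# Lower bounds (in known words) of Levels 2-5 from the progression above.
LEVEL_FLOORS = [1000, 2000, 3000, 5000]
LEVEL_NAMES = [
    "Level 1: Controlled Readers",
    "Level 2: Graded Readers",
    "Level 3: Simplified Authentic Texts",
    "Level 4: Authentic Texts with Support",
    "Level 5: Unassisted Authentic Texts",
]

def recommend_level(vocabulary_size):
    """Return the reading level whose vocabulary range contains `vocabulary_size`."""
    if vocabulary_size < 0:
        raise ValueError("vocabulary size cannot be negative")
    # bisect_right counts how many level floors the learner has reached.
    return LEVEL_NAMES[bisect_right(LEVEL_FLOORS, vocabulary_size)]

for size in (400, 1500, 2500, 4000, 6000):
    print(size, "->", recommend_level(size))
```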
Cross-References
- krashen agent: Input hypothesis (i+1), reading hypothesis, case studies of extensive reading programs.
- bruner-l agent: Scaffolding vocabulary learning through structured support that is gradually removed.
- lado agent: Contrastive vocabulary analysis -- which L1 words transfer and which create interference.
- crystal agent: Historical etymology and language change that explain cognate relationships.
- phonetics-phonology skill: Phonological form is the entry point for word learning -- without form, there is no handle for meaning.
- grammar-syntax skill: Grammatical knowledge constrains which words can appear in which positions.
- pragmatics-communication skill: Register and formality constrain vocabulary choice in context.
References
- Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge University Press.
- Nation, I. S. P. (2006). "How large a vocabulary is needed for reading and listening?" Canadian Modern Language Review, 63(1), 59-82.
- Ebbinghaus, H. (1885). Über das Gedächtnis. (Translated as Memory: A Contribution to Experimental Psychology.)
- Schmitt, N. (2010). Researching Vocabulary: A Vocabulary Research Manual. Palgrave Macmillan.
- Webb, S. & Nation, I. S. P. (2017). How Vocabulary Is Learned. Oxford University Press.
- Krashen, S. D. (2004). The Power of Reading. 2nd edition. Libraries Unlimited.