Babysitter topic-modeling-text-mining

Apply LDA, NMF, and other computational methods to discover patterns in large text corpora with appropriate parameter tuning

install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/domains/social-sciences-humanities/humanities/skills/topic-modeling-text-mining" ~/.claude/skills/a5c-ai-babysitter-topic-modeling-text-mining && rm -rf "$T"
manifest: library/specializations/domains/social-sciences-humanities/humanities/skills/topic-modeling-text-mining/SKILL.md

Topic Modeling and Text Mining

Overview

This skill supports computational analysis of large text collections. It covers topic modeling, text mining, and pattern discovery to surface structures and themes in textual data for humanistic inquiry.

Capabilities

Topic Modeling

  • LDA implementation
  • NMF analysis
  • Structural topic models
  • Dynamic topic models
  • Parameter optimization
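
As one illustration of the LDA item above, the following is a minimal sketch of collapsed Gibbs sampling for LDA using only the Python standard library. It is not the skill's own implementation; the function name `lda_gibbs`, its default values, and the toy documents are assumptions made for this example. NMF, structural, and dynamic topic models are not shown.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on pre-tokenized documents.

    Returns (doc_topic, topic_word, vocab); every row is a probability
    distribution. alpha and beta are symmetric Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2i = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics

    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w2i[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w2i[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)]
        for t in range(K)
    ]
    return doc_topic, topic_word, vocab

# Toy corpus, invented for illustration only.
docs = [
    ["ship", "sea", "whale", "ship", "captain"],
    ["sea", "whale", "harpoon", "captain", "sea"],
    ["court", "law", "judge", "court", "trial"],
    ["judge", "trial", "law", "jury", "court"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for row in doc_topic:
    print([round(p, 2) for p in row])
```

On a real corpus a maintained library would normally be preferable to a hand-written sampler; the sketch is meant only to show which quantities (topic count, iterations, priors) the tuning step controls.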

Text Preprocessing

  • Tokenization
  • Stopword removal
  • Lemmatization/stemming
  • N-gram extraction
  • Document-term matrices
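
The preprocessing steps above can be sketched in plain Python as follows. The stopword list, the helper names, and the two sample sentences are assumptions made for this example; lemmatization and stemming are left out because they usually rely on an external library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a
# fuller list adapted to the corpus and language).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was"}

def tokenize(text):
    """Lowercase the text and return runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def preprocess(text):
    """Tokenize and drop stopwords."""
    return [t for t in tokenize(text) if t not in STOPWORDS]

def bigrams(tokens):
    """Adjacent token pairs joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

def document_term_matrix(docs):
    """Return (vocab, matrix) with one row of term counts per document."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

texts = ["The whale was white.", "A white whale and the sea."]
docs = [preprocess(t) for t in texts]
vocab, matrix = document_term_matrix(docs)
print(vocab)   # ['sea', 'whale', 'white']
print(matrix)  # [[0, 1, 1], [1, 1, 1]]
```

Keeping each step as a separate function makes it easier to record which preprocessing choices were applied, since those choices affect every later result.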

Pattern Discovery

  • Word frequency analysis
  • Collocation detection
  • Named entity recognition
  • Sentiment analysis
  • Network extraction
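
Word frequency analysis and collocation detection can be illustrated with a short sketch. It scores adjacent word pairs by pointwise mutual information (PMI), which is one common collocation measure among several; the function names, the `min_count` threshold, and the sample sentence are assumptions made for this example.

```python
import math
from collections import Counter

def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    Pairs seen fewer than min_count times are skipped, because PMI
    overrates rare pairs.
    """
    unigrams = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(pairs.values())
    scores = {}
    for (a, b), count in pairs.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented sample text, for illustration only.
tokens = "new york is large and new york has rain and the city is new".split()
print(word_frequencies(tokens).most_common(3))
ranked = pmi_collocations(tokens)
print(ranked[0][0])  # ('new', 'york')
```

Named entity recognition, sentiment analysis, and network extraction generally depend on trained models or external tools and are not sketched here.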

Visualization

  • Word clouds
  • Topic distributions
  • Temporal trends
  • Network graphs
  • Interactive displays
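
For the temporal trends item, the data behind a trend plot is usually the average share of a topic per time period. The sketch below computes that and prints a crude text bar chart so it needs no plotting library. The years and topic proportions are hypothetical values invented for this example, as are the function names.

```python
from collections import defaultdict

def topic_share_by_year(years, doc_topic, topic):
    """Mean proportion of one topic across the documents of each year."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, row in zip(years, doc_topic):
        sums[year] += row[topic]
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}

def text_bars(series, width=40):
    """Render a {label: value in [0, 1]} mapping as lines of '#' bars."""
    return "\n".join(
        f"{label} {'#' * round(value * width)} {value:.2f}"
        for label, value in series.items()
    )

# Hypothetical inputs: one publication year and one topic distribution per document.
years = [1850, 1850, 1851, 1851]
doc_topic = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]

trend = topic_share_by_year(years, doc_topic, topic=0)
print(text_bars(trend))
```

The same dictionary can be handed to any plotting tool; the aggregation step is the part that determines what the chart claims.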

Usage Guidelines

Analysis Process

  1. Prepare text corpus
  2. Preprocess documents
  3. Select modeling approach
  4. Tune parameters
  5. Run analysis
  6. Interpret results
  7. Validate findings

Parameter Considerations

  • Number of topics
  • Iteration counts
  • Hyperparameters (for LDA, the Dirichlet priors on the document-topic and topic-word distributions)
  • Coherence metrics
  • Validation approaches
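
To make the coherence item concrete, here is a minimal sketch of a UMass-style coherence score, which rewards topics whose top words tend to occur in the same documents. The function name, the toy corpus, and the two word lists are assumptions made for this example; it also assumes every top word occurs in at least one document, as is the case when the words come from a model fitted on that corpus.

```python
import math

def umass_coherence(top_words, docs):
    """UMass-style coherence of one topic.

    top_words: the topic's words, most probable first.
    docs: tokenized documents used as the reference corpus.
    Higher (closer to zero or positive) means more coherent.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            # +1 smoothing avoids log(0) when the words never co-occur.
            score += math.log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
    return score

# Toy corpus, invented for illustration only.
docs = [
    ["ship", "sea", "whale", "captain"],
    ["sea", "whale", "harpoon", "captain"],
    ["court", "law", "judge", "trial"],
    ["judge", "trial", "law", "jury", "court"],
]
coherent = ["sea", "whale", "captain"]
mixed = ["sea", "court", "whale"]
print(umass_coherence(coherent, docs) > umass_coherence(mixed, docs))  # True
```

One way to use such a score is to fit models for several candidate numbers of topics, average the coherence over each model's topics, and compare. The score is a guide rather than a verdict, so the chosen model still needs human inspection.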

Interpretation Guidelines

  • Examine topic words
  • Review representative documents
  • Consider domain knowledge
  • Validate with close reading
  • Acknowledge limitations
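
The first two guidelines above, examining topic words and reviewing representative documents, amount to sorting the model's two output matrices. The sketch below shows this with hand-made matrices; the helper names and all values are hypothetical, chosen only for this example.

```python
def top_words(topic_word_row, vocab, n=5):
    """The n most probable words of one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_word_row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

def representative_docs(doc_topic, topic, n=3):
    """Indices of the n documents with the highest share of a topic."""
    ranked = sorted(range(len(doc_topic)), key=lambda d: doc_topic[d][topic], reverse=True)
    return ranked[:n]

# Hypothetical model output: 2 topics, 4 vocabulary words, 3 documents.
vocab = ["court", "judge", "sea", "whale"]
topic_word = [[0.05, 0.05, 0.50, 0.40],
              [0.45, 0.45, 0.05, 0.05]]
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

print(top_words(topic_word[0], vocab, n=2))           # ['sea', 'whale']
print(representative_docs(doc_topic, topic=0, n=2))   # [0, 2]
```

The returned document indices are the natural starting point for close reading, which is where a topic's label is confirmed or rejected.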

Integration Points

Related Processes

  • Text Mining and Distant Reading
  • Corpus Linguistics Analysis
  • Network Analysis for Humanities

Collaborating Skills

  • tei-text-encoding
  • gis-mapping-humanities
  • literary-close-reading
