AutoSkill NLP Text Analysis and TF-IDF Calculation

Run a fixed NLP pipeline (normalization, POS tagging, NER, tokenization, and lemmatization), then compute TF-IDF strictly with the IDF = log(N/df) formula and report the results in detailed tables.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/nlp-text-analysis-and-tf-idf-calculation" ~/.claude/skills/ecnu-icalk-autoskill-nlp-text-analysis-and-tf-idf-calculation-e1b6bc && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/nlp-text-analysis-and-tf-idf-calculation/SKILL.md
source content

NLP Text Analysis and TF-IDF Calculation

Prompt

Role & Objective

Act as an NLP analyst to process text documents through a defined pipeline and calculate TF-IDF metrics with strict adherence to specified formulas.

Operational Rules & Constraints

  1. NLP Pipeline: For each input document, perform and display the following steps:

    • Normalization and Stop Words Removal.
    • POS Tagging (show only the tags, not the parse tree) and Named Entity Recognition.
    • Tokenization and Lemmatization.
  2. TF-IDF Calculation:

    • Calculate Term Frequency (TF) for each document individually.
    • Use the formula: IDF = log(N/df), where N is the total number of documents in the corpus and df is the document frequency (the number of documents that contain the term).
    • Calculate TF-IDF as the product of TF and IDF (TF * IDF).
    • Compute TF-IDF across the entire corpus (all documents together), not for a single document in isolation.
  3. Output Format: Present the results in the following specific tables:

    • Bag of Words and Term Frequency Tables.
    • Inverse Document Frequency Table.
    • TF-IDF Table (Must show TF, IDF, and the calculated TF-IDF value for each term).
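
The calculation and table rules above can be sketched in plain Python. This is a minimal illustration, not the skill's own implementation. Several choices are assumptions the skill does not make: TF is taken as the raw term count, log is the natural logarithm, the STOP_WORDS set is a tiny hypothetical list, and all function names are invented for the example. POS tagging, NER, and lemmatization are left out because they need an NLP library.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the skill does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of", "in", "to"}

def preprocess(text):
    """Lowercase, keep alphanumeric runs as tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def term_frequencies(tokens):
    """TF as the raw count per term in one document (an assumption)."""
    return Counter(tokens)

def inverse_document_frequencies(docs_tokens):
    """IDF = log(N/df) over the whole corpus; natural log is assumed."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tfidf_rows(docs):
    """Return (doc number, term, TF, IDF, TF * IDF) rows for every document."""
    docs_tokens = [preprocess(d) for d in docs]
    idf = inverse_document_frequencies(docs_tokens)
    rows = []
    for doc_id, tokens in enumerate(docs_tokens, start=1):
        for term, tf in sorted(term_frequencies(tokens).items()):
            rows.append((doc_id, term, tf, idf[term], tf * idf[term]))
    return rows

docs = ["The cat sat on the mat.", "The dog sat on the log."]
rows = tfidf_rows(docs)

# TF-IDF table that keeps the intermediate TF and IDF columns visible.
print(f"{'Doc':<4}{'Term':<8}{'TF':>4}{'IDF':>8}{'TF-IDF':>8}")
for doc_id, term, tf, idf_value, score in rows:
    print(f"{doc_id:<4}{term:<8}{tf:>4}{idf_value:>8.3f}{score:>8.3f}")
```

Note that a term appearing in every document (here "sat") gets IDF = log(N/N) = 0, so its TF-IDF is 0 under this formula regardless of its TF.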

Anti-Patterns

  • Do not use default or generic TF-IDF implementations if they deviate from the log(N/df) rule.
  • Do not omit intermediate values (TF and IDF) in the final TF-IDF table.
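
As one illustration of why the first anti-pattern matters, the sketch below compares the strict log(N/df) rule with a smoothed variant, ln((1 + N) / (1 + df)) + 1, which is the formula scikit-learn documents for its TF-IDF classes when smooth_idf is left at its default. The function names are hypothetical and the natural log is an assumption, since the skill does not state a base.

```python
import math

def idf_strict(n_docs, df):
    """The rule required by this skill: IDF = log(N/df), natural log assumed."""
    return math.log(n_docs / df)

def idf_smoothed(n_docs, df):
    """A common smoothed variant: ln((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1

n_docs = 3
print(f"{'df':>3}{'strict':>10}{'smoothed':>10}")
for df in (1, 2, 3):
    print(f"{df:>3}{idf_strict(n_docs, df):>10.4f}{idf_smoothed(n_docs, df):>10.4f}")
```

The two formulas disagree for every df, and the smoothed variant never reaches 0, so values taken from such an implementation would not match tables computed with the log(N/df) rule.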

Triggers

  • Calculate TF-IDF for these documents
  • Perform NLP analysis and TF-IDF
  • Show normalization, POS tagging, and TF-IDF
  • Compute log(N/df) for text