AutoSkill NLP Text Analysis and TF-IDF Calculation
Perform a specific NLP pipeline including normalization, POS tagging, NER, tokenization, and lemmatization, followed by a strict TF-IDF calculation using the log(N/df) formula with detailed tabular outputs.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/nlp-text-analysis-and-tf-idf-calculation" ~/.claude/skills/ecnu-icalk-autoskill-nlp-text-analysis-and-tf-idf-calculation-e1b6bc && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/nlp-text-analysis-and-tf-idf-calculation/SKILL.md
source content
NLP Text Analysis and TF-IDF Calculation
Prompt
Role & Objective
Act as an NLP analyst to process text documents through a defined pipeline and calculate TF-IDF metrics with strict adherence to specified formulas.
Operational Rules & Constraints
- NLP Pipeline: For each input document, perform and display the following steps:
  - Normalization and Stop Words Removal.
  - POS Tagging (Show only tags, not the tree) and Named Entity Recognition.
  - Tokenization and Lemmatization.
- TF-IDF Calculation:
  - Compute TF-IDF for the entire corpus (all documents together).
  - Calculate Term Frequency (TF) for each document individually.
  - Use the formula IDF = log(N/df), where N is the total number of documents and df is the document frequency, i.e. the number of documents that contain the term.
  - Calculate TF-IDF as the product of TF and IDF (TF * IDF).
- Output Format: Present the results in the following specific tables:
  - Bag of Words and Term Frequency Tables.
  - Inverse Document Frequency Table.
  - TF-IDF Table (Must show TF, IDF, and the calculated TF-IDF value for each term).
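The rules above can be sketched end to end with the Python standard library. This is a minimal illustration under stated assumptions, not the skill's own implementation: the stop-word list is a tiny hypothetical one, TF is taken to be the raw count per document, and the logarithm is assumed to be natural, since the rules do not fix either choice. POS tagging, NER, and lemmatization are left out because they need an NLP library such as NLTK or spaCy.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "on", "in", "of", "and", "to"}

def preprocess(text):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def term_frequencies(docs):
    """Per-document term counts (assumption: TF is the raw count)."""
    return [Counter(tokens) for tokens in docs]

def inverse_document_frequencies(docs):
    """IDF = log(N/df) over the whole corpus (assumption: natural log)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tfidf_table(docs):
    """Plain-text table showing TF, IDF, and TF * IDF for every term of every document."""
    idf = inverse_document_frequencies(docs)
    header = f"{'Doc':<5}{'Term':<10}{'TF':>4}{'IDF':>9}{'TF-IDF':>9}"
    lines = [header, "-" * len(header)]
    for i, tf in enumerate(term_frequencies(docs), start=1):
        for term in sorted(tf):
            score = tf[term] * idf[term]
            lines.append(f"{i:<5}{term:<10}{tf[term]:>4}{idf[term]:>9.4f}{score:>9.4f}")
    return "\n".join(lines)

# Made-up example corpus.
texts = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "The cat chased the dog and the dog ran.",
]
docs = [preprocess(t) for t in texts]
print(tfidf_table(docs))
```

Under log(N/df), a term that occurs in every document has IDF = log(1) = 0, so its TF-IDF is 0 regardless of how often it appears.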
Anti-Patterns
- Do not use default or generic TF-IDF implementations if they deviate from the log(N/df) rule.
- Do not omit intermediate values (TF and IDF) in the final TF-IDF table.
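To illustrate why a default implementation can break the log(N/df) rule, the sketch below compares the plain formula with the smoothed variant ln((1 + N) / (1 + df)) + 1, which scikit-learn documents as the default for its `TfidfTransformer` (`smooth_idf=True`). The function names are hypothetical, the natural log is an assumption, and only the standard library is used, so nothing here calls scikit-learn itself.

```python
import math

def plain_idf(n_docs, df):
    """IDF exactly as the skill requires: log(N/df)."""
    return math.log(n_docs / df)

def smoothed_idf(n_docs, df):
    """Smoothed variant: ln((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1

n_docs = 3
for df in (1, 2, 3):
    # The two formulas disagree for every df, so their TF-IDF tables differ too.
    print(df, round(plain_idf(n_docs, df), 4), round(smoothed_idf(n_docs, df), 4))
```

For a term present in all three documents, the plain formula gives 0 while the smoothed one gives 1, so the two produce visibly different TF-IDF tables. scikit-learn's vectorizer also L2-normalizes each document vector by default, which moves the values further from a plain TF * IDF product.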
Triggers
- Calculate TF-IDF for these documents
- Perform NLP analysis and TF-IDF
- Show normalization, POS tagging, and TF-IDF
- Compute log(N/df) for text