AutoSkill NLP Text Analysis and TF-IDF Calculation
Performs comprehensive NLP preprocessing including normalization, stop word removal, POS tagging, NER, tokenization, and lemmatization, followed by detailed TF-IDF calculation with specific table outputs.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8/nlp-text-analysis-and-tf-idf-calculation" ~/.claude/skills/ecnu-icalk-autoskill-nlp-text-analysis-and-tf-idf-calculation && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8/nlp-text-analysis-and-tf-idf-calculation/SKILL.md
source content
NLP Text Analysis and TF-IDF Calculation
Prompt
Role & Objective
You are an NLP analyst. Your task is to process provided text documents by performing specific preprocessing steps and calculating TF-IDF metrics according to strict user-defined rules.
Operational Rules & Constraints
- Document Definition: Consider each input statement as a separate document.
- Preprocessing Steps: For each document, perform the following in order:
  - Normalization and Stop Words Removal.
  - POS Tagging (show only the tags, not the tree) and Named Entity Recognition.
  - Tokenization and Lemmatization.
- TF-IDF Calculation: Compute the TF-IDF for the entire corpus (all documents together).
  - Calculate Bag of Words and Term Frequency (TF) for each document.
  - Calculate Inverse Document Frequency (IDF) using the formula: log(N/df), where N is the total number of documents and df is the document frequency (the number of documents that contain the term).
  - Calculate TF-IDF as the product of TF and IDF (TF * IDF).
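
The TF-IDF rules above can be sketched in plain Python as follows. This is a minimal illustration, not the skill's own implementation. Several details are assumptions the rules leave open: TF is taken as the raw count divided by the number of terms in the document, the logarithm is the natural log (pass `math.log10` for base 10), and `STOP_WORDS` is a hypothetical, deliberately tiny list. Lemmatization, POS tagging, and NER are omitted because they need a trained model or lexicon.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in", "and", "of", "to"}


def preprocess(text):
    """Lowercase, replace punctuation with spaces, split, and drop stop words."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def tf_idf(documents, log=math.log):
    """Return (bags, tf, idf, tfidf) for a list of statements, one document each."""
    bags = [Counter(preprocess(doc)) for doc in documents]  # bag of words per document

    # Assumption: TF = term count / total number of terms in that document.
    tf = [
        {term: count / sum(bag.values()) for term, count in bag.items()}
        for bag in bags
    ]

    n_docs = len(documents)
    vocab = set().union(*bags)
    # df = number of documents that contain the term.
    df = {term: sum(1 for bag in bags if term in bag) for term in vocab}
    # IDF = log(N / df), as stated in the rules; no smoothing is applied.
    idf = {term: log(n_docs / df[term]) for term in vocab}

    # TF-IDF = TF * IDF, computed per document.
    tfidf = [{term: value * idf[term] for term, value in doc_tf.items()} for doc_tf in tf]
    return bags, tf, idf, tfidf


docs = ["The cat sat on the mat.", "The dog sat."]
bags, tf, idf, tfidf = tf_idf(docs)
```

With this formula, a term that appears in every document (here `sat`) gets an IDF of log(1) = 0, so its TF-IDF is 0 in every document.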
Output Requirements
Present the results in the following structured format:
- Preprocessing Output: Show the results of Normalization/Stop Words Removal, POS/NER, and Tokenization/Lemmatization for each document.
- TF-IDF Tables:
  - Bag of Words and Term Frequency Tables.
  - Inverse Document Frequency Table.
  - TF-IDF Table (showing TF, IDF, and the calculated TF-IDF value).
Ensure all mathematical calculations, specifically the multiplication for TF-IDF, are accurate.
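
One way to keep the multiplication accurate is to multiply the unrounded TF and IDF values and round only for display. The sketch below illustrates that and renders the TF-IDF table. The function names `tfidf_rows` and `render_table`, the pipe-delimited layout, and the four-decimal rounding are assumptions made for this example, not requirements stated in the skill.

```python
import math


def tfidf_rows(doc_id, tf, idf, ndigits=4):
    """Build (document, term, TF, IDF, TF-IDF) rows for one document."""
    rows = []
    for term in sorted(tf):
        product = tf[term] * idf[term]  # multiply the unrounded values
        rows.append(
            (
                doc_id,
                term,
                round(tf[term], ndigits),
                round(idf[term], ndigits),
                round(product, ndigits),  # round only for display
            )
        )
    return rows


def render_table(rows):
    """Render rows as a pipe-delimited text table with a header line."""
    header = ("Doc", "Term", "TF", "IDF", "TF-IDF")
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Example values: TF = 1/3 for both terms, IDF = ln(2) for "cat" and 0 for "sat".
example_tf = {"cat": 1 / 3, "sat": 1 / 3}
example_idf = {"cat": math.log(2), "sat": 0.0}
rows = tfidf_rows("D1", example_tf, example_idf)
table = render_table(rows)
```

Rounding TF and IDF first and then multiplying the rounded figures can shift the last digit of the product, which is why the product is computed before any rounding.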
Triggers
- Consider each statement as a separate document and show normalization, POS tagging, and TF-IDF
- Calculate TF-IDF for these documents showing bag of words and term frequency tables
- Perform NLP preprocessing and compute TF-IDF with specific tables
- Analyze text with normalization, stop word removal, POS, NER, and TF-IDF calculation