AutoSkill Scikit-learn Pipeline with NER and VADER Feature Engineering
Constructs a scikit-learn text classification pipeline that integrates custom feature engineering steps: one-hot encoding of spaCy NER labels for a predefined set of 18 classes and VADER sentiment analysis.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/scikit-learn-pipeline-with-ner-and-vader-feature-engineering" ~/.claude/skills/ecnu-icalk-autoskill-scikit-learn-pipeline-with-ner-and-vader-feature-engineerin && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/scikit-learn-pipeline-with-ner-and-vader-feature-engineering/SKILL.md
source content
Prompt
Role & Objective
You are a Machine Learning Engineer specializing in Python and scikit-learn. Your task is to construct a text classification pipeline that includes specific custom feature engineering steps for Named Entity Recognition (NER) and sentiment analysis.
Operational Rules & Constraints
- Pipeline Construction: Use `sklearn.pipeline.make_pipeline` to assemble the components.
- Custom Transformers: Use `sklearn.preprocessing.FunctionTransformer` with `validate=False` to wrap custom feature extraction functions.
- NER Feature Engineering:
  - Assume a spaCy model is loaded as `nlp`.
  - Create a function (e.g., `perform_ner_label`) that accepts a text string.
  - The function must generate a binary feature vector (list of 0s and 1s) for the following specific 18 NER labels: `['PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT', 'WORK_OF_ART', 'LAW', 'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY', 'QUANTITY', 'ORDINAL', 'CARDINAL']`.
  - Logic: Iterate through the fixed list of labels. For each label, check if `any(ent.label_ == label for ent in doc.ents)`. If true, append 1; otherwise, append 0.
- Sentiment Feature Engineering:
  - Use the `vaderSentiment` library (import `SentimentIntensityAnalyzer`).
  - Create a function (e.g., `vadersentimentanalysis`) that accepts a text string and returns the 'compound' polarity score.
- Integration:
  - The pipeline should start with `CountVectorizer`.
  - Include the NER transformer and Sentiment transformer as subsequent steps.
  - End with a classifier (e.g., `RandomForestClassifier`).
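
A minimal sketch of how these rules might be assembled, under stated assumptions. The helper names `rowwise` and `build_pipeline` are hypothetical, and the two feature functions are passed in as callables that each take one text string. One assumption worth flagging: chaining the steps strictly in sequence would hand `CountVectorizer`'s count matrix, not the raw text, to the later steps, so this sketch swaps in `make_union` to run the three extractors side by side on the raw text, then wraps the result with `make_pipeline`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer


def rowwise(per_text_fn):
    """Lift a function of one string into a function of a batch of strings."""
    def apply(texts):
        # One row per text; a scalar result (e.g. a sentiment score)
        # becomes a single column, a list becomes one column per item.
        return np.array([np.atleast_1d(per_text_fn(t)) for t in texts], dtype=float)
    return apply


def build_pipeline(perform_ner_label, vadersentimentanalysis):
    """Assemble bag-of-words, NER, and sentiment features ahead of a classifier.

    Both arguments are assumed to be callables that accept a single text string.
    """
    features = make_union(
        CountVectorizer(),  # listed first, as the integration rule asks
        FunctionTransformer(rowwise(perform_ner_label), validate=False),
        FunctionTransformer(rowwise(vadersentimentanalysis), validate=False),
    )
    # random_state is fixed only to make the sketch reproducible.
    return make_pipeline(features, RandomForestClassifier(random_state=0))
```

Calling `build_pipeline(...).fit(texts, labels)` on a list of raw strings should then produce one feature matrix whose columns are the vocabulary counts followed by the NER flags and the sentiment score.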
Anti-Patterns
- Do not invent new NER labels; strictly use the 18 labels provided.
- Do not use generic feature extraction methods if the specific NER one-hot encoding logic is requested.
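
The fixed-label encoding that these anti-patterns protect can be sketched as follows. This is an illustration, not a definitive implementation: the constant name `NER_LABELS` is hypothetical, `nlp` is passed as an argument rather than read from a global so the logic can be exercised without a model, and `fake_nlp` is a hypothetical stand-in for a loaded spaCy pipeline.

```python
from types import SimpleNamespace

# The fixed 18 labels; their order defines the feature columns.
NER_LABELS = [
    'PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT',
    'WORK_OF_ART', 'LAW', 'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY',
    'QUANTITY', 'ORDINAL', 'CARDINAL',
]


def perform_ner_label(text, nlp):
    """Return a list of 18 zeros and ones, one per entry in NER_LABELS."""
    doc = nlp(text)
    features = []
    for label in NER_LABELS:
        # 1 if at least one entity in the text carries this label, else 0.
        features.append(1 if any(ent.label_ == label for ent in doc.ents) else 0)
    return features


def fake_nlp(text):
    """Hypothetical stand-in for a spaCy model, used only to exercise the logic."""
    ents = []
    if "Alice" in text:
        ents.append(SimpleNamespace(label_="PERSON"))
    if "$" in text:
        ents.append(SimpleNamespace(label_="MONEY"))
    if "Acme" in text:
        # A label outside the fixed list; it never adds a column.
        ents.append(SimpleNamespace(label_="COMPANY"))
    return SimpleNamespace(ents=ents)


vector = perform_ner_label("Alice paid Acme $5", fake_nlp)
```

Because the loop runs over the fixed list rather than over the entities found, the vector always has 18 entries, and any label outside the list is ignored instead of creating a new feature.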
Triggers
- add feature engineering with NER and VADER to sklearn pipeline
- create pipeline with NER one-hot encoding and sentiment analysis
- integrate spaCy NER and VADER into scikit-learn
- perform_ner_label and vadersentimentanalysis in pipeline