AutoSkill Scikit-learn Pipeline with NER and VADER Feature Engineering

Constructs a scikit-learn text classification pipeline that integrates two custom feature engineering steps: a binary indicator vector over a predefined set of 18 spaCy NER labels, and a VADER compound sentiment score.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/scikit-learn-pipeline-with-ner-and-vader-feature-engineering" ~/.claude/skills/ecnu-icalk-autoskill-scikit-learn-pipeline-with-ner-and-vader-feature-engineerin && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/scikit-learn-pipeline-with-ner-and-vader-feature-engineering/SKILL.md

source content

Scikit-learn Pipeline with NER and VADER Feature Engineering

Prompt

Role & Objective

You are a Machine Learning Engineer specializing in Python and scikit-learn. Your task is to construct a text classification pipeline that includes specific custom feature engineering steps for Named Entity Recognition (NER) and sentiment analysis.

Operational Rules & Constraints

Pipeline Construction: Use `sklearn.pipeline.make_pipeline` to assemble the components.
Custom Transformers: Use `sklearn.preprocessing.FunctionTransformer` with `validate=False` to wrap custom feature extraction functions.
NER Feature Engineering:
- Assume a spaCy model is loaded as `nlp`.
- Create a function (e.g., `perform_ner_label`) that accepts a text string.
- The function must generate a binary feature vector (list of 0s and 1s) for the following specific 18 NER labels: `['PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT', 'WORK_OF_ART', 'LAW', 'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY', 'QUANTITY', 'ORDINAL', 'CARDINAL']`.
- Logic: Iterate through the fixed list of labels. For each label, check if `any(ent.label_ == label for ent in doc.ents)`. If true, append 1; otherwise, append 0.
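A minimal sketch of the NER rules above, not a definitive implementation. Passing `nlp` as a second argument is an assumption made here so the function can be tried without a module-level global; the rules themselves assume a global `nlp`. The model name in the comment is likewise only an example.

```python
# The fixed 18 labels, in the order given by the rules.
NER_LABELS = [
    'PERSON', 'NORP', 'FAC', 'ORG', 'GPE', 'LOC', 'PRODUCT', 'EVENT',
    'WORK_OF_ART', 'LAW', 'LANGUAGE', 'DATE', 'TIME', 'PERCENT', 'MONEY',
    'QUANTITY', 'ORDINAL', 'CARDINAL',
]


def perform_ner_label(text, nlp):
    """Return a list of 18 ints: 1 if the label occurs in the text, else 0.

    `nlp` is assumed to be a loaded spaCy pipeline, for example
    `spacy.load("en_core_web_sm")` (the model choice is an assumption).
    """
    doc = nlp(text)
    features = []
    for label in NER_LABELS:
        # Same check as in the rules: does any entity carry this label?
        if any(ent.label_ == label for ent in doc.ents):
            features.append(1)
        else:
            features.append(0)
    return features
```

To get the one-argument form the rules describe, the model can be bound once with `functools.partial(perform_ner_label, nlp=nlp)`.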
Sentiment Feature Engineering:
- Use the `vaderSentiment` library (import `SentimentIntensityAnalyzer`).
- Create a function (e.g., `vadersentimentanalysis`) that accepts a text string and returns the 'compound' polarity score.
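The sentiment rule can be sketched as follows. The optional `analyzer` parameter and the cached `_default_analyzer` helper are assumptions added here so that one analyzer is reused across calls and a substitute can be passed in for testing; neither is part of the original rules.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _default_analyzer():
    # Hypothetical helper: builds the analyzer once and reuses it.
    # Imported lazily so this module loads even if vaderSentiment is absent.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vadersentimentanalysis(text, analyzer=None):
    """Return VADER's 'compound' polarity score for one text string."""
    if analyzer is None:
        analyzer = _default_analyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', 'compound'.
    return analyzer.polarity_scores(text)['compound']
```

Called as `vadersentimentanalysis("some text")`, this returns a single float, which is what the pipeline needs as a one-column feature.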
Integration:
- The pipeline should start with `CountVectorizer`.
- Include the NER transformer and Sentiment transformer as subsequent steps.
- End with a classifier (e.g., `RandomForestClassifier`).
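One possible way to assemble these pieces so that the pipeline runs is sketched below. Two points are worth stating as assumptions. First, `FunctionTransformer` calls the wrapped function on the whole batch of inputs, so a per-string function has to be applied element-wise; the hypothetical `per_text` helper does that. Second, if the steps were chained strictly in sequence, the NER and sentiment functions would receive the sparse count matrix produced by `CountVectorizer` rather than raw text, so this sketch places the three feature extractors side by side with `make_union` and feeds the combined features to the classifier. That arrangement is an interpretation, not something the rules spell out. `build_pipeline` takes the two feature functions as arguments (also an assumption) so it does not depend on how they are defined.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer


def per_text(func):
    """Lift a one-string function to a batch function returning a 2-D array."""
    def apply(texts):
        # Each result becomes one row: a scalar -> 1 column, a list -> n columns.
        rows = [np.atleast_1d(func(text)) for text in texts]
        return np.vstack(rows).astype(float)
    return apply


def build_pipeline(perform_ner_label, vadersentimentanalysis):
    """Combine bag-of-words, NER flags and sentiment, then classify."""
    features = make_union(
        CountVectorizer(),
        FunctionTransformer(per_text(perform_ner_label), validate=False),
        FunctionTransformer(per_text(vadersentimentanalysis), validate=False),
    )
    return make_pipeline(features, RandomForestClassifier(random_state=0))
```

With one-argument versions of the two feature functions in scope, usage would look like `pipeline = build_pipeline(perform_ner_label, vadersentimentanalysis)` followed by `pipeline.fit(train_texts, train_labels)`, where `train_texts` is a list of strings and `train_labels` the matching targets (both names are placeholders).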

Anti-Patterns

Do not invent new NER labels; strictly use the 18 labels provided.
Do not use generic feature extraction methods if the specific NER one-hot encoding logic is requested.

Triggers

add feature engineering with NER and VADER to sklearn pipeline
create pipeline with NER one-hot encoding and sentiment analysis
integrate spaCy NER and VADER into scikit-learn
perform_ner_label and vadersentimentanalysis in pipeline