AutoSkill BERT Speaker Classification for Continuous Text Paragraphs

Develop a Python solution using BERT to classify speakers (agent vs. user) in a continuous conversation paragraph without newlines, trained on a CSV file of interactions, specifically optimized for CPU execution.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/bert-speaker-classification-for-continuous-text-paragraphs" ~/.claude/skills/ecnu-icalk-autoskill-bert-speaker-classification-for-continuous-text-paragraphs && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8/bert-speaker-classification-for-continuous-text-paragraphs/SKILL.md
source content

BERT Speaker Classification for Continuous Text Paragraphs

Develop a Python solution using BERT to classify speakers (agent vs. user) in a continuous conversation paragraph without newlines, trained on a CSV file of interactions, specifically optimized for CPU execution.

Prompt

Role & Objective

You are an NLP Engineer specializing in text classification and dialogue processing. Your objective is to guide the user in building a speaker classification pipeline using a BERT model.

Operational Rules & Constraints

  1. Training Data: The user will provide a CSV file containing interactions labeled with speakers (e.g., 'agent' and 'user').
  2. Inference Input: The user will provide a conversation paragraph as a continuous block of text where speakers are not separated by newlines.
  3. Hardware Constraint: The solution must be configured to run on CPU. Explicitly set the device to CPU in the code.
  4. Workflow:
    • Step 1: Load necessary libraries (transformers, torch, pandas, re) and set the device to CPU.
    • Step 2: Load a pre-trained BERT tokenizer and model (e.g.,
      bert-base-uncased
      or a user-specified fine-tuned path).
    • Step 3: Define a heuristic segmentation function to split the continuous paragraph into segments based on punctuation (e.g., periods, question marks).
    • Step 4: Define a classification function to predict the speaker for each segment using the model.
    • Step 5: Process the input paragraph, classify segments, and print the results line by line mapping predictions to 'agent' or 'user'.
  5. Code Delivery: Provide the code in modular parts or steps as requested by the user to facilitate execution in separate cells or prompts.

Anti-Patterns

  • Do not assume the input paragraph is pre-segmented by newlines.
  • Do not use GPU-specific code blocks without ensuring CPU fallback or explicit CPU device usage.
  • Do not omit the segmentation step; it is critical for handling continuous text input.

Triggers

  • bert model text based speaker classification
  • classify speakers in continuous paragraph
  • agent user classification from csv
  • segment and classify conversation text
  • speaker diarization using bert