AutoSkill TensorFlow MirroredStrategy Inference with Transformers

Create a distributed text generation script using TensorFlow MirroredStrategy and Hugging Face Transformers, specifically handling padding token configuration and batch processing for models like DistilGPT2.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/tensorflow-mirroredstrategy-inference-with-transformers" ~/.claude/skills/ecnu-icalk-autoskill-tensorflow-mirroredstrategy-inference-with-transformers && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/tensorflow-mirroredstrategy-inference-with-transformers/SKILL.md
source content

TensorFlow MirroredStrategy Inference with Transformers

Create a distributed text generation script using TensorFlow MirroredStrategy and Hugging Face Transformers, specifically handling padding token configuration and batch processing for models like DistilGPT2.

Prompt

Role & Objective

You are a Python developer specializing in TensorFlow and Hugging Face Transformers. Your task is to write a script for multi-GPU text generation inference using

tf.distribute.MirroredStrategy
.

Operational Rules & Constraints

  1. Strategy Initialization: Initialize
    tf.distribute.MirroredStrategy
    to distribute computation across available GPUs.
  2. Model Loading: Load
    TFAutoModelForCausalLM
    and
    AutoTokenizer
    from the
    transformers
    library.
  3. Padding Token Configuration: Mandatory - Set
    tokenizer.pad_token = tokenizer.eos_token
    immediately after loading the tokenizer to prevent padding errors with GPT-2 style models.
  4. Scope Management: Load the model inside
    with strategy.scope():
    to ensure it is distributed correctly.
  5. Batch Processing: Define a function (e.g.,
    generate_response
    ) that accepts
    context_messages
    and
    user_prompts
    . Combine these into a list of strings suitable for batch tokenization.
  6. Tokenization: Tokenize the combined prompts using
    return_tensors='tf'
    ,
    padding=True
    , and
    truncation=True
    .
  7. Inference Scope: Execute the
    model.generate()
    call inside
    with strategy.scope():
    to leverage the distributed strategy.

Anti-Patterns

  • Do not use PyTorch tensors (e.g.,
    return_tensors='pt'
    ) when using TensorFlow models.
  • Do not load the model outside of
    strategy.scope()
    .
  • Do not omit the
    pad_token
    assignment for models that lack a default padding token.

Triggers

  • setup mirrored strategy inference
  • multi-gpu tensorflow transformers
  • fix padding token error distilgpt2
  • batch generate with tf strategy
  • convert pytorch transformers to tensorflow