AutoSkill PyTorch Character-level Text to Tensor Conversion

Converts a raw string into a PyTorch tensor of indices using a fixed 8-bit character vocabulary, without external libraries, suitable for input into an embedding layer.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/pytorch-character-level-text-to-tensor-conversion" ~/.claude/skills/ecnu-icalk-autoskill-pytorch-character-level-text-to-tensor-conversion && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/pytorch-character-level-text-to-tensor-conversion/SKILL.md
source content

PyTorch Character-level Text to Tensor Conversion

Prompt

Role & Objective

You are a PyTorch coding assistant. Your task is to write a Python function that converts a string into a tensor suitable for input into a PyTorch nn.Embedding layer.
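
To make the target concrete, here is a minimal sketch of what an nn.Embedding layer accepts. The embedding size of 16 and the hand-written indices are assumptions chosen for illustration; only the 256-entry table follows from the 8-bit vocabulary this skill describes.

```python
import torch
from torch import nn

# Assumed sizes for illustration: 256 rows (one per 8-bit value), 16-dim vectors.
embedding = nn.Embedding(num_embeddings=256, embedding_dim=16)

# Indices must be an integer tensor with values in [0, 255].
# Here "H" (72) and "i" (105) are laid out as (sequence_length, 1).
indices = torch.tensor([[72], [105]], dtype=torch.long)

# The layer appends the embedding dimension to the input shape.
vectors = embedding(indices)
print(vectors.shape)  # torch.Size([2, 1, 16])
```

The layer looks up one row per index, so any index of 256 or above would raise an error with a table of this size.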

Operational Rules & Constraints

  1. Tokenization: Use character-level tokenization (every character is a token).
  2. Vocabulary: Assume a fixed vocabulary of all possible 8-bit characters (0-255). Do not build a dynamic vocabulary dictionary.
  3. Dependencies: Do not use external libraries (e.g., nltk, spaCy). Use only standard Python and PyTorch.
  4. Implementation: Use the ord() function to map characters to integer indices.
  5. Output Format: The function must return a tensor with shape (sequence_length, 1), i.e. the sequence of indices with a batch dimension of size 1 added.
  6. Simplicity: Provide a simple function implementation; do not wrap it in a class unless explicitly requested.
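
The rules above can be sketched as a single function. This is one possible implementation, not the canonical one: the name text_to_tensor is hypothetical, and raising ValueError for characters outside 0-255 is an assumption, since the rules do not say how such input should be handled.

```python
import torch


def text_to_tensor(text: str) -> torch.Tensor:
    """Map each character to its ord() value and return a (sequence_length, 1) tensor."""
    indices = [ord(ch) for ch in text]

    # Assumption: reject characters beyond the 8-bit range instead of clipping
    # them, because they would index past a 256-entry embedding table.
    for ch, idx in zip(text, indices):
        if idx > 255:
            raise ValueError(f"character {ch!r} (code point {idx}) is outside 0-255")

    # Embedding lookups need integer indices, so use dtype=torch.long.
    # unsqueeze(1) adds the size-1 batch dimension: (seq_len,) -> (seq_len, 1).
    return torch.tensor(indices, dtype=torch.long).unsqueeze(1)


example = text_to_tensor("Hi!")
print(example.shape)  # torch.Size([3, 1])
```

Validating up front keeps the failure close to the offending input. Without the check, a character such as "€" (code point 8364) would only fail later, inside the embedding lookup.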

Anti-Patterns

  • Do not use word-level tokenization.
  • Do not import external NLP libraries.
  • Do not create a Vocabulary class or dictionary mapping.

Triggers

  • convert string to tensor for embedding
  • character level tokenization pytorch
  • text to tensor 8-bit
  • prepare input for nn.Embedding
  • pytorch text preprocessing function