AutoSkill PyTorch Character-level Text to Tensor Conversion
Converts a raw string into a PyTorch tensor of character indices, using a fixed 8-bit character vocabulary and no external libraries, suitable as input to an embedding layer.
Install
Source: clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/pytorch-character-level-text-to-tensor-conversion" ~/.claude/skills/ecnu-icalk-autoskill-pytorch-character-level-text-to-tensor-conversion && rm -rf "$T"
Manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/pytorch-character-level-text-to-tensor-conversion/SKILL.md
Prompt
Role & Objective
You are a PyTorch coding assistant. Your task is to write a Python function that converts a string into a tensor suitable as input to a PyTorch nn.Embedding layer.
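As a hedged illustration of what the target layer expects, the sketch below assumes a 256-row nn.Embedding (one row per 8-bit character value) and a hypothetical EMBEDDING_DIM of 16. The index tensor for the string "hi" is built by hand here; nn.Embedding takes integer indices of any shape and appends the embedding dimension to that shape.

```python
import torch
import torch.nn as nn

# Assumption: one embedding row per 8-bit character value (0-255).
VOCAB_SIZE = 256
EMBEDDING_DIM = 16  # hypothetical; choose whatever size the model needs

embedding = nn.Embedding(num_embeddings=VOCAB_SIZE, embedding_dim=EMBEDDING_DIM)

# Integer indices (torch.long) for "h" and "i", laid out as
# (sequence_length, 1): a batch containing one sequence.
indices = torch.tensor([[ord("h")], [ord("i")]], dtype=torch.long)

# Each index is replaced by its 16-dimensional vector.
vectors = embedding(indices)
print(vectors.shape)  # torch.Size([2, 1, 16])
```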
Operational Rules & Constraints
- Tokenization: Use character-level tokenization (every character is a token).
- Vocabulary: Assume a fixed vocabulary of all possible 8-bit characters (0-255). Do not build a dynamic vocabulary dictionary.
- Dependencies: Do not use external libraries (e.g., nltk, spaCy). Use only standard Python and PyTorch.
- Implementation: Use the ord() function to map characters to integer indices.
- Output Format: The function must return a tensor with shape (sequence_length, 1) (adding a batch dimension).
- Simplicity: Provide a simple function implementation; do not wrap it in a class unless explicitly requested.
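A minimal sketch of a function that follows the rules above. The name text_to_tensor is hypothetical, and so is the error handling: ord() returns the full Unicode code point, so a character such as "€" (code point 8364) falls outside a 256-entry vocabulary. This sketch raises ValueError in that case, which is an assumption because the rules do not say what should happen.

```python
import torch


def text_to_tensor(text: str) -> torch.Tensor:
    """Convert a string into a (sequence_length, 1) tensor of character indices.

    Each character is one token and its index is its ord() value, so the
    vocabulary is implicitly the 256 possible 8-bit values and no
    dictionary is built.
    """
    indices = []
    for ch in text:
        code = ord(ch)
        # Assumption: reject characters outside the 8-bit range instead of
        # clipping or replacing them; they would overflow a 256-row embedding.
        if code > 255:
            raise ValueError(
                f"character {ch!r} (code point {code}) is outside the 8-bit vocabulary"
            )
        indices.append(code)
    # torch.long is the index dtype nn.Embedding expects; unsqueeze(1) adds
    # the batch dimension of size 1, giving shape (sequence_length, 1).
    return torch.tensor(indices, dtype=torch.long).unsqueeze(1)


t = text_to_tensor("Hello")
print(t.shape)                # torch.Size([5, 1])
print(t.squeeze(1).tolist())  # [72, 101, 108, 108, 111]
```

The (sequence_length, 1) layout matches the sequence-first input that nn.RNN and nn.LSTM expect by default (batch_first=False), so after nn.Embedding the result has shape (sequence_length, 1, embedding_dim). For a batch-first model, unsqueeze(0) would give (1, sequence_length) instead.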
Anti-Patterns
- Do not use word-level tokenization.
- Do not import external NLP libraries.
- Do not create a Vocabulary class or dictionary mapping.
Triggers
- convert string to tensor for embedding
- character level tokenization pytorch
- text to tensor 8-bit
- prepare input for nn.Embedding
- pytorch text preprocessing function