AutoSkill Python Lexer in Rust with Indentation Handling
Implement a simple Python lexer in Rust that tokenizes a subset of Python syntax, specifically handling indentation and dedentation logic using a stack to ensure correct block structure.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/python-lexer-in-rust-with-indentation-handling" ~/.claude/skills/ecnu-icalk-autoskill-python-lexer-in-rust-with-indentation-handling && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/python-lexer-in-rust-with-indentation-handling/SKILL.md
Prompt
Role & Objective
You are a Rust developer tasked with writing a simple lexer for the Python language. The lexer must tokenize a string input into a stream of tokens, specifically handling Python's significant whitespace rules for indentation and dedentation.
Operational Rules & Constraints
- Language: Use Rust.
- Token Definition: Define an enum `Token` with variants for `Identifier(String)`, `Def`, `Return`, `Number(String)`, `OpenParenthesis`, `CloseParenthesis`, `Comma`, `LessThan`, `Colon`, `Newline`, `Indent`, `Dedent`, and `EndOfFile`.
- Lexer Structure: Use a struct `Lexer<'a>` containing a `Peekable<Chars<'a>>`, `current_indent: usize`, `indent_levels: Vec<usize>`, and `at_bol: bool` (at beginning of line).
- Indentation Logic:
  - At the beginning of a line (`at_bol` is true), count the leading spaces.
  - If the count is greater than `current_indent`, push `current_indent` onto `indent_levels`, set `current_indent` to the new count, and emit an `Indent` token.
  - If the count is less than `current_indent`, you must emit `Dedent` tokens. Crucially, loop through the `indent_levels` stack, popping values and updating `current_indent`, and emit one `Dedent` token for each level closed until `current_indent` matches the new line's indentation. Do not stop after a single dedent if the indentation drop spans multiple levels.
- Tokenization Rules:
  - Skip comments starting with `#` until a newline.
  - Recognize the keywords `def` and `return` as specific tokens, not generic identifiers.
  - Recognize basic punctuation: `(`, `)`, `,`, `<`, and `:`.
  - Recognize alphanumeric sequences that start with a letter or underscore as identifiers.
  - Recognize digit sequences as numbers.
- EOF Handling: At the end of the input, close any indentation levels still on the stack by emitting one `Dedent` token per open level.
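The rules above can be combined into one small lexer. The following is a minimal sketch under stated assumptions, not a reference implementation. It assumes space-only indentation (tabs in leading whitespace are not handled), returns a `Vec<Token>` from a `tokenize` method because one line can produce several `Dedent` tokens, and adds two behaviours the rules do not spell out but that mirror Python: blank and comment-only lines leave the indentation untouched, and a dedent that lands between two open levels returns an error. The method names `tokenize`, `take_while`, and `handle_indent` are hypothetical; the fields follow the Lexer Structure rule.

```rust
use std::{iter::Peekable, str::Chars};

// Token variants from the rule list; the derives are added for comparisons in tests.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Identifier(String), Def, Return, Number(String),
    OpenParenthesis, CloseParenthesis, Comma, LessThan, Colon,
    Newline, Indent, Dedent, EndOfFile,
}

pub struct Lexer<'a> {
    chars: Peekable<Chars<'a>>,
    current_indent: usize,     // width of the innermost open block
    indent_levels: Vec<usize>, // widths of the enclosing blocks, outermost first
    at_bol: bool,              // true while positioned at the beginning of a line
}

impl<'a> Lexer<'a> {
    pub fn new(src: &'a str) -> Self {
        Lexer { chars: src.chars().peekable(), current_indent: 0, indent_levels: Vec::new(), at_bol: true }
    }

    // Hypothetical helper: consumes characters while `pred` holds and returns them.
    fn take_while(&mut self, pred: impl Fn(char) -> bool) -> String {
        std::iter::from_fn(|| self.chars.next_if(|&c| pred(c))).collect()
    }

    // Hypothetical helper: compares a code line's indentation with the stack.
    fn handle_indent(&mut self, width: usize, tokens: &mut Vec<Token>) -> Result<(), String> {
        if width > self.current_indent {
            self.indent_levels.push(self.current_indent);
            self.current_indent = width;
            tokens.push(Token::Indent);
        }
        // Loop, do not stop at one: each popped level closes one block and yields one Dedent.
        while width < self.current_indent {
            self.current_indent = self.indent_levels.pop().unwrap_or(0);
            tokens.push(Token::Dedent);
        }
        if width != self.current_indent {
            return Err(format!("dedent to {width} spaces matches no enclosing block"));
        }
        Ok(())
    }

    pub fn tokenize(mut self) -> Result<Vec<Token>, String> {
        let mut tokens = Vec::new();
        loop {
            if self.at_bol {
                let width = self.take_while(|c| c == ' ').len();
                match self.chars.peek().copied() {
                    // Blank or comment-only line: emit nothing and leave the indentation alone.
                    Some('\n' | '#') => {
                        self.take_while(|c| c != '\n');
                        self.chars.next();
                        continue;
                    }
                    Some(_) => self.handle_indent(width, &mut tokens)?,
                    None => {}
                }
                self.at_bol = false;
            }
            let Some(c) = self.chars.peek().copied() else { break };
            match c {
                '\n' => {
                    self.chars.next();
                    tokens.push(Token::Newline);
                    self.at_bol = true;
                }
                ' ' | '\t' => { self.chars.next(); }
                '#' => { self.take_while(|c| c != '\n'); } // trailing comment; the '\n' still yields Newline
                '(' | ')' | ',' | '<' | ':' => {
                    self.chars.next();
                    tokens.push(match c {
                        '(' => Token::OpenParenthesis,
                        ')' => Token::CloseParenthesis,
                        ',' => Token::Comma,
                        '<' => Token::LessThan,
                        _ => Token::Colon,
                    });
                }
                _ if c.is_ascii_digit() => tokens.push(Token::Number(self.take_while(|c| c.is_ascii_digit()))),
                _ if c.is_alphabetic() || c == '_' => {
                    let word = self.take_while(|c| c.is_alphanumeric() || c == '_');
                    tokens.push(match word.as_str() {
                        "def" => Token::Def, // keywords get their own variants, not Identifier
                        "return" => Token::Return,
                        _ => Token::Identifier(word),
                    });
                }
                _ => return Err(format!("unexpected character {c:?}")),
            }
        }
        tokens.extend(self.indent_levels.iter().map(|_| Token::Dedent)); // EOF: one Dedent per open block
        tokens.push(Token::EndOfFile);
        Ok(tokens)
    }
}
```

For example, `Lexer::new("def f(x):\n    return x\n").tokenize()` yields `Def, Identifier("f"), OpenParenthesis, Identifier("x"), CloseParenthesis, Colon, Newline, Indent, Return, Identifier("x"), Newline, Dedent, EndOfFile`; the final `Dedent` comes from the EOF step, not from a dedented line. Collecting into a `Vec` keeps the sketch short; a streaming `next_token` design would instead need a queue of pending `Dedent` tokens.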
Anti-Patterns
- Do not assume indentation always changes by exactly 4 spaces; handle arbitrary space counts.
- Do not emit only one `Dedent` token when the indentation drops multiple levels (e.g., dropping from 8 spaces to 0 with blocks open at 4 and 8 spaces requires two `Dedent` tokens).
- Do not treat `def` or `return` as generic identifiers.
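The two indentation anti-patterns can be checked in isolation by pulling the stack logic out of the lexer. This is a sketch using a hypothetical `IndentStack` type, and it swaps in a slightly different representation: instead of a separate `current_indent` field, the current width is the top of the stack, with a base level of 0 that is never popped, as in the indentation algorithm described in the Python Language Reference. The observable `Indent`/`Dedent` behaviour should match the rules above.

```rust
/// Indentation change produced for one logical line.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Change {
    Indent,
    Dedent,
}

/// Hypothetical helper: every open block width lives on the stack, with 0 at the bottom.
pub struct IndentStack {
    levels: Vec<usize>,
}

impl IndentStack {
    pub fn new() -> Self {
        IndentStack { levels: vec![0] }
    }

    fn top(&self) -> usize {
        *self.levels.last().expect("base level 0 is never popped")
    }

    /// Changes for a code line indented by `width` spaces; any step size is accepted.
    pub fn line(&mut self, width: usize) -> Result<Vec<Change>, String> {
        if width > self.top() {
            self.levels.push(width);
            return Ok(vec![Change::Indent]);
        }
        let mut out = Vec::new();
        // Pop every level deeper than `width`; each pop is one Dedent.
        while width < self.top() {
            self.levels.pop();
            out.push(Change::Dedent);
        }
        // The new width must equal some enclosing level, otherwise the dedent is inconsistent.
        if width != self.top() {
            return Err(format!("dedent to {width} spaces matches no outer level"));
        }
        Ok(out)
    }

    /// End of input: one Dedent for every block still open above the base level.
    pub fn finish(&mut self) -> Vec<Change> {
        let open = self.levels.len() - 1;
        self.levels.truncate(1);
        vec![Change::Dedent; open]
    }
}
```

Feeding widths 0, 2, 5, 0 gives no change, `Indent`, `Indent`, then two `Dedent`s, which exercises both anti-patterns at once: the steps are 2 and 3 spaces rather than 4, and the final drop closes two levels in a single line.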
Triggers
- write simple python lexer in rust
- rust python indentation handling
- handle indent and dedent tokens in rust
- python tokenizer with dedent logic