AutoSkill Python Lexer in Rust with Indentation Handling

Implement a simple Python lexer in Rust that tokenizes a subset of Python syntax, specifically handling indentation and dedentation logic using a stack to ensure correct block structure.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/python-lexer-in-rust-with-indentation-handling" ~/.claude/skills/ecnu-icalk-autoskill-python-lexer-in-rust-with-indentation-handling && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/python-lexer-in-rust-with-indentation-handling/SKILL.md

source content

Prompt

Role & Objective

You are a Rust developer tasked with writing a simple lexer for the Python language. The lexer must tokenize a string input into a stream of tokens, specifically handling Python's significant whitespace rules for indentation and dedentation.

Operational Rules & Constraints

Language: Use Rust.

Token Definition: Define an enum `Token` with variants for `Identifier(String)`, `Def`, `Return`, `Number(String)`, `OpenParenthesis`, `CloseParenthesis`, `Comma`, `LessThan`, `Colon`, `Newline`, `Indent`, `Dedent`, and `EndOfFile`.
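
As a minimal sketch, the token enum described above could look like the following. The variant names come from the list; the derived traits (`Debug`, `Clone`, `PartialEq`) are an assumption added here so tokens can be printed and compared in tests.

```rust
/// Tokens produced by the lexer. Variant names follow the list above.
/// The derives are an assumption, added so tokens can be printed and compared.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Identifier(String),
    Def,
    Return,
    Number(String),
    OpenParenthesis,
    CloseParenthesis,
    Comma,
    LessThan,
    Colon,
    Newline,
    Indent,
    Dedent,
    EndOfFile,
}
```

Keeping `Number` as a `String` defers numeric parsing to a later stage, which matches the variant signature given above.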

Lexer Structure: Use a struct `Lexer<'a>` containing a `Peekable<Chars<'a>>`, `current_indent: usize`, `indent_levels: Vec<usize>`, and `at_bol: bool` (At Beginning Of Line).
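
One way the struct might be declared is sketched below. The field name `chars` and the constructor `new` are hypothetical, since the text only names the other three fields. Starting with `at_bol` set to `true` is an assumption, so that the first line's indentation is measured too.

```rust
use std::iter::Peekable;
use std::str::Chars;

pub struct Lexer<'a> {
    /// Character stream with one-character lookahead (field name is hypothetical).
    chars: Peekable<Chars<'a>>,
    /// Indentation (in spaces) of the block currently being lexed.
    current_indent: usize,
    /// Stack of enclosing indentation levels, innermost last.
    indent_levels: Vec<usize>,
    /// True when the next character starts a new line.
    at_bol: bool,
}

impl<'a> Lexer<'a> {
    /// Hypothetical constructor: starts at column 0 with no open blocks.
    pub fn new(input: &'a str) -> Self {
        Lexer {
            chars: input.chars().peekable(),
            current_indent: 0,
            indent_levels: Vec::new(),
            at_bol: true,
        }
    }
}
```

The lifetime `'a` ties the lexer to the input string, so the lexer borrows the source text rather than copying it.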

Indentation Logic:
- At the beginning of a line (`at_bol` is true), count the leading spaces.
- If the count is greater than `current_indent`, push `current_indent` to `indent_levels`, update `current_indent`, and emit an `Indent` token.
- If the count is less than `current_indent`, you must emit `Dedent` tokens. Crucially, loop through the `indent_levels` stack, popping values and updating `current_indent`, emitting a `Dedent` token for each level closed until `current_indent` matches the new line's indentation. Do not stop after just one dedent if the indentation drop spans multiple levels.
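
The indentation rules above can be sketched in isolation as follows. `IndentChange`, `IndentTracker`, `on_line_start`, and `count_leading_spaces` are hypothetical names used only for this illustration; in the full lexer the same logic would live on `Lexer` and emit `Token::Indent` and `Token::Dedent`.

```rust
/// Hypothetical stand-in for the `Indent` / `Dedent` token variants.
#[derive(Debug, PartialEq)]
enum IndentChange {
    Indent,
    Dedent,
}

/// Hypothetical holder for the two indentation fields of the lexer.
struct IndentTracker {
    current_indent: usize,
    indent_levels: Vec<usize>,
}

/// Count the spaces at the start of a line.
fn count_leading_spaces(line: &str) -> usize {
    line.chars().take_while(|c| *c == ' ').count()
}

impl IndentTracker {
    fn new() -> Self {
        IndentTracker { current_indent: 0, indent_levels: Vec::new() }
    }

    /// Compare a line's leading-space count with the current level and
    /// return the indentation tokens that the line start produces.
    fn on_line_start(&mut self, spaces: usize) -> Vec<IndentChange> {
        let mut tokens = Vec::new();
        if spaces > self.current_indent {
            // Deeper block: remember the enclosing level, then enter the new one.
            self.indent_levels.push(self.current_indent);
            self.current_indent = spaces;
            tokens.push(IndentChange::Indent);
        } else {
            // Shallower line: close one level per loop iteration, not just one in total.
            while spaces < self.current_indent {
                match self.indent_levels.pop() {
                    Some(previous) => {
                        self.current_indent = previous;
                        tokens.push(IndentChange::Dedent);
                    }
                    None => break,
                }
            }
            // Assumption: a count that matches no open level is not reported
            // as an error here; a stricter lexer would reject it.
        }
        tokens
    }
}
```

Feeding this tracker the counts 0, 4, 8, 0 yields no token, one `Indent`, one `Indent`, and then two `Dedent` values, which is the multi-level case the last rule warns about.
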
Tokenization Rules:
- Skip comments starting with `#` until a newline.
- Recognize keywords `def` and `return` as specific tokens, not generic identifiers.
- Recognize basic punctuation: `(`, `)`, `,`, `<`, `:`.
- Recognize alphanumeric sequences that begin with a letter or underscore as identifiers.
- Recognize sequences of digits as numbers.
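
The rules above, leaving indentation aside, might be sketched like this. The helper names `take_while` and `tokenize` are hypothetical, the enum is a reduced copy of `Token` without the indentation variants, and silently skipping spaces and unsupported characters is an assumption of this sketch.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String), Def, Return, Number(String),
    OpenParenthesis, CloseParenthesis, Comma, LessThan, Colon, Newline,
}

/// Consume characters while `keep` holds and collect them into a String.
fn take_while(chars: &mut Peekable<Chars<'_>>, keep: fn(char) -> bool) -> String {
    let mut text = String::new();
    while let Some(&c) = chars.peek() {
        if !keep(c) {
            break;
        }
        text.push(c);
        chars.next();
    }
    text
}

/// Tokenize `src` without indentation tracking (hypothetical helper).
fn tokenize(src: &str) -> Vec<Token> {
    let mut chars = src.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(&c) = chars.peek() {
        match c {
            // Comment: drop everything up to, but not including, the newline.
            '#' => { take_while(&mut chars, |c| c != '\n'); }
            '\n' => { chars.next(); tokens.push(Token::Newline); }
            '(' => { chars.next(); tokens.push(Token::OpenParenthesis); }
            ')' => { chars.next(); tokens.push(Token::CloseParenthesis); }
            ',' => { chars.next(); tokens.push(Token::Comma); }
            '<' => { chars.next(); tokens.push(Token::LessThan); }
            ':' => { chars.next(); tokens.push(Token::Colon); }
            c if c.is_ascii_digit() => {
                tokens.push(Token::Number(take_while(&mut chars, |c| c.is_ascii_digit())));
            }
            c if c.is_alphabetic() || c == '_' => {
                let word = take_while(&mut chars, |c| c.is_alphanumeric() || c == '_');
                // Keywords are checked first so they never become identifiers.
                let token = if word == "def" {
                    Token::Def
                } else if word == "return" {
                    Token::Return
                } else {
                    Token::Identifier(word)
                };
                tokens.push(token);
            }
            // Assumption: spaces and unsupported characters are skipped.
            _ => { chars.next(); }
        }
    }
    tokens
}
```

Because the keyword check compares the whole word, an input such as `define` still lexes as an identifier rather than as `Def` followed by `ine`.
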
EOF Handling: At the end of the input, ensure any remaining indentation levels on the stack are closed by emitting the appropriate number of `Dedent` tokens.
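
A small sketch of the end-of-input step is shown below. The function name `close_open_blocks` is hypothetical and the enum is a reduced copy of `Token`. Emitting `EndOfFile` after the final `Dedent` is an assumption, since the text only requires that the dedents be emitted.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Dedent,
    EndOfFile,
}

/// Drain the indentation stack at end of input, emitting one `Dedent`
/// per open level, then `EndOfFile` (ordering is an assumption).
fn close_open_blocks(indent_levels: &mut Vec<usize>, current_indent: &mut usize) -> Vec<Token> {
    let mut tokens = Vec::new();
    while let Some(previous) = indent_levels.pop() {
        *current_indent = previous;
        tokens.push(Token::Dedent);
    }
    tokens.push(Token::EndOfFile);
    tokens
}
```

With a stack of `[0, 4]` and a current indent of 8, this returns two `Dedent` tokens followed by `EndOfFile` and leaves the current indent at 0.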

Anti-Patterns

Do not assume indentation always changes by exactly 4 spaces; handle arbitrary space counts.
Do not emit only one `Dedent` token when the indentation drops multiple levels (e.g., dropping from 8 spaces to 0 when blocks at 4 and 8 spaces are both open requires two dedents).
Do not treat `def` or `return` as generic identifiers.

Triggers

write simple python lexer in rust
rust python indentation handling
handle indent and dedent tokens in rust
python tokenizer with dedent logic