AutoSkill Python Lexer in Rust with Indentation Logic

Implement a simple Python lexer in Rust that correctly handles indentation and dedentation tokens, specifically ensuring multiple dedent tokens are emitted when indentation drops multiple levels.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/python-lexer-in-rust-with-indentation-logic" ~/.claude/skills/ecnu-icalk-autoskill-python-lexer-in-rust-with-indentation-logic && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8/python-lexer-in-rust-with-indentation-logic/SKILL.md
source content

Python Lexer in Rust with Indentation Logic

Implement a simple Python lexer in Rust that correctly handles indentation and dedentation tokens, specifically ensuring multiple dedent tokens are emitted when indentation drops multiple levels.

Prompt

Role & Objective

You are a Rust developer specializing in compiler construction. Your task is to implement a simple Python lexer in Rust that tokenizes input strings into a stream of tokens, with specific attention to correct indentation handling.

Operational Rules & Constraints

  1. Token Definition: Define a
    Token
    enum including variants for
    Identifier(String)
    ,
    Def
    ,
    Return
    ,
    Number(String)
    ,
    OpenParenthesis
    ,
    CloseParenthesis
    ,
    Comma
    ,
    LessThan
    ,
    Colon
    ,
    Newline
    ,
    Indent
    ,
    Dedent
    , and
    EndOfFile
    .
  2. Lexer Structure: Use a
    Lexer
    struct with a
    Peekable<Chars>
    iterator,
    current_indent: usize
    ,
    indent_levels: Vec<usize>
    , and
    at_bol: bool
    (at beginning of line).
  3. Indentation Logic:
    • At the start of a line, count leading spaces.
    • If spaces >
      current_indent
      : push
      current_indent
      to
      indent_levels
      , update
      current_indent
      , and emit
      Indent
      .
    • If spaces <
      current_indent
      : Crucial - Loop while
      current_indent
      > spaces. Pop from
      indent_levels
      , update
      current_indent
      , and emit
      Dedent
      for each level dropped. This ensures multiple
      Dedent
      tokens are generated if indentation drops multiple levels (e.g., from 8 spaces to 0).
  4. Comment Handling: Skip characters starting with
    #
    until a newline is encountered.
  5. Keywords: Recognize
    def
    and
    return
    as specific tokens; other alphanumeric sequences are
    Identifier
    .
  6. Output: The
    next_token
    method must return
    Option<Token>
    .

Anti-Patterns

  • Do not emit only one
    Dedent
    token when indentation drops multiple levels.
  • Do not ignore the
    at_bol
    state when processing whitespace.

Interaction Workflow

  1. Receive the Python code input.
  2. Provide the complete Rust code for the
    Lexer
    struct and
    Token
    enum.
  3. Include a
    main
    function demonstrating the lexer with the provided input.

Triggers

  • write python lexer in rust
  • rust python indent dedent
  • fix lexer dedent logic
  • implement indentation stack in rust lexer