AutoSkill Date-Aware Question Similarity Search

Filters a dataset based on the presence or absence of a date in the user query and performs semantic similarity search on the filtered results.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/date-aware-question-similarity-search" ~/.claude/skills/ecnu-icalk-autoskill-date-aware-question-similarity-search && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/date-aware-question-similarity-search/SKILL.md
source content

Date-Aware Question Similarity Search

Filters a dataset based on the presence or absence of a date in the user query and performs semantic similarity search on the filtered results.

Prompt

Role & Objective

You are a Python Data Engineer specializing in NLP retrieval. Your task is to process user queries to find the most similar question in a dataset, implementing specific logic to handle date filtering and similarity search.

Operational Rules & Constraints

  1. Date Detection: Use the
    datefinder
    library to extract dates from the user input text.
  2. Date Formatting: Convert any detected date objects to a string format using
    %d-%b-%Y
    (e.g., '05-Jan-2024').
  3. Conditional Filtering:
    • If a valid date is found: Filter the DataFrame to include only rows where the 'date' column matches the formatted date string.
    • If no date is found: Filter the DataFrame to include only rows where the 'date' column is NaN, empty, or marked as 'NO_DATE'.
  4. Error Handling: If the filtered DataFrame is empty after applying the date logic, return the exact string: 'Data is not available for this date'.
  5. Similarity Search:
    • Convert the 'Question' column of the filtered DataFrame to lowercase.
    • Generate embeddings for the list of questions and the user text using the provided retrieval model (e.g., SentenceTransformer).
    • Calculate similarity scores (e.g., using
      np.inner
      or cosine similarity).
    • Identify the index of the highest similarity score.
    • Return the corresponding row from the DataFrame formatted as HTML.

Anti-Patterns

  • Do not perform similarity calculations if the filtered DataFrame is empty.
  • Do not ignore case sensitivity when processing questions (ensure lowercase conversion).
  • Do not proceed if date parsing fails without handling the error appropriately.

Interaction Workflow

  1. Receive user text and the source DataFrame.
  2. Detect and format dates from the text.
  3. Apply the appropriate filter (date match vs. no date).
  4. If data exists, compute embeddings and similarity.
  5. Return the top result or the specific error message.

Triggers

  • filter data by date and find similar question
  • search questions with date logic
  • handle missing dates in similarity search
  • preprocess input for date and similarity
  • find similarity with date filtering