AutoSkill Generate Cosine Similarity Matrix with ID Column Naming

Calculates pairwise cosine similarity for a DataFrame column, formats the result matrix with columns named 'compared_to_{id}', and merges it back to the original DataFrame.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/generate-cosine-similarity-matrix-with-id-column-naming" ~/.claude/skills/ecnu-icalk-autoskill-generate-cosine-similarity-matrix-with-id-column-naming && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/generate-cosine-similarity-matrix-with-id-column-naming/SKILL.md
source content

Prompt

Role & Objective

You are a Python data engineer. Your task is to generate a pairwise cosine similarity matrix from a specific column in a pandas DataFrame, format the output columns using IDs from the DataFrame, and merge the results back to the original data.

Operational Rules & Constraints

  1. Input Data: Work with a pandas DataFrame `df` containing an `inquiry_id` column and a text column specified by the variable `column_to_use`.
  2. Embedding Generation: Use the `encoder.encode()` method on the list of values from `df[column_to_use]`. Ensure the column is accessed dynamically using the `column_to_use` variable (e.g., `df[column_to_use].tolist()`).
  3. Similarity Calculation: Calculate the cosine similarity matrix using `cosine_similarity(embedding, embedding)`.
  4. DataFrame Construction: Create a result DataFrame (`result_df`) where the columns represent the similarity scores.
  5. Column Naming: Name the columns in `result_df` by combining the prefix 'compared_to_' with the corresponding values from the `inquiry_id` column in `df`.
  6. Merging: Merge the original `df` and `result_df` on their indices using `pd.merge(df, result_df, left_index=True, right_index=True)`.
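
The six rules above can be sketched as one small function. This is a minimal illustration, not the skill's reference implementation. It assumes `cosine_similarity` is the scikit-learn function from `sklearn.metrics.pairwise`. The `ToyEncoder` class, the `add_similarity_columns` name, and the sample data are hypothetical stand-ins; in practice `encoder` would be whatever embedding model the caller already has, as long as it exposes `encode()`.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


class ToyEncoder:
    """Hypothetical stand-in for a real encoder: letter-count vectors."""

    def encode(self, texts):
        # One row per text, one column per lowercase letter a-z.
        vectors = np.zeros((len(texts), 26))
        for row, text in enumerate(texts):
            for ch in text.lower():
                if "a" <= ch <= "z":
                    vectors[row, ord(ch) - ord("a")] += 1
        return vectors


def add_similarity_columns(df, column_to_use, encoder):
    # Rule 2: encode the column chosen by the variable, not a hardcoded name.
    embedding = encoder.encode(df[column_to_use].tolist())
    # Rule 3: pairwise cosine similarity, shape (n_rows, n_rows).
    similarity = cosine_similarity(embedding, embedding)
    # Rules 4-5: one column per row of df, labeled with that row's inquiry_id.
    result_df = pd.DataFrame(
        similarity,
        columns=[f"compared_to_{i}" for i in df["inquiry_id"]],
        index=df.index,  # assumption: reuse df's index so the merge aligns rows
    )
    # Rule 6: merge the original data and the scores on their indices.
    return pd.merge(df, result_df, left_index=True, right_index=True)


# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "inquiry_id": [101, 102, 103],
        "text": ["reset password", "password reset", "billing"],
    }
)
merged = add_similarity_columns(df, "text", ToyEncoder())
```

Passing `index=df.index` is a choice made for this sketch rather than something the rules state. Without it, `result_df` gets a default 0..n-1 index, and an index-based merge would drop or mismatch rows whenever `df` has been filtered or re-indexed.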

Anti-Patterns

  • Do not hardcode the column name for encoding; use the `column_to_use` variable.
  • Do not use default integer indices for column names; use the `inquiry_id` values with the specified prefix.
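
As a small illustration of the second anti-pattern, the snippet below contrasts the column labels pandas assigns by default with the labels this skill asks for. The 3x3 identity matrix and the ID values are made up for the example.

```python
import numpy as np
import pandas as pd

similarity = np.eye(3)            # placeholder similarity matrix
inquiry_ids = ["A1", "B2", "C3"]  # hypothetical inquiry_id values

# Anti-pattern: columns come out as 0, 1, 2 and lose the link to each inquiry.
default_named = pd.DataFrame(similarity)

# Preferred: each column names the inquiry it is compared against.
id_named = pd.DataFrame(
    similarity,
    columns=[f"compared_to_{i}" for i in inquiry_ids],
)
```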

Triggers

  • calculate cosine similarity for dataframe
  • create similarity matrix with inquiry ids
  • merge cosine similarity results with original df
  • format similarity columns with compared_to prefix