AutoSkill Generate Cosine Similarity Matrix with ID Column Naming
Calculates pairwise cosine similarity for a DataFrame column, formats the result matrix with columns named 'compared_to_{id}', and merges it back to the original DataFrame.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/generate-cosine-similarity-matrix-with-id-column-naming" ~/.claude/skills/ecnu-icalk-autoskill-generate-cosine-similarity-matrix-with-id-column-naming && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/generate-cosine-similarity-matrix-with-id-column-naming/SKILL.md
source content
Generate Cosine Similarity Matrix with ID Column Naming
Prompt
Role & Objective
You are a Python data engineer. Your task is to generate a pairwise cosine similarity matrix from a specific column in a pandas DataFrame, format the output columns using IDs from the DataFrame, and merge the results back to the original data.
Operational Rules & Constraints
- Input Data: Work with a pandas DataFrame `df` containing an `inquiry_id` column and a text column specified by the variable `column_to_use`.
- Embedding Generation: Use the `encoder.encode()` method on the list of values from `df[column_to_use]`. Ensure the column is accessed dynamically using the `column_to_use` variable (e.g., `df[column_to_use].tolist()`).
- Similarity Calculation: Calculate the cosine similarity matrix using `cosine_similarity(embedding, embedding)`.
- DataFrame Construction: Create a result DataFrame (`result_df`) where the columns represent the similarity scores.
- Column Naming: Name the columns in `result_df` by combining the prefix 'compared_to_' with the corresponding values from the `inquiry_id` column in `df`.
- Merging: Merge the original `df` and `result_df` on their indices using `pd.merge(df, result_df, left_index=True, right_index=True)`.
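The rules above can be sketched end to end as follows. This is a minimal illustration, not the skill's reference implementation: `ToyEncoder`, the sample rows, and the column name `inquiry_text` are hypothetical stand-ins, and `cosine_similarity` is assumed to be the scikit-learn function of that name. Any object with an `encode()` method that returns one vector per input text could take the encoder's place.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


class ToyEncoder:
    """Hypothetical stand-in for a real encoder: counts letters a-z per text."""

    def encode(self, texts):
        vectors = np.zeros((len(texts), 26))
        for row, text in enumerate(texts):
            for ch in text.lower():
                if "a" <= ch <= "z":
                    vectors[row, ord(ch) - ord("a")] += 1
        return vectors


encoder = ToyEncoder()

# Made-up sample data; only the `inquiry_id` column name comes from the rules.
df = pd.DataFrame(
    {
        "inquiry_id": [101, 102, 103],
        "inquiry_text": ["reset my password", "password reset", "cancel my order"],
    }
)
column_to_use = "inquiry_text"  # hypothetical column name, passed as a variable

# Embedding generation: access the column through the variable, never a literal.
embedding = encoder.encode(df[column_to_use].tolist())

# Similarity calculation: an n x n matrix of pairwise scores.
similarity = cosine_similarity(embedding, embedding)

# DataFrame construction and column naming: one column per inquiry_id.
# Reusing df.index is an added precaution so the index merge lines up
# even when df does not have a default 0..n-1 index.
result_df = pd.DataFrame(
    similarity,
    index=df.index,
    columns=[f"compared_to_{inquiry_id}" for inquiry_id in df["inquiry_id"]],
)

# Merging: join on the indices, as the rules specify.
merged = pd.merge(df, result_df, left_index=True, right_index=True)
print(merged)
```

Each row of `merged` keeps its original columns and gains one `compared_to_{id}` column per inquiry, so the score between two inquiries can be read off by row and column name.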
Anti-Patterns
- Do not hardcode the column name for encoding; use the `column_to_use` variable.
- Do not use default integer indices for column names; use the `inquiry_id` values with the specified prefix.
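To make the column-naming anti-pattern concrete, the short sketch below contrasts the default integer labels pandas assigns with the prefixed names the rules ask for. The IDs and the similarity values are made-up placeholders, not output from a real encoder.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"inquiry_id": ["A7", "B2"], "text": ["first", "second"]})
similarity = np.array([[1.0, 0.3], [0.3, 1.0]])  # placeholder scores

# Anti-pattern: with no explicit names, pandas labels the columns 0 and 1.
unnamed = pd.DataFrame(similarity)
print(list(unnamed.columns))  # [0, 1]

# Preferred: build the names from the inquiry_id values with the prefix.
named = pd.DataFrame(
    similarity,
    index=df.index,
    columns=[f"compared_to_{inquiry_id}" for inquiry_id in df["inquiry_id"]],
)
print(list(named.columns))  # ['compared_to_A7', 'compared_to_B2']
```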
Triggers
- calculate cosine similarity for dataframe
- create similarity matrix with inquiry ids
- merge cosine similarity results with original df
- format similarity columns with compared_to prefix