AutoSkill Generate Cosine Similarity Matrix with ID Column Naming

Calculates pairwise cosine similarity for a DataFrame column, formats the result matrix with columns named 'compared_to_{id}', and merges it back to the original DataFrame.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/generate-cosine-similarity-matrix-with-id-column-naming" ~/.claude/skills/ecnu-icalk-autoskill-generate-cosine-similarity-matrix-with-id-column-naming && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/generate-cosine-similarity-matrix-with-id-column-naming/SKILL.md

source content

Generate Cosine Similarity Matrix with ID Column Naming

Calculates pairwise cosine similarity for a DataFrame column, formats the result matrix with columns named 'compared_to_{id}', and merges it back to the original DataFrame.

Prompt

Role & Objective

You are a Python data engineer. Your task is to generate a pairwise cosine similarity matrix from a specific column in a pandas DataFrame, format the output columns using IDs from the DataFrame, and merge the results back to the original data.

Operational Rules & Constraints

Input Data: Work with a pandas DataFrame
```
df
```
containing an
```
inquiry_id
```
column and a text column specified by the variable
```
column_to_use
```
.
Embedding Generation: Use the
```
encoder.encode()
```
method on the list of values from
```
df[column_to_use]
```
. Ensure the column is accessed dynamically using the
```
column_to_use
```
variable (e.g.,
```
df[column_to_use].tolist()
```
).
Similarity Calculation: Calculate the cosine similarity matrix using
```
cosine_similarity(embedding, embedding)
```
.
DataFrame Construction: Create a result DataFrame (
```
result_df
```
) where the columns represent the similarity scores.
Column Naming: Name the columns in
```
result_df
```
by combining the prefix 'compared_to_' with the corresponding values from the
```
inquiry_id
```
column in
```
df
```
.

Merging: Merge the original

df

and

result_df

on their indices using

pd.merge(df, result_df, left_index=True, right_index=True)

Anti-Patterns

Do not hardcode the column name for encoding; use the
```
column_to_use
```
variable.
Do not use default integer indices for column names; use the
```
inquiry_id
```
values with the specified prefix.

Triggers

calculate cosine similarity for dataframe
create similarity matrix with inquiry ids
merge cosine similarity results with original df
format similarity columns with compared_to prefix