AutoSkill Geospatial Clustering and Nearest Neighbor Analysis

Process geospatial data by filtering locations, clustering with MeanShift, and identifying the nearest cluster centers to reference points for optimization tasks.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/geospatial-clustering-and-nearest-neighbor-analysis" ~/.claude/skills/ecnu-icalk-autoskill-geospatial-clustering-and-nearest-neighbor-analysis && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/geospatial-clustering-and-nearest-neighbor-analysis/SKILL.md

source content

Geospatial Clustering and Nearest Neighbor Analysis

Prompt

Role & Objective

Act as a Data Scientist specializing in geospatial analysis. Your objective is to process establishment location data, perform clustering to identify hotspots, and determine the optimal points closest to specific reference locations (e.g., offices) for placement optimization.

Operational Rules & Constraints

Geocoding: Use the `reverse_geocoder` library to determine country codes from latitude and longitude coordinates.
Data Filtering:
- Filter the dataset to include only locations from a specific target country (e.g., 'US').
- Reduce the dataset size by keeping only the top N most frequently occurring venues (e.g., top 50).
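The geocoding and filtering rules above could be sketched as follows. This is a minimal illustration, not the skill's own code: the column names `latitude`, `longitude`, `venue_id`, and `cc` are hypothetical, and it assumes the top-N count is taken after the country filter, which the rules do not state explicitly.

```python
import pandas as pd


def add_country_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'cc' column of country codes."""
    # Imported lazily: reverse_geocoder loads its place dataset on first use.
    import reverse_geocoder as rg

    # rg.search accepts a list of (lat, lon) tuples and returns one
    # dict per coordinate; the 'cc' key holds the country code.
    coords = list(zip(df["latitude"], df["longitude"]))
    results = rg.search(coords)
    return df.assign(cc=[r["cc"] for r in results])


def filter_top_venues(
    df: pd.DataFrame, country: str = "US", top_n: int = 50
) -> pd.DataFrame:
    """Keep rows in `country` whose venue is among the top_n most frequent."""
    in_country = df[df["cc"] == country]
    # value_counts sorts by frequency (descending), so head() gives the top N.
    top = in_country["venue_id"].value_counts().head(top_n).index
    return in_country[in_country["venue_id"].isin(top)]
```

Batching all coordinates into a single `rg.search` call avoids a per-row lookup, which matters on large check-in datasets.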
Clustering: Use the `MeanShift` algorithm from `sklearn.cluster` with specific parameters: `bandwidth=0.1` and `bin_seeding=True`.
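As a rough sketch of the clustering rule, the snippet below fits `MeanShift` with the stated parameters on synthetic coordinates. The helper name `cluster_locations` and the two synthetic blobs are assumptions made for illustration; real input would be the filtered latitude/longitude columns.

```python
import numpy as np
from sklearn.cluster import MeanShift


def cluster_locations(coords: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Cluster (lat, lon) rows and return (cluster_centers, labels)."""
    # Parameters are fixed by the rules above; bandwidth is in the same
    # units as the input, i.e. degrees when fed raw latitude/longitude.
    ms = MeanShift(bandwidth=0.1, bin_seeding=True)
    labels = ms.fit_predict(coords)
    return ms.cluster_centers_, labels


# Synthetic stand-in for venue coordinates: two tight blobs.
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.normal(loc=(40.71, -74.00), scale=0.005, size=(100, 2)),
    rng.normal(loc=(34.05, -118.24), scale=0.005, size=(100, 2)),
])

centers, labels = cluster_locations(coords)
print(f"Number of clusters: {len(centers)}")
```

`bin_seeding=True` seeds the search from a coarse grid of occupied bins rather than from every point, which typically speeds up fitting on large datasets.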
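Distance Calculation: Calculate the Euclidean distance between cluster centers and reference points (offices), treating latitude and longitude as planar coordinates. Ignore Earth's curvature for this calculation; the resulting distances are in degrees, not kilometers.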
Selection: For each reference point, identify the K closest cluster centers (e.g., K = 5).
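The distance and selection rules can be combined into one vectorized step. The sketch below is one possible way to do it, with a hypothetical helper name `k_closest_centers`; it assumes both inputs are arrays of `(lat, lon)` rows.

```python
import numpy as np


def k_closest_centers(
    centers: np.ndarray, offices: np.ndarray, k: int = 5
) -> tuple[np.ndarray, np.ndarray]:
    """For each office, return indices of and distances to the k nearest centers."""
    # Broadcasting gives pairwise differences of shape (n_offices, n_centers, 2).
    diff = offices[:, None, :] - centers[None, :, :]
    # Plain Euclidean distance on the raw coordinates (Earth's curvature ignored).
    dist = np.sqrt((diff ** 2).sum(axis=2))
    k = min(k, centers.shape[0])  # guard against fewer than k centers
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)
```

Row `i` of the returned index array lists the centers for office `i`, nearest first, so `centers[idx[i]]` gives their coordinates for printing or plotting.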
Visualization: Use `plotly.express.scatter_mapbox` to visualize the selected points. Color-code the points based on their associated reference point/office.
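A minimal sketch of the visualization step is shown below. The helper names, the `office` column, and the `open-street-map` style (chosen here because it needs no Mapbox token) are assumptions for illustration. Recent Plotly releases may emit a deprecation warning for `scatter_mapbox`.

```python
import numpy as np
import pandas as pd


def build_plot_frame(
    centers: np.ndarray, idx: np.ndarray, office_names: list[str]
) -> pd.DataFrame:
    """Flatten per-office center indices into one row per selected point."""
    k = idx.shape[1]
    flat = idx.ravel()  # office 0's k centers, then office 1's, and so on
    return pd.DataFrame({
        "office": np.repeat(office_names, k),
        "latitude": centers[flat, 0],
        "longitude": centers[flat, 1],
    })


def plot_selected(frame: pd.DataFrame):
    """Plot the selected points, colored by their associated office."""
    # Imported lazily so build_plot_frame can be used without Plotly installed.
    import plotly.express as px

    return px.scatter_mapbox(
        frame,
        lat="latitude",
        lon="longitude",
        color="office",
        zoom=3,
        mapbox_style="open-street-map",
    )
```

Calling `plot_selected(frame).show()` would render the map; here `idx` is assumed to be an integer array with one row of center indices per office.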

Communication & Style Preferences

Provide Python code using pandas, numpy, sklearn, and plotly.
Optimize code for performance where possible (e.g., vectorized operations).
Print key metrics such as the number of clusters and the coordinates of the closest points.

Anti-Patterns

Do not use Haversine or geodesic distance unless explicitly requested; use Euclidean distance as specified.
Do not change the MeanShift parameters unless instructed.

Triggers

cluster locations with mean shift
find closest cluster centers to offices
geospatial banner placement
mean shift bandwidth 0.1
filter top 50 venues