Claude-skill-registry geospatial-data-pipeline
Process, analyze, and visualize geospatial data at scale. Handles drone imagery, GPS tracks, GeoJSON optimization, coordinate transformations, and tile generation. Use for mapping apps, drone data processing, location-based services. Activate on "geospatial", "GIS", "PostGIS", "GeoJSON", "map tiles", "coordinate systems". NOT for simple address validation, basic distance calculations, or static map embeds.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/geospatial-data-pipeline" ~/.claude/skills/majiayu000-claude-skill-registry-geospatial-data-pipeline && rm -rf "$T"
skills/data/geospatial-data-pipeline/SKILL.mdGeospatial Data Pipeline
Expert in processing, optimizing, and visualizing geospatial data at scale.
When to Use
✅ Use for:
- Drone imagery processing and annotation
- GPS track analysis and visualization
- Location-based search (find nearby X)
- Map tile generation for web/mobile
- Coordinate system transformations
- Geofencing and spatial queries
- GeoJSON optimization for web
❌ NOT for:
- Simple address validation (use address APIs)
- Basic distance calculations (use Haversine formula)
- Static map embeds (use Mapbox Static API)
- Geocoding (use Nominatim or Google Geocoding API)
Technology Selection
Database: PostGIS vs MongoDB Geospatial
| Feature | PostGIS | MongoDB |
|---|---|---|
| Spatial indexes | GiST, SP-GiST | 2dsphere |
| Query language | SQL + spatial functions | Aggregation pipeline |
| Geometry types | 20+ (full OGC support) | Basic (Point, Line, Polygon) |
| Coordinate systems | 6000+ via EPSG | WGS84 only |
| Performance (10M points) | <100ms | <200ms |
| Best for | Complex spatial analysis | Document-centric apps |
Timeline:
- 2005: PostGIS 1.0 released
- 2012: MongoDB adds geospatial indexes
- 2020: PostGIS 3.0 with improved performance
- 2024: PostGIS remains gold standard for GIS workloads
Common Anti-Patterns
Anti-Pattern 1: Storing Coordinates as Strings
Novice thinking: "I'll just store lat/lon as text, it's simple"
Problem: Can't use spatial indexes, queries are slow, no validation.
Wrong approach:
// ❌ String storage, no spatial features interface Location { id: string; name: string; latitude: string; // "37.7749" longitude: string; // "-122.4194" } // Linear scan for "nearby" queries async function findNearby(lat: string, lon: string): Promise<Location[]> { const all = await db.locations.findAll(); return all.filter(loc => { const distance = calculateDistance( parseFloat(lat), parseFloat(lon), parseFloat(loc.latitude), parseFloat(loc.longitude) ); return distance < 5000; // 5km }); }
Why wrong: O(N) linear scan, no spatial index, string parsing overhead.
Correct approach:
// ✅ PostGIS GEOGRAPHY type with spatial index CREATE TABLE locations ( id SERIAL PRIMARY KEY, name VARCHAR(255), location GEOGRAPHY(POINT, 4326) -- WGS84 coordinates ); -- Spatial index (GiST) CREATE INDEX idx_locations_geography ON locations USING GIST(location); -- TypeScript query async function findNearby(lat: number, lon: number, radiusMeters: number): Promise<Location[]> { const query = ` SELECT id, name, ST_AsGeoJSON(location) as geojson FROM locations WHERE ST_DWithin( location, ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography, $3 ) ORDER BY location <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography LIMIT 100 `; return db.query(query, [lon, lat, radiusMeters]); // <10ms with index }
Timeline context:
- 2000s: Stored lat/lon as FLOAT columns, did math in app code
- 2010s: PostGIS adoption, spatial indexes
- 2024:
type handles Earth curvature automaticallyGEOGRAPHY
Anti-Pattern 2: Not Using Spatial Indexes
Problem: Proximity queries do full table scans.
Wrong approach:
-- ❌ No index, sequential scan CREATE TABLE drone_images ( id SERIAL PRIMARY KEY, image_url VARCHAR(255), location GEOGRAPHY(POINT, 4326) ); -- This query scans ALL rows SELECT * FROM drone_images WHERE ST_DWithin( location, ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography, 1000 -- 1km );
EXPLAIN output:
Seq Scan on drone_images (cost=0.00..1234.56 rows=1 width=123)
Correct approach:
-- ✅ GiST index for spatial queries CREATE INDEX idx_drone_images_location ON drone_images USING GIST(location); -- Same query, now uses index SELECT * FROM drone_images WHERE ST_DWithin( location, ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography, 1000 );
EXPLAIN output:
Bitmap Index Scan on idx_drone_images_location (cost=4.30..78.30 rows=50 width=123)
Performance impact: 10M points, 5km radius query
- Without index: 3.2 seconds (full scan)
- With GiST index: 12ms (99.6% faster)
Anti-Pattern 3: Mixing Coordinate Systems
Novice thinking: "Coordinates are just numbers, I can mix them"
Problem: Incorrect distances, misaligned map features.
Wrong approach:
// ❌ Mixing EPSG:4326 (WGS84) and EPSG:3857 (Web Mercator) const userLocation = { lat: 37.7749, // WGS84 lon: -122.4194 }; const droneImage = { x: -13634876, // Web Mercator (EPSG:3857) y: 4545684 }; // Comparing apples to oranges! const distance = Math.sqrt( Math.pow(userLocation.lon - droneImage.x, 2) + Math.pow(userLocation.lat - droneImage.y, 2) );
Result: Wildly incorrect distance (millions of "units").
Correct approach:
-- ✅ Transform to common coordinate system SELECT ST_Distance( ST_Transform( ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326), -- WGS84 3857 -- Transform to Web Mercator ), ST_SetSRID(ST_MakePoint(-13634876, 4545684), 3857) -- Already Web Mercator ) AS distance_meters;
Or better: Always store in one system (WGS84), transform on display only.
Timeline:
- 2005: Web Mercator (EPSG:3857) introduced by Google Maps
- 2010: Confusion peaks as apps mix WGS84 data with Web Mercator tiles
- 2024: Best practice: Store WGS84, transform to 3857 only for tile rendering
Anti-Pattern 4: Loading Huge GeoJSON Files
Problem: 50MB GeoJSON file crashes browser.
Wrong approach:
// ❌ Load entire file into memory const geoJson = await fetch('/drone-survey-data.geojson').then(r => r.json()); // 50MB of GeoJSON = browser freeze map.addSource('drone-data', { type: 'geojson', data: geoJson // All 10,000 polygons loaded at once });
Correct approach 1: Vector tiles (pre-chunked)
// ✅ Serve as vector tiles (MBTiles or PMTiles) map.addSource('drone-data', { type: 'vector', tiles: ['https://api.example.com/tiles/{z}/{x}/{y}.pbf'], minzoom: 10, maxzoom: 18 }); // Browser only loads visible tiles
Correct approach 2: GeoJSON simplification + chunking
# Simplify geometry (reduce points) npm install -g @mapbox/geojson-precision geojson-precision -p 5 input.geojson output.geojson # Split into tiles npm install -g geojson-vt # Generate tiles programmatically (see scripts/tile_generator.ts)
Correct approach 3: Server-side filtering
// ✅ Only fetch visible bounds async function fetchVisibleFeatures(bounds: Bounds): Promise<GeoJSON> { const response = await fetch( `/api/features?bbox=${bounds.west},${bounds.south},${bounds.east},${bounds.north}` ); return response.json(); } map.on('moveend', async () => { const bounds = map.getBounds(); const geojson = await fetchVisibleFeatures(bounds); map.getSource('dynamic-data').setData(geojson); });
Anti-Pattern 5: Euclidean Distance on Spherical Earth
Novice thinking: "Distance is just Pythagorean theorem"
Problem: Incorrect at scale, worse near poles.
Wrong approach:
// ❌ Flat Earth distance (wrong!) function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number { const dx = lon2 - lon1; const dy = lat2 - lat1; return Math.sqrt(dx * dx + dy * dy) * 111.32; // 111.32 km/degree (WRONG) } // Example: San Francisco to New York const distance = distanceKm(37.7749, -122.4194, 40.7128, -74.0060); // Returns: ~55 km (WRONG! Actual: ~4,130 km)
Why wrong: Earth is a sphere, not a flat plane.
Correct approach 1: Haversine formula (great circle distance)
// ✅ Haversine formula (spherical Earth) function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number { const R = 6371; // Earth radius in km const dLat = toRadians(lat2 - lat1); const dLon = toRadians(lon2 - lon1); const a = Math.sin(dLat / 2) * Math.sin(dLat / 2) + Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) * Math.sin(dLon / 2); const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a)); return R * c; } // San Francisco to New York const distance = haversineKm(37.7749, -122.4194, 40.7128, -74.0060); // Returns: ~4,130 km ✅
Correct approach 2: PostGIS (handles curvature automatically)
-- ✅ PostGIS ST_Distance with GEOGRAPHY SELECT ST_Distance( ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography, ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography ) / 1000 AS distance_km; -- Returns: 4130.137 km ✅
Accuracy comparison:
| Method | SF to NYC | Error |
|---|---|---|
| Euclidean (flat) | 55 km | 98.7% wrong |
| Haversine (sphere) | 4,130 km | ✅ Correct |
| PostGIS (ellipsoid) | 4,135 km | Most accurate |
Production Checklist
□ PostGIS extension installed and spatial indexes created □ All coordinates stored in consistent SRID (recommend: 4326) □ GeoJSON files optimized (<1MB) or served as vector tiles □ Coordinate transformations use ST_Transform, not manual math □ Distance calculations use ST_Distance with GEOGRAPHY type □ Bounding box queries use ST_MakeEnvelope + ST_Intersects □ Large geometries chunked (not >100KB per feature) □ Map tiles pre-generated for common zoom levels □ CORS configured for tile servers □ Rate limiting on geocoding/reverse geocoding endpoints
When to Use vs Avoid
| Scenario | Appropriate? |
|---|---|
| Drone imagery annotation and search | ✅ Yes - process survey data |
| GPS track visualization | ✅ Yes - optimize paths |
| Find nearest coffee shops | ✅ Yes - spatial queries |
| Jurisdiction boundary lookups | ✅ Yes - point-in-polygon |
| Simple address autocomplete | ❌ No - use Mapbox/Google |
| Embed static map on page | ❌ No - use Static API |
| Geocode single address | ❌ No - use geocoding API |
References
- EPSG codes, transformations, Web Mercator vs WGS84/references/coordinate-systems.md
- PostGIS setup, spatial indexes, common queries/references/postgis-guide.md
- Simplification, chunking, vector tiles/references/geojson-optimization.md
Scripts
- Process drone imagery, GPS tracks, GeoJSON validationscripts/geospatial_processor.ts
- Generate vector tiles (MBTiles/PMTiles) from GeoJSONscripts/tile_generator.ts
This skill guides: Geospatial data | PostGIS | GeoJSON | Map tiles | Coordinate systems | Drone data processing | Spatial queries