T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/asi/skills/detecting-typosquatting-packages-in-npm-pypi" ~/.claude/skills/plurigrid-asi-detecting-typosquatting-packages-in-npm-pypi && rm -rf "$T"
Detecting Typosquatting Packages in npm and PyPI
When to Use
- Auditing project dependencies to identify packages whose names are suspiciously similar to popular libraries
- Proactively scanning package registries for newly published packages that may be typosquats of your organization's packages
- Investigating a suspected supply chain compromise where a developer installed a misspelled package name
- Building automated monitoring that alerts when new packages appear with names close to critical dependencies
- Assessing the risk profile of unfamiliar packages before adding them to a project's dependency tree
Do not use as the sole determination of malicious intent; name similarity alone does not prove a package is malicious. Do not use for bulk automated takedown requests without manual review of flagged packages. Do not use against private registries without authorization.
Prerequisites
- Python 3.9+ with `requests` and `python-Levenshtein` (or `rapidfuzz`) packages installed
- Network access to `https://pypi.org/pypi/<package>/json` (PyPI JSON API) and `https://registry.npmjs.org/<package>` (npm registry API)
- A list of popular or critical packages to monitor (e.g., top 1000 PyPI packages, organization's dependency list)
- Understanding of common typosquatting patterns: character omission, transposition, insertion, substitution, and hyphen/underscore manipulation
Workflow
Step 1: Build the Target Package Watchlist
Establish the set of legitimate packages to monitor for typosquats:
- Extract project dependencies: Parse `requirements.txt`, `Pipfile.lock`, `package.json`, or `package-lock.json` to extract all direct and transitive dependency names
- Include popular packages: Supplement with high-value targets from the top 1000 PyPI downloads (available from `https://hugovk.github.io/top-pypi-packages/`) or top npm packages by download count
- Add organization packages: Include any packages published by your organization that attackers might target with typosquats to intercept internal installations
- Normalize names: PyPI treats hyphens, underscores, and periods as equivalent (PEP 503 normalization: `re.sub(r"[-_.]+", "-", name).lower()`). New npm package names must be lowercase (some legacy packages still contain capitals), and scoped packages use the `@scope/name` format. Normalize before comparison.
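The normalization rules above can be sketched in a few lines of Python. `normalize_pypi` applies the PEP 503 regex quoted in Step 1; `split_npm_name` is a hypothetical helper that assumes you compare the bare npm name (without its scope) against your watchlist, so that a name like `@evil/lodash` still gets matched against `lodash`.

```python
import re
from typing import Optional

# PEP 503: collapse any run of "-", "_" or "." into a single "-", then lowercase.
_SEPARATOR_RUN = re.compile(r"[-_.]+")


def normalize_pypi(name: str) -> str:
    """Return the PEP 503 normalized form of a PyPI project name."""
    return _SEPARATOR_RUN.sub("-", name).lower()


def split_npm_name(name: str) -> tuple[Optional[str], str]:
    """Split an npm name into (scope, bare_name); scope is None when unscoped.

    Hypothetical helper: comparing bare names lets a scoped package such as
    "@evil/lodash" be checked against the unscoped target "lodash".
    """
    if name.startswith("@") and "/" in name:
        scope, bare = name[1:].split("/", 1)
        return scope, bare
    return None, name


# Build a normalized PyPI watchlist from raw dependency names.
print(sorted(normalize_pypi(n) for n in ["Requests", "python_dateutil", "zope.interface"]))
# ['python-dateutil', 'requests', 'zope-interface']
```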
Step 2: Generate Candidate Typosquat Names
Produce potential typosquat variants for each target package:
- Character omission: Remove each character one at a time (`requests` -> `rquests`, `requets`, `reqests`)
- Character transposition: Swap adjacent characters (`requests` -> `erquests`, `rqeuests`, `requsets`)
- Character substitution: Replace characters with keyboard-adjacent keys using a QWERTY distance map (`requests` -> `rrquests`, `requesta`)
- Character insertion: Insert common characters at each position (`requests` -> `rrequests`, `reqquests`)
- Separator manipulation: For hyphenated names, try removing, doubling, or replacing separators (`my-package` -> `mypackage`, `my--package`, `my_package`)
- Common prefix/suffix attacks: Prepend or append common strings (`python-requests`, `requests-python`, `requests2`, `requests-lib`)
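The six generation strategies above can be sketched as a single function. The QWERTY adjacency map here is built from three staggered letter rows only (no digits or punctuation). The insertion alphabet (lowercase letters and digits) and the affix lists are assumptions to tune for your ecosystem, and `generate_candidates` is a hypothetical name.

```python
import re
import string

# Assumed QWERTY layout, letters only; each row sits half a key right of the row above.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Hypothetical affix lists for combosquat variants; extend as needed.
PREFIXES = ["python-", "py-"]
SUFFIXES = ["-python", "2", "-lib"]

# "Common characters" for insertion: assumed to be lowercase letters and digits.
INSERT_CHARS = string.ascii_lowercase + string.digits
SEPARATORS = "-_."


def build_qwerty_neighbors() -> dict[str, set[str]]:
    """Map each letter to the keys physically touching it on a QWERTY keyboard."""
    neighbors: dict[str, set[str]] = {}
    for r, row in enumerate(QWERTY_ROWS):
        for c, key in enumerate(row):
            adjacent = set()
            # Touching keys: same row (c-1, c+1), row above (c, c+1), row below (c-1, c).
            spots = [(r, (c - 1, c + 1)), (r - 1, (c, c + 1)), (r + 1, (c - 1, c))]
            for rr, cols in spots:
                if 0 <= rr < len(QWERTY_ROWS):
                    for cc in cols:
                        if 0 <= cc < len(QWERTY_ROWS[rr]):
                            adjacent.add(QWERTY_ROWS[rr][cc])
            neighbors[key] = adjacent
    return neighbors


NEIGHBORS = build_qwerty_neighbors()


def generate_candidates(name: str) -> set[str]:
    """Generate typosquat candidates for one (already normalized) package name."""
    out: set[str] = set()
    n = len(name)
    for i in range(n):  # character omission
        out.add(name[:i] + name[i + 1:])
    for i in range(n - 1):  # adjacent transposition
        out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    for i, ch in enumerate(name):  # keyboard-adjacent substitution
        for sub in NEIGHBORS.get(ch, ()):
            out.add(name[:i] + sub + name[i + 1:])
    for i in range(n + 1):  # character insertion at every position
        for ch in INSERT_CHARS:
            out.add(name[:i] + ch + name[i:])
    parts = re.split(r"[-_.]", name)
    if len(parts) > 1:  # separator removal, replacement, and doubling
        out.add("".join(parts))
        for sep in SEPARATORS:
            out.add(sep.join(parts))
            out.add((sep * 2).join(parts))
    for prefix in PREFIXES:  # prefix/suffix (combosquat) variants
        out.add(prefix + name)
    for suffix in SUFFIXES:
        out.add(name + suffix)
    out.discard(name)  # the legitimate name itself is not a candidate
    out.discard("")
    return out
```

On PyPI, separator-only variants such as `my_package` normalize to the same project as `my-package`, so they matter mainly for npm. You can drop them for PyPI scans after applying PEP 503 normalization.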
Step 3: Query Registry APIs for Candidate Packages
Check whether generated candidate names actually exist in the registry:
- PyPI JSON API: Send `GET https://pypi.org/pypi/<candidate>/json` for each candidate. A `200` response means the package exists; `404` means it does not. Extract from the response: `info.name`, `info.version`, `info.author`, `info.summary`, `info.home_page`, `info.project_urls`, and `releases` (keyed by version, with `upload_time_iso_8601` timestamps on each file).
- npm registry API: Send `GET https://registry.npmjs.org/<candidate>` with `Accept: application/json`. Extract: `name`, `description`, `dist-tags.latest`, `time.created`, `time.modified`, `maintainers`, and `versions`.
- Rate limiting: PyPI has no published rate limits, but keep request rates reasonable (1-2 requests/second). The npm registry returns `429` when rate limited; implement exponential backoff.
- Batch optimization: For large candidate lists, parallelize requests with connection pooling (`requests.Session`) and limit concurrency to avoid triggering abuse protections.
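The registry checks above can be sketched with the standard library's `urllib` standing in for `requests`, so the example has no third-party dependencies. A 404 is treated as "not registered" and a 429 triggers exponential backoff. The `summarize_*` helpers keep the Step 3 fields. All function names are hypothetical. Encoding the scope slash of npm names as `%2F` follows what npm clients do; treat it as an assumption if your proxy behaves differently.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Optional

PYPI_URL = "https://pypi.org/pypi/{name}/json"
NPM_URL = "https://registry.npmjs.org/{name}"


def fetch_json(url: str, retries: int = 4) -> Optional[dict]:
    """GET a JSON document; None on 404, exponential backoff on 429."""
    delay = 1.0
    for attempt in range(retries + 1):
        request = urllib.request.Request(url, headers={"Accept": "application/json"})
        try:
            with urllib.request.urlopen(request, timeout=10) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # the candidate name is not registered
            if err.code == 429 and attempt < retries:
                time.sleep(delay)  # rate limited: wait 1s, 2s, 4s, ...
                delay *= 2
                continue
            raise
    return None


def summarize_pypi(doc: dict) -> dict:
    """Keep the Step 3 fields from a PyPI JSON API response."""
    info = doc["info"]
    releases = doc.get("releases") or {}
    uploads = [f["upload_time_iso_8601"] for files in releases.values() for f in files]
    return {
        "name": info["name"],
        "version": info["version"],
        "author": info.get("author"),
        "summary": info.get("summary"),
        "home_page": info.get("home_page"),
        "project_urls": info.get("project_urls") or {},
        "version_count": len(releases),
        "first_upload": min(uploads) if uploads else None,  # ISO strings sort by time
    }


def summarize_npm(doc: dict) -> dict:
    """Keep the Step 3 fields from an npm registry package document."""
    times = doc.get("time") or {}
    return {
        "name": doc["name"],
        "description": doc.get("description"),
        "latest": (doc.get("dist-tags") or {}).get("latest"),
        "created": times.get("created"),
        "modified": times.get("modified"),
        "maintainers": [m.get("name") for m in doc.get("maintainers") or []],
        "version_count": len(doc.get("versions") or {}),
    }


def lookup_pypi(candidate: str) -> Optional[dict]:
    doc = fetch_json(PYPI_URL.format(name=candidate))
    return summarize_pypi(doc) if doc else None


def lookup_npm(candidate: str) -> Optional[dict]:
    # Scoped names are sent as "@scope%2Fname" (slash percent-encoded).
    doc = fetch_json(NPM_URL.format(name=urllib.parse.quote(candidate, safe="@")))
    return summarize_npm(doc) if doc else None


def scan(candidates: Iterable[str], lookup: Callable[[str], Optional[dict]],
         pause: float = 0.5) -> dict:
    """Query candidates one at a time, pausing to stay near 2 requests/second."""
    found = {}
    for name in candidates:
        meta = lookup(name)
        if meta is not None:
            found[name] = meta
        time.sleep(pause)
    return found
```

For example, `scan(generate_candidates("requests"), lookup_pypi)` would return only the candidates that exist on PyPI. The sketch stays sequential so the pause enforces the 1-2 requests/second guideline. Moving to a bounded thread pool with a shared `requests.Session` is the natural next step for large lists.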
Step 4: Analyze Package Metadata for Suspicion Signals
Score each existing candidate package against multiple heuristic signals:
- Levenshtein distance: Calculate the edit distance between the candidate name and the target. Packages with distance 1-2 from a popular package are high-priority suspects. Plain Levenshtein counts an adjacent transposition (`reqeusts`) as two edits; Damerau-Levenshtein counts it as one. Historical analysis shows 18 of 40 known typosquats had Levenshtein distance of 2 or less from their targets.
- Publish date recency: Compare the candidate's first publish date against the target's. A package created years after its near-namesake is more suspicious. Flag packages created within the last 90 days that are similar to packages published years ago.
- Download count disparity: Compare weekly downloads. Legitimate similarly-named packages typically have comparable or explainable download counts. A package with 50 downloads versus its near-namesake with 5 million downloads is suspicious. PyPI publishes raw download data to Google BigQuery, and the pypistats.org API (`https://pypistats.org/api/`) serves aggregated counts; npm provides download counts at `https://api.npmjs.org/downloads/point/last-week/<package>`.
- Author and maintainer analysis: Check if the candidate package author matches the legitimate package author. Different authors for near-identical names increase suspicion.
- Description similarity: Compare package descriptions. Typosquats frequently copy or closely paraphrase the target package description to appear legitimate.
- Version count: Legitimate packages typically have many versions over time. A package with only 1-2 versions and a name similar to a popular package is suspicious.
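- Repository URL analysis: Check whether the candidate links to the same repository as the target or has no repository URL at all (suspicious). A shared repository URL can indicate a legitimate fork or mirror, but when the authors differ it is a sign of StarJacking.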
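A dependency-free sketch of the name-distance and metadata signals follows. Both distances are written as pure-Python dynamic programs for clarity. `osa_distance` is the optimal-string-alignment (restricted Damerau-Levenshtein) variant, in which an adjacent transposition costs one edit. `rapidfuzz.distance.Levenshtein` and `rapidfuzz.distance.OSA` provide faster equivalents. Using `difflib.SequenceMatcher` for description similarity is an assumed stand-in, and the helper names are hypothetical.

```python
import difflib
from datetime import datetime, timezone


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transposition at cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap of two neighbours
    return d[-1][-1]


def age_in_days(created_iso: str, now: datetime) -> int:
    """Days since first publish; handles the 'Z'-suffixed timestamps both registries return."""
    created = datetime.fromisoformat(created_iso.replace("Z", "+00:00"))
    return (now - created).days


def download_ratio(suspect_downloads: int, target_downloads: int) -> float:
    """Suspect/target weekly downloads; very small values mean a lopsided pair."""
    return suspect_downloads / target_downloads if target_downloads else float("inf")


def description_similarity(a: str, b: str) -> float:
    """Similarity of two descriptions in [0, 1] (difflib is an assumed stand-in)."""
    return difflib.SequenceMatcher(None, (a or "").lower(), (b or "").lower()).ratio()
```

With these helpers, `levenshtein("reqeusts", "requests")` is 2 while `osa_distance` gives 1. The 50-versus-5-million example from the download disparity check yields a ratio of `1e-05`.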
Step 5: Score, Rank, and Report Findings
Combine signals into a composite risk score and generate an actionable report:
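- Weighted scoring: Assign weights to each signal. Example: Levenshtein distance 1 = 40 points, Levenshtein distance 2 = 25 points, created < 90 days ago = 15 points, download ratio < 0.001 = 15 points, different author = 10 points, single version = 5 points. Because the two distance weights are mutually exclusive, these six signals add up to at most 85; assign the remaining 15 points to the other Step 4 signals (description similarity, repository URL) so the total is out of 100.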
- Threshold classification: Score >= 70: HIGH risk (likely typosquat), 40-69: MEDIUM risk (requires manual review), < 40: LOW risk (likely legitimate)
- Generate report: For each flagged package, include the target it mimics, all signal values, the composite score, direct links to both packages on the registry, and a recommendation (block, investigate, or allow)
- Actionable output: Produce a blocklist of flagged package names that can be imported into package manager deny-lists, CI/CD policy engines, or artifact repository proxy rules
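The Step 5 example weights and thresholds can be sketched as plain Python. The `Signals` fields, the function names, and the mapping of HIGH/MEDIUM/LOW to block/investigate/allow are assumptions. Only the six weighted signals are scored, so this sketch tops out at 85 points.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Per-candidate signal values from Step 4 (field names are hypothetical)."""
    distance: int          # edit distance between candidate and target names
    age_days: int          # days since the candidate was first published
    download_ratio: float  # candidate weekly downloads / target weekly downloads
    same_author: bool      # author or maintainers match the target's
    version_count: int     # number of published versions


def risk_score(s: Signals) -> int:
    """Apply the Step 5 example weights (at most 85 points as written)."""
    points = 0
    if s.distance == 1:
        points += 40
    elif s.distance == 2:
        points += 25
    if s.age_days < 90:
        points += 15
    if s.download_ratio < 0.001:
        points += 15
    if not s.same_author:
        points += 10
    if s.version_count == 1:
        points += 5
    return points


def classify(points: int) -> tuple[str, str]:
    """Map a score to (risk level, recommendation) using the Step 5 thresholds."""
    if points >= 70:
        return "HIGH", "block"
    if points >= 40:
        return "MEDIUM", "investigate"
    return "LOW", "allow"
```

A brand-new, single-version package at distance 1 from a popular target, with a different author and a tiny download ratio, scores 40 + 15 + 15 + 10 + 5 = 85 and classifies as HIGH.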
Key Concepts
| Term | Definition |
|---|---|
| Typosquatting | Registering a package name that closely resembles a popular package, exploiting common typos to trick developers into installing malicious code |
| Levenshtein Distance | The minimum number of single-character edits (insertions, deletions, substitutions) required to transform one string into another; the primary metric for measuring name similarity |
| Dependency Confusion | A broader supply chain attack where attackers publish malicious packages to public registries with names matching private internal packages, exploiting package manager resolution order |
| PEP 503 Normalization | The Python packaging specification that treats hyphens, underscores, and periods as equivalent in package names and compares names case-insensitively, meaning `my-package`, `my_package`, and `My.Package` resolve to the same package |
| QWERTY Distance | A keyboard-layout-aware distance metric measuring how far apart two keys are on a standard keyboard, used to detect substitutions from adjacent key mistyping |
| Combosquatting | A variant of typosquatting where attackers prepend or append common words to a package name (e.g., `python-requests`, `requests-lib`) |
| StarJacking | An attack where a typosquat package links its repository URL to the legitimate package's GitHub repository to inflate apparent credibility |
Tools & Systems
- PyPI JSON API: REST API at `https://pypi.org/pypi/<package>/json` returning package metadata including name, author, versions, upload timestamps, and project URLs
- npm Registry API: REST API at `https://registry.npmjs.org/<package>` returning package metadata including maintainers, version history, creation timestamps, and distribution info
- python-Levenshtein / rapidfuzz: Python libraries for fast string distance computation, supporting Levenshtein, Damerau-Levenshtein, Jaro-Winkler, and other similarity metrics
- pypistats.org API: Provides download statistics for PyPI packages, enabling download count comparison between suspected typosquats and their targets
- npm download counts API: Endpoint at `https://api.npmjs.org/downloads/point/<period>/<package>` providing download statistics for npm packages
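The two download-count services listed above can be wrapped as follows. This sketch assumes the pypistats `recent` endpoint (`https://pypistats.org/api/packages/<package>/recent`) and the response shapes noted in the comments. It uses the standard library's `urllib` instead of `requests`, and the function names are hypothetical. Parsing is kept separate from fetching so the ratio logic can be tested offline.

```python
import json
import urllib.request

NPM_POINT_URL = "https://api.npmjs.org/downloads/point/{period}/{package}"
PYPISTATS_RECENT_URL = "https://pypistats.org/api/packages/{package}/recent"


def get_json(url: str) -> dict:
    """Fetch and decode a JSON document."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def parse_npm_point(doc: dict) -> int:
    """Assumed shape: {"downloads": N, "start": ..., "end": ..., "package": ...}."""
    return int(doc["downloads"])


def parse_pypistats_recent(doc: dict, period: str = "last_week") -> int:
    """Assumed shape: {"data": {"last_day": N, "last_week": N, "last_month": N}, ...}."""
    return int(doc["data"][period])


def npm_weekly_downloads(package: str) -> int:
    return parse_npm_point(get_json(NPM_POINT_URL.format(period="last-week", package=package)))


def pypi_weekly_downloads(package: str) -> int:
    return parse_pypistats_recent(get_json(PYPISTATS_RECENT_URL.format(package=package)))
```

Both weekly figures can then be divided (suspect over target) to get the download ratio used in scoring.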
Common Scenarios
Scenario: Auditing a Python Project for Typosquatted Dependencies
Context: A security team discovers that a developer's workstation was compromised after installing a Python package. The incident response team needs to audit all project dependencies for potential typosquats and establish ongoing monitoring.
Approach:
- Parse `requirements.txt` and `Pipfile.lock` to extract all 87 direct and transitive dependencies
- Generate typosquat candidates for each dependency using character omission, transposition, substitution, and separator manipulation, producing approximately 2,400 candidate names
- Query the PyPI JSON API for each candidate, finding 34 that actually exist as published packages
- Score each existing candidate: 3 packages score above 70 (HIGH risk) with Levenshtein distance 1, created within the last 60 days, single version, and fewer than 100 downloads
- Manual review confirms 2 of the 3 are malicious typosquats containing obfuscated code that exfiltrates environment variables during installation
- Block the malicious packages in the organization's artifact proxy, report them to PyPI for takedown via `security@pypi.org`, and add all 87 dependencies to the ongoing monitoring watchlist
- Implement the detection agent as a scheduled CI job that runs weekly and alerts on new HIGH-risk findings
Pitfalls:
- Not normalizing PyPI package names per PEP 503 before comparison, causing missed matches between hyphenated and underscored variants
- Setting the Levenshtein distance threshold too low (only 1) and missing typosquats at distance 2, such as double substitutions or adjacent transpositions, which plain Levenshtein counts as two edits
- Relying solely on name similarity without checking metadata signals, leading to high false positive rates on legitimately similar package names
- Not accounting for npm scoped packages (`@scope/name`), which have different naming rules than unscoped packages
- Querying the registries too aggressively and getting rate-limited or IP-blocked
Output Format
## Typosquatting Detection Report

**Scan Date**: 2026-03-19
**Registry**: PyPI
**Packages Monitored**: 87
**Candidates Generated**: 2,412
**Candidates Found in Registry**: 34
**Flagged as Suspicious**: 5

### HIGH Risk (Score >= 70)

| Suspect Package | Target Package | Edit Distance (Damerau-Levenshtein) | Created | Downloads | Score |
|----------------|---------------|-------------|---------|-----------|-------|
| reqeusts | requests | 1 | 2026-02-28 | 43 | 92 |
| requsets | requests | 1 | 2026-03-01 | 12 | 88 |
| numpyy | numpy | 1 | 2026-01-15 | 67 | 78 |

### Recommendation

- BLOCK: reqeusts, requsets, numpyy (add to artifact proxy deny-list)
- REPORT: Submit malware reports to security@pypi.org with package names and evidence
- MONITOR: Continue weekly scans for the full dependency watchlist
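A report in roughly this format, plus the deny-list from Step 5, could be produced by a small renderer like the sketch below. The `Finding` fields and function names are hypothetical. "Flagged as Suspicious" is assumed to count HIGH and MEDIUM findings (score >= 40), and only HIGH findings are tabulated and blocked.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    suspect: str
    target: str
    distance: int   # edit distance shown in the table
    created: str    # first-publish date, YYYY-MM-DD
    downloads: int
    score: int


def render_report(findings: list[Finding], scan_date: str, registry: str) -> str:
    """Render findings as a markdown typosquatting report."""
    high = sorted((x for x in findings if x.score >= 70), key=lambda x: x.score, reverse=True)
    flagged = sum(1 for x in findings if x.score >= 40)  # HIGH + MEDIUM
    lines = ["## Typosquatting Detection Report", ""]
    for label, value in [("Scan Date", scan_date), ("Registry", registry),
                         ("Flagged as Suspicious", flagged)]:
        lines += [f"**{label}**: {value}", ""]
    lines += [
        "### HIGH Risk (Score >= 70)",
        "",
        "| Suspect Package | Target Package | Edit Distance | Created | Downloads | Score |",
        "|---|---|---|---|---|---|",
    ]
    for x in high:
        lines.append(f"| {x.suspect} | {x.target} | {x.distance} | {x.created} | {x.downloads} | {x.score} |")
    lines += ["", "### Recommendation", "", "- BLOCK: " + ", ".join(x.suspect for x in high)]
    return "\n".join(lines)


def blocklist(findings: list[Finding]) -> list[str]:
    """Deny-list entries (HIGH risk only) for proxies or CI policy engines."""
    return sorted({x.suspect for x in findings if x.score >= 70})
```

Writing `"\n".join(blocklist(findings))` to a file gives a one-name-per-line list that most artifact proxies and policy engines can import.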