SciAgent-Skills uspto-database
Access US Patent and Trademark Office (USPTO) patent data via the PatentsView REST API and Google Patents Public Data (BigQuery). Use it to search patents by inventor, assignee, CPC classification, or keywords; download full patent metadata and claims; analyze patent portfolios; and track technology trends. Ideal for IP landscape analysis, competitor monitoring, prior art searches, and technology forecasting in life sciences and biotech.
git clone https://github.com/jaechang-hits/SciAgent-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/uspto-database" ~/.claude/skills/jaechang-hits-sciagent-skills-uspto-database && rm -rf "$T"
skills/scientific-computing/uspto-database/SKILL.mduspto-database
Overview
The USPTO provides two primary programmatic access points for patent data: the PatentsView API (REST, free, no key required for basic use) for structured queries by inventor, assignee, CPC classification, and keywords; and Google Patents Public Data (BigQuery public dataset) for large-scale analytics across the full patent corpus. Both expose data under the CC0 Public Domain Dedication. This skill covers Python-based access patterns for both, plus basic patent portfolio analytics.
When to Use
- Prior art search: Finding existing patents relevant to a technology before filing or to assess freedom-to-operate.
- Competitor IP landscape analysis: Querying all patents from a specific assignee (company or institution) to map their technology portfolio.
- CPC classification search: Finding patents in a specific technology area using Cooperative Patent Classification codes (e.g., C12N for nucleotides/genetic engineering).
- Inventor network analysis: Identifying prolific inventors in a field and their institutional affiliations.
- Technology trend tracking: Counting patent filings by year and technology category to identify emerging areas.
- Life sciences IP analysis: Searching biotech-specific classifications (A61K for pharmaceuticals, C12N for genetics, G16B for bioinformatics).
- For full-text patent PDF downloads, use the USPTO Bulk Data Storage System (BDSS) or Google Patents direct links.
- Rate limits: PatentsView API allows 45 requests/minute without an API key; request a free key for 45 req/min with higher daily limits.
Prerequisites
- Python packages:
,requests
,pandasmatplotlib - Optional:
for Google Patents Public Data queriesgoogle-cloud-bigquery - Data requirements: No account needed for PatentsView basic queries; Google Cloud account required for BigQuery
- Rate limits: PatentsView — 45 requests/minute (unauthenticated), higher with free API key
pip install requests pandas matplotlib pip install google-cloud-bigquery # optional: for BigQuery access
Quick Start
import requests import pandas as pd # Search PatentsView API: patents assigned to "Genentech" in CPC class C12N url = "https://api.patentsview.org/patents/query" payload = { "q": {"_and": [ {"_contains": {"assignee_organization": "Genentech"}}, {"_contains": {"cpc_subgroup_id": "C12N"}}, ]}, "f": ["patent_number", "patent_title", "patent_date", "assignee_organization"], "o": {"per_page": 25}, } resp = requests.post(url, json=payload) data = resp.json() df = pd.DataFrame(data["patents"]) print(f"Found: {data['total_patent_count']} patents") print(df[["patent_number", "patent_title", "patent_date"]].head())
Core API
Query Type 1: Search by Assignee (Company / Institution)
Find all patents granted to a specific organization.
import requests import pandas as pd def search_by_assignee(assignee_name: str, per_page: int = 100) -> pd.DataFrame: url = "https://api.patentsview.org/patents/query" payload = { "q": {"_contains": {"assignee_organization": assignee_name}}, "f": [ "patent_number", "patent_title", "patent_date", "patent_abstract", "assignee_organization", "assignee_country", ], "o": {"per_page": per_page, "sort": [{"patent_date": "desc"}]}, } resp = requests.post(url, json=payload) resp.raise_for_status() data = resp.json() df = pd.DataFrame(data.get("patents", [])) print(f"Assignee '{assignee_name}': {data.get('total_patent_count', 0)} total patents") return df # Example: patents from Broad Institute df_broad = search_by_assignee("Broad Institute") print(df_broad[["patent_number", "patent_title", "patent_date"]].head(10))
# Paginate through all results for large portfolios def search_assignee_all_pages(assignee_name: str, page_size: int = 100) -> pd.DataFrame: url = "https://api.patentsview.org/patents/query" all_patents = [] page = 1 while True: payload = { "q": {"_contains": {"assignee_organization": assignee_name}}, "f": ["patent_number", "patent_title", "patent_date", "cpc_subgroup_id"], "o": {"per_page": page_size, "page": page}, } resp = requests.post(url, json=payload) data = resp.json() patents = data.get("patents", []) if not patents: break all_patents.extend(patents) total = data.get("total_patent_count", 0) if len(all_patents) >= total: break page += 1 df = pd.DataFrame(all_patents) print(f"Retrieved {len(df)} patents for '{assignee_name}'") return df
Query Type 2: Search by CPC Classification
CPC (Cooperative Patent Classification) codes organize patents by technology. Life sciences codes include C12N (nucleotides/genetics), A61K (pharmaceuticals), and G16B (bioinformatics).
import requests import pandas as pd # Search by CPC subgroup: C12N15 (mutation/genetic engineering) url = "https://api.patentsview.org/patents/query" payload = { "q": {"_begins": {"cpc_subgroup_id": "C12N15"}}, "f": [ "patent_number", "patent_title", "patent_date", "assignee_organization", "cpc_subgroup_id", ], "o": {"per_page": 50, "sort": [{"patent_date": "desc"}]}, } resp = requests.post(url, json=payload) data = resp.json() df = pd.DataFrame(data["patents"]) print(f"C12N15 patents: {data['total_patent_count']}") print(df[["patent_number", "patent_title", "assignee_organization"]].head(10))
# Common life sciences CPC codes CPC_LIFE_SCIENCES = { "C12N": "Microorganisms / enzymes / compositions", "C12N15": "Mutation / genetic engineering", "C12Q": "Measuring / testing involving enzymes or microorganisms", "A61K": "Preparations for medical use", "A61P": "Therapeutic activity of chemical compounds", "G16B": "Bioinformatics", "G16H": "Healthcare informatics", "C07K": "Peptides / proteins", } for code, desc in CPC_LIFE_SCIENCES.items(): print(f" {code:10s}: {desc}")
Query Type 3: Full-Text Keyword Search
Search patent titles and abstracts for specific terms.
import requests import pandas as pd def keyword_search(keyword: str, per_page: int = 50) -> pd.DataFrame: url = "https://api.patentsview.org/patents/query" payload = { "q": {"_or": [ {"_text_any": {"patent_title": keyword}}, {"_text_any": {"patent_abstract": keyword}}, ]}, "f": [ "patent_number", "patent_title", "patent_date", "patent_abstract", "assignee_organization", ], "o": {"per_page": per_page, "sort": [{"patent_date": "desc"}]}, } resp = requests.post(url, json=payload) resp.raise_for_status() data = resp.json() df = pd.DataFrame(data.get("patents", [])) print(f"Keyword '{keyword}': {data.get('total_patent_count', 0)} patents found") return df # Search for CRISPR-related patents df_crispr = keyword_search("CRISPR") print(df_crispr[["patent_number", "patent_title", "patent_date"]].head(10))
Query Type 4: Inventor Search
Find patents by inventor name or retrieve an inventor's full publication history.
import requests import pandas as pd # Search by inventor name url = "https://api.patentsview.org/inventors/query" payload = { "q": {"_and": [ {"inventor_last_name": "Doudna"}, {"inventor_first_name": "Jennifer"}, ]}, "f": ["inventor_id", "inventor_first_name", "inventor_last_name", "inventor_city", "inventor_state", "inventor_country"], "o": {"per_page": 10}, } resp = requests.post(url, json=payload) data = resp.json() print(f"Found {data.get('total_inventor_count', 0)} inventors matching 'Jennifer Doudna'") for inv in data.get("inventors", []): print(f" ID: {inv['inventor_id']}, Location: {inv.get('inventor_city')}, {inv.get('inventor_country')}")
# Get all patents for a specific inventor by inventor_id inventor_id = "fl:j_ln:doudna-1" # PatentsView inventor ID format url = "https://api.patentsview.org/patents/query" payload = { "q": {"inventor_id": inventor_id}, "f": ["patent_number", "patent_title", "patent_date", "assignee_organization"], "o": {"per_page": 100, "sort": [{"patent_date": "desc"}]}, } resp = requests.post(url, json=payload) data = resp.json() df = pd.DataFrame(data.get("patents", [])) print(f"Patents for inventor {inventor_id}: {data.get('total_patent_count', 0)}") print(df.head(5))
Query Type 5: Date Range and Combined Filters
Combine multiple filters for targeted searches.
import requests import pandas as pd # Patents in gene therapy (CPC A61K48) filed 2020-2024 by a US assignee url = "https://api.patentsview.org/patents/query" payload = { "q": {"_and": [ {"_begins": {"cpc_subgroup_id": "A61K48"}}, {"_gte": {"patent_date": "2020-01-01"}}, {"_lte": {"patent_date": "2024-12-31"}}, {"_eq": {"assignee_country": "US"}}, ]}, "f": [ "patent_number", "patent_title", "patent_date", "assignee_organization", "patent_num_claims", ], "o": {"per_page": 100, "sort": [{"patent_date": "desc"}]}, } resp = requests.post(url, json=payload) data = resp.json() df = pd.DataFrame(data.get("patents", [])) print(f"Gene therapy patents 2020-2024 (US assignee): {data.get('total_patent_count', 0)}") print(df[["patent_number", "patent_title", "patent_date", "assignee_organization"]].head(10))
Query Type 6: Google Patents BigQuery
For large-scale corpus analytics, use the public Google Patents dataset in BigQuery.
from google.cloud import bigquery client = bigquery.Client(project="YOUR_GCP_PROJECT") # Count CRISPR patents by year (Google Patents public data) query = """ SELECT EXTRACT(YEAR FROM filing_date) AS filing_year, COUNT(*) AS patent_count, COUNT(DISTINCT assignee) AS unique_assignees FROM `patents-public-data.patents.publications` WHERE (LOWER(title_localized[SAFE_OFFSET(0)].text) LIKE '%crispr%' OR LOWER(abstract_localized[SAFE_OFFSET(0)].text) LIKE '%crispr%') AND filing_date >= '2010-01-01' AND country_code = 'US' GROUP BY filing_year ORDER BY filing_year """ df_bq = client.query(query).to_dataframe() print(df_bq) print(f"Peak year: {df_bq.loc[df_bq.patent_count.idxmax(), 'filing_year']} " f"({df_bq.patent_count.max()} patents)")
Key Parameters
| Parameter | Module | Default | Range / Options | Effect |
|---|---|---|---|---|
| PatentsView | | – | Results per API call |
| PatentsView | | –max pages | Page number for pagination |
| PatentsView | API default | any field + / | Sort order of results |
fields | PatentsView | minimal | any valid field list | Fields returned in response (controls payload size) |
| query operator | — | field + prefix string | Prefix match (e.g., CPC code prefix) |
| query operator | — | field + substring | Substring search (case-insensitive) |
| query operator | — | field + keywords | Full-text search on title/abstract fields |
Best Practices
-
Request only the fields you need: The
(fields) parameter controls what is returned. Requesting"f"
for thousands of patents significantly increases payload size and latency.patent_abstract -
Always handle pagination for large result sets: PatentsView caps responses at 10,000 per page maximum. For queries returning >10,000 results, use date-range slicing or narrower CPC codes to split the query.
-
Cache API responses to disk: PatentsView is rate-limited; if building a dataset iteratively, save responses to JSON/CSV after each API call.
import json, pathlib cache = pathlib.Path("cache") cache.mkdir(exist_ok=True) cache_file = cache / "genentech_patents.json" if not cache_file.exists(): resp = requests.post(url, json=payload) cache_file.write_text(resp.text) data = json.loads(cache_file.read_text()) -
Use CPC codes for technology-specific searches, not just keywords: Keywords miss synonyms and foreign-language patents; CPC codes are assigned by patent examiners and are more systematic.
-
Validate assignee names: Company names in patent records vary (e.g., "Genentech Inc.", "Genentech, Inc.", "GENENTECH INC"). Use
for fuzzy matching, then deduplicate in pandas._contains
Common Workflows
Workflow 1: Technology Landscape Analysis — Filing Trends by Year
Goal: Count patents filed in a CPC class by year and plot the trend.
import requests import pandas as pd import matplotlib.pyplot as plt from collections import defaultdict def count_patents_by_year(cpc_prefix: str, start_year: int = 2010) -> pd.DataFrame: url = "https://api.patentsview.org/patents/query" counts = defaultdict(int) page = 1 while True: payload = { "q": {"_and": [ {"_begins": {"cpc_subgroup_id": cpc_prefix}}, {"_gte": {"patent_date": f"{start_year}-01-01"}}, ]}, "f": ["patent_number", "patent_date"], "o": {"per_page": 10000, "page": page}, } resp = requests.post(url, json=payload) patents = resp.json().get("patents", []) if not patents: break for p in patents: year = p["patent_date"][:4] counts[year] += 1 total = resp.json().get("total_patent_count", 0) if sum(counts.values()) >= total: break page += 1 df = pd.DataFrame(sorted(counts.items()), columns=["year", "count"]) return df df_trend = count_patents_by_year("C12N15", start_year=2010) fig, ax = plt.subplots(figsize=(8, 4)) ax.bar(df_trend["year"], df_trend["count"], color="steelblue", edgecolor="white") ax.set_xlabel("Year") ax.set_ylabel("Patents granted") ax.set_title("US Patents: C12N15 (Genetic Engineering) by Year") plt.xticks(rotation=45) plt.tight_layout() plt.savefig("cpc_trend.png", dpi=150) print(f"Trend plotted: {df_trend['count'].sum()} total patents -> cpc_trend.png")
Workflow 2: Assignee Portfolio Comparison
Goal: Compare patent counts across multiple biotech companies in a target CPC class.
import requests import pandas as pd import matplotlib.pyplot as plt def count_patents_by_assignee(assignees: list, cpc_prefix: str) -> pd.DataFrame: url = "https://api.patentsview.org/patents/query" records = [] for assignee in assignees: payload = { "q": {"_and": [ {"_contains": {"assignee_organization": assignee}}, {"_begins": {"cpc_subgroup_id": cpc_prefix}}, ]}, "f": ["patent_number"], "o": {"per_page": 1}, # only need total count } resp = requests.post(url, json=payload) total = resp.json().get("total_patent_count", 0) records.append({"assignee": assignee, "patent_count": total}) print(f" {assignee}: {total} patents") df = pd.DataFrame(records).sort_values("patent_count", ascending=True) return df companies = ["Genentech", "Amgen", "Regeneron", "AstraZeneca", "Novartis"] df_comp = count_patents_by_assignee(companies, cpc_prefix="A61K") fig, ax = plt.subplots(figsize=(7, 4)) ax.barh(df_comp["assignee"], df_comp["patent_count"], color="salmon") ax.set_xlabel("Patent count (A61K)") ax.set_title("Pharmaceutical Patents by Assignee (CPC A61K)") plt.tight_layout() plt.savefig("assignee_comparison.png", dpi=150) print("Comparison chart saved -> assignee_comparison.png")
Expected Outputs
with patent records (columns depend on requestedpd.DataFrame
fields)"f"
— bar chart of patent counts by yearcpc_trend.png
— horizontal bar chart comparing companiesassignee_comparison.png
in API response gives the full corpus size for a querytotal_patent_count
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Exceeded 45 req/min rate limit | Add between requests; request a free API key |
Empty list in response | Query too narrow or field name incorrect | Check field names in PatentsView API docs; test query in the web UI first |
| Results miss known patents | Exact string matching on assignee name | Use instead of ; check for name variants |
| Field not requested in list | Add to the array |
| BigQuery auth error | GCP credentials not configured | Run or set |
| CPC prefix returns no results | Invalid CPC code or typo | Verify code at CPC classification browser |
References
- PatentsView API Documentation — full field list and query operator reference
- PatentsView API Explorer — browser-based query builder for testing
- Google Patents Public Data (BigQuery) — schema documentation
- CPC Classification Browser — browse life sciences CPC codes
- USPTO Bulk Data Storage System — full-text patent XML downloads