Claude-skill-registry bio-local-blast
Run local BLAST searches using BLAST+ command-line tools. Use when running fast unlimited searches, building custom databases, performing large-scale analysis, or when NCBI servers are slow or unavailable.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/local-blast" ~/.claude/skills/majiayu000-claude-skill-registry-bio-local-blast && rm -rf "$T"
manifest: skills/data/local-blast/SKILL.md
Local BLAST
Run BLAST searches locally using NCBI BLAST+ command-line tools.
Installation
```bash
# macOS
brew install blast

# Ubuntu/Debian
sudo apt install ncbi-blast+

# conda
conda install -c bioconda blast

# Verify installation
blastn -version
```
BLAST+ Programs
| Command | Query | Database | Description |
|---|---|---|---|
| `blastn` | DNA | DNA | Nucleotide-nucleotide |
| `blastp` | Protein | Protein | Protein-protein |
| `blastx` | DNA | Protein | Translated query vs protein |
| `tblastn` | Protein | DNA | Protein vs translated DB |
| `tblastx` | DNA | DNA | Translated vs translated |
| `makeblastdb` | - | - | Create BLAST database |
Creating BLAST Databases
makeblastdb - Create Database
```bash
# Create nucleotide database
makeblastdb -in sequences.fasta -dbtype nucl -out my_db

# Create protein database
makeblastdb -in proteins.fasta -dbtype prot -out my_proteins

# With title and parse sequence IDs
makeblastdb -in sequences.fasta -dbtype nucl -out my_db \
    -title "My Reference Database" -parse_seqids
```
Key Options:
| Option | Description | Values |
|---|---|---|
| `-in` | Input FASTA file | Path |
| `-dbtype` | Database type | `nucl`, `prot` |
| `-out` | Output database name | Path prefix |
| `-title` | Database title | String |
| `-parse_seqids` | Enable ID-based retrieval | Flag |
| `-taxid` | Assign taxonomy ID | Integer |
| `-taxid_map` | Taxonomy ID mapping file | Path |
Database Files Created
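```
my_db.nhr    # Header file (nucl) / .phr (prot)
my_db.nin    # Index file (nucl) / .pin (prot)
my_db.nsq    # Sequence file (nucl) / .psq (prot)
my_db.ndb    # Sequence-ID lookup database (database format v5; alias files use .nal / .pal)
my_db.not    # ID index (if parse_seqids)
my_db.ntf    # Index (if parse_seqids)
my_db.nto    # Index (if parse_seqids)
```

The exact set of auxiliary files depends on the BLAST+ version and the options passed to makeblastdb; the .nhr, .nin, and .nsq files (or .phr, .pin, .psq for protein) are always present.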
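Before launching a long search it can help to confirm that the core database files listed above are actually on disk. The following is a minimal sketch, assuming a single-volume database; `missing_db_files` is a hypothetical helper name, not part of BLAST+. Large multi-volume databases such as nt are split into numbered volumes behind an alias file, so this check would report them as incomplete.

```python
from pathlib import Path

# Core file extensions per database type, taken from the list above.
CORE_EXTENSIONS = {
    "nucl": (".nhr", ".nin", ".nsq"),
    "prot": (".phr", ".pin", ".psq"),
}


def missing_db_files(db_prefix, db_type="nucl"):
    """Return the core files that are absent for a single-volume database.

    db_prefix is the value that was passed to makeblastdb -out.
    An empty list means the header, index, and sequence files all exist.
    """
    if db_type not in CORE_EXTENSIONS:
        raise ValueError(f"db_type must be 'nucl' or 'prot', got {db_type!r}")
    return [
        f"{db_prefix}{ext}"
        for ext in CORE_EXTENSIONS[db_type]
        if not Path(f"{db_prefix}{ext}").is_file()
    ]
```

For example, `missing_db_files("my_db")` returns an empty list when `my_db.nhr`, `my_db.nin`, and `my_db.nsq` are all present in the working directory.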
Running BLAST Searches
Basic Usage
```bash
# BLASTN
blastn -query query.fasta -db my_db -out results.txt

# BLASTP
blastp -query proteins.fasta -db my_proteins -out results.txt

# BLASTX (translate query, search protein DB)
blastx -query genes.fasta -db nr -out results.txt
```
Common Options
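| Option | Description | Example |
|---|---|---|
| `-query` | Query FASTA file | `-query query.fasta` |
| `-db` | Database name | `-db my_db` |
| `-out` | Output file | `-out results.txt` |
| `-outfmt` | Output format | `-outfmt 6` |
| `-evalue` | E-value threshold | `-evalue 1e-5` |
| `-num_threads` | CPU threads | `-num_threads 4` |
| `-max_target_seqs` | Max hits | `-max_target_seqs 10` |
| `-max_hsps` | Max HSPs per hit | `-max_hsps 1` |
| `-word_size` | Word size | `-word_size 11` |
| `-dust` | Filter low complexity (nucl) | `-dust yes` |
| `-seg` | Filter low complexity (prot) | `-seg yes` |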
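The options in the table all follow the same `-name value` pattern, so a command line can be assembled from keyword arguments. This is a minimal sketch; `build_blast_cmd` is a hypothetical helper, and it does not validate that an option is accepted by the chosen program (for example, `-dust` applies to nucleotide searches only).

```python
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_blast_cmd(program, query, db, out, **options):
    """Build an argument list suitable for subprocess.run.

    Each keyword argument becomes '-name value'; options set to None
    are skipped so callers can pass optional settings through unchanged.
    """
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST+ program: {program!r}")
    cmd = [program, "-query", str(query), "-db", str(db), "-out", str(out)]
    for name, value in options.items():
        if value is None:
            continue
        cmd += [f"-{name}", str(value)]
    return cmd


cmd = build_blast_cmd("blastn", "query.fasta", "my_db", "results.tsv",
                      outfmt=6, evalue=1e-5, num_threads=4)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.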
Output Formats (-outfmt)
| Value | Format |
|---|---|
| `0` | Pairwise (default) |
| `1` | Query-anchored with identities |
| `5` | BLAST XML |
| `6` | Tabular |
| `7` | Tabular with comments |
| `10` | CSV |
Tabular Output Fields (-outfmt 6)
Default columns:

```
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
```

Custom columns:

```bash
blastn -query query.fa -db my_db -outfmt "6 qseqid sseqid pident length evalue stitle"
```
Available Fields:
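| Field | Description |
|---|---|
| `qseqid` | Query ID |
| `sseqid` | Subject ID |
| `pident` | Percent identity |
| `length` | Alignment length |
| `mismatch` | Mismatches |
| `gapopen` | Gap openings |
| `qstart` | Query start |
| `qend` | Query end |
| `sstart` | Subject start |
| `send` | Subject end |
| `evalue` | E-value |
| `bitscore` | Bit score |
| `stitle` | Subject title |
| `qcovs` | Query coverage |
| `qcovhsp` | Query coverage per HSP |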
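When a custom column list is used, the parser has to know the same column order that was given to `-outfmt`. One way to keep the two in sync is to derive the column names from the format string itself. The sketch below assumes tab-separated `-outfmt 6` output; `parse_tabular` and the two field sets are illustrative names, and the numeric conversions cover only the fields listed in the table above.

```python
DEFAULT_COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
                   "qstart qend sstart send evalue bitscore").split()
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore", "qcovs", "qcovhsp"}


def parse_tabular(lines, outfmt="6"):
    """Yield one dict per hit from tab-separated BLAST output.

    `lines` is any iterable of text lines (an open file works).
    `outfmt` is the same string passed to -outfmt, for example
    "6 qseqid sseqid pident length evalue stitle".
    """
    columns = outfmt.split()[1:] or DEFAULT_COLUMNS
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (present in -outfmt 7 output)
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(columns, line.split("\t")))
        for name in columns:
            if name not in hit:
                continue
            if name in INT_FIELDS:
                hit[name] = int(hit[name])
            elif name in FLOAT_FIELDS:
                hit[name] = float(hit[name])
        yield hit
```

Typical use is `with open("results.tsv") as f: hits = list(parse_tabular(f, outfmt_string))`, reusing the exact `outfmt_string` that was passed to the search.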
Code Patterns
Create Database and Search
```bash
#!/bin/bash
# Create database from reference sequences
makeblastdb -in reference.fasta -dbtype nucl -out ref_db -parse_seqids

# Run BLAST
blastn -query query.fasta -db ref_db -out results.txt \
    -outfmt 6 -evalue 1e-10 -num_threads 4

# View results
head results.txt
```
BLAST with Tabular Output
```bash
#!/bin/bash
blastn -query query.fasta -db my_db \
    -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore stitle" \
    -evalue 1e-5 \
    -max_target_seqs 10 \
    -num_threads 8 \
    -out results.tsv
```
Filter and Sort Results
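```bash
# Keep hits with at least 90% identity (column 3)
awk -F'\t' '$3 >= 90' results.tsv

# Sort by E-value (column 11 only; -g handles values such as 1e-50)
sort -t$'\t' -k11,11g results.tsv

# Best hit per query: order by query, then E-value, and keep the first row per query
sort -t$'\t' -k1,1 -k11,11g results.tsv | awk -F'\t' '!seen[$1]++'
```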
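The same filtering and best-hit selection can be done in Python when the results feed into further analysis. This is a minimal sketch, assuming each row is a list of the 12 default `-outfmt 6` fields as strings; `best_hits` is a hypothetical name, and breaking E-value ties by higher bit score is an assumption added here rather than something the shell commands do.

```python
def best_hits(rows, min_identity=90.0):
    """Return {query ID: best row} among hits with pident >= min_identity.

    The best row has the lowest E-value; ties are broken by the
    highest bit score (an illustrative choice).
    """
    best = {}
    for row in rows:
        qseqid = row[0]
        pident = float(row[2])
        evalue = float(row[10])
        bitscore = float(row[11])
        if pident < min_identity:
            continue
        key = (evalue, -bitscore)
        if qseqid not in best or key < best[qseqid][0]:
            best[qseqid] = (key, row)
    return {qseqid: row for qseqid, (_, row) in best.items()}
```

Rows can be read with `[line.rstrip("\n").split("\t") for line in open("results.tsv")]`. Note that filtering happens before ranking, so a query whose strongest hit falls below the identity threshold is represented by its next-best qualifying hit, or dropped if none qualifies.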
Batch BLAST Multiple Files
```bash
#!/bin/bash
# Make sure the output directory exists before writing to it
mkdir -p results

for query_file in queries/*.fasta; do
    base=$(basename "$query_file" .fasta)
    echo "Processing $base..."
    blastn -query "$query_file" -db my_db \
        -outfmt 6 -evalue 1e-5 -num_threads 4 \
        -out "results/${base}_blast.tsv"
done
```
Python Wrapper
```python
import subprocess


def make_blast_db(fasta_file, db_name, db_type='nucl'):
    """Build a BLAST database; raises CalledProcessError if makeblastdb fails."""
    cmd = ['makeblastdb', '-in', fasta_file, '-dbtype', db_type,
           '-out', db_name, '-parse_seqids']
    subprocess.run(cmd, check=True)


def run_blast(query, db, output, program='blastn', evalue=1e-5, threads=4, outfmt=6):
    """Run a BLAST+ program and write the results to `output`."""
    cmd = [program, '-query', query, '-db', db, '-out', output,
           '-outfmt', str(outfmt), '-evalue', str(evalue),
           '-num_threads', str(threads)]
    subprocess.run(cmd, check=True)


def parse_blast_tabular(filename):
    """Parse default 12-column -outfmt 6 output into a list of dicts."""
    columns = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
               'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']
    hits = []
    with open(filename) as f:
        for line in f:
            # Skip blank lines and comment lines (as written by -outfmt 7)
            if not line.strip() or line.startswith('#'):
                continue
            values = line.rstrip('\n').split('\t')
            hit = dict(zip(columns, values))
            hit['pident'] = float(hit['pident'])
            hit['evalue'] = float(hit['evalue'])
            hit['length'] = int(hit['length'])
            hits.append(hit)
    return hits


# Example usage
if __name__ == '__main__':
    make_blast_db('reference.fasta', 'ref_db')
    run_blast('query.fasta', 'ref_db', 'results.tsv')
    hits = parse_blast_tabular('results.tsv')
    for hit in hits[:5]:
        print(f"{hit['qseqid']} -> {hit['sseqid']}: {hit['pident']}% identity, E={hit['evalue']}")
```
Reciprocal Best BLAST
```bash
#!/bin/bash
# Forward BLAST: A vs B
blastp -query species_A.fasta -db species_B_db -outfmt 6 -evalue 1e-5 \
    -max_target_seqs 1 -out A_vs_B.tsv

# Reverse BLAST: B vs A
blastp -query species_B.fasta -db species_A_db -outfmt 6 -evalue 1e-5 \
    -max_target_seqs 1 -out B_vs_A.tsv

# Find reciprocal best hits
awk 'NR==FNR {a[$1]=$2; next} $2 in a && a[$2]==$1' A_vs_B.tsv B_vs_A.tsv
```
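The reciprocal check can also be written in Python, which makes the "best hit" rule explicit instead of relying on the order of rows in the file. This is a sketch under stated assumptions: rows are lists of the 12 default `-outfmt 6` fields, the best hit is the one with the lowest E-value (ties broken by highest bit score), and the function names are hypothetical.

```python
def best_hit_map(rows):
    """Map each query ID to its top subject ID.

    Ranking is by lowest E-value, then highest bit score.
    """
    best = {}
    for row in rows:
        query, subject = row[0], row[1]
        key = (float(row[10]), -float(row[11]))
        if query not in best or key < best[query][0]:
            best[query] = (key, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b_rows, b_vs_a_rows):
    """Return sorted (A id, B id) pairs that are each other's best hit."""
    forward = best_hit_map(a_vs_b_rows)
    reverse = best_hit_map(b_vs_a_rows)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def read_rows(path):
    """Read a tabular BLAST file into a list of field lists."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]
```

With the files from the script above, `reciprocal_best_hits(read_rows("A_vs_B.tsv"), read_rows("B_vs_A.tsv"))` yields the ortholog candidate pairs. Because the ranking is done here, the searches could be run with a larger `-max_target_seqs` value without changing this code.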
Extract Hit Sequences
```bash
# Get subject sequence by ID (requires -parse_seqids)
blastdbcmd -db my_db -entry "sequence_id" -out hit.fasta

# Get multiple sequences
blastdbcmd -db my_db -entry_batch ids.txt -out hits.fasta

# Get all sequences from database
blastdbcmd -db my_db -entry all -out all_seqs.fasta
```
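The `ids.txt` file for `-entry_batch` is a plain list of IDs, one per line, and it can be generated from a tabular results file. The sketch below is one way to do that, assuming the subject ID is in the second column (as in the default `-outfmt 6` layout); `write_entry_batch` is a hypothetical helper, and the IDs are written exactly as they appear in the results, so they must match the form the database was indexed with.

```python
def write_entry_batch(results_tsv, ids_path):
    """Write the unique subject IDs from a tabular BLAST file, one per line.

    IDs keep their order of first appearance. Returns the number written.
    """
    seen = {}  # dict used as an insertion-ordered set
    with open(results_tsv) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            seen.setdefault(fields[1], None)
    with open(ids_path, "w") as out:
        for sseqid in seen:
            out.write(sseqid + "\n")
    return len(seen)
```

After `write_entry_batch("results.tsv", "ids.txt")`, the `blastdbcmd -entry_batch ids.txt` command shown above retrieves each hit sequence once, even when a subject matched several queries.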
Prebuilt Databases
Download from NCBI:
```bash
# Download and extract (uses update_blastdb.pl)
update_blastdb.pl --decompress nt

# Or download manually from:
# https://ftp.ncbi.nlm.nih.gov/blast/db/
```
Common databases:
- nt - All nucleotide sequences
- nr - Non-redundant protein
- refseq_rna - RefSeq RNA
- swissprot - UniProt SwissProt
Common Errors
| Error | Cause | Solution |
|---|---|---|
| `BLAST Database error: No alias or index file found` | Database not found | Check path, rebuild database |
| `No hits found` | No matches or wrong DB type | Verify database type matches query |
| No hits for a very short query | Query below word size | Lower `-word_size` or use longer query |
| Out of memory | Large database | Reduce threads, use `-num_threads 1` |
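The word-size problem in the table can be caught before a search is run by comparing each query's length with the word size in use. This is a minimal sketch with a hand-rolled FASTA reader; `short_queries` is a hypothetical helper, and `word_size` should be whatever value the search will actually use, since defaults differ between programs and tasks.

```python
def short_queries(fasta_text, word_size):
    """Return (record ID, length) for FASTA records shorter than word_size.

    The record ID is the first whitespace-delimited token of the header.
    """
    records = []
    name, length = None, 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return [(rec_id, n) for rec_id, n in records if n < word_size]
```

For example, `short_queries(open("query.fasta").read(), 11)` lists the queries that cannot seed a match at `-word_size 11`, so they can be searched separately with a smaller word size.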
Local vs Remote BLAST
| Aspect | Local | Remote |
|---|---|---|
| Speed | Fast | Can be slow |
| Databases | Must download/create | All NCBI DBs available |
| Throughput | Unlimited | Rate limited |
| Setup | Requires installation | Just Biopython |
| Updates | Manual | Automatic |
Decision Tree
```
Running BLAST locally?
├── Have reference sequences?
│   └── makeblastdb to create database
├── Download NCBI database?
│   └── update_blastdb.pl or manual download
├── Need tabular output?
│   └── -outfmt 6 (or 7 with headers)
├── Filter low-complexity?
│   └── -dust yes (nucl) or -seg yes (prot)
├── Multiple queries?
│   └── Put all in one FASTA, use -num_threads
├── Need XML output?
│   └── -outfmt 5
└── Extract hit sequences?
    └── blastdbcmd -entry
```
Related Skills
- blast-searches - Remote BLAST via NCBI (no installation needed)
- sequence-io - Read/write FASTA files for queries
- batch-downloads - Download sequences to build local databases