git clone https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/bio-sequence-similarity" ~/.claude/skills/freedomintelligence-openclaw-medical-skills-bio-sequence-similarity && rm -rf "$T"
T=$(mktemp -d) && git clone --depth=1 https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/bio-sequence-similarity" ~/.openclaw/skills/freedomintelligence-openclaw-medical-skills-bio-sequence-similarity && rm -rf "$T"
skills/bio-sequence-similarity/SKILL.md- downloads files (wget)
name: bio-sequence-similarity description: Find homologous sequences using iterative BLAST (PSI-BLAST), profile HMMs (HMMER), and reciprocal best hit analysis. Use when identifying orthologs, distant homologs, or protein family members where standard BLAST is not sensitive enough. tool_type: mixed primary_tool: BLAST+ measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Sequence Similarity Searches
Advanced methods for finding homologous sequences beyond standard BLAST.
PSI-BLAST (Position-Specific Iterated BLAST)
Builds a position-specific scoring matrix (PSSM) through iterations to find distant homologs.
Basic PSI-BLAST
psiblast -query protein.fasta -db nr -out results.txt -num_iterations 3
Save PSSM for Reuse
psiblast -query protein.fasta -db nr \ -out results.txt \ -out_pssm pssm.asn \ -out_ascii_pssm pssm.txt \ -num_iterations 5
Use Existing PSSM
psiblast -in_pssm pssm.asn -db nr -out results.txt
Output Format
psiblast -query protein.fasta -db nr \ -out results.txt \ -outfmt 6 \ -num_iterations 3 \ -inclusion_ethresh 0.001
Key Parameters
psiblast -query protein.fasta -db nr \ -num_iterations 5 \ -inclusion_ethresh 0.001 \ -evalue 0.01 \ -num_threads 8 \ -out results.txt
PSI-BLAST Parameters
| Parameter | Default | Description |
|---|---|---|
| -num_iterations | 1 | Number of iterations |
| -inclusion_ethresh | 0.002 | E-value for PSSM inclusion |
| -evalue | 10 | E-value threshold for reporting |
| -num_threads | 1 | CPU threads |
HMMER for Profile Searches
HMMER uses profile hidden Markov models for sensitive sequence searches.
Search with Single Sequence
jackhmmer -o results.txt -A aligned.sto --cpu 8 query.fasta database.fasta
Build Profile from Alignment
hmmbuild profile.hmm alignment.sto
Search Database with Profile
hmmsearch -o results.txt --tblout hits.tbl profile.hmm database.fasta hmmsearch -o results.txt --domtblout domains.tbl profile.hmm database.fasta
Download Pfam Profiles
wget https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz gunzip Pfam-A.hmm.gz hmmpress Pfam-A.hmm
Scan Sequence Against Pfam
hmmscan --tblout pfam_hits.tbl --domtblout domains.tbl Pfam-A.hmm query.fasta
Parse HMMER Output
grep -v "^#" hits.tbl | head awk '$5 < 1e-10' hits.tbl
HMMER Output Columns (--tblout)
| Column | Description |
|---|---|
| 1 | Target name |
| 2 | Accession |
| 3 | Query name |
| 4 | Query accession |
| 5 | E-value (full sequence) |
| 6 | Score (full sequence) |
| 7 | Bias |
| 8 | E-value (best domain) |
| 9 | Score (best domain) |
Reciprocal Best Hit (RBH) Analysis
Find orthologs using bidirectional best hits.
Create BLAST Databases
makeblastdb -in species_A.fasta -dbtype prot -out species_A_db makeblastdb -in species_B.fasta -dbtype prot -out species_B_db
Bidirectional BLAST
blastp -query species_A.fasta -db species_B_db -outfmt 6 -evalue 1e-5 -max_target_seqs 1 > A_vs_B.txt blastp -query species_B.fasta -db species_A_db -outfmt 6 -evalue 1e-5 -max_target_seqs 1 > B_vs_A.txt
Find Reciprocal Best Hits
awk 'FNR==NR {a[$1]=$2; next} $2 in a && a[$2]==$1 {print $1"\t"$2}' \ A_vs_B.txt B_vs_A.txt > reciprocal_best_hits.txt
Python RBH Script
def find_rbh(forward_blast, reverse_blast): '''Find reciprocal best hits from BLAST results''' forward = {} with open(forward_blast) as f: for line in f: parts = line.strip().split('\t') query, subject = parts[0], parts[1] if query not in forward: forward[query] = subject reverse = {} with open(reverse_blast) as f: for line in f: parts = line.strip().split('\t') query, subject = parts[0], parts[1] if query not in reverse: reverse[query] = subject rbh = [] for a, b in forward.items(): if b in reverse and reverse[b] == a: rbh.append((a, b)) return rbh rbh_pairs = find_rbh('A_vs_B.txt', 'B_vs_A.txt') for a, b in rbh_pairs: print(f'{a}\t{b}')
Delta-BLAST
Uses conserved domain database for more sensitive initial search.
deltablast -query protein.fasta -db nr -rpsdb cdd_delta -out results.txt
PHI-BLAST (Pattern-Hit Initiated)
Search with a pattern plus sequence.
phi_pattern="G-x(2)-[ST]-x-[RK]" phiblast -query protein.fasta -db nr -pattern "$phi_pattern" -out results.txt
Iterative Search with Biopython
from Bio.Blast import NCBIWWW, NCBIXML with open('query.fasta') as f: query = f.read() result_handle = NCBIWWW.qblast('psiblast', 'nr', query, expect=0.001, word_size=3) with open('psiblast_result.xml', 'w') as out: out.write(result_handle.read()) result_handle.close() with open('psiblast_result.xml') as f: records = NCBIXML.parse(f) for record in records: for alignment in record.alignments: for hsp in alignment.hsps: if hsp.expect < 1e-10: print(f'{alignment.hit_def[:50]}: E={hsp.expect}')
HMMER with Biopython
from Bio import SearchIO results = SearchIO.parse('hmmsearch_output.txt', 'hmmer3-text') for query_result in results: print(f'Query: {query_result.id}') for hit in query_result: print(f' Hit: {hit.id}, E-value: {hit.evalue}') for hsp in hit: print(f' Domain: {hsp.bitscore} bits')
Jackhmmer (Iterative HMMER)
Similar to PSI-BLAST but uses HMM profiles.
jackhmmer -N 5 -o results.txt --tblout hits.tbl query.fasta database.fasta jackhmmer -N 5 -A iterations.sto --chkhmm checkpoint query.fasta database.fasta
OrthoFinder for Multi-Species Orthologs
orthofinder -f proteomes/ -t 8 orthofinder -f proteomes/ -t 8 -M msa
Prepare Input
mkdir proteomes cp species_*.fasta proteomes/
Output Files
| File | Content |
|---|---|
| Orthogroups.tsv | All orthogroups |
| Orthogroups_SingleCopyOrthologues.txt | 1:1 orthologs |
| Species_Tree/ | Inferred species tree |
| Gene_Trees/ | Individual gene trees |
E-value vs Bit Score
| E-value | Interpretation |
|---|---|
| < 1e-50 | Highly significant, likely homolog |
| 1e-50 to 1e-10 | Significant, probable homolog |
| 1e-10 to 1e-3 | Marginal, possible remote homolog |
| > 0.01 | Not significant |
Complete Ortholog Finding Pipeline
#!/bin/bash SPECIES_A=$1 SPECIES_B=$2 EVALUE=1e-10 THREADS=8 echo "Building databases..." makeblastdb -in $SPECIES_A -dbtype prot -out db_A makeblastdb -in $SPECIES_B -dbtype prot -out db_B echo "Running forward BLAST..." blastp -query $SPECIES_A -db db_B -outfmt 6 -evalue $EVALUE \ -max_target_seqs 1 -num_threads $THREADS > forward.txt echo "Running reverse BLAST..." blastp -query $SPECIES_B -db db_A -outfmt 6 -evalue $EVALUE \ -max_target_seqs 1 -num_threads $THREADS > reverse.txt echo "Finding reciprocal best hits..." awk 'FNR==NR {best[$1]=$2; next} $2 in best && best[$2]==$1 {print $1"\t"$2}' \ forward.txt reverse.txt > orthologs.txt echo "Found $(wc -l < orthologs.txt) ortholog pairs" rm -f db_A.* db_B.*
Related Skills
- blast-searches - Basic remote BLAST
- local-blast - Local BLAST databases
- entrez-fetch - Download sequences
- alignment - Align identified homologs