Claude-skill-registry bio-crispr-screens-jacks-analysis
JACKS (Joint Analysis of CRISPR/Cas9 Knockout Screens) for modeling sgRNA efficacy and gene essentiality. Use when analyzing multiple CRISPR screens simultaneously or when accounting for variable sgRNA efficiency across experiments.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/jacks-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-bio-crispr-screens-jacks-analysis && rm -rf "$T"
manifest:
skills/data/jacks-analysis/SKILL.mdsource content
JACKS CRISPR Screen Analysis
JACKS jointly models sgRNA efficacy and gene essentiality across multiple experiments. It infers both gene-level fitness effects and sgRNA-specific efficiency.
Installation
pip install jacks # or git clone https://github.com/felicityallen/JACKS.git cd JACKS && pip install -e .
Input File Formats
Count Data
# counts.txt (tab-separated) sgRNA Gene Sample1 Sample2 Sample3 Control1 Control2 sgRNA1 GENE_A 100 120 90 80 85 sgRNA2 GENE_A 200 180 210 150 160 sgRNA3 GENE_B 50 45 55 60 58 ...
Replicate Map
# replicatemap.txt Sample1 Experiment1 Day14 Sample2 Experiment1 Day14 Sample3 Experiment2 Day14 Control1 Experiment1 Day0 Control2 Experiment2 Day0
Guide-Gene Map
# guidemap.txt sgRNA1 GENE_A sgRNA2 GENE_A sgRNA3 GENE_B sgRNA4 GENE_B ...
Basic JACKS Analysis
Command Line
# Run JACKS python -m jacks.run_JACKS \ counts.txt \ replicatemap.txt \ guidemap.txt \ output_prefix \ --ctrl_sample_pattern "Day0" \ --ctrl_sample_pattern_column "Condition"
Python API
from jacks import infer import pandas as pd # Load data counts = pd.read_csv('counts.txt', sep='\t', index_col=0) guide_gene_map = pd.read_csv('guidemap.txt', sep='\t', header=None, names=['sgRNA', 'Gene']) replicate_map = pd.read_csv('replicatemap.txt', sep='\t', header=None, names=['Sample', 'Experiment', 'Condition']) # Separate control and treatment samples ctrl_samples = replicate_map[replicate_map['Condition'] == 'Day0']['Sample'].tolist() treatment_samples = replicate_map[replicate_map['Condition'] == 'Day14']['Sample'].tolist() # Run JACKS inference # n_iterations=10000: MCMC iterations. Increase for final analysis. # burn_in=1000: Burn-in period. Should be ~10% of iterations. jacks_results = infer.run_inference( counts, guide_gene_map, treatment_samples, ctrl_samples, n_iterations=10000, burn_in=1000 )
Output Files
| File | Description |
|---|---|
| Gene-level essentiality scores |
| sgRNA-level efficacy estimates |
| Full model for downstream analysis |
Interpret Gene Results
import pandas as pd import numpy as np # Load gene results genes = pd.read_csv('output_gene_JACKS_results.txt', sep='\t') # JACKS score: negative = essential (dropout), positive = enriched # Columns: gene, X1 (effect), X2 (std), fdr_log10 # Essential genes (significant negative effect) # fdr_threshold=-1: log10(FDR) < -1 means FDR < 0.1 essential = genes[(genes['X1'] < 0) & (genes['fdr_log10'] < -1)] essential = essential.sort_values('X1') print(f'Essential genes: {len(essential)}') print(essential.head(20)) # Enriched genes enriched = genes[(genes['X1'] > 0) & (genes['fdr_log10'] < -1)] enriched = enriched.sort_values('X1', ascending=False) print(f'Enriched genes: {len(enriched)}')
sgRNA Efficacy Analysis
import pandas as pd # Load sgRNA results guides = pd.read_csv('output_grna_JACKS_results.txt', sep='\t') # Efficacy scores range from 0 (ineffective) to 1 (highly effective) # X1 column contains efficacy estimates # Identify poor sgRNAs # efficacy<0.3: sgRNAs with low efficacy. Consider removal in future libraries. poor_guides = guides[guides['X1'] < 0.3] print(f'Low efficacy guides: {len(poor_guides)}') # Group by gene to assess library quality gene_efficacy = guides.groupby('Gene')['X1'].agg(['mean', 'std', 'count']) gene_efficacy = gene_efficacy.sort_values('mean') print(gene_efficacy.head(20))
Visualization
Gene Effect Plot
import matplotlib.pyplot as plt import numpy as np genes = pd.read_csv('output_gene_JACKS_results.txt', sep='\t') fig, ax = plt.subplots(figsize=(10, 8)) # Color by significance colors = ['red' if fdr < -1 else 'gray' for fdr in genes['fdr_log10']] ax.scatter(genes['X1'], -genes['fdr_log10'], c=colors, alpha=0.5, s=10) ax.axhline(1, linestyle='--', color='black', alpha=0.5) # FDR = 0.1 ax.axvline(0, linestyle='-', color='gray', alpha=0.3) ax.set_xlabel('JACKS Score (negative = essential)') ax.set_ylabel('-log10(FDR)') ax.set_title('JACKS Gene Essentiality') # Label top hits top = genes[genes['fdr_log10'] < -2].nsmallest(10, 'X1') for _, row in top.iterrows(): ax.annotate(row['gene'], (row['X1'], -row['fdr_log10'])) plt.savefig('jacks_volcano.png', dpi=150)
sgRNA Efficacy Distribution
import matplotlib.pyplot as plt guides = pd.read_csv('output_grna_JACKS_results.txt', sep='\t') plt.figure(figsize=(8, 5)) plt.hist(guides['X1'], bins=50, edgecolor='black') plt.axvline(0.5, color='red', linestyle='--', label='Efficacy = 0.5') plt.xlabel('sgRNA Efficacy') plt.ylabel('Count') plt.title('sgRNA Efficacy Distribution') plt.legend() plt.savefig('sgrna_efficacy.png', dpi=150)
Multi-Screen Analysis
JACKS strength is joint analysis across experiments.
# Define multiple experiments in replicate map # replicatemap.txt: # Sample Experiment Condition # Screen1_T1 Screen1 Treatment # Screen1_T2 Screen1 Treatment # Screen1_C1 Screen1 Control # Screen2_T1 Screen2 Treatment # Screen2_T2 Screen2 Treatment # Screen2_C1 Screen2 Control # JACKS will learn shared sgRNA efficacy across screens # while estimating screen-specific gene effects
Comparing JACKS vs MAGeCK
| Feature | JACKS | MAGeCK |
|---|---|---|
| sgRNA efficacy modeling | Yes | No |
| Multi-experiment joint analysis | Yes | Limited |
| Statistical framework | Bayesian | MLE/RRA |
| Speed | Slower | Faster |
| Best for | Multiple screens | Single screen |
Advanced Options
from jacks import infer # Run with custom parameters results = infer.run_inference( counts, guide_gene_map, treatment_samples, ctrl_samples, n_iterations=50000, # 50000: Publication quality. 10000 for exploration. burn_in=5000, # 5000: 10% of iterations. apply_w_hp=True, # Hierarchical prior on efficacy fixed_w=False, # Learn sgRNA efficacy (set True to fix at 1) w_alpha=0.5, # Prior shape for efficacy w_beta=0.5 # Prior rate for efficacy )
Integration with Other Tools
Compare with MAGeCK
import pandas as pd jacks = pd.read_csv('jacks_gene_results.txt', sep='\t') mageck = pd.read_csv('mageck.gene_summary.txt', sep='\t') # Merge results merged = pd.merge(jacks, mageck, left_on='gene', right_on='id') # Compare rankings from scipy.stats import spearmanr corr, pval = spearmanr(merged['X1'], merged['neg|score']) print(f'Spearman correlation: {corr:.3f} (p={pval:.2e})')
Use sgRNA Efficacy for Library Design
# Extract high-efficacy guides for future libraries guides = pd.read_csv('output_grna_JACKS_results.txt', sep='\t') # efficacy>0.7: High efficacy sgRNAs for optimized libraries. good_guides = guides[guides['X1'] > 0.7][['sgRNA', 'Gene', 'X1']] good_guides.to_csv('high_efficacy_guides.csv', index=False)
Related Skills
- mageck-analysis - Alternative screen analysis method
- hit-calling - Statistical hit identification
- screen-qc - Quality control before analysis
- batch-correction - Handle batch effects in multi-screen data