Claude-skill-registry data-export-excel
Export analysis results, data tables, and formatted spreadsheets to Excel files using openpyxl. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-export-excel" ~/.claude/skills/majiayu000-claude-skill-registry-data-export-excel && rm -rf "$T"
manifest:
skills/data/data-export-excel/SKILL.mdsource content
Excel Data Export (Universal)
Overview
This skill enables you to export bioinformatics data, analysis results, and formatted tables to professional Excel spreadsheets. Unlike cloud-hosted solutions, this skill uses the openpyxl Python library and executes locally in your environment, making it compatible with ALL LLM providers including GPT, Gemini, Claude, DeepSeek, and Qwen.
When to Use This Skill
- Export AnnData observations (.obs) or variables (.var) to Excel
- Save DEG analysis results with formatting
- Create multi-sheet workbooks with different data types
- Generate formatted Excel reports with cell styling
- Export cluster annotations, cell type assignments, or quality control metrics
How to Use
Step 1: Import Required Libraries
import openpyxl from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment, Border, Side from openpyxl.utils.dataframe import dataframe_to_rows import pandas as pd import numpy as np
Step 2: Prepare Your Data
Convert your data to pandas DataFrame format:
# Example: Export AnnData observations df = adata.obs.copy() # Example: Export DEG results deg_df = pd.DataFrame({ 'gene': gene_names, 'log2FC': log2_fold_changes, 'pvalue': pvalues, 'qvalue': qvalues }) # Example: Export cluster statistics cluster_stats = adata.obs.groupby('clusters').size().reset_index(name='count')
Step 3: Create Excel Workbook
# Create new workbook wb = Workbook() ws = wb.active ws.title = "Sheet Name" # Write DataFrame to worksheet for r in dataframe_to_rows(df, index=False, header=True): ws.append(r)
Step 4: Add Formatting (Optional)
# Style header row header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid") header_font = Font(bold=True, color="FFFFFF") for cell in ws[1]: cell.fill = header_fill cell.font = header_font cell.alignment = Alignment(horizontal='center') # Auto-adjust column widths for column in ws.columns: max_length = 0 column_letter = column[0].column_letter for cell in column: try: if len(str(cell.value)) > max_length: max_length = len(str(cell.value)) except: pass adjusted_width = min(max_length + 2, 50) ws.column_dimensions[column_letter].width = adjusted_width # Add borders thin_border = Border( left=Side(style='thin'), right=Side(style='thin'), top=Side(style='thin'), bottom=Side(style='thin') ) for row in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=1, max_col=ws.max_column): for cell in row: cell.border = thin_border
Step 5: Save the Workbook
# Save to file output_path = "analysis_results.xlsx" wb.save(output_path) print(f"✅ Excel file saved to: {output_path}")
Multi-Sheet Workbooks
Create workbooks with multiple sheets for different data types:
wb = Workbook() # Sheet 1: Cell metadata ws1 = wb.active ws1.title = "Cell Metadata" for r in dataframe_to_rows(adata.obs, index=True, header=True): ws1.append(r) # Sheet 2: Gene metadata ws2 = wb.create_sheet("Gene Metadata") for r in dataframe_to_rows(adata.var, index=True, header=True): ws2.append(r) # Sheet 3: DEG results ws3 = wb.create_sheet("DEG Results") for r in dataframe_to_rows(deg_df, index=False, header=True): ws3.append(r) wb.save("multi_sheet_analysis.xlsx")
Best Practices
- Column Headers: Always include column headers in the first row
- Data Types: Convert numpy arrays to lists before writing
- Large Datasets: For datasets >100K rows, consider CSV export instead
- File Paths: Use absolute paths or ensure output directory exists
- Formatting: Apply formatting sparingly to reduce file size
- Index: Decide whether to include DataFrame index (set
inindex=True/False
)dataframe_to_rows
Common Use Cases
Export Quality Control Metrics
qc_metrics = adata.obs[['n_genes', 'n_counts', 'percent_mito', 'clusters']].copy() wb = Workbook() ws = wb.active ws.title = "QC Metrics" for r in dataframe_to_rows(qc_metrics, index=False, header=True): ws.append(r) # Highlight cells with high mitochondrial content for row in range(2, ws.max_row + 1): if ws.cell(row, 3).value > 0.2: # percent_mito > 20% ws.cell(row, 3).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid") wb.save("qc_metrics.xlsx")
Export Marker Genes by Cluster
# Assuming you have marker genes for each cluster marker_dict = { 'Cluster_0': ['CD3D', 'CD3E', 'CD8A'], 'Cluster_1': ['CD79A', 'MS4A1', 'CD19'], 'Cluster_2': ['LYZ', 'S100A9', 'CD14'] } wb = Workbook() for cluster_name, genes in marker_dict.items(): ws = wb.create_sheet(cluster_name) ws.append(['Marker Gene']) for gene in genes: ws.append([gene]) # Remove default sheet if 'Sheet' in wb.sheetnames: wb.remove(wb['Sheet']) wb.save("marker_genes.xlsx")
Export DEG Analysis with Conditional Formatting
wb = Workbook() ws = wb.active ws.title = "DEG Analysis" # Write DEG results for r in dataframe_to_rows(deg_df, index=False, header=True): ws.append(r) # Color code by fold change for row in range(2, ws.max_row + 1): log2fc = ws.cell(row, 2).value # Assuming log2FC in column 2 if log2fc > 1: # Upregulated ws.cell(row, 2).fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid") elif log2fc < -1: # Downregulated ws.cell(row, 2).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid") wb.save("deg_results_formatted.xlsx")
Troubleshooting
Issue: "openpyxl not found"
Solution: Install the library:
import subprocess subprocess.check_call(['pip', 'install', 'openpyxl'])
Issue: "Invalid data type for cell"
Solution: Convert numpy/pandas types to native Python types:
# Convert numpy types df = df.astype(object).where(pd.notnull(df), None) # Or convert specific columns df['column_name'] = df['column_name'].astype(str)
Issue: "Memory error with large datasets"
Solution: Export in chunks or use CSV format instead:
# Fallback to CSV for large data df.to_csv('large_dataset.csv', index=False) print("Dataset too large for Excel, saved as CSV instead")
Technical Notes
- Library: Uses
(pure Python, no external dependencies)openpyxl - Execution: Runs locally in the agent's sandbox
- Compatibility: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
- File Limits: Excel has a 1,048,576 row limit (use CSV for larger datasets)
- Performance: Writing ~10K rows takes 1-2 seconds; 100K rows takes 10-20 seconds
References
- openpyxl documentation: https://openpyxl.readthedocs.io/
- pandas DataFrame export: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html