install
source · Clone the upstream repo
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Drug_Discovery/Chemoinformatics/molecular-descriptors" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-molecular-descript && rm -rf "$T"
manifest:
Skills/Drug_Discovery/Chemoinformatics/molecular-descriptors/SKILL.mdsource content
<!--
# COPYRIGHT NOTICE
# This file is part of the "Universal Biomedical Skills" project.
# Copyright (c) 2026 MD BABU MIA, PhD <md.babu.mia@mssm.edu>
# All Rights Reserved.
#
# This code is proprietary and confidential.
# Unauthorized copying of this file, via any medium is strictly prohibited.
#
# Provenance: Authenticated by MD BABU MIA
-->
name: bio-molecular-descriptors description: Calculates molecular descriptors and fingerprints using RDKit. Computes Morgan fingerprints (ECFP), MACCS keys, Lipinski properties, QED drug-likeness, TPSA, and 3D conformer descriptors. Use when featurizing molecules for machine learning or filtering by drug-likeness criteria. tool_type: python primary_tool: RDKit measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools:
- read_file
- run_shell_command
Molecular Descriptors
Calculate fingerprints and physicochemical properties for molecules.
Morgan Fingerprints (ECFP)
from rdkit import Chem from rdkit.Chem import AllChem mol = Chem.MolFromSmiles('CCO') # ECFP4 = radius 2 (diameter = 2 * radius + 2 = 6) # ECFP6 = radius 3 (diameter = 8) ecfp4 = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048) ecfp6 = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=2048) # With stereochemistry information ecfp4_chiral = AllChem.GetMorganFingerprintAsBitVect( mol, radius=2, nBits=2048, useChirality=True ) # As count vector (for some ML methods) ecfp4_counts = AllChem.GetMorganFingerprint(mol, radius=2) # Convert to numpy array import numpy as np fp_array = np.array(ecfp4)
MACCS Keys
from rdkit.Chem import MACCSkeys maccs = MACCSkeys.GenMACCSKeys(mol) # 167 bits # As numpy array maccs_array = np.array(maccs)
Lipinski Properties
from rdkit import Chem from rdkit.Chem import Descriptors, Lipinski mol = Chem.MolFromSmiles('CCO') # Lipinski Rule of 5 properties mw = Descriptors.MolWt(mol) # Molecular weight (<=500) logp = Descriptors.MolLogP(mol) # LogP (<=5) hbd = Lipinski.NumHDonors(mol) # H-bond donors (<=5) hba = Lipinski.NumHAcceptors(mol) # H-bond acceptors (<=10) # Check Lipinski compliance def passes_lipinski(mol): '''Check Lipinski Rule of 5 compliance.''' return ( Descriptors.MolWt(mol) <= 500 and Descriptors.MolLogP(mol) <= 5 and Lipinski.NumHDonors(mol) <= 5 and Lipinski.NumHAcceptors(mol) <= 10 ) # Additional properties tpsa = Descriptors.TPSA(mol) # Topological polar surface area rotatable = Lipinski.NumRotatableBonds(mol)
QED Drug-Likeness
from rdkit.Chem.QED import qed # QED score (0-1 scale, >0.5 generally drug-like) qed_score = qed(mol) # Weighted QED (default) # Considers MW, LogP, TPSA, HBD, HBA, PSA, RotBonds, Aromatic rings
Complete Descriptor Set
from rdkit.Chem import Descriptors from rdkit.ML.Descriptors import MoleculeDescriptors # Get all available descriptor names descriptor_names = [d[0] for d in Descriptors.descList] # Create descriptor calculator calculator = MoleculeDescriptors.MolecularDescriptorCalculator(descriptor_names) # Calculate for a molecule descriptors = calculator.CalcDescriptors(mol) # As DataFrame import pandas as pd desc_df = pd.DataFrame([descriptors], columns=descriptor_names)
3D Conformer Descriptors
from rdkit import Chem from rdkit.Chem import AllChem, Descriptors3D mol = Chem.MolFromSmiles('CCO') mol = Chem.AddHs(mol) # Generate 3D conformer (ETKDGv3 is now default) AllChem.EmbedMolecule(mol, AllChem.ETKDGv3()) # Optimize geometry AllChem.MMFFOptimizeMolecule(mol) # 3D descriptors (require conformer) # Asphericity: 0 = sphere, 1 = rod asphericity = Descriptors3D.Asphericity(mol) # Eccentricity eccentricity = Descriptors3D.Eccentricity(mol) # Inertial shape factor isf = Descriptors3D.InertialShapeFactor(mol) # Radius of gyration rog = Descriptors3D.RadiusOfGyration(mol)
Batch Descriptor Calculation
def calculate_descriptors_batch(molecules, descriptor_names=None): '''Calculate descriptors for multiple molecules.''' if descriptor_names is None: descriptor_names = ['MolWt', 'MolLogP', 'TPSA', 'NumHDonors', 'NumHAcceptors', 'NumRotatableBonds', 'qed'] results = [] for mol in molecules: if mol is None: results.append({d: None for d in descriptor_names}) continue row = {} for name in descriptor_names: if name == 'qed': from rdkit.Chem.QED import qed row[name] = qed(mol) else: row[name] = getattr(Descriptors, name)(mol) results.append(row) return pd.DataFrame(results)
Related Skills
- molecular-io - Load molecules for descriptor calculation
- similarity-searching - Use fingerprints for similarity
- admet-prediction - Predict ADMET from descriptors
- machine-learning/biomarker-discovery - ML on molecular features