Tutorial 05: Complexes and ligand extraction
cTopo can represent metal complexes, provided coordination is encoded consistently via dative bonds.
This is useful in two practical workflows:
- Extracting ligands from a complex for further processing
- Computing fingerprints for complexes, since the basic fingerprint functionality works on any chemical graph
Prerequisites
RDKit and cTopo must be installed.
Please note: cTopo assumes the convention donor → metal for dative bonds. If your dataset encodes coordination differently, you’ll need a conversion step.
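As a rough illustration of what such a conversion step has to do, the sketch below reorients dative bonds so that every arrow points from donor to metal. It works on a toy representation (plain pairs of atom indices) that is an assumption made for this example, not cTopo's data model; the helper name `orient_donor_to_metal` is hypothetical and not part of cTopo or RDKit.

```python
# Toy representation (an assumption for illustration only):
# each dative bond is a (begin, end) pair of atom indices, arrow pointing begin -> end.

def orient_donor_to_metal(dative_bonds, metal_atoms):
    """Return the dative bonds rewritten so each one points donor -> metal."""
    metals = set(metal_atoms)
    oriented = []
    for begin, end in dative_bonds:
        if begin in metals and end in metals:
            raise ValueError(f"dative bond {begin}->{end} joins two metals")
        if begin not in metals and end not in metals:
            raise ValueError(f"dative bond {begin}->{end} touches no metal")
        if begin in metals:
            # Stored as metal -> donor: flip it.
            begin, end = end, begin
        oriented.append((begin, end))
    return oriented


# Atom 1 is the metal. The first bond is stored metal -> donor and gets flipped;
# the second already follows the donor -> metal convention and is left alone.
fixed_bonds = orient_donor_to_metal([(1, 0), (2, 1)], metal_atoms=[1])
print(fixed_bonds)
```

Raising on bonds that touch no metal (or two metals) rather than guessing a direction mirrors the strictness described in this tutorial; on a real RDKit molecule the same decision would be applied bond by bond before handing the molecule to cTopo.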
try:
    from rdkit import Chem
except ImportError as e:
    raise ImportError("RDKit is required for these tutorials.") from e

from ctopo import complex_from_smiles, complex_from_mol
from ctopo.fragments import ligands_from_complex
1. Parsing a simple complex from SMILES
For complex_from_smiles, donor atoms must be marked via atom-map numbers, and each mapped donor must have a dative bond to a metal.
Below is a minimal example with two ammonia ligands bound to Cu(II). (This is mainly to illustrate the encoding.)
# NOTE: RDKit support for arrow/dative SMILES depends on the RDKit build.
# If parsing fails in your environment, skip to section 2 (programmatic construction).
smiles = '[NH3:1]->[Cu+2]<-[NH3:2]'
try:
    cx = complex_from_smiles(smiles)
    print(f'metal_atoms: {sorted(cx.metal_atoms)}')
    print(f'donor_atoms: {sorted(cx.donor_atoms)}')
except Exception as e:
    print(f'Could not parse complex SMILES in this environment: {e}')
metal_atoms: [1]
donor_atoms: [0, 2]
2. Building a complex programmatically (reliable for tutorials)
For multidentate ligands, programmatic construction is often simpler than writing a complex SMILES by hand.
We will:
- start with a ligand SMILES where donors are mapped
- add a metal atom
- add dative bonds from each donor to the metal
- call complex_from_mol for validation
from rdkit.Chem.rdchem import BondType

# Build ethylenediamine (bidentate) plus two chloride ligands; donors are marked by map numbers
lig_mol = Chem.MolFromSmiles('[NH2:1]CC[NH2:2].[Cl-:3].[Cl-:4]')
if lig_mol is None:
    raise ValueError('RDKit could not parse the ligand SMILES')

# Record the donor atom indices, then clear the map numbers
donors = [a.GetIdx() for a in lig_mol.GetAtoms() if a.GetAtomMapNum() != 0]
for a in lig_mol.GetAtoms():
    a.SetAtomMapNum(0)

rwm = Chem.RWMol(lig_mol)

# Add Pt(II); AddAtom returns the index of the new atom
pt = Chem.Atom(78)
pt.SetFormalCharge(2)
m_idx = rwm.AddAtom(pt)

# Add donor -> metal dative bonds (begin atom = donor, end atom = metal)
for d in donors:
    rwm.AddBond(int(d), int(m_idx), BondType.DATIVE)

cx_mol = rwm.GetMol()

# We intentionally avoid full sanitization because coordination compounds can violate typical valence rules.
# cTopo performs its own validation of dative bonds.
cx2 = complex_from_mol(cx_mol, metal_atoms=[m_idx])
print(f'metal_atoms: {sorted(cx2.metal_atoms)}')
print(f'donor_atoms: {sorted(cx2.donor_atoms)}')
metal_atoms: [6]
donor_atoms: [0, 3, 4, 5]
3. Extract ligands from a complex
ligands_from_complex removes metal atoms, fragments the remaining structure, and returns unique ligands with counts.
It is intended for workflows such as turning CSD (Cambridge Structural Database) complexes into a set of unique multidentate ligands.
lig_counts = ligands_from_complex(cx2)
for lc in lig_counts:
    print("count:", lc.count, "| smiles:", lc.smiles, "| denticity:", lc.ligand.denticity)
count: 1 | smiles: C(C[NH2:1])[NH2:1] | denticity: 2
count: 2 | smiles: [Cl-:1] | denticity: 1
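To make the "remove metals, fragment, count" description concrete, here is a minimal pure-Python sketch of the same idea on a toy graph. It is not cTopo's implementation: the function `extract_ligand_fragments` is hypothetical, and the fragment key (sorted element symbols) is only a crude stand-in for a canonical SMILES, so distinct isomers would collide under it.

```python
from collections import Counter


def extract_ligand_fragments(n_atoms, bonds, metal_atoms):
    """Drop metal atoms and return the connected components of what remains."""
    metals = set(metal_atoms)
    neighbours = {i: set() for i in range(n_atoms) if i not in metals}
    for a, b in bonds:
        # Bonds that touch a metal disappear together with the metal.
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, fragments = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(sorted(component))
    return fragments


# Same connectivity as the Pt example in section 2: N0-C1-C2-N3, Cl4, Cl5, Pt6.
symbols = ["N", "C", "C", "N", "Cl", "Cl", "Pt"]
bonds = [(0, 1), (1, 2), (2, 3), (0, 6), (3, 6), (4, 6), (5, 6)]

fragments = extract_ligand_fragments(len(symbols), bonds, metal_atoms=[6])
counts = Counter("".join(sorted(symbols[i] for i in frag)) for frag in fragments)
print(fragments)
print(counts)
```

The toy result has the same shape as the cTopo output above: one four-atom ethylenediamine fragment and two single-atom chloride fragments that collapse into one entry with a count of 2.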
4. Caveats (important for real datasets)
- Bridging ligands: removing metals may split a bridging ligand into multiple fragments.
- Missing/incorrect dative bonds: cTopo will raise validation errors rather than silently guessing.
- Multiple metals: supported, but be explicit about conventions and sanity-check results.
For CSD-like sources, you’ll usually need a preprocessing step to consistently assign dative bonds and donor mapping.
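One possible sanity check for multi-metal inputs is to list which metals each ligand fragment donates to and flag fragments bound to more than one metal for manual review. The sketch below assumes a toy representation (fragments as lists of atom indices, dative bonds as donor → metal index pairs); the helper `metals_per_fragment` is hypothetical and not part of cTopo.

```python
def metals_per_fragment(fragments, dative_bonds, metal_atoms):
    """Map each fragment (as a tuple of atom indices) to the set of metals it donates to."""
    metals = set(metal_atoms)
    result = {}
    for frag in fragments:
        members = set(frag)
        result[tuple(frag)] = {
            end for begin, end in dative_bonds
            if begin in members and end in metals
        }
    return result


# Hypothetical two-metal complex: atoms 0 and 1 are metals,
# atom 2 donates to both metals (bridging), atom 3 only to metal 1 (terminal).
bound = metals_per_fragment(
    fragments=[[2], [3]],
    dative_bonds=[(2, 0), (2, 1), (3, 1)],
    metal_atoms=[0, 1],
)
bridging = [frag for frag, ms in bound.items() if len(ms) > 1]
print(bridging)
```

Fragments that end up in `bridging` are the ones most likely to be affected by the bridging-ligand caveat, so they are good candidates for inspection before building a ligand dataset.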
Takeaways
- Complex support is intentionally strict: dative bonds must be donor → metal.
- Ligand extraction helps build ligand datasets from complex datasets.
At this point you have the full story arc: abstraction → topology → dataset hierarchy → fingerprints → complexes.