Tutorial 02: Skeleton and topology
This notebook explains how cTopo turns a ligand into:
- a skeleton graph (donors + connecting atoms)
- a topology graph (a reduced skeleton that keeps only the connection pattern)
Prerequisites
Same as Tutorial 01: RDKit, plus an editable install of cTopo (pip install -e .).
# Core imports (RDKit + cTopo)
try:
    from rdkit import Chem
    from rdkit.Chem import Draw
except ImportError as e:
    raise ImportError(
        'RDKit is required for these tutorials. '
        'If you used conda-forge: `conda install -c conda-forge rdkit`.'
    ) from e

from IPython.display import SVG, display, Markdown
import networkx as nx
import ctopo
from ctopo import AtomType, ligand_from_smiles, ligand_from_mol

def show(v):
    """Display a cTopo visualization object (with .svg)."""
    display(SVG(v.svg))

print(f'ctopo version: {ctopo.__version__}')
ctopo version: 0.1.0
1. Two ligands, same topology
Topology is intended to ignore “how long the linker is” and keep only the connection pattern.
Below are two linear tridentate amines with different chain lengths. Their skeletons differ, but their topologies match.
lig_lin_1 = ligand_from_smiles("[NH2:1]CC[NH:2]CC[NH2:3]") # diethylenetriamine (dien-like)
lig_lin_2 = ligand_from_smiles("[NH2:1]CCC[NH:2]CCC[NH2:3]") # longer dien-like
for L, name in [(lig_lin_1, 'lin_1'), (lig_lin_2, 'lin_2')]:
    print(f'Skeleton SMILES ({name}): {L.visualize_skeleton().smiles}')
    show(L.visualize_skeleton())
    print(f'Topology SMILES ({name}): {L.visualize_topology().smiles}')
    show(L.visualize_topology())
    print('\n')
Skeleton SMILES (lin_1): C(C[N:1]CC[N:1])[N:1]
Topology SMILES (lin_1): [*:1][*:1][*:1]
Skeleton SMILES (lin_2): C(C[N:1])C[N:1]CCC[N:1]
Topology SMILES (lin_2): [*:1][*:1][*:1]
2. A different tridentate topology (tripod)
Now compare these to a tripod-like tridentate ligand. All three ligands are tridentate, but the tripod's donors branch from a central atom instead of lying along a chain, so the connection pattern is different.
lig_tripod = ligand_from_smiles('N(CC[NH2:1])(CC[NH2:2])CC[NH2:3]')
print(f'Topology SMILES (tripod): {lig_tripod.visualize_topology().smiles}')
show(lig_tripod.visualize_skeleton())
show(lig_tripod.visualize_topology())
Topology SMILES (tripod): *([*:1])([*:1])[*:1]
3. What gets reduced?
Topology reduction in cTopo operates on the skeleton graph and repeatedly:
- contracts degree-2 non-donor nodes on simple paths,
- removes “bubble” nodes (degree-2 nodes in triangles),
- normalizes bond types to single bonds.
The result is a small graph suitable as a dataset grouping key.
You can always keep the original skeleton separately if you need more detail.
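The reduction rules above can be illustrated on a plain adjacency dictionary. The sketch below is not cTopo's implementation: from_edges, reduce_topology, degree_sequence, and the hand-numbered atom indices are hypothetical names chosen for illustration, and bond orders are left out entirely, so the single-bond normalization step has nothing to do. It contracts degree-2 non-donor nodes one at a time. In a simple graph the same step also removes a degree-2 node that sits in a triangle, because its two neighbours are already bonded.

```python
def from_edges(edges):
    """Build an undirected graph as {node: set(neighbours)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def reduce_topology(adj, donors):
    """Contract degree-2 non-donor nodes until none are left (illustrative only)."""
    g = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in sorted(g):
            if v in donors or len(g[v]) != 2:
                continue
            a, b = g[v]
            # Drop v and bond its two neighbours directly.
            # If a and b were already bonded (triangle), v simply disappears.
            g[a].discard(v)
            g[b].discard(v)
            del g[v]
            g[a].add(b)
            g[b].add(a)
            changed = True
            break  # the graph changed, so restart the scan
    return g


def degree_sequence(g):
    """Sorted node degrees: a cheap shape fingerprint, not a full isomorphism test."""
    return sorted(len(nbrs) for nbrs in g.values())


# Hand-numbered heavy-atom graphs of the three ligands used in this tutorial.
lin_1 = from_edges([(i, i + 1) for i in range(6)])   # N0-C-C-N3-C-C-N6
lin_2 = from_edges([(i, i + 1) for i in range(8)])   # N0-C-C-C-N4-C-C-C-N8
tripod = from_edges([(0, 1), (1, 2), (2, 3),         # central N0, three arms
                     (0, 4), (4, 5), (5, 6),
                     (0, 7), (7, 8), (8, 9)])

topo_1 = reduce_topology(lin_1, donors={0, 3, 6})
topo_2 = reduce_topology(lin_2, donors={0, 4, 8})
topo_tripod = reduce_topology(tripod, donors={3, 6, 9})

print(degree_sequence(topo_1), degree_sequence(topo_2), degree_sequence(topo_tripod))
```

Both linear amines collapse to a three-node chain, while the tripod keeps its non-donor centre because that node has degree 3. Comparing degree sequences is enough to tell these small examples apart; a real grouping key needs a canonical form, which is the role the topology SMILES plays in this tutorial.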
4. Variability of skeleton and topology depictions
By default, cTopo reduces a ligand to two graph abstractions:
- Skeleton: donors + the atoms/bonds on shortest paths between donors.
- Topology: a further reduced skeleton where most non-essential atoms are contracted to a minimal “shape”.
For exploration and reporting it’s often useful to control how much chemical information is retained in the visual (and in the exported SMILES). Both visualize_topology() and visualize_skeleton() accept arguments that toggle this.
Important: these switches affect the representation you export/plot (and the SMILES you store as a key), but they do not change the underlying donor detection / skeleton extraction logic.
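To make the skeleton definition concrete, the sketch below collects the atoms on shortest paths between every pair of donors with a breadth-first search. It is an illustration under stated assumptions, not cTopo's extraction code: shortest_path and skeleton_nodes are hypothetical helpers, the graph is a hand-numbered heavy-atom graph of the P/N/S ligand used in the next subsection, and only one shortest path per donor pair is kept (how cTopo treats ties is not covered here).

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, start, goal):
    """Return one shortest path from start to goal (BFS), or [] if unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nbr in sorted(adj[node]):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if goal not in prev:
        return []
    path = []
    node = goal
    while node is not None:  # walk back from goal to start
        path.append(node)
        node = prev[node]
    return path[::-1]


def skeleton_nodes(adj, donors):
    """Donors plus every atom on a shortest path between two donors."""
    keep = set(donors)
    for a, b in combinations(sorted(donors), 2):
        keep.update(shortest_path(adj, a, b))
    return keep


# C[P:1](C)Cc(ccc1)[n:1]c1C[S:1]C, atoms numbered in SMILES order:
# 0 C, 1 P, 2 C, 3 CH2, 4-7 ring c, 8 n, 9 ring c, 10 CH2, 11 S, 12 C
edges = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (5, 6), (6, 7),
         (4, 8), (8, 9), (7, 9), (9, 10), (10, 11), (11, 12)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

skeleton = skeleton_nodes(adj, donors={1, 8, 11})
print(sorted(skeleton))
```

The three methyl groups and the far side of the pyridine ring drop out, leaving seven atoms: P, CH2, ring c, n, ring c, CH2, S. That is the same atom count as the full skeleton SMILES printed for this ligand later in the section.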
Topology: keep donor atoms as real atoms or as generic donors
In the topology, most intermediate atoms become “dummy” nodes (because the goal is shape, not chemistry). However, donor atoms can be preserved as the original elements (P/N/S/…) instead of being shown as generic donor markers.
lig_pns = ligand_from_smiles('C[P:1](C)Cc(ccc1)[n:1]c1C[S:1]C')
# default (donors=False): reduced topology, donors shown as generic donor nodes
topo = lig_pns.visualize_topology()
print(f"Topology SMILES (generic donors): {topo.smiles}")
show(topo)
# donors=True: donors keep their original elements (useful for telling P/N/S donor sets apart)
topo_elem = lig_pns.visualize_topology(donors=True)
print(f"\nTopology SMILES (donors as original atoms): {topo_elem.smiles}")
show(topo_elem)
Topology SMILES (generic donors): [*:1][*:1][*:1]
Topology SMILES (donors as original atoms): [P:1][N:1][S:1]
When to use what:
- donors=False (default): you want to group ligands by the connectivity pattern only, ignoring donor element identity.
- donors=True: you want topology but still distinguish P vs N vs S donors.
Skeleton: toggle donor labels, skeleton atoms, and skeleton bond types
Skeleton depictions can retain different levels of chemical detail:
- donors: whether donor atoms keep their original element symbols (True) or are shown as generic donor nodes (False)
- skeleton: whether skeleton atoms are shown as original atoms (instead of being reduced to generic nodes)
- bonds: whether original bond types are kept inside the skeleton (single/double/aromatic, etc.)
# Minimal: skeleton reduced to connectivity only (generic donor nodes, no atom identities, no bond types)
skel_min = lig_pns.visualize_skeleton(donors=False, skeleton=False, bonds=False)
print(f"Skeleton SMILES (minimal): {skel_min.smiles}")
show(skel_min)
# Maximal: keep donor elements, skeleton atom identities, and skeleton bond types
skel_full = lig_pns.visualize_skeleton(donors=True, skeleton=True, bonds=True)
print(f"\nSkeleton SMILES (donors + atoms + bonds): {skel_full.smiles}")
show(skel_full)
Skeleton SMILES (minimal): *(*[*:1]**[*:1])[*:1]
Skeleton SMILES (donors + atoms + bonds): C(c[n:1]cC[S:1])[P:1]
Practical guideline:
- Use minimal skeletons/topologies to build coarse trees (dataset overview).
- Use full skeletons when you want chemically interpretable representatives (e.g. “this topology family usually contains aromatic linkers”).
Extract topology/skeleton keys for grouping
A practical way to group ligands is to use the SMILES of the skeleton or topology visualization output as keys.
def keys(lig):
    return {
        "denticity": lig.denticity,
        "topology": lig.visualize_topology(donors=True).smiles,
        "skeleton": lig.visualize_skeleton(donors=True, bonds=True, skeleton=False).smiles,
        "ligand": lig.visualize_ligand().smiles,
    }

for name, lig in [("lin_1", lig_lin_1), ("lin_2", lig_lin_2), ("tripod", lig_tripod)]:
    print(name, keys(lig))
lin_1 {'denticity': 3, 'topology': '[N:1][N:1][N:1]', 'skeleton': '*(*[N:1]**[N:1])[N:1]', 'ligand': 'C(C[NH:1]CC[NH2:1])[NH2:1]'}
lin_2 {'denticity': 3, 'topology': '[N:1][N:1][N:1]', 'skeleton': '*(*[N:1])*[N:1]***[N:1]', 'ligand': 'C(C[NH2:1])C[NH:1]CCC[NH2:1]'}
tripod {'denticity': 3, 'topology': '*([N:1])([N:1])[N:1]', 'skeleton': '*(*[N:1])*(**[N:1])**[N:1]', 'ligand': 'C(C[NH2:1])N(CC[NH2:1])CC[NH2:1]'}
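Once the keys are plain strings, grouping is ordinary dictionary bookkeeping. The sketch below buckets ligands by (denticity, topology) using the values printed above, copied in by hand so that it runs without cTopo. group_by and records are hypothetical names, and the approach assumes the exported SMILES is identical for identical graphs, which the tutorial implies but does not state.

```python
from collections import defaultdict

# Key values copied from the output above (topology and denticity only).
records = {
    "lin_1": {"denticity": 3, "topology": "[N:1][N:1][N:1]"},
    "lin_2": {"denticity": 3, "topology": "[N:1][N:1][N:1]"},
    "tripod": {"denticity": 3, "topology": "*([N:1])([N:1])[N:1]"},
}


def group_by(records, *fields):
    """Map a tuple of field values to the list of ligand names sharing it."""
    groups = defaultdict(list)
    for name, rec in records.items():
        groups[tuple(rec[f] for f in fields)].append(name)
    return dict(groups)


by_topology = group_by(records, "denticity", "topology")
for key, names in by_topology.items():
    print(key, names)
```

The two linear amines land in one bucket and the tripod in another. Adding "skeleton" as a third field would split lin_1 from lin_2, which mirrors the denticity, topology, skeleton hierarchy used in the next tutorial.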
Takeaways
- Skeleton keeps the real connecting subgraph between donors.
- Topology compresses skeleton to the minimal connection pattern.
- Topology is a natural key for a chemical-space census.
Next: Tutorial 03 builds a small dataset tree (denticity → topology → skeleton → ligand) and exports an HTML report.