Tutorial 00: Overview
This notebook is a template for the rest of the tutorial series.
It introduces the three core workflows:
- Ligand abstraction: donor atoms → skeleton → topology
- Dataset browsing: build a denticity/topology/skeleton hierarchy
- Role-aware descriptors: fingerprints and similarity
The notebook is safe to render inside MkDocs because it avoids heavy computation. If you want images/outputs embedded in the docs, run the notebook once and commit it with outputs.
0) Setup
Make sure you have installed the package in editable mode:
pip install -e .
and that RDKit is available in the environment.
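Before running the import cell, it can help to confirm that both packages are visible to the current interpreter. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of ctopo. It only checks that a module can be located, not that it imports cleanly.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report the two packages this tutorial relies on.
for name in ("rdkit", "ctopo"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If either line reports MISSING, the notebook kernel is probably running in a different environment from the one where the packages were installed.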
In [1]:
# Imports (safe pattern: fail with a clear message)
try:
    from ctopo import ligand_from_smiles
except Exception as e:
    raise RuntimeError(
        'Failed to import ctopo. Did you run `pip install -e .` in this environment?'
    ) from e
1) A ligand: donors → skeleton → topology
In the tutorial SMILES strings, donor atoms are marked with atom-map numbers: :1, :2, ...
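To make the marking convention concrete, here is a minimal sketch that pulls the mapped bracket atoms out of a SMILES string with a regular expression. It is an illustration only: the helper `mapped_atoms` is hypothetical, and it is an assumption that ctopo itself reads the map numbers from the parsed molecule rather than from the raw string.

```python
import re

# Matches a bracket atom carrying an atom-map number, e.g. "[NH2:1]".
MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def mapped_atoms(smiles: str) -> list[tuple[str, int]]:
    """Return (atom text, map number) for each mapped bracket atom, in order."""
    return [(atom, int(num)) for atom, num in MAPPED_ATOM.findall(smiles)]


donors = mapped_atoms("[NH2:1]CC[NH:2]CC[NH2:3]")
print(donors)       # [('NH2', 1), ('NH', 2), ('NH2', 3)]
print(len(donors))  # 3 mapped atoms, i.e. a tridentate ligand
```

Bracket atoms without a map number, such as `[BH-]`, are ignored by this pattern, so only atoms explicitly tagged as donors are counted.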
In [2]:
from IPython.display import display, SVG
lig = ligand_from_smiles('[NH2:1]CC[NH:2]CC[NH2:3]')
v_lig = lig.visualize_ligand()
v_skel = lig.visualize_skeleton()
v_topo = lig.visualize_topology()
display(SVG(v_lig.svg))
print(f'Ligand smiles: {v_lig.smiles}')
print(f'Denticity: {lig.denticity}; donor atoms: {sorted(lig.donor_atoms)}\n')
display(SVG(v_skel.svg))
print(f'Skeleton smiles: {v_skel.smiles}\n')
display(SVG(v_topo.svg))
print(f'Topology smiles: {v_topo.smiles}\n')
Ligand smiles: C(C[NH:1]CC[NH2:1])[NH2:1]
Denticity: 3; donor atoms: [0, 3, 6]
Skeleton smiles: C(C[N:1]CC[N:1])[N:1]
Topology smiles: [*:1][*:1][*:1]
2) Dataset browsing: denticity/topology/skeleton hierarchy
The next cell groups a small set of example ligands into a denticity/topology/skeleton hierarchy.
In [3]:
from IPython.display import HTML
from ctopo.trees import build_ligand_tree, tree_to_html
examples = {
    # bidentate
    'en': '[NH2:1]CC[NH2:2]',
    'propanediamine': '[NH2:1]CCC[NH2:2]',
    'bipy': '[n:1]1ccccc1c2cccc[n:2]2',
    'oxalate': '[O-:1]C(=O)C(=O)[O-:2]',
    'acac': '[O-:1]C(=O)CC(=O)[O-:2]',
    # tridentate linear
    'dien': '[NH2:1]CC[NH:2]CC[NH2:3]',
    'dien_long': '[NH2:1]CCC[NH:2]CCC[NH2:3]',
    'PNP': 'C[P:1](C)CC[NH:2]CC[P:3](C)C',
    'PNS': 'C[P:1](C)CC[NH:2]CC[S:3]C',
    'PNpyP': 'C[P:1](C)Cc(ccc1)[n:2]c1C[P:3](C)C',
    'PNpyS': 'C[P:1](C)Cc(ccc1)[n:2]c1C[S:3]C',
    # tripods
    'tripod_N': 'N(CC[NH2:1])(CC[NH2:2])CC[NH2:3]',
    'tripod_C': 'C(CC[NH2:1])(CC[NH2:2])CC[NH2:3]',
    'tripod_B': '[BH-](CC[NH2:1])(CC[NH2:2])CC[NH2:3]',
}
ligands = [ligand_from_smiles(smi) for smi in examples.values()]
names = list(examples.keys())
levels = ('denticity', ('topo',), ('skeleton', 'DA'), 'ligand')
tree = build_ligand_tree(ligands, ligand_ids=names, levels=levels)
html = tree_to_html(tree)
HTML(html)
Out[3]:
root
  DA=2 [1/2]
    topo [1/1]
      skeleton+DA [1/4]
        en [1/2]
        bipy [2/2]
      skeleton+DA [2/4]
        oxalate [1/1]
      skeleton+DA [3/4]
        propanediamine [1/1]
      skeleton+DA [4/4]
        acac [1/1]
  DA=3 [2/2]
    topo [1/2]
      skeleton+DA [1/4]
        dien [1/1]
      skeleton+DA [2/4]
        PNP [1/2]
        PNpyP [2/2]
      skeleton+DA [3/4]
        PNS [1/2]
        PNpyS [2/2]
      skeleton+DA [4/4]
        dien_long [1/1]
    topo [2/2]
      skeleton+DA [1/1]
        tripod_N [1/3]
        tripod_C [2/3]
        tripod_B [3/3]
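The idea behind such a hierarchy can be sketched in plain Python: group the records by the first level, then recurse into each group with the remaining levels. This is a toy illustration, not the implementation of `build_ligand_tree`. The function `build_tree`, the record fields, and the hand-assigned `shape` labels are all hypothetical; the labels loosely follow the comments in the examples dict.

```python
from collections import defaultdict


def build_tree(records, levels):
    """Group records into nested dicts keyed by each level in turn.

    The leaves are lists of record names.
    """
    if not levels:
        return [rec["name"] for rec in records]
    first, rest = levels[0], levels[1:]
    groups = defaultdict(list)
    for rec in records:
        groups[rec[first]].append(rec)
    return {key: build_tree(group, rest) for key, group in groups.items()}


# Hypothetical, hand-labelled records (not computed by ctopo).
records = [
    {"name": "en", "denticity": 2, "shape": "linear"},
    {"name": "dien", "denticity": 3, "shape": "linear"},
    {"name": "PNP", "denticity": 3, "shape": "linear"},
    {"name": "tripod_N", "denticity": 3, "shape": "tripod"},
]

tree = build_tree(records, levels=("denticity", "shape"))
print(tree)
# {2: {'linear': ['en']}, 3: {'linear': ['dien', 'PNP'], 'tripod': ['tripod_N']}}
```

Changing the order of `levels` changes which property splits the dataset first, so the same records can be browsed from several angles.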
3) Role-aware fingerprints (teaser)
Later tutorials will go deeper into:
- skeleton-only fingerprints,
- donor-environment fingerprints,
- similarity matrices and clustering.
In [4]:
from ctopo.descriptors import MorganSpec, ALLOWED_PROPERTIES, make_fingerprinter
from ctopo.distances import tanimoto_similarity_bits
print(ALLOWED_PROPERTIES)
('atom_type', 'Z', 'degree', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'charge', 'in_ring', 'aromatic')
In [5]:
l1 = ligand_from_smiles('C[P:1](C)CC[P:1](C)C')
l2 = ligand_from_smiles('c1ccccc1[P:1](c1ccccc1)CC[P:1](c1ccccc1)c1ccccc1')
fp_full = make_fingerprinter(
    kind='morgan',
    spec=MorganSpec(radius=2, use_chirality=False),
    atomic_properties=['atom_type', 'Z', 'degree', 'num_pi_electrons'],
    graph_view='original',
    bond_mode='all',
    output='bits',
    fp_size=1024,
)
fp_skel = make_fingerprinter(
    kind='morgan',
    spec=MorganSpec(radius=2, use_chirality=False),
    atomic_properties=['atom_type', 'Z', 'degree', 'num_pi_electrons'],
    graph_view='skeleton',
    bond_mode='all',
    output='bits',
    fp_size=1024,
)
f1_full, f2_full = fp_full(l1), fp_full(l2)
sim_full = tanimoto_similarity_bits(f1_full, f2_full)
print(f'Ligands\' similarity based on all atoms: {sim_full:.3f}')
f1_skel, f2_skel = fp_skel(l1), fp_skel(l2)
sim_skel = tanimoto_similarity_bits(f1_skel, f2_skel)
print(f'Ligands\' similarity based on skeletons: {sim_skel:.3f}')
Ligands' similarity based on all atoms: 0.176
Ligands' similarity based on skeletons: 1.000
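For reference, the Tanimoto similarity of two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below works on plain sets of on-bit indices. The function `tanimoto` is a hypothetical stand-in, and it is an assumption that `tanimoto_similarity_bits` computes the same quantity on ctopo's own fingerprint objects.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(bits_a & bits_b) / len(union)


a = {1, 4, 7, 9}
b = {1, 4, 8}
print(f"{tanimoto(a, b):.3f}")  # 2 shared bits / 5 bits in total = 0.400
print(f"{tanimoto(a, a):.3f}")  # identical fingerprints give 1.000
```

Read this way, a skeleton similarity of 1.000 means that the two skeleton fingerprints set exactly the same bits. That is what one would expect if the skeleton view discards the methyl and phenyl substituents and keeps only the shared P-C-C-P backbone.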