Topology

The topology is a reduced representation of the skeleton that keeps the connectivity pattern between donors but removes reducible “path length”.

Think of it as: “How are donors connected, ignoring how many atoms are in each linker (when that length is reducible)?”

[Figure: Topology reduction]

What the reduction does

Starting from the skeleton graph, cTopo repeatedly applies simple graph transformations:

Contract linear linkers
A non-donor node of degree 2 that only represents a path segment is contracted away: the node is removed and its two neighbours are connected directly.
Remove reducible bubbles
Some degree-2 patterns inside small cycles are simplified (triangle-like “bubbles”). This keeps the reduced graph stable in common ring situations.

Then:

Each remaining non-donor node is turned into a dummy atom (Z = 0).
Bonds in the topology are normalized to single bonds (topology is about connectivity, not bond order).
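
The linker contraction and the dummy-atom relabelling can be sketched in plain Python. This is a minimal illustration, not cTopo's implementation: the adjacency-dict graph format, the function names `contract_linkers`, `label_topology` and `edges`, and the choice to leave a degree-2 node untouched when its two neighbours are already bonded (instead of applying a bubble rule) are all assumptions made for this example. Bond orders are not stored at all, which mirrors the normalization to single bonds.

```python
def contract_linkers(adj, donors):
    """Contract degree-2 non-donor nodes of an undirected graph.

    adj    : dict mapping node -> set of neighbour nodes (hypothetical format)
    donors : set of nodes that must never be contracted
    """
    g = {n: set(nbrs) for n, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:  # repeat until no rule applies any more
        changed = False
        for node in list(g):
            if node in donors or len(g[node]) != 2:
                continue
            a, b = g[node]
            if b in g[a]:
                # Neighbours already bonded: contracting would collapse a
                # small ring. This sketch leaves such cases alone.
                continue
            g[a].discard(node)
            g[b].discard(node)
            g[a].add(b)
            g[b].add(a)
            del g[node]
            changed = True
    return g


def label_topology(g, donors, z):
    """Keep the atomic number of donors; every other node becomes Z = 0."""
    return {n: (z[n] if n in donors else 0) for n in g}


def edges(g):
    """Sorted edge list, convenient for printing and comparing."""
    return sorted((a, b) for a in g for b in g[a] if a < b)


# Skeleton of N-C-C-N-C-C-N with the three nitrogens as donors.
skeleton = {
    "N1": {"C1"}, "C1": {"N1", "C2"}, "C2": {"C1", "N2"},
    "N2": {"C2", "C3"}, "C3": {"N2", "C4"}, "C4": {"C3", "N3"},
    "N3": {"C4"},
}
donors = {"N1", "N2", "N3"}
z = {n: (7 if n.startswith("N") else 6) for n in skeleton}

topo = contract_linkers(skeleton, donors)
labels = label_topology(topo, donors, z)
print(edges(topo))  # [('N1', 'N2'), ('N2', 'N3')]
```

Both two-carbon linkers disappear and only the donor-to-donor connectivity remains. A skeleton with longer linkers between the same donors would give the same edge list, which is the sense in which linker length is reducible.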

The result is a small graph that captures:

branching vs linear arrangements,
connectivity between donors,
non-reducible “junction” structure.

…and deliberately discards:

exact linker lengths,
aromaticity/bond order,
most “organic scaffold” detail.

Computing topology

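from ctopo import ligand_from_smiles, get_ligands_topology

# N-C-C-N-C-C-N chain; the three nitrogens carry atom-map numbers 1 to 3
lig = ligand_from_smiles("[NH2:1]CC[NH:2]CC[NH2:3]")

# Reduce the ligand graph to its topology graph
G_topo = get_ligands_topology(lig.G)

# Build a visualization of the topology and print its SMILES form
v = lig.visualize_topology()
print(v.smiles)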

When topology is useful

Dataset summaries (“what patterns exist?”)
Finding near-duplicates at the “cage” level
Stratifying datasets for sampling (e.g. balanced by topology)
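
As a rough sketch of topology-balanced sampling, the example below groups already-reduced topology graphs by a simple key and draws the same number of entries from each group. Everything here is an assumption made for illustration: the adjacency-dict format, the record names, and the helpers `topology_key` and `sample_balanced` are hypothetical and not part of cTopo. The key is only a coarse invariant (each node's Z together with its neighbours' Z values), so it can merge topologies that a full graph-isomorphism check would separate.

```python
import random
from collections import defaultdict


def topology_key(adj, z):
    """Order-independent key: every node's Z plus its sorted neighbour Zs."""
    return tuple(sorted((z[n], tuple(sorted(z[m] for m in adj[n]))) for n in adj))


def sample_balanced(records, per_group, seed=0):
    """Group records by topology key and draw up to `per_group` from each."""
    groups = defaultdict(list)
    for name, adj, z in records:
        groups[topology_key(adj, z)].append(name)
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    picked = []
    for key in sorted(groups):
        names = groups[key]
        picked.extend(rng.sample(names, min(per_group, len(names))))
    return groups, picked


# Two linear N-N-N topologies with different node names, and one tripod
# whose centre is a dummy atom (Z = 0).
linear_a = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
linear_b = {"y": {"x", "z"}, "x": {"y"}, "z": {"y"}}
tripod = {"d": {"p", "q", "r"}, "p": {"d"}, "q": {"d"}, "r": {"d"}}

records = [
    ("lig_a", linear_a, {"a": 7, "b": 7, "c": 7}),
    ("lig_b", linear_b, {"x": 7, "y": 7, "z": 7}),
    ("lig_c", tripod, {"d": 0, "p": 7, "q": 7, "r": 7}),
]

groups, picked = sample_balanced(records, per_group=1)
print(sorted(len(names) for names in groups.values()))  # [1, 2]
print(len(picked))  # 2
```

The same grouping also serves for dataset summaries: the group sizes show which connectivity patterns dominate.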

When you should NOT rely on topology alone

When donor identity or ring chemistry matters (use skeleton+bonds levels)
When stereochemistry or rigid geometry is central (the topology is a 2D graph and carries no 3D information)