Topology

The topology is a reduced representation of the skeleton that keeps the connectivity pattern between donors but removes reducible “path length”.

Think of it as: “How are donors connected, ignoring how many atoms are in each linker (when that length is reducible)?”

Topology reduction

What the reduction does

Starting from the skeleton graph, cTopo repeatedly applies simple graph transformations:

  • Contract linear linkers
    Degree-2 non-donor nodes are contracted away when they only represent a path segment.

  • Remove reducible bubbles
    Some degree-2 patterns inside small cycles (triangle-like “bubbles”) are simplified, which keeps the reduced graph stable in common ring situations.

Then:

  • Every remaining non-donor node is replaced by a dummy atom (Z = 0).
  • Bonds in the topology are normalized to single bonds (topology is about connectivity, not bond order).
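
The contraction and relabelling steps above can be sketched in plain Python. This is a minimal illustration, not cTopo's implementation: the function name `reduce_topology`, the adjacency-dict representation, and the rule of skipping a degree-2 node whose neighbours are already bonded are assumptions made for the example, and bubble removal is not modelled. Edges carry no bond order, which mirrors the normalization to single bonds.

```python
def reduce_topology(adjacency, donors, atomic_numbers):
    """Toy reduction: contract degree-2 non-donor nodes, then relabel the rest."""
    # Copy the adjacency so the caller's skeleton graph is not modified.
    adj = {node: set(nbrs) for node, nbrs in adjacency.items()}

    changed = True
    while changed:
        changed = False
        for node in sorted(adj):
            if node in donors or len(adj[node]) != 2:
                continue
            a, b = sorted(adj[node])
            if b in adj[a]:
                # Assumption: three-membered cycles are left alone in this sketch.
                continue
            # Replace the path a - node - b with a direct edge a - b.
            adj[a].discard(node)
            adj[b].discard(node)
            adj[a].add(b)
            adj[b].add(a)
            del adj[node]
            changed = True
            break  # restart the scan on the modified graph

    # Donors keep their atomic number; every other surviving node becomes Z = 0.
    labels = {n: (atomic_numbers[n] if n in donors else 0) for n in adj}
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    return labels, edges


# Hypothetical tripodal skeleton: carbon 0 is a junction with three arms,
# each arm ending in a nitrogen donor (nodes 2, 4, 6).
donors = {2, 4, 6}
short_arms = {0: {1, 3, 5}, 1: {0, 2}, 2: {1}, 3: {0, 4}, 4: {3}, 5: {0, 6}, 6: {5}}
# Same connectivity pattern, but with one extra carbon (7, 8, 9) in each arm.
long_arms = {
    0: {1, 3, 5},
    1: {0, 7}, 7: {1, 2}, 2: {7},
    3: {0, 8}, 8: {3, 4}, 4: {8},
    5: {0, 9}, 9: {5, 6}, 6: {9},
}


def carbon_nitrogen_numbers(adjacency):
    # Donors are nitrogen (Z = 7), everything else carbon (Z = 6).
    return {n: (7 if n in donors else 6) for n in adjacency}


short_result = reduce_topology(short_arms, donors, carbon_nitrogen_numbers(short_arms))
long_result = reduce_topology(long_arms, donors, carbon_nitrogen_numbers(long_arms))
print(short_result)
print(short_result == long_result)
```

Both inputs reduce to the same graph: the junction survives as a dummy node bonded to the three donors, while the arm lengths disappear.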

The result is a small graph that captures:

  • branching vs linear arrangements,
  • connectivity between donors,
  • non-reducible “junction” structure.

…and deliberately discards:

  • exact linker lengths,
  • aromaticity/bond order,
  • most “organic scaffold” detail.

Computing topology

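from ctopo import ligand_from_smiles, get_ligands_topology

# Diethylenetriamine; the three nitrogens carry atom-map numbers 1-3.
lig = ligand_from_smiles("[NH2:1]CC[NH:2]CC[NH2:3]")
# Reduce the ligand's graph to its topology graph.
G_topo = get_ligands_topology(lig.G)

# Build a visualization of the topology and print its SMILES.
v = lig.visualize_topology()
print(v.smiles)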

When topology is useful

  • Dataset summaries (“what patterns exist?”)
  • Finding near-duplicates at the “cage” level
  • Stratifying datasets for sampling (e.g. balanced by topology)
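
As a rough sketch of the summary and stratification use cases, the snippet below groups ligands by a topology key and draws a balanced sample. It assumes each ligand already has a string key that is identical for identical topologies (for example a canonical string derived from the topology graph). The ligand ids, the keys `topo_A` to `topo_C`, and the helper names are hypothetical placeholders, not cTopo output or API.

```python
import random
from collections import Counter, defaultdict

# Hypothetical (ligand id, topology key) pairs; the keys are placeholders.
records = [
    ("lig-001", "topo_A"),
    ("lig-002", "topo_A"),
    ("lig-003", "topo_A"),
    ("lig-004", "topo_B"),
    ("lig-005", "topo_B"),
    ("lig-006", "topo_C"),
]


def summarize(records):
    """Count how many ligands share each topology key."""
    return Counter(key for _, key in records)


def balanced_sample(records, per_topology, seed=0):
    """Draw at most `per_topology` ligands from every topology group."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for ligand_id, key in records:
        groups[key].append(ligand_id)
    sample = {}
    for key, ids in sorted(groups.items()):
        k = min(per_topology, len(ids))
        sample[key] = sorted(rng.sample(ids, k))
    return sample


counts = summarize(records)
sample = balanced_sample(records, per_topology=2)
print(counts.most_common())
print(sample)
```

Capping each group at the same size keeps a frequent topology from dominating the sample, while rare topologies contribute every ligand they have.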

When you should NOT rely on topology alone

  • When donor identity or ring chemistry matters (use skeleton+bonds levels)
  • When stereochemistry or rigid geometry is central (the topology is a purely 2D graph and carries no 3D information)