
Trees and fragments

Trees

ctopo.trees.build

Hierarchy builder for ligand datasets.

This module builds a hierarchical representation of a ligand dataset as a directed acyclic graph (a tree in the typical configuration), where internal nodes correspond to progressively more detailed structural abstractions and leaves correspond to individual ligands (or unique ligands if collapse_leaves=True).

The intended high-level hierarchy is:

denticity -> topology -> skeleton -> ligand

Where:
  • denticity is Ligand.denticity (number of donor atoms),
  • topology is the reduced donor-linker topology graph (computed by get_ligands_topology),
  • skeleton is the ligand skeleton subgraph (computed by get_ligands_skeleton),
  • ligand leaves represent the original ligand depiction/SMILES.
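To make the grouping concrete, here is a minimal pure-Python sketch with no ctopo, RDKit or NetworkX dependency. It makes three assumptions:
  • each ligand has already been reduced to one precomputed key per level, ordered from coarsest to finest;
  • the keys below are placeholder strings, not real SMILES;
  • group_by_levels and its nested-dict output are hypothetical. The real builder returns an nx.DiGraph.

```python
from typing import Any, Dict, Sequence


def group_by_levels(keys_per_ligand: Sequence[Sequence[str]],
                    ligand_ids: Sequence[str]) -> Dict[str, Any]:
    """Nest ligand ids under successive level keys (hypothetical helper, not ctopo API).

    keys_per_ligand[i] holds the grouping keys of ligand i, coarsest level first.
    """
    root: Dict[str, Any] = {}
    for keys, lig_id in zip(keys_per_ligand, ligand_ids):
        node = root
        # Walk down (creating as needed) one internal node per level key.
        for key in keys:
            node = node.setdefault(key, {})
        # Leaves are collected in a list under a reserved key.
        node.setdefault("__leaves__", []).append(lig_id)
    return root


# Placeholder keys: (denticity, topology key, skeleton key) for each ligand.
keys = [
    ("2", "topo-1", "skel-1"),
    ("2", "topo-1", "skel-1"),
    ("2", "topo-1", "skel-2"),
    ("1", "topo-0", "skel-0"),
]
tree = group_by_levels(keys, ["L0", "L1", "L2", "L3"])
```

Ligands that share every key up to some level share the internal node at that level. That is what makes the result a tree.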

Levels are configured using LevelSpec objects (or short tuple forms such as ('skeleton', 'da', 'bonds')). Each level yields a grouping key (SMILES) and, for non-denticity nodes, a thumbnail SVG depiction.

Monotonic detail constraint

The level sequence must be hierarchical not only in kind (denticity -> topo -> skeleton -> ligand), but also in the information content ("flags") between adjacent levels.

Example (valid): ('skeleton', 'da') -> ('skeleton', 'da', 'bonds')

Example (invalid, loses information): ('skeleton', 'da') -> ('skeleton', 'bonds')

This constraint is validated by validate_levels.

Performance notes

Generating SVG depictions can be expensive. The builder avoids redundant work by:
  • computing SMILES keys per ligand per level (cheap),
  • generating SVG only once per unique (level, SMILES) node,
  • optionally disabling leaf SVG generation (include_leaf_svg=False).
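The "render once per unique (level, SMILES) node" idea can be sketched with functools.lru_cache. render_svg and thumbnails are hypothetical stand-ins. The builder's actual caching mechanism is not described here and may differ, for example an explicit dict keyed by (level, SMILES).

```python
from functools import lru_cache
from typing import List, Sequence

calls: List[tuple] = []  # records every real (uncached) render, to show deduplication


@lru_cache(maxsize=None)
def render_svg(level: str, smiles: str) -> str:
    """Hypothetical stand-in for an expensive depiction call."""
    calls.append((level, smiles))
    return f"<svg><!-- {level}: {smiles} --></svg>"


def thumbnails(level: str, smiles_keys: Sequence[str]) -> List[str]:
    # One cheap key per ligand, but at most one render per unique (level, key).
    return [render_svg(level, s) for s in smiles_keys]


svgs = thumbnails("skeleton", ["CC", "CC", "CCC", "CC"])
```

Four ligands produce four thumbnails, but only two renders happen.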

Output graph format

The returned object is an nx.DiGraph with node attributes suitable for downstream rendering. At minimum, nodes store:
  • kind: 'root' | 'denticity' | 'topo' | 'skeleton' | 'ligand'
  • level: a tuple identifier of the level
  • label: short label
  • smiles: grouping key (None for root/denticity)
  • svg: thumbnail (None for root/denticity and optionally leaves)
  • leaf_count: number of leaf ligands under the node

The graph is expected to be a tree for tree_to_html: each node (except root) has exactly one parent.
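Because every non-root node has exactly one parent, leaf_count can be computed bottom-up from a child-to-parent mapping. The sketch below assumes that representation and counts each leaf once. leaf_counts is hypothetical. Whether the builder weights collapsed leaves by their occurrence count is not stated here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional


def leaf_counts(parent: Dict[Hashable, Optional[Hashable]]) -> Dict[Hashable, int]:
    """Hypothetical helper: leaf_count for every node, given child -> parent (root -> None)."""
    children: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    roots = [n for n, p in parent.items() if p is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    counts: Dict[Hashable, int] = {}

    def visit(node: Hashable) -> int:
        kids = children.get(node, [])
        # A node without children is a leaf and counts itself once.
        counts[node] = 1 if not kids else sum(visit(k) for k in kids)
        return counts[node]

    visit(roots[0])
    return counts


parent = {"root": None, "d2": "root", "t1": "d2", "lig0": "t1",
          "lig1": "t1", "d1": "root", "lig2": "d1"}
counts = leaf_counts(parent)
```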

LevelId = Union[str, Sequence[str]] module-attribute

_KIND_RANK = {'denticity': 0, 'topo': 1, 'topology': 1, 'skeleton': 2, 'ligand': 3} module-attribute

__all__ = ['LevelSpec', 'level_spec', 'validate_levels', 'build_ligand_tree'] module-attribute

LevelSpec(kind, flags=frozenset()) dataclass

A single hierarchy level specification.

A LevelSpec defines how ligands are grouped at a particular level of the hierarchy.

Parameters:
  • kind (str, required): One of: 'denticity', 'topo' (or 'topology'), 'skeleton', 'ligand'.
  • flags (frozenset[str], default frozenset()): A set of optional modifiers affecting how the grouping key and depiction are produced.

Supported flags (case-insensitive):
  • 'da': Preserve original donor atom elements in depictions/SMILES where applicable (otherwise donors are shown as dummy atoms labelled 'DA').
  • 'bonds' (skeleton only): Preserve original bond orders in skeleton depictions/SMILES (otherwise all single).
  • 'skeleton' (skeleton only): Preserve original elements for skeleton atoms (otherwise dummy atoms).

Notes

validate_levels enforces that adjacent levels are monotone in the information they retain, so that a later level never 'forgets' a previously requested feature.

features property

Features that must not be lost across adjacent levels.

Ligand(mol, G, donor_atoms, skeleton_atoms, substituent_atoms, smiles_settings=SmilesSettings(), svg_settings=SvgSettings()) dataclass

Ligand represented as an RDKit molecule plus a NetworkX graph and atom partitions.

Attributes:
  • mol (Mol): RDKit molecule.
  • G (Graph): NetworkX graph with node attributes.
  • donor_atoms (FrozenSet[int]): Frozen set of donor atom indices.
  • skeleton_atoms (FrozenSet[int]): Frozen set of skeleton atom indices (excluding donors).
  • substituent_atoms (FrozenSet[int]): Frozen set of substituent atom indices.
  • smiles_settings (SmilesSettings): Default settings for SMILES generation in visualization helpers.
  • svg_settings (SvgSettings): Default settings for SVG generation in visualization helpers.

Visualization

The methods visualize_ligand, visualize_skeleton, and visualize_topology return (smiles, svg) pairs that are convenient for building dataset browsers.

Keyword arguments for visual style are forwarded to the corresponding functions in ctopo.visuals:
  • visualize_ligand -> ctopo.visuals.prepare_ligand_visual
  • visualize_skeleton -> ctopo.visuals.prepare_skeleton_visual
  • visualize_topology -> ctopo.visuals.prepare_topology_visual

See ctopo.visuals for the available options.

denticity property

Return the ligand's denticity (the number of donor atoms).

visualize_ligand(**kwargs)

Return ligand visualization (SMILES with donor maps + chemical-like SVG).

Keyword arguments are forwarded to ctopo.visuals.prepare_ligand_visual. See ctopo.visuals for available options.

visualize_skeleton(donors=True, skeleton=True, bonds=True, **kwargs)

Return skeleton visualization (SMILES + SVG) for this ligand.

Parameters:
  • donors (bool, default True): If True, donor atoms are shown as original elements. If False, donors are dummies labeled 'DA'.
  • skeleton (bool, default True): If True, skeleton atoms are shown as original elements. If False, skeleton atoms are dummies with empty labels.
  • bonds (bool, default True): If True, keep original bond orders from the skeleton graph. If False, force all bonds to be single.

Keyword arguments are forwarded to ctopo.visuals.prepare_skeleton_visual. See ctopo.visuals for available options.

visualize_topology(donors=False, **kwargs)

Return topology visualization (SMILES + SVG) for this ligand.

Parameters:
  • donors (bool, default False): If True, donor atoms are shown as original elements. If False, donors are dummies labeled 'DA'. Non-donor atoms are always dummies with empty labels in the topology depiction.

Keyword arguments are forwarded to ctopo.visuals.prepare_topology_visual. See ctopo.visuals for available options.

_norm_flag(x)

_smiles_metrics(smiles, cache)

Return (n_atoms, n_bonds, branching, rings) for sorting.

build_ligand_tree(ligands, levels=('denticity', ('topo',), ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), 'ligand'), ligand_ids=None, collapse_leaves=False, include_leaf_svg=True, topo_kwargs=None, skeleton_kwargs=None, ligand_kwargs=None)

Build a hierarchical tree (as an nx.DiGraph) for a ligand dataset.

The resulting graph groups ligands by successive abstractions defined by levels. Internal nodes represent unique groups at each level; leaf nodes represent ligands.

Parameters:
  • ligands (Sequence[Ligand], required): Input ligands.
  • levels (Sequence[Union[LevelSpec, LevelId]], default ('denticity', ('topo',), ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), 'ligand')): Level specification sequence. See LevelSpec and validate_levels.
  • ligand_ids (Optional[Sequence[str]], default None): Optional stable identifiers for ligands, used for leaf labels (non-collapsed mode) and for leaf example lists (collapsed mode).
  • collapse_leaves (bool, default False): If False, create one leaf node per input ligand. If True, create one leaf node per unique ligand SMILES and store the number of occurrences in the leaf's count attribute.
  • include_leaf_svg (bool, default True): If True, leaf nodes store an SVG depiction. If False, leaves have svg=None.
  • topo_kwargs (Optional[Mapping[str, Any]], default None): Optional keyword arguments forwarded to ctopo.visuals.prepare_topology_visual.
  • skeleton_kwargs (Optional[Mapping[str, Any]], default None): Optional keyword arguments forwarded to ctopo.visuals.prepare_skeleton_visual.
  • ligand_kwargs (Optional[Mapping[str, Any]], default None): Optional keyword arguments forwarded to ctopo.visuals.prepare_ligand_visual.

Returns:
  • nx.DiGraph: A graph rooted at a single 'root' node. Node attributes typically include kind, level, label, smiles, svg, leaf_count, and sort_key. Leaf nodes additionally include count and, in non-collapsed mode, source_index and/or source_id.

Notes

SVG generation is performed only for unique nodes per level (and optionally leaves), to reduce overhead on large datasets.
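In collapsed mode, leaves are keyed by unique ligand SMILES and carry an occurrence count plus example ids. Here is a sketch with collections.Counter. The helper collapse, the cap on example ids, and sorting by SMILES are assumptions for illustration, not documented behaviour.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def collapse(smiles: Sequence[str], ids: Sequence[str],
             max_examples: int = 3) -> List[Tuple[str, int, List[str]]]:
    """Hypothetical helper: one (smiles, count, example_ids) entry per unique SMILES."""
    counts = Counter(smiles)
    examples: Dict[str, List[str]] = defaultdict(list)
    for s, lig_id in zip(smiles, ids):
        # Keep only the first few ids as examples for each unique ligand.
        if len(examples[s]) < max_examples:
            examples[s].append(lig_id)
    return [(s, counts[s], examples[s]) for s in sorted(counts)]


leaves = collapse(["NCCN", "N", "NCCN", "NCCN"], ["a", "b", "c", "d"], max_examples=2)
```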

get_ligands_skeleton(G, atom_type_key='atom_type')

Return the ligand skeleton as an induced subgraph of the original graph.

Skeleton definition
  • Prefer precomputed AtomType labels: keep atoms with type in {DONOR, SKELETON}.
  • Otherwise compute skeleton as the union of nodes on all shortest paths between donor pairs (plus donors themselves).
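The shortest-path fallback can be sketched without NetworkX using breadth-first search. A node v lies on some shortest path between donors s and t exactly when d(s, v) + d(v, t) = d(s, t).

Two things differ from the real function. The sketch uses a hypothetical adjacency dict and hypothetical function names. It also returns a node set, whereas get_ligands_skeleton works on a NetworkX Graph and returns an induced subgraph copy.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set

Adj = Dict[Hashable, Set[Hashable]]  # hypothetical graph representation


def bfs_dist(adj: Adj, src: Hashable) -> Dict[Hashable, int]:
    """Unweighted shortest-path distances from src (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def skeleton_nodes(adj: Adj, donors: Iterable[Hashable]) -> Set[Hashable]:
    """Hypothetical helper: donors plus every node on a shortest donor-donor path."""
    donors = list(donors)
    if not donors:
        raise ValueError("no donor atoms present")
    dist = {d: bfs_dist(adj, d) for d in donors}
    keep: Set[Hashable] = set(donors)
    for s, t in combinations(donors, 2):
        if t not in dist[s]:
            continue  # donors in different components share no path
        d_st = dist[s][t]
        # v is on some shortest s-t path iff d(s, v) + d(v, t) == d(s, t).
        keep.update(v for v in dist[s] if dist[s][v] + dist[t][v] == d_st)
    return keep


# N0-C1-C2-N3 with a methyl C4 on C2 (hydrogens omitted); donors are the nitrogens.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
```

For a ring with donors on opposite sides, both arcs are shortest paths, so the whole ring is kept.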

Parameters:
  • G (Graph, required): Original ligand graph (NetworkX Graph).
  • atom_type_key (str, default 'atom_type'): Node attribute holding AtomType integer codes.

Returns:
  • Graph: A copy of the induced skeleton subgraph (same node ids as in G).

Raises:
  • ValueError: If no donor atoms are present.

get_ligands_topology(G, atom_type_key='atom_type')

Return a simplified topology graph of the ligand (ignore_cycles=False behavior).

Algorithm (mirrors an RDKit-based reference implementation):
  • start from the ligand skeleton
  • remove linear linkers (degree-2 non-donors whose two neighbors are not bonded to each other) by contracting them
  • remove bubbles (degree-2 non-donors whose two neighbors are bonded to each other) by deleting them
  • remove remaining linear linkers again
  • finalize:
      ◦ donors keep their original node attributes
      ◦ non-donors become dummy nodes with only {'Z': 0}
      ◦ all edges become single bonds with minimal attributes
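The three simplification passes can be sketched on a plain adjacency dict that maps each node to a set of neighbours. This is an illustration under two assumptions:
  • each pass is repeated until nothing changes;
  • the finalize step (rewriting node and edge attributes) is omitted.

All function names are hypothetical.

```python
from typing import Dict, Hashable, Iterator, Set, Tuple

Adj = Dict[Hashable, Set[Hashable]]  # hypothetical graph representation


def _degree2(adj: Adj, donors: Set[Hashable]) -> Iterator[Tuple[Hashable, Hashable, Hashable]]:
    """Yield (v, a, b) for each degree-2 non-donor v with neighbours a, b."""
    for v in list(adj):  # snapshot: the caller may delete v after each yield
        if v in adj and v not in donors and len(adj[v]) == 2:
            a, b = adj[v]
            yield v, a, b


def _simplify(adj: Adj, donors: Set[Hashable], bubbles: bool) -> None:
    """Contract linear linkers (bubbles=False) or delete bubbles (bubbles=True)."""
    changed = True
    while changed:
        changed = False
        for v, a, b in _degree2(adj, donors):
            if (b in adj[a]) != bubbles:
                continue  # wrong case for this pass
            # Detach v from both neighbours and drop it.
            adj[a].discard(v)
            adj[b].discard(v)
            del adj[v]
            if not bubbles:
                # Linear linker: reconnect its two neighbours directly.
                adj[a].add(b)
                adj[b].add(a)
            changed = True


def topology(skeleton: Adj, donors: Set[Hashable]) -> Adj:
    """Hypothetical helper returning the simplified adjacency (a copy)."""
    adj = {v: set(nbrs) for v, nbrs in skeleton.items()}
    _simplify(adj, donors, bubbles=False)  # contract linear linkers
    _simplify(adj, donors, bubbles=True)   # delete bubbles
    _simplify(adj, donors, bubbles=False)  # contract remaining linkers
    return adj


# N0-C1-C2-N3 (ethylenediamine-like skeleton): collapses to a single donor-donor edge.
en_topo = topology({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}, {0, 3})
```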

Parameters:
  • G (Graph, required): Original ligand graph (NetworkX Graph).
  • atom_type_key (str, default 'atom_type'): Node attribute holding AtomType integer codes.

Returns:
  • Graph: A new NetworkX Graph representing the ligand topology.

Raises:
  • ValueError: If no donor atoms are present.

level_spec(level)

Create a LevelSpec from a short-form level description: a string such as 'denticity' or a tuple such as ('skeleton', 'da', 'bonds').
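Here is a sketch of how short forms might be normalized. LevelSpecSketch and parse_level are hypothetical stand-ins for LevelSpec and level_spec. The sketch follows the documented alias ('topology' for 'topo') and case-insensitive flags. The real normalization, including any per-kind flag validation, may differ.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Sequence, Union

_ALIASES = {"topology": "topo"}  # assumed canonicalization of the documented alias
_KINDS = {"denticity", "topo", "skeleton", "ligand"}


@dataclass(frozen=True)
class LevelSpecSketch:
    """Hypothetical stand-in for LevelSpec."""
    kind: str
    flags: FrozenSet[str] = field(default_factory=frozenset)


def parse_level(level: Union[str, Sequence[str]]) -> LevelSpecSketch:
    """Hypothetical helper: 'denticity' or ('skeleton', 'da', 'bonds') -> spec."""
    parts = [level] if isinstance(level, str) else list(level)
    if not parts:
        raise ValueError("empty level description")
    kind = parts[0].strip().lower()
    kind = _ALIASES.get(kind, kind)
    if kind not in _KINDS:
        raise ValueError(f"unknown level kind: {parts[0]!r}")
    # Flags are case-insensitive, so store them lower-cased.
    flags = frozenset(p.strip().lower() for p in parts[1:])
    return LevelSpecSketch(kind, flags)
```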

prepare_ligand_visual(ligand, donor_map_num=1, donor_color=(1.0, 0.8, 0.45), skeleton_bond_color=(0.65, 0.8, 1.0), donor_radius=0.45, mark_donors_in_smiles=True, highlight_skeleton_bonds=True)

Prepare an RDKit Mol of the ligand for drawing and SMILES generation.

Behavior
  • starts from ligand.mol (source of truth)
  • optionally sets the same atom-map number on all donor atoms
  • highlights donor atoms and (optionally) skeleton bonds

Skeleton bonds are computed from the ligand graph partition.
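The docstring does not spell out which bonds count as skeleton bonds. Assuming they are bonds whose two endpoints both belong to donor_atoms or skeleton_atoms, the selection is a simple filter over bond endpoints. skeleton_bonds is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple


def skeleton_bonds(edges: Iterable[Tuple[int, int]],
                   donor_atoms: FrozenSet[int],
                   skeleton_atoms: FrozenSet[int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: bonds with both endpoints in donors or skeleton (assumed rule)."""
    core = donor_atoms | skeleton_atoms
    return [(u, v) for u, v in edges if u in core and v in core]


# N0-C1-C2-N3 with a methyl substituent C4 on C2.
bonds = skeleton_bonds([(0, 1), (1, 2), (2, 3), (2, 4)],
                       frozenset({0, 3}), frozenset({1, 2}))
```

With RDKit, the edge list could come from (b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds().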

prepare_skeleton_visual(G_skeleton, use_original_donor_atoms=True, use_original_skeleton_atoms=True, use_original_bonds=True, donor_map_num=1, mark_donors_in_smiles=True, donor_color=(1.0, 0.8, 0.45), donor_radius=0.45, donor_label='DA')

Prepare an RDKit Mol for a ligand skeleton graph.

prepare_topology_visual(G_topology, use_original_donor_atoms=False, donor_map_num=1, mark_donors_in_smiles=True, donor_color=(1.0, 0.8, 0.45), donor_radius=0.45, donor_label='DA')

Prepare an RDKit Mol for a ligand topology graph.

validate_levels(levels, *, require_leaf=True)

Validate and normalize a sequence of level specifications.

This function enforces two constraints:

1) Kind order: The level kinds must follow the structural hierarchy: denticity -> topo -> skeleton -> ligand

2) Monotone information retention: Adjacent levels must not lose features. For example:
  • ('skeleton', 'da') -> ('skeleton', 'da', 'bonds') is valid
  • ('skeleton', 'da') -> ('skeleton', 'bonds') is invalid (drops 'da')
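The two constraints can be sketched with levels given as (kind, flags) pairs. The rank table mirrors _KIND_RANK above. The exact content of LevelSpec.features is not documented here, so applying the flag-subset rule only between adjacent levels of the same kind is an assumption. check_levels is hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

_RANK = {"denticity": 0, "topo": 1, "topology": 1, "skeleton": 2, "ligand": 3}
Level = Tuple[str, FrozenSet[str]]


def check_levels(levels: Sequence[Level], require_leaf: bool = True) -> None:
    """Hypothetical helper: raise ValueError on ordering or monotonicity violations."""
    if not levels:
        raise ValueError("levels must not be empty")
    for kind, _ in levels:
        if kind not in _RANK:
            raise ValueError(f"unknown kind: {kind!r}")
    for (k1, f1), (k2, f2) in zip(levels, levels[1:]):
        if _RANK[k2] < _RANK[k1]:
            raise ValueError(f"kind order violated: {k1} -> {k2}")
        # Assumed rule: a later level of the same kind keeps every earlier flag.
        if _RANK[k2] == _RANK[k1] and not f1 <= f2:
            raise ValueError(f"{k2} level drops {sorted(f1 - f2)}")
    if require_leaf and levels[-1][0] != "ligand":
        raise ValueError("last level must be 'ligand'")
```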

Parameters:
  • levels (Sequence[Union[LevelSpec, LevelId]], required): Sequence of LevelSpec or short forms: 'denticity'; ('topo',) or ('topo', 'da'); ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), ...; 'ligand'.
  • require_leaf (bool, default True): If True, the last level must be 'ligand'.

Returns:
  • List[LevelSpec]: A list of normalized LevelSpec instances.

Raises:
  • ValueError: If levels are empty, contain unknown kinds, violate kind ordering, violate monotonicity, or (if require_leaf=True) do not end with 'ligand'.

ctopo.trees.html

HTML renderer for ligand hierarchy trees.

This module turns a ligand hierarchy graph (typically produced by ctopo.trees.build.build_ligand_tree) into a self-contained HTML document that can be opened directly in a web browser.

The output uses simple built-in HTML elements (<details>/<summary>) for collapsible nodes, and CSS for styling. Child nodes are placed into a scrollable container to keep large branching factors manageable: by default only a small vertical window of children is visible, and the rest can be reached by scrolling.
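Here is a sketch of the collapsible layout, with nested <details>/<summary> elements and a scrollable children box. render is hypothetical. Setting the box height to max_children_visible * child_item_height_px is an assumption based on the tree_to_html parameter descriptions.

```python
import html
from typing import Dict, List


def render(node: str, children: Dict[str, List[str]], labels: Dict[str, str],
           max_children_visible: int = 5, child_item_height_px: int = 120) -> str:
    """Hypothetical helper: render a subtree as nested <details>/<summary> markup."""
    label = html.escape(labels.get(node, node))
    kids = children.get(node, [])
    if not kids:
        return f"<div class='leaf'>{label}</div>"
    # The children box scrolls once it holds more than max_children_visible items.
    max_h = max_children_visible * child_item_height_px
    inner = "".join(render(k, children, labels, max_children_visible, child_item_height_px)
                    for k in kids)
    return (f"<details><summary>{label}</summary>"
            f"<div style='max-height:{max_h}px;overflow-y:auto'>{inner}</div></details>")


page = render("root", {"root": ["a", "b"]}, {"a": "A<1>"})
```

Escaping labels with html.escape keeps the output well-formed even when a label contains markup characters.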

Expected input graph

The renderer expects a directed acyclic graph that behaves like a tree:
  • exactly one root node (in-degree == 0),
  • every other node has exactly one parent (in-degree == 1).

Node attributes used by the renderer:
  • svg: an SVG snippet to display as a thumbnail (optional for some nodes)
  • label: human-readable label
  • kind: node type string (e.g. 'topo', 'skeleton', 'ligand')
  • leaf_count: number of ligand leaves under the node (used as a summary statistic)
  • sort_key: used to order children consistently
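Here is a sketch of the root lookup and tree check. _find_root's body is not shown here, so this only follows the same idea. find_root is hypothetical and works on plain node and edge lists. With NetworkX, nx.is_arborescence(G) answers the same "is this a rooted tree" question.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent, child)


def find_root(nodes: Sequence[Hashable], edges: Sequence[Edge]) -> Hashable:
    """Hypothetical helper: return the unique root of a tree-like digraph, else raise."""
    indeg: Dict[Hashable, int] = {n: 0 for n in nodes}
    children: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for parent, child in edges:
        indeg[child] += 1
        children[parent].append(child)
    roots = [n for n, d in indeg.items() if d == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    if any(d > 1 for d in indeg.values()):
        raise ValueError("some nodes have multiple parents")
    # With one root and in-degree <= 1 everywhere, the graph is a tree exactly
    # when every node is reachable from the root (otherwise a detached cycle exists).
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    if len(seen) != len(indeg):
        raise ValueError("graph contains a cycle")
    return roots[0]
```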

This renderer intentionally avoids external dependencies and keeps the HTML portable. If you later want a richer UI (search, lazy-loading, virtual scrolling, etc.), this module can be replaced while keeping the builder output unchanged.

__all__ = ['tree_to_html'] module-attribute

_find_root(G)

_sorted_children(G, nid)

tree_to_html(G, root=None, title='cTopo ligand tree', max_children_visible=5, child_item_height_px=120, open_root=True)

Render a ligand tree into a self-contained HTML string.

Parameters:
  • G (DiGraph, required): A directed acyclic graph representing a tree. Typically produced by ctopo.trees.build.build_ligand_tree.
  • root (Optional[int], default None): Optional explicit root node id. If None, the unique node with in-degree == 0 is used.
  • title (str, default 'cTopo ligand tree'): HTML document title.
  • max_children_visible (int, default 5): Maximum number of child 'cards' visible in the children container before scrolling.
  • child_item_height_px (int, default 120): Approximate pixel height of each child item; used to compute the container's max-height.
  • open_root (bool, default True): If True, the root node is expanded by default.

Returns:
  • str: A complete HTML document as a string.

Raises:
  • ValueError: If the graph is not a DAG, does not have exactly one root, or is not tree-like (some nodes have multiple parents).

Fragments

ctopo.fragments

Fragmentation utilities.

This module contains RDKit-based helpers to decompose a Complex into ligand fragments without reconstructing ligand RDKit molecules from graphs.

Current scope (v1):
  • Bridging ligands are not handled specially: removing metal centers may split a bridging ligand into multiple fragments. This is expected behavior for now.

__all__ = ['LigandCount', 'ligands_from_complex'] module-attribute

Complex(mol, G, metal_atoms, donor_atoms, skeleton_atoms, substituent_atoms) dataclass

Complex represented as an RDKit molecule plus a NetworkX graph and atom partitions.

Attributes:
  • mol (Mol): RDKit molecule.
  • G (Graph): NetworkX graph with node attributes.
  • metal_atoms (FrozenSet[int]): Frozen set of metal atom indices.
  • donor_atoms (FrozenSet[int]): Frozen set of donor atom indices.
  • skeleton_atoms (FrozenSet[int]): Frozen set of skeleton atom indices (excluding donors).
  • substituent_atoms (FrozenSet[int]): Frozen set of substituent atom indices.

Ligand(mol, G, donor_atoms, skeleton_atoms, substituent_atoms, smiles_settings=SmilesSettings(), svg_settings=SvgSettings()) dataclass

Same Ligand class as documented under ctopo.trees.build above; its attributes, the denticity property, and the visualize_ligand, visualize_skeleton, and visualize_topology methods are identical.

LigandCount(smiles, ligand, count) dataclass

Unique ligand representative with occurrence count.

SmilesSettings(canonical=True, isomeric=False) dataclass

Settings for SMILES generation.

Mirrors PreparedMol.to_smiles() in ctopo.visuals.

SvgSettings(size=(300, 220), line_width=2, add_atom_indices=False) dataclass

Settings for SVG generation.

Mirrors PreparedMol.to_svg() in ctopo.visuals.

_remove_atoms_by_index(mol, atom_indices)

_set_orig_idx_props(mol, prop='orig_idx')

ligand_from_mol(mol, donor_atoms, smiles_settings=None, svg_settings=None)

Construct a Ligand from an RDKit Mol and explicit donor atom indices.

Parameters:
  • mol (Mol, required): RDKit molecule.
  • donor_atoms (Sequence[int], required): Atom indices that should be treated as donor atoms.
  • smiles_settings (Optional[SmilesSettings], default None): Optional default SMILES settings stored in the Ligand and used by visualization helpers.
  • svg_settings (Optional[SvgSettings], default None): Optional default SVG settings stored in the Ligand and used by visualization helpers.

Returns:
  • Ligand: Ligand instance with populated graph and atom partitions.

Raises:
  • TypeError: If mol is None or donor_atoms contains non-integers.
  • ValueError: If donor atom indices are out of range.
  • NodeNotFound: If a donor index is not present in the graph.
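Here is a sketch of the donor index checks listed under Raises. check_donor_indices is hypothetical; in RDKit, n_atoms would come from mol.GetNumAtoms(). The sketch uses operator.index, so any true integer type is accepted while floats and strings raise TypeError. The real function's exact checks may differ.

```python
import operator
from typing import FrozenSet, Sequence


def check_donor_indices(n_atoms: int, donor_atoms: Sequence[int]) -> FrozenSet[int]:
    """Hypothetical helper: validate donor indices against a molecule size."""
    out = set()
    for i in donor_atoms:
        idx = operator.index(i)  # raises TypeError for non-integers such as 1.0 or "3"
        if not 0 <= idx < n_atoms:
            raise ValueError(f"donor index {idx} out of range for {n_atoms} atoms")
        out.add(idx)
    return frozenset(out)
```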

ligands_from_complex(complex, sanitize_frags=True, smiles_settings=None, svg_settings=None)

Extract ligand fragments from a Complex via RDKit fragmentation.

Assumptions
  • complex.metal_atoms and complex.donor_atoms are correct (e.g. Complex was created via ctopo.core.complex.complex_from_mol which validates coordination).
Algorithm
  • copy complex.mol
  • annotate atoms with int prop 'orig_idx'
  • remove metal atoms
  • split into fragments via Chem.GetMolFrags(asMols=True)
  • for each fragment, recover donor atoms by checking orig_idx ∈ complex.donor_atoms
  • build Ligand objects from fragments
  • compute unique ligands by canonical+isomeric SMILES (canonical=True, isomericSmiles=True)
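The fragmentation steps can be mimicked on a plain adjacency dict, where original atom indices play the role of the 'orig_idx' property. split_ligands is hypothetical; RDKit's Chem.GetMolFrags does the real splitting. The sorted element string used below is only a stand-in for canonical isomeric SMILES, and it would conflate isomers.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Adj = Dict[int, Set[int]]  # hypothetical heavy-atom adjacency, hydrogens omitted


def split_ligands(adj: Adj, metals: FrozenSet[int],
                  donors: FrozenSet[int]) -> List[Tuple[FrozenSet[int], FrozenSet[int]]]:
    """Hypothetical helper: drop metals, return (fragment_atoms, fragment_donors) per component."""
    remaining = {a: nbrs - metals for a, nbrs in adj.items() if a not in metals}
    seen: Set[int] = set()
    frags: List[Tuple[FrozenSet[int], FrozenSet[int]]] = []
    for start in sorted(remaining):
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:  # depth-first search over the metal-free graph
            for nb in remaining[stack.pop()]:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        # Donors are recovered by original index, like orig_idx in complex.donor_atoms.
        frags.append((frozenset(comp), frozenset(comp & donors)))
    return frags


# Cu0 bound to two N-donor ammines (1, 2) and an ethylenediamine N3-C4-C5-N6.
adj = {0: {1, 2, 3, 6}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3, 5}, 5: {4, 6}, 6: {0, 5}}
elements = {0: "Cu", 1: "N", 2: "N", 3: "N", 4: "C", 5: "C", 6: "N"}
frags = split_ligands(adj, frozenset({0}), frozenset({1, 2, 3, 6}))
# Count unique ligands using the crude stand-in key, sorted like LigandCount output.
keys = ["".join(sorted(elements[a] for a in atoms)) for atoms, _ in frags]
unique = sorted(Counter(keys).items())
```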

Parameters:
  • complex (Complex, required): cTopo Complex object.
  • sanitize_frags (bool, default True): Passed to RDKit GetMolFrags(sanitizeFrags=...).
  • smiles_settings (Optional[SmilesSettings], default None): Optional Ligand visualization default SMILES settings to store.
  • svg_settings (Optional[SvgSettings], default None): Optional Ligand visualization default SVG settings to store.

Returns:
  • List[LigandCount]: List of LigandCount (unique ligands with counts), sorted by SMILES.

Raises:
  • TypeError: If complex.mol is None.