Trees and fragments¶
Trees¶
ctopo.trees.build
¶
Hierarchy builder for ligand datasets.
This module builds a hierarchical representation of a ligand dataset as a directed
acyclic graph (a tree in the typical configuration), where internal nodes correspond
to progressively more detailed structural abstractions and leaves correspond to
individual ligands (or unique ligands if collapse_leaves=True).
The intended high-level hierarchy is:
denticity -> topology -> skeleton -> ligand
Where:
- denticity is Ligand.denticity (number of donor atoms),
- topology is the reduced donor-linker topology graph (computed by get_ligands_topology),
- skeleton is the ligand skeleton subgraph (computed by get_ligands_skeleton),
- ligand leaves represent the original ligand depiction/SMILES.
Levels are configured using LevelSpec objects (or short tuple forms such as
('skeleton', 'da', 'bonds')). Each level yields a grouping key (SMILES) and, for
non-denticity nodes, a thumbnail SVG depiction.
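As a rough illustration of how successive grouping keys induce a hierarchy, the sketch below treats every prefix of a ligand's key path as one group and counts the ligands beneath it. The helper `count_groups` and the placeholder keys (`"T1"`, `"S1"`, ...) are hypothetical; in the library the keys would be the denticity and the per-level SMILES strings.

```python
from collections import Counter

def count_groups(key_paths):
    """Count how many ligands fall under every prefix of their key path."""
    counts = Counter()
    for path in key_paths:
        # Each prefix (denticity,), (denticity, topo), ... is one group.
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

# One (denticity, topology key, skeleton key) path per ligand; values are placeholders.
key_paths = [
    (2, "T1", "S1"),
    (2, "T1", "S2"),
    (3, "T2", "S3"),
]
counts = count_groups(key_paths)
```

Here the two bidentate ligands share the groups `(2,)` and `(2, "T1")` and only split at the skeleton level.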
Monotonic detail constraint¶
The level sequence must be hierarchical not only in kind (denticity -> topo -> skeleton -> ligand), but also in the information content ("flags") between adjacent levels.
Example (valid): ('skeleton', 'da') -> ('skeleton', 'da', 'bonds')
Example (invalid, loses information): ('skeleton', 'da') -> ('skeleton', 'bonds')
This constraint is validated by validate_levels.
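A minimal sketch of the flag part of this rule, assuming levels are given in tuple form and that "not losing information" means each level's flag set contains the previous one. The function name `is_monotone` is hypothetical, and the real check in validate_levels may also account for the level kind.

```python
def is_monotone(levels):
    """Return True if no level drops a flag requested by the level before it."""
    # Element 0 of each tuple is the kind; the rest are flags (case-insensitive).
    flag_sets = [frozenset(f.lower() for f in level[1:]) for level in levels]
    return all(prev <= cur for prev, cur in zip(flag_sets, flag_sets[1:]))

valid = [("skeleton", "da"), ("skeleton", "da", "bonds")]
invalid = [("skeleton", "da"), ("skeleton", "bonds")]  # drops 'da'
```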
Performance notes¶
Generating SVG depictions can be expensive. The builder avoids redundant work by:
- computing SMILES keys per ligand per level (cheap),
- generating SVG only once per unique (level, SMILES) node,
- optionally disabling leaf SVG generation (include_leaf_svg=False).
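The caching idea can be sketched as follows. This is an illustration only: `build_thumbnails` and `fake_render` are hypothetical stand-ins, and the key strings are placeholders rather than real SMILES.

```python
def build_thumbnails(keys_per_ligand, render):
    """Render one thumbnail per unique (level, key) pair."""
    cache = {}
    for keys in keys_per_ligand:
        for level, key in keys:
            if (level, key) not in cache:
                # The expensive call happens once per unique node.
                cache[(level, key)] = render(key)
    return cache

calls = []

def fake_render(key):
    """Stand-in for an expensive SVG renderer; records each call."""
    calls.append(key)
    return f"<svg><!-- {key} --></svg>"

keys_per_ligand = [
    [("topo", "T1"), ("skeleton", "S1")],
    [("topo", "T1"), ("skeleton", "S2")],
]
thumbs = build_thumbnails(keys_per_ligand, fake_render)
```

Both ligands share the `("topo", "T1")` node, so the renderer runs three times rather than four.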
Output graph format¶
The returned object is an nx.DiGraph with node attributes suitable for downstream rendering.
At minimum, nodes store:
- kind: 'root' | 'denticity' | 'topo' | 'skeleton' | 'ligand'
- level: a tuple identifier of the level
- label: short label
- smiles: grouping key (None for root/denticity)
- svg: thumbnail (None for root/denticity and optionally leaves)
- leaf_count: number of leaf ligands under the node
The graph is expected to be a tree for tree_to_html:
each node (except root) has exactly one parent.
LevelId = Union[str, Sequence[str]]
module-attribute
¶
_KIND_RANK = {'denticity': 0, 'topo': 1, 'topology': 1, 'skeleton': 2, 'ligand': 3}
module-attribute
¶
__all__ = ['LevelSpec', 'level_spec', 'validate_levels', 'build_ligand_tree']
module-attribute
¶
LevelSpec(kind, flags=frozenset())
dataclass
¶
A single hierarchy level specification.
A LevelSpec defines how ligands are grouped at a particular level of the hierarchy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | str | One of: 'denticity', 'topo' (or 'topology'), 'skeleton', 'ligand'. | required |
| flags | frozenset[str] | A set of optional modifiers affecting how the grouping key and depiction are produced. | frozenset() |
Supported flags (case-insensitive):
- 'da': Preserve original donor atom elements in depictions/SMILES where applicable (otherwise donors are shown as dummy atoms labelled 'DA').
- 'bonds' (skeleton only): Preserve original bond orders in skeleton depictions/SMILES (otherwise all single).
- 'skeleton' (skeleton only): Preserve original elements for skeleton atoms (otherwise dummy atoms).
Notes
validate_levels enforces that adjacent levels are monotone in the information they retain,
so that a later level never 'forgets' a previously requested feature.
features
property
¶
Features that must not be lost across adjacent levels.
Ligand(mol, G, donor_atoms, skeleton_atoms, substituent_atoms, smiles_settings=SmilesSettings(), svg_settings=SvgSettings())
dataclass
¶
Ligand represented as an RDKit molecule plus a NetworkX graph and atom partitions.
Attributes:
| Name | Type | Description |
|---|---|---|
| mol | Mol | RDKit molecule |
| G | Graph | NetworkX graph with node attributes |
| donor_atoms | FrozenSet[int] | Frozen set of donor atom indices |
| skeleton_atoms | FrozenSet[int] | Frozen set of skeleton atom indices (excluding donors) |
| substituent_atoms | FrozenSet[int] | Frozen set of substituent atom indices |
| smiles_settings | SmilesSettings | Default settings for SMILES generation in visualization helpers |
| svg_settings | SvgSettings | Default settings for SVG generation in visualization helpers |
Visualization
The methods visualize_ligand, visualize_skeleton, and visualize_topology
return (smiles, svg) pairs that are convenient for building dataset browsers.
Keyword arguments for visual style are forwarded to the corresponding functions
in ctopo.visuals:
- visualize_ligand -> ctopo.visuals.prepare_ligand_visual
- visualize_skeleton -> ctopo.visuals.prepare_skeleton_visual
- visualize_topology -> ctopo.visuals.prepare_topology_visual
See ctopo.visuals for the available options.
denticity
property
¶
Return the ligand's denticity (number of donor atoms).
visualize_ligand(**kwargs)
¶
Return ligand visualization (SMILES with donor maps + chemical-like SVG).
Keyword arguments are forwarded to ctopo.visuals.prepare_ligand_visual.
See ctopo.visuals for available options.
visualize_skeleton(donors=True, skeleton=True, bonds=True, **kwargs)
¶
Return skeleton visualization (SMILES + SVG) for this ligand.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| donors | bool | If True, donor atoms are shown as original elements. If False, donors are dummies labeled 'DA'. | True |
| skeleton | bool | If True, skeleton atoms are shown as original elements. If False, skeleton atoms are dummies with empty labels. | True |
| bonds | bool | If True, keep original bond orders from the skeleton graph. If False, force all bonds to be single. | True |
Keyword arguments are forwarded to ctopo.visuals.prepare_skeleton_visual.
See ctopo.visuals for available options.
visualize_topology(donors=False, **kwargs)
¶
Return topology visualization (SMILES + SVG) for this ligand.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| donors | bool | If True, donor atoms are shown as original elements. If False, donors are dummies labeled 'DA'. Non-donor atoms are always dummies with empty labels in the topology depiction. | False |
Keyword arguments are forwarded to ctopo.visuals.prepare_topology_visual.
See ctopo.visuals for available options.
_norm_flag(x)
¶
_smiles_metrics(smiles, cache)
¶
Return (n_atoms, n_bonds, branching, rings) for sorting.
build_ligand_tree(ligands, levels=('denticity', ('topo',), ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), 'ligand'), ligand_ids=None, collapse_leaves=False, include_leaf_svg=True, topo_kwargs=None, skeleton_kwargs=None, ligand_kwargs=None)
¶
Build a hierarchical tree (as an nx.DiGraph) for a ligand dataset.
The resulting graph groups ligands by successive abstractions defined by levels.
Internal nodes represent unique groups at each level; leaf nodes represent ligands.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ligands | Sequence[Ligand] | Input ligands. | required |
| levels | Sequence[Union[LevelSpec, LevelId]] | Level specification sequence. See | ('denticity', ('topo',), ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), 'ligand') |
| ligand_ids | Optional[Sequence[str]] | Optional stable identifiers for ligands, used for leaf labels (non-collapsed mode) and for leaf example lists (collapsed mode). | None |
| collapse_leaves | bool | If False, create one leaf node per input ligand. If True, create one leaf node per unique ligand SMILES and store occurrences in | False |
| include_leaf_svg | bool | If True, leaf nodes store an SVG depiction. If False, leaves have | True |
| topo_kwargs | Optional[Mapping[str, Any]] | Optional keyword arguments forwarded to | None |
| skeleton_kwargs | Optional[Mapping[str, Any]] | Optional keyword arguments forwarded to | None |
| ligand_kwargs | Optional[Mapping[str, Any]] | Optional keyword arguments forwarded to | None |
Returns:
An nx.DiGraph rooted at a single 'root' node.
Node attributes typically include: `kind`, `level`, `label`, `smiles`, `svg`,
`leaf_count`, and `sort_key`.
Leaf nodes additionally include `count` and (in non-collapsed mode) `source_index`
and/or `source_id`.
Notes
SVG generation is performed only for unique nodes per level (and optionally leaves), to reduce overhead on large datasets.
get_ligands_skeleton(G, atom_type_key='atom_type')
¶
Return the ligand skeleton as an induced subgraph of the original graph.
Skeleton definition
- Prefer precomputed AtomType labels: keep atoms with type in {DONOR, SKELETON}.
- Otherwise compute skeleton as the union of nodes on all shortest paths between donor pairs (plus donors themselves).
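The fallback rule (nodes on all shortest paths between donor pairs) can be sketched on a plain adjacency mapping. This is not the library's implementation; `bfs_dist` and `skeleton_nodes` are hypothetical helpers, and the sketch relies on the fact that a node `v` lies on some shortest path between `a` and `b` exactly when `dist(a, v) + dist(v, b) == dist(a, b)`.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, source):
    """Unweighted shortest-path distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def skeleton_nodes(adj, donors):
    """Donors plus every node on a shortest path between any donor pair."""
    if not donors:
        raise ValueError("no donor atoms are present")
    dist = {d: bfs_dist(adj, d) for d in donors}
    keep = set(donors)
    for a, b in combinations(sorted(donors), 2):
        if b not in dist[a]:
            continue  # donors in different components: no connecting path
        total = dist[a][b]
        keep |= {
            v for v in adj
            if v in dist[a] and v in dist[b] and dist[a][v] + dist[b][v] == total
        }
    return keep

# Chain 0-1-2-3 with donors 0 and 3, plus a substituent (4) attached to atom 1.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
nodes = skeleton_nodes(adj, {0, 3})
```

The substituent atom 4 is dropped because it does not lie on the donor-to-donor path.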
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Graph | Original ligand graph (NetworkX Graph). | required |
| atom_type_key | str | Node attribute holding AtomType integer codes. | 'atom_type' |

Returns:
| Type | Description |
|---|---|
| Graph | A copy of the induced skeleton subgraph (same node ids as in G). |

Raises:
| Type | Description |
|---|---|
| ValueError | If no donor atoms are present. |
get_ligands_topology(G, atom_type_key='atom_type')
¶
Return a simplified topology graph of the ligand (ignore_cycles=False behavior).
Algorithm (mirrors the RDKit reference implementation):
- start from the ligand skeleton
- remove linear linkers (degree-2 non-donors whose neighbors are not bonded) by contracting them
- remove bubbles (degree-2 non-donors whose neighbors are bonded) by deleting them
- remove remaining linear linkers again
- finalize:
  - donors keep their original node attributes
  - non-donors become dummy nodes with only {'Z': 0}
  - all edges become single bonds with minimal attributes
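The contraction and deletion steps can be sketched on an adjacency mapping of sets. This is an assumption-laden illustration rather than the library code: the helper names are hypothetical, each step is repeated until nothing changes (the source does not say whether the real code does this), and the finalization step that rewrites node and edge attributes is omitted.

```python
def _reduce(adj, donors, neighbours_bonded):
    """Remove degree-2 non-donors whose two neighbours are (or are not) bonded."""
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):
            if v in donors or len(adj[v]) != 2:
                continue
            a, b = adj[v]
            if (b in adj[a]) != neighbours_bonded:
                continue
            adj[a].discard(v)
            adj[b].discard(v)
            del adj[v]
            if not neighbours_bonded:
                # Linear linker: contract by joining its two neighbours.
                adj[a].add(b)
                adj[b].add(a)
            changed = True
            break  # restart the scan, since degrees have changed
    return adj

def topology(adj, donors):
    """Apply linker contraction, bubble deletion, then linker contraction again."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    _reduce(adj, donors, neighbours_bonded=False)
    _reduce(adj, donors, neighbours_bonded=True)
    _reduce(adj, donors, neighbours_bonded=False)
    return adj

# Donors 0 and 3 joined by a two-atom linker.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
# Donors 0 and 4; atom 2 forms a three-membered "bubble" with atoms 1 and 3.
bubble = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
```

In both examples the result is a single edge between the two donors: the chain is contracted directly, while in the second case the bubble atom is deleted first and the remaining linkers are contracted afterwards.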
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Graph | Original ligand graph (NetworkX Graph). | required |
| atom_type_key | str | Node attribute holding AtomType integer codes. | 'atom_type' |

Returns:
| Type | Description |
|---|---|
| Graph | A new NetworkX Graph representing the ligand topology. |

Raises:
| Type | Description |
|---|---|
| ValueError | If no donor atoms are present. |
level_spec(level)
¶
Create a LevelSpec object from a text description.
prepare_ligand_visual(ligand, donor_map_num=1, donor_color=(1.0, 0.8, 0.45), skeleton_bond_color=(0.65, 0.8, 1.0), donor_radius=0.45, mark_donors_in_smiles=True, highlight_skeleton_bonds=True)
¶
Prepare an RDKit Mol of the ligand for drawing and SMILES generation.
Behavior
- starts from ligand.mol (source of truth)
- optionally sets the same atom-map number on all donor atoms
- highlights donor atoms and (optionally) skeleton bonds
Skeleton bonds are computed from the ligand graph partition.
prepare_skeleton_visual(G_skeleton, use_original_donor_atoms=True, use_original_skeleton_atoms=True, use_original_bonds=True, donor_map_num=1, mark_donors_in_smiles=True, donor_color=(1.0, 0.8, 0.45), donor_radius=0.45, donor_label='DA')
¶
Prepare an RDKit Mol for a ligand skeleton graph.
prepare_topology_visual(G_topology, use_original_donor_atoms=False, donor_map_num=1, mark_donors_in_smiles=True, donor_color=(1.0, 0.8, 0.45), donor_radius=0.45, donor_label='DA')
¶
Prepare an RDKit Mol for a ligand topology graph.
validate_levels(levels, *, require_leaf=True)
¶
Validate and normalize a sequence of level specifications.
This function enforces two constraints:
1) Kind order: The level kinds must follow the structural hierarchy: denticity -> topo -> skeleton -> ligand
2) Monotone information retention: Adjacent levels must not lose features. For example:
   - ('skeleton', 'da') -> ('skeleton', 'da', 'bonds') is valid
   - ('skeleton', 'da') -> ('skeleton', 'bonds') is invalid (drops 'da')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| levels | Sequence[Union[LevelSpec, LevelId]] | Sequence of | required |
| require_leaf | bool | If True, the last level must be 'ligand'. | True |

Returns:
| Type | Description |
|---|---|
| List[LevelSpec] | A list of normalized |

Raises:
| Type | Description |
|---|---|
| ValueError | If levels are empty, contain unknown kinds, violate kind ordering, violate monotonicity, or (if require_leaf=True) do not end with 'ligand'. |
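The kind-order part of the validation can be sketched with the rank table shown for `_KIND_RANK` above. The function `check_kind_order` is hypothetical and covers only the checks on kinds; it assumes that repeating a kind (for example two skeleton levels in a row) is allowed, as the default `levels` of build_ligand_tree suggests.

```python
KIND_RANK = {"denticity": 0, "topo": 1, "topology": 1, "skeleton": 2, "ligand": 3}

def check_kind_order(kinds, require_leaf=True):
    """Return the rank of each kind, or raise ValueError on a bad sequence."""
    if not kinds:
        raise ValueError("levels must not be empty")
    unknown = [k for k in kinds if k not in KIND_RANK]
    if unknown:
        raise ValueError(f"unknown kinds: {unknown}")
    ranks = [KIND_RANK[k] for k in kinds]
    # Ranks may repeat but must never decrease.
    if any(a > b for a, b in zip(ranks, ranks[1:])):
        raise ValueError("kinds must follow denticity -> topo -> skeleton -> ligand")
    if require_leaf and kinds[-1] != "ligand":
        raise ValueError("the last level must be 'ligand'")
    return ranks

ranks = check_kind_order(["denticity", "topo", "skeleton", "skeleton", "ligand"])
```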
ctopo.trees.html
¶
HTML renderer for ligand hierarchy trees.
This module turns a ligand hierarchy graph (typically produced by ctopo.trees.build.build_ligand_tree)
into a self-contained HTML document that can be opened directly in a web browser.
The output uses simple built-in HTML elements (<details> and <summary>) for collapsible nodes, and
CSS for styling. Child nodes are placed into a scrollable container to keep large branching
factors manageable; by default only a small vertical window is shown while allowing scrolling.
Expected input graph¶
The renderer expects a directed acyclic graph that behaves like a tree:
- exactly one root node (in-degree == 0),
- every other node has exactly one parent (in-degree == 1).
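The two in-degree conditions can be checked from an edge list alone, as in the sketch below. `find_root` is a hypothetical helper, not the module's private `_find_root`. It does not test for cycles; with NetworkX that part can be covered separately by `nx.is_directed_acyclic_graph`.

```python
def find_root(nodes, edges):
    """Return the single root of a tree-like edge list, or raise ValueError."""
    in_degree = {n: 0 for n in nodes}
    for _parent, child in edges:
        in_degree[child] += 1
    roots = [n for n, d in in_degree.items() if d == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    multi = [n for n, d in in_degree.items() if d > 1]
    if multi:
        raise ValueError(f"nodes with multiple parents: {multi}")
    return roots[0]

nodes = ["root", "d2", "t1", "lig1", "lig2"]
edges = [("root", "d2"), ("d2", "t1"), ("t1", "lig1"), ("t1", "lig2")]
root = find_root(nodes, edges)
```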
Node attributes used by the renderer:
- svg: an SVG snippet to display as a thumbnail (optional for some nodes)
- label: human-readable label
- kind: node type string (e.g. 'topo', 'skeleton', 'ligand')
- leaf_count: number of ligand leaves under the node (used as a summary statistic)
- sort_key: used to order children consistently
This renderer intentionally avoids external dependencies and keeps the HTML portable. If you later want a richer UI (search, lazy-loading, virtual scrolling, etc.), this module can be replaced while keeping the builder output unchanged.
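To make the dependency-free approach concrete, here is a minimal sketch of rendering a tree as nested collapsible elements. It assumes the collapsible elements are `<details>`/`<summary>`, and that the container height is simply the visible child count times the item height; the function names, the `leaf` class, and the input shape (a parent-to-children mapping) are hypothetical and do not reflect the actual markup produced by tree_to_html.

```python
from html import escape

def render_node(children, labels, nid, is_open=False):
    """Render `nid` and its descendants as nested <details> elements."""
    label = escape(labels[nid])  # never trust labels to be HTML-safe
    kids = children.get(nid, [])
    if not kids:
        return f"<div class='leaf'>{label}</div>"
    inner = "".join(render_node(children, labels, k) for k in kids)
    open_attr = " open" if is_open else ""
    return f"<details{open_attr}><summary>{label}</summary>{inner}</details>"

def children_max_height(max_children_visible=5, child_item_height_px=120):
    """Assumed max-height (in px) of the scrollable children container."""
    return max_children_visible * child_item_height_px

children = {"root": ["a"]}
labels = {"root": "R", "a": "<A>"}
html_out = render_node(children, labels, "root", is_open=True)
```

Only the root is expanded here; nested groups stay collapsed until the reader opens them.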
__all__ = ['tree_to_html']
module-attribute
¶
_find_root(G)
¶
_sorted_children(G, nid)
¶
tree_to_html(G, root=None, title='cTopo ligand tree', max_children_visible=5, child_item_height_px=120, open_root=True)
¶
Render a ligand tree into a self-contained HTML string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | DiGraph | A directed acyclic graph representing a tree. Typically produced by | required |
| root | Optional[int] | Optional explicit root node id. If None, the unique node with in-degree == 0 is used. | None |
| title | str | HTML document title. | 'cTopo ligand tree' |
| max_children_visible | int | Maximum number of child 'cards' visible in the children container before scrolling. | 5 |
| child_item_height_px | int | Approximate pixel height of each child item; used to compute container max-height. | 120 |
| open_root | bool | If True, the root node is expanded by default. | True |

Returns:
| Type | Description |
|---|---|
| str | A complete HTML document as a string. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the graph is not a DAG, does not have exactly one root, or is not tree-like (some nodes have multiple parents). |
Fragments¶
ctopo.fragments
¶
Fragmentation utilities.
This module contains RDKit-based helpers to decompose a Complex into ligand fragments without reconstructing ligand RDKit molecules from graphs.
Current scope (v1):
- Bridging ligands are not handled specially: removing metal centers may split a bridging ligand into multiple fragments. This is expected behavior for now.
__all__ = ['LigandCount', 'ligands_from_complex']
module-attribute
¶
Complex(mol, G, metal_atoms, donor_atoms, skeleton_atoms, substituent_atoms)
dataclass
¶
Complex represented as an RDKit molecule plus a NetworkX graph and atom partitions.
Attributes:
| Name | Type | Description |
|---|---|---|
| mol | Mol | RDKit molecule |
| G | Graph | NetworkX graph with node attributes |
| metal_atoms | FrozenSet[int] | Frozen set of metal atom indices |
| donor_atoms | FrozenSet[int] | Frozen set of donor atom indices |
| skeleton_atoms | FrozenSet[int] | Frozen set of skeleton atom indices (excluding donors) |
| substituent_atoms | FrozenSet[int] | Frozen set of substituent atom indices |
LigandCount(smiles, ligand, count)
dataclass
¶
Unique ligand representative with occurrence count.
SmilesSettings(canonical=True, isomeric=False)
dataclass
¶
Settings for SMILES generation.
Mirrors PreparedMol.to_smiles() in ctopo.visuals.
SvgSettings(size=(300, 220), line_width=2, add_atom_indices=False)
dataclass
¶
Settings for SVG generation.
Mirrors PreparedMol.to_svg() in ctopo.visuals.
_remove_atoms_by_index(mol, atom_indices)
¶
_set_orig_idx_props(mol, prop='orig_idx')
¶
ligand_from_mol(mol, donor_atoms, smiles_settings=None, svg_settings=None)
¶
Construct a Ligand from an RDKit Mol and explicit donor atom indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mol | Mol | RDKit molecule. | required |
| donor_atoms | Sequence[int] | Atom indices that should be treated as donor atoms. | required |
| smiles_settings | Optional[SmilesSettings] | Optional default SMILES settings stored in the Ligand and used by visualization helpers. | None |
| svg_settings | Optional[SvgSettings] | Optional default SVG settings stored in the Ligand and used by visualization helpers. | None |

Returns:
| Type | Description |
|---|---|
| Ligand | Ligand instance with populated graph and atom partitions. |

Raises:
| Type | Description |
|---|---|
| TypeError | If |
| ValueError | If donor atom indices are out of range. |
| NodeNotFound | If a donor index is not present in the graph. |
ligands_from_complex(complex, sanitize_frags=True, smiles_settings=None, svg_settings=None)
¶
Extract ligand fragments from a Complex via RDKit fragmentation.
Assumptions
- complex.metal_atoms and complex.donor_atoms are correct (e.g. the Complex was created via ctopo.core.complex.complex_from_mol, which validates coordination).
Algorithm
- copy complex.mol
- annotate atoms with int prop 'orig_idx'
- remove metal atoms
- split into fragments via Chem.GetMolFrags(asMols=True)
- for each fragment, recover donor atoms by checking orig_idx ∈ complex.donor_atoms
- build Ligand objects from fragments
- compute unique ligands by canonical+isomeric SMILES (canonical=True, isomericSmiles=True)
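The steps above amount to deleting the metal nodes and taking connected components. The sketch below shows that view on a plain adjacency mapping instead of RDKit molecules, so original atom indices are preserved implicitly and no 'orig_idx' property is needed. `split_ligands` is a hypothetical helper, and the sorted element tuple used for counting is only a crude stand-in for a canonical SMILES key (it would not distinguish isomers).

```python
from collections import Counter

def split_ligands(adj, metals, donors):
    """Remove metal nodes; return (atoms, donor_atoms) for each remaining fragment."""
    remaining = {
        v: {n for n in ns if n not in metals}
        for v, ns in adj.items() if v not in metals
    }
    seen, fragments = set(), []
    for start in sorted(remaining):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first walk over one connected component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(remaining[u] - comp)
        seen |= comp
        # Donors are recovered by intersecting with the complex-level donor set.
        fragments.append((frozenset(comp), frozenset(comp & donors)))
    return fragments

# Toy complex: metal 0 bound to donors 1, 3 and 5.
elements = {0: "Fe", 1: "N", 2: "C", 3: "N", 4: "C", 5: "O"}
adj = {0: {1, 3, 5}, 1: {0, 2}, 2: {1}, 3: {0, 4}, 4: {3}, 5: {0}}
fragments = split_ligands(adj, metals={0}, donors={1, 3, 5})
counts = Counter(tuple(sorted(elements[i] for i in atoms)) for atoms, _ in fragments)
```

The two N-C fragments collapse into one unique entry with a count of 2, mirroring the unique-ligand counting in the last step.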
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| complex | Complex | cTopo Complex object. | required |
| sanitize_frags | bool | Passed to RDKit GetMolFrags(sanitizeFrags=...). Default True. | True |
| smiles_settings | Optional[SmilesSettings] | Optional Ligand visualization default SMILES settings to store. | None |
| svg_settings | Optional[SvgSettings] | Optional Ligand visualization default SVG settings to store. | None |

Returns:
| Type | Description |
|---|---|
| List[LigandCount] | List of LigandCount (unique ligands with counts), sorted by SMILES. |

Raises:
| Type | Description |
|---|---|
| TypeError | If complex.mol is None. |