Top-level API

ctopo

ctopo - tools for analyzing the chemical space of coordination-complex ligands.

The package centers around graph-based representations of ligands and complexes derived from RDKit molecules and converted to NetworkX graphs.

ALLOWED_PROPERTIES = ('atom_type', 'Z', 'degree', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'charge', 'in_ring', 'aromatic') module-attribute

DEFAULT_PROPERTIES = ('atom_type', 'Z', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'in_ring') module-attribute

__all__ = ['AtomType', 'mol_to_nx', 'SmilesSettings', 'SvgSettings', 'Ligand', 'ligand_from_mol', 'Complex', 'complex_from_mol', 'ligand_from_smiles', 'complex_from_smiles', 'ALLOWED_PROPERTIES', 'DEFAULT_PROPERTIES', 'MorganSpec', 'AtomPairsSpec', 'Fingerprinter', 'make_fingerprinter', 'fingerprint_from_structure', 'tanimoto_similarity_bits', 'tanimoto_distance_bits', 'tanimoto_similarity_counts', 'tanimoto_distance_counts', 'get_ligands_skeleton', 'get_ligands_topology', 'ligands_from_complex'] module-attribute

__version__ = '0.1.0' module-attribute

AtomPairsSpec(min_distance=1, max_distance=30, bits=32) dataclass

Parameters controlling AtomPairs fingerprint generation.

Parameters:

Name Type Description Default
min_distance int

Minimum topological distance (in bonds) to include.

1
max_distance int

Maximum topological distance (in bonds) to include.

30
bits int

Bit width for feature id hashing (default 32).

32

AtomType

Bases: IntEnum

Atom partition labels stored in graph node attribute atom_type.

Complex(mol, G, metal_atoms, donor_atoms, skeleton_atoms, substituent_atoms) dataclass

Complex represented as an RDKit molecule plus a NetworkX graph and atom partitions.

Attributes:

Name Type Description
mol Mol

RDKit molecule

G Graph

NetworkX graph with node attributes

metal_atoms FrozenSet[int]

Frozen set of metal atom indices

donor_atoms FrozenSet[int]

Frozen set of donor atom indices

skeleton_atoms FrozenSet[int]

Frozen set of skeleton atom indices (excluding donors)

substituent_atoms FrozenSet[int]

Frozen set of substituent atom indices

Fingerprinter(kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code') dataclass

Configured fingerprint callable.

This is a small factory object to bundle fingerprint settings into a reusable callable. It is meant for repeated feature extraction from many Ligands/Complexes.

Parameters:

Name Type Description Default
kind FingerprintKind

Fingerprint kind.

required
spec Optional[Union[MorganSpec, AtomPairsSpec]]

Algorithm configuration (MorganSpec or AtomPairsSpec, matching kind).

required
atomic_properties AtomicProperties

Atom invariant property selection.

required
graph_view GraphView

Named graph view.

'original'
keep_metals bool

Whether to keep metal atoms in Complex graphs.

True
bond_mode BondMode

Bond handling regime.

'all'
output OutputKind

Output format.

'sparse_counts'
fp_size int

Folding size for folded outputs.

2048
emit_from EmitFromStructure

Emission selector (regime or list of original atom indices).

None
folded_format FoldCountsFormat

Folded counts output format.

'dict'
bits_format BitsFormat

Bits output format.

'set'
atom_type_key str

Node attribute key holding atom type.

'atom_type'
idx_key str

Node attribute key holding original atom index.

'idx'
bond_code_key str

Edge attribute key holding original bond code.

'bond_code'
fp_bond_code_key str

Edge attribute key to store fingerprint bond code.

'fp_bond_code'
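How fp_size relates to the sparse and folded outputs can be pictured with a short sketch. It assumes that folding maps each sparse feature id to feature_id % fp_size and sums colliding counts; that modulo rule and the helper names fold_counts and fold_bits are illustrative assumptions, not taken from ctopo.

```python
from typing import Dict, Set


def fold_counts(sparse_counts: Dict[int, int], fp_size: int = 2048) -> Dict[int, int]:
    """Fold sparse feature counts into fp_size buckets (assumed modulo folding)."""
    if fp_size <= 0:
        raise ValueError("fp_size must be positive")
    folded: Dict[int, int] = {}
    for feature_id, count in sparse_counts.items():
        bucket = feature_id % fp_size
        # Features that collide in the same bucket have their counts summed.
        folded[bucket] = folded.get(bucket, 0) + count
    return folded


def fold_bits(sparse_counts: Dict[int, int], fp_size: int = 2048) -> Set[int]:
    """Presence-only view: buckets hit by at least one feature."""
    return {b for b, c in fold_counts(sparse_counts, fp_size).items() if c > 0}


# 2053 % 2048 == 5 collides with feature 5; 4100 % 2048 == 4.
sparse = {5: 2, 2053: 1, 4100: 3}
folded = fold_counts(sparse)
bits = fold_bits(sparse)
print(folded, sorted(bits))
```

A smaller fp_size gives denser vectors at the cost of more collisions, which is the usual trade-off when choosing between sparse and folded outputs.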

__call__(structure, bit_info=None)

Compute a fingerprint for a structure with stored settings.

Parameters:

Name Type Description Default
structure Any

Ligand or Complex (must have attribute G).

required
bit_info Optional[Dict[int, Any]]

Optional provenance dict populated by the algorithm.

None

Returns:

Type Description
Any

Fingerprint in the configured output format.

Raises:

See fingerprint_from_structure.

Ligand(mol, G, donor_atoms, skeleton_atoms, substituent_atoms, smiles_settings=SmilesSettings(), svg_settings=SvgSettings()) dataclass

Ligand represented as an RDKit molecule plus a NetworkX graph and atom partitions.

Attributes:

Name Type Description
mol Mol

RDKit molecule

G Graph

NetworkX graph with node attributes

donor_atoms FrozenSet[int]

Frozen set of donor atom indices

skeleton_atoms FrozenSet[int]

Frozen set of skeleton atom indices (excluding donors)

substituent_atoms FrozenSet[int]

Frozen set of substituent atom indices

smiles_settings SmilesSettings

Default settings for SMILES generation in visualization helpers

svg_settings SvgSettings

Default settings for SVG generation in visualization helpers

Visualization

The methods visualize_ligand, visualize_skeleton, and visualize_topology return (smiles, svg) pairs that are convenient for building dataset browsers.

Keyword arguments for visual style are forwarded to the corresponding functions in ctopo.visuals:
  • visualize_ligand -> ctopo.visuals.prepare_ligand_visual
  • visualize_skeleton -> ctopo.visuals.prepare_skeleton_visual
  • visualize_topology -> ctopo.visuals.prepare_topology_visual

See ctopo.visuals for the available options.

denticity property

Returns the ligand's denticity.

visualize_ligand(**kwargs)

Return ligand visualization (SMILES with donor maps + chemical-like SVG).

Keyword arguments are forwarded to ctopo.visuals.prepare_ligand_visual. See ctopo.visuals for available options.

visualize_skeleton(donors=True, skeleton=True, bonds=True, **kwargs)

Return skeleton visualization (SMILES + SVG) for this ligand.

Parameters:

Name Type Description Default
donors bool

If True, donor atoms are shown as original elements. If False, donors are dummies labeled 'DA'.

True
skeleton bool

If True, skeleton atoms are shown as original elements. If False, skeleton atoms are dummies with empty labels.

True
bonds bool

If True, keep original bond orders from the skeleton graph. If False, force all bonds to be single.

True

Keyword arguments are forwarded to ctopo.visuals.prepare_skeleton_visual. See ctopo.visuals for available options.

visualize_topology(donors=False, **kwargs)

Return topology visualization (SMILES + SVG) for this ligand.

Parameters:

Name Type Description Default
donors bool

If True, donor atoms are shown as original elements. If False, donors are dummies labeled 'DA'. Non-donor atoms are always dummies with empty labels in the topology depiction.

False

Keyword arguments are forwarded to ctopo.visuals.prepare_topology_visual. See ctopo.visuals for available options.

MorganSpec(radius, use_chirality=False, use_bond_types=True, only_nonzero_invariants=False, include_redundant_environments=False, bond_code_key='bond_code', bond_idx_key='idx', chiral_tag_key='chiral_tag', cip_code_key='cip_code') dataclass

Parameters controlling Morgan fingerprint generation.

Parameters:

Name Type Description Default
radius int

Number of iterations (ECFP radius).

required
use_chirality bool

If True, incorporate chirality information.

False
use_bond_types bool

If True, incorporate bond types in neighbor pairs.

True
only_nonzero_invariants bool

If True, atoms with round-0 invariant==0 do not emit.

False
include_redundant_environments bool

If True, do not suppress redundant environments.

False
bond_code_key str

Edge attribute name for integer bond code.

'bond_code'
bond_idx_key str

Edge attribute name for unique bond index (for env masks).

'idx'
chiral_tag_key str

Node attribute name for chiral tag integer.

'chiral_tag'
cip_code_key str

Node attribute name for CIP code string ('R','S',...).

'cip_code'

Raises:

Type Description
ValueError

If radius is negative.

SmilesSettings(canonical=True, isomeric=False) dataclass

Settings for SMILES generation.

Mirrors PreparedMol.to_smiles() in ctopo.visuals.

SvgSettings(size=(300, 220), line_width=2, add_atom_indices=False) dataclass

Settings for SVG generation.

Mirrors PreparedMol.to_svg() in ctopo.visuals.

build_ligand_tree(ligands, levels=('denticity', ('topo',), ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), 'ligand'), ligand_ids=None, collapse_leaves=False, include_leaf_svg=True, topo_kwargs=None, skeleton_kwargs=None, ligand_kwargs=None)

Build a hierarchical tree (as an nx.DiGraph) for a ligand dataset.

The resulting graph groups ligands by successive abstractions defined by levels. Internal nodes represent unique groups at each level; leaf nodes represent ligands.

Parameters:

Name Type Description Default
ligands Sequence[Ligand]

Input ligands.

required
levels Sequence[Union[LevelSpec, LevelId]]

Level specification sequence. See LevelSpec and validate_levels.

('denticity', ('topo',), ('skeleton',), ('skeleton', 'bonds'), ('skeleton', 'da', 'bonds'), 'ligand')
ligand_ids Optional[Sequence[str]]

Optional stable identifiers for ligands, used for leaf labels (non-collapsed mode) and for leaf example lists (collapsed mode).

None
collapse_leaves bool

If False, create one leaf node per input ligand. If True, create one leaf node per unique ligand SMILES and store occurrences in count.

False
include_leaf_svg bool

If True, leaf nodes store an SVG depiction. If False, leaves have svg=None.

True
topo_kwargs Optional[Mapping[str, Any]]

Optional keyword arguments forwarded to ctopo.visuals.prepare_topology_visual.

None
skeleton_kwargs Optional[Mapping[str, Any]]

Optional keyword arguments forwarded to ctopo.visuals.prepare_skeleton_visual.

None
ligand_kwargs Optional[Mapping[str, Any]]

Optional keyword arguments forwarded to ctopo.visuals.prepare_ligand_visual.

None

Returns:

An nx.DiGraph rooted at a single 'root' node.

Node attributes typically include: kind, level, label, smiles, svg, leaf_count, and sort_key.

Leaf nodes additionally include count and (in non-collapsed mode) source_index and/or source_id.

Notes

SVG generation is performed only for unique nodes per level (and optionally leaves), to reduce overhead on large datasets.
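The grouping idea behind the tree can be sketched without NetworkX. The example uses plain dictionaries in place of an nx.DiGraph, and the record keys ("denticity", "topo") stand in for the per-level labels that ctopo derives from each ligand; the function name group_into_tree and the node-id scheme are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Node = Tuple[Any, ...]


def group_into_tree(records: Sequence[Dict[str, Any]], levels: Sequence[str]):
    """Group records by successive level keys; return (children, leaf_count)."""
    root: Node = ("root",)
    children: Dict[Node, List[Node]] = {root: []}
    leaf_count: Dict[Node, int] = {root: 0}

    for i, rec in enumerate(records):
        leaf_count[root] += 1
        parent = root
        for key in levels:
            node = parent + (rec[key],)  # one internal node per unique path
            if node not in children:
                children[node] = []
                leaf_count[node] = 0
                children[parent].append(node)
            leaf_count[node] += 1
            parent = node
        leaf = parent + ("leaf:%d" % i,)  # one leaf per input record
        children[parent].append(leaf)
        children[leaf] = []
        leaf_count[leaf] = 1
    return children, leaf_count


records = [
    {"denticity": 2, "topo": "A"},
    {"denticity": 2, "topo": "A"},
    {"denticity": 2, "topo": "B"},
    {"denticity": 1, "topo": "C"},
]
children, leaf_count = group_into_tree(records, levels=("denticity", "topo"))
print(leaf_count[("root", 2)])  # 3 ligands share denticity 2
```

Because internal nodes are keyed by their full path from the root, any per-node work (such as rendering a depiction) only has to happen once per unique group.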

complex_from_mol(mol, metal_atoms)

Construct a Complex from an RDKit Mol and explicit metal atom indices.

Node attribute atom_type is assigned as:
  • int(AtomType.CENTER) for metal atoms (metal centers / center atoms)
  • int(AtomType.DONOR) for donor atoms (neighbors of metals, excluding metals)
  • int(AtomType.SKELETON) for skeleton atoms (excluding donors/metals)
  • int(AtomType.SUBSTITUENT) for remaining non-metal atoms

Skeleton is computed on the graph with metal atoms removed, so shortest paths do not traverse metal centers.
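A minimal sketch of the donor rule and of removing metals before the skeleton search follows. It uses a dictionary of neighbor sets as a stand-in for the NetworkX graph; the helper names donors_from_metals and without_metals are hypothetical and the real implementation may differ.

```python
from typing import Dict, FrozenSet, Iterable, Set


def donors_from_metals(adj: Dict[int, Set[int]], metal_atoms: Iterable[int]) -> FrozenSet[int]:
    """Donor atoms: non-metal neighbors of any metal atom."""
    metals = set(metal_atoms)
    if not metals:
        raise ValueError("metal_atoms is empty")
    missing = metals - set(adj)
    if missing:
        raise ValueError("metal indices not in graph: %s" % sorted(missing))
    donors: Set[int] = set()
    for m in metals:
        donors |= adj[m] - metals  # metal-metal neighbors are not donors
    return frozenset(donors)


def without_metals(adj: Dict[int, Set[int]], metal_atoms: Iterable[int]) -> Dict[int, Set[int]]:
    """Copy of the adjacency with metal atoms and their bonds removed."""
    metals = set(metal_atoms)
    return {v: nbrs - metals for v, nbrs in adj.items() if v not in metals}


# Metal 0 chelated by N1 and N4 of an N1-C2-C3-N4 chain.
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
donors = donors_from_metals(adj, [0])
ligand_adj = without_metals(adj, [0])
print(sorted(donors))  # [1, 4]
```

With the metal removed, the only path between the two donors runs through the ligand backbone, which is why shortest paths in the skeleton search cannot shortcut through the metal center.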

Validity checks (coordination encoding):
  • metal–nonmetal bonds must be dative and oriented donor -> metal (metal is end atom)
  • metal–metal bonds are allowed but must be non-dative

Parameters:

Name Type Description Default
mol Mol

RDKit molecule.

required
metal_atoms Sequence[int]

Atom indices that should be treated as metal centers.

required

Returns:

Type Description
Complex

Complex instance with populated graph and atom partitions.

Raises:

Type Description
TypeError

If mol is None or metal_atoms contains non-integers.

ValueError

If metal atom indices are out of range or metal_atoms is empty, or if bonding validity checks fail.

NodeNotFound

If a metal index is not present in the graph (shouldn’t happen if mol is consistent).

complex_from_smiles(smiles)

Construct a Complex from a SMILES string.

Convention (no atom-map requirement):
  • Metal–ligand coordination must be encoded via RDKit dative bonds.
  • Each dative bond must be oriented donor -> metal (i.e., the metal is the end atom).
  • Any atom inferred as a metal center must have only:
      • dative bonds to non-metal atoms (metal is the end atom), and
      • non-dative bonds to other metal atoms (metal–metal bonds are allowed but must be non-dative).

Atom-map numbers (if present) are ignored for inference and cleared before constructing the cTopo object.

Parameters:

Name Type Description Default
smiles str

SMILES string. Coordination must be represented with dative bonds.

required

Returns:

Type Description
Complex

Complex instance.

Raises:

Type Description
ValueError

If SMILES is invalid or the coordination encoding requirements are violated.

fingerprint_from_structure(structure, kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, bit_info=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')

Compute a fingerprint from a Ligand or Complex (ctopo core objects).

This is a convenience wrapper that:
  1. reads structure.G (NetworkX graph)
  2. builds a named graph view (subgraph + relabel to 0..n-1)
  3. writes fingerprint bond codes under fp_bond_code_key according to bond_mode
  4. computes atom invariants from node attributes (including per-atom_type property selection)
  5. resolves emit_from (regime or explicit list of original atom indices) into view node ids
  6. calls fingerprint

Indexing conventions:
  • The view graph is relabeled to node ids 0..n-1.
  • The original atom index is preserved as node attribute idx_key.
  • If emit_from is an explicit iterable, it is interpreted as original atom indices (values of idx_key), not view node ids.
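The indexing conventions are easy to get wrong, so here is a small sketch of them. It models the graph as a dictionary of neighbor sets and keeps the original indices in a list instead of a node attribute; relabel_view and resolve_emit_from are hypothetical helper names, not ctopo functions.

```python
from typing import Dict, Iterable, List, Set, Tuple


def relabel_view(adj: Dict[int, Set[int]], keep: Iterable[int]) -> Tuple[Dict[int, Set[int]], List[int]]:
    """Induced subgraph on `keep`, relabeled to 0..n-1.

    Returns (view_adj, idx), where idx[view_id] is the original atom index.
    """
    idx = sorted(set(keep))
    to_view = {orig: new for new, orig in enumerate(idx)}
    view_adj = {
        to_view[u]: {to_view[v] for v in adj[u] if v in to_view}
        for u in idx
    }
    return view_adj, idx


def resolve_emit_from(idx: List[int], emit_from: Iterable[int]) -> List[int]:
    """Translate original atom indices into view node ids."""
    to_view = {orig: new for new, orig in enumerate(idx)}
    try:
        return sorted(to_view[a] for a in emit_from)
    except KeyError as exc:
        raise ValueError("atom %r is not in the view" % exc.args[0]) from None


# Chain 0-1-2-3; drop atom 0 (say, a metal) to form the view.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
view_adj, idx = relabel_view(adj, keep=[1, 2, 3])
emit_ids = resolve_emit_from(idx, [3, 1])
print(idx, emit_ids)  # [1, 2, 3] [0, 2]
```

Original atom 3 becomes view node 2 here, which is why passing view ids where original indices are expected would silently select the wrong atoms.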

Parameters:

Name Type Description Default
structure Any

Ligand or Complex object with attribute G.

required
kind FingerprintKind

Fingerprint kind ('morgan' or 'atompairs').

required
spec Optional[Union[MorganSpec, AtomPairsSpec]]

Algorithm configuration.
  • for kind='morgan', this must be a MorganSpec
  • for kind='atompairs', this can be an AtomPairsSpec or None (defaults are used)

required
atomic_properties AtomicProperties

Atomic-property selection for invariant construction. This can be either:
  • a list of property names applied to all atom types, or
  • a dict mapping atom_type -> list of property names.

required
graph_view GraphView

Named view applied before fingerprinting.

'original'
keep_metals bool

If False, attempts to drop metal atoms in the view (Complex use).

True
bond_mode BondMode

Bond handling regime for fingerprint bond codes.

'all'
output OutputKind

Output format: 'sparse_counts', 'folded_counts', or 'bits'.

'sparse_counts'
fp_size int

Size for folded outputs.

2048
emit_from EmitFromStructure

Optional emission selector:
  • None or 'all': emit from all atoms
  • named regime ('skeleton', 'substituent', 'donor', 'center')
  • explicit iterable of original atom indices (node[idx_key])

None
bit_info Optional[Dict[int, Any]]

Optional provenance mapping (see Morgan docs).

None
folded_format FoldCountsFormat

Output format for folded counts ('dict' or 'numpy').

'dict'
bits_format BitsFormat

Output format for bits ('set', 'dict', or 'numpy').

'set'
atom_type_key str

Node attribute key holding atom type.

'atom_type'
idx_key str

Node attribute key holding original atom index.

'idx'
bond_code_key str

Edge attribute key holding original bond code.

'bond_code'
fp_bond_code_key str

Edge attribute key to store fingerprint bond code.

'fp_bond_code'
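The two accepted shapes of atomic_properties (one list for all atoms, or a mapping from atom type to a list) can be normalized as in the sketch below. The node dictionaries, the integer atom-type code 1, and the use of a plain tuple as the invariant are assumptions made for illustration; how ctopo actually hashes invariants is not described in this reference.

```python
from collections.abc import Mapping
from typing import Any, Dict, Tuple


def properties_for(atom_type: int, atomic_properties) -> Tuple[str, ...]:
    """Property names that apply to an atom of the given type."""
    if isinstance(atomic_properties, Mapping):
        return tuple(atomic_properties[atom_type])  # per-type selection
    return tuple(atomic_properties)                 # same list for every atom


def invariant(node: Dict[str, Any], atomic_properties, atom_type_key: str = "atom_type") -> Tuple[Any, ...]:
    """Collect the selected node attributes into a tuple (stand-in invariant)."""
    names = properties_for(node[atom_type_key], atomic_properties)
    return tuple(node[name] for name in names)


# Hypothetical node attributes; atom type code 1 is an arbitrary placeholder.
node = {"atom_type": 1, "Z": 7, "heavy_degree": 2, "num_hs": 1}
same_for_all = invariant(node, ["Z", "heavy_degree"])
per_type = invariant(node, {1: ["Z"], 2: ["Z", "num_hs"]})
print(same_for_all, per_type)  # (7, 2) (7,)
```

A per-type mapping makes it possible, for example, to describe donors in detail while keeping substituents coarse.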

Returns:

Type Description
Any

Fingerprint in the requested format.

Raises:

Type Description
AttributeError

If structure has no attribute G.

ValueError

If the created view is not relabeled to 0..n-1.

get_ligands_skeleton(G, atom_type_key='atom_type')

Return the ligand skeleton as an induced subgraph of the original graph.

Skeleton definition
  • Prefer precomputed AtomType labels: keep atoms with type in {DONOR, SKELETON}.
  • Otherwise compute skeleton as the union of nodes on all shortest paths between donor pairs (plus donors themselves).
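The fallback definition (union of nodes on all shortest paths between donor pairs) can be sketched with breadth-first search. A node v lies on some shortest path between donors s and t exactly when dist(s, v) + dist(v, t) == dist(s, t). The sketch uses a dictionary of neighbor sets rather than a NetworkX graph, and the function names are hypothetical; with NetworkX, nx.all_shortest_paths would give the same node set.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set


def bfs_distances(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def skeleton_nodes(adj: Dict[int, Set[int]], donors: Iterable[int]) -> Set[int]:
    """Donors plus every node on a shortest path between two donors."""
    donors = set(donors)
    if not donors:
        raise ValueError("no donor atoms are present")
    dist = {d: bfs_distances(adj, d) for d in donors}
    keep = set(donors)
    for s, t in combinations(sorted(donors), 2):
        if t not in dist[s]:
            continue  # donors in different components: no connecting path
        d_st = dist[s][t]
        keep |= {
            v for v in adj
            if v in dist[s] and v in dist[t] and dist[s][v] + dist[t][v] == d_st
        }
    return keep


# N0-C1-C2-N3 backbone with a substituent C4 on C1; donors are 0 and 3.
adj = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1}}
skeleton = skeleton_nodes(adj, donors=[0, 3])
print(sorted(skeleton))  # [0, 1, 2, 3]
```

The substituent atom 4 is excluded because any path through it is longer than the direct backbone route.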

Parameters:

Name Type Description Default
G Graph

Original ligand graph (NetworkX Graph).

required
atom_type_key str

Node attribute holding AtomType integer codes.

'atom_type'

Returns:

Type Description
Graph

A copy of the induced skeleton subgraph (same node ids as in G).

Raises:

Type Description
ValueError

If no donor atoms are present.

get_ligands_topology(G, atom_type_key='atom_type')

Return a simplified topology graph of the ligand (ignore_cycles=False behavior).

Algorithm (mirrors the RDKit reference implementation):
  • start from ligand skeleton
  • remove linear linkers (degree-2 non-donors, neighbors not bonded) by contracting
  • remove bubbles (degree-2 non-donors, neighbors bonded) by deleting
  • remove remaining linear linkers again
  • finalize:
      • donors keep original node attributes
      • non-donors become dummy nodes with only {'Z': 0}
      • all edges become single bonds with minimal attrs
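The three reduction passes can be sketched on a dictionary of neighbor sets. This is one reading of the algorithm description, not ctopo's code: the processing order (lowest node id first), the helper names, and the omission of the final attribute rewrite are all assumptions.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def _find(adj: Dict[int, Set[int]], donors: Set[int], want_bubble: bool) -> Optional[Tuple[int, int, int]]:
    """First degree-2 non-donor whose two neighbors are (or are not) bonded."""
    for v in sorted(adj):
        if v in donors or len(adj[v]) != 2:
            continue
        a, b = sorted(adj[v])
        if (b in adj[a]) == want_bubble:
            return v, a, b
    return None


def contract_linkers(adj, donors):
    """Replace a-v-b by a-b while any linear linker v remains."""
    while True:
        hit = _find(adj, donors, want_bubble=False)
        if hit is None:
            return
        v, a, b = hit
        adj[a].discard(v)
        adj[b].discard(v)
        adj[a].add(b)
        adj[b].add(a)
        del adj[v]


def remove_bubbles(adj, donors):
    """Delete degree-2 non-donors whose neighbors are already bonded."""
    while True:
        hit = _find(adj, donors, want_bubble=True)
        if hit is None:
            return
        v, a, b = hit
        adj[a].discard(v)
        adj[b].discard(v)
        del adj[v]


def topology(adj: Dict[int, Set[int]], donors: Iterable[int]) -> Dict[int, Set[int]]:
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    donors = set(donors)
    contract_linkers(adj, donors)
    remove_bubbles(adj, donors)
    contract_linkers(adj, donors)  # bubble removal can expose new linkers
    return adj


# Six-membered ring 1..6 with donors 0 and 7 attached at ring atoms 1 and 4.
ring = {0: {1}, 1: {0, 2, 6}, 2: {1, 3}, 3: {2, 4},
        4: {3, 5, 7}, 5: {4, 6}, 6: {5, 1}, 7: {4}}
topo = topology(ring, donors=[0, 7])
print(topo)  # {0: {7}, 7: {0}}
```

In this example the first pass shrinks the ring to a triangle, the bubble pass deletes the remaining degree-2 ring atom, and the final pass contracts the two former branch points, leaving a single donor-donor edge.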

Parameters:

Name Type Description Default
G Graph

Original ligand graph (NetworkX Graph).

required
atom_type_key str

Node attribute holding AtomType integer codes.

'atom_type'

Returns:

Type Description
Graph

A new NetworkX Graph representing the ligand topology.

Raises:

Type Description
ValueError

If no donor atoms are present.

ligand_from_mol(mol, donor_atoms, smiles_settings=None, svg_settings=None)

Construct a Ligand from an RDKit Mol and explicit donor atom indices.

Parameters:

Name Type Description Default
mol Mol

RDKit molecule.

required
donor_atoms Sequence[int]

Atom indices that should be treated as donor atoms.

required
smiles_settings Optional[SmilesSettings]

Optional default SMILES settings stored in the Ligand and used by visualization helpers.

None
svg_settings Optional[SvgSettings]

Optional default SVG settings stored in the Ligand and used by visualization helpers.

None

Returns:

Type Description
Ligand

Ligand instance with populated graph and atom partitions.

Raises:

Type Description
TypeError

If mol is None or donor_atoms contains non-integers.

ValueError

If donor atom indices are out of range.

NodeNotFound

If a donor index is not present in the graph.

ligand_from_smiles(smiles, smiles_settings=None, svg_settings=None)

Construct a Ligand from a SMILES string.

Donor atoms are the atoms whose atom-map number is nonzero.

The function
  • parses SMILES,
  • collects donor atom indices (map != 0),
  • clears atom map numbers everywhere,
  • calls ligand_from_mol(mol, donor_atoms=...).

Parameters:

Name Type Description Default
smiles str

SMILES string. Donors must be marked via atom-map numbers.

required
smiles_settings Optional[SmilesSettings]

Optional default visualization SMILES settings to store in Ligand.

None
svg_settings Optional[SvgSettings]

Optional default visualization SVG settings to store in Ligand.

None

Returns:

Type Description
Ligand

Ligand instance.

Raises:

Type Description
ValueError

If SMILES is invalid or no donor atoms are marked.

ligands_from_complex(complex, sanitize_frags=True, smiles_settings=None, svg_settings=None)

Extract ligand fragments from a Complex via RDKit fragmentation.

Assumptions
  • complex.metal_atoms and complex.donor_atoms are correct (e.g. Complex was created via ctopo.core.complex.complex_from_mol which validates coordination).
Algorithm
  • copy complex.mol
  • annotate atoms with int prop 'orig_idx'
  • remove metal atoms
  • split into fragments via Chem.GetMolFrags(asMols=True)
  • for each fragment, recover donor atoms by checking orig_idx ∈ complex.donor_atoms
  • build Ligand objects from fragments
  • compute unique ligands by canonical+isomeric SMILES (canonical=True, isomericSmiles=True)
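The last step of the algorithm, collapsing fragments into unique ligands with counts, can be sketched as follows. It assumes the canonical isomeric SMILES of each fragment has already been computed (for instance with RDKit) and returns plain (smiles, count) tuples as a stand-in for LigandCount, whose fields are not described here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def count_unique(fragment_smiles: Sequence[str]) -> List[Tuple[str, int]]:
    """Unique SMILES with occurrence counts, sorted by SMILES string."""
    counts = Counter(fragment_smiles)
    return sorted(counts.items())


# Fragments left after removing the metal from a complex with four ammine
# and two chlorido ligands (SMILES strings written by hand for illustration).
fragments = ["N", "N", "[Cl-]", "N", "N", "[Cl-]"]
unique = count_unique(fragments)
print(unique)  # [('N', 4), ('[Cl-]', 2)]
```

Sorting by the SMILES string gives a deterministic order that does not depend on atom ordering in the input complex.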

Parameters:

Name Type Description Default
complex Complex

cTopo Complex object.

required
sanitize_frags bool

Passed to RDKit GetMolFrags(sanitizeFrags=...). Default True.

True
smiles_settings Optional[SmilesSettings]

Optional Ligand visualization default SMILES settings to store.

None
svg_settings Optional[SvgSettings]

Optional Ligand visualization default SVG settings to store.

None

Returns:

Type Description
List[LigandCount]

List of LigandCount (unique ligands with counts), sorted by SMILES.

Raises:

Type Description
TypeError

If complex.mol is None.

make_fingerprinter(kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')

Create a configured fingerprint callable.

Parameters:

Name Type Description Default
kind FingerprintKind

Fingerprint kind.

required
spec Optional[Union[MorganSpec, AtomPairsSpec]]

Algorithm configuration (MorganSpec or AtomPairsSpec, matching kind).

required
atomic_properties AtomicProperties

Atom invariant property selection.

required
graph_view GraphView

Named graph view.

'original'
keep_metals bool

Whether to keep metal atoms in Complex graphs.

True
bond_mode BondMode

Bond handling regime.

'all'
output OutputKind

Output format.

'sparse_counts'
fp_size int

Folding size for folded outputs.

2048
emit_from EmitFromStructure

Emission selector (regime or list of original atom indices).

None
folded_format FoldCountsFormat

Folded counts output format.

'dict'
bits_format BitsFormat

Bits output format.

'set'
atom_type_key str

Node attribute key holding atom type.

'atom_type'
idx_key str

Node attribute key holding original atom index.

'idx'
bond_code_key str

Edge attribute key holding original bond code.

'bond_code'
fp_bond_code_key str

Edge attribute key to store fingerprint bond code.

'fp_bond_code'

Returns:

Type Description
Fingerprinter

A configured Fingerprinter instance.

mol_to_nx(mol)

Transform an RDKit Mol into a NetworkX Graph.

Adds two extra node attributes useful for atom-pair / topological-torsion style atom typing:
  • heavy_degree: number of non-hydrogen neighbors (independent of whether H are explicit)
  • num_pi_electrons: RDKit-style "pi count" used in atom-pairs/torsions (aromatic -> 1, otherwise (heavy_valence - heavy_degree), rounded and floored at 0)
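The pi-count rule stated above is small enough to write out. The sketch assumes heavy_valence means the sum of bond orders to non-hydrogen neighbors; that interpretation and the function signature are assumptions, since the reference gives only the formula.

```python
def num_pi_electrons(is_aromatic: bool, heavy_valence: float, heavy_degree: int) -> int:
    """Aromatic atoms count 1; otherwise valence minus degree, floored at 0."""
    if is_aromatic:
        return 1
    return max(0, int(round(heavy_valence - heavy_degree)))


aromatic_c = num_pi_electrons(True, 3.0, 2)    # ring carbon in benzene
nitrile_c = num_pi_electrons(False, 4.0, 2)    # C in C-C#N: one single, one triple bond
sp3_c = num_pi_electrons(False, 2.0, 2)        # two single bonds to heavy atoms
print(aromatic_c, nitrile_c, sp3_c)  # 1 2 0
```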

Parameters:

Name Type Description Default
mol Mol

RDKit molecule to convert.

required

Returns:

Type Description
Graph

A NetworkX undirected graph whose nodes correspond to atom indices.

Raises:

Type Description
TypeError

If mol is None.

tanimoto_distance_bits(a, b)

Tanimoto distance for binary/presence fingerprints (1 - similarity).

tanimoto_distance_counts(a, b)

Generalized Tanimoto distance for count vectors (1 - similarity).

tanimoto_similarity_bits(a, b)

Tanimoto similarity for binary/presence fingerprints.

Returns:

Type Description
float

Similarity in [0, 1]. If both fingerprints are empty (all zeros), returns 1.0 by convention.
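For fingerprints given as sets of on-bit indices, the binary Tanimoto similarity is the size of the intersection divided by the size of the union. The sketch below is an independent illustration of that definition and of the empty-fingerprint convention; the function names are hypothetical and only set-like inputs are handled.

```python
from typing import Iterable


def tanimoto_bits(a: Iterable[int], b: Iterable[int]) -> float:
    """|A ∩ B| / |A ∪ B| for sets of on-bit indices; 1.0 if both are empty."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 1.0
    return len(a & b) / union


def tanimoto_bits_distance(a: Iterable[int], b: Iterable[int]) -> float:
    """Distance defined as 1 - similarity."""
    return 1.0 - tanimoto_bits(a, b)


sim = tanimoto_bits({1, 2, 3}, {2, 3, 4})
print(sim)  # 0.5
```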

tanimoto_similarity_counts(a, b)

Generalized Tanimoto similarity for non-negative count vectors.

This is the standard extension of Tanimoto to count vectors:

sim = (a·b) / (||a||^2 + ||b||^2 - a·b)

Works with both dense (numpy/sequence) and sparse (dict-like) inputs. If both vectors are all zeros, returns 1.0 by convention.
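A worked instance of the formula for sparse, dictionary-style inputs follows. It is a sketch of the stated definition only; the function name is hypothetical, and dense inputs are not handled.

```python
from typing import Mapping


def tanimoto_counts(a: Mapping[int, float], b: Mapping[int, float]) -> float:
    """(a·b) / (||a||^2 + ||b||^2 - a·b); 1.0 if both vectors are all zeros."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sum(v * v for v in a.values())
    norm_b = sum(v * v for v in b.values())
    denom = norm_a + norm_b - dot
    if denom == 0:
        return 1.0
    return dot / denom


# a·b = 2*1 = 2, ||a||^2 = 4 + 1 = 5, ||b||^2 = 1 + 1 = 2, so sim = 2 / (5 + 2 - 2).
sim = tanimoto_counts({1: 2, 2: 1}, {1: 1, 3: 1})
print(sim)  # 0.4
```

For 0/1 vectors the expression reduces to the binary Tanimoto similarity, since a·b counts shared features and the squared norms count the features of each vector.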

tree_to_html(G, root=None, title='cTopo ligand tree', max_children_visible=5, child_item_height_px=120, open_root=True)

Render a ligand tree into a self-contained HTML string.

Parameters:

Name Type Description Default
G DiGraph

A directed acyclic graph representing a tree. Typically produced by ctopo.trees.build.build_ligand_tree.

required
root Optional[int]

Optional explicit root node id. If None, the unique node with in-degree == 0 is used.

None
title str

HTML document title.

'cTopo ligand tree'
max_children_visible int

Maximum number of child 'cards' visible in the children container before scrolling.

5
child_item_height_px int

Approximate pixel height of each child item; used to compute container max-height.

120
open_root bool

If True, the root node is expanded by default.

True

Returns:

Type Description
str

A complete HTML document as a string.

Raises:

Type Description
ValueError

If the graph is not a DAG, does not have exactly one root, or is not tree-like (some nodes have multiple parents).
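The root detection and the three validity conditions can be sketched on a plain edge list. This is an illustration of the checks as described, written without NetworkX; find_tree_root is a hypothetical name, and every edge endpoint is assumed to appear in nodes.

```python
from typing import Hashable, Iterable, Tuple


def find_tree_root(nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]) -> Hashable:
    """Return the unique root of a tree given (parent, child) edges."""
    parents = {n: [] for n in nodes}
    children = {n: [] for n in parents}
    for p, c in edges:
        parents[c].append(p)
        children[p].append(c)
    if any(len(ps) > 1 for ps in parents.values()):
        raise ValueError("not tree-like: some nodes have multiple parents")
    roots = [n for n, ps in parents.items() if not ps]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))
    # With at most one parent per node, any node not reachable from the
    # root must sit on a cycle, so full reachability implies acyclicity.
    seen, stack = set(), [roots[0]]
    while stack:
        n = stack.pop()
        seen.add(n)
        stack.extend(children[n])
    if len(seen) != len(parents):
        raise ValueError("graph is not a DAG")
    return roots[0]


root = find_tree_root(
    ["root", "a", "b", "c"],
    [("root", "a"), ("root", "b"), ("a", "c")],
)
print(root)  # root
```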