Fingerprints and descriptors

ctopo.descriptors

Descriptor and fingerprint computation utilities.

This subpackage contains:

- RDKit-free implementations of fingerprints (Morgan, AtomPairs)
- atom invariant hashing helpers
- graph view helpers for selecting subgraphs (skeleton/substituent, etc.)
- utilities to convert sparse count fingerprints into folded arrays or bit sets

Public API is re-exported here for convenience.

ALLOWED_PROPERTIES = ('atom_type', 'Z', 'degree', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'charge', 'in_ring', 'aromatic') module-attribute

BitsFormat = Literal['set', 'dict', 'numpy'] module-attribute

DEFAULT_PROPERTIES = ('atom_type', 'Z', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'in_ring') module-attribute

FoldCountsFormat = Literal['dict', 'numpy'] module-attribute

GraphView = Literal['original', 'skeleton', 'skeleton_alpha_substituents', 'substituent', 'substituent_alpha_skeleton'] module-attribute

__all__ = ['ALLOWED_PROPERTIES', 'DEFAULT_PROPERTIES', 'atomic_invariants_from_graph', 'BitsFormat', 'FoldCountsFormat', 'counts_to_bits', 'fold_counts', 'GraphView', 'make_graph_view', 'AtomPairsSpec', 'atom_pairs_sparse_counts', 'MorganSpec', 'morgan_sparse_counts', 'Fingerprinter', 'fingerprint', 'fingerprint_from_structure', 'make_fingerprinter'] module-attribute

AtomPairsSpec(min_distance=1, max_distance=30, bits=32) dataclass

Parameters controlling AtomPairs fingerprint generation.

Parameters:

Name Type Description Default
min_distance int

Minimum topological distance (in bonds) to include.

1
max_distance int

Maximum topological distance (in bonds) to include.

30
bits int

Bit width for feature id hashing (default 32).

32

Fingerprinter(kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code') dataclass

Configured fingerprint callable.

A small object that bundles fingerprint settings into a reusable callable. It is intended for repeated feature extraction from many Ligand or Complex objects.

Parameters:

Name Type Description Default
kind FingerprintKind

Fingerprint kind.

required
spec Optional[Union[MorganSpec, AtomPairsSpec]]

Algorithm configuration: a MorganSpec for kind='morgan'; an AtomPairsSpec or None for kind='atompairs' (see fingerprint_from_structure).

required
atomic_properties AtomicProperties

Atom invariant property selection.

required
graph_view GraphView

Named graph view.

'original'
keep_metals bool

Whether to keep metal atoms in Complex graphs.

True
bond_mode BondMode

Bond handling regime.

'all'
output OutputKind

Output format.

'sparse_counts'
fp_size int

Folding size for folded outputs.

2048
emit_from EmitFromStructure

Emission selector (regime or list of original atom indices).

None
folded_format FoldCountsFormat

Folded counts output format.

'dict'
bits_format BitsFormat

Bits output format.

'set'
atom_type_key str

Node attribute key holding atom type.

'atom_type'
idx_key str

Node attribute key holding original atom index.

'idx'
bond_code_key str

Edge attribute key holding original bond code.

'bond_code'
fp_bond_code_key str

Edge attribute key to store fingerprint bond code.

'fp_bond_code'

__call__(structure, bit_info=None)

Compute a fingerprint for a structure with stored settings.

Parameters:

Name Type Description Default
structure Any

Ligand or Complex (must have attribute G).

required
bit_info Optional[Dict[int, Any]]

Optional provenance dict populated by the algorithm.

None

Returns:

Type Description
Any

Fingerprint in the configured output format.

Raises:

See fingerprint_from_structure for the exceptions that can be raised.

MorganSpec(radius, use_chirality=False, use_bond_types=True, only_nonzero_invariants=False, include_redundant_environments=False, bond_code_key='bond_code', bond_idx_key='idx', chiral_tag_key='chiral_tag', cip_code_key='cip_code') dataclass

Parameters controlling Morgan fingerprint generation.

Parameters:

Name Type Description Default
radius int

Number of iterations (ECFP radius).

required
use_chirality bool

If True, incorporate chirality information.

False
use_bond_types bool

If True, incorporate bond types in neighbor pairs.

True
only_nonzero_invariants bool

If True, atoms with round-0 invariant==0 do not emit.

False
include_redundant_environments bool

If True, do not suppress redundant environments.

False
bond_code_key str

Edge attribute name for integer bond code.

'bond_code'
bond_idx_key str

Edge attribute name for unique bond index (for env masks).

'idx'
chiral_tag_key str

Node attribute name for chiral tag integer.

'chiral_tag'
cip_code_key str

Node attribute name for CIP code string ('R','S',...).

'cip_code'

Raises:

Type Description
ValueError

If radius is negative.
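
The ValueError for a negative radius suggests that the spec is validated when it is constructed. The following is a minimal sketch of that pattern, not the ctopo implementation: `MorganSpecSketch` is a hypothetical stand-in carrying only three of the fields listed above, and both the use of `__post_init__` and the frozen dataclass are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorganSpecSketch:
    """Hypothetical stand-in for MorganSpec (not the ctopo class)."""

    radius: int
    use_chirality: bool = False
    use_bond_types: bool = True

    def __post_init__(self) -> None:
        # Mirror the documented contract: a negative radius is rejected.
        if self.radius < 0:
            raise ValueError(f"radius must be >= 0, got {self.radius}")


# radius has no default, so it must always be passed explicitly.
spec = MorganSpecSketch(radius=2)
```

Validating in the constructor means a bad spec fails once, up front, rather than on every structure it is later applied to.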

atom_pairs_sparse_counts(G, atom_invariants, emit_from=None, min_distance=1, max_distance=30, bits=32, pair_info=None)

Compute a sparse count AtomPairs fingerprint.

Parameters:

Name Type Description Default
G Graph

Molecular graph (NetworkX, undirected).

required
atom_invariants Sequence[int]

Round-0 atom invariants, indexed by node id (node id is atom id).

required
emit_from Optional[Iterable[int]]

Optional iterable of atom ids restricting emission: a pair (i, j) is included if i in emit_from OR j in emit_from. (If None, all pairs are included.)

None
min_distance int

Minimum shortest-path distance (in bonds) to include.

1
max_distance int

Maximum shortest-path distance (in bonds) to include.

30
bits int

Bit width for feature id hashing (default 32).

32
pair_info Optional[Dict[int, List[Dict[str, int]]]]

Optional mapping populated as pair_info[feature_id] -> list of dicts with keys: {'a', 'b', 'distance'}. This is helpful for debugging/analysis.

None

Returns:

Type Description
Dict[int, int]

Mapping feature_id -> count.

Raises:

Type Description
ValueError

If min/max distances are invalid or invariants are too short.

TypeError

If emit_from is a string.
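
The pair selection rules above (distance window plus the OR semantics of emit_from) can be sketched without any hashing. This is an illustration under stated assumptions, not the ctopo code: `bfs_distances` and `select_pairs` are hypothetical helpers, the graph is a plain adjacency dict instead of a NetworkX graph, and the feature id hashing step is left out because its details are not documented here.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Shortest-path distances (in bonds) from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def select_pairs(adjacency, emit_from=None, min_distance=1, max_distance=30):
    """Return (a, b, distance) triples passing the distance and emit_from filters."""
    if isinstance(emit_from, str):
        # Documented behaviour: a string is rejected rather than iterated.
        raise TypeError("emit_from must be an iterable of atom ids, not a string")
    allowed = None if emit_from is None else set(emit_from)
    dist_from = {atom: bfs_distances(adjacency, atom) for atom in adjacency}
    pairs = []
    for a, b in combinations(sorted(adjacency), 2):
        d = dist_from[a].get(b)  # None if a and b are disconnected
        if d is None or not (min_distance <= d <= max_distance):
            continue
        # OR semantics: keep the pair if either end is in emit_from.
        if allowed is not None and a not in allowed and b not in allowed:
            continue
        pairs.append((a, b, d))
    return pairs


# Linear chain 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
all_pairs = select_pairs(chain)
from_atom_0 = select_pairs(chain, emit_from=[0])
```

With emit_from=[0], only pairs that touch atom 0 survive, so pairs such as (1, 2) are dropped even though both atoms are in the graph.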

atomic_invariants_from_graph(G, properties=None, bits=32, node_idxs=None, atom_type_key='atom_type')

Build per-atom invariants from a ctopo graph using selected properties.

If properties is a list/tuple of strings, that list is used for all atoms. If it is a mapping atom_type -> list[str], each atom uses the list corresponding to its atom_type (with fallback to DEFAULT_PROPERTIES).

Parameters:

Name Type Description Default
G Graph

NetworkX graph produced by ctopo (Ligand.G or Complex.G).

required
properties AtomicProperties | None

Ordered list of property keys to include for each atom, or a dict mapping atom_type -> property list. If not provided, DEFAULT_PROPERTIES is used.

None
bits int

Bit width for the hashed invariant integers (default 32).

32
node_idxs Sequence[int] | None

Optional list of node ids for which to compute invariants, in the given order. If not provided, sorted(G.nodes()) is used.

None
atom_type_key str

Node attribute key holding the integer atom type.

'atom_type'

Returns:

Type Description
list[int]

List of hashed invariants, one per node in the chosen node order.

Raises:

Type Description
KeyError

If an unknown property key is provided, or a node lacks a required attribute.

ValueError

If bits <= 0.

TypeError

If properties is neither a sequence of strings nor a mapping.
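
The property selection rules described for atomic_invariants_from_graph (one list for all atoms, or a per-atom_type mapping with a fallback to DEFAULT_PROPERTIES) can be sketched as below. `resolve_properties` is a hypothetical helper written for illustration, not a ctopo function; the two constants are copied from the module attributes listed at the top of this page, and the hashing of the selected values into an invariant is omitted.

```python
from collections.abc import Mapping

DEFAULT_PROPERTIES = (
    "atom_type", "Z", "heavy_degree", "num_pi_electrons", "num_hs", "in_ring",
)
ALLOWED_PROPERTIES = (
    "atom_type", "Z", "degree", "heavy_degree", "num_pi_electrons",
    "num_hs", "charge", "in_ring", "aromatic",
)


def resolve_properties(properties, atom_type):
    """Pick the property keys for one atom (hypothetical helper)."""
    if properties is None:
        keys = DEFAULT_PROPERTIES
    elif isinstance(properties, Mapping):
        # Per-type selection, falling back to the defaults for unlisted types.
        keys = properties.get(atom_type, DEFAULT_PROPERTIES)
    elif isinstance(properties, (list, tuple)):
        keys = properties
    else:
        raise TypeError("properties must be a sequence of strings or a mapping")
    for key in keys:
        if key not in ALLOWED_PROPERTIES:
            raise KeyError(key)
    return tuple(keys)


per_type = {1: ["Z", "charge"]}           # atom_type 1 gets a custom selection
custom = resolve_properties(per_type, 1)
fallback = resolve_properties(per_type, 2)
```

Because the property list is ordered, the same keys in a different order would generally be expected to hash to different invariants.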

counts_to_bits(counts, fp_size, output_format='set', dtype=None)

Convert sparse feature counts into a fixed-size presence representation.

Parameters:

Name Type Description Default
counts CountsFP

Sparse mapping feature_id -> count.

required
fp_size int

Target fingerprint length (number of positions).

required
output_format BitsFormat

Output format. Supported values:
- 'set': return set of present bit indices
- 'dict': return dict bit_index -> 1 for present bits
- 'numpy': return dense numpy array of shape (fp_size,) with 0/1 values

'set'
dtype Optional[dtype]

Optional numpy dtype for output_format='numpy'. If omitted, uses numpy.uint8.

None

Returns:

Type Description
BitsOut

Presence representation in the requested format.

Raises:

Type Description
ValueError

If fp_size <= 0.

ValueError

If output_format is unsupported.
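
To make the 'set' and 'dict' presence formats concrete, here is a small sketch. It assumes that a feature id is mapped to a position by a plain modulo (`feature_id % fp_size`); this page does not state the folding rule, so treat that as an assumption. `counts_to_bits_sketch` is a hypothetical function, and the 'numpy' format is left out to keep the example dependency-free.

```python
def counts_to_bits_sketch(counts, fp_size, output_format="set"):
    """Presence representation of sparse counts (illustrative only)."""
    if fp_size <= 0:
        raise ValueError("fp_size must be positive")
    # ASSUMPTION: positions come from a plain modulo of the feature id.
    present = {feature_id % fp_size for feature_id in counts}
    if output_format == "set":
        return present
    if output_format == "dict":
        return {index: 1 for index in sorted(present)}
    raise ValueError(f"unsupported output_format: {output_format!r}")


sparse = {3: 2, 11: 1, 6: 4}   # feature_id -> count
as_set = counts_to_bits_sketch(sparse, fp_size=8)
as_dict = counts_to_bits_sketch(sparse, fp_size=8, output_format="dict")
```

Feature ids 3 and 11 land on the same position under this assumption, so three features produce only two set bits; the count values themselves are discarded.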

fingerprint_from_structure(structure, kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, bit_info=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')

Compute a fingerprint from a Ligand or Complex (ctopo core objects).

This is a convenience wrapper that:

1. reads structure.G (NetworkX graph)
2. builds a named graph view (subgraph + relabel to 0..n-1)
3. writes fingerprint bond codes under fp_bond_code_key according to bond_mode
4. computes atom invariants from node attributes (including per-atom_type property selection)
5. resolves emit_from (regime or explicit list of original atom indices) into view node ids
6. calls fingerprint

Indexing conventions:

- The view graph is relabeled to node ids 0..n-1.
- The original atom index is preserved as node attribute idx_key.
- If emit_from is an explicit iterable, it is interpreted as original atom indices (values of idx_key), not view node ids.
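
The last convention is easy to get wrong, so here is a sketch of the translation from original atom indices to view node ids. `emit_from_to_view_ids` is a hypothetical helper, the view is represented as a plain dict of node attributes instead of a NetworkX graph, and silently skipping indices that are absent from the view is an assumption (this page does not say how ctopo handles them).

```python
def emit_from_to_view_ids(view_nodes, emit_from, idx_key="idx"):
    """Translate original atom indices into view node ids (hypothetical helper)."""
    # Invert the node -> original-index attribute stored under idx_key.
    original_to_view = {attrs[idx_key]: node for node, attrs in view_nodes.items()}
    # ASSUMPTION: indices not present in the view are skipped, not reported.
    return sorted(
        original_to_view[i] for i in set(emit_from) if i in original_to_view
    )


# A view over original atoms 4, 7 and 9, relabeled to node ids 0..2.
view_nodes = {0: {"idx": 4}, 1: {"idx": 7}, 2: {"idx": 9}}
view_ids = emit_from_to_view_ids(view_nodes, emit_from=[9, 4, 5])
```

Here original atoms 4 and 9 map to view nodes 0 and 2, while original atom 5 is not part of the view and is dropped.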

Parameters:

Name Type Description Default
structure Any

Ligand or Complex object with attribute G.

required
kind FingerprintKind

Fingerprint kind ('morgan' or 'atompairs').

required
spec Optional[Union[MorganSpec, AtomPairsSpec]]

Algorithm configuration:
- for kind='morgan', this must be a MorganSpec
- for kind='atompairs', this can be an AtomPairsSpec or None (defaults are used)

required
atomic_properties AtomicProperties

Atomic-property selection for invariant construction. This can be either:
- a list of property names applied to all atom types, or
- a dict mapping atom_type -> list of property names.

required
graph_view GraphView

Named view applied before fingerprinting.

'original'
keep_metals bool

If False, attempts to drop metal atoms from the view (relevant for Complex graphs).

True
bond_mode BondMode

Bond handling regime for fingerprint bond codes.

'all'
output OutputKind

Output format: 'sparse_counts', 'folded_counts', or 'bits'.

'sparse_counts'
fp_size int

Size for folded outputs.

2048
emit_from EmitFromStructure

Optional emission selector:
- None or 'all': emit from all atoms
- named regime ('skeleton', 'substituent', 'donor', 'center')
- explicit iterable of original atom indices (node[idx_key])

None
bit_info Optional[Dict[int, Any]]

Optional provenance mapping (see Morgan docs).

None
folded_format FoldCountsFormat

Output format for folded counts ('dict' or 'numpy').

'dict'
bits_format BitsFormat

Output format for bits ('set', 'dict', or 'numpy').

'set'
atom_type_key str

Node attribute key holding atom type.

'atom_type'
idx_key str

Node attribute key holding original atom index.

'idx'
bond_code_key str

Edge attribute key holding original bond code.

'bond_code'
fp_bond_code_key str

Edge attribute key to store fingerprint bond code.

'fp_bond_code'

Returns:

Type Description
Any

Fingerprint in the requested format.

Raises:

Type Description
AttributeError

If structure has no attribute G.

ValueError

If the created view is not relabeled to 0..n-1.

fold_counts(counts, fp_size, output_format='dict', dtype=None)

Fold sparse feature counts into a fixed-size hashed count fingerprint.

Parameters:

Name Type Description Default
counts CountsFP

Sparse mapping feature_id -> count.

required
fp_size int

Target fingerprint length (number of positions).

required
output_format FoldCountsFormat

Output format. Supported values:
- 'dict': return dict bit_index -> count (sparse)
- 'numpy': return dense numpy array of shape (fp_size,)

'dict'
dtype Optional[dtype]

Optional numpy dtype for output_format='numpy'. If omitted, uses numpy.int32.

None

Returns:

Type Description
FoldCountsOut

Folded fingerprint in the requested format.

Raises:

Type Description
ValueError

If fp_size <= 0.

ValueError

If output_format is unsupported.
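
Folding can be pictured as accumulating counts into fp_size buckets. The sketch below assumes the bucket is `feature_id % fp_size`, which this page does not confirm; `fold_counts_sketch` and `to_dense` are hypothetical names, and a plain Python list stands in for the dense numpy array.

```python
from collections import defaultdict


def fold_counts_sketch(counts, fp_size):
    """Fold sparse counts into a sparse dict position -> count (illustrative)."""
    if fp_size <= 0:
        raise ValueError("fp_size must be positive")
    folded = defaultdict(int)
    for feature_id, count in counts.items():
        # ASSUMPTION: modulo folding; colliding features add their counts.
        folded[feature_id % fp_size] += count
    return dict(folded)


def to_dense(folded, fp_size):
    """Expand the sparse folded dict into a dense list of length fp_size."""
    dense = [0] * fp_size
    for index, count in folded.items():
        dense[index] = count
    return dense


sparse = {3: 2, 11: 1, 6: 4}   # feature_id -> count
folded = fold_counts_sketch(sparse, fp_size=8)
dense = to_dense(folded, fp_size=8)
```

Unlike a presence representation, the folded counts keep the total number of features: the dense vector sums to the same value as the sparse input.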

make_fingerprinter(kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')

Create a configured fingerprint callable.

Parameters:

Name Type Description Default
kind FingerprintKind

Fingerprint kind.

required
spec Optional[Union[MorganSpec, AtomPairsSpec]]

Algorithm configuration: a MorganSpec for kind='morgan'; an AtomPairsSpec or None for kind='atompairs' (see fingerprint_from_structure).

required
atomic_properties AtomicProperties

Atom invariant property selection.

required
graph_view GraphView

Named graph view.

'original'
keep_metals bool

Whether to keep metal atoms in Complex graphs.

True
bond_mode BondMode

Bond handling regime.

'all'
output OutputKind

Output format.

'sparse_counts'
fp_size int

Folding size for folded outputs.

2048
emit_from EmitFromStructure

Emission selector (regime or list of original atom indices).

None
folded_format FoldCountsFormat

Folded counts output format.

'dict'
bits_format BitsFormat

Bits output format.

'set'
atom_type_key str

Node attribute key holding atom type.

'atom_type'
idx_key str

Node attribute key holding original atom index.

'idx'
bond_code_key str

Edge attribute key holding original bond code.

'bond_code'
fp_bond_code_key str

Edge attribute key to store fingerprint bond code.

'fp_bond_code'

Returns:

Type Description
Fingerprinter

A configured Fingerprinter instance.

make_graph_view(G, view='original', keep_metals=True, atom_type_key='atom_type', idx_key='idx')

Create a relabeled subgraph view with mappings.

The returned graph is a copy, relabeled to contiguous node ids 0..n-1. The original atom index should remain available as node[idx_key].

Parameters:

Name Type Description Default
G Graph

Input molecular graph.

required
view GraphView

Named view to build.

'original'
keep_metals bool

If False, drops atoms of CENTER/METAL type.

True
atom_type_key str

Node attribute key holding atom types (AtomType codes).

'atom_type'
idx_key str

Node attribute key holding the original atom index.

'idx'

Returns:

Type Description
Tuple[Graph, Dict[int, int], Dict[int, int]]

A tuple (V, old_to_new, new_to_old) where:
- V: the view graph with node ids 0..n-1
- old_to_new: mapping old node id -> new node id
- new_to_old: mapping new node id -> old node id

Raises:

Type Description
ValueError

If the view name is unknown.
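
The shape of the returned tuple can be illustrated with a plain adjacency dict. This is a sketch of the relabeling idea only: `relabel_subgraph` is a hypothetical helper, the choice of which atoms belong to a named view is not modelled, and assigning new ids in sorted order of the old ids is an assumption.

```python
def relabel_subgraph(adjacency, keep):
    """Restrict a graph to `keep` and relabel the nodes to 0..n-1."""
    kept = sorted(keep)  # ASSUMPTION: new ids follow sorted old ids
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_to_old = {new: old for old, new in old_to_new.items()}
    view = {
        old_to_new[old]: sorted(
            old_to_new[nbr] for nbr in adjacency[old] if nbr in old_to_new
        )
        for old in kept
    }
    return view, old_to_new, new_to_old


# Linear chain 0-1-2-3; drop atom 0 and keep the rest.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
view, old_to_new, new_to_old = relabel_subgraph(adjacency, keep=[1, 2, 3])
```

The two mappings are inverses of each other, which makes it straightforward to report results computed on the view in terms of the original node ids.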

morgan_sparse_counts(G, radius, atom_invariants, emit_from=None, use_chirality=False, use_bond_types=True, only_nonzero_invariants=False, include_redundant_environments=False, bit_info=None, bond_code_key='bond_code', bond_idx_key='idx', chiral_tag_key='chiral_tag', cip_code_key='cip_code')

Compute an RDKit-compatible sparse count Morgan fingerprint.

Parameters:

Name Type Description Default
G Graph

Molecular graph (NetworkX, undirected).

required
radius int

Number of iterations (ECFP radius).

required
atom_invariants Sequence[int]

Round-0 atom invariants, indexed by atom id (node id). For strict RDKit compatibility, these should match RDKit’s invariants for the same molecule.

required
emit_from Optional[Iterable[int]]

Optional iterable of atom ids to act as centers (like RDKit emitFrom).

None
use_chirality bool

If True, incorporate chirality.

False
use_bond_types bool

If True, incorporate bond types in hashing.

True
only_nonzero_invariants bool

If True, atoms with invariant==0 (round-0) do not emit.

False
include_redundant_environments bool

If True, do not suppress redundant environments.

False
bit_info Optional[Dict[int, List[Dict[str, object]]]]

Optional dict populated as bit_info[feature_id] -> list of entries, each recording the center atom id and the radius at which the feature was emitted.

None
bond_code_key str

Edge attribute name for integer bond code.

'bond_code'
bond_idx_key str

Edge attribute name for unique bond index (for env masks).

'idx'
chiral_tag_key str

Node attribute name for chiral tag integer.

'chiral_tag'
cip_code_key str

Node attribute name for CIP code string ('R','S',...).

'cip_code'

Returns:

Type Description
Dict[int, int]

Mapping feature_id -> count.

Raises:

Type Description
ValueError

If node ids are incompatible with atom_invariants length or if required edge attributes are missing.
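
For readers unfamiliar with Morgan-style fingerprints, the general idea (start from round-0 invariants, then repeatedly re-hash each atom together with its neighbours, counting every identifier produced) can be sketched as follows. This is deliberately a toy: it is not RDKit-compatible and not the ctopo algorithm. The hash (`zlib.crc32` over a tuple's repr) is an arbitrary choice, bond types and chirality are ignored, and redundant environments are not suppressed. `toy_hash` and `toy_morgan_counts` are hypothetical names.

```python
import zlib
from collections import Counter


def toy_hash(values):
    """Deterministic 32-bit hash of a tuple of ints (illustrative choice)."""
    return zlib.crc32(repr(tuple(values)).encode("ascii")) & 0xFFFFFFFF


def toy_morgan_counts(adjacency, atom_invariants, radius):
    """Sparse counts from iterative neighbourhood hashing (toy version)."""
    current = list(atom_invariants)
    counts = Counter(current)  # round 0: each atom emits its own invariant
    for layer in range(1, radius + 1):
        updated = []
        for atom in range(len(current)):
            # Sort neighbour identifiers so the result ignores neighbour order.
            neighbours = sorted(current[nbr] for nbr in adjacency[atom])
            updated.append(toy_hash((layer, current[atom], *neighbours)))
        current = updated
        counts.update(current)  # every atom emits once per round
    return dict(counts)


# Three-atom chain 0-1-2 whose two end atoms share the same invariant.
chain = {0: [1], 1: [0, 2], 2: [1]}
counts = toy_morgan_counts(chain, atom_invariants=[1, 2, 1], radius=1)
```

Because no environments are suppressed in this toy, each atom contributes exactly one identifier per round, so the counts sum to n_atoms * (radius + 1). The two symmetric end atoms receive the same identifier at every round.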