Fingerprints and descriptors
ctopo.descriptors
Descriptor and fingerprint computation utilities.
This subpackage contains:
- RDKit-free implementations of fingerprints (Morgan, AtomPairs)
- atom invariant hashing helpers
- graph view helpers for selecting subgraphs (skeleton/substituent, etc.)
- utilities to convert sparse count fingerprints into folded arrays or bit sets
Public API is re-exported here for convenience.
ALLOWED_PROPERTIES = ('atom_type', 'Z', 'degree', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'charge', 'in_ring', 'aromatic')
module-attribute
BitsFormat = Literal['set', 'dict', 'numpy']
module-attribute
DEFAULT_PROPERTIES = ('atom_type', 'Z', 'heavy_degree', 'num_pi_electrons', 'num_hs', 'in_ring')
module-attribute
FoldCountsFormat = Literal['dict', 'numpy']
module-attribute
GraphView = Literal['original', 'skeleton', 'skeleton_alpha_substituents', 'substituent', 'substituent_alpha_skeleton']
module-attribute
__all__ = ['ALLOWED_PROPERTIES', 'DEFAULT_PROPERTIES', 'atomic_invariants_from_graph', 'BitsFormat', 'FoldCountsFormat', 'counts_to_bits', 'fold_counts', 'GraphView', 'make_graph_view', 'AtomPairsSpec', 'atom_pairs_sparse_counts', 'MorganSpec', 'morgan_sparse_counts', 'Fingerprinter', 'fingerprint', 'fingerprint_from_structure', 'make_fingerprinter']
module-attribute
AtomPairsSpec(min_distance=1, max_distance=30, bits=32)
dataclass
Parameters controlling AtomPairs fingerprint generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_distance | int | Minimum topological distance (in bonds) to include. | 1 |
| max_distance | int | Maximum topological distance (in bonds) to include. | 30 |
| bits | int | Bit width for feature id hashing (default 32). | 32 |
Fingerprinter(kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')
dataclass
Configured fingerprint callable.
This is a small factory object to bundle fingerprint settings into a reusable callable. It is meant for repeated feature extraction from many Ligands/Complexes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | FingerprintKind | Fingerprint kind. | required |
| spec | Optional[Union[MorganSpec, AtomPairsSpec]] | Algorithm configuration (a MorganSpec or an AtomPairsSpec, matching kind). | required |
| atomic_properties | AtomicProperties | Atom invariant property selection. | required |
| graph_view | GraphView | Named graph view. | 'original' |
| keep_metals | bool | Whether to keep metal atoms in Complex graphs. | True |
| bond_mode | BondMode | Bond handling regime. | 'all' |
| output | OutputKind | Output format. | 'sparse_counts' |
| fp_size | int | Folding size for folded outputs. | 2048 |
| emit_from | EmitFromStructure | Emission selector (regime or list of original atom indices). | None |
| folded_format | FoldCountsFormat | Folded counts output format. | 'dict' |
| bits_format | BitsFormat | Bits output format. | 'set' |
| atom_type_key | str | Node attribute key holding atom type. | 'atom_type' |
| idx_key | str | Node attribute key holding original atom index. | 'idx' |
| bond_code_key | str | Edge attribute key holding original bond code. | 'bond_code' |
| fp_bond_code_key | str | Edge attribute key to store fingerprint bond code. | 'fp_bond_code' |
__call__(structure, bit_info=None)
Compute a fingerprint for a structure with stored settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| structure | Any | Ligand or Complex (must have attribute G). | required |
| bit_info | Optional[Dict[int, Any]] | Optional provenance dict populated by the algorithm. | None |
Returns:
| Type | Description |
|---|---|
| Any | Fingerprint in the configured output format. |
Raises:
| Type | Description |
|---|---|
See
|
func: |
MorganSpec(radius, use_chirality=False, use_bond_types=True, only_nonzero_invariants=False, include_redundant_environments=False, bond_code_key='bond_code', bond_idx_key='idx', chiral_tag_key='chiral_tag', cip_code_key='cip_code')
dataclass
Parameters controlling Morgan fingerprint generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| radius | int | Number of iterations (ECFP radius). | required |
| use_chirality | bool | If True, incorporate chirality information. | False |
| use_bond_types | bool | If True, incorporate bond types in neighbor pairs. | True |
| only_nonzero_invariants | bool | If True, atoms with round-0 invariant==0 do not emit. | False |
| include_redundant_environments | bool | If True, do not suppress redundant environments. | False |
| bond_code_key | str | Edge attribute name for integer bond code. | 'bond_code' |
| bond_idx_key | str | Edge attribute name for unique bond index (for env masks). | 'idx' |
| chiral_tag_key | str | Node attribute name for chiral tag integer. | 'chiral_tag' |
| cip_code_key | str | Node attribute name for CIP code string ('R','S',...). | 'cip_code' |
Raises:
| Type | Description |
|---|---|
| ValueError | If radius is negative. |
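The radius check described above can be illustrated with a small dataclass. This is a sketch only: `MorganSpecSketch` is a hypothetical stand-in that carries a subset of the documented fields, and the use of `__post_init__` for validation is an assumption, not a statement about how ctopo implements `MorganSpec`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorganSpecSketch:
    """Hypothetical stand-in for MorganSpec (not the ctopo class)."""

    radius: int
    use_chirality: bool = False
    use_bond_types: bool = True

    def __post_init__(self) -> None:
        # Assumption: validation happens at construction time.
        if self.radius < 0:
            raise ValueError(f"radius must be >= 0, got {self.radius}")


# A valid spec; MorganSpecSketch(radius=-1) would raise ValueError.
spec = MorganSpecSketch(radius=2)
```

Freezing the dataclass keeps a spec safe to share between several configured fingerprinters, which is one plausible reason to bundle parameters this way.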
atom_pairs_sparse_counts(G, atom_invariants, emit_from=None, min_distance=1, max_distance=30, bits=32, pair_info=None)
Compute a sparse count AtomPairs fingerprint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Graph | Molecular graph (NetworkX, undirected). | required |
| atom_invariants | Sequence[int] | Round-0 atom invariants, indexed by node id (node id is atom id). | required |
| emit_from | Optional[Iterable[int]] | Optional iterable of atom ids restricting emission: a pair (i, j) is included if i in emit_from OR j in emit_from. (If None, all pairs are included.) | None |
| min_distance | int | Minimum shortest-path distance (in bonds) to include. | 1 |
| max_distance | int | Maximum shortest-path distance (in bonds) to include. | 30 |
| bits | int | Bit width for feature id hashing (default 32). | 32 |
| pair_info | Optional[Dict[int, List[Dict[str, int]]]] | Optional mapping populated as pair_info[feature_id] -> list of dicts with keys: {'a', 'b', 'distance'}. This is helpful for debugging/analysis. | None |
Returns:
| Type | Description |
|---|---|
| Dict[int, int] | Mapping feature_id -> count. |
Raises:
| Type | Description |
|---|---|
| ValueError | If min/max distances are invalid or invariants are too short. |
| TypeError | If emit_from is a string. |
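To make the parameters above concrete, here is a minimal, dependency-free sketch of an atom-pair count in the same spirit. It is not the ctopo implementation: the graph is a plain adjacency dict instead of a NetworkX graph, `atom_pairs_sketch` and `bfs_distances` are hypothetical names, the feature id is an illustrative hash of `(invariant, invariant, distance)` truncated to `bits`, and treating `min_distance < 1` as invalid is an assumption.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Sequence


def bfs_distances(adj: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Shortest-path distances (in bonds) from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def atom_pairs_sketch(
    adj: Dict[int, List[int]],
    atom_invariants: Sequence[int],
    emit_from: Optional[Iterable[int]] = None,
    min_distance: int = 1,
    max_distance: int = 30,
    bits: int = 32,
    pair_info: Optional[Dict[int, List[Dict[str, int]]]] = None,
) -> Dict[int, int]:
    if isinstance(emit_from, str):
        raise TypeError("emit_from must be an iterable of atom ids, not a string")
    if min_distance < 1 or max_distance < min_distance:  # assumed validity rule
        raise ValueError("invalid distance range")
    if adj and len(atom_invariants) <= max(adj):
        raise ValueError("atom_invariants is too short for the node ids")

    allowed = None if emit_from is None else set(emit_from)
    mask = (1 << bits) - 1
    counts: Dict[int, int] = {}
    for a in sorted(adj):
        for b, d in bfs_distances(adj, a).items():
            if b <= a or not (min_distance <= d <= max_distance):
                continue  # visit each unordered pair once, within the range
            if allowed is not None and a not in allowed and b not in allowed:
                continue  # OR semantics: at least one endpoint must be selected
            lo, hi = sorted((atom_invariants[a], atom_invariants[b]))
            fid = hash((lo, hi, d)) & mask  # illustrative feature id only
            counts[fid] = counts.get(fid, 0) + 1
            if pair_info is not None:
                pair_info.setdefault(fid, []).append({"a": a, "b": b, "distance": d})
    return counts


# Three-atom chain 0-1-2 with made-up invariants.
chain = {0: [1], 1: [0, 2], 2: [1]}
invariants = [6, 6, 8]
info: Dict[int, List[Dict[str, int]]] = {}
all_pairs = atom_pairs_sketch(chain, invariants, pair_info=info)
from_last_atom = atom_pairs_sketch(chain, invariants, emit_from=[2])
```

In this toy chain, `all_pairs` counts three pairs in total, while `from_last_atom` keeps only the two pairs that touch atom 2.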
atomic_invariants_from_graph(G, properties=None, bits=32, node_idxs=None, atom_type_key='atom_type')
Build per-atom invariants from a ctopo graph using selected properties.
If properties is a list/tuple of strings, that list is used for all atoms.
If it is a mapping atom_type -> list[str], each atom uses the list
corresponding to its atom_type (with fallback to DEFAULT_PROPERTIES).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Graph | NetworkX graph produced by ctopo (Ligand.G or Complex.G). | required |
| properties | AtomicProperties \| None | Ordered list of property keys to include for each atom, or a dict mapping | None |
| bits | int | Bit width for the hashed invariant integers (default 32). | 32 |
| node_idxs | Sequence[int] \| None | Optional list of node ids to compute invariants. If not provided, it falls back to | None |
| atom_type_key | str | Node attribute key holding the integer atom type. | 'atom_type' |
Returns:
| Type | Description |
|---|---|
| list[int] | List of hashed invariants, one per node in the chosen node order. |
Raises:
| Type | Description |
|---|---|
| KeyError | If an unknown property key is provided, or a node lacks a required attribute. |
| ValueError | If bits <= 0. |
| TypeError | If properties is neither a sequence of strings nor a mapping. |
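The property-selection logic described above (one list for all atoms, or a per-atom_type mapping with fallback to DEFAULT_PROPERTIES) can be sketched without ctopo or NetworkX. Everything here is illustrative: `invariants_sketch` is a hypothetical name, nodes are a plain dict of attribute dicts, sorted node ids stand in for the node order, and the BLAKE2 hash is an assumption that will not reproduce ctopo's invariant values.

```python
import hashlib
from collections.abc import Mapping, Sequence
from typing import Dict, List

ALLOWED_PROPERTIES = ("atom_type", "Z", "degree", "heavy_degree",
                      "num_pi_electrons", "num_hs", "charge", "in_ring", "aromatic")
DEFAULT_PROPERTIES = ("atom_type", "Z", "heavy_degree",
                      "num_pi_electrons", "num_hs", "in_ring")


def invariants_sketch(nodes: Dict[int, Dict[str, object]], properties=None,
                      bits: int = 32, atom_type_key: str = "atom_type") -> List[int]:
    if bits <= 0:
        raise ValueError("bits must be positive")
    if properties is not None and (
        isinstance(properties, str) or not isinstance(properties, (Sequence, Mapping))
    ):
        raise TypeError("properties must be a sequence of strings or a mapping")

    mask = (1 << bits) - 1
    out: List[int] = []
    for node_id in sorted(nodes):  # assumed node order for this sketch
        attrs = nodes[node_id]
        if properties is None:
            keys = DEFAULT_PROPERTIES
        elif isinstance(properties, Mapping):
            # Per-atom_type selection, falling back to the defaults.
            keys = properties.get(attrs[atom_type_key], DEFAULT_PROPERTIES)
        else:
            keys = properties
        for key in keys:
            if key not in ALLOWED_PROPERTIES:
                raise KeyError(key)
        # attrs[key] raises KeyError if the node lacks a selected attribute.
        payload = repr(tuple(attrs[key] for key in keys)).encode()
        digest = hashlib.blake2b(payload, digest_size=8).digest()
        out.append(int.from_bytes(digest, "big") & mask)
    return out


carbon = {"atom_type": 1, "Z": 6, "heavy_degree": 1,
          "num_pi_electrons": 0, "num_hs": 3, "in_ring": False}
oxygen = {"atom_type": 1, "Z": 8, "heavy_degree": 1,
          "num_pi_electrons": 0, "num_hs": 1, "in_ring": False}
nodes = {0: dict(carbon), 1: dict(carbon), 2: dict(oxygen)}

default_invs = invariants_sketch(nodes)
z_only_invs = invariants_sketch(nodes, properties={1: ["Z"]})
```

Atoms with identical selected attributes receive identical invariants, which is the property the fingerprint algorithms rely on.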
counts_to_bits(counts, fp_size, output_format='set', dtype=None)
Convert sparse feature counts into a fixed-size presence representation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| counts | CountsFP | Sparse mapping feature_id -> count. | required |
| fp_size | int | Target fingerprint length (number of positions). | required |
| output_format | BitsFormat | Output format. Supported values: 'set' (set of present bit indices), 'dict' (dict bit_index -> 1 for present bits), 'numpy' (dense numpy array of shape (fp_size,) with 0/1 values). | 'set' |
| dtype | Optional[dtype] | Optional numpy dtype for output_format='numpy'. If omitted, uses numpy.uint8. | None |
Returns:
| Type | Description |
|---|---|
| BitsOut | Presence representation in the requested format. |
Raises:
| Type | Description |
|---|---|
| ValueError | If fp_size <= 0. |
| ValueError | If output_format is unsupported. |
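As an illustration of the conversion, the sketch below maps sparse counts to a set or dict of bit positions. Two things are assumptions rather than documented behavior: the bit position is taken as `feature_id % fp_size`, and only features with a positive count set a bit. `counts_to_bits_sketch` is a hypothetical name and the 'numpy' format is omitted to keep the example dependency-free.

```python
from typing import Dict, Set, Union


def counts_to_bits_sketch(
    counts: Dict[int, int], fp_size: int, output_format: str = "set"
) -> Union[Set[int], Dict[int, int]]:
    if fp_size <= 0:
        raise ValueError("fp_size must be positive")
    # Assumption: modulo folding, and only positive counts mark a bit.
    present = {fid % fp_size for fid, count in counts.items() if count > 0}
    if output_format == "set":
        return present
    if output_format == "dict":
        return {pos: 1 for pos in sorted(present)}
    raise ValueError(f"unsupported output_format: {output_format!r}")


sparse = {3: 2, 2051: 1, 10: 4}
bit_set = counts_to_bits_sketch(sparse, fp_size=2048)
bit_dict = counts_to_bits_sketch(sparse, fp_size=2048, output_format="dict")
```

Features 3 and 2051 land on the same position under this folding rule, so the count information is lost and only presence remains.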
fingerprint_from_structure(structure, kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, bit_info=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')
Compute a fingerprint from a Ligand or Complex (ctopo core objects).
This is a convenience wrapper that:
1) reads structure.G (NetworkX graph)
2) builds a named graph view (subgraph + relabel to 0..n-1)
3) writes fingerprint bond codes under fp_bond_code_key according to bond_mode
4) computes atom invariants from node attributes (including per-atom_type property selection)
5) resolves emit_from (regime or explicit list of original atom indices) into view node ids
6) calls fingerprint()
Indexing conventions:
- The view graph is relabeled to node ids 0..n-1.
- The original atom index is preserved as node attribute idx_key.
- If emit_from is an explicit iterable, it is interpreted as original atom indices
(values of idx_key), not view node ids.
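The indexing conventions above imply a translation step from original atom indices to view node ids. A minimal sketch of that step follows; `resolve_emit_from_sketch` is a hypothetical helper, the view is represented as a dict of node attribute dicts, and silently skipping original indices that are absent from the view is an assumption.

```python
from typing import Dict, Iterable, List, Optional


def resolve_emit_from_sketch(
    view_nodes: Dict[int, Dict[str, int]],
    emit_from: Optional[Iterable[int]],
    idx_key: str = "idx",
) -> Optional[List[int]]:
    """Translate original atom indices into view node ids (0..n-1)."""
    if emit_from is None:
        return None  # no restriction: emit from all atoms
    original_to_view = {attrs[idx_key]: node for node, attrs in view_nodes.items()}
    # Assumption: indices that were dropped from the view are ignored.
    return sorted(original_to_view[i] for i in emit_from if i in original_to_view)


# A view holding original atoms 4, 7 and 9, relabeled to 0, 1 and 2.
view_nodes = {0: {"idx": 4}, 1: {"idx": 7}, 2: {"idx": 9}}
centers = resolve_emit_from_sketch(view_nodes, [7, 9])
partial = resolve_emit_from_sketch(view_nodes, [7, 100])
```

Here `centers` is `[1, 2]`: callers speak in original indices, while the fingerprint algorithm only ever sees view node ids.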
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| structure | Any | Ligand or Complex object with attribute | required |
| kind | FingerprintKind | Fingerprint kind ('morgan' or 'atompairs'). | required |
| spec | Optional[Union[MorganSpec, AtomPairsSpec]] | Algorithm configuration. For kind='morgan', this must be a MorganSpec; for kind='atompairs', this can be an AtomPairsSpec or None (defaults are used). | required |
| atomic_properties | AtomicProperties | Atomic-property selection for invariant construction. This can be either a list of property names applied to all atom types, or a dict mapping atom_type -> list of property names. | required |
| graph_view | GraphView | Named view applied before fingerprinting. | 'original' |
| keep_metals | bool | If False, attempts to drop metal atoms in the view (Complex use). | True |
| bond_mode | BondMode | Bond handling regime for fingerprint bond codes. | 'all' |
| output | OutputKind | Output format: 'sparse_counts', 'folded_counts', or 'bits'. | 'sparse_counts' |
| fp_size | int | Size for folded outputs. | 2048 |
| emit_from | EmitFromStructure | Optional emission selector: None or 'all' (emit from all atoms), a named regime ('skeleton', 'substituent', 'donor', 'center'), or an explicit iterable of original atom indices (node[idx_key]). | None |
| bit_info | Optional[Dict[int, Any]] | Optional provenance mapping (see Morgan docs). | None |
| folded_format | FoldCountsFormat | Output format for folded counts ('dict' or 'numpy'). | 'dict' |
| bits_format | BitsFormat | Output format for bits ('set', 'dict', or 'numpy'). | 'set' |
| atom_type_key | str | Node attribute key holding atom type. | 'atom_type' |
| idx_key | str | Node attribute key holding original atom index. | 'idx' |
| bond_code_key | str | Edge attribute key holding original bond code. | 'bond_code' |
| fp_bond_code_key | str | Edge attribute key to store fingerprint bond code. | 'fp_bond_code' |
Returns:
| Type | Description |
|---|---|
| Any | Fingerprint in the requested format. |
Raises:
| Type | Description |
|---|---|
| AttributeError | If structure has no attribute |
| ValueError | If the created view is not relabeled to 0..n-1. |
fold_counts(counts, fp_size, output_format='dict', dtype=None)
Fold sparse feature counts into a fixed-size hashed count fingerprint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| counts | CountsFP | Sparse mapping feature_id -> count. | required |
| fp_size | int | Target fingerprint length (number of positions). | required |
| output_format | FoldCountsFormat | Output format. Supported values: 'dict' (dict bit_index -> count, sparse), 'numpy' (dense numpy array of shape (fp_size,)). | 'dict' |
| dtype | Optional[dtype] | Optional numpy dtype for output_format='numpy'. If omitted, uses numpy.int32. | None |
Returns:
| Type | Description |
|---|---|
| FoldCountsOut | Folded fingerprint in the requested format. |
Raises:
| Type | Description |
|---|---|
| ValueError | If fp_size <= 0. |
| ValueError | If output_format is unsupported. |
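Folding can be pictured as summing counts that land on the same position. The sketch below assumes the position is `feature_id % fp_size`, which is a common choice but not confirmed by this page; `fold_counts_sketch` is a hypothetical name, and a plain Python list stands in for the dense 'numpy' output.

```python
from typing import Dict, List


def fold_counts_sketch(counts: Dict[int, int], fp_size: int) -> Dict[int, int]:
    """Fold sparse counts into positions 0..fp_size-1, summing collisions."""
    if fp_size <= 0:
        raise ValueError("fp_size must be positive")
    folded: Dict[int, int] = {}
    for feature_id, count in counts.items():
        position = feature_id % fp_size  # assumed folding rule
        folded[position] = folded.get(position, 0) + count
    return folded


def to_dense(folded: Dict[int, int], fp_size: int) -> List[int]:
    """Dense stand-in for the 'numpy' format, using a plain list."""
    dense = [0] * fp_size
    for position, count in folded.items():
        dense[position] = count
    return dense


sparse = {3: 2, 2051: 1, 10: 4}
folded = fold_counts_sketch(sparse, fp_size=2048)
dense = to_dense(folded, fp_size=2048)
```

Unlike a bit representation, the folded result keeps the total count: features 3 and 2051 collide at position 3 and contribute 2 + 1 = 3 there.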
make_fingerprinter(kind, spec, atomic_properties, graph_view='original', keep_metals=True, bond_mode='all', output='sparse_counts', fp_size=2048, emit_from=None, folded_format='dict', bits_format='set', atom_type_key='atom_type', idx_key='idx', bond_code_key='bond_code', fp_bond_code_key='fp_bond_code')
Create a configured fingerprint callable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | FingerprintKind | Fingerprint kind. | required |
| spec | Optional[Union[MorganSpec, AtomPairsSpec]] | Algorithm configuration (a MorganSpec or an AtomPairsSpec, matching kind). | required |
| atomic_properties | AtomicProperties | Atom invariant property selection. | required |
| graph_view | GraphView | Named graph view. | 'original' |
| keep_metals | bool | Whether to keep metal atoms in Complex graphs. | True |
| bond_mode | BondMode | Bond handling regime. | 'all' |
| output | OutputKind | Output format. | 'sparse_counts' |
| fp_size | int | Folding size for folded outputs. | 2048 |
| emit_from | EmitFromStructure | Emission selector (regime or list of original atom indices). | None |
| folded_format | FoldCountsFormat | Folded counts output format. | 'dict' |
| bits_format | BitsFormat | Bits output format. | 'set' |
| atom_type_key | str | Node attribute key holding atom type. | 'atom_type' |
| idx_key | str | Node attribute key holding original atom index. | 'idx' |
| bond_code_key | str | Edge attribute key holding original bond code. | 'bond_code' |
| fp_bond_code_key | str | Edge attribute key to store fingerprint bond code. | 'fp_bond_code' |
Returns:
| Type | Description |
|---|---|
| Fingerprinter | A configured :class: |
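The "configure once, call many times" pattern behind make_fingerprinter can be sketched with a frozen dataclass that implements `__call__`. This is only an illustration of the pattern: `FingerprinterSketch`, `make_fingerprinter_sketch`, and `toy_backend` are hypothetical, and the "structure" here is simply a list of integer atom invariants rather than a Ligand or Complex.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


def toy_backend(invariants: Sequence[int], fp_size: int, output: str) -> Dict[int, int]:
    """Toy fingerprint: count invariants, optionally fold them modulo fp_size."""
    counts = dict(Counter(invariants))
    if output == "folded_counts":
        folded: Dict[int, int] = {}
        for feature_id, count in counts.items():
            position = feature_id % fp_size
            folded[position] = folded.get(position, 0) + count
        return folded
    return counts


@dataclass(frozen=True)
class FingerprinterSketch:
    """Bundles settings so the same configuration can be reused per structure."""

    backend: Callable[[Sequence[int], int, str], Dict[int, int]]
    output: str = "sparse_counts"
    fp_size: int = 2048

    def __call__(self, structure: Sequence[int]) -> Dict[int, int]:
        return self.backend(structure, self.fp_size, self.output)


def make_fingerprinter_sketch(output: str = "sparse_counts",
                              fp_size: int = 2048) -> FingerprinterSketch:
    return FingerprinterSketch(backend=toy_backend, output=output, fp_size=fp_size)


sparse_fp = make_fingerprinter_sketch()
folded_fp = make_fingerprinter_sketch(output="folded_counts", fp_size=4)
features = [sparse_fp(s) for s in ([6, 6, 8], [6, 7])]
```

Because the settings live in the object, a loop over many structures needs only the structure itself at each call, which keeps feature-extraction code short and consistent.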
make_graph_view(G, view='original', keep_metals=True, atom_type_key='atom_type', idx_key='idx')
Create a relabeled subgraph view with mappings.
The returned graph is a copy, relabeled to contiguous node ids 0..n-1.
The original atom index should remain available as node[idx_key].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Graph | Input molecular graph. | required |
| view | GraphView | Named view to build. | 'original' |
| keep_metals | bool | If False, drops atoms of CENTER/METAL type. | True |
| atom_type_key | str | Node attribute key holding atom types (AtomType codes). | 'atom_type' |
| idx_key | str | Node attribute key holding the original atom index. | 'idx' |
Returns:
| Type | Description |
|---|---|
| Tuple[Graph, Dict[int, int], Dict[int, int]] | A tuple (V, old_to_new, new_to_old) where V is the view graph with node ids 0..n-1, old_to_new maps old node id -> new node id, and new_to_old maps new node id -> old node id. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the view name is unknown. |
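The relabeling and the two returned mappings can be illustrated on a plain adjacency dict. This sketch does not select atoms by view name; it takes the set of atoms to keep as an argument. `relabel_view_sketch` is a hypothetical name, and assigning new ids in ascending order of the old ids is an assumption.

```python
from typing import Dict, Iterable, Set, Tuple


def relabel_view_sketch(
    adj: Dict[int, Set[int]], keep: Iterable[int]
) -> Tuple[Dict[int, Set[int]], Dict[int, int], Dict[int, int]]:
    """Induced subgraph on `keep`, relabeled to contiguous ids 0..n-1."""
    kept = sorted(set(keep) & set(adj))  # assumed ordering of new ids
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_to_old = {new: old for old, new in old_to_new.items()}
    view = {
        old_to_new[u]: {old_to_new[v] for v in adj[u] if v in old_to_new}
        for u in kept
    }
    return view, old_to_new, new_to_old


# Path 0-1-5-9; atom 0 plays the role of a dropped atom.
graph = {0: {1}, 1: {0, 5}, 5: {1, 9}, 9: {5}}
view, old_to_new, new_to_old = relabel_view_sketch(graph, keep=[1, 5, 9])
```

Edges to atoms outside the view disappear, and `new_to_old` lets results computed on the view be reported against the original atom ids.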
morgan_sparse_counts(G, radius, atom_invariants, emit_from=None, use_chirality=False, use_bond_types=True, only_nonzero_invariants=False, include_redundant_environments=False, bit_info=None, bond_code_key='bond_code', bond_idx_key='idx', chiral_tag_key='chiral_tag', cip_code_key='cip_code')
Compute an RDKit-compatible sparse count Morgan fingerprint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| G | Graph | Molecular graph (NetworkX, undirected). | required |
| radius | int | Number of iterations (ECFP radius). | required |
| atom_invariants | Sequence[int] | Round-0 atom invariants, indexed by atom id (node id). For strict RDKit compatibility, these should match RDKit's invariants for the same molecule. | required |
| emit_from | Optional[Iterable[int]] | Optional iterable of atom ids to act as centers (like RDKit emitFrom). | None |
| use_chirality | bool | If True, incorporate chirality. | False |
| use_bond_types | bool | If True, incorporate bond types in hashing. | True |
| only_nonzero_invariants | bool | If True, atoms with invariant==0 (round-0) do not emit. | False |
| include_redundant_environments | bool | If True, do not suppress redundant environments. | False |
| bit_info | Optional[Dict[int, List[Dict[str, object]]]] | Optional dict populated as bit_info[feature_id] -> list of (atom_id, radius). | None |
| bond_code_key | str | Edge attribute name for integer bond code. | 'bond_code' |
| bond_idx_key | str | Edge attribute name for unique bond index (for env masks). | 'idx' |
| chiral_tag_key | str | Node attribute name for chiral tag integer. | 'chiral_tag' |
| cip_code_key | str | Node attribute name for CIP code string ('R','S',...). | 'cip_code' |
Returns:
| Type | Description |
|---|---|
| Dict[int, int] | Mapping feature_id -> count. |
Raises:
| Type | Description |
|---|---|
| ValueError | If node ids are incompatible with atom_invariants length or if required edge attributes are missing. |
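For orientation, the sketch below shows the general shape of a Morgan-style iteration: start from round-0 invariants, repeatedly re-hash each atom together with its neighbors' identifiers, and count what the center atoms emit at every layer. It is deliberately simplified and is not RDKit-compatible: bond types and chirality are ignored, redundant environments are never suppressed (roughly what include_redundant_environments=True describes), and the hash is illustrative. `morgan_sketch` is a hypothetical name and the graph is a plain adjacency dict.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def morgan_sketch(
    adj: Dict[int, List[int]],
    radius: int,
    atom_invariants: Sequence[int],
    emit_from: Optional[Iterable[int]] = None,
    bits: int = 32,
) -> Dict[int, int]:
    if radius < 0:
        raise ValueError("radius must be >= 0")
    if adj and len(atom_invariants) <= max(adj):
        raise ValueError("atom_invariants is too short for the node ids")

    mask = (1 << bits) - 1
    centers = set(adj) if emit_from is None else set(emit_from) & set(adj)
    current = {a: atom_invariants[a] & mask for a in adj}  # layer 0 identifiers
    counts: Dict[int, int] = {}
    for layer in range(radius + 1):
        if layer > 0:
            # New identifier = hash of own identifier plus sorted neighbor
            # identifiers from the previous layer (illustrative hash only).
            current = {
                a: hash((layer, current[a], tuple(sorted(current[n] for n in adj[a])))) & mask
                for a in adj
            }
        for a in centers:
            counts[current[a]] = counts.get(current[a], 0) + 1
    return counts


# Symmetric three-atom chain with made-up invariants.
chain = {0: [1], 1: [0, 2], 2: [1]}
invariants = [6, 8, 6]
layer0_only = morgan_sketch(chain, radius=0, atom_invariants=invariants)
radius1 = morgan_sketch(chain, radius=1, atom_invariants=invariants)
from_atom0 = morgan_sketch(chain, radius=1, atom_invariants=invariants, emit_from=[0])
```

With `radius=0` the result is just a count of the round-0 invariants. Each extra layer adds one emitted feature per center atom, which is why redundant-environment suppression matters in a full implementation.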