Basic Search
There are five available search types: by name (search_type = 'name'), by InChI (search_type = 'inchi'), by CAS RN (search_type = 'cas'), by chemical formula (search_type = 'formula'), and by NIST Compound ID (search_type = 'id'):
[1]:
import nistchempy as nist
s = nist.run_search(identifier = '1,2,3*-butane', search_type = 'name')
s
[1]:
NistSearch(success=True, num_compounds=10, lost=False)
The IDs of the found compounds are stored in the compound_ids attribute. The compounds themselves can be loaded via the load_found_compounds method, after which they are available in the compounds attribute:
[2]:
s.compound_ids
[2]:
['C1871585',
'C18338404',
'C298180',
'C1529686',
'C632053',
'C13138517',
'C62521691',
'C76397234',
'C101257798',
'C1464535']
[3]:
s.load_found_compounds()
s.compounds
[3]:
[NistCompound(ID=C1871585),
NistCompound(ID=C18338404),
NistCompound(ID=C298180),
NistCompound(ID=C1529686),
NistCompound(ID=C632053),
NistCompound(ID=C13138517),
NistCompound(ID=C62521691),
NistCompound(ID=C76397234),
NistCompound(ID=C101257798),
NistCompound(ID=C1464535)]
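Many WebBook IDs appear to be the letter C followed by the compound's CAS RN digits with the hyphens removed (for example, water is C7732185 for CAS RN 7732-18-5). This is an observed convention, not a documented guarantee, and it may not hold for compounds without a CAS RN. The sketch below validates a CAS RN locally with the standard CAS check-digit rule before a search_type = 'cas' query, and, under that assumed convention, converts it to a WebBook-style ID. Both is_valid_cas and cas_to_nist_id are hypothetical helpers, not part of nistchempy.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    m = CAS_PATTERN.match(cas)
    if not m:
        return False
    body = m.group(1) + m.group(2)
    check = int(m.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def cas_to_nist_id(cas: str) -> str:
    """Hypothetical helper: assumes the 'C' + CAS digits ID convention."""
    if not is_valid_cas(cas):
        raise ValueError(f"invalid CAS RN: {cas!r}")
    return "C" + cas.replace("-", "")


print(is_valid_cas("1871-58-5"), cas_to_nist_id("1871-58-5"))
```

The check digit for 1871-58-5 is consistent with the ID C1871585 in the search results above.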
Search Parameters
In addition to the main identifier, you can narrow the search using several parameters, which can be listed with the print_search_parameters function:
[4]:
nist.print_search_parameters()
use_SI : Units for thermodynamic data, "SI" if True and "calories" if False
match_isotopes : Exactly match the specified isotopes (formula search only)
allow_other : Allow elements not specified in formula (formula search only)
allow_extra : Allow more atoms of elements in formula than specified (formula search only)
no_ion : Exclude ions from the search (formula search only)
cTG : Gas phase thermochemistry data
cTC : Condensed phase thermochemistry data
cTP : Phase change data
cTR : Reaction thermochemistry data
cIE : Gas phase ion energetics data
cIC : Ion clustering data
cIR : IR Spectrum
cTZ : THz IR spectrum
cMS : Mass spectrum (electron ionization)
cUV : UV/Visible spectrum
cGC : Gas Chromatography
cES : Vibrational and/or electronic energy levels
cDI : Constants of diatomic molecules
cSO : Henry's Law data
These options can be passed as keyword arguments to the nist.run_search function or defined in a nist.NistSearchParameters object:
[5]:
# query
identifier = 'C4H?Cl2'
search_type = 'formula'
# direct search (entries with IR spectra)
s1 = nist.run_search(identifier, search_type, cIR = True)
# search with NistSearchParameters
params = nist.NistSearchParameters(cIR = True)
s2 = nist.run_search(identifier, search_type, params)
# compare searches
print(sorted(s1.compound_ids))
print(sorted(s2.compound_ids))
['C110565', 'C110576', 'C1190223', 'C4028562', 'C4279225', 'C541333', 'C594376', 'C616217', 'C7581977', 'C760236', 'C764410', 'C821103', 'C926578']
['C110565', 'C110576', 'C1190223', 'C4028562', 'C4279225', 'C541333', 'C594376', 'C616217', 'C7581977', 'C760236', 'C764410', 'C821103', 'C926578']
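Because the options are plain keyword names, a typo such as cIr instead of cIR is easy to miss. As a hedged sketch, the hypothetical helper build_search_kwargs below (not part of nistchempy) checks option names against the list printed by print_search_parameters and, as a design choice of this sketch, rejects the formula-only options for other search types. The name sets are copied from that output and may drift from the library over time.

```python
# Option names as printed by nist.print_search_parameters()
GENERAL = {"use_SI"}
FORMULA_ONLY = {"match_isotopes", "allow_other", "allow_extra", "no_ion"}
DATA_FLAGS = {
    "cTG", "cTC", "cTP", "cTR", "cIE", "cIC", "cIR",
    "cTZ", "cMS", "cUV", "cGC", "cES", "cDI", "cSO",
}


def build_search_kwargs(search_type: str, **options: bool) -> dict:
    """Hypothetical helper: validate option names before a search."""
    unknown = set(options) - (GENERAL | FORMULA_ONLY | DATA_FLAGS)
    if unknown:
        raise ValueError(f"unknown search parameters: {sorted(unknown)}")
    # Formula-only options are documented as applying to formula search only
    misplaced = set(options) & FORMULA_ONLY
    if misplaced and search_type != "formula":
        raise ValueError(f"{sorted(misplaced)} apply to formula search only")
    return dict(options)


print(build_search_kwargs("formula", cIR=True, no_ion=True))
```

The returned dict could then be unpacked in the same way as the keyword arguments used above, e.g. nist.run_search(identifier, search_type, **kwargs) or nist.NistSearchParameters(**kwargs).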
Limit of Found Compounds
The NIST Chemistry WebBook limits search results to 400 compounds. To find out whether this limit truncated your search, check the lost property:
[6]:
params = nist.NistSearchParameters(no_ion = True, cMS = True)
s = nist.run_search('C6H?O?', 'formula', params)
s
[6]:
NistSearch(success=True, num_compounds=400, lost=True)
To work around this limit when searching for a large number of substances, try splitting the chemical formula query into subsets:
[7]:
sub_searches = []
for i in range(1, 7):
s = nist.run_search(f'C6H?O{i}', 'formula', params)
sub_searches.append( (len(s.compound_ids), s.lost) )
sub_searches
[7]:
[(170, False), (178, False), (80, False), (42, False), (7, False), (24, False)]
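The loop above works because none of the sub-searches hits the cap. If one still did, it could be split again. The sketch below generalizes this into a recursive divide-and-conquer; split_search, the search and refine callables, and the max_depth guard are hypothetical names for illustration, not nistchempy API. search(pattern) is assumed to return (compound_ids, lost), and refine(pattern) to yield narrower patterns that together cover the original one.

```python
from typing import Callable, Iterable, List, Set, Tuple

# search(pattern) -> (compound_ids, lost); refine(pattern) -> narrower patterns
SearchFn = Callable[[str], Tuple[List[str], bool]]
RefineFn = Callable[[str], Iterable[str]]


def split_search(pattern: str, search: SearchFn, refine: RefineFn,
                 max_depth: int = 3) -> Set[str]:
    """Collect compound IDs, recursively refining queries that hit the cap."""
    ids, lost = search(pattern)
    found: Set[str] = set(ids)
    if not lost:
        return found
    if max_depth == 0:
        raise RuntimeError(f"{pattern!r} is still truncated after refinement")
    # The truncated query missed some hits, so query narrower patterns too
    for sub in refine(pattern):
        found |= split_search(sub, search, refine, max_depth - 1)
    return found
```

With nistchempy, search could wrap nist.run_search(p, 'formula', params) and return (s.compound_ids, s.lost), while refine for 'C6H?O?' could yield f'C6H?O{i}' for i in range(1, 7), mirroring the loop above. Collecting IDs in a set removes duplicates if sub-patterns overlap.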
A better way to overcome this problem is to use the pre-prepared compound list. For more details, see the Advanced Search page of this CookBook.
Structural Search
nistchempy
also supports structural search with exact match and substructural modes. For the purpose one needs to generate MOL-file of the molecule or molecular fragment (NIST API requirement). The easiest way is to generate text block of MOL-file using rdkit
:
[8]:
from rdkit import Chem
mol = Chem.MolFromSmiles('CC(=O)OCC')
text = Chem.MolToMolBlock(mol)
print(text)
RDKit 2D
6 5 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2990 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2990 2.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.5981 -0.0000 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.8971 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
5.1962 -0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 2 0
2 4 1 0
4 5 1 0
5 6 1 0
M END
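Since the NIST API expects a MOL file, a quick local sanity check of a hand-edited or externally produced block can save a failed request. In the V2000 format shown above, the block starts with three header lines (name, program, comment), followed by a counts line whose first two fixed-width 3-character fields are the atom and bond counts, then the atom and bond blocks, and finally an "M  END" line. The hypothetical helper mol_counts below (not part of nistchempy or rdkit) checks that layout. Note that the printed output above has lost its column alignment, so the fixed-width parsing applies to the raw string, not to that display.

```python
from typing import Tuple


def mol_counts(molblock: str) -> Tuple[int, int]:
    """Return (n_atoms, n_bonds) from a V2000 MOL block, checking its layout."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("MOL block is too short")
    counts = lines[3]  # fourth line: the counts line
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("only V2000 MOL blocks are handled here")
    # Atom and bond counts are fixed-width fields in columns 1-3 and 4-6
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    if len(lines) < 4 + n_atoms + n_bonds:
        raise ValueError("fewer lines than the counts line declares")
    if not any(line.startswith("M  END") for line in lines[4:]):
        raise ValueError("missing 'M  END' terminator")
    return n_atoms, n_bonds
```

Applied to the text variable from the cell above, this would be expected to report 6 atoms and 5 bonds, matching the counts line of that block.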
[9]:
# exact match
s = nist.run_structural_search(molblock=text, search_type='struct')
print(s.compound_ids)
['C141786']
[10]:
# substructure search
s = nist.run_structural_search(molblock=text, search_type='sub')
s
[10]:
NistSearch(success=True, num_compounds=400, lost=True)
If a MOL file already exists, the molfile argument can be used instead of molblock, e.g. nist.run_structural_search(molfile='path.mol', search_type='sub').