Basic Search
There are five available search types: by name (search_type = 'name'); by InChI (search_type = 'inchi'); by CAS RN (search_type = 'cas'); by chemical formula (search_type = 'formula'); and by NIST Compound ID (search_type = 'id'):
[1]:
import nistchempy as nist
s = nist.run_search(identifier = '1,2,3*-butane', search_type = 'name')
s
[1]:
NistSearch(success=True, num_compounds=10, lost=False)
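When identifiers come from mixed sources, it can help to pick the search type automatically. The sketch below is a heuristic and not part of nistchempy: the function name guess_search_type is hypothetical, the CAS RN pattern follows the standard digits-digits-check-digit layout, and the ID pattern only assumes the 'C' plus digits form seen in the outputs on this page. Formula searches are deliberately not guessed, since short formulas and names are ambiguous, so pass search_type = 'formula' explicitly.

```python
import re

# CAS RN layout: 2-7 digits, 2 digits, 1 check digit, separated by hyphens
CAS_PATTERN = re.compile(r'^\d{2,7}-\d{2}-\d$')
# Assumption: NIST Compound IDs look like 'C' followed by at least 5 digits,
# as in the examples on this page; other ID forms are not covered here
ID_PATTERN = re.compile(r'^C\d{5,}$')


def guess_search_type(identifier):
    """Return a likely search_type for the identifier (heuristic only)."""
    text = identifier.strip()
    if text.startswith('InChI='):
        return 'inchi'
    if CAS_PATTERN.match(text):
        return 'cas'
    if ID_PATTERN.match(text):
        return 'id'
    # Anything else is treated as a (possibly wildcarded) name
    return 'name'


for example in ['InChI=1S/CH4/h1H4', '71-43-2', 'C1871585', '1,2,3*-butane']:
    print(example, '->', guess_search_type(example))
```

The result could then be passed on as nist.run_search(identifier, guess_search_type(identifier)).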
The list of found compound IDs is stored in the compound_ids attribute, and the corresponding compounds can be retrieved via the load_found_compounds method:
[2]:
s.compound_ids
[2]:
['C1871585',
'C18338404',
'C298180',
'C1529686',
'C632053',
'C13138517',
'C62521691',
'C76397234',
'C101257798',
'C1464535']
[3]:
s.load_found_compounds()
s.compounds
[3]:
[NistCompound(ID=C1871585),
NistCompound(ID=C18338404),
NistCompound(ID=C298180),
NistCompound(ID=C1529686),
NistCompound(ID=C632053),
NistCompound(ID=C13138517),
NistCompound(ID=C62521691),
NistCompound(ID=C76397234),
NistCompound(ID=C101257798),
NistCompound(ID=C1464535)]
Search Parameters
In addition to the main identifier, you can limit the search using several parameters, which can be listed using the print_search_parameters function:
[4]:
nist.print_search_parameters()
use_SI : Units for thermodynamic data, "SI" if True and "calories" if False
match_isotopes : Exactly match the specified isotopes (formula search only)
allow_other : Allow elements not specified in formula (formula search only)
allow_extra : Allow more atoms of elements in formula than specified (formula search only)
no_ion : Exclude ions from the search (formula search only)
cTG : Gas phase thermochemistry data
cTC : Condensed phase thermochemistry data
cTP : Phase change data
cTR : Reaction thermochemistry data
cIE : Gas phase ion energetics data
cIC : Ion clustering data
cIR : IR Spectrum
cTZ : THz IR spectrum
cMS : Mass spectrum (electron ionization)
cUV : UV/Visible spectrum
cGC : Gas Chromatography
cES : Vibrational and/or electronic energy levels
cDI : Constants of diatomic molecules
cSO : Henry's Law data
These options can be specified as arguments of the nist.run_search function or defined in a nist.NistSearchParameters object:
[5]:
# query
identifier = 'C4H?Cl2'
search_type = 'formula'
# direct search (entries with IR spectra)
s1 = nist.run_search(identifier, search_type, cIR = True)
# search with NistSearchParameters
params = nist.NistSearchParameters(cIR = True)
s2 = nist.run_search(identifier, search_type, params)
# compare searches
print(sorted(s1.compound_ids))
print(sorted(s2.compound_ids))
['C110565', 'C110576', 'C1190223', 'C4028562', 'C4279225', 'C541333', 'C594376', 'C616217', 'C7581977', 'C760236', 'C764410', 'C821103', 'C926578']
['C110565', 'C110576', 'C1190223', 'C4028562', 'C4279225', 'C541333', 'C594376', 'C616217', 'C7581977', 'C760236', 'C764410', 'C821103', 'C926578']
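Several data-type flags can be requested at once. The sketch below builds the keyword arguments from a list of flag names and rejects misspelled ones before any request is sent. The helper data_flags and the constant DATA_FLAGS are hypothetical names introduced here; the flag codes are copied from the parameter listing above.

```python
# Data-type flags taken from the print_search_parameters output above
DATA_FLAGS = frozenset({
    'cTG', 'cTC', 'cTP', 'cTR', 'cIE', 'cIC', 'cIR',
    'cTZ', 'cMS', 'cUV', 'cGC', 'cES', 'cDI', 'cSO',
})


def data_flags(*codes):
    """Return {code: True} for each requested flag, rejecting unknown codes."""
    unknown = sorted(set(codes) - DATA_FLAGS)
    if unknown:
        raise ValueError(f'unknown data flags: {unknown}')
    return {code: True for code in codes}


kwargs = data_flags('cIR', 'cMS')
print(kwargs)  # {'cIR': True, 'cMS': True}

# With nistchempy installed, the dictionary could be passed on as:
# params = nist.NistSearchParameters(**kwargs)
# s = nist.run_search('C4H?Cl2', 'formula', params)
```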
Limit of Found Compounds
NIST Chemistry WebBook limits search results to 400 compounds. To check whether your search hit this limit, inspect the lost property:
[6]:
params = nist.NistSearchParameters(no_ion = True, cMS = True)
s = nist.run_search('C6H?O?', 'formula', params)
s
[6]:
NistSearch(success=True, num_compounds=400, lost=True)
To work around this limit when searching for a large number of substances, try splitting the chemical formula into narrower sub-queries:
[7]:
sub_searches = []
for i in range(1, 7):
    s = nist.run_search(f'C6H?O{i}', 'formula', params)
    sub_searches.append( (len(s.compound_ids), s.lost) )
sub_searches
[7]:
[(170, False), (178, False), (80, False), (42, False), (7, False), (24, False)]
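The sub-searches above can also be merged into a single result. The following is a minimal sketch: run_split_search is a hypothetical helper, not part of nistchempy, and it only relies on the compound_ids and lost attributes shown on this page. The search function is passed in as an argument, so the example runs offline with a stand-in (fake_search with made-up IDs).

```python
from types import SimpleNamespace


def run_split_search(search_fn, identifiers):
    """Run one search per identifier and merge the results.

    Returns the sorted unique compound IDs and the list of identifiers
    whose search was still truncated (lost == True).
    """
    found = set()
    truncated = []
    for identifier in identifiers:
        s = search_fn(identifier)
        found.update(s.compound_ids)
        if s.lost:
            truncated.append(identifier)
    return sorted(found), truncated


# Offline stand-in for nist.run_search; the IDs below are made up
def fake_search(identifier):
    data = {
        'C6H?O1': (['C1', 'C2'], False),
        'C6H?O2': (['C2', 'C3'], True),
    }
    ids, lost = data[identifier]
    return SimpleNamespace(compound_ids=ids, lost=lost)


ids, truncated = run_split_search(fake_search, ['C6H?O1', 'C6H?O2'])
print(ids)        # ['C1', 'C2', 'C3']
print(truncated)  # ['C6H?O2']
```

With nistchempy, the stand-in could be replaced by lambda f: nist.run_search(f, 'formula', params) and the identifiers by [f'C6H?O{i}' for i in range(1, 7)]. Any identifier returned in truncated would need to be split further.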
A better way to overcome this problem is to use the pre-prepared compound list. For more details, see the Structure Search page of the CookBook.