Advanced search
NIST Chemistry WebBook supports structure search; however, to the best of our knowledge, there is no straightforward way to implement it as a Python API. To overcome this problem, as well as the WebBook’s limit on the number of compounds returned by a search, the NistChemPy package contains a dataframe with the main information on all NIST Chemistry WebBook compounds:
[1]:
import nistchempy as nist
import pandas as pd
pd.set_option('display.max_columns', None)
df = nist.get_all_data()
df
[1]:
| | ID | name | synonyms | formula | mol_weight | inchi | inchi_key | cas_rn | mol2D | mol3D | Gas phase thermochemistry data | Condensed phase thermochemistry data | Phase change data | Reaction thermochemistry data | Gas phase ion energetics data | Ion clustering data | IR Spectrum | THz IR spectrum | Mass spectrum (electron ionization) | UV/Visible spectrum | Gas Chromatography | Vibrational and/or electronic energy levels | Constants of diatomic molecules | Henry's Law data | Fluid Properties | Computational Chemistry Comparison and Benchmark Database | Electron-Impact Ionization Cross Sections (on physics web site) | Gas Phase Kinetics Database | Microwave spectra (on physics lab web site) | NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) | NIST Atomic Spectra Database - Levels Holdings (on physics web site) | NIST Atomic Spectra Database - Lines Holdings (on physics web site) | NIST Polycyclic Aromatic Hydrocarbon Structure Index | Reference simulation | Reference simulation: SPC/E Water | Reference simulation: TraPPE Carbon Dioxide | X-ray Photoelectron Spectroscopy Database, version 5.0 | NIST / TRC Web Thermo Tables, "lite" edition (thermophysical and thermochemical data) | NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | B100 | iron oxide anion | NaN | FeO- | 71.8450 | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | B1000 | AsF3..Cl anion | NaN | AsClF3- | 167.3700 | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | B1000000 | AgH2- | NaN | AgH2- | 109.8846 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | B1000001 | HAg(H2) | NaN | AgH3 | 110.8920 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | B1000002 | AgNO+ | NaN | AgNO+ | 137.8738 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129323 | U99777 | Methyl 3-hydroxycholest-5-en-26-oate, TMS deri... | Methyl (25RS)-3β-hydroxy-5-cholesten-26-oate, ... | C31 H54 O3 Si | 502.8442 | InChI=1S/C31H54O3Si/c1-21(10-9-11-22(2)29(32)3... | DNXGNXYNSBCWGX-QBUYVTDMSA-N | NaN | https://webbook.nist.gov/cgi/cbook.cgi?Str2Fil... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=U997... | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=U997... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
129324 | U99830 | 2-Methyl-3-oxovaleric acid, O,O'-bis(trimethyl... | 3-Oxopentanoic acid, 2-methyl, TMS\n2-Methyl-3... | C12 H26 O3 Si2 | 274.5040 | InChI=1S/C12H26O3Si2/c1-9-11(14-16(3,4)5)10(2)... | LXAIQDVPXKOIGO-KHPPLWFESA-N | NaN | https://webbook.nist.gov/cgi/inchi?Str2File=U9... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/inchi?ID=U99830&M... | NaN | https://webbook.nist.gov/cgi/inchi?ID=U99830&M... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
129325 | U99942 | 3-Hydroxy-3-(4'-hydroxy-3'-methoxyphenyl)propi... | Vanillylhydracrylic acid, tri-TMS\nVanillylhyd... | C19 H36 O5 Si3 | 428.7426 | InChI=1S/C19H36O5Si3/c1-21-18-13-15(11-12-16(1... | QCMUGKOFXVYNCF-UHFFFAOYSA-N | NaN | https://webbook.nist.gov/cgi/inchi?Str2File=U9... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/inchi?ID=U99942&M... | NaN | https://webbook.nist.gov/cgi/inchi?ID=U99942&M... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
129326 | U99947 | 2-Propylpentanoic acid, 2,3,4,6-tetra(trimethy... | Valproic acid, glucuronide, TMS | C26 H58 O7 Si4 | 595.0765 | InChI=1S/C26H58O7Si4/c1-15-17-20(18-16-2)25(27... | OVXMRISJDUWFKB-UHFFFAOYSA-N | NaN | https://webbook.nist.gov/cgi/inchi?Str2File=U9... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/inchi?ID=U99947&M... | NaN | https://webbook.nist.gov/cgi/inchi?ID=U99947&M... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
129327 | x | Y5O2 radical | NaN | O2 Y5 | 476.5281 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://webbook.nist.gov/cgi/cbook.cgi?ID=x&Ma... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
129328 rows × 39 columns
Its columns can be divided into 5 groups:
General properties:
ID: NIST Compound ID
name: chemical name
synonyms: synonyms
formula: chemical formula
mol_weight: molecular weight
inchi / inchi_key: InChI / InChIKey strings
cas_rn: CAS Registry Number
Molecular files:
mol2D / mol3D: 2D and 3D MOL-files
NIST Chemistry WebBook data:
Gas phase thermochemistry data
Condensed phase thermochemistry data
Phase change data
Reaction thermochemistry data
Gas phase ion energetics data
Ion clustering data
IR Spectrum
THz IR spectrum
Mass spectrum (electron ionization)
UV/Visible spectrum
Gas Chromatography
Vibrational and/or electronic energy levels
Constants of diatomic molecules
Henry's Law data
Fluid Properties
NIST public data:
Computational Chemistry Comparison and Benchmark Database
Electron-Impact Ionization Cross Sections (on physics web site)
Gas Phase Kinetics Database
Microwave spectra (on physics lab web site)
NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)
NIST Atomic Spectra Database - Levels Holdings (on physics web site)
NIST Atomic Spectra Database - Lines Holdings (on physics web site)
NIST Polycyclic Aromatic Hydrocarbon Structure Index
Reference simulation
Reference simulation: SPC/E Water
Reference simulation: TraPPE Carbon Dioxide
X-ray Photoelectron Spectroscopy Database, version 5.0
NIST subscription data:
NIST / TRC Web Thermo Tables, "lite" edition (thermophysical and thermochemical data)
NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)
All columns except for the first group contain URLs for the corresponding data, allowing one to parse the relevant pages without the need to preload the compounds themselves:
[2]:
col = 'NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)'
df.loc[~df[col].isna(), ['ID', 'inchi', col]]
[2]:
| | ID | inchi | NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) |
---|---|---|---|
11504 | C10028145 | InChI=1S/No | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
11587 | C10043922 | InChI=1S/Rn | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
11743 | C10097322 | InChI=1S/Br | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
16920 | C12385136 | InChI=1S/H | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
18616 | C13494809 | InChI=1S/Te | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
... | ... | ... | ... |
59684 | C7440735 | InChI=1S/Fr | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
59685 | C7440746 | InChI=1S/In | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
60897 | C7704349 | InChI=1S/S | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
60995 | C7723140 | InChI=1S/P | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
61257 | C7782492 | InChI=1S/Se | https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... |
101 rows × 3 columns
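The same non-missing test can be applied to every data column at once to see how many compounds carry each kind of data. The following is a minimal sketch on a toy dataframe; the IDs and URLs are placeholders rather than real WebBook entries, and with the real data one would presumably use df from nist.get_all_data() in place of toy.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the frame returned by nist.get_all_data();
# the IDs and URLs below are placeholders, not real WebBook entries.
toy = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3'],
    'inchi': ['InChI=1S/H', np.nan, 'InChI=1S/S'],
    'IR Spectrum': ['https://example.org/ir/A1', np.nan, np.nan],
    'Mass spectrum (electron ionization)': [
        'https://example.org/ms/A1', 'https://example.org/ms/A2', np.nan],
})

# Columns that describe the compound itself rather than link to data
general = ['ID', 'inchi']
data_cols = [c for c in toy.columns if c not in general]

# Number of compounds with a URL in each data column, most covered first
coverage = toy[data_cols].notna().sum().sort_values(ascending=False)
print(coverage)
```

For the full dataframe, the list of general columns would need to include all first-group columns (ID, name, synonyms, formula, mol_weight, inchi, inchi_key, cas_rn).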
This dataframe can be used to narrow the entries down to those with the desired properties. The nist.get_search_parameters function maps short names to the NIST Chemistry WebBook property names:
[3]:
ps = nist.get_search_parameters()
ps
[3]:
{'use_SI': 'Units for thermodynamic data, "SI" if True and "calories" if False',
'match_isotopes': 'Exactly match the specified isotopes (formula search only)',
'allow_other': 'Allow elements not specified in formula (formula search only)',
'allow_extra': 'Allow more atoms of elements in formula than specified (formula search only)',
'no_ion': 'Exclude ions from the search (formula search only)',
'cTG': 'Gas phase thermochemistry data',
'cTC': 'Condensed phase thermochemistry data',
'cTP': 'Phase change data',
'cTR': 'Reaction thermochemistry data',
'cIE': 'Gas phase ion energetics data',
'cIC': 'Ion clustering data',
'cIR': 'IR Spectrum',
'cTZ': 'THz IR spectrum',
'cMS': 'Mass spectrum (electron ionization)',
'cUV': 'UV/Visible spectrum',
'cGC': 'Gas Chromatography',
'cES': 'Vibrational and/or electronic energy levels',
'cDI': 'Constants of diatomic molecules',
'cSO': "Henry's Law data"}
[4]:
pd.set_option('display.max_columns', 20)
sub = df.loc[~df.inchi.isna() & ~df.mol2D.isna() & ~df[ps['cMS']].isna() & ~df[ps['cUV']].isna()]
sub
[4]:
| | ID | name | synonyms | formula | mol_weight | inchi | inchi_key | cas_rn | mol2D | mol3D | ... | NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) | NIST Atomic Spectra Database - Levels Holdings (on physics web site) | NIST Atomic Spectra Database - Lines Holdings (on physics web site) | NIST Polycyclic Aromatic Hydrocarbon Structure Index | Reference simulation | Reference simulation: SPC/E Water | Reference simulation: TraPPE Carbon Dioxide | X-ray Photoelectron Spectroscopy Database, version 5.0 | NIST / TRC Web Thermo Tables, "lite" edition (thermophysical and thermochemical data) | NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
11392 | C100016 | p-Nitroaniline | Benzenamine, 4-nitro-\nAniline, p-nitro-\np-Am... | C6 H6 N2 O2 | 138.1240 | InChI=1S/C6H6N2O2/c7-5-1-3-6(4-2-5)8(9)10/h1-4... | TYMLOMAKGOJONV-UHFFFAOYSA-N | 100-01-6 | https://webbook.nist.gov/cgi/inchi?Str2File=C1... | https://webbook.nist.gov/cgi/inchi?Str3File=C1... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://srdata.nist.gov/xps/SpectralByCompdDd/... | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
11393 | C100027 | Phenol, 4-nitro- | Phenol, p-nitro-\np-Hydroxynitrobenzene\np-Nit... | C6 H5 NO3 | 139.1088 | InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H | BTJIUGUIPKRLHP-UHFFFAOYSA-N | 100-02-7 | https://webbook.nist.gov/cgi/inchi?Str2File=C1... | https://webbook.nist.gov/cgi/inchi?Str3File=C1... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
11412 | C100094 | Benzoic acid, 4-methoxy- | p-Anisic acid\np-Methoxybenzoic acid\nDraconic... | C8 H8 O3 | 152.1473 | InChI=1S/C8H8O3/c1-11-7-4-2-6(3-5-7)8(9)10/h2-... | ZEYHEAKUIGZSGI-UHFFFAOYSA-N | 100-09-4 | https://webbook.nist.gov/cgi/inchi?Str2File=C1... | https://webbook.nist.gov/cgi/inchi?Str3File=C1... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
11417 | C100107 | Benzaldehyde, 4-(dimethylamino)- | Benzaldehyde, p-(dimethylamino)-\np-(Dimethyla... | C9 H11 NO | 149.1897 | InChI=1S/C9H11NO/c1-10(2)9-5-3-8(7-11)4-6-9/h3... | BGNGWHSBYQYVRX-UHFFFAOYSA-N | 100-10-7 | https://webbook.nist.gov/cgi/inchi?Str2File=C1... | https://webbook.nist.gov/cgi/inchi?Str3File=C1... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
11422 | C100129 | Benzene, 1-ethyl-4-nitro- | p-Ethylnitrobenzene\np-Nitroethylbenzene\np-Ni... | C8 H9 NO2 | 151.1626 | InChI=1S/C8H9NO2/c1-2-7-3-5-8(6-4-7)9(10)11/h3... | RESTWAHJFMZUIZ-UHFFFAOYSA-N | 100-12-9 | https://webbook.nist.gov/cgi/inchi?Str2File=C1... | https://webbook.nist.gov/cgi/inchi?Str3File=C1... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
66252 | C99923 | Acetophenone, 4'-amino- | Ethanone, 1-(4-aminophenyl)-\np-Acetylaniline\... | C8 H9 NO | 135.1632 | InChI=1S/C8H9NO/c1-6(10)7-2-4-8(9)5-3-7/h2-5H,... | GPRYKVSEZCQIHD-UHFFFAOYSA-N | 99-92-3 | https://webbook.nist.gov/cgi/inchi?Str2File=C9... | https://webbook.nist.gov/cgi/inchi?Str3File=C9... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
66254 | C99934 | Acetophenone, 4'-hydroxy- | Ethanone, 1-(4-hydroxyphenyl)-\np-Hydroxyaceto... | C8 H8 O2 | 136.1479 | InChI=1S/C8H8O2/c1-6(9)7-2-4-8(10)5-3-7/h2-5,1... | TXFPEBPIARQUIG-UHFFFAOYSA-N | 99-93-4 | https://webbook.nist.gov/cgi/inchi?Str2File=C9... | https://webbook.nist.gov/cgi/inchi?Str3File=C9... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
66257 | C99945 | Benzoic acid, 4-methyl- | p-Toluic acid\np-Methylbenzoic acid\np-Toluyli... | C8 H8 O2 | 136.1479 | InChI=1S/C8H8O2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H,... | LPNBBFKOUUSUDB-UHFFFAOYSA-N | 99-94-5 | https://webbook.nist.gov/cgi/inchi?Str2File=C9... | https://webbook.nist.gov/cgi/inchi?Str3File=C9... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
66264 | C99967 | Benzoic acid, 4-hydroxy- | Benzoic acid, p-hydroxy-\np-Hydroxybenzoic aci... | C7 H6 O3 | 138.1207 | InChI=1S/C7H6O3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8... | FJKROLUGYXJWQN-UHFFFAOYSA-N | 99-96-7 | https://webbook.nist.gov/cgi/inchi?Str2File=C9... | https://webbook.nist.gov/cgi/inchi?Str3File=C9... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
66269 | C99978 | Benzenamine, N,N,4-trimethyl- | p-Toluidine, N,N-dimethyl-\np-Methyl-N,N-dimet... | C9 H13 N | 135.2062 | InChI=1S/C9H13N/c1-8-4-6-9(7-5-8)10(2)3/h4-7H,... | GYVGXEWAOAAJEU-UHFFFAOYSA-N | 99-97-8 | https://webbook.nist.gov/cgi/inchi?Str2File=C9... | https://webbook.nist.gov/cgi/inchi?Str3File=C9... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | https://wtt-pro.nist.gov/wtt-pro/index.html?cm... |
1469 rows × 39 columns
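The filter in cell [4] chains one isna() check per column, which gets long as more properties are required. One possible way to keep it compact is a small helper that builds the mask from a list of column names. The helper name has_all, the two-entry ps_toy mapping, and the toy dataframe are illustrative assumptions, not part of NistChemPy.

```python
import numpy as np
import pandas as pd

def has_all(frame, columns):
    """Return a boolean mask of rows with non-missing values in every listed column."""
    return frame[list(columns)].notna().all(axis=1)

# Toy stand-ins: a two-entry excerpt of the short-name mapping and a small frame
# with placeholder IDs and URLs (not real WebBook entries).
ps_toy = {'cMS': 'Mass spectrum (electron ionization)', 'cUV': 'UV/Visible spectrum'}
toy = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3'],
    'inchi': ['InChI=1S/H', 'InChI=1S/S', np.nan],
    'mol2D': ['https://example.org/mol/A1', 'https://example.org/mol/A2',
              'https://example.org/mol/A3'],
    'Mass spectrum (electron ionization)': [
        'https://example.org/ms/A1', np.nan, 'https://example.org/ms/A3'],
    'UV/Visible spectrum': [
        'https://example.org/uv/A1', 'https://example.org/uv/A2', np.nan],
})

# Require an InChI, a 2D MOL-file, and both spectra, using short codes for the latter
wanted = ['inchi', 'mol2D'] + [ps_toy[code] for code in ('cMS', 'cUV')]
sub_toy = toy.loc[has_all(toy, wanted)]
print(sub_toy['ID'].tolist())
```

With the real data, the same call would presumably take df and the dictionary returned by nist.get_search_parameters(); note that only the c-prefixed keys of that dictionary correspond to dataframe columns.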
One can also run a substructure search, e.g. to keep only non-aromatic compounds (those containing no aromatic atoms):
[5]:
from rdkit import Chem
# suppress RDKit warnings
from rdkit import RDLogger
RDLogger.DisableLog('rdApp.*')
# prepare molecules for search
mols = [(ID, Chem.MolFromInchi(inchi)) for ID, inchi in zip(sub.ID, sub.inchi)]
mols = [(ID, mol) for ID, mol in mols if mol]
# search
pat = Chem.MolFromSmarts('[a]')
hits = [ID for ID, mol in mols if not mol.HasSubstructMatch(pat)]
print(f'{len(hits)} of {len(sub)} compounds were selected')
320 of 1469 compounds were selected
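The search yields a plain list of IDs. To continue working with the matching rows, the list can be mapped back onto the dataframe with isin. A minimal sketch on toy data follows; toy_sub and toy_hits are placeholders standing in for sub and hits, and the IDs and names are made up.

```python
import pandas as pd

# Toy stand-ins for `sub` and `hits`; IDs and names are placeholders
toy_sub = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3'],
    'name': ['compound one', 'compound two', 'compound three'],
})
toy_hits = ['A1', 'A3']

# Keep only the rows whose ID is among the hits, preserving dataframe order
selected = toy_sub.loc[toy_sub['ID'].isin(toy_hits)].reset_index(drop=True)
print(selected)
```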
These compounds can then be retrieved with the nist.get_compound function.
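Retrieving several hundred compounds means one web request per ID, so it may help to pause between requests and to collect failures instead of stopping at the first one. The sketch below shows one possible loop. The helper fetch_compounds and the offline stand-in fake_fetch are hypothetical; passing nist.get_compound as the fetch argument assumes it accepts a compound ID string, which is not shown above.

```python
import time

def fetch_compounds(ids, fetch, delay=0.0):
    """Call fetch(ID) for each ID, pausing between calls; collect failures separately."""
    found, failed = {}, []
    for i, ID in enumerate(ids):
        if i and delay:
            time.sleep(delay)  # wait between consecutive requests
        try:
            found[ID] = fetch(ID)
        except Exception as exc:
            # broad on purpose: the failure modes of the real call are not known here
            failed.append((ID, exc))
    return found, failed

# Hypothetical offline stand-in for nist.get_compound so the sketch runs without network
def fake_fetch(ID):
    if ID == 'bad':
        raise ValueError(f'no such compound: {ID}')
    return {'ID': ID}

found, failed = fetch_compounds(['A1', 'bad', 'A3'], fake_fetch)
print(sorted(found), [ID for ID, _ in failed])
```

With the real package, the call would presumably look like fetch_compounds(hits, nist.get_compound, delay=1.0), where the one-second delay is an arbitrary choice.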