NistChemPy

Advanced search

NIST Chemistry WebBook supports structure search; however, to the best of our knowledge, there is no straightforward way to expose it through a Python API. To work around this, and around the WebBook's limit on the number of compounds a search returns, the NistChemPy package ships a dataframe with the main information on all NIST Chemistry WebBook compounds:

[1]:
import nistchempy as nist
import pandas as pd

pd.set_option('display.max_columns', None)
df = nist.get_all_data()
df
[1]:
ID name synonyms formula mol_weight inchi inchi_key cas_rn mol2D mol3D Gas phase thermochemistry data Condensed phase thermochemistry data Phase change data Reaction thermochemistry data Gas phase ion energetics data Ion clustering data IR Spectrum THz IR spectrum Mass spectrum (electron ionization) UV/Visible spectrum Gas Chromatography Vibrational and/or electronic energy levels Constants of diatomic molecules Henry's Law data Fluid Properties Computational Chemistry Comparison and Benchmark Database Electron-Impact Ionization Cross Sections (on physics web site) Gas Phase Kinetics Database Microwave spectra (on physics lab web site) NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) NIST Atomic Spectra Database - Levels Holdings (on physics web site) NIST Atomic Spectra Database - Lines Holdings (on physics web site) NIST Polycyclic Aromatic Hydrocarbon Structure Index Reference simulation Reference simulation: SPC/E Water Reference simulation: TraPPE Carbon Dioxide X-ray Photoelectron Spectroscopy Database, version 5.0 NIST / TRC Web Thermo Tables, "lite" edition (thermophysical and thermochemical data) NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)
0 B100 iron oxide anion NaN FeO- 71.8450 NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 B1000 AsF3..Cl anion NaN AsClF3- 167.3700 NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 B1000000 AgH2- NaN AgH2- 109.8846 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 B1000001 HAg(H2) NaN AgH3 110.8920 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 B1000002 AgNO+ NaN AgNO+ 137.8738 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129340 U99777 Methyl 3-hydroxycholest-5-en-26-oate, TMS deri... Methyl (25RS)-3β-hydroxy-5-cholesten-26-oate, ... C31H54O3Si 502.8442 InChI=1S/C31H54O3Si/c1-21(10-9-11-22(2)29(32)3... DNXGNXYNSBCWGX-QBUYVTDMSA-N NaN https://webbook.nist.gov/cgi/cbook.cgi?Str2Fil... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=U997... NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=U997... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
129341 U99830 2-Methyl-3-oxovaleric acid, O,O'-bis(trimethyl... 3-Oxopentanoic acid, 2-methyl, TMS\n2-Methyl-3... C12H26O3Si2 274.5040 InChI=1S/C12H26O3Si2/c1-9-11(14-16(3,4)5)10(2)... LXAIQDVPXKOIGO-KHPPLWFESA-N NaN https://webbook.nist.gov/cgi/inchi?Str2File=U9... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/inchi?ID=U99830&M... NaN https://webbook.nist.gov/cgi/inchi?ID=U99830&M... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
129342 U99942 3-Hydroxy-3-(4'-hydroxy-3'-methoxyphenyl)propi... Vanillylhydracrylic acid, tri-TMS\nVanillylhyd... C19H36O5Si3 428.7426 InChI=1S/C19H36O5Si3/c1-21-18-13-15(11-12-16(1... QCMUGKOFXVYNCF-UHFFFAOYSA-N NaN https://webbook.nist.gov/cgi/inchi?Str2File=U9... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/inchi?ID=U99942&M... NaN https://webbook.nist.gov/cgi/inchi?ID=U99942&M... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
129343 U99947 2-Propylpentanoic acid, 2,3,4,6-tetra(trimethy... Valproic acid, glucuronide, TMS C26H58O7Si4 595.0765 InChI=1S/C26H58O7Si4/c1-15-17-20(18-16-2)25(27... OVXMRISJDUWFKB-UHFFFAOYSA-N NaN https://webbook.nist.gov/cgi/inchi?Str2File=U9... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/inchi?ID=U99947&M... NaN https://webbook.nist.gov/cgi/inchi?ID=U99947&M... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
129344 x Y5O2 radical NaN O2Y5 476.5281 NaN NaN NaN NaN NaN NaN NaN NaN NaN https://webbook.nist.gov/cgi/cbook.cgi?ID=x&Ma... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

129345 rows × 39 columns

Its columns can be divided into five groups:

  1. General properties:

    • ID: NIST Compound ID

    • name: chemical name

    • synonyms: synonyms

    • formula: chemical formula

    • mol_weight: molecular weight

    • inchi / inchi_key: InChI / InChIKey strings

    • cas_rn: CAS Registry Number

  2. Molecular files:

    • mol2D / mol3D: 2D and 3D MOL-files

  3. NIST Chemistry WebBook data:

    • Gas phase thermochemistry data

    • Condensed phase thermochemistry data

    • Phase change data

    • Reaction thermochemistry data

    • Gas phase ion energetics data

    • Ion clustering data

    • IR Spectrum

    • THz IR spectrum

    • Mass spectrum (electron ionization)

    • UV/Visible spectrum

    • Gas Chromatography

    • Vibrational and/or electronic energy levels

    • Constants of diatomic molecules

    • Henry’s Law data

    • Fluid Properties

  4. NIST public data:

    • Computational Chemistry Comparison and Benchmark Database

    • Electron-Impact Ionization Cross Sections (on physics web site)

    • Gas Phase Kinetics Database

    • Microwave spectra (on physics lab web site)

    • NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)

    • NIST Atomic Spectra Database - Levels Holdings (on physics web site)

    • NIST Atomic Spectra Database - Lines Holdings (on physics web site)

    • NIST Polycyclic Aromatic Hydrocarbon Structure Index

    • Reference simulation

    • Reference simulation: SPC/E Water

    • Reference simulation: TraPPE Carbon Dioxide

    • X-ray Photoelectron Spectroscopy Database, version 5.0

  5. NIST subscription data:

    • NIST / TRC Web Thermo Tables, “lite” edition (thermophysical and thermochemical data)

    • NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)

All columns outside the first group contain URLs pointing to the corresponding data, so the relevant pages can be parsed without first loading the compounds themselves:

[2]:
col = 'NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)'
df.loc[~df[col].isna(), ['ID', 'inchi', col]]
[2]:
       ID         inchi        NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)
11510  C10028145  InChI=1S/No  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
11593  C10043922  InChI=1S/Rn  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
11749  C10097322  InChI=1S/Br  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
16928  C12385136  InChI=1S/H   https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
18624  C13494809  InChI=1S/Te  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
...    ...        ...          ...
59700  C7440735   InChI=1S/Fr  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
59701  C7440746   InChI=1S/In  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
60912  C7704349   InChI=1S/S   https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
61010  C7723140   InChI=1S/P   https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
61272  C7782492   InChI=1S/Se  https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...

101 rows × 3 columns
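
To get a quick picture of how much data each URL column holds, one can count the non-missing entries per column. The sketch below uses a small made-up dataframe (`toy`) in place of the real one, so its IDs and URLs are placeholders; with the actual data the same call would be applied to `df`.

```python
import pandas as pd

# toy stand-in for the dataframe returned by nist.get_all_data();
# IDs and URLs are placeholders, not real WebBook entries
toy = pd.DataFrame({
    'ID': ['X1', 'X2', 'X3'],
    'IR Spectrum': ['https://example.org/ir/X1', None, 'https://example.org/ir/X3'],
    'UV/Visible spectrum': [None, None, 'https://example.org/uv/X3'],
})

# number of compounds that have a URL in each data column, largest first
coverage = toy.drop(columns='ID').notna().sum().sort_values(ascending=False)
print(coverage)
```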

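This dataframe can be used to narrow the entries down to those with the desired properties. To refer to NIST Chemistry WebBook properties by short codes, use the nist.get_search_parameters function, which maps each code to its description (for the data-type codes, the description is also the column name):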

[3]:
ps = nist.get_search_parameters()
ps
[3]:
{'use_SI': 'Units for thermodynamic data, "SI" if True and "calories" if False',
 'match_isotopes': 'Exactly match the specified isotopes (formula search only)',
 'allow_other': 'Allow elements not specified in formula (formula search only)',
 'allow_extra': 'Allow more atoms of elements in formula than specified (formula search only)',
 'no_ion': 'Exclude ions from the search (formula search only)',
 'cTG': 'Gas phase thermochemistry data',
 'cTC': 'Condensed phase thermochemistry data',
 'cTP': 'Phase change data',
 'cTR': 'Reaction thermochemistry data',
 'cIE': 'Gas phase ion energetics data',
 'cIC': 'Ion clustering data',
 'cIR': 'IR Spectrum',
 'cTZ': 'THz IR spectrum',
 'cMS': 'Mass spectrum (electron ionization)',
 'cUV': 'UV/Visible spectrum',
 'cGC': 'Gas Chromatography',
 'cES': 'Vibrational and/or electronic energy levels',
 'cDI': 'Constants of diatomic molecules',
 'cSO': "Henry's Law data"}
[4]:
pd.set_option('display.max_columns', 20)
sub = df.loc[~df.inchi.isna() & ~df.mol2D.isna() & ~df[ps['cMS']].isna() & ~df[ps['cUV']].isna()]
sub
[4]:
ID name synonyms formula mol_weight inchi inchi_key cas_rn mol2D mol3D ... NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) NIST Atomic Spectra Database - Levels Holdings (on physics web site) NIST Atomic Spectra Database - Lines Holdings (on physics web site) NIST Polycyclic Aromatic Hydrocarbon Structure Index Reference simulation Reference simulation: SPC/E Water Reference simulation: TraPPE Carbon Dioxide X-ray Photoelectron Spectroscopy Database, version 5.0 NIST / TRC Web Thermo Tables, "lite" edition (thermophysical and thermochemical data) NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)
11398 C100016 p-Nitroaniline Benzenamine, 4-nitro-\nAniline, p-nitro-\np-Am... C6H6N2O2 138.1240 InChI=1S/C6H6N2O2/c7-5-1-3-6(4-2-5)8(9)10/h1-4... TYMLOMAKGOJONV-UHFFFAOYSA-N 100-01-6 https://webbook.nist.gov/cgi/inchi?Str2File=C1... https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... NaN NaN NaN NaN NaN NaN NaN https://srdata.nist.gov/xps/SpectralByCompdDd/... NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11399 C100027 Phenol, 4-nitro- Phenol, p-nitro-\np-Hydroxynitrobenzene\np-Nit... C6H5NO3 139.1088 InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H BTJIUGUIPKRLHP-UHFFFAOYSA-N 100-02-7 https://webbook.nist.gov/cgi/inchi?Str2File=C1... https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11418 C100094 Benzoic acid, 4-methoxy- p-Anisic acid\np-Methoxybenzoic acid\nDraconic... C8H8O3 152.1473 InChI=1S/C8H8O3/c1-11-7-4-2-6(3-5-7)8(9)10/h2-... ZEYHEAKUIGZSGI-UHFFFAOYSA-N 100-09-4 https://webbook.nist.gov/cgi/inchi?Str2File=C1... https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11423 C100107 Benzaldehyde, 4-(dimethylamino)- Benzaldehyde, p-(dimethylamino)-\np-(Dimethyla... C9H11NO 149.1897 InChI=1S/C9H11NO/c1-10(2)9-5-3-8(7-11)4-6-9/h3... BGNGWHSBYQYVRX-UHFFFAOYSA-N 100-10-7 https://webbook.nist.gov/cgi/inchi?Str2File=C1... https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11428 C100129 Benzene, 1-ethyl-4-nitro- p-Ethylnitrobenzene\np-Nitroethylbenzene\np-Ni... C8H9NO2 151.1626 InChI=1S/C8H9NO2/c1-2-7-3-5-8(6-4-7)9(10)11/h3... RESTWAHJFMZUIZ-UHFFFAOYSA-N 100-12-9 https://webbook.nist.gov/cgi/inchi?Str2File=C1... https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
66269 C99934 Acetophenone, 4'-hydroxy- Ethanone, 1-(4-hydroxyphenyl)-\np-Hydroxyaceto... C8H8O2 136.1479 InChI=1S/C8H8O2/c1-6(9)7-2-4-8(10)5-3-7/h2-5,1... TXFPEBPIARQUIG-UHFFFAOYSA-N 99-93-4 https://webbook.nist.gov/cgi/inchi?Str2File=C9... https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66272 C99945 Benzoic acid, 4-methyl- p-Toluic acid\np-Methylbenzoic acid\np-Toluyli... C8H8O2 136.1479 InChI=1S/C8H8O2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H,... LPNBBFKOUUSUDB-UHFFFAOYSA-N 99-94-5 https://webbook.nist.gov/cgi/inchi?Str2File=C9... https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66279 C99967 Benzoic acid, 4-hydroxy- Benzoic acid, p-hydroxy-\np-Hydroxybenzoic aci... C7H6O3 138.1207 InChI=1S/C7H6O3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8... FJKROLUGYXJWQN-UHFFFAOYSA-N 99-96-7 https://webbook.nist.gov/cgi/inchi?Str2File=C9... https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66284 C99978 Benzenamine, N,N,4-trimethyl- p-Toluidine, N,N-dimethyl-\np-Methyl-N,N-dimet... C9H13N 135.2062 InChI=1S/C9H13N/c1-8-4-6-9(7-5-8)10(2)3/h4-7H,... GYVGXEWAOAAJEU-UHFFFAOYSA-N 99-97-8 https://webbook.nist.gov/cgi/inchi?Str2File=C9... https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN https://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66292 C99990 Benzene, 1-methyl-4-nitro- Toluene, p-nitro-\np-Methylnitrobenzene\np-Nit... C7H7NO2 137.1360 InChI=1S/C7H7NO2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H... ZPTVNYMJQHSSEA-UHFFFAOYSA-N 99-99-0 https://webbook.nist.gov/cgi/cbook.cgi?Str2Fil... https://webbook.nist.gov/cgi/cbook.cgi?Str3Fil... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

1555 rows × 39 columns
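
The filter from cell [4] can be wrapped in a small helper that accepts the short codes directly. This is a sketch rather than part of NistChemPy: `with_data` is a hypothetical name, and the toy `ps` and `toy` objects stand in for the output of `nist.get_search_parameters()` and `nist.get_all_data()`.

```python
import pandas as pd

def with_data(df, ps, codes, required=()):
    """Keep rows that have a value in every requested column.

    codes    -- short property codes, translated to column names via ps
    required -- extra column names that must also be filled
    """
    cols = [ps[code] for code in codes] + list(required)
    return df.loc[df[cols].notna().all(axis=1)]

# toy stand-ins (placeholder IDs and URLs, not real WebBook entries)
ps = {'cMS': 'Mass spectrum (electron ionization)', 'cUV': 'UV/Visible spectrum'}
toy = pd.DataFrame({
    'ID': ['X1', 'X2', 'X3'],
    'inchi': ['InChI=1S/CH4/h1H4', None, 'InChI=1S/H2O/h1H2'],
    'Mass spectrum (electron ionization)': [
        'https://example.org/ms/X1', 'https://example.org/ms/X2', 'https://example.org/ms/X3'],
    'UV/Visible spectrum': [
        'https://example.org/uv/X1', 'https://example.org/uv/X2', None],
})

# only X1 has an InChI, a mass spectrum and a UV/Visible spectrum
selected = with_data(toy, ps, ['cMS', 'cUV'], required=['inchi'])
print(selected.ID.tolist())
```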

One can also run a substructure search, e.g. to keep only non-aromatic compounds:

[5]:
from rdkit import Chem

# suppress RDKit warnings
from rdkit import RDLogger
RDLogger.DisableLog('rdApp.*')

# prepare molecules for search
mols = [(ID, Chem.MolFromInchi(inchi)) for ID, inchi in zip(sub.ID, sub.inchi)]
mols = [(ID, mol) for ID, mol in mols if mol]

# search
pat = Chem.MolFromSmarts('[a]')
hits = [ID for ID, mol in mols if not mol.HasSubstructMatch(pat)]
print(f'{len(hits)} of {len(sub)} compounds were selected')
338 of 1555 compounds were selected

These compounds can then be retrieved via the nist.get_compound function.
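
A loop over `hits` could look like the sketch below. It assumes that `nist.get_compound` takes a compound ID as its only required argument; `fetch_compounds` and the pause between requests are illustrative choices, not part of the package. The fetcher is passed in as a parameter, so the example runs offline with a stub.

```python
import time

def fetch_compounds(ids, fetch, delay=0.0):
    """Call fetch(ID) for each ID, pausing between requests.

    Returns (results, failed): a dict of successful lookups keyed by ID
    and a list of IDs whose lookup raised or returned nothing.
    """
    results, failed = {}, []
    for i, ID in enumerate(ids):
        if i and delay:
            time.sleep(delay)  # avoid sending requests back to back
        try:
            compound = fetch(ID)
        except Exception:
            compound = None
        if compound is None:
            failed.append(ID)
        else:
            results[ID] = compound
    return results, failed

# offline stub standing in for nist.get_compound (assumption: one ID argument)
def fake_fetch(ID):
    return None if ID == 'missing' else {'ID': ID}

results, failed = fetch_compounds(['X1', 'missing', 'X2'], fake_fetch)
print(len(results), failed)
```

With the real package, the call would be `fetch_compounds(hits, nist.get_compound, delay=1.0)`, under the assumption stated above.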

© Copyright 2023, Ivan Yu. Chernyshov.
