{ "cells": [ { "cell_type": "markdown", "id": "8d7cb2ed-a704-4b35-aa29-6996a9e80272", "metadata": {}, "source": [ "# Advanced search" ] }, { "cell_type": "markdown", "id": "69985863-21be-43fb-90fb-7c79a58c188b", "metadata": {}, "source": [ "NIST Chemistry WebBook supports structure search, however to the best of our knowledge there is no straightforward way to implement it as a Python API. To overcome this problem, as well as WebBook's limitation of found compounds, **NistChemPy** package contains dataframe with the main info on all NIST Chemistry WebBook compounds:" ] }, { "cell_type": "code", "execution_count": 1, "id": "6d78af72-40e0-4f28-b615-0d5bb1be52cc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDnamesynonymsformulamol_weightinchiinchi_keycas_rnmol2Dmol3DGas phase thermochemistry dataCondensed phase thermochemistry dataPhase change dataReaction thermochemistry dataGas phase ion energetics dataIon clustering dataIR SpectrumTHz IR spectrumMass spectrum (electron ionization)UV/Visible spectrumGas ChromatographyVibrational and/or electronic energy levelsConstants of diatomic moleculesHenry's Law dataFluid PropertiesComputational Chemistry Comparison and Benchmark DatabaseElectron-Impact Ionization Cross Sections (on physics web site)Gas Phase Kinetics DatabaseMicrowave spectra (on physics lab web site)NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)NIST Atomic Spectra Database - Levels Holdings (on physics web site)NIST Atomic Spectra Database - Lines Holdings (on physics web site)NIST Polycyclic Aromatic Hydrocarbon Structure IndexReference simulationReference simulation: SPC/E WaterReference simulation: TraPPE Carbon DioxideX-ray Photoelectron Spectroscopy Database, version 5.0NIST / TRC Web Thermo Tables, \"lite\" edition (thermophysical and thermochemical data)NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)
0B100iron oxide anionNaNFeO-71.8450NaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=B100...NaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=B100...https://webbook.nist.gov/cgi/cbook.cgi?ID=B100...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1B1000AsF3..Cl anionNaNAsClF3-167.3700NaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=B100...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2B1000000AgH2-NaNAgH2-109.8846NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=B100...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
3B1000001HAg(H2)NaNAgH3110.8920NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=B100...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
4B1000002AgNO+NaNAgNO+137.8738NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=B100...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
........................................................................................................................
129340U99777Methyl 3-hydroxycholest-5-en-26-oate, TMS deri...Methyl (25RS)-3β-hydroxy-5-cholesten-26-oate, ...C31H54O3Si502.8442InChI=1S/C31H54O3Si/c1-21(10-9-11-22(2)29(32)3...DNXGNXYNSBCWGX-QBUYVTDMSA-NNaNhttps://webbook.nist.gov/cgi/cbook.cgi?Str2Fil...NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=U997...NaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=U997...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
129341U998302-Methyl-3-oxovaleric acid, O,O'-bis(trimethyl...3-Oxopentanoic acid, 2-methyl, TMS\\n2-Methyl-3...C12H26O3Si2274.5040InChI=1S/C12H26O3Si2/c1-9-11(14-16(3,4)5)10(2)...LXAIQDVPXKOIGO-KHPPLWFESA-NNaNhttps://webbook.nist.gov/cgi/inchi?Str2File=U9...NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/inchi?ID=U99830&M...NaNhttps://webbook.nist.gov/cgi/inchi?ID=U99830&M...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
129342U999423-Hydroxy-3-(4'-hydroxy-3'-methoxyphenyl)propi...Vanillylhydracrylic acid, tri-TMS\\nVanillylhyd...C19H36O5Si3428.7426InChI=1S/C19H36O5Si3/c1-21-18-13-15(11-12-16(1...QCMUGKOFXVYNCF-UHFFFAOYSA-NNaNhttps://webbook.nist.gov/cgi/inchi?Str2File=U9...NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/inchi?ID=U99942&M...NaNhttps://webbook.nist.gov/cgi/inchi?ID=U99942&M...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
129343U999472-Propylpentanoic acid, 2,3,4,6-tetra(trimethy...Valproic acid, glucuronide, TMSC26H58O7Si4595.0765InChI=1S/C26H58O7Si4/c1-15-17-20(18-16-2)25(27...OVXMRISJDUWFKB-UHFFFAOYSA-NNaNhttps://webbook.nist.gov/cgi/inchi?Str2File=U9...NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/inchi?ID=U99947&M...NaNhttps://webbook.nist.gov/cgi/inchi?ID=U99947&M...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
129344xY5O2 radicalNaNO2Y5476.5281NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://webbook.nist.gov/cgi/cbook.cgi?ID=x&Ma...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "

129345 rows × 39 columns

\n", "
" ], "text/plain": [ " ID name \\\n", "0 B100 iron oxide anion \n", "1 B1000 AsF3..Cl anion \n", "2 B1000000 AgH2- \n", "3 B1000001 HAg(H2) \n", "4 B1000002 AgNO+ \n", "... ... ... \n", "129340 U99777 Methyl 3-hydroxycholest-5-en-26-oate, TMS deri... \n", "129341 U99830 2-Methyl-3-oxovaleric acid, O,O'-bis(trimethyl... \n", "129342 U99942 3-Hydroxy-3-(4'-hydroxy-3'-methoxyphenyl)propi... \n", "129343 U99947 2-Propylpentanoic acid, 2,3,4,6-tetra(trimethy... \n", "129344 x Y5O2 radical \n", "\n", " synonyms formula \\\n", "0 NaN FeO- \n", "1 NaN AsClF3- \n", "2 NaN AgH2- \n", "3 NaN AgH3 \n", "4 NaN AgNO+ \n", "... ... ... \n", "129340 Methyl (25RS)-3β-hydroxy-5-cholesten-26-oate, ... C31H54O3Si \n", "129341 3-Oxopentanoic acid, 2-methyl, TMS\\n2-Methyl-3... C12H26O3Si2 \n", "129342 Vanillylhydracrylic acid, tri-TMS\\nVanillylhyd... C19H36O5Si3 \n", "129343 Valproic acid, glucuronide, TMS C26H58O7Si4 \n", "129344 NaN O2Y5 \n", "\n", " mol_weight inchi \\\n", "0 71.8450 NaN \n", "1 167.3700 NaN \n", "2 109.8846 NaN \n", "3 110.8920 NaN \n", "4 137.8738 NaN \n", "... ... ... \n", "129340 502.8442 InChI=1S/C31H54O3Si/c1-21(10-9-11-22(2)29(32)3... \n", "129341 274.5040 InChI=1S/C12H26O3Si2/c1-9-11(14-16(3,4)5)10(2)... \n", "129342 428.7426 InChI=1S/C19H36O5Si3/c1-21-18-13-15(11-12-16(1... \n", "129343 595.0765 InChI=1S/C26H58O7Si4/c1-15-17-20(18-16-2)25(27... \n", "129344 476.5281 NaN \n", "\n", " inchi_key cas_rn \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 DNXGNXYNSBCWGX-QBUYVTDMSA-N NaN \n", "129341 LXAIQDVPXKOIGO-KHPPLWFESA-N NaN \n", "129342 QCMUGKOFXVYNCF-UHFFFAOYSA-N NaN \n", "129343 OVXMRISJDUWFKB-UHFFFAOYSA-N NaN \n", "129344 NaN NaN \n", "\n", " mol2D mol3D \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 https://webbook.nist.gov/cgi/cbook.cgi?Str2Fil... NaN \n", "129341 https://webbook.nist.gov/cgi/inchi?Str2File=U9... 
NaN \n", "129342 https://webbook.nist.gov/cgi/inchi?Str2File=U9... NaN \n", "129343 https://webbook.nist.gov/cgi/inchi?Str2File=U9... NaN \n", "129344 NaN NaN \n", "\n", " Gas phase thermochemistry data \\\n", "0 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... \n", "1 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Condensed phase thermochemistry data Phase change data \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 NaN NaN \n", "129341 NaN NaN \n", "129342 NaN NaN \n", "129343 NaN NaN \n", "129344 NaN NaN \n", "\n", " Reaction thermochemistry data \\\n", "0 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Gas phase ion energetics data Ion clustering data \\\n", "0 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 NaN NaN \n", "129341 NaN NaN \n", "129342 NaN NaN \n", "129343 NaN NaN \n", "129344 https://webbook.nist.gov/cgi/cbook.cgi?ID=x&Ma... NaN \n", "\n", " IR Spectrum THz IR spectrum \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 NaN NaN \n", "129341 NaN NaN \n", "129342 NaN NaN \n", "129343 NaN NaN \n", "129344 NaN NaN \n", "\n", " Mass spectrum (electron ionization) UV/Visible spectrum \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 https://webbook.nist.gov/cgi/cbook.cgi?ID=U997... NaN \n", "129341 https://webbook.nist.gov/cgi/inchi?ID=U99830&M... NaN \n", "129342 https://webbook.nist.gov/cgi/inchi?ID=U99942&M... 
NaN \n", "129343 https://webbook.nist.gov/cgi/inchi?ID=U99947&M... NaN \n", "129344 NaN NaN \n", "\n", " Gas Chromatography \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 https://webbook.nist.gov/cgi/cbook.cgi?ID=U997... \n", "129341 https://webbook.nist.gov/cgi/inchi?ID=U99830&M... \n", "129342 https://webbook.nist.gov/cgi/inchi?ID=U99942&M... \n", "129343 https://webbook.nist.gov/cgi/inchi?ID=U99947&M... \n", "129344 NaN \n", "\n", " Vibrational and/or electronic energy levels \\\n", "0 NaN \n", "1 NaN \n", "2 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... \n", "3 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... \n", "4 https://webbook.nist.gov/cgi/cbook.cgi?ID=B100... \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Constants of diatomic molecules Henry's Law data Fluid Properties \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "... ... ... ... \n", "129340 NaN NaN NaN \n", "129341 NaN NaN NaN \n", "129342 NaN NaN NaN \n", "129343 NaN NaN NaN \n", "129344 NaN NaN NaN \n", "\n", " Computational Chemistry Comparison and Benchmark Database \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Electron-Impact Ionization Cross Sections (on physics web site) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Gas Phase Kinetics Database \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Microwave spectra (on physics lab web site) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... 
\n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " NIST Atomic Spectra Database - Levels Holdings (on physics web site) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " NIST Atomic Spectra Database - Lines Holdings (on physics web site) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " NIST Polycyclic Aromatic Hydrocarbon Structure Index \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " Reference simulation Reference simulation: SPC/E Water \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "... ... ... \n", "129340 NaN NaN \n", "129341 NaN NaN \n", "129342 NaN NaN \n", "129343 NaN NaN \n", "129344 NaN NaN \n", "\n", " Reference simulation: TraPPE Carbon Dioxide \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " X-ray Photoelectron Spectroscopy Database, version 5.0 \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " NIST / TRC Web Thermo Tables, \"lite\" edition (thermophysical and thermochemical data) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... 
\n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", " NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data) \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "... ... \n", "129340 NaN \n", "129341 NaN \n", "129342 NaN \n", "129343 NaN \n", "129344 NaN \n", "\n", "[129345 rows x 39 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import nistchempy as nist\n", "import pandas as pd\n", "\n", "pd.set_option('display.max_columns', None)\n", "df = nist.get_all_data()\n", "df" ] }, { "cell_type": "markdown", "id": "eed873f7-7df9-44f9-b19f-7acf7da28dc6", "metadata": {}, "source": [ "Its columns can be divided in 5 groups:\n", "\n", "1. General properties:\n", "\n", " - `ID`: NIST Compound ID\n", "\n", " - `name`: chemical name\n", "\n", " - `synonyms`: synonyms\n", "\n", " - `formula`: chemical formula\n", "\n", " - `mol_weight`: molecular weigth\n", "\n", " - `inchi` / `inchi_key`: InChI / InChIKey strings\n", "\n", " - `cas_rn`: CAS Registry Number\n", "\n", "2. Molecular files:\n", "\n", " - `mol2D` / `mol3D`: 2D and 3D MOL-files\n", "\n", "3. NIST Chemistry WebBook data:\n", "\n", " - Gas phase thermochemistry data\n", "\n", " - Condensed phase thermochemistry data\n", "\n", " - Phase change data\n", "\n", " - Reaction thermochemistry data\n", "\n", " - Gas phase ion energetics data\n", "\n", " - Ion clustering data\n", "\n", " - IR Spectrum\n", "\n", " - THz IR spectrum\n", "\n", " - Mass spectrum (electron ionization)\n", "\n", " - UV/Visible spectrum\n", "\n", " - Gas Chromatography\n", "\n", " - Vibrational and/or electronic energy levels\n", "\n", " - Constants of diatomic molecules\n", "\n", " - Henry's Law data\n", "\n", " - Fluid Properties\n", "\n", "4. 
NIST public data:\n", "\n", " - Computational Chemistry Comparison and Benchmark Database\n", "\n", " - Electron-Impact Ionization Cross Sections (on physics web site)\n", "\n", " - Gas Phase Kinetics Database\n", "\n", " - Microwave spectra (on physics lab web site)\n", "\n", " - NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)\n", "\n", " - NIST Atomic Spectra Database - Levels Holdings (on physics web site)\n", "\n", " - NIST Atomic Spectra Database - Lines Holdings (on physics web site)\n", "\n", " - NIST Polycyclic Aromatic Hydrocarbon Structure Index\n", "\n", " - Reference simulation\n", "\n", " - Reference simulation: SPC/E Water\n", "\n", " - Reference simulation: TraPPE Carbon Dioxide\n", "\n", " - X-ray Photoelectron Spectroscopy Database, version 5.0\n", "\n", "5. NIST subscription data:\n", "\n", " - NIST / TRC Web Thermo Tables, \"lite\" edition (thermophysical and thermochemical data)\n", "\n", " - NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)\n", "\n", "\n", "All columns except for the first group contain URLs for the corresponding data, allowing one to parse the relevant pages without the need to preload the compounds themselves:" ] }, { "cell_type": "code", "execution_count": 2, "id": "a6dc7b7a-65f4-493f-bdfc-917fa3ef76d2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDinchiNIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)
11510C10028145InChI=1S/Nohttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
11593C10043922InChI=1S/Rnhttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
11749C10097322InChI=1S/Brhttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
16928C12385136InChI=1S/Hhttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
18624C13494809InChI=1S/Tehttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
............
59700C7440735InChI=1S/Frhttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
59701C7440746InChI=1S/Inhttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
60912C7704349InChI=1S/Shttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
61010C7723140InChI=1S/Phttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
61272C7782492InChI=1S/Sehttps://physics.nist.gov/cgi-bin/ASD/ie.pl?spe...
\n", "

101 rows × 3 columns

\n", "
" ], "text/plain": [ " ID inchi \\\n", "11510 C10028145 InChI=1S/No \n", "11593 C10043922 InChI=1S/Rn \n", "11749 C10097322 InChI=1S/Br \n", "16928 C12385136 InChI=1S/H \n", "18624 C13494809 InChI=1S/Te \n", "... ... ... \n", "59700 C7440735 InChI=1S/Fr \n", "59701 C7440746 InChI=1S/In \n", "60912 C7704349 InChI=1S/S \n", "61010 C7723140 InChI=1S/P \n", "61272 C7782492 InChI=1S/Se \n", "\n", " NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) \n", "11510 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "11593 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "11749 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "16928 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "18624 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "... ... \n", "59700 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "59701 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "60912 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "61010 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "61272 https://physics.nist.gov/cgi-bin/ASD/ie.pl?spe... \n", "\n", "[101 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col = 'NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)'\n", "df.loc[~df[col].isna(), ['ID', 'inchi', col]]" ] }, { "cell_type": "markdown", "id": "cbfd6766-a8bb-4817-8df4-e7680beceb7a", "metadata": {}, "source": [ "This dataframe can be used to limit all entries to those ones with desired properties. 
To use short names for NIST Chemistry WebBook properties, one can use the `nist.get_search_parameters` function:" ] }, { "cell_type": "code", "execution_count": 3, "id": "9a6f4527-66c8-4606-bc91-536be0ced28c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'use_SI': 'Units for thermodynamic data, \"SI\" if True and \"calories\" if False',\n", " 'match_isotopes': 'Exactly match the specified isotopes (formula search only)',\n", " 'allow_other': 'Allow elements not specified in formula (formula search only)',\n", " 'allow_extra': 'Allow more atoms of elements in formula than specified (formula search only)',\n", " 'no_ion': 'Exclude ions from the search (formula search only)',\n", " 'cTG': 'Gas phase thermochemistry data',\n", " 'cTC': 'Condensed phase thermochemistry data',\n", " 'cTP': 'Phase change data',\n", " 'cTR': 'Reaction thermochemistry data',\n", " 'cIE': 'Gas phase ion energetics data',\n", " 'cIC': 'Ion clustering data',\n", " 'cIR': 'IR Spectrum',\n", " 'cTZ': 'THz IR spectrum',\n", " 'cMS': 'Mass spectrum (electron ionization)',\n", " 'cUV': 'UV/Visible spectrum',\n", " 'cGC': 'Gas Chromatography',\n", " 'cES': 'Vibrational and/or electronic energy levels',\n", " 'cDI': 'Constants of diatomic molecules',\n", " 'cSO': \"Henry's Law data\"}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ps = nist.get_search_parameters()\n", "ps" ] }, { "cell_type": "code", "execution_count": 4, "id": "c8912156-a244-45de-bc29-0b832a080806", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDnamesynonymsformulamol_weightinchiinchi_keycas_rnmol2Dmol3D...NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site)NIST Atomic Spectra Database - Levels Holdings (on physics web site)NIST Atomic Spectra Database - Lines Holdings (on physics web site)NIST Polycyclic Aromatic Hydrocarbon Structure IndexReference simulationReference simulation: SPC/E WaterReference simulation: TraPPE Carbon DioxideX-ray Photoelectron Spectroscopy Database, version 5.0NIST / TRC Web Thermo Tables, \"lite\" edition (thermophysical and thermochemical data)NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data)
11398C100016p-NitroanilineBenzenamine, 4-nitro-\\nAniline, p-nitro-\\np-Am...C6H6N2O2138.1240InChI=1S/C6H6N2O2/c7-5-1-3-6(4-2-5)8(9)10/h1-4...TYMLOMAKGOJONV-UHFFFAOYSA-N100-01-6https://webbook.nist.gov/cgi/inchi?Str2File=C1...https://webbook.nist.gov/cgi/inchi?Str3File=C1......NaNNaNNaNNaNNaNNaNNaNhttps://srdata.nist.gov/xps/SpectralByCompdDd/...NaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11399C100027Phenol, 4-nitro-Phenol, p-nitro-\\np-Hydroxynitrobenzene\\np-Nit...C6H5NO3139.1088InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8HBTJIUGUIPKRLHP-UHFFFAOYSA-N100-02-7https://webbook.nist.gov/cgi/inchi?Str2File=C1...https://webbook.nist.gov/cgi/inchi?Str3File=C1......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11418C100094Benzoic acid, 4-methoxy-p-Anisic acid\\np-Methoxybenzoic acid\\nDraconic...C8H8O3152.1473InChI=1S/C8H8O3/c1-11-7-4-2-6(3-5-7)8(9)10/h2-...ZEYHEAKUIGZSGI-UHFFFAOYSA-N100-09-4https://webbook.nist.gov/cgi/inchi?Str2File=C1...https://webbook.nist.gov/cgi/inchi?Str3File=C1......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11423C100107Benzaldehyde, 4-(dimethylamino)-Benzaldehyde, p-(dimethylamino)-\\np-(Dimethyla...C9H11NO149.1897InChI=1S/C9H11NO/c1-10(2)9-5-3-8(7-11)4-6-9/h3...BGNGWHSBYQYVRX-UHFFFAOYSA-N100-10-7https://webbook.nist.gov/cgi/inchi?Str2File=C1...https://webbook.nist.gov/cgi/inchi?Str3File=C1......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
11428C100129Benzene, 1-ethyl-4-nitro-p-Ethylnitrobenzene\\np-Nitroethylbenzene\\np-Ni...C8H9NO2151.1626InChI=1S/C8H9NO2/c1-2-7-3-5-8(6-4-7)9(10)11/h3...RESTWAHJFMZUIZ-UHFFFAOYSA-N100-12-9https://webbook.nist.gov/cgi/inchi?Str2File=C1...https://webbook.nist.gov/cgi/inchi?Str3File=C1......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
..................................................................
66269C99934Acetophenone, 4'-hydroxy-Ethanone, 1-(4-hydroxyphenyl)-\\np-Hydroxyaceto...C8H8O2136.1479InChI=1S/C8H8O2/c1-6(9)7-2-4-8(10)5-3-7/h2-5,1...TXFPEBPIARQUIG-UHFFFAOYSA-N99-93-4https://webbook.nist.gov/cgi/inchi?Str2File=C9...https://webbook.nist.gov/cgi/inchi?Str3File=C9......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66272C99945Benzoic acid, 4-methyl-p-Toluic acid\\np-Methylbenzoic acid\\np-Toluyli...C8H8O2136.1479InChI=1S/C8H8O2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H,...LPNBBFKOUUSUDB-UHFFFAOYSA-N99-94-5https://webbook.nist.gov/cgi/inchi?Str2File=C9...https://webbook.nist.gov/cgi/inchi?Str3File=C9......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66279C99967Benzoic acid, 4-hydroxy-Benzoic acid, p-hydroxy-\\np-Hydroxybenzoic aci...C7H6O3138.1207InChI=1S/C7H6O3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8...FJKROLUGYXJWQN-UHFFFAOYSA-N99-96-7https://webbook.nist.gov/cgi/inchi?Str2File=C9...https://webbook.nist.gov/cgi/inchi?Str3File=C9......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66284C99978Benzenamine, N,N,4-trimethyl-p-Toluidine, N,N-dimethyl-\\np-Methyl-N,N-dimet...C9H13N135.2062InChI=1S/C9H13N/c1-8-4-6-9(7-5-8)10(2)3/h4-7H,...GYVGXEWAOAAJEU-UHFFFAOYSA-N99-97-8https://webbook.nist.gov/cgi/inchi?Str2File=C9...https://webbook.nist.gov/cgi/inchi?Str3File=C9......NaNNaNNaNNaNNaNNaNNaNNaNNaNhttps://wtt-pro.nist.gov/wtt-pro/index.html?cm...
66292C99990Benzene, 1-methyl-4-nitro-Toluene, p-nitro-\\np-Methylnitrobenzene\\np-Nit...C7H7NO2137.1360InChI=1S/C7H7NO2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H...ZPTVNYMJQHSSEA-UHFFFAOYSA-N99-99-0https://webbook.nist.gov/cgi/cbook.cgi?Str2Fil...https://webbook.nist.gov/cgi/cbook.cgi?Str3Fil......NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "

1555 rows × 39 columns

\n", "
" ], "text/plain": [ " ID name \\\n", "11398 C100016 p-Nitroaniline \n", "11399 C100027 Phenol, 4-nitro- \n", "11418 C100094 Benzoic acid, 4-methoxy- \n", "11423 C100107 Benzaldehyde, 4-(dimethylamino)- \n", "11428 C100129 Benzene, 1-ethyl-4-nitro- \n", "... ... ... \n", "66269 C99934 Acetophenone, 4'-hydroxy- \n", "66272 C99945 Benzoic acid, 4-methyl- \n", "66279 C99967 Benzoic acid, 4-hydroxy- \n", "66284 C99978 Benzenamine, N,N,4-trimethyl- \n", "66292 C99990 Benzene, 1-methyl-4-nitro- \n", "\n", " synonyms formula \\\n", "11398 Benzenamine, 4-nitro-\\nAniline, p-nitro-\\np-Am... C6H6N2O2 \n", "11399 Phenol, p-nitro-\\np-Hydroxynitrobenzene\\np-Nit... C6H5NO3 \n", "11418 p-Anisic acid\\np-Methoxybenzoic acid\\nDraconic... C8H8O3 \n", "11423 Benzaldehyde, p-(dimethylamino)-\\np-(Dimethyla... C9H11NO \n", "11428 p-Ethylnitrobenzene\\np-Nitroethylbenzene\\np-Ni... C8H9NO2 \n", "... ... ... \n", "66269 Ethanone, 1-(4-hydroxyphenyl)-\\np-Hydroxyaceto... C8H8O2 \n", "66272 p-Toluic acid\\np-Methylbenzoic acid\\np-Toluyli... C8H8O2 \n", "66279 Benzoic acid, p-hydroxy-\\np-Hydroxybenzoic aci... C7H6O3 \n", "66284 p-Toluidine, N,N-dimethyl-\\np-Methyl-N,N-dimet... C9H13N \n", "66292 Toluene, p-nitro-\\np-Methylnitrobenzene\\np-Nit... C7H7NO2 \n", "\n", " mol_weight inchi \\\n", "11398 138.1240 InChI=1S/C6H6N2O2/c7-5-1-3-6(4-2-5)8(9)10/h1-4... \n", "11399 139.1088 InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H \n", "11418 152.1473 InChI=1S/C8H8O3/c1-11-7-4-2-6(3-5-7)8(9)10/h2-... \n", "11423 149.1897 InChI=1S/C9H11NO/c1-10(2)9-5-3-8(7-11)4-6-9/h3... \n", "11428 151.1626 InChI=1S/C8H9NO2/c1-2-7-3-5-8(6-4-7)9(10)11/h3... \n", "... ... ... \n", "66269 136.1479 InChI=1S/C8H8O2/c1-6(9)7-2-4-8(10)5-3-7/h2-5,1... \n", "66272 136.1479 InChI=1S/C8H8O2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H,... \n", "66279 138.1207 InChI=1S/C7H6O3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8... \n", "66284 135.2062 InChI=1S/C9H13N/c1-8-4-6-9(7-5-8)10(2)3/h4-7H,... 
\n", "66292 137.1360 InChI=1S/C7H7NO2/c1-6-2-4-7(5-3-6)8(9)10/h2-5H... \n", "\n", " inchi_key cas_rn \\\n", "11398 TYMLOMAKGOJONV-UHFFFAOYSA-N 100-01-6 \n", "11399 BTJIUGUIPKRLHP-UHFFFAOYSA-N 100-02-7 \n", "11418 ZEYHEAKUIGZSGI-UHFFFAOYSA-N 100-09-4 \n", "11423 BGNGWHSBYQYVRX-UHFFFAOYSA-N 100-10-7 \n", "11428 RESTWAHJFMZUIZ-UHFFFAOYSA-N 100-12-9 \n", "... ... ... \n", "66269 TXFPEBPIARQUIG-UHFFFAOYSA-N 99-93-4 \n", "66272 LPNBBFKOUUSUDB-UHFFFAOYSA-N 99-94-5 \n", "66279 FJKROLUGYXJWQN-UHFFFAOYSA-N 99-96-7 \n", "66284 GYVGXEWAOAAJEU-UHFFFAOYSA-N 99-97-8 \n", "66292 ZPTVNYMJQHSSEA-UHFFFAOYSA-N 99-99-0 \n", "\n", " mol2D \\\n", "11398 https://webbook.nist.gov/cgi/inchi?Str2File=C1... \n", "11399 https://webbook.nist.gov/cgi/inchi?Str2File=C1... \n", "11418 https://webbook.nist.gov/cgi/inchi?Str2File=C1... \n", "11423 https://webbook.nist.gov/cgi/inchi?Str2File=C1... \n", "11428 https://webbook.nist.gov/cgi/inchi?Str2File=C1... \n", "... ... \n", "66269 https://webbook.nist.gov/cgi/inchi?Str2File=C9... \n", "66272 https://webbook.nist.gov/cgi/inchi?Str2File=C9... \n", "66279 https://webbook.nist.gov/cgi/inchi?Str2File=C9... \n", "66284 https://webbook.nist.gov/cgi/inchi?Str2File=C9... \n", "66292 https://webbook.nist.gov/cgi/cbook.cgi?Str2Fil... \n", "\n", " mol3D ... \\\n", "11398 https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... \n", "11399 https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... \n", "11418 https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... \n", "11423 https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... \n", "11428 https://webbook.nist.gov/cgi/inchi?Str3File=C1... ... \n", "... ... ... \n", "66269 https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... \n", "66272 https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... \n", "66279 https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... \n", "66284 https://webbook.nist.gov/cgi/inchi?Str3File=C9... ... \n", "66292 https://webbook.nist.gov/cgi/cbook.cgi?Str3Fil... ... 
\n", "\n", " NIST Atomic Spectra Database - Ground states and ionization energies (on physics web site) \\\n", "11398 NaN \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... \n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " NIST Atomic Spectra Database - Levels Holdings (on physics web site) \\\n", "11398 NaN \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... \n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " NIST Atomic Spectra Database - Lines Holdings (on physics web site) \\\n", "11398 NaN \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... \n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " NIST Polycyclic Aromatic Hydrocarbon Structure Index \\\n", "11398 NaN \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... \n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " Reference simulation Reference simulation: SPC/E Water \\\n", "11398 NaN NaN \n", "11399 NaN NaN \n", "11418 NaN NaN \n", "11423 NaN NaN \n", "11428 NaN NaN \n", "... ... ... \n", "66269 NaN NaN \n", "66272 NaN NaN \n", "66279 NaN NaN \n", "66284 NaN NaN \n", "66292 NaN NaN \n", "\n", " Reference simulation: TraPPE Carbon Dioxide \\\n", "11398 NaN \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... \n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " X-ray Photoelectron Spectroscopy Database, version 5.0 \\\n", "11398 https://srdata.nist.gov/xps/SpectralByCompdDd/... \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... 
\n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " NIST / TRC Web Thermo Tables, \"lite\" edition (thermophysical and thermochemical data) \\\n", "11398 NaN \n", "11399 NaN \n", "11418 NaN \n", "11423 NaN \n", "11428 NaN \n", "... ... \n", "66269 NaN \n", "66272 NaN \n", "66279 NaN \n", "66284 NaN \n", "66292 NaN \n", "\n", " NIST / TRC Web Thermo Tables, professional edition (thermophysical and thermochemical data) \n", "11398 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "11399 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "11418 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "11423 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "11428 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "... ... \n", "66269 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "66272 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "66279 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "66284 https://wtt-pro.nist.gov/wtt-pro/index.html?cm... \n", "66292 NaN \n", "\n", "[1555 rows x 39 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option('display.max_columns', 20)\n", "sub = df.loc[~df.inchi.isna() & ~df.mol2D.isna() & ~df[ps['cMS']].isna() & ~df[ps['cUV']].isna()]\n", "sub" ] }, { "cell_type": "markdown", "id": "32a852f5-190c-4ad6-897f-3a0bd11b29f1", "metadata": {}, "source": [ "Also one can run a substructure search, e.g. 
to get only non-aromatic compounds:" ] }, { "cell_type": "code", "execution_count": 5, "id": "4e92f88d-c88c-4703-8a38-b4d079f7654a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "338 of 1555 compounds were selected\n" ] } ], "source": [ "from rdkit import Chem\n", "\n", "# suppress RDKit warnings\n", "from rdkit import RDLogger\n", "RDLogger.DisableLog('rdApp.*')\n", "\n", "# parse InChI strings into RDKit molecules and drop those that fail to parse\n", "mols = [(ID, Chem.MolFromInchi(inchi)) for ID, inchi in zip(sub.ID, sub.inchi)]\n", "mols = [(ID, mol) for ID, mol in mols if mol is not None]\n", "\n", "# keep compounds without aromatic atoms ('[a]' matches any aromatic atom)\n", "pat = Chem.MolFromSmarts('[a]')\n", "hits = [ID for ID, mol in mols if not mol.HasSubstructMatch(pat)]\n", "print(f'{len(hits)} of {len(sub)} compounds were selected')" ] }, { "cell_type": "markdown", "id": "0b0c6c3d-0887-472c-94ee-2cb161aecb3f", "metadata": {}, "source": [ "These compounds can then be retrieved by ID with the `nist.get_compound` function." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }