NistChemPy API

nistchempy

This package is a Python interface to the NIST Chemistry WebBook database. It provides additional data for efficient compound search and automatic retrieval of the stored physico-chemical data.

nistchempy.get_all_data() DataFrame

Returns pandas dataframe containing info on all NIST Chem WebBook compounds

Returns:

dataframe containing pre-extracted compound info

Return type:

_pd.core.frame.DataFrame

nistchempy.get_compound(ID: str, request_config: RequestConfig | None = None) NistCompound | None

Loads the main info on the given NIST compound

Parameters:
  • ID (str) – NIST compound ID, CAS RN or InChI

  • request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters

Returns:

NistCompound object, and None if there are several compounds corresponding to the given ID

Return type:

_tp.Optional[NistCompound]
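Because get_compound is documented to return None when several compounds match the identifier, callers may want to check the result before reading attributes. The sketch below assumes only the name, formula and mol_weight attributes listed for NistCompound; summarize_compound is a hypothetical helper, not part of the package, and the ID "C71432" in the comment is just an example value.

```python
from typing import Any, Optional


def summarize_compound(compound: Optional[Any]) -> str:
    """Return a one-line summary, tolerating the documented None result.

    `compound` is expected to look like a NistCompound, i.e. to expose
    `name`, `formula` and `mol_weight` (hypothetical helper, not part
    of nistchempy).
    """
    if compound is None:
        return "no unique compound found for this identifier"
    # mol_weight is documented as Optional[float], so guard against None
    weight = "n/a" if compound.mol_weight is None else f"{compound.mol_weight:.2f}"
    return f"{compound.name}: {compound.formula}, M = {weight}"


# Intended use (requires nistchempy and network access):
#   import nistchempy
#   print(summarize_compound(nistchempy.get_compound("C71432")))
```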

nistchempy.run_search(identifier: str, search_type: str, search_parameters: NistSearchParameters | None = None, request_config: RequestConfig | None = None, use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False) NistSearch

Searches compounds in NIST Chemistry WebBook

Parameters:
  • identifier (str) – NIST compound ID / formula / name / inchi / CAS RN

  • search_type (str) – identifier type, available options are: - ‘formula’ - ‘name’ - ‘inchi’ - ‘cas’ - ‘id’

  • search_parameters (_tp.Optional[NistSearchParameters]) – search parameters; if provided, the following search parameter arguments are ignored

  • request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters

  • use_SI (bool) – if True, returns results in SI units; otherwise calories are used

  • match_isotopes (bool) – if True, exactly matches the specified isotopes (formula search only)

  • allow_other (bool) – if True, allows elements not specified in formula (formula search only)

  • allow_extra (bool) – if True, allows more atoms of elements in formula than specified (formula search only)

  • no_ion (bool) – if True, excludes ions from the search (formula search only)

  • cTG (bool) – if True, returns entries containing gas-phase thermodynamic data

  • cTC (bool) – if True, returns entries containing condensed-phase thermodynamic data

  • cTP (bool) – if True, returns entries containing phase-change thermodynamic data

  • cTR (bool) – if True, returns entries containing reaction thermodynamic data

  • cIE (bool) – if True, returns entries containing ion energetics thermodynamic data

  • cIC (bool) – if True, returns entries containing ion cluster thermodynamic data

  • cIR (bool) – if True, returns entries containing IR data

  • cTZ (bool) – if True, returns entries containing THz IR data

  • cMS (bool) – if True, returns entries containing MS data

  • cUV (bool) – if True, returns entries containing UV/Vis data

  • cGC (bool) – if True, returns entries containing gas chromatography data

  • cES (bool) – if True, returns entries containing vibrational and electronic energy levels

  • cDI (bool) – if True, returns entries containing constants of diatomic molecules

  • cSO (bool) – if True, returns entries containing info on Henry’s law

Returns:

search object containing info on found compounds

Return type:

NistSearch

class nistchempy.NistSearchParameters(use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False)

Bases: object

GET parameters for compound search of NIST Chemistry WebBook

use_SI

if True, returns results in SI units; otherwise calories are used

Type:

bool

match_isotopes

if True, exactly matches the specified isotopes (formula search only)

Type:

bool

allow_other

if True, allows elements not specified in formula (formula search only)

Type:

bool

allow_extra

if True, allows more atoms of elements in formula than specified (formula search only)

Type:

bool

no_ion

if True, excludes ions from the search (formula search only)

Type:

bool

cTG

if True, returns entries containing gas-phase thermodynamic data

Type:

bool

cTC

if True, returns entries containing condensed-phase thermodynamic data

Type:

bool

cTP

if True, returns entries containing phase-change thermodynamic data

Type:

bool

cTR

if True, returns entries containing reaction thermodynamic data

Type:

bool

cIE

if True, returns entries containing ion energetics thermodynamic data

Type:

bool

cIC

if True, returns entries containing ion cluster thermodynamic data

Type:

bool

cIR

if True, returns entries containing IR data

Type:

bool

cTZ

if True, returns entries containing THz IR data

Type:

bool

cMS

if True, returns entries containing MS data

Type:

bool

cUV

if True, returns entries containing UV/Vis data

Type:

bool

cGC

if True, returns entries containing gas chromatography data

Type:

bool

cES

if True, returns entries containing vibrational and electronic energy levels

Type:

bool

cDI

if True, returns entries containing constants of diatomic molecules

Type:

bool

cSO

if True, returns entries containing info on Henry’s law

Type:

bool

use_SI: bool = True
match_isotopes: bool = False
allow_other: bool = False
allow_extra: bool = False
no_ion: bool = False
cTG: bool = False
cTC: bool = False
cTP: bool = False
cTR: bool = False
cIE: bool = False
cIC: bool = False
cIR: bool = False
cTZ: bool = False
cMS: bool = False
cUV: bool = False
cGC: bool = False
cES: bool = False
cDI: bool = False
cSO: bool = False
get_request_parameters() dict

Returns dictionary containing GET parameters

Returns:

dictionary of GET parameters relevant to the search

Return type:

dict
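To illustrate how a set of boolean flags can be turned into a dictionary of GET parameters, here is a stand-alone sketch. SearchFlags is a hypothetical stand-in that carries only a few of the fields listed above, and the key/value encoding ("Units", "on") is an assumption made for the example; the mapping actually used by get_request_parameters may differ.

```python
from dataclasses import dataclass, fields
from typing import Dict


@dataclass
class SearchFlags:
    """Hypothetical stand-in for NistSearchParameters with a few flags."""
    use_SI: bool = True
    cTG: bool = False
    cIR: bool = False
    cMS: bool = False

    def to_request_parameters(self) -> Dict[str, str]:
        # Assumption: units travel as "Units=SI" or "Units=CAL" and every
        # enabled data flag as "<name>=on"; the real encoding may differ.
        params = {"Units": "SI" if self.use_SI else "CAL"}
        for field in fields(self):
            if field.name != "use_SI" and getattr(self, field.name):
                params[field.name] = "on"
        return params


flags = SearchFlags(cTG=True, cIR=True)
print(flags.to_request_parameters())
```

Only the enabled flags end up in the dictionary, which keeps the query string short.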

nistchempy.get_search_parameters() Dict[str, str]

Returns search parameters and the corresponding keys

Returns:

{short_key => search_parameter}

Return type:

_tp.Dict[str, str]

nistchempy.print_search_parameters() None

Prints available search parameters

class nistchempy.RequestConfig(delay: float = 0.0, max_attempts: int | None = 1, kwargs: dict = <factory>)

Bases: object

Contains parameters used by make_nist_request function

Attributes:

delay (float): time delay in seconds after getting response from NIST

max_attempts (_tp.Optional[int]): if > 1, enables reattempting of getting response in case of request errors or non-OK response

kwargs (dict): kwargs for requests.get inside of make_nist_request

delay: float = 0.0
max_attempts: int | None = 1
kwargs: dict
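The delay and max_attempts semantics described above can be pictured with a small retry loop. This is a sketch under stated assumptions, not the package's implementation: fetch_with_retries is a hypothetical helper, fetch stands for any zero-argument callable returning an object with a boolean ok attribute, and treating max_attempts=None as a single attempt is a guess.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(fetch: Callable[[], object],
                       delay: float = 0.0,
                       max_attempts: Optional[int] = 1):
    """Call `fetch` until it returns an OK response or attempts run out.

    Hypothetical helper: retries on exceptions and on non-OK responses,
    and sleeps `delay` seconds after every attempt.
    """
    # Assumption: None or any value below 2 means "try once".
    attempts = max_attempts if max_attempts and max_attempts > 1 else 1
    response, last_error = None, None
    for _ in range(attempts):
        try:
            response = fetch()
            last_error = None
        except Exception as err:  # request error: remember it and retry
            response, last_error = None, err
        time.sleep(delay)
        if response is not None and getattr(response, "ok", False):
            return response
    if last_error is not None:
        raise last_error
    return response  # non-OK response from the final attempt
```

Returning the last non-OK response instead of raising lets the caller inspect the status itself.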
nistchempy.get_crawl_delay(useragent: str = '*', config: RequestConfig | None = None) float

Returns NIST Chemistry WebBook’s crawl delay for the given user agent

Parameters:

useragent (str) – user agent

Returns:

crawl delay in seconds

Return type:

float
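How the package obtains the crawl delay is not described here. One plausible approach is to read the Crawl-delay directive from a robots.txt file with the standard library, as sketched below; crawl_delay_from_robots is a hypothetical helper and the sample robots.txt text is made up for the example.

```python
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(lines: Iterable[str],
                            useragent: str = "*") -> Optional[float]:
    """Return the Crawl-delay for `useragent`, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(lines)                      # parse already-downloaded text
    delay = parser.crawl_delay(useragent)    # None when no directive applies
    return None if delay is None else float(delay)


# Made-up robots.txt content, used only for illustration
sample = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
""".splitlines()

print(crawl_delay_from_robots(sample))
```

The returned value could then be passed as the delay of a RequestConfig so that repeated requests respect the server's limit.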

nistchempy.compound

The module contains compound-related functionality

nistchempy.compound.SPEC_TYPES

dictionary containing abbreviations for spectra types used in compound page (keys) or urls for downloading JDX-files (values)

Type:

dict

nistchempy.compound.urlparse(url, scheme='', allow_fragments=True)

Parse a URL into 6 components: <scheme>://<netloc>/<path>;<params>?<query>#<fragment>

The result is a named 6-tuple with fields corresponding to the above. It is either a ParseResult or ParseResultBytes object, depending on the type of the url parameter.

The username, password, hostname, and port sub-components of netloc can also be accessed as attributes of the returned object.

The scheme argument provides the default value of the scheme component when no scheme is found in url.

If allow_fragments is False, no attempt is made to separate the fragment component from the previous component, which can be either path or query.

Note that % escapes are not expanded.

nistchempy.compound.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query given as a string argument.

Arguments:

qs: percent-encoded query string to be parsed

keep_blank_values: flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.

strict_parsing: flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.

encoding and errors: specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.

max_num_fields: int. If set, then throws a ValueError if there are more than n fields read by parse_qsl().

separator: str. The symbol to use for separating the query arguments. Defaults to &.

Returns a dictionary.
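urlparse and parse_qs come from the standard urllib.parse module. A short example shows how they split a WebBook-style URL into its parts; the URL and its query values are illustrative only.

```python
from urllib.parse import parse_qs, urlparse

# Illustrative URL; the query values are example data
url = "https://webbook.nist.gov/cgi/cbook.cgi?ID=C71432&Units=SI&Mask=80#IR-Spec"

parts = urlparse(url)          # named tuple: scheme, netloc, path, ...
query = parse_qs(parts.query)  # each key maps to a list of values

print(parts.netloc)    # host part of the URL
print(parts.path)      # path without the query string
print(parts.fragment)  # text after '#'
print(query["ID"][0])  # first (and only) value of the ID parameter
```

Note that parse_qs returns lists because a key may appear more than once in a query string.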

class nistchempy.compound.Spectrum(compound: NistCompound, spec_type: str, spec_idx: str, jdx_text: str)

Bases: object

Wrapper for IR, THz IR, MS, and UV-Vis spectra extracted from NIST Chemistry WebBook

compound

parent NistCompound object

Type:

NistCompound

spec_type

IR / TZ (THz IR) / MS / UV (UV-Vis)

Type:

str

spec_idx

index of the spectrum

Type:

str

jdx_text

text block of the corresponding JDX-file

Type:

str

compound: NistCompound
spec_type: str
spec_idx: str
jdx_text: str
save(name: str = None, path_dir: str = None) None

Saves spectrum in JDX format

name

custom filename (default name is formed from compound ID, spectrum type and index)

Type:

str

path_dir

directory where output file will be saved

Type:

str
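The save behaviour described above (optional custom name, optional target directory, default name built from compound ID, spectrum type and index) can be sketched as follows. save_jdx is a hypothetical helper, and the "<ID>_<type>_<index>.jdx" pattern is an assumption; the exact default filename used by the package is not stated here.

```python
import os
import tempfile
from typing import Optional


def save_jdx(compound_id: str, spec_type: str, spec_idx: str, jdx_text: str,
             name: Optional[str] = None, path_dir: Optional[str] = None) -> str:
    """Write a JDX text block to disk and return the file path.

    Hypothetical helper; the default filename pattern is an assumption.
    """
    if name is None:
        name = f"{compound_id}_{spec_type}_{spec_idx}.jdx"
    path_dir = path_dir or os.getcwd()       # fall back to the working directory
    os.makedirs(path_dir, exist_ok=True)     # create the folder if it is missing
    path = os.path.join(path_dir, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(jdx_text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_jdx("C71432", "IR", "0", "##TITLE=BENZENE\n##END=\n", path_dir=tmp)
    print(os.path.basename(saved))
```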

class nistchempy.compound.Chromatogram(compound: NistCompound, ri_type: str, column_type: str, temp_regime: str, data: DataFrame)

Bases: object

Wrapper for chromatography data extracted from NIST Chemistry WebBook

compound

parent NistCompound object

Type:

NistCompound

ri_type

type of retention index: Kovats, van den Dool & Kratz, etc.

Type:

str

column_type

polar / non-polar

Type:

str

temp_regime

temperature regime: isothermal / ramp / custom

Type:

str

data

experimental data

Type:

_pd.core.frame.DataFrame

compound: NistCompound
ri_type: str
column_type: str
temp_regime: str
data: DataFrame
save(name: str = None, path_dir: str = None, **kwargs) None

Saves chromatograms in CSV format

name

custom filename (default name is formed from compound ID, spectrum type and index)

Type:

str

path_dir

directory where output file will be saved

Type:

str

kwargs

parameters for pandas DataFrame to_csv method

class nistchempy.compound.NistCompound(_request_config: RequestConfig, _nist_response: NistResponse, ID: str | None, name: str | None, synonyms: List[str], formula: str | None, mol_weight: float | None, inchi: str | None, inchi_key: str | None, cas_rn: str | None, mol_refs: Dict[str, str], data_refs: Dict[str, str], nist_public_refs: Dict[str, str], nist_subscription_refs: Dict[str, str])

Bases: object

Stores info on NIST Chemistry WebBook compound

_request_config

additional requests.get parameters

Type:

_ncpr.RequestConfig

_nist_response

response to the GET request

Type:

_ncpr.NistResponse

ID

NIST compound ID

Type:

_tp.Optional[str]

name

chemical name

Type:

_tp.Optional[str]

synonyms

synonyms of the chemical name

Type:

_tp.List[str]

formula

chemical formula

Type:

_tp.Optional[str]

mol_weight

molecular weight, g/mol

Type:

_tp.Optional[float]

inchi

InChI string

Type:

_tp.Optional[str]

inchi_key

InChI key string

Type:

_tp.Optional[str]

cas_rn

CAS registry number

Type:

_tp.Optional[str]

mol_refs

references to 2D and 3D MOL-files

Type:

_tp.Dict[str, str]

data_refs

references to the webpages containing physico-chemical data for the given compound

Type:

_tp.Dict[str, str]

nist_public_refs

references to webpages of other public NIST databases containing data for the given compound

Type:

_tp.Dict[str, str]

nist_subscription_refs

references to webpages of subscription NIST databases containing data for the given compound

Type:

_tp.Dict[str, str]

mol2D

text block of a MOL-file containing 2D atomic coordinates

Type:

_tp.Optional[str]

mol3D

text block of a MOL-file containing 3D atomic coordinates

Type:

_tp.Optional[str]

ir_specs

list of IR Spectrum objects

Type:

_tp.List[Spectrum]

thz_specs

list of THz Spectrum objects

Type:

_tp.List[Spectrum]

ms_specs

list of MS Spectrum objects

Type:

_tp.List[Spectrum]

uv_specs

list of UV-Vis Spectrum objects

Type:

_tp.List[Spectrum]

gas_chromat

list of Chromatogram objects

Type:

_tp.List[Chromatogram]

ID: str | None
name: str | None
synonyms: List[str]
formula: str | None
mol_weight: float | None
inchi: str | None
inchi_key: str | None
cas_rn: str | None
mol_refs: Dict[str, str]
data_refs: Dict[str, str]
nist_public_refs: Dict[str, str]
nist_subscription_refs: Dict[str, str]
mol2D: str | None
mol3D: str | None
ir_specs: List[Spectrum]
thz_specs: List[Spectrum]
ms_specs: List[Spectrum]
uv_specs: List[Spectrum]
gas_chromat: List[Chromatogram]
get_molfile(dim: int) None

Loads text block of 2D / 3D molfile

Parameters:

dim (int) – dimensionality of molfile (2D / 3D)

get_mol2D() None

Loads text block of 2D molfile

get_mol3D() None

Loads text block of 3D molfile

get_molfiles() None

Loads text block of all available molfiles

get_spectrum(spec_type: str, spec_idx: str) Spectrum

Loads spectrum of given type (IR / TZ / MS / UV) and index

Parameters:
  • spec_type (str) – spectrum type [ IR / TZ / MS / UV ]

  • spec_idx (str) – spectrum index

Returns:

wrapper for the text block of JDX-formatted spectrum

Return type:

Spectrum

get_spectra(spec_type: str) None

Loads all available spectra of given type (IR / TZ / MS / UV)

Parameters:

spec_type (str) – spectrum type [ IR / TZ / MS / UV ]

get_ir_spectra() None

Loads all available IR spectra

get_thz_spectra() None

Loads all available THz spectra

get_ms_spectra() None

Loads all available MS spectra

get_uv_spectra() None

Loads all available UV-Vis spectra

get_all_spectra() None

Loads all available spectra

save_spectra(spec_type: str, path_dir: str = './') None

Saves all spectra of given type to the specified folder

Parameters:
  • spec_type (str) – spectrum type [ IR / TZ / MS / UV ]

  • path_dir (str) – directory to save spectra

save_ir_spectra(path_dir: str = './') None

Saves IR spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_thz_spectra(path_dir: str = './') None

Saves THz spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_ms_spectra(path_dir: str = './') None

Saves mass spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_uv_spectra(path_dir: str = './') None

Saves all UV-Vis spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_all_spectra(path_dir: str = './') None

Saves all available spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra
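The loading and saving methods above are meant to be used in pairs: a get_* call fills the corresponding list on the compound, and a save_* call writes its contents to disk. The helper below is a minimal sketch of that workflow for IR spectra; fetch_and_save_ir is a hypothetical name, it relies only on get_ir_spectra, save_ir_spectra and ir_specs as documented, and it assumes that get_ir_spectra populates ir_specs.

```python
from typing import Any


def fetch_and_save_ir(compound: Any, path_dir: str = "./") -> int:
    """Load all IR spectra of a compound, save them, return how many.

    `compound` is expected to behave like NistCompound; the helper
    itself is hypothetical and not part of nistchempy.
    """
    compound.get_ir_spectra()              # assumed to fill compound.ir_specs
    if compound.ir_specs:                  # skip saving when nothing was found
        compound.save_ir_spectra(path_dir)
    return len(compound.ir_specs)


# Intended use (requires nistchempy and network access):
#   import nistchempy
#   compound = nistchempy.get_compound("C71432")
#   if compound is not None:
#       fetch_and_save_ir(compound, "./spectra")
```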

get_gas_chromatography() None

Loads info on gas chromatography

save_gas_chromatography(path_dir: str = './', **kwargs) None

Saves all tables with data on gas chromatography experiments

Parameters:

path_dir (str) – directory to save the tables

nistchempy.compound.compound_from_response(nr: NistResponse, request_config: RequestConfig | None = None) NistCompound | None

Initializes NistCompound object from the corresponding response

Parameters:
  • nr (_ncpr.NistResponse) – response to the GET request for a compound

  • request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters

Returns:

NistCompound object, and None if there are several compounds corresponding to the given ID

Return type:

_tp.Optional[NistCompound]

nistchempy.compound.get_compound(ID: str, request_config: RequestConfig | None = None) NistCompound | None

Loads the main info on the given NIST compound

Parameters:
  • ID (str) – NIST compound ID, CAS RN or InChI

  • request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters

Returns:

NistCompound object, and None if there are several compounds corresponding to the given ID

Return type:

_tp.Optional[NistCompound]

nistchempy.compound_list

Loads pre-prepared info on compounds structure and data availability

nistchempy.compound_list.get_all_data() DataFrame

Returns pandas dataframe containing info on all NIST Chem WebBook compounds

Returns:

dataframe containing pre-extracted compound info

Return type:

_pd.core.frame.DataFrame

nistchempy.requests

Request wrappers for NIST Chemistry WebBook APIs

nistchempy.requests.BASE_URL

base URL of the NIST Chemistry WebBook database

Type:

str

nistchempy.requests.SEARCH_URL

relative URL for the search API

Type:

str

nistchempy.requests.INCHI_URL

relative URL for obtaining NIST compounds via InChI

Type:

str

class nistchempy.requests.RequestConfig(delay: float = 0.0, max_attempts: int | None = 1, kwargs: dict = <factory>)

Bases: object

Contains parameters used by make_nist_request function

Attributes:

delay (float): time delay in seconds after getting response from NIST

max_attempts (_tp.Optional[int]): if > 1, enables reattempting of getting response in case of request errors or non-OK response

kwargs (dict): kwargs for requests.get inside of make_nist_request

delay: float = 0.0
max_attempts: int | None = 1
kwargs: dict
nistchempy.requests.fix_html(html: str) str

Fixes detected typos in html code of NIST Chem WebBook web pages

Parameters:

html (str) – text of html-file

Returns:

fixed html-file

Return type:

str

class nistchempy.requests.NistResponse(response: Response)

Bases: object

Describes response to the GET request to the NIST Chemistry WebBook

response

request’s response

Type:

_requests.models.Response

ok

True if request’s status code is less than 400

Type:

bool

content_type

content type of the response

Type:

_tp.Optional[str]

text

text of the response

Type:

_tp.Optional[str]

soup

BeautifulSoup object of the html response

Type:

_tp.Optional[_bs4.BeautifulSoup]

response: Response
ok: bool
content_type: str | None
text: str | None
soup: BeautifulSoup | None = None
nistchempy.requests.make_nist_request(url: str, params: dict = {}, config: RequestConfig | None = None) NistResponse

Dummy request to the NIST Chemistry WebBook

Parameters:
  • url (str) – URL of the NIST webpage

  • params (dict) – GET request parameters

  • config (_tp.Optional[RequestConfig]) – additional requests.get parameters

Returns:

wrapper for the request’s response

Return type:

NistResponse

nistchempy.parsing

The module contains parsing-related functionality

nistchempy.parsing.is_compound_page(soup: BeautifulSoup) bool

Checks if html is a single compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

True for a single compound page

Return type:

bool

nistchempy.parsing.get_found_compounds(soup: BeautifulSoup) dict

Extracts IDs of found compounds for NIST Chemistry WebBook search

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

extracted NIST search parameters

Return type:

dict

nistchempy.parsing.parse_compound_page(soup: BeautifulSoup) dict | None

Parses NIST compound webpage and returns dictionary with extracted info

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

dictionary with extracted info and None if webpage does not correspond to single compound

Return type:

_tp.Optional[dict]

nistchempy.parsing.get_chromatography_table_refs(soup: BeautifulSoup) List[str]

Extracts references to large format tables containing info on chromatographic experiments

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

list of URLs

Return type:

_tp.List[str]

nistchempy.parsing.parse_chromatography_table(soup: BeautifulSoup) dict

Parses a large format table containing info on chromatographic experiments

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

contains info to initialize nistchempy.compound.Chromatogram

Return type:

dict

nistchempy.parsing.compound

The module contains functionality to parse basic compound properties

nistchempy.parsing.compound.get_found_compounds(soup: BeautifulSoup) dict

Extracts IDs of found compounds for NIST Chemistry WebBook search

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

extracted NIST search parameters

Return type:

dict

nistchempy.parsing.compound.is_compound_page(soup: BeautifulSoup) bool

Checks if html is a single compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

True for a single compound page

Return type:

bool

nistchempy.parsing.compound.get_compound_id_from_comment(soup: BeautifulSoup) str | None

Extracts compound ID from commented field in Notes section

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

NIST compound ID, None if not detected

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_id_from_units_switch(soup: BeautifulSoup) str | None

Extracts compound ID from url to switch energy units

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

NIST compound ID, None if not detected

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_id_from_data_refs(soup: BeautifulSoup) str | None

Extracts compound ID from urls to compound data

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

NIST compound ID, None if not detected

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_id(soup: BeautifulSoup) str | None

Checks if html is a single compound page and returns NIST compound ID if so

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

NIST compound ID for single compound webpage and None otherwise

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_name(soup: BeautifulSoup) str

Extracts chemical name from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

chemical name of a NIST compound

Return type:

str

nistchempy.parsing.compound.get_compound_synonyms(soup: BeautifulSoup) List[str]

Extracts synonyms of chemical name from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

list of alternative chemical names

Return type:

_tp.List[str]

nistchempy.parsing.compound.get_compound_formula(soup: BeautifulSoup) str | None

Extracts chemical formula from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

chemical formula, and None if not found

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_mol_weight(soup: BeautifulSoup) float | None

Extracts molecular weight from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

molecular weight, and None if not found

Return type:

_tp.Optional[float]

nistchempy.parsing.compound.get_compound_inchi(soup: BeautifulSoup) str | None

Extracts InChI from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

InChI string, and None if not found

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_inchi_key(soup: BeautifulSoup) str | None

Extracts InChI key from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

InChI key string, and None if not found

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_casrn(soup: BeautifulSoup) str | None

Extracts CAS registry number from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

CAS RN, and None if not found

Return type:

_tp.Optional[str]

nistchempy.parsing.compound.get_compound_mol_refs(soup: BeautifulSoup) Dict[str, str]

Extracts dictionary of URLs for compound MOL-files from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

mol2D / mol3D are keys, URLs are values

Return type:

_tp.Dict[str, str]

nistchempy.parsing.compound.get_compound_data_refs(soup: BeautifulSoup) Dict[str, str]

Extracts dictionary of URLs for compound properties from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

property names are keys, URLs are values

Return type:

_tp.Dict[str, str]

nistchempy.parsing.compound.get_compound_nist_public_refs(soup: BeautifulSoup) Dict[str, str]

Extracts dictionary of URLs for compound properties stored at other public NIST sites from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

property names are keys, URLs are values

Return type:

_tp.Dict[str, str]

nistchempy.parsing.compound.get_compound_nist_subscription_refs(soup: BeautifulSoup) Dict[str, str]

Extracts dictionary of URLs for compound properties stored at other subscription NIST sites from compound page

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

property names are keys, URLs are values

Return type:

_tp.Dict[str, str]

nistchempy.parsing.compound.parse_compound_page(soup: BeautifulSoup) dict | None

Parses NIST compound webpage and returns dictionary with extracted info

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

dictionary with extracted info and None if webpage does not correspond to single compound

Return type:

_tp.Optional[dict]

nistchempy.parsing.gas_chromatography

The module contains functionality to parse gas chromatography info

nistchempy.parsing.gas_chromatography.get_chromatography_table_refs(soup: BeautifulSoup) List[str]

Extracts references to large format tables containing info on chromatographic experiments

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

list of URLs

Return type:

_tp.List[str]

nistchempy.parsing.gas_chromatography.get_literature_references(soup: BeautifulSoup) Dict[str, str]

Extracts literature references from the corresponding section

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

ref’s span id => full reference text

Return type:

_tp.Dict[str, str]

nistchempy.parsing.gas_chromatography.parse_chromatography_table(soup: BeautifulSoup) dict

Parses a large format table containing info on chromatographic experiments

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

contains info to initialize nistchempy.compound.Chromatogram

Return type:

dict

nistchempy.utils

Utility functions

nistchempy.utils.get_crawl_delay(useragent: str = '*', config: RequestConfig | None = None) float

Returns NIST Chemistry WebBook’s crawl delay for the given user agent

Parameters:

useragent (str) – user agent

Returns:

crawl delay in seconds

Return type:

float