NistChemPy API
nistchempy
This package is a Python interface for the NIST Chemistry WebBook database that provides additional data for efficient compound search and automatic retrieval of the stored physico-chemical data.
- nistchempy.get_all_data() DataFrame
Returns pandas dataframe containing info on all NIST Chem WebBook compounds
- Returns:
dataframe containing pre-extracted compound info
- Return type:
_pd.core.frame.DataFrame
- nistchempy.get_compound(ID: str, request_config: RequestConfig | None = None) NistCompound | None
Loads the main info on the given NIST compound
- Parameters:
ID (str) – NIST compound ID, CAS RN or InChI
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
- Returns:
NistCompound object, and None if there are several compounds corresponding to the given ID
- Return type:
_tp.Optional[NistCompound]
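A minimal usage sketch for get_compound, assuming the nistchempy package is installed and the WebBook is reachable. Only get_compound and the attributes name, formula and mol_weight come from this reference; the helpers summarize_compound and fetch_summary, the example ID and the placeholder values are illustrative assumptions.

```python
from types import SimpleNamespace


def summarize_compound(compound):
    """Format name, formula and molecular weight of a NistCompound-like object."""
    if compound is None:
        # get_compound returns None when the ID matches several compounds
        return "no unique compound found"
    return f"{compound.name} ({compound.formula}), M = {compound.mol_weight}"


def fetch_summary(identifier):
    """Query the WebBook; needs the nistchempy package and network access."""
    import nistchempy as nist  # imported lazily so the helper above works offline
    return summarize_compound(nist.get_compound(identifier))


# Offline demonstration with a stand-in object (values are placeholders)
demo = SimpleNamespace(name="Benzene", formula="C6H6", mol_weight=78.11)
print(summarize_compound(demo))  # Benzene (C6H6), M = 78.11

# Live call (not executed here), e.g.: fetch_summary("C71432")
```

Checking for None before touching any attribute matters because an ambiguous ID does not raise an error.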
- nistchempy.run_search(identifier: str, search_type: str, search_parameters: NistSearchParameters | None = None, request_config: RequestConfig | None = None, use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False) NistSearch
Searches compounds in NIST Chemistry WebBook
- Parameters:
identifier (str) – NIST compound ID / formula / name / inchi / CAS RN
search_type (str) – identifier type, available options are: - ‘formula’ - ‘name’ - ‘inchi’ - ‘cas’ - ‘id’
search_parameters (_tp.Optional[NistSearchParameters]) – search parameters; if provided, the following search parameter arguments are ignored
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
use_SI (bool) – if True, returns results in SI units; otherwise calories are used
match_isotopes (bool) – if True, exactly matches the specified isotopes (formula search only)
allow_other (bool) – if True, allows elements not specified in formula (formula search only)
allow_extra (bool) – if True, allows more atoms of elements in formula than specified (formula search only)
no_ion (bool) – if True, excludes ions from the search (formula search only)
cTG (bool) – if True, returns entries containing gas-phase thermodynamic data
cTC (bool) – if True, returns entries containing condensed-phase thermodynamic data
cTP (bool) – if True, returns entries containing phase-change thermodynamic data
cTR (bool) – if True, returns entries containing reaction thermodynamic data
cIE (bool) – if True, returns entries containing ion energetics thermodynamic data
cIC (bool) – if True, returns entries containing ion cluster thermodynamic data
cIR (bool) – if True, returns entries containing IR data
cTZ (bool) – if True, returns entries containing THz IR data
cMS (bool) – if True, returns entries containing MS data
cUV (bool) – if True, returns entries containing UV/Vis data
cGC (bool) – if True, returns entries containing gas chromatography data
cES (bool) – if True, returns entries containing vibrational and electronic energy levels
cDI (bool) – if True, returns entries containing constants of diatomic molecules
cSO (bool) – if True, returns entries containing info on Henry’s law
- Returns:
search object containing info on found compounds
- Return type:
NistSearch
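The call signature above can be exercised roughly as follows. This is a sketch, assuming nistchempy is installed for the live part; build_search_call and search_webbook are hypothetical helpers, and only run_search, its keyword names and the five search_type options are taken from this reference.

```python
SEARCH_TYPES = ("formula", "name", "inchi", "cas", "id")  # options listed above


def build_search_call(identifier, search_type, **flags):
    """Validate search_type and collect keyword arguments for run_search."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unknown search_type: {search_type!r}")
    return {"identifier": identifier, "search_type": search_type, **flags}


def search_webbook(identifier, search_type, **flags):
    """Run the search; needs the nistchempy package and network access."""
    import nistchempy as nist  # lazy import keeps build_search_call usable offline
    return nist.run_search(**build_search_call(identifier, search_type, **flags))


# Formula search for C6H6 that also allows other elements and requires IR data
kwargs = build_search_call("C6H6", "formula", allow_other=True, cIR=True)
print(kwargs)

# Live call (not executed here): search_webbook("C6H6", "formula", cIR=True)
```

Note that, per the parameter description, the individual flags are ignored when a NistSearchParameters object is passed as search_parameters.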
- class nistchempy.NistSearchParameters(use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False)
Bases:
object
GET parameters for compound search of NIST Chemistry WebBook
- use_SI
if True, returns results in SI units; otherwise calories are used
- Type:
bool
- match_isotopes
if True, exactly matches the specified isotopes (formula search only)
- Type:
bool
- allow_other
if True, allows elements not specified in formula (formula search only)
- Type:
bool
- allow_extra
if True, allows more atoms of elements in formula than specified (formula search only)
- Type:
bool
- no_ion
if True, excludes ions from the search (formula search only)
- Type:
bool
- cTG
if True, returns entries containing gas-phase thermodynamic data
- Type:
bool
- cTC
if True, returns entries containing condensed-phase thermodynamic data
- Type:
bool
- cTP
if True, returns entries containing phase-change thermodynamic data
- Type:
bool
- cTR
if True, returns entries containing reaction thermodynamic data
- Type:
bool
- cIE
if True, returns entries containing ion energetics thermodynamic data
- Type:
bool
- cIC
if True, returns entries containing ion cluster thermodynamic data
- Type:
bool
- cIR
if True, returns entries containing IR data
- Type:
bool
- cTZ
if True, returns entries containing THz IR data
- Type:
bool
- cMS
if True, returns entries containing MS data
- Type:
bool
- cUV
if True, returns entries containing UV/Vis data
- Type:
bool
- cGC
if True, returns entries containing gas chromatography data
- Type:
bool
- cES
if True, returns entries containing vibrational and electronic energy levels
- Type:
bool
- cDI
if True, returns entries containing constants of diatomic molecules
- Type:
bool
- cSO
if True, returns entries containing info on Henry’s law
- Type:
bool
- use_SI: bool = True
- match_isotopes: bool = False
- allow_other: bool = False
- allow_extra: bool = False
- no_ion: bool = False
- cTG: bool = False
- cTC: bool = False
- cTP: bool = False
- cTR: bool = False
- cIE: bool = False
- cIC: bool = False
- cIR: bool = False
- cTZ: bool = False
- cMS: bool = False
- cUV: bool = False
- cGC: bool = False
- cES: bool = False
- cDI: bool = False
- cSO: bool = False
- get_request_parameters() dict
Returns dictionary containing GET parameters
- Returns:
dictionary of GET parameters relevant to the search
- Return type:
dict
- nistchempy.get_search_parameters() Dict[str, str]
Returns search parameters and the corresponding keys
- Returns:
{short_key => search_parameter}
- Return type:
_tp.Dict[str, str]
- nistchempy.print_search_parameters() None
Prints available search parameters
- class nistchempy.RequestConfig(delay: float = 0.0, max_attempts: int | None = 1, kwargs: dict = <factory>)
Bases:
object
Contains parameters used by make_nist_request function
- Attributes:
delay (float): time delay in seconds after getting response from NIST
max_attempts (_tp.Optional[int]): if > 1, retries the request after request errors or non-OK responses
kwargs (dict): kwargs for requests.get inside of make_nist_request
- delay: float = 0.0
- max_attempts: int | None = 1
- kwargs: dict
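To make the meaning of delay and max_attempts concrete, here is a small offline sketch of the retry behaviour described above. It is not the library's implementation: RetrySettings and fetch_with_retries are hypothetical stand-ins, and the handling of non-OK responses is left out for brevity.

```python
import time
from dataclasses import dataclass


@dataclass
class RetrySettings:
    """Hypothetical stand-in mirroring the delay and max_attempts fields."""
    delay: float = 0.0
    max_attempts: int = 1


def fetch_with_retries(fetch, settings):
    """Call fetch() until it succeeds or the attempts are used up."""
    last_error = None
    for _ in range(max(1, settings.max_attempts)):
        try:
            result = fetch()
        except Exception as exc:  # a request error: try again if attempts remain
            last_error = exc
            continue
        time.sleep(settings.delay)  # pause after a response was received
        return result
    raise last_error


calls = []


def flaky():
    """Fails on the first call, succeeds on the second."""
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("temporary failure")
    return "ok"


print(fetch_with_retries(flaky, RetrySettings(delay=0.0, max_attempts=3)))  # ok
```

With the real class, the equivalent settings would presumably be passed as RequestConfig(delay=..., max_attempts=...), with any extra requests.get arguments placed in kwargs.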
- nistchempy.get_crawl_delay(useragent: str = '*', config: RequestConfig | None = None) float
Returns NIST Chemistry WebBook’s crawl delay for the given user agent
- Parameters:
useragent (str) – user agent
- Returns:
crawl delay in seconds
- Return type:
float
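A crawl delay is conventionally published in a site's robots.txt file. The sketch below shows how such a value can be read with the standard library, assuming that is the source of the number; the robots.txt text is made up, crawl_delay_from_text is a hypothetical helper, and this is not necessarily how get_crawl_delay works internally.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; the real WebBook file may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_delay_from_text(robots_txt, useragent="*"):
    """Return the Crawl-delay (seconds) for useragent, or 0.0 if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(useragent)
    return float(delay) if delay is not None else 0.0


print(crawl_delay_from_text(SAMPLE_ROBOTS_TXT))  # 5.0
```

The returned number is a natural candidate for the delay field of RequestConfig when many requests are sent in a row.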
nistchempy.compound
The module contains compound-related functionality
- nistchempy.compound.SPEC_TYPES
dictionary containing abbreviations for spectra types used in compound page (keys) or urls for downloading JDX-files (values)
- Type:
dict
- nistchempy.compound.urlparse(url, scheme='', allow_fragments=True)
Parse a URL into 6 components: <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
The result is a named 6-tuple with fields corresponding to the above. It is either a ParseResult or ParseResultBytes object, depending on the type of the url parameter.
The username, password, hostname, and port sub-components of netloc can also be accessed as attributes of the returned object.
The scheme argument provides the default value of the scheme component when no scheme is found in url.
If allow_fragments is False, no attempt is made to separate the fragment component from the previous component, which can be either path or query.
Note that % escapes are not expanded.
- nistchempy.compound.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
Parse a query given as a string argument.
- Parameters:
qs – percent-encoded query string to be parsed
keep_blank_values – flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.
strict_parsing – flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.
encoding, errors – specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.
max_num_fields (int) – if set, a ValueError is raised if there are more than max_num_fields fields read by parse_qsl().
separator (str) – the symbol to use for separating the query arguments. Defaults to &.
Returns a dictionary.
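The two functions above appear to be the standard-library urllib.parse helpers re-exported by the module, so the example imports them from urllib.parse directly. The URL is an illustrative one written in the style of a WebBook compound link, not a value taken from this reference.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative URL in the style of a WebBook compound link
url = "https://webbook.nist.gov/cgi/cbook.cgi?ID=C71432&Units=SI&Mask=80#IR-Spec"

parts = urlparse(url)          # split into scheme, netloc, path, query, fragment
query = parse_qs(parts.query)  # each key maps to a list of values

print(parts.netloc)    # webbook.nist.gov
print(parts.path)      # /cgi/cbook.cgi
print(parts.fragment)  # IR-Spec
print(query)           # {'ID': ['C71432'], 'Units': ['SI'], 'Mask': ['80']}
```

Because parse_qs always returns lists, a single value is read as query["ID"][0].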
- class nistchempy.compound.Spectrum(compound: NistCompound, spec_type: str, spec_idx: str, jdx_text: str)
Bases:
object
Wrapper for IR, THz IR, MS, and UV-Vis spectra extracted from NIST Chemistry WebBook
- compound
parent NistCompound object
- Type:
NistCompound
- spec_type
IR / TZ (THz IR) / MS / UV (UV-Vis)
- Type:
str
- spec_idx
index of the spectrum
- Type:
str
- jdx_text
text block of the corresponding JDX-file
- Type:
str
- compound: NistCompound
- spec_type: str
- spec_idx: str
- jdx_text: str
- save(name: str = None, path_dir: str = None) None
Saves spectrum in JDX format
- name
custom filename (default name is formed from compound ID, spectrum type and index)
- Type:
str
- path_dir
directory where output file will be saved
- Type:
str
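Since jdx_text holds the raw text of a JDX file, its header can be inspected without extra dependencies. The sketch below assumes the usual JCAMP-DX convention of "##LABEL=value" records; read_jdx_header is a hypothetical helper, the sample text is made up, and multi-line values are not handled.

```python
# Made-up JDX fragment used only for demonstration
SAMPLE_JDX = """\
##TITLE=Example compound
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XYDATA=(X++(Y..Y))
450.0 0.91 0.90
##END=
"""


def read_jdx_header(jdx_text):
    """Collect '##LABEL=value' records of a JDX text block into a dict."""
    header = {}
    for line in jdx_text.splitlines():
        if not line.startswith("##"):
            continue  # skip data rows and blank lines
        label, sep, value = line[2:].partition("=")
        if sep:
            header[label.strip()] = value.strip()
    return header


header = read_jdx_header(SAMPLE_JDX)
print(header["DATA TYPE"])  # INFRARED SPECTRUM
print(header["XUNITS"])     # 1/CM
```

With a real Spectrum object, the same helper would be called as read_jdx_header(spectrum.jdx_text).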
- class nistchempy.compound.Chromatogram(compound: NistCompound, ri_type: str, column_type: str, temp_regime: str, data: DataFrame)
Bases:
object
Wrapper for chromatography data extracted from NIST Chemistry WebBook
- compound
parent NistCompound object
- Type:
NistCompound
- ri_type
type of retention index: Kovats, van den Dool & Kratz, etc.
- Type:
str
- column_type
polar / non-polar
- Type:
str
- temp_regime
temperature regime: isothermal / ramp / custom
- Type:
str
- data
experimental data
- Type:
_pd.core.frame.DataFrame
- compound: NistCompound
- ri_type: str
- column_type: str
- temp_regime: str
- data: DataFrame
- save(name: str = None, path_dir: str = None, **kwargs) None
Saves chromatograms in CSV format
- name
custom filename (default name is formed from compound ID, spectrum type and index)
- Type:
str
- path_dir
directory where output file will be saved
- Type:
str
- kwargs
parameters for pandas DataFrame to_csv method
- class nistchempy.compound.NistCompound(_request_config: RequestConfig, _nist_response: NistResponse, ID: str | None, name: str | None, synonyms: List[str], formula: str | None, mol_weight: float | None, inchi: str | None, inchi_key: str | None, cas_rn: str | None, mol_refs: Dict[str, str], data_refs: Dict[str, str], nist_public_refs: Dict[str, str], nist_subscription_refs: Dict[str, str])
Bases:
object
Stores info on NIST Chemistry WebBook compound
- _request_config
additional requests.get parameters
- Type:
_ncpr.RequestConfig
- _nist_response
response to the GET request
- Type:
_ncpr.NistResponse
- ID
NIST compound ID
- Type:
_tp.Optional[str]
- name
chemical name
- Type:
_tp.Optional[str]
- synonyms
synonyms of the chemical name
- Type:
_tp.List[str]
- formula
chemical formula
- Type:
_tp.Optional[str]
- mol_weight
molecular weight, g/mol
- Type:
_tp.Optional[float]
- inchi
InChI string
- Type:
_tp.Optional[str]
- inchi_key
InChI key string
- Type:
_tp.Optional[str]
- cas_rn
CAS registry number
- Type:
_tp.Optional[str]
- mol_refs
references to 2D and 3D MOL-files
- Type:
_tp.Dict[str, str]
- data_refs
references to the webpages containing physical chemical data for the given compound
- Type:
_tp.Dict[str, str]
- nist_public_refs
references to webpages of other public NIST databases containing data for the given compound
- Type:
_tp.Dict[str, str]
- nist_subscription_refs
references to webpages of subscription NIST databases containing data for the given compound
- Type:
_tp.Dict[str, str]
- mol2D
text block of a MOL-file containing 2D atomic coordinates
- Type:
_tp.Optional[str]
- mol3D
text block of a MOL-file containing 3D atomic coordinates
- Type:
_tp.Optional[str]
- ir_specs
list of IR Spectrum objects
- Type:
_tp.List[Spectrum]
- thz_specs
list of THz Spectrum objects
- Type:
_tp.List[Spectrum]
- ms_specs
list of MS Spectrum objects
- Type:
_tp.List[Spectrum]
- uv_specs
list of UV-Vis Spectrum objects
- Type:
_tp.List[Spectrum]
- gas_chromat
list of Chromatogram objects
- Type:
_tp.List[Chromatogram]
- ID: str | None
- name: str | None
- synonyms: List[str]
- formula: str | None
- mol_weight: float | None
- inchi: str | None
- inchi_key: str | None
- cas_rn: str | None
- mol_refs: Dict[str, str]
- data_refs: Dict[str, str]
- nist_public_refs: Dict[str, str]
- nist_subscription_refs: Dict[str, str]
- mol2D: str | None
- mol3D: str | None
- ir_specs: List[Spectrum]
- thz_specs: List[Spectrum]
- ms_specs: List[Spectrum]
- uv_specs: List[Spectrum]
- gas_chromat: List[Chromatogram]
- get_molfile(dim: int) None
Loads text block of 2D / 3D molfile
- Parameters:
dim (int) – dimensionality of molfile (2D / 3D)
- get_mol2D() None
Loads text block of 2D molfile
- get_mol3D() None
Loads text block of 3D molfile
- get_molfiles() None
Loads text block of all available molfiles
- get_spectrum(spec_type: str, spec_idx: str) Spectrum
Loads spectrum of given type (IR / TZ / MS / UV) and index
- Parameters:
spec_type (str) – spectrum type [ IR / TZ / MS / UV ]
spec_idx (str) – spectrum index
- Returns:
wrapper for the text block of JDX-formatted spectrum
- Return type:
Spectrum
- get_spectra(spec_type: str) None
Loads all available spectra of given type (IR / TZ / MS / UV)
- Parameters:
spec_type (str) – spectrum type [ IR / TZ / MS / UV ]
- get_ir_spectra() None
Loads all available IR spectra
- get_thz_spectra() None
Loads all available THz spectra
- get_ms_spectra() None
Loads all available MS spectra
- get_uv_spectra() None
Loads all available UV-Vis spectra
- get_all_spectra() None
Loads all available spectra
- save_spectra(spec_type: str, path_dir: str = './') None
Saves all spectra of given type to the specified folder
- Parameters:
spec_type (str) – spectrum type [ IR / TZ / MS / UV ]
path_dir (str) – directory to save spectra
- save_ir_spectra(path_dir: str = './') None
Saves IR spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_thz_spectra(path_dir: str = './') None
Saves THz spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_ms_spectra(path_dir: str = './') None
Saves mass spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_uv_spectra(path_dir: str = './') None
Saves all UV-Vis spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_all_spectra(path_dir: str = './') None
Saves all available spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- get_gas_chromatography() None
Loads info on gas chromatography
- save_gas_chromatography(path_dir: str = './', **kwargs) None
Saves all tables with data on gas chromatography experiments
- Parameters:
path_dir (str) – directory to save spectra
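The load and save methods above are meant to be chained: data is first loaded into the compound object and then written to disk. A sketch of that flow for IR spectra follows; download_ir_spectra and FakeCompound are hypothetical, and the assumption that get_ir_spectra fills ir_specs follows from its None return type rather than from an explicit statement in this reference.

```python
def download_ir_spectra(compound, path_dir="./"):
    """Load IR spectra of a NistCompound-like object and save them to path_dir."""
    compound.get_ir_spectra()           # presumably fills compound.ir_specs
    compound.save_ir_spectra(path_dir)  # writes the loaded spectra as JDX files
    return len(compound.ir_specs)


class FakeCompound:
    """Offline stand-in exposing the same method names as NistCompound."""

    def __init__(self):
        self.ir_specs = []
        self.saved_to = None

    def get_ir_spectra(self):
        self.ir_specs = ["spectrum-0", "spectrum-1"]

    def save_ir_spectra(self, path_dir="./"):
        self.saved_to = path_dir


fake = FakeCompound()
print(download_ir_spectra(fake, "spectra/"))  # 2

# Live use (not executed here), assuming nistchempy is installed:
#   import nistchempy
#   download_ir_spectra(nistchempy.get_compound("C71432"), "./")
```

The same pattern applies to the THz, MS and UV-Vis variants and to get_gas_chromatography / save_gas_chromatography.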
- nistchempy.compound.compound_from_response(nr: NistResponse, request_config: RequestConfig | None = None) NistCompound | None
Initializes NistCompound object from the corresponding response
- Parameters:
nr (_ncpr.NistResponse) – response to the GET request for a compound
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
- Returns:
NistCompound object, and None if there are several compounds corresponding to the given ID
- Return type:
_tp.Optional[NistCompound]
- nistchempy.compound.get_compound(ID: str, request_config: RequestConfig | None = None) NistCompound | None
Loads the main info on the given NIST compound
- Parameters:
ID (str) – NIST compound ID, CAS RN or InChI
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
- Returns:
NistCompound object, and None if there are several compounds corresponding to the given ID
- Return type:
_tp.Optional[NistCompound]
nistchempy.search
The module contains search-related functionality
- nistchempy.search.get_search_parameters() Dict[str, str]
Returns search parameters and the corresponding keys
- Returns:
{short_key => search_parameter}
- Return type:
_tp.Dict[str, str]
- nistchempy.search.print_search_parameters() None
Prints available search parameters
- class nistchempy.search.NistSearchParameters(use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False)
Bases:
object
GET parameters for compound search of NIST Chemistry WebBook
- use_SI
if True, returns results in SI units; otherwise calories are used
- Type:
bool
- match_isotopes
if True, exactly matches the specified isotopes (formula search only)
- Type:
bool
- allow_other
if True, allows elements not specified in formula (formula search only)
- Type:
bool
- allow_extra
if True, allows more atoms of elements in formula than specified (formula search only)
- Type:
bool
- no_ion
if True, excludes ions from the search (formula search only)
- Type:
bool
- cTG
if True, returns entries containing gas-phase thermodynamic data
- Type:
bool
- cTC
if True, returns entries containing condensed-phase thermodynamic data
- Type:
bool
- cTP
if True, returns entries containing phase-change thermodynamic data
- Type:
bool
- cTR
if True, returns entries containing reaction thermodynamic data
- Type:
bool
- cIE
if True, returns entries containing ion energetics thermodynamic data
- Type:
bool
- cIC
if True, returns entries containing ion cluster thermodynamic data
- Type:
bool
- cIR
if True, returns entries containing IR data
- Type:
bool
- cTZ
if True, returns entries containing THz IR data
- Type:
bool
- cMS
if True, returns entries containing MS data
- Type:
bool
- cUV
if True, returns entries containing UV/Vis data
- Type:
bool
- cGC
if True, returns entries containing gas chromatography data
- Type:
bool
- cES
if True, returns entries containing vibrational and electronic energy levels
- Type:
bool
- cDI
if True, returns entries containing constants of diatomic molecules
- Type:
bool
- cSO
if True, returns entries containing info on Henry’s law
- Type:
bool
- use_SI: bool = True
- match_isotopes: bool = False
- allow_other: bool = False
- allow_extra: bool = False
- no_ion: bool = False
- cTG: bool = False
- cTC: bool = False
- cTP: bool = False
- cTR: bool = False
- cIE: bool = False
- cIC: bool = False
- cIR: bool = False
- cTZ: bool = False
- cMS: bool = False
- cUV: bool = False
- cGC: bool = False
- cES: bool = False
- cDI: bool = False
- cSO: bool = False
- get_request_parameters() dict
Returns dictionary containing GET parameters
- Returns:
dictionary of GET parameters relevant to the search
- Return type:
dict
- class nistchempy.search.NistSearch(_request_config: RequestConfig, _nist_response: NistResponse, search_parameters: NistSearchParameters, compound_ids: List[str], success: bool, lost: bool)
Bases:
object
Results of the compound search in NIST Chemistry WebBook
- _request_config
additional requests.get parameters
- Type:
_ncpr.RequestConfig
- _nist_response
NIST search response
- Type:
NistResponse
- search_parameters
used search parameters
- Type:
NistSearchParameters
- compound_ids
NIST IDs of found compounds
- Type:
_tp.List[str]
- compounds
NistCompound objects of found compounds
- Type:
_tp.List[_compound.NistCompound]
- success
True if search request was successful
- Type:
bool
- num_compounds
number of found compounds
- Type:
int
- lost
True if search returns fewer compounds than there are in the database
- Type:
bool
- search_parameters: NistSearchParameters
- compound_ids: List[str]
- compounds: List[NistCompound]
- success: bool
- num_compounds: int
- lost: bool
- load_found_compounds() None
Loads found compounds
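A search result is typically checked via success and lost before the compounds are loaded. The sketch below illustrates that order of operations; collect_compounds and FakeSearch are hypothetical, and it is assumed that load_found_compounds populates the compounds attribute.

```python
def collect_compounds(search):
    """Return the compounds of a NistSearch-like object, or [] on failure."""
    if not search.success:
        return []
    if search.lost:
        # Fewer hits were returned than exist in the database
        print("warning: result list is incomplete, consider a narrower query")
    search.load_found_compounds()  # assumed to populate search.compounds
    return list(search.compounds)


class FakeSearch:
    """Offline stand-in exposing the same attribute names as NistSearch."""

    def __init__(self, success, lost, compound_ids):
        self.success = success
        self.lost = lost
        self.compound_ids = compound_ids
        self.compounds = []

    def load_found_compounds(self):
        self.compounds = [f"compound:{cid}" for cid in self.compound_ids]


found = collect_compounds(FakeSearch(True, False, ["C71432", "C108883"]))
print(found)  # ['compound:C71432', 'compound:C108883']
```

Loading is kept as a separate step because each found compound requires its own request to the WebBook.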
- nistchempy.search.run_search(identifier: str, search_type: str, search_parameters: NistSearchParameters | None = None, request_config: RequestConfig | None = None, use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False) NistSearch
Searches compounds in NIST Chemistry WebBook
- Parameters:
identifier (str) – NIST compound ID / formula / name / inchi / CAS RN
search_type (str) – identifier type, available options are: - ‘formula’ - ‘name’ - ‘inchi’ - ‘cas’ - ‘id’
search_parameters (_tp.Optional[NistSearchParameters]) – search parameters; if provided, the following search parameter arguments are ignored
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
use_SI (bool) – if True, returns results in SI units; otherwise calories are used
match_isotopes (bool) – if True, exactly matches the specified isotopes (formula search only)
allow_other (bool) – if True, allows elements not specified in formula (formula search only)
allow_extra (bool) – if True, allows more atoms of elements in formula than specified (formula search only)
no_ion (bool) – if True, excludes ions from the search (formula search only)
cTG (bool) – if True, returns entries containing gas-phase thermodynamic data
cTC (bool) – if True, returns entries containing condensed-phase thermodynamic data
cTP (bool) – if True, returns entries containing phase-change thermodynamic data
cTR (bool) – if True, returns entries containing reaction thermodynamic data
cIE (bool) – if True, returns entries containing ion energetics thermodynamic data
cIC (bool) – if True, returns entries containing ion cluster thermodynamic data
cIR (bool) – if True, returns entries containing IR data
cTZ (bool) – if True, returns entries containing THz IR data
cMS (bool) – if True, returns entries containing MS data
cUV (bool) – if True, returns entries containing UV/Vis data
cGC (bool) – if True, returns entries containing gas chromatography data
cES (bool) – if True, returns entries containing vibrational and electronic energy levels
cDI (bool) – if True, returns entries containing constants of diatomic molecules
cSO (bool) – if True, returns entries containing info on Henry’s law
- Returns:
search object containing info on found compounds
- Return type:
NistSearch
nistchempy.compound_list
Loads pre-prepared info on compounds structure and data availability
- nistchempy.compound_list.get_all_data() DataFrame
Returns pandas dataframe containing info on all NIST Chem WebBook compounds
- Returns:
dataframe containing pre-extracted compound info
- Return type:
_pd.core.frame.DataFrame
nistchempy.requests
Request wrappers for NIST Chemistry WebBook APIs
- nistchempy.requests.BASE_URL
base URL of the NIST Chemistry WebBook database
- Type:
str
- nistchempy.requests.SEARCH_URL
relative URL for the search API
- Type:
str
- nistchempy.requests.INCHI_URL
relative URL for obtaining NIST compounds via InChI
- Type:
str
- class nistchempy.requests.RequestConfig(delay: float = 0.0, max_attempts: int | None = 1, kwargs: dict = <factory>)
Bases:
object
Contains parameters used by make_nist_request function
- Attributes:
delay (float): time delay in seconds after getting response from NIST
max_attempts (_tp.Optional[int]): if > 1, retries the request after request errors or non-OK responses
kwargs (dict): kwargs for requests.get inside of make_nist_request
- delay: float = 0.0
- max_attempts: int | None = 1
- kwargs: dict
- nistchempy.requests.fix_html(html: str) str
Fixes detected typos in html code of NIST Chem WebBook web pages
- Parameters:
html (str) – text of html-file
- Returns:
fixed html-file
- Return type:
str
- class nistchempy.requests.NistResponse(response: Response)
Bases:
object
Describes response to the GET request to the NIST Chemistry WebBook
- response
request’s response
- Type:
_requests.models.Response
- ok
True if request’s status code is less than 400
- Type:
bool
- content_type
content type of the response
- Type:
_tp.Optional[str]
- text
text of the response
- Type:
_tp.Optional[str]
- soup
BeautifulSoup object of the html response
- Type:
_tp.Optional[_bs4.BeautifulSoup]
- response: Response
- ok: bool
- content_type: str | None
- text: str | None
- soup: BeautifulSoup | None = None
- nistchempy.requests.make_nist_request(url: str, params: dict = {}, config: RequestConfig | None = None) NistResponse
Dummy request to the NIST Chemistry WebBook
- Parameters:
url (str) – URL of the NIST webpage
params (dict) – GET request parameters
config (_tp.Optional[RequestConfig]) – additional requests.get parameters
- Returns:
wrapper for the request’s response
- Return type:
NistResponse
nistchempy.parsing
The module contains parsing-related functionality
- nistchempy.parsing.is_compound_page(soup: BeautifulSoup) bool
Checks if html is a single compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
True for a single compound page
- Return type:
bool
- nistchempy.parsing.get_found_compounds(soup: BeautifulSoup) dict
Extracts IDs of found compounds for NIST Chemistry WebBook search
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
extracted NIST search parameters
- Return type:
dict
- nistchempy.parsing.parse_compound_page(soup: BeautifulSoup) dict | None
Parses Nist compound webpage and returns dictionary with extracted info
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
dictionary with extracted info and None if webpage does not correspond to single compound
- Return type:
_tp.Optional[dict]
- nistchempy.parsing.get_chromatography_table_refs(soup: BeautifulSoup) List[str]
Extracts references to large format tables containing info on chromatographic experiments
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
list of URLs
- Return type:
_tp.List[str]
- nistchempy.parsing.parse_chromatography_table(soup: BeautifulSoup) dict
Parses a large format table containing info on chromatographic experiments
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
contains info to initialize nistchempy.compound.Chromatogram
- Return type:
dict
nistchempy.parsing.compound
The module contains functionality to parse basic compound properties
- nistchempy.parsing.compound.get_found_compounds(soup: BeautifulSoup) dict
Extracts IDs of found compounds for NIST Chemistry WebBook search
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
extracted NIST search parameters
- Return type:
dict
- nistchempy.parsing.compound.is_compound_page(soup: BeautifulSoup) bool
Checks if html is a single compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
True for a single compound page
- Return type:
bool
- nistchempy.parsing.compound.get_compound_id_from_comment(soup: BeautifulSoup) str | None
Extracts compound ID from commented field in Notes section
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
NIST compound ID, None if not detected
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_id_from_units_switch(soup: BeautifulSoup) str | None
Extracts compound ID from url to switch energy units
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
NIST compound ID, None if not detected
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_id_from_data_refs(soup: BeautifulSoup) str | None
Extracts compound ID from urls to compound data
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
NIST compound ID, None if not detected
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_id(soup: BeautifulSoup) str | None
Checks if html is a single compound page and returns NIST compound ID if so
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
NIST compound ID for single compound webpage and None otherwise
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_name(soup: BeautifulSoup) str
Extracts chemical name from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
chemical name of a NIST compound
- Return type:
str
- nistchempy.parsing.compound.get_compound_synonyms(soup: BeautifulSoup) List[str]
Extracts synonyms of chemical name from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
list of alternative chemical names
- Return type:
_tp.List[str]
- nistchempy.parsing.compound.get_compound_formula(soup: BeautifulSoup) str | None
Extracts chemical formula from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
chemical formula, and None if not found
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_mol_weight(soup: BeautifulSoup) float | None
Extracts molecular weight from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
molecular weight, and None if not found
- Return type:
_tp.Optional[float]
- nistchempy.parsing.compound.get_compound_inchi(soup: BeautifulSoup) str | None
Extracts InChI from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
InChI string, and None if not found
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_inchi_key(soup: BeautifulSoup) str | None
Extracts InChI key from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
InChI key string, and None if not found
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_casrn(soup: BeautifulSoup) str | None
Extracts CAS registry number from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
CAS RN, and None if not found
- Return type:
_tp.Optional[str]
- nistchempy.parsing.compound.get_compound_mol_refs(soup: BeautifulSoup) Dict[str, str]
Extracts dictionary of URLs for compound MOL-files from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
mol2D / mol3D are keys, URLs are values
- Return type:
_tp.Dict[str, str]
- nistchempy.parsing.compound.get_compound_data_refs(soup: BeautifulSoup) Dict[str, str]
Extracts dictionary of URLs for compound properties from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
property names are keys, URLs are values
- Return type:
_tp.Dict[str, str]
- nistchempy.parsing.compound.get_compound_nist_public_refs(soup: BeautifulSoup) Dict[str, str]
Extracts dictionary of URLs for compound properties stored at other public NIST sites from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
property names are keys, URLs are values
- Return type:
_tp.Dict[str, str]
- nistchempy.parsing.compound.get_compound_nist_subscription_refs(soup: BeautifulSoup) Dict[str, str]
Extracts dictionary of URLs for compound properties stored at other subscription NIST sites from compound page
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
property names are keys, URLs are values
- Return type:
_tp.Dict[str, str]
- nistchempy.parsing.compound.parse_compound_page(soup: BeautifulSoup) dict | None
Parses Nist compound webpage and returns dictionary with extracted info
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
dictionary with extracted info and None if webpage does not correspond to single compound
- Return type:
_tp.Optional[dict]
nistchempy.parsing.gas_chromatography
The module contains functionality to parse gas chromatography info
- nistchempy.parsing.gas_chromatography.get_chromatography_table_refs(soup: BeautifulSoup) List[str]
Extracts references to large format tables containing info on chromatographic experiments
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
list of URLs
- Return type:
_tp.List[str]
- nistchempy.parsing.gas_chromatography.get_literature_references(soup: BeautifulSoup) Dict[str, str]
Extracts literature references from the corresponding section
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
ref’s span id => full reference text
- Return type:
_tp.Dict[str, str]
- nistchempy.parsing.gas_chromatography.parse_chromatography_table(soup: BeautifulSoup) dict
Parses a large format table containing info on chromatographic experiments
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
contains info to initialize nistchempy.compound.Chromatogram
- Return type:
dict
nistchempy.utils
Utility functions
- nistchempy.utils.get_crawl_delay(useragent: str = '*', config: RequestConfig | None = None) float
Returns NIST Chemistry WebBook’s crawl delay for the given user agent
- Parameters:
useragent (str) – user agent
- Returns:
crawl delay in seconds
- Return type:
float