Requests Configuration

Users who intend to download a large amount of data may want more control over the requests.get function, which nistchempy calls internally. This includes setting HTTP headers, configuring proxies, specifying timeouts for page loading, and adding a delay between requests to the NIST Chemistry WebBook website. The nistchempy.RequestConfig object provides this control.

The RequestConfig object includes three attributes:

  • delay: Specifies the time delay (in seconds) between requests to the website.

  • max_attempts: Specifies the maximum number of attempts to get a correct server response (the request is repeated if an exception is raised or a bad HTTP status code is returned).

  • kwargs: A dictionary of keyword arguments for the requests.get function, excluding params.

[1]:
import nistchempy as nist

cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
cfg
[1]:
RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
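
Because kwargs is forwarded to requests.get, it can also carry headers and proxies, not only a timeout. The snippet below is a minimal sketch of such a dictionary; the User-Agent string and the proxy address are hypothetical placeholders, and the nistchempy call is left commented out so that the snippet runs without the package installed.

```python
# Keyword arguments that requests.get understands; "params" is deliberately
# absent because nistchempy supplies it on its own.
request_kwargs = {
    "timeout": 30.0,
    # Hypothetical identification string; replace with your own contact details.
    "headers": {"User-Agent": "my-research-script/0.1 (contact: me@example.org)"},
    # Hypothetical proxy address; omit this key if no proxy is needed.
    "proxies": {"https": "http://proxy.example.org:8080"},
}

# With nistchempy installed, the dictionary is passed as shown in cell [1]:
# import nistchempy as nist
# cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs=request_kwargs)
```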

The default values are a delay of 0.0 seconds, a max_attempts of 1, and kwargs containing a timeout of 30 seconds. All relevant functions and objects use these defaults:

[2]:
nist.RequestConfig()
[2]:
RequestConfig(delay=0.0, max_attempts=1, kwargs={'timeout': 30.0})
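
To make the meaning of delay and max_attempts more concrete, here is a minimal sketch of a retry loop with the same two parameters. It is not nistchempy's actual implementation: the function fetch_with_retries, its fetch callable (which stands in for a request and returns an HTTP status code), and the choice to sleep before every attempt are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], int],
    delay: float = 0.0,
    max_attempts: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call fetch() up to max_attempts times, pausing delay seconds before each call.

    Returns the first 2xx status code, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        sleep(delay)  # pause before each request, including the first one
        try:
            status = fetch()
        except Exception:
            continue  # a raised exception counts as a failed attempt
        if 200 <= status < 300:
            return status  # good response, stop retrying
    return None


# Hypothetical stand-in for a flaky server: one timeout, one 503, then success.
outcomes = iter([TimeoutError("read timed out"), 503, 200])


def fake_fetch() -> int:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = fetch_with_retries(fake_fetch, delay=0.0, max_attempts=3)
```

With the default max_attempts of 1, the same loop would give up after the first failure, which is why a higher value is useful for long downloads.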

A delay of 1 second is generally acceptable for most applications, but it is advisable to check the crawl delay declared in the website's robots.txt file:

[3]:
delay = nist.get_crawl_delay()
delay
[3]:
5
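
For readers curious how a crawl delay can be read from a robots.txt file, the sketch below uses the standard library's urllib.robotparser. It is not a description of how nist.get_crawl_delay works internally; the helper crawl_delay_from_text and the sample robots.txt content are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_text(robots_txt: str, user_agent: str = "*"):
    """Return the Crawl-delay that applies to user_agent, or None if not declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# Hypothetical robots.txt content, not copied from any real website.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

delay = crawl_delay_from_text(SAMPLE_ROBOTS_TXT)  # 5
```

The resulting value could then be passed as the delay argument of RequestConfig.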

When nistchempy.RequestConfig is used to initialize nistchempy.compound.Compound, all methods of the Compound object use the same configuration to download relevant data, such as spectra, MOL files, and gas chromatography data. The configuration can also be modified after initialization. The example below demonstrates this by setting a very low timeout value, which causes the next request to raise an exception:

[4]:
X = nist.get_compound('C1871585', cfg)
X._request_config.kwargs['timeout'] = 0.01
try:
    X.get_gas_chromatography()
except Exception as e:
    print(e)
HTTPSConnectionPool(host='webbook.nist.gov', port=443): Read timed out. (read timeout=0.01)

The same applies to the nistchempy.run_search function:

[5]:
cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
s = nist.run_search('1,2*butadiene', 'name', request_config=cfg)
s._request_config
[5]:
RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
[6]:
s.load_found_compounds()
s.compounds
[6]:
[NistCompound(ID=C590192), NistCompound(ID=C806713), NistCompound(ID=C1573586)]
[7]:
X = s.compounds[0]
X._request_config
[7]:
RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})

Please note that RequestConfig is safe for use in both multithreaded and multiprocess scenarios.