Requests Configuration
If a user intends to download a large amount of data, they may want to use additional features of the requests.get function, which the nistchempy package calls internally. These features include setting HTTP headers, configuring proxies, and specifying timeouts for page loading. It can also be useful to add a delay between requests to the NIST Chemistry WebBook website. All of these settings are controlled through the nistchempy.RequestConfig object.
The RequestConfig object includes three attributes:
delay: Specifies the time delay (in seconds) between requests to the website.
max_attempts: Specifies the maximum number of attempts to get a correct server response (the request is rerun if an exception is raised or a bad HTTP status code is returned).
kwargs: A dictionary of keyword arguments for the requests.get function, excluding params.
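To make the roles of the three attributes concrete, here is a minimal sketch of a retry loop driven by such a configuration. It is not nistchempy's implementation: SketchConfig and fetch_with_retries are hypothetical names, the HTTP function is passed in as getter (in real use this could be requests.get), and sleeping before every attempt is an assumption about how the delay might be applied.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SketchConfig:
    """Hypothetical stand-in that mirrors the three documented attributes."""
    delay: float = 0.0
    max_attempts: int = 1
    kwargs: Dict[str, Any] = field(default_factory=lambda: {"timeout": 30.0})


def fetch_with_retries(getter: Callable[..., Any], url: str, cfg: SketchConfig) -> Any:
    """Call getter(url, **cfg.kwargs) until it succeeds or attempts run out."""
    last_error = None
    for _ in range(cfg.max_attempts):
        time.sleep(cfg.delay)  # assumption: pause before every request
        try:
            response = getter(url, **cfg.kwargs)
        except Exception as error:  # e.g. a timeout or connection error
            last_error = error
            continue
        if response.ok:  # requests.Response.ok is False for 4xx/5xx codes
            return response
        last_error = RuntimeError(f"bad status code: {response.status_code}")
    raise RuntimeError(
        f"no valid response after {cfg.max_attempts} attempts"
    ) from last_error
```

With max_attempts=3, a request that fails twice and then succeeds returns normally; if every attempt fails, the last error is chained to the final exception.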
[1]:
import nistchempy as nist
cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
cfg
[1]:
RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
The default values are as follows: a delay of 0.0 seconds, max_attempts of 1, and a timeout of 30 seconds in kwargs. These defaults are used by all relevant functions and objects:
[2]:
nist.RequestConfig()
[2]:
RequestConfig(delay=0.0, max_attempts=1, kwargs={'timeout': 30.0})
A delay of 1 second is generally acceptable for most applications, but it is advisable to take the value from the website's robots.txt file:
[3]:
delay = nist.get_crawl_delay()
delay
[3]:
5
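For readers curious how a crawl delay can be read from a robots.txt file in general, the following sketch uses the standard library's urllib.robotparser. It is only an illustration and not necessarily how nistchempy.get_crawl_delay works: the helper name crawl_delay_from_text is hypothetical, and the sample robots.txt text is made up rather than copied from the NIST website.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_delay_from_text(robots_txt: str, user_agent: str = "*"):
    """Return the Crawl-delay that applies to user_agent, or None if absent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text already in memory
    return parser.crawl_delay(user_agent)


delay = crawl_delay_from_text(SAMPLE_ROBOTS_TXT)
print(delay)
```

A value obtained this way could then be passed as the delay argument of RequestConfig. The helper returns None when the file has no Crawl-delay directive, so a fallback value is needed in that case.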
When nistchempy.RequestConfig is used to initialize nistchempy.compound.Compound, all methods of the Compound class use the same configuration to download relevant data, such as spectra, MOL files, and gas chromatography data. The configuration can also be modified after initialization. This can be demonstrated by setting a very low timeout value, which makes the next request fail with an exception:
[4]:
X = nist.get_compound('C1871585', cfg)
X._request_config.kwargs['timeout'] = 0.01
try:
    X.get_gas_chromatography()
except Exception as e:
    print(e)
HTTPSConnectionPool(host='webbook.nist.gov', port=443): Read timed out. (read timeout=0.01)
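The behaviour above follows from a simple pattern: if an object stores its configuration and every download method reads it at call time, a later change to the configuration affects all subsequent calls. The sketch below illustrates that pattern with toy classes. ToyConfig and ToyCompound are hypothetical stand-ins, not nistchempy classes, and no network requests are made.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyConfig:
    """Hypothetical config holding only the keyword arguments."""
    kwargs: Dict[str, Any] = field(default_factory=lambda: {"timeout": 30.0})


class ToyCompound:
    """Hypothetical object whose methods all share one stored config."""

    def __init__(self, config: ToyConfig):
        self._request_config = config

    def _describe_request(self, what: str) -> Dict[str, Any]:
        # The config is read at call time, so later edits are picked up.
        return {"what": what, **self._request_config.kwargs}

    def get_spectra(self) -> Dict[str, Any]:
        return self._describe_request("spectra")

    def get_gas_chromatography(self) -> Dict[str, Any]:
        return self._describe_request("gas chromatography")


x = ToyCompound(ToyConfig())
x._request_config.kwargs["timeout"] = 0.01  # modify after initialization
print(x.get_spectra())
print(x.get_gas_chromatography())
```

Both methods report the lowered timeout, because neither keeps a private copy of the settings.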
The same applies to the nistchempy.run_search function:
[5]:
cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
s = nist.run_search('1,2*butadiene', 'name', request_config=cfg)
s._request_config
[5]:
RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
[6]:
s.load_found_compounds()
s.compounds
[6]:
[NistCompound(ID=C590192), NistCompound(ID=C806713), NistCompound(ID=C1573586)]
[7]:
X = s.compounds[0]
X._request_config
[7]:
RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})
Please note that RequestConfig is safe for use in both multithreaded and multiprocess scenarios.