{ "cells": [ { "cell_type": "markdown", "id": "969f0613-03a2-416e-84c7-c5560b73ae46", "metadata": {}, "source": [ "# Requests Configuration" ] }, { "cell_type": "markdown", "id": "fbd5817e-ef92-43d8-8704-bd2a6c4d0003", "metadata": {}, "source": [ "If a user intends to download a large amount of information, they may want to utilize additional features of the `requests.get` function, which is employed within the `nistchempy` package. These features include setting HTTP headers, configuring proxies, specifying timeouts for webpage loading, and introducing a delay between requests to the NIST Chemistry WebBook website. In this case, the user can utilize the `nistchempy.RequestConfig` object." ] }, { "cell_type": "markdown", "id": "ea572fb0-bf5d-4b37-bc3e-70f663bf6416", "metadata": {}, "source": [ "The `RequestConfig` object includes three attributes:\n", "\n", "- **delay**: Specifies the time delay (in seconds) between requests to the website.\n", "\n", "- **max_attempts**: Specifies maximal number of attempts to get a correct server response (reruns request in case of raised exceptions or bad HTTP response status codes).\n", "\n", "- **kwargs**: A dictionary of keyword arguments for the `requests.get` function, excluding `params`." ] }, { "cell_type": "code", "execution_count": 1, "id": "6984cc38-3913-4307-889d-d6c2291d4594", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import nistchempy as nist\n", "\n", "cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})\n", "cfg" ] }, { "cell_type": "markdown", "id": "b1ea5ac5-aab9-43f3-a458-db967cec01f8", "metadata": {}, "source": [ "The default values are as follows: a `delay` of 0.0 seconds, 1 for `max_attempts`, and `timeout` of 30 seconds `kwargs`. These defaults are utilized by all relevant functions and objects:" ] }, { "cell_type": "code", "execution_count": 2, "id": "bca56d94-7771-4b52-b567-416f4e13e158", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RequestConfig(delay=0.0, max_attempts=1, kwargs={'timeout': 30.0})" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nist.RequestConfig()" ] }, { "cell_type": "markdown", "id": "b1d8beb6-763e-4dd0-9c5a-10054a47f516", "metadata": {}, "source": [ "A delay of 1 second is generally acceptable for most applications, but it's advisable to retrieve this information from the `robots.txt` file:" ] }, { "cell_type": "code", "execution_count": 3, "id": "50b5764d-3c09-4540-85c1-b3180029828d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delay = nist.get_crawl_delay()\n", "delay" ] }, { "cell_type": "markdown", "id": "bd06b8d4-f0a0-4381-a45d-26acc688cafa", "metadata": {}, "source": [ "If one uses `nistchempy.RequestConfig` to initalize `nistchempy.compound.Compound`, all methods of the latter one will use the same config to download relevant data, including spectra, MOL-files, gas chromatography data, etc. Also, configuration can be altered post factum. We can show it with Exception raised with low timeout value:" ] }, { "cell_type": "markdown", "id": "0f466c42-d208-4cd7-824f-df5374f25388", "metadata": {}, "source": [ "When using `nistchempy.RequestConfig` to initialize `nistchempy.compound.Compound`, all methods of the `Compound` class will utilize the same configuration to download relevant data, such as spectra, MOL files, gas chromatography data, and more. Additionally, the configuration can be modified after initialization. This can be demonstrated by raising an exception when a low timeout value is set:" ] }, { "cell_type": "code", "execution_count": 4, "id": "4669389f-d01a-47d7-bf48-30555d3da969", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "HTTPSConnectionPool(host='webbook.nist.gov', port=443): Read timed out. (read timeout=0.01)\n" ] } ], "source": [ "X = nist.get_compound('C1871585', cfg)\n", "X._request_config.kwargs['timeout'] = 0.01\n", "try:\n", " X.get_gas_chromatography()\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "markdown", "id": "c582d5ef-5484-427f-b3fa-16bcfb0eb803", "metadata": {}, "source": [ "The same applies to the `nistchempy.run_search` function:" ] }, { "cell_type": "code", "execution_count": 5, "id": "f232dfc1-696b-4872-a866-dfceb9d25946", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cfg = nist.RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})\n", "s = nist.run_search('1,2*butadiene', 'name', request_config=cfg)\n", "s._request_config" ] }, { "cell_type": "code", "execution_count": 6, "id": "053b1483-20a5-43cc-bdfa-ea0454c342ca", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[NistCompound(ID=C590192), NistCompound(ID=C806713), NistCompound(ID=C1573586)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.load_found_compounds()\n", "s.compounds" ] }, { "cell_type": "code", "execution_count": 7, "id": "fc83eafd-099c-47ee-84ad-60044bda307f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RequestConfig(delay=1.0, max_attempts=3, kwargs={'timeout': 30.0})" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = s.compounds[0]\n", "X._request_config" ] }, { "cell_type": "markdown", "id": "b720a6ef-fbd2-4d6d-a955-5abbb93af179", "metadata": {}, "source": [ "Please note that `RequestConfig` is safe for use in both multithreaded and multiprocess scenarios." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }