{ "cells": [ { "cell_type": "markdown", "id": "239e7050-7764-4555-8ccd-59a4d434e1d8", "metadata": {}, "source": [ "# Basic Search" ] }, { "cell_type": "markdown", "id": "31604519-9cfe-4045-b8bf-9da518253d1b", "metadata": {}, "source": [ "## Basic search\n", "\n", "There are five available search types:\n", "\n", " - by [name](https://webbook.nist.gov/chemistry/name-ser/) (`search_type = 'name'`);\n", "\n", " - by [InChI](https://webbook.nist.gov/chemistry/inchi-ser/) (`search_type = 'inchi'`);\n", "\n", " - by [CAS RN](https://webbook.nist.gov/chemistry/cas-ser/) (`search_type = 'cas'`);\n", "\n", " - by [chemical formula](https://webbook.nist.gov/chemistry/form-ser/) (`search_type = 'formula'`);\n", "\n", " - and by NIST Compound ID (`search_type = 'id'`):\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "aa15ff6a-cb87-45e7-8048-e3f4a81bdbe7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NistSearch(success=True, num_compounds=10, lost=False)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import nistchempy as nist\n", "\n", "s = nist.run_search(identifier = '1,2,3*-butane', search_type = 'name')\n", "s" ] }, { "cell_type": "markdown", "id": "4778b207-50c9-42f4-9607-dd9c2014d6a4", "metadata": {}, "source": [ "List of found compounds is stored in the `compound_ids` attribute, and the compounds can be retrieved via the `load_found_compounds` method:" ] }, { "cell_type": "code", "execution_count": 2, "id": "54164007-8e62-40da-9be6-4ab08e9b0802", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['C1871585',\n", " 'C18338404',\n", " 'C298180',\n", " 'C1529686',\n", " 'C632053',\n", " 'C13138517',\n", " 'C62521691',\n", " 'C76397234',\n", " 'C101257798',\n", " 'C1464535']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.compound_ids" ] }, { "cell_type": "code", "execution_count": 3, "id": "9a2f8642-03f7-42d5-9736-7d56ac151633", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[NistCompound(ID=C1871585),\n", " NistCompound(ID=C18338404),\n", " NistCompound(ID=C298180),\n", " NistCompound(ID=C1529686),\n", " NistCompound(ID=C632053),\n", " NistCompound(ID=C13138517),\n", " NistCompound(ID=C62521691),\n", " NistCompound(ID=C76397234),\n", " NistCompound(ID=C101257798),\n", " NistCompound(ID=C1464535)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.load_found_compounds()\n", "s.compounds" ] }, { "cell_type": "markdown", "id": "f53f8bfc-f14e-4acc-93c0-91753944dd6d", "metadata": {}, "source": [ "## Search Parameters" ] }, { "cell_type": "markdown", "id": "26fb4af0-5bf0-4081-bfb5-fd8128df0cb6", "metadata": {}, "source": [ "In addition to the main identifier, you can limit the search using several parameters, which can be using the `print_search_params` function:" ] }, { "cell_type": "code", "execution_count": 4, "id": "bfc9a41e-0574-4667-a710-7845287345c8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "use_SI : Units for thermodynamic data, \"SI\" if True and \"calories\" if False\n", "match_isotopes : Exactly match the specified isotopes (formula search only)\n", "allow_other : Allow elements not specified in formula (formula search only)\n", "allow_extra : Allow more atoms of elements in formula than specified (formula search only)\n", "no_ion : Exclude ions from the search (formula search only)\n", "cTG : Gas phase thermochemistry data\n", "cTC : Condensed phase thermochemistry data\n", "cTP : Phase change data\n", "cTR : Reaction thermochemistry data\n", "cIE : Gas phase ion energetics data\n", "cIC : Ion clustering data\n", "cIR : IR Spectrum\n", "cTZ : THz IR spectrum\n", "cMS : Mass spectrum (electron ionization)\n", "cUV : UV/Visible spectrum\n", "cGC : Gas Chromatography\n", "cES : Vibrational and/or electronic energy levels\n", "cDI : Constants of diatomic molecules\n", "cSO : Henry's Law data\n" ] } ], "source": [ "nist.print_search_parameters()" ] }, { "cell_type": "markdown", "id": "73eccd8e-d428-4665-b684-dc0d90ba6a2c", "metadata": {}, "source": [ "These options can be specified as arguments of the `nist.search` function or defined in `nist.NistSearchParameters` object:" ] }, { "cell_type": "code", "execution_count": 5, "id": "7c02fd86-2ef5-437d-8e1c-fa8fad99a886", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['C110565', 'C110576', 'C1190223', 'C4028562', 'C4279225', 'C541333', 'C594376', 'C616217', 'C7581977', 'C760236', 'C764410', 'C821103', 'C926578']\n", "['C110565', 'C110576', 'C1190223', 'C4028562', 'C4279225', 'C541333', 'C594376', 'C616217', 'C7581977', 'C760236', 'C764410', 'C821103', 'C926578']\n" ] } ], "source": [ "# query\n", "identifier = 'C4H?Cl2'\n", "search_type = 'formula'\n", "\n", "# direct search (entries with IR spectra)\n", "s1 = nist.run_search(identifier, search_type, cIR = True)\n", "\n", "# search with NistSearchParameters\n", "params = nist.NistSearchParameters(cIR = True)\n", "s2 = nist.run_search(identifier, search_type, params)\n", "\n", "# compare searches\n", "print(sorted(s1.compound_ids))\n", "print(sorted(s2.compound_ids))" ] }, { "cell_type": "markdown", "id": "b49baaf8-95a3-4460-b633-da6ed52055aa", "metadata": {}, "source": [ "## Limit of Found Compounds\n", "\n", "NIST Chemistry WebBook limits the search results by 400 compounds. To check if that happened for your search, you need to check the `lost` property:" ] }, { "cell_type": "code", "execution_count": 6, "id": "c36b7e24-0d50-42fe-b7a9-04599a47951a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NistSearch(success=True, num_compounds=400, lost=True)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = nist.NistSearchParameters(no_ion = True, cMS = True)\n", "s = nist.run_search('C6H?O?', 'formula', params)\n", "s" ] }, { "cell_type": "markdown", "id": "aea437a5-a825-4207-9ae3-ee0fac61969f", "metadata": {}, "source": [ "To overcome that when searching for a large number of substances, try to break the chemical formula into subsets:" ] }, { "cell_type": "code", "execution_count": 7, "id": "14e641b8-7360-470c-b12a-cdeee7deb6ae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(170, False), (178, False), (80, False), (42, False), (7, False), (24, False)]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub_searches = []\n", "for i in range(1, 7):\n", " s = nist.run_search(f'C6H?O{i}', 'formula', params)\n", " sub_searches.append( (len(s.compound_ids), s.lost) )\n", "sub_searches" ] }, { "cell_type": "markdown", "id": "40506388-ba9f-412a-99fc-d24360817966", "metadata": {}, "source": [ "The better way is to overcome this problem is to use the pre-prepared compound list. For more details see the `Structure Search` page of the CookBook." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }