{ "cells": [ { "cell_type": "markdown", "id": "028c89f1", "metadata": {}, "source": [ "## ILThermoPy CookBook" ] }, { "cell_type": "markdown", "id": "8ed61a27", "metadata": {}, "source": [ "ILThermo 2.0 is the largest curated database of ionic liquid properties, containing a wide variety of experimental data. This cookbook shows how to query it programmatically with the ILThermoPy package." ] }, { "cell_type": "markdown", "id": "7f0c95a7", "metadata": {}, "source": [ "### Basic search" ] }, { "cell_type": "markdown", "id": "76be0d30", "metadata": {}, "source": [ "ILThermo 2.0 lets you search experimental data on ionic liquids by compound name, CAS RN, or chemical formula, by number of components, by measured property, and by parameters of the source article (see the docstring of the `ilt.Search` function for details):" ] }, { "cell_type": "code", "execution_count": 1, "id": "07feac03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function Search in module ilthermopy.search:\n", "\n", "Search(compound: Optional[str] = None, n_compounds: Literal[None, 1, 2, 3] = None, prop: Optional[str] = None, prop_key: Optional[str] = None, year: Optional[int] = None, author: Optional[str] = None, keywords: Optional[str] = None) -> pandas.core.frame.DataFrame\n", " Runs ILThermo search and returns results as a dataframe\n", " \n", " Arguments:\n", " compound: chemical formula, CAS registry number, or name (part or full)\n", " n_compounds: number of mixture compounds\n", " prop: name of physico-chemical property, only used if prop_key is not specified\n", " prop_key: key of physico-chemical property (view available via GetPropertyList)\n", " year: publication year\n", " author: author's last name\n", " keywords: keywords presumably specified in paper's title\n", " \n", " Returns:\n", " dataframe containing main info on found entries\n", "\n" ] } ], "source": [ "import ilthermopy as ilt\n", "\n", 
"help(ilt.Search)" ] }, { "cell_type": "code", "execution_count": 2, "id": "98a87aa8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id reference property \\\n", "0 vffQK Kabo et al. (2004) Heat capacity at constant pressure \n", "1 aNLkc Kabo et al. (2004) Heat capacity at vapor saturation pressure \n", "2 bNHLG Kabo et al. (2004) Heat capacity at vapor saturation pressure \n", "3 HzjQF Sun et al. (2004) Heat capacity at constant pressure \n", "4 sBpKf Kabo et al. (2004) Heat capacity at vapor saturation pressure \n", ".. ... ... ... \n", "272 CymfL Zhao et al. (2004b) Viscosity \n", "273 onFYh Zhao et al. (2004b) Density \n", "274 NaZFE Zhao et al. (2004b) Viscosity \n", "275 YbIco Zhao et al. (2004b) Density \n", "276 HwTsD Zhao et al. (2004b) Equilibrium temperature \n", "\n", " phases num_phases num_components num_data_points \\\n", "0 Liquid 1 1 1528 \n", "1 Crystal;Gas 2 1 137 \n", "2 Glass;Gas 2 1 111 \n", "3 Crystal 1 1 111 \n", "4 Liquid;Gas 2 1 42 \n", ".. ... ... ... ... \n", "272 Liquid 1 1 1 \n", "273 Liquid 1 1 1 \n", "274 Liquid 1 1 1 \n", "275 Liquid 1 1 1 \n", "276 Crystal;Liquid 2 1 1 \n", "\n", " cmp1 cmp1_id \\\n", "0 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", "1 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", "2 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", "3 4,6-dimethyl-N-phenylpyrimidin-2-amine dodecan... ABYrQW \n", "4 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", ".. ... ... \n", "272 1-(4-cyanobutyl)-3-methylimidazolium tetrafluo... 
AAwOgS \n", "273 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", "274 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", "275 1-butyl-3-methylimidazolium tetrafluoroborate AArYBF \n", "276 3-(3-cyanopropyl)-1-methylimidazolium chloride AAjkls \n", "\n", " cmp1_smiles cmp2 cmp2_id \\\n", "0 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None None \n", "1 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None None \n", "2 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None None \n", "3 CCCCCCCCCCCC(=O)[O-].Cc1cc(C)nc(Nc2ccccc2)[nH+]1 None None \n", "4 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None None \n", ".. ... ... ... \n", "272 Cn1cc[n+](CCCCC#N)c1.F[B-](F)(F)F None None \n", "273 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None None \n", "274 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None None \n", "275 CCCC[n+]1ccn(C)c1.F[B-](F)(F)F None None \n", "276 CCCC[n+]1cccc(C)c1.[Cl-] None None \n", "\n", " cmp2_smiles cmp3 cmp3_id cmp3_smiles \n", "0 None None None None \n", "1 None None None None \n", "2 None None None None \n", "3 None None None None \n", "4 None None None None \n", ".. ... ... ... ... 
\n", "272 None None None None \n", "273 None None None None \n", "274 None None None None \n", "275 None None None None \n", "276 None None None None \n", "\n", "[277 rows x 16 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# search individual compounds published in 2004\n", "df = ilt.Search(n_compounds = 1, year = 2004)\n", "df" ] }, { "cell_type": "markdown", "id": "4ff9b20e", "metadata": {}, "source": [ "The output dataframe contains the main information on each found entry:\n", "\n", "- **id**: ILThermo entry ID, which can be passed to the `ilt.GetEntry` function to retrieve the data;\n", "\n", "- **reference**: short notation of the source paper;\n", "\n", "- **property**: measured property;\n", "\n", "- **phases**: semicolon-separated list of the system's phases;\n", "\n", "- **num_phases**, **num_components**, **num_data_points**: number of phases, number of components, and number of measured data points;\n", "\n", "- **cmpN**, **cmpN_id**, **cmpN_smiles** (N = 1, 2, 3): name, ILThermo compound ID, and SMILES of the N-th compound (`None` if the system has fewer components)." ] }, { "cell_type": "markdown", "id": "0906b42c", "metadata": {}, "source": [ "### Properties" ] }, { "cell_type": "markdown", "id": "2c293b87", "metadata": {}, "source": [ "To search by property, you need either the property name or the ILThermo property ID. 
Since they are internal parameters of the ILThermo 2.0 web interface, they must be obtained via the `ilt.ShowPropertyList` function:" ] }, { "cell_type": "code", "execution_count": 3, "id": "a45524a6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "# Activity, fugacity, and osmotic properties\n", "BPpY: Activity\n", "VjHv: Osmotic coefficient\n", "\n", "# Composition at phase equilibrium\n", "dNip: Composition at phase equilibrium\n", "MbEq: Eutectic composition\n", "lIUh: Henry's Law constant\n", "eCTp: Ostwald coefficient\n", "neae: Tieline\n", "WbZo: Upper consolute composition\n", "\n", "# Critical properties\n", "BPNz: Critical pressure\n", "rDNz: Critical temperature\n", "qpSz: Lower consolute temperature\n", "MvMG: Upper consolute pressure\n", "bRXE: Upper consolute temperature\n", "\n", "# Excess, partial, and apparent energetic properties\n", "cpbY: Apparent enthalpy\n", "teHk: Apparent molar heat capacity\n", "rTYh: Enthalpy of dilution\n", "aeiA: Enthalpy of mixing of a binary solvent with component\n", "VTiT: Enthalpy of solution\n", "brzp: Excess enthalpy\n", "Sqxi: Partial molar enthalpy\n", "mFmK: Partial molar heat capacity\n", "\n", "# Heat capacity and derived properties\n", "tnYd: Enthalpy\n", "kthO: Enthalpy function {H(T)-H(0)}/T\n", "qdUt: Entropy\n", "IZSt: Heat capacity at constant pressure\n", "KvgF: Heat capacity at constant volume\n", "zJIE: Heat capacity at vapor saturation pressure\n", "\n", "# Phase transition properties\n", "CXUw: Enthalpy of transition or fusion\n", "iaOF: Enthalpy of vaporization or sublimation\n", "SwyC: Equilibrium pressure\n", "ghKa: Equilibrium temperature\n", "lnrs: Eutectic temperature\n", "LUaF: Monotectic temperature\n", "NmYB: Normal melting temperature\n", "\n", "# Refraction, surface tension, and speed of sound\n", "YQDr: Interfacial tension\n", "bNnk: Refractive index\n", "imdq: Relative permittivity\n", "NlQd: Speed of sound\n", "ETUw: Surface tension 
liquid-gas\n", "\n", "# Transport properties\n", "HooV: Binary diffusion coefficient\n", "Ylwl: Electrical conductivity\n", "jjnq: Self diffusion coefficient\n", "pAFI: Thermal conductivity\n", "KTcm: Thermal diffusivity\n", "vBeU: Tracer diffusion coefficient\n", "PusA: Viscosity\n", "\n", "# Vapor pressure, boiling temperature, and azeotropic T & P\n", "hkog: Normal boiling temperature\n", "HwfJ: Vapor or sublimation pressure\n", "\n", "# Volumetric properties\n", "WxCH: Adiabatic compressibility\n", "zNjL: Apparent molar volume\n", "JkYu: Density\n", "psRu: Excess volume\n", "hXfd: Isobaric coefficient of volume expansion\n", "Bvon: Isothermal compressibility\n", "LNxL: Partial molar volume\n", "\n", "\n" ] } ], "source": [ "ilt.ShowPropertyList()" ] }, { "cell_type": "markdown", "id": "861ea5ab", "metadata": {}, "source": [ "If older code raises a `ValueError` during a property search, this indicates that the property ID and/or name has changed in an ILThermo 2.0 update. In this case, replace the outdated value with the current one."
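, "\n", "\n", "When only an outdated property name is at hand, fuzzy matching against the current names can suggest its replacement. The sketch below is an illustration, not part of `ilthermopy`: the helper `suggest_property` is hypothetical, it relies on the standard-library `difflib` module, and the small `prop2key` dictionary is copied from the list above (in practice, `ilt.PropertyList().prop2key` provides the full mapping).\n", "\n", "
```python
import difflib

# A few name -> key pairs copied from the ilt.ShowPropertyList() output above.
# In practice, use ilt.PropertyList().prop2key for the full, current mapping.
prop2key = {
    'Viscosity': 'PusA',
    'Density': 'JkYu',
    'Surface tension liquid-gas': 'ETUw',
    'Heat capacity at constant pressure': 'IZSt',
    'Electrical conductivity': 'Ylwl',
}


def suggest_property(old_name, prop2key, n=3, cutoff=0.6):
    '''Hypothetical helper: return (name, key) pairs closest to old_name.

    Candidates are ranked by difflib similarity; names scoring below
    cutoff are dropped.
    '''
    names = difflib.get_close_matches(old_name, list(prop2key), n=n, cutoff=cutoff)
    return [(name, prop2key[name]) for name in names]


print(suggest_property('Surface tension', prop2key))
```
", "\n", "\n", "Here the call prints `[('Surface tension liquid-gas', 'ETUw')]`; the suggested name can then be passed to `ilt.Search` via `prop` (or its key via `prop_key`). Lowering `cutoff` makes the matching more permissive."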
] }, { "cell_type": "markdown", "id": "429cf440-b051-4570-93b9-20f1d992b931", "metadata": {}, "source": [ "Another way around this issue is to look up the current key with the `ilt.PropertyList` object:" ] }, { "cell_type": "code", "execution_count": 4, "id": "df2fb84d-93a5-4231-bea3-ffaca2c1d872", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on PropertyList in module ilthermopy.data_structs object:\n", "\n", "class PropertyList(builtins.object)\n", " | Contains info on available physico-chemical properties and their API keys\n", " | \n", " | Attributes:\n", " | properties (dict): two-level organized dictionary, interconnecting property\n", " | types, properties, and their API keys\n", " | key2prop (dict): maps API keys to property names\n", " | prop2key (dict): maps property names to their API keys\n", " | \n", " | Methods defined here:\n", " | \n", " | Show(self) -> None\n", " | Prints list of properties available in ILThermo 2.0 database\n", " | formatted as api_key: property_name\n", " | \n", " | __init__(self)\n", " | Initialize self. 
See help(type(self)) for accurate signature.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", "\n" ] } ], "source": [ "plist = ilt.PropertyList()\n", "help(plist)" ] }, { "cell_type": "code", "execution_count": 5, "id": "7e719c37-5cfa-4d0c-8644-fc1a59351d04", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BPpY\n" ] } ], "source": [ "prop_name = 'Activity'\n", "prop_key = plist.prop2key.get(prop_name, None)\n", "print(prop_key)" ] }, { "cell_type": "markdown", "id": "b9756ec9-ec97-4292-8182-43000bf15672", "metadata": {}, "source": [ "However, in most cases you do not need this, since the `ilt.Search` function accepts a property name directly via its `prop` argument." ] }, { "cell_type": "markdown", "id": "2c1a3612", "metadata": {}, "source": [ "### Retrieving data" ] }, { "cell_type": "markdown", "id": "8be8c167", "metadata": {}, "source": [ "To load data for the found entries, use the `ilt.GetEntry` function, which takes an entry ID as input:" ] }, { "cell_type": "code", "execution_count": 6, "id": "2ee7cb2d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Entry(id='srPOo', ref=Reference(full='Rebelo, L. P. N.; Najdanovic-Visak, V.; Visak, Z. P.; Nunes da Ponte, M.; Szydlowski, J.; Cerdeirina, C. A.; Troncoso, J.; Romani, L.; Esperanca, J. M. S. S.; Guedes, H. J. R.; de Sousa, H. C. (2004) Green Chem. 
6(8), 369-381.'), property='Excess volume', property_type='Volumetric properties', phases=['Liquid'], components=[Compound(id='AADYJk', name='water', smiles='O'), Compound(id='AArYBF', name='1-butyl-3-methylimidazolium tetrafluoroborate', smiles='CCCC[n+]1ccn(C)c1.F[B-](F)(F)F')], num_data_points=185)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# example search: binary mixtures published in 2004\n", "df = ilt.Search(n_compounds = 2, year = 2004)\n", "# download the first 10 entries\n", "data = [ilt.GetEntry(idx) for idx in df.id.iloc[:10]]\n", "# take the first entry\n", "entry = data[0]\n", "entry" ] }, { "cell_type": "markdown", "id": "8162f47b", "metadata": {}, "source": [ "The `Entry` object contains detailed information on the data entry, including:\n", "\n", "- **id**: data entry ID;\n", "- **ref**: reference to the source article;\n", "- **property**, **property_type**: measured property and its type;\n", "- **phases**: list of the system's phases;\n", "- **components**: list of the system's components;\n", "- **num_phases**, **num_components**, **num_data_points**: number of phases, number of components, and number of measured data points;\n", "- **expmeth**: experimental method used to obtain the physico-chemical data;\n", "- **solvent**: solvent used in the experiment;\n", "- **constraints**: list of experimental constraints;\n", "- **data**: dataframe containing the measured data;\n", "- **header**: full names of the **data** columns, with info on the measured property, measurement unit, component, and phase;\n", "- **footnotes**: footnotes to **data**;\n", "- **response**: original response." ] }, { "cell_type": "markdown", "id": "a9baf06f", "metadata": {}, "source": [ "Let's illustrate the main attributes. The **ref** attribute holds the full citation and the article's title:" ] }, { "cell_type": "code", "execution_count": 7, "id": "5be53c65", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('Rebelo, L. P. N.; Najdanovic-Visak, V.; Visak, Z. 
P.; Nunes da Ponte, M.; Szydlowski, J.; Cerdeirina, C. A.; Troncoso, J.; Romani, L.; Esperanca, J. M. S. S.; Guedes, H. J. R.; de Sousa, H. C. (2004) Green Chem. 6(8), 369-381.',\n", " 'A detailed thermodynamic analysis of [C4mim][BF4] + water as a case study to model ionic liquid aqueous solutions')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "entry.ref.full, entry.ref.title" ] }, { "cell_type": "markdown", "id": "4ace6e5e", "metadata": {}, "source": [ "Each component is a `Compound` object with the following fields:\n", "\n", "- **id**: compound ID;\n", "- **name**: compound name;\n", "- **formula**: chemical formula;\n", "- **smiles**: compound SMILES;\n", "- **smiles_error**: if the compound's SMILES could not be retrieved, this field gives the reason;\n", "- **sample**: dictionary containing info on the compound's source, purity, etc.;\n", "- **mw**: molar mass of the compound, g/mol." ] }, { "cell_type": "code", "execution_count": 8, "id": "74f539ac", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('AADYJk',\n", " 'water',\n", " 'H2 O',\n", " 'O',\n", " None,\n", " {'Source': 'commercial source',\n", " 'Purification': 'estimated by the compiler',\n", " 'Purity': '99.8 mass %(fractional distillation)'},\n", " 18.02)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cmp1 = entry.components[0]\n", "cmp1.id, cmp1.name, cmp1.formula, cmp1.smiles, cmp1.smiles_error, cmp1.sample, cmp1.mw" ] }, { "cell_type": "markdown", "id": "e9d089a6", "metadata": {}, "source": [ "The **data** field contains a dataframe with the measured physico-chemical data. Its columns have short names V1, V2, V3, etc. If a measurement error is reported for a variable, it is stored in an extra column whose name is the variable's column name prefixed with `d`, e.g. 
`V1` and `dV1`:" ] }, { "cell_type": "code", "execution_count": 9, "id": "3c8aac5f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " V1 V2 V3 V4 dV4\n", "0 100.0 0.0040 278.15 -3.600000e-08 1.000000e-08\n", "1 100.0 0.0040 283.15 -3.350000e-08 1.000000e-08\n", "2 100.0 0.0040 288.15 -3.070000e-08 1.000000e-08\n", "3 100.0 0.0040 293.15 -2.830000e-08 1.000000e-08\n", "4 100.0 0.0040 298.15 -2.580000e-08 1.000000e-08\n", ".. ... ... ... ... ...\n", "180 60000.0 0.5905 298.15 4.620000e-07 1.100000e-08\n", "181 60000.0 0.5905 303.15 4.550000e-07 1.100000e-08\n", "182 60000.0 0.5905 313.15 4.510000e-07 1.100000e-08\n", "183 60000.0 0.5905 323.15 5.420000e-07 1.200000e-08\n", "184 60000.0 0.5905 333.15 5.830000e-07 1.200000e-08\n", "\n", "[185 rows x 5 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "entry.data" ] }, { "cell_type": "markdown", "id": "a6b8edd0", "metadata": {}, "source": [ "The **header** field maps each short column name to its full name, which includes the measured property, its measurement unit, and, where relevant, the compound and phase:" ] }, { "cell_type": "code", "execution_count": 10, "id": "82ea9814", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'V1': 'Pressure, kPa',\n", " 'V2': 'Mole fraction of 1-butyl-3-methylimidazolium tetrafluoroborate => Liquid',\n", " 'V3': 'Temperature, K',\n", " 'V4': 'Excess volume, m3/mol => Liquid',\n", " 'dV4': 'Error of excess volume, m3/mol => Liquid'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "entry.header" ] }, { "cell_type": "markdown", "id": "903dde67", "metadata": {}, "source": [ "Combining data entries formatted this way is possible, albeit difficult. However, the difficulty stems from the database itself, and such a task is beyond the scope of this API."
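, "\n", "\n", "As a possible first step toward harmonizing entries yourself, the header labels can be split into quantity, unit, and phase. The sketch below is not part of `ilthermopy`: `parse_header` is a hypothetical helper, and it assumes the label pattern visible above (an optional ` => Phase` suffix and an optional unit after the last comma followed by a space). Labels that deviate from this pattern may be split incorrectly.\n", "\n", "
```python
# Header copied from the entry.header output above.
header = {
    'V1': 'Pressure, kPa',
    'V2': 'Mole fraction of 1-butyl-3-methylimidazolium tetrafluoroborate => Liquid',
    'V3': 'Temperature, K',
    'V4': 'Excess volume, m3/mol => Liquid',
    'dV4': 'Error of excess volume, m3/mol => Liquid',
}


def parse_header(label):
    '''Hypothetical helper: split a header label into (quantity, unit, phase).

    Assumes an optional ' => Phase' suffix and an optional unit placed
    after the last comma-space; missing parts are returned as None.
    '''
    phase = None
    if ' => ' in label:
        label, phase = label.rsplit(' => ', 1)
    unit = None
    if ', ' in label:
        label, unit = label.rsplit(', ', 1)
    return label, unit, phase


parsed = {col: parse_header(label) for col, label in header.items()}
for col, (quantity, unit, phase) in parsed.items():
    print(col, '|', quantity, '|', unit, '|', phase)
```
", "\n", "\n", "If you only need readable column names, pandas can apply the mapping directly with `entry.data.rename(columns=entry.header)`, since **header** is a plain dictionary keyed by the short column names."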
] }, { "cell_type": "markdown", "id": "7debecad", "metadata": {}, "source": [ "### Substructure search" ] }, { "cell_type": "markdown", "id": "86843285", "metadata": {}, "source": [ "#### Update status" ] }, { "cell_type": "markdown", "id": "5cd3aa75", "metadata": {}, "source": [ "Structural information on the ILThermo compounds is a crucial part of `ilthermopy`. It is stored as a table linking each ILThermo compound ID to the compound name and a verified SMILES string. `ilthermopy` normally retrieves SMILES by compound ID; however, each update of the ILThermo 2.0 database changes all compound IDs. In that case `ilthermopy` can still retrieve SMILES by compound name, but this is not foolproof, and some SMILES may be missing.\n", "\n", "Therefore, it is a good idea to check whether `ilthermopy` is up-to-date before you start exploring the ILThermo database:" ] }, { "cell_type": "code", "execution_count": 11, "id": "90eeeedb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ILThermo 2.0 database was last updated on June 04, 2024\n", "ilthermopy package was last updated on May 03, 2025\n", "\n", "ilthermopy package is up-to-date\n" ] } ], "source": [ "ilt.CheckLastUpdate()" ] }, { "cell_type": "markdown", "id": "df0c632b", "metadata": {}, "source": [ "#### Substructure search" ] }, { "cell_type": "markdown", "id": "a0afe696", "metadata": {}, "source": [ "The easiest way to search by substructure is to first filter the list of all available compounds, and then filter the search output by compound IDs (or by compound names if `ilthermopy` is not up-to-date). Suppose we want all entries containing a guanidinium cation. We start by loading preliminary info on all available ILThermo entries:" ] }, { "cell_type": "code", "execution_count": 12, "id": "7192e320", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id reference \\\n", "0 vffQK Kabo et al. (2004) \n", "1 cFdQS Strechan et al. (2008a) \n", "2 eOKLZ Safarov et al. (2021b) \n", "3 UkHsx Polikhronidi et al. (2014) \n", "4 BvazU Safarov et al. (2018c) \n", "... ... ... \n", "54204 kxodn Zhao et al. (2010a) \n", "54205 WDULW Zhao et al. (2010a) \n", "54206 trauR Zhao et al. (2010a) \n", "54207 YtOdr Zhao et al. (2010a) \n", "54208 xKyGC Zhao et al. (2010a) \n", "\n", " property phases num_phases \\\n", "0 Heat capacity at constant pressure Liquid 1 \n", "1 Heat capacity at vapor saturation pressure Crystal 2;Gas 2 \n", "2 Viscosity Liquid 1 \n", "3 Heat capacity at constant pressure Liquid 1 \n", "4 Viscosity Liquid 1 \n", "... ... ... ... \n", "54204 Composition at phase equilibrium Liquid;Gas 2 \n", "54205 Composition at phase equilibrium Liquid;Gas 2 \n", "54206 Composition at phase equilibrium Liquid;Gas 2 \n", "54207 Composition at phase equilibrium Liquid;Gas 2 \n", "54208 Composition at phase equilibrium Liquid;Gas 2 \n", "\n", " num_components num_data_points \\\n", "0 1 1528 \n", "1 1 613 \n", "2 1 500 \n", "3 1 422 \n", "4 1 394 \n", "... ... ... \n", "54204 3 1 \n", "54205 3 1 \n", "54206 3 1 \n", "54207 3 1 \n", "54208 3 1 \n", "\n", " cmp1 cmp1_id \\\n", "0 1-butyl-3-methylimidazolium hexafluorophosphate ABChJv \n", "1 1-butyl-3-methylimidazolium trifluoroacetate AAwaEi \n", "2 1-ethyl-3-methylimidazolium dicyanamide AAiEIE \n", "3 1-hexyl-3-methylimidazolium bis[(trifluorometh... ABiCtA \n", "4 1-octyl-3-methylimidazolium hexafluorophosphate ABNWoR \n", "... ... ... \n", "54204 carbon dioxide AAIYzl \n", "54205 carbon dioxide AAIYzl \n", "54206 carbon dioxide AAIYzl \n", "54207 carbon dioxide AAIYzl \n", "54208 triethanolamine AAcjhv \n", "\n", " cmp1_smiles cmp2 \\\n", "0 CCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None \n", "1 CCCC[n+]1ccn(C)c1.O=C([O-])C(F)(F)F None \n", "2 CC[n+]1ccn(C)c1.N#C[N-]C#N None \n", "3 CCCCCC[n+]1ccn(C)c1.O=S(=O)([N-]S(=O)(=O)C(F)(... 
None \n", "4 CCCCCCCC[n+]1ccn(C)c1.F[P-](F)(F)(F)(F)F None \n", "... ... ... \n", "54204 O=C=O water \n", "54205 O=C=O water \n", "54206 O=C=O water \n", "54207 O=C=O water \n", "54208 OCCN(CCO)CCO carbon dioxide \n", "\n", " cmp2_id cmp2_smiles cmp3 \\\n", "0 None None None \n", "1 None None None \n", "2 None None None \n", "3 None None None \n", "4 None None None \n", "... ... ... ... \n", "54204 AADYJk O 2-hydroxy-N-(2-hydroxyethyl)-N-methylethanamin... \n", "54205 AADYJk O 2-hydroxy-N-(2-hydroxyethyl)-N-methylethanamin... \n", "54206 AADYJk O 2-hydroxyethanaminium tetrafluoroborate \n", "54207 AADYJk O 1-butyl-3-methylimidazolium tetrafluoroborate \n", "54208 AAIYzl O=C=O 1-butyl-3-methylimidazolium tetrafluoroborate \n", "\n", " cmp3_id cmp3_smiles \n", "0 None None \n", "1 None None \n", "2 None None \n", "3 None None \n", "4 None None \n", "... ... ... \n", "54204 AAdwMT C[NH+](CCO)CCO.[Cl-] \n", "54205 AAnpCD C[NH+](CCO)CCO.F[B-](F)(F)F \n", "54206 AAcgPC F[B-](F)(F)F.[NH3+]CCO \n", "54207 AArYBF CCCC[n+]1ccn(C)c1.F[B-](F)(F)F \n", "54208 AArYBF CCCC[n+]1ccn(C)c1.F[B-](F)(F)F \n", "\n", "[54209 rows x 16 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = ilt.GetAllEntries()\n", "df" ] }, { "cell_type": "markdown", "id": "971f1b3b", "metadata": {}, "source": [ "Next, we get the pre-stored list of all ILThermo compounds:" ] }, { "cell_type": "code", "execution_count": 13, "id": "eaf23ad7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnamesmiles
0AAAUumhydrogen (normal)[HH]
1AAAomxhelium[He]
2AADEIKmethaneC
3AADOkxammoniaN
4AADYJkwaterO
............
4110AFHNfv1,3-bis(3-((3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,...FC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F...
4111AFSFfi3,3',3''-((propane-1,2,3-triyltris(oxy))tris(m...O=S(=O)([N-]S(=O)(=O)C(F)(F)F)C(F)(F)F.O=S(=O)...
4112AFmxJi3,3',3''-((propane-1,2,3-triyltris(oxy))tris(m...CCCCCCCC[n+]1ccn(COCC(COCn2cc[n+](CCCCCCCC)c2C...
4113AGbVrm1,1'-(ethane-1,2-diyl)bis(3-(3-((3,3,4,4,5,5,6...FC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F...
4114AGwzVJ1,1'-(decane-1,10-diyl)bis(3-(3-((3,3,4,4,5,5,...FC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F...
\n", "

4115 rows × 3 columns

\n", "
" ], "text/plain": [ " id name \\\n", "0 AAAUum hydrogen (normal) \n", "1 AAAomx helium \n", "2 AADEIK methane \n", "3 AADOkx ammonia \n", "4 AADYJk water \n", "... ... ... \n", "4110 AFHNfv 1,3-bis(3-((3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,... \n", "4111 AFSFfi 3,3',3''-((propane-1,2,3-triyltris(oxy))tris(m... \n", "4112 AFmxJi 3,3',3''-((propane-1,2,3-triyltris(oxy))tris(m... \n", "4113 AGbVrm 1,1'-(ethane-1,2-diyl)bis(3-(3-((3,3,4,4,5,5,6... \n", "4114 AGwzVJ 1,1'-(decane-1,10-diyl)bis(3-(3-((3,3,4,4,5,5,... \n", "\n", " smiles \n", "0 [HH] \n", "1 [He] \n", "2 C \n", "3 N \n", "4 O \n", "... ... \n", "4110 FC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F... \n", "4111 O=S(=O)([N-]S(=O)(=O)C(F)(F)F)C(F)(F)F.O=S(=O)... \n", "4112 CCCCCCCC[n+]1ccn(COCC(COCn2cc[n+](CCCCCCCC)c2C... \n", "4113 FC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F... \n", "4114 FC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F... \n", "\n", "[4115 rows x 3 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cmps = ilt.GetSavedCompounds().data\n", "cmps" ] }, { "cell_type": "markdown", "id": "18f6eb6b", "metadata": {}, "source": [ "Then we filter them with an RDKit substructure search:" ] }, { "cell_type": "code", "execution_count": 14, "id": "698399f4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id name \\\n", "595 AAavNv Guanidinium bromide \n", "1159 AAoLEA guanidinium trifluoromethanesulfonate \n", "1243 AApejX guanidinium sulfate \n", "2728 ABUxBz guanidinium tetraphenylborate \n", "3923 ACTsEp guanidinium bis(bis(perfluoroethyl)phosphoryl)... \n", "\n", " smiles \n", "595 NC(N)=[NH2+].[Br-] \n", "1159 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F \n", "1243 NC(N)=[NH2+].NC(N)=[NH2+].O=S(=O)([O-])[O-] \n", "2728 NC(N)=[NH2+].c1ccc([B-](c2ccccc2)(c2ccccc2)c2c... \n", "3923 NC(N)=[NH2+].O=P([N-]P(=O)(C(F)(F)C(F)(F)F)C(F... " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from rdkit import Chem\n", "from rdkit import RDLogger\n", "RDLogger.DisableLog('rdApp.*')  # silence RDKit parsing warnings\n", "\n", "# MolFromSmiles returns None for SMILES that RDKit cannot parse\n", "mols = [Chem.MolFromSmiles(smi) for smi in cmps.smiles]\n", "# guanidinium core: a carbon bonded to three NH2 nitrogens (any bond order)\n", "pat = Chem.MolFromSmarts('[NX3;H2]~[CX3](~[NX3;H2])~[NX3;H2]')\n", "cmps = cmps.loc[[m is not None and m.HasSubstructMatch(pat) for m in mols]]\n", "cmps" ] }, { "cell_type": "markdown", "id": "b8c47fe8", "metadata": {}, "source": [ "Now we can use the IDs of the matching compounds to select all entries in which any of them appears as a component:" ] }, { "cell_type": "code", "execution_count": 15, "id": "6e3b2c96", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id reference property \\\n", "22465 FTOEm Kumar (2007) Osmotic coefficient \n", "22503 eBeqe Sairi et al. (2015) Viscosity \n", "23959 KDBmV Sairi et al. (2015) Density \n", "24985 vQDiO Kumar (2001) Density \n", "24986 MNutD Kumar (2001) Speed of sound \n", "25429 jASaU Kumar (2001) Speed of sound \n", "25430 kKajm Kumar (2001) Density \n", "45701 MhEvz Ondo and Dohnal (2014) Composition at phase equilibrium \n", "45702 BVhqX Ondo and Dohnal (2014) Composition at phase equilibrium \n", "45709 hRmeZ Ondo and Dohnal (2014) Apparent enthalpy \n", "45710 arrKj Ondo and Dohnal (2014) Apparent enthalpy \n", "47776 aWGeF Sairi et al. (2015) Density \n", "48534 woUWv Sairi et al. (2015) Viscosity \n", "50285 zoJjS Sairi et al. (2011) Composition at phase equilibrium \n", "52324 tmQgx Sairi et al. (2015) Composition at phase equilibrium \n", "\n", " phases num_phases \\\n", "22465 Liquid;Gas 2 \n", "22503 Liquid 1 \n", "23959 Liquid 1 \n", "24985 Liquid 1 \n", "24986 Liquid 1 \n", "25429 Liquid 1 \n", "25430 Liquid 1 \n", "45701 Liquid;Crystal of intercomponent compound 1 2 \n", "45702 Liquid;Crystal of pure component 2 2 \n", "45709 Liquid 1 \n", "45710 Liquid 1 \n", "47776 Liquid 1 \n", "48534 Liquid 1 \n", "50285 Liquid;Gas 2 \n", "52324 Liquid;Gas 2 \n", "\n", " num_components num_data_points cmp1 cmp1_id \\\n", "22465 2 28 tetramethylammonium chloride AAVDUE \n", "22503 2 28 water AADYJk \n", "23959 2 18 water AADYJk \n", "24985 2 14 water AADYJk \n", "24986 2 14 water AADYJk \n", "25429 2 13 guanidinium sulfate AApejX \n", "25430 2 13 guanidinium sulfate AApejX \n", "45701 2 1 water AADYJk \n", "45702 2 1 water AADYJk \n", "45709 2 1 water AADYJk \n", "45710 2 1 water AADYJk \n", "47776 3 81 N-methyldiethanolamine AAWveB \n", "48534 3 48 N-methyldiethanolamine AAWveB \n", "50285 3 18 carbon dioxide AAIYzl \n", "52324 3 6 carbon dioxide AAIYzl \n", "\n", " cmp1_smiles \\\n", "22465 C[N+](C)(C)C.[Cl-] \n", "22503 O \n", "23959 O \n", "24985 O 
\n", "24986 O \n", "25429 NC(N)=[NH2+].NC(N)=[NH2+].O=S(=O)([O-])[O-] \n", "25430 NC(N)=[NH2+].NC(N)=[NH2+].O=S(=O)([O-])[O-] \n", "45701 O \n", "45702 O \n", "45709 O \n", "45710 O \n", "47776 CN(CCO)CCO \n", "48534 CN(CCO)CCO \n", "50285 O=C=O \n", "52324 O=C=O \n", "\n", " cmp2 cmp2_id \\\n", "22465 guanidinium sulfate AApejX \n", "22503 guanidinium trifluoromethanesulfonate AAoLEA \n", "23959 guanidinium trifluoromethanesulfonate AAoLEA \n", "24985 Guanidinium bromide AAavNv \n", "24986 Guanidinium bromide AAavNv \n", "25429 water AADYJk \n", "25430 water AADYJk \n", "45701 guanidinium tetraphenylborate ABUxBz \n", "45702 guanidinium bis(bis(perfluoroethyl)phosphoryl)... ACTsEp \n", "45709 guanidinium tetraphenylborate ABUxBz \n", "45710 guanidinium bis(bis(perfluoroethyl)phosphoryl)... ACTsEp \n", "47776 water AADYJk \n", "48534 water AADYJk \n", "50285 water AADYJk \n", "52324 water AADYJk \n", "\n", " cmp2_smiles \\\n", "22465 NC(N)=[NH2+].NC(N)=[NH2+].O=S(=O)([O-])[O-] \n", "22503 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F \n", "23959 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F \n", "24985 NC(N)=[NH2+].[Br-] \n", "24986 NC(N)=[NH2+].[Br-] \n", "25429 O \n", "25430 O \n", "45701 NC(N)=[NH2+].c1ccc([B-](c2ccccc2)(c2ccccc2)c2c... \n", "45702 NC(N)=[NH2+].O=P([N-]P(=O)(C(F)(F)C(F)(F)F)C(F... \n", "45709 NC(N)=[NH2+].c1ccc([B-](c2ccccc2)(c2ccccc2)c2c... \n", "45710 NC(N)=[NH2+].O=P([N-]P(=O)(C(F)(F)C(F)(F)F)C(F... 
\n", "47776 O \n", "48534 O \n", "50285 O \n", "52324 O \n", "\n", " cmp3 cmp3_id \\\n", "22465 None None \n", "22503 None None \n", "23959 None None \n", "24985 None None \n", "24986 None None \n", "25429 None None \n", "25430 None None \n", "45701 None None \n", "45702 None None \n", "45709 None None \n", "45710 None None \n", "47776 guanidinium trifluoromethanesulfonate AAoLEA \n", "48534 guanidinium trifluoromethanesulfonate AAoLEA \n", "50285 guanidinium trifluoromethanesulfonate AAoLEA \n", "52324 guanidinium trifluoromethanesulfonate AAoLEA \n", "\n", " cmp3_smiles \n", "22465 None \n", "22503 None \n", "23959 None \n", "24985 None \n", "24986 None \n", "25429 None \n", "25430 None \n", "45701 None \n", "45702 None \n", "45709 None \n", "45710 None \n", "47776 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F \n", "48534 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F \n", "50285 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F \n", "52324 NC(N)=[NH2+].O=S(=O)([O-])C(F)(F)F " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub = df.loc[df.cmp1_id.isin(cmps.id) | df.cmp2_id.isin(cmps.id) | df.cmp3_id.isin(cmps.id)]\n", "sub" ] }, { "cell_type": "markdown", "id": "3fbd37f4", "metadata": {}, "source": [ "#### Resonance" ] }, { "cell_type": "markdown", "id": "70324663", "metadata": {}, "source": [ "A substructure search may miss matches unless resonance is taken into account. This can be demonstrated on the asymmetrically substituted 1-ethyl-3-methylimidazolium cation, written with the formal positive charge on either ring nitrogen:" ] }, { "cell_type": "code", "execution_count": 16, "id": "8f656185", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol1 = Chem.MolFromSmiles('C[n+]1cc[n](CC)c1')\n", "mol2 = Chem.MolFromSmiles('C[n]1cc[n+](CC)c1')\n", "mol1" ] }, { "cell_type": "code", "execution_count": 17, "id": "b120a4e7", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol2" ] }, { "cell_type": "code", "execution_count": 18, "id": "057d7a33", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol1.HasSubstructMatch(mol2)" ] }, { "cell_type": "markdown", "id": "e7d146ca", "metadata": {}, "source": [ "Although both SMILES describe the same cation, their molecular graphs differ in which nitrogen carries the formal charge, and RDKit's canonical SMILES does not reconcile them:" ] }, { "cell_type": "code", "execution_count": 19, "id": "a7e380ec", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('CCn1cc[n+](C)c1', 'CC[n+]1ccn(C)c1')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Chem.MolToSmiles(mol1), Chem.MolToSmiles(mol2)" ] }, { "cell_type": "markdown", "id": "c5f3eafb", "metadata": {}, "source": [ "This problem can be addressed in two ways:\n", "\n", "1. writing a more permissive SMARTS pattern for the substructure search, e.g. one that ignores atomic charges (see the Daylight SMARTS documentation for ideas: [1](https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html), [2](https://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html));\n", "\n", "2. taking resonance into account with RDKit's `ResonanceMolSupplier` class."
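, "\n", "\n", "A minimal sketch of both approaches for the imidazolium example above. It is not part of ILThermoPy. It assumes only standard RDKit calls (`Chem.MolFromSmarts`, `Mol.HasSubstructMatch` and `ResonanceMolSupplier.GetSubstructMatch`) and redefines `mol1` and `mol2` so that it runs on its own. Both SMARTS patterns are illustrative choices written for this particular cation, not general-purpose queries:\n", "\n", "
```python
from rdkit import Chem

# The two SMILES of 1-ethyl-3-methylimidazolium used above; they differ only
# in which ring nitrogen carries the formal positive charge.
mol1 = Chem.MolFromSmiles('C[n+]1cc[n](CC)c1')
mol2 = Chem.MolFromSmiles('C[n]1cc[n+](CC)c1')

# Approach 1 (illustrative pattern): leave the nitrogen charges unspecified
# and use '~' (any bond) for the ring bonds, so bond orders are not fixed.
charge_free = Chem.MolFromSmarts('[CH3][#7]1~[#6]~[#6]~[#7]([CH2][CH3])~[#6]~1')
approach1 = (mol1.HasSubstructMatch(charge_free),
             mol2.HasSubstructMatch(charge_free))

# Approach 2: keep a charge-specific query (a positively charged nitrogen
# carrying the ethyl group) and let ResonanceMolSupplier test it against
# the resonance structures it enumerates for mol1.
cation_ethyl = Chem.MolFromSmarts('[#7+][CH2][CH3]')
plain_match = mol1.HasSubstructMatch(cation_ethyl)
suppl = Chem.ResonanceMolSupplier(mol1)
resonance_match = len(suppl.GetSubstructMatch(cation_ethyl)) > 0

print('charge-free SMARTS (mol1, mol2):', approach1)
print('charge-specific query, plain:', plain_match)
print('charge-specific query, with resonance:', resonance_match)
```
\n", "\n", "With the charge-free pattern, both SMILES should match. The charge-specific query should fail on `mol1` itself but match once the resonance structures of `mol1` are considered. The first approach is cheaper. The second keeps the query chemically strict, at the cost of enumerating resonance structures for every target molecule."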
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }