{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### The easy way\n", "\n", "#### The Groups class\n", "`ugropy` is relatively straightforward to use, but let's explore what it has to \n", "offer, starting with the easy methods.\n", "\n", "We'll use the `Groups` class to retrieve the subgroups of all the models \n", "supported by `ugropy`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from ugropy import Groups\n", "\n", "carvone = Groups(\"carvone\")\n", "\n", "carvone.unifac.subgroups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, that was easy... `ugropy` uses `PubChemPy` \n", "([link](https://github.com/mcs07/PubChemPy)) to access `PubChem` and \n", "retrieve the SMILES representation of the molecule. `ugropy` then uses the \n", "SMILES representation along with the `rdkit` \n", "([link](https://github.com/rdkit/rdkit)) library to identify the \n", "functional groups of the molecule.\n", "\n", "The complete signature of the `Groups` class is as follows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "carvone = Groups(\n", "    identifier=\"carvone\",\n", "    identifier_type=\"name\",\n", "    normal_boiling_temperature=None\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `identifier_type` argument (default: \"name\") can be set to \"name\", \"smiles\"\n", "or \"mol\".\n", "\n", "When \"name\" is set, `ugropy` uses the `identifier` argument to search\n", "PubChem for the canonical SMILES of the molecule.\n", "\n", "When \"smiles\" is set, `ugropy` uses the identifier directly, which also \n", "avoids the overhead of querying PubChem. 
Try it yourself:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "carvone = Groups(\n", "    identifier=\"CC1=CCC(CC1=O)C(=C)C\",\n", "    identifier_type=\"smiles\",\n", "    normal_boiling_temperature=None\n", ")\n", "\n", "carvone.unifac.subgroups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are familiar with the `rdkit` library, you'll know that there are\n", "numerous ways to define a molecule (e.g., SMILES, SMARTS, PDB file, InChI,\n", "etc.). `ugropy` also accepts a Mol object from the `rdkit`\n", "library." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from rdkit import Chem\n", "\n", "# Build an RDKit Mol object from the InChI string of carvone\n", "mol_obj = Chem.MolFromInchi(\"InChI=1S/C10H14O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9H,1,5-6H2,2-3H3\")\n", "\n", "carvone = Groups(\n", "    identifier=mol_obj,\n", "    identifier_type=\"mol\",\n", "    normal_boiling_temperature=None\n", ")\n", "\n", "carvone.unifac.subgroups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The currently supported models are the classic liquid-vapor UNIFAC, Predictive\n", "Soave-Redlich-Kwong (PSRK) and Joback. 
You can access the functional groups\n", "this way:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}\n", "{'CH3': 2, 'CH2': 1, 'CH': 1, 'CH2=C': 1, 'CH=C': 1, 'CH2CO': 1}\n", "{'-CH3': 2, '=CH2': 1, '=C<': 1, 'ring-CH2-': 2, 'ring>CH-': 1, 'ring=CH-': 1, 'ring=C<': 1, '>C=O (ring)': 1}\n" ] } ], "source": [ "carvone = Groups(\"carvone\")\n", "\n", "print(carvone.unifac.subgroups)\n", "\n", "print(carvone.psrk.subgroups)\n", "\n", "print(carvone.joback.subgroups)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may notice that the joback attribute is a different object. That's because\n", "it's a JobackProperties object, which contains all the properties that the\n", "Joback model can estimate. This will be discussed later in the Joback tutorial.\n", "As an example:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "516.47" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "carvone.joback.normal_boiling_point" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, if the normal_boiling_temperature parameter is provided, it is used in\n", "the Joback properties calculations instead of the Joback-estimated normal\n", "boiling temperature (refer to the Joback tutorial)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full documentation of the `Groups` class may be accessed in the API\n", "documentation. Or you can do..." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0;31mInit signature:\u001b[0m\n", "\u001b[0mGroups\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0midentifier\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0midentifier_type\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'name'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mnormal_boiling_temperature\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDocstring:\u001b[0m \n", "Group class.\n", "\n", "Stores the solved FragmentationModels subgroups of a molecule.\n", "\n", "Parameters\n", "----------\n", "identifier : str or rdkit.Chem.rdchem.Mol\n", " Identifier of a molecule (name, SMILES or Chem.rdchem.Mol). Example:\n", " hexane or CCCCCC.\n", "identifier_type : str, optional\n", " Use 'name' to search a molecule by name, 'smiles' to provide the\n", " molecule SMILES representation or 'mol' to provide a\n", " rdkit.Chem.rdchem.Mol object, by default \"name\".\n", "normal_boiling_temperature : float, optional\n", " If provided, will be used to estimate critical temperature, acentric\n", " factor, and vapor pressure instead of the estimated normal boiling\n", " point in the Joback group contribution model, by default None.\n", "\n", "Attributes\n", "----------\n", "identifier : str\n", " Identifier of a molecule. 
Example: hexane or CCCCCC.\n", "identifier_type : str, optional\n", " Use 'name' to search a molecule by name or 'smiles' to provide the\n", " molecule SMILES representation, by default \"name\".\n", "mol_object : rdkit.Chem.rdchem.Mol\n", " RDKit Mol object.\n", "molecular_weight : float\n", " Molecule's molecular weight from rdkit.Chem.Descriptors.MolWt [g/mol].\n", "unifac : Fragmentation\n", " Classic LV-UNIFAC subgroups.\n", "psrk : Fragmentation\n", " Predictive Soave-Redlich-Kwong subgroups.\n", "joback : JobackProperties\n", " JobackProperties object that contains the Joback subgroups and the\n", " estimated properties of the molecule.\n", "\u001b[0;31mFile:\u001b[0m ~/code/ugropy/ugropy/groups.py\n", "\u001b[0;31mType:\u001b[0m type\n", "\u001b[0;31mSubclasses:\u001b[0m " ] } ], "source": [ "Groups?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, you can visualize the fragmentation result simply by doing:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "CH3CH2CHCH2=CCH=CCH2CO" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import SVG\n", "\n", "svg = carvone.unifac.draw(width=600)\n", "\n", "SVG(svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can save the figure by doing:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "with open(\"figure.svg\", \"w\") as f:\n", "    f.write(svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the full documentation of the `draw` function:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "\u001b[0;31mSignature:\u001b[0m\n", "\u001b[0mcarvone\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munifac\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdraw\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mtitle\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mwidth\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m400\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mheight\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m200\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mtitle_font_size\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m12\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mlegend_font_size\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mfloat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m12\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mfont\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Helvetica'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0mUnion\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mList\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mstr\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDocstring:\u001b[0m\n", "Create a svg representation of the fragmentation result.\n", "\n", "Parameters\n", "----------\n", "title : str, optional\n", " Graph title, by default \"\"\n", "width : int, optional\n", " Graph width, by default 400\n", "height : int, optional\n", " Graph height, by default 
200\n", "title_font_size : int, optional\n", " Font size of graph's title, by default 12\n", "legend_font_size : int, optional\n", " Legend font size, by default 12\n", "font : str, optional\n", " Text font, by default \"Helvetica\"\n", "\n", "Returns\n", "-------\n", "Union[str, List[str]]\n", " SVG of the fragmentation solution/s.\n", "\u001b[0;31mFile:\u001b[0m ~/code/ugropy/ugropy/core/fragmentation_object.py\n", "\u001b[0;31mType:\u001b[0m method" ] } ], "source": [ "carvone.unifac.draw?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### WARNING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the UNIFAC and PSRK groups, the aldehyde group is changed to HCO according\n", "to the discussion: https://github.com/ClapeyronThermo/Clapeyron.jl/issues/225\n", "\n", "This is more consistent with the ether groups and the formate group." ] } ], "metadata": { "kernelspec": { "display_name": "ugropy", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }