with switch for passing through or dropping not bridged columns

IndEcol · May 3, 2024 · 1fad0b2
1 parent 6b5649e
commit 1fad0b2

Showing 8 changed files with 450 additions and 50 deletions.
diff --git a/doc/source/notebooks/convert.ipynb b/doc/source/notebooks/convert.ipynb
@@ -0,0 +1,61 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "60fd7d0e-a46e-4db0-8bbe-00256058ee71",
+   "metadata": {},
+   "source": [
+    "# Convert and Characterize"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "850708b0-66c3-4ca8-a50b-f7396ec4c1a7",
+   "metadata": {},
+   "source": [
+    "Pymrio offers several ways to convert data from one system to another."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bde3cf89-6c36-47dd-b9d5-48433f4473b5",
+   "metadata": {},
+   "source": [
+    "The term *convert* is used in a very general sense here; it covers:\n",
+    "- finding and extracting data based on indices across a table or an mrio(-extension) system, selected by name and optionally constrained by sector/region or any other specification\n",
+    "- converting the names of the found indices\n",
+    "- adjusting the numerical values of the data, e.g. for unit conversion or characterization\n",
+    "- aggregating the extracted data, e.g. for the purpose of characterization"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "74d2a195-5e5f-4798-9aa6-4136a4b84342",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/doc/source/notebooks/convert.py b/doc/source/notebooks/convert.py
@@ -0,0 +1,66 @@
+# ---
+# jupyter:
+#   jupytext:
+#     text_representation:
+#       extension: .py
+#       format_name: percent
+#       format_version: '1.3'
+#       jupytext_version: 1.15.2
+#   kernelspec:
+#     display_name: Python 3 (ipykernel)
+#     language: python
+#     name: python3
+# ---
+
+# %% [markdown]
+# # Convert and Characterize
+
+# %% [markdown]
+# Pymrio offers several ways to convert data from one system to another.
+
+# %% [markdown]
+# The term *convert* is used in a very general sense here; it covers:
+# - finding and extracting data based on indices across a table or an mrio(-extension) system, selected by name and optionally constrained by sector/region or any other specification
+# - converting the names of the found indices
+# - adjusting the numerical values of the data, e.g. for unit conversion or characterization
+# - aggregating the extracted data, e.g. for the purpose of characterization
+
+# %% [markdown]
+# Pymrio provides these convert functions either for one specific table (which does not necessarily have to be a table of the mrio system) or for the whole mrio(-extension) system.
+
+# %% [markdown]
+# ## Structure of the bridge table
+
+
+# %% [markdown]
+# Irrespective of whether a single table or an mrio system is converted, the convert function always follows the same pattern.
+# It requires a bridge table, which contains the mapping of the indices of the source data to the indices of the target data.
+# This bridge table has to follow a specific format, depending on the table to be converted.
+
+
+# %% [markdown]
+# Let's assume a table with the following structure (the table to be converted):
+
+# %% [markdown]
+# TODO: table from the test cases
+
+# %% [markdown]
+# A potential bridge table for this table could look like this:
+
+# %% [markdown]
+# TODO: table from the test cases
+
+# %% [markdown]
+# TODO: describe the column names and which entries can be regular expressions
+
+# %% [markdown]
+# Once everything is set up, we can continue with the actual conversion.
+
+# %% [markdown]
+# ## Converting a single data table
+
+
+# %% [markdown]
+# ## Converting a pymrio extension
+
+
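The bridge-table idea described in `convert.py` above (find rows by pattern, rename them, scale the values, aggregate) can be illustrated with plain pandas. This is only a minimal sketch under stated assumptions: the column names `stressor`, `compartment`, `target` and `factor`, the helper `convert_with_bridge`, and the numbers are all hypothetical and are not taken from pymrio; the actual bridge format used by pymrio may differ.

```python
import pandas as pd

# Source table: rows indexed by (stressor, compartment), one column per region.
source = pd.DataFrame(
    {"reg1": [1.0, 2.0, 3.0], "reg2": [4.0, 5.0, 6.0]},
    index=pd.MultiIndex.from_tuples(
        [("co2", "air"), ("ch4", "air"), ("ch4", "water")],
        names=["stressor", "compartment"],
    ),
)

# Hypothetical bridge table (not pymrio's format): regular expressions for the
# source index levels, the new name, and a multiplication factor.
bridge = pd.DataFrame(
    {
        "stressor": ["co2", "ch4"],
        "compartment": ["air", ".*"],
        "target": ["ghg", "ghg"],
        "factor": [1.0, 25.0],
    }
)


def convert_with_bridge(table, bridge_table):
    """Extract, rename, scale and aggregate rows of `table` as given by `bridge_table`."""
    index_frame = table.index.to_frame(index=False)
    pieces = []
    for entry in bridge_table.itertuples(index=False):
        # fullmatch: the pattern has to cover the whole index entry
        mask = (
            index_frame["stressor"].str.fullmatch(entry.stressor)
            & index_frame["compartment"].str.fullmatch(entry.compartment)
        ).to_numpy()
        selected = table.loc[mask] * entry.factor
        # Rename all matched rows to the target name
        selected.index = pd.Index([entry.target] * len(selected), name="target")
        pieces.append(selected)
    # Aggregate all rows which were mapped to the same target name
    return pd.concat(pieces).groupby(level="target").sum()


converted = convert_with_bridge(source, bridge)
print(converted)
```

Here the two `ch4` rows are matched by the `.*` pattern, multiplied by the illustrative factor, and summed together with the `co2` row into a single `ghg` row.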
diff --git a/doc/source/notebooks/extract_data.ipynb b/doc/source/notebooks/extract_data.ipynb
@@ -901,8 +901,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "rows_to_extract =[('emission_type1',   'air'),\n",
-    "                  ('emission_type2', 'water')]"
+    "rows_to_extract = [(\"emission_type1\", \"air\"), (\"emission_type2\", \"water\")]"
    ]
   },
   {
@@ -1001,14 +1000,17 @@
    "id": "68f6f3e8",
    "metadata": {},
    "source": [
-    "Extracting to dataframes is also a convienient way to convert an extension object to a dictionary:"
+    "Extracting to dataframes is also a convenient\n",
+    "way to convert an extension object to a dictionary:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 47,
    "id": "b23d7415",
-   "metadata": {},
+   "metadata": {
+    "lines_to_next_cell": 2
+   },
    "outputs": [
     {
      "data": {
@@ -1023,18 +1025,204 @@
    ],
    "source": [
     "df_all = mrio.emissions.extract(mrio.emissions.get_rows(), return_type=\"dfs\")\n",
-    "df_all.keys()"
+    "df_all.keys()\n",
+    "\n",
+    "\n",
+    "# The method can also extract only some of the accounts:\n",
+    "df_some = mrio.emissions.extract(\n",
+    "    mrio.emissions.get_rows(), dataframes=[\"D_cba\", \"D_pba\"], return_type=\"dfs\"\n",
+    ")\n",
+    "df_some.keys()"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "4357fd67",
+   "metadata": {},
+   "source": [
+    "### Extracting from all extensions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d49af58b",
+   "metadata": {},
+   "source": [
+    "We can also extract data from all extensions at once.\n",
+    "This is done using the `extension_extract` method from the pymrio object.\n",
+    "This method expects a dict with extension names as keys and the rows (index entries) to extract as values."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db8053ac",
+   "metadata": {},
+   "source": [
+    "Let's assume we want to extract value added and all emissions.\n",
+    "We first define the rows (index) to extract:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd730723",
    "metadata": {
     "lines_to_next_cell": 2
    },
+   "outputs": [],
+   "source": [
+    "to_extract = {\n",
+    "    \"Factor Inputs\": \"Value Added\",\n",
+    "    \"Emissions\": [(\"emission_type1\", \"air\"), (\"emission_type2\", \"water\")],\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0882d1dc",
+   "metadata": {},
+   "source": [
+    "We can then use the `extension_extract` method to extract the data. Extracting as pandas DataFrames\n",
+    "returns a dictionary with the extension names as keys:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dbfe113a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_extract_all = mrio.extension_extract(to_extract, return_type=\"dataframe\")\n",
+    "df_extract_all.keys()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "47393c06",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_extract_all[\"Factor Inputs\"].keys()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e5fc1452",
+   "metadata": {},
+   "source": [
+    "We can also extract into a dictionary of extension objects:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b195ef6f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ext_extract_all = mrio.extension_extract(to_extract, return_type=\"extensions\")\n",
+    "ext_extract_all.keys()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ee908ec0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "str(ext_extract_all[\"Factor Inputs\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d3eed3a5",
+   "metadata": {},
+   "source": [
+    "Or we can merge the extracted data into a new pymrio Extension object by passing a new name as `return_type`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b3690981",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ext_new = mrio.extension_extract(to_extract, return_type=\"new_merged_extension\")\n",
+    "str(ext_new)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "417f397d",
+   "metadata": {},
+   "source": [
+    "CONT: Continue with explaining, mention the work with find_all etc"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2887bce",
+   "metadata": {},
+   "source": [
+    "### Search and extract"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c5beffce",
+   "metadata": {},
+   "source": [
+    "The extract methods can also be used in combination with the [search/explore](./explore.ipynb) methods of pymrio.\n",
+    "This makes it possible to search for specific rows and then extract the data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "04144a4b",
+   "metadata": {},
+   "source": [
+    "For example, to extract all emissions from the air compartment we can use:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "87303c51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "match_air = mrio.extension_match(find_all=\"air\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e50edc2f",
+   "metadata": {},
+   "source": [
+    "And then make a new extension object with the extracted data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2cac8d8a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "air_emissions = mrio.emissions.extract(match_air, return_type=\"extracted_air_emissions\")\n",
+    "print(air_emissions)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9b1fef8b",
+   "metadata": {},
    "source": [
-    "CONT: DESRIBE STUFF ABOVE\n",
-    "For example, to extract the total value added for all regions and sectors we can use:"
+    "For more information on the search methods see the [explore notebook](./explore.ipynb)."
    ]
   }
  ],
@@ -1054,7 +1242,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.12.0"
   }
  },
  "nbformat": 4,