New sample: Create Hyper file from pandas DataFrame using Hyper API (#110)

RSchaper10 · web-flow · commit 75913edcb9c3 · 2024-10-04T08:25:08.000+02:00
After running into issues with pantab, I created a simple pandas
DataFrame-to-Hyper example using the Hyper API. It works without issue, is easy
to understand and implement, and adds no further dependency. Wanted to share
with others!
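
One thing to watch when adapting the sample to real data: pandas usually marks missing values as float `NaN`, while the Hyper API's `Inserter` takes `None` for a NULL value. The sample data has no gaps, so the script does not deal with this. Below is a minimal sketch of one way to convert rows before passing them to `inserter.add_row`. The helper name `nan_to_none` is hypothetical and not part of the sample, and the sketch assumes missing values show up as float `NaN` (not `pd.NA` or `NaT`) and that the affected columns are declared `NULLABLE`.

```python
import math


def nan_to_none(row):
    """Return a copy of `row` with float NaN values replaced by None.

    Hypothetical helper, not part of the sample script.
    """
    return tuple(
        None if isinstance(value, float) and math.isnan(value) else value
        for value in row
    )


# Plain tuples standing in for what df.itertuples(index=False, name=None) yields.
rows = [
    ("DK-13375", "John Doe", 100, "Consumer"),
    ("EB-13705", "Jane Smith", float("nan"), "Corporate"),
]

cleaned = [nan_to_none(row) for row in rows]
print(cleaned[1])  # ('EB-13705', 'Jane Smith', None, 'Corporate')
```

In the script, this would amount to calling `inserter.add_row(nan_to_none(row))` inside the loop, after switching the relevant columns from `NOT_NULLABLE` to `NULLABLE`.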
diff --git a/Community-Supported/pandas-to-hyper/README.md b/Community-Supported/pandas-to-hyper/README.md
@@ -0,0 +1,77 @@
+
+# hyper-from-dataframe
+## Create a Hyper File from a Pandas DataFrame
+
+![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)
+
+This Python script demonstrates how to create a `.hyper` file (Tableau's Hyper database format) from a pandas DataFrame. It uses Tableau's Hyper API to define a table structure, insert the data from the DataFrame, and save it as a `.hyper` file.
+
+This example shows an alternative to [using pantab](https://tableau.github.io/hyper-db/docs/guides/pandas_integration#loading-data-through-pandas), in case pantab cannot be used.
+## Get Started
+
+### Prerequisites
+
+Before running the script, ensure you have the following installed:
+
+- Python >= 3.6
+- The required dependencies listed in `requirements.txt`.
+
+### Install Dependencies
+
+To install the necessary dependencies, run the following command:
+
+```bash
+pip install -r requirements.txt
+```
+
+### Running the Script
+
+To run the script and generate the `.hyper` file, execute:
+
+```bash
+python create_hyper_from_pandas_dataframe.py
+```
+
+### What the Script Does
+
+1. Creates a pandas DataFrame containing sample customer data.
+2. Defines a table schema for the Hyper file with the columns Customer ID, Customer Name, Loyalty Reward Points, and Segment.
+3. Inserts the DataFrame data into the Hyper file `customer.hyper`.
+4. Verifies the number of rows inserted and prints a confirmation message.
+
+### Modifying the Script
+
+You can easily modify the script to load your own data by:
+
+1. Changing the data inside the `data` dictionary to match your own structure.
+2. Adjusting the table schema in the `TableDefinition` object to match your columns. Rows are inserted by position, so list the columns in the same order as in the DataFrame.
+
+### Example Output
+
+When you run the script, you should see output similar to this:
+
+```
+EXAMPLE - Load data from pandas DataFrame into table in new Hyper file
+The number of rows in table Customer is 3.
+Data has been successfully inserted into the Hyper file.
+The connection to the Hyper file has been closed.
+The Hyper process has been shut down.
+```
+
+### Error Handling
+
+If a Hyper API operation fails, such as connecting to the Hyper file or inserting data, the script catches the resulting `HyperException`, prints the error message to the console, and exits with status code 1.
+
+## Notes
+
+This sample script demonstrates:
+
+- Using Tableau's `HyperProcess` and `Connection` classes.
+- Defining table schemas using `TableDefinition`.
+- Inserting data into the Hyper table using the `Inserter` class.
+
+### Resources
+
+- [Tableau Hyper API Documentation](https://tableau.github.io/hyper-db/lang_docs/py/index.html)
+- [Tableau Hyper API SQL Reference](https://tableau.github.io/hyper-db/docs/sql/)
+- [pandas Documentation](https://pandas.pydata.org/docs/)
+
diff --git a/Community-Supported/pandas-to-hyper/create_hyper_from_pandas_dataframe.py b/Community-Supported/pandas-to-hyper/create_hyper_from_pandas_dataframe.py
@@ -0,0 +1,115 @@
+# Import necessary standard libraries
+from pathlib import Path  # For file path manipulations
+
+# Import pandas for DataFrame creation and manipulation
+import pandas as pd
+
+# Import necessary classes from the Tableau Hyper API
+from tableauhyperapi import (
+    HyperProcess,
+    Telemetry,
+    Connection,
+    CreateMode,
+    NOT_NULLABLE,
+    NULLABLE,
+    SqlType,
+    TableDefinition,
+    Inserter,
+    HyperException,
+)
+
+def run_create_hyper_file_from_dataframe():
+    """
+    An example demonstrating loading data from a pandas DataFrame into a new Hyper file.
+    """
+
+    print("EXAMPLE - Load data from pandas DataFrame into table in new Hyper file")
+
+    # Step 1: Create a sample pandas DataFrame.
+    data = {
+        "Customer ID": ["DK-13375", "EB-13705", "JH-13600"],
+        "Customer Name": ["John Doe", "Jane Smith", "Alice Johnson"],
+        "Loyalty Reward Points": [100, 200, 300],
+        "Segment": ["Consumer", "Corporate", "Home Office"],
+    }
+    df = pd.DataFrame(data)
+
+    # Step 2: Define the path where the Hyper file will be saved.
+    path_to_database = Path("customer.hyper")
+
+    # Step 3: Optional process parameters.
+    # These settings limit the number of log files and their size.
+    process_parameters = {
+        "log_file_max_count": "2",  # Limit the number of log files to 2
+        "log_file_size_limit": "100M",  # Limit the log file size to 100 megabytes
+    }
+
+    # Step 4: Start the Hyper Process.
+    # Telemetry is set to send usage data to Tableau.
+    with HyperProcess(
+        telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU, parameters=process_parameters
+    ) as hyper:
+
+        # Step 5: Optional connection parameters.
+        # This sets the locale for time formats to 'en_US'.
+        connection_parameters = {"lc_time": "en_US"}
+
+        # Step 6: Create a connection to the Hyper file.
+        # If the file exists, it will be replaced.
+        with Connection(
+            endpoint=hyper.endpoint,
+            database=path_to_database,
+            create_mode=CreateMode.CREATE_AND_REPLACE,
+            parameters=connection_parameters,
+        ) as connection:
+
+            # Step 7: Define the table schema.
+            customer_table = TableDefinition(
+                table_name="Customer",  # Name of the table
+                columns=[
+                    TableDefinition.Column(
+                        "Customer ID", SqlType.text(), NOT_NULLABLE
+                    ),
+                    TableDefinition.Column(
+                        "Customer Name", SqlType.text(), NOT_NULLABLE
+                    ),
+                    TableDefinition.Column(
+                        "Loyalty Reward Points", SqlType.big_int(), NOT_NULLABLE
+                    ),
+                    TableDefinition.Column("Segment", SqlType.text(), NOT_NULLABLE),
+                ],
+            )
+
+            # Step 8: Create the table in the Hyper file.
+            connection.catalog.create_table(table_definition=customer_table)
+
+            # Step 9: Use the Inserter to insert data into the table.
+            with Inserter(connection, customer_table) as inserter:
+                # Iterate over the DataFrame rows as tuples.
+                # With name=None, 'itertuples' yields plain tuples in column order.
+                for row in df.itertuples(index=False, name=None):
+                    inserter.add_row(row)  # Add each row to the inserter
+                inserter.execute()  # Execute the insertion into the Hyper file
+
+            # Step 10: Verify the number of rows inserted.
+            row_count = connection.execute_scalar_query(
+                f"SELECT COUNT(*) FROM {customer_table.table_name}"
+            )
+            print(
+                f"The number of rows in table {customer_table.table_name} is {row_count}."
+            )
+            print("Data has been successfully inserted into the Hyper file.")
+
+        # The connection is automatically closed when exiting the 'with' block.
+        print("The connection to the Hyper file has been closed.")
+
+    # The Hyper process is automatically shut down when exiting the 'with' block.
+    print("The Hyper process has been shut down.")
+
+
+if __name__ == "__main__":
+    try:
+        run_create_hyper_file_from_dataframe()
+    except HyperException as ex:
+        print(ex)
+        raise SystemExit(1)  # Unlike the 'exit' helper, this does not depend on the 'site' module
diff --git a/Community-Supported/pandas-to-hyper/requirements.txt b/Community-Supported/pandas-to-hyper/requirements.txt
@@ -0,0 +1,2 @@
+tableauhyperapi
+pandas