
Commit 75913ed

New sample: Create Hyper file from pandas DataFrame using Hyper API (#110)
After running into issues with pantab, I created a simple example of loading a pandas DataFrame into a Hyper file using only the Hyper API. It works without issues, is easy to understand and implement, and avoids adding pantab as another dependency. Wanted to share with others!
1 parent 7ff4308 commit 75913ed


3 files changed, +194 -0 lines changed

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# hyper-from-dataframe
## Create a Hyper File from a Pandas DataFrame

![Community Supported](https://img.shields.io/badge/Support%20Level-Community%20Supported-53bd92.svg)

This Python script demonstrates how to create a `.hyper` file (Tableau's Hyper database format) from a pandas DataFrame. It uses Tableau's Hyper API to define a table structure, insert the data from the DataFrame, and save it as a `.hyper` file.

This example shows an alternative to [using pantab](https://tableau.github.io/hyper-db/docs/guides/pandas_integration#loading-data-through-pandas) for cases where pantab cannot be used.

## Get Started

### Prerequisites

Before running the script, ensure you have the following installed:

- Python >= 3.6
- The required dependencies listed in `requirements.txt`.

### Install Dependencies

To install the necessary dependencies, run the following command:

```bash
pip install -r requirements.txt
```

### Running the Script

To run the script and generate the `.hyper` file, execute:

```bash
python create_hyper_from_pandas_dataframe.py
```

### What the Script Does

1. Creates a pandas DataFrame containing sample customer data.
2. Defines a table schema for the Hyper file with the columns Customer ID, Customer Name, Loyalty Reward Points, and Segment.
3. Creates the Hyper file `customer.hyper` (replacing any existing file), creates the `Customer` table, and inserts the DataFrame rows into it.
4. Verifies the number of rows inserted and prints a confirmation message.
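
The insertion step relies on one detail worth making explicit: `Inserter.add_row` takes values positionally, in the order of the table's columns, so the tuples produced by `df.itertuples(index=False, name=None)` must follow the same column order as the `TableDefinition`. The pandas-only sketch below shows what those tuples look like; `schema_order` is a hypothetical list mirroring the script's schema, and selecting the columns through it guards against a DataFrame whose columns arrive in a different order.

```python
import pandas as pd

# Same sample values as the script, but with the columns deliberately shuffled.
df = pd.DataFrame(
    {
        "Segment": ["Consumer", "Corporate", "Home Office"],
        "Customer ID": ["DK-13375", "EB-13705", "JH-13600"],
        "Loyalty Reward Points": [100, 200, 300],
        "Customer Name": ["John Doe", "Jane Smith", "Alice Johnson"],
    }
)

# Hypothetical list mirroring the column order declared in the TableDefinition.
schema_order = ["Customer ID", "Customer Name", "Loyalty Reward Points", "Segment"]

# Select the columns in schema order, then produce plain tuples:
# index=False drops the DataFrame index, name=None yields regular tuples.
rows = list(df[schema_order].itertuples(index=False, name=None))

for row in rows:
    print(row)
```

Each printed tuple lists the customer ID first and the segment last, matching the table definition regardless of the DataFrame's original column order.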

### Modifying the Script

You can modify the script to load your own data by:

1. Changing the data inside the `data` dictionary to match your own structure, or replacing `df` with your own DataFrame.
2. Adjusting the columns of the `TableDefinition` object to match. Rows are inserted positionally, so the columns must appear in the same order as the DataFrame's columns, with SQL types that fit the values.

### Example Output

When you run the script, you should see output similar to this:

```
EXAMPLE - Load data from pandas DataFrame into table in new Hyper file
The number of rows in table Customer is 3.
Data has been successfully inserted into the Hyper file.
The connection to the Hyper file has been closed.
The Hyper process has been shut down.
```
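
### Error Handling

If a Hyper API operation fails, such as connecting to the Hyper file or inserting data, it raises a `HyperException`. The script catches this exception, prints the error message to the console, and exits with status code 1. Other exceptions are not caught and end the script with a standard Python traceback.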
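
With real data, missing values are a common source of insertion problems. pandas marks them as `NaN`, `None`, `NaT`, or `pd.NA`, while the Hyper API represents SQL `NULL` as Python `None`, and such values can only go into columns declared `NULLABLE` (the script imports `NULLABLE` but declares every column `NOT_NULLABLE`). The sketch below assumes scalar cell values; `rows_with_nulls` is a hypothetical helper whose output could be passed to `inserter.add_row` in place of the raw `itertuples` rows, and the `Score` column is illustrative only.

```python
import pandas as pd


def rows_with_nulls(df: pd.DataFrame):
    """Yield rows as plain tuples, replacing pandas missing values with None."""
    for row in df.itertuples(index=False, name=None):
        # pd.isna recognizes NaN, None, NaT and pd.NA for scalar values.
        yield tuple(None if pd.isna(value) else value for value in row)


df = pd.DataFrame(
    {
        "Customer ID": ["DK-13375", "EB-13705"],
        "Segment": ["Consumer", None],  # missing text value
        "Score": [4.5, float("nan")],  # hypothetical float column with a NaN
    }
)

for row in rows_with_nulls(df):
    print(row)
```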

## Notes

This sample script demonstrates:

- Using Tableau's `HyperProcess` and `Connection` classes.
- Defining table schemas using `TableDefinition`.
- Inserting data into the Hyper table using the `Inserter` class.

### Resources

- [Tableau Hyper API Documentation](https://tableau.github.io/hyper-db/lang_docs/py/index.html)
- [Tableau Hyper API SQL Reference](https://tableau.github.io/hyper-db/docs/sql/)
- [pandas Documentation](https://pandas.pydata.org/docs/)
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# Import necessary standard libraries
import sys  # For exiting with a non-zero status code on errors
from pathlib import Path  # For file path manipulations

# Import pandas for DataFrame creation and manipulation
import pandas as pd

# Import necessary classes from the Tableau Hyper API
from tableauhyperapi import (
    HyperProcess,
    Telemetry,
    Connection,
    CreateMode,
    NOT_NULLABLE,
    NULLABLE,
    SqlType,
    TableDefinition,
    Inserter,
    HyperException,
)


def run_create_hyper_file_from_dataframe():
    """
    An example demonstrating loading data from a pandas DataFrame into a new Hyper file.
    """

    print("EXAMPLE - Load data from pandas DataFrame into table in new Hyper file")

    # Step 1: Create a sample pandas DataFrame.
    data = {
        "Customer ID": ["DK-13375", "EB-13705", "JH-13600"],
        "Customer Name": ["John Doe", "Jane Smith", "Alice Johnson"],
        "Loyalty Reward Points": [100, 200, 300],
        "Segment": ["Consumer", "Corporate", "Home Office"],
    }
    df = pd.DataFrame(data)

    # Step 2: Define the path where the Hyper file will be saved.
    path_to_database = Path("customer.hyper")

    # Step 3: Optional process parameters.
    # These settings limit the number of log files and their size.
    process_parameters = {
        "log_file_max_count": "2",  # Limit the number of log files to 2
        "log_file_size_limit": "100M",  # Limit the log file size to 100 megabytes
    }

    # Step 4: Start the Hyper Process.
    # Telemetry is set to send usage data to Tableau.
    with HyperProcess(
        telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU, parameters=process_parameters
    ) as hyper:

        # Step 5: Optional connection parameters.
        # This sets the locale for time formats to 'en_US'.
        connection_parameters = {"lc_time": "en_US"}

        # Step 6: Create a connection to the Hyper file.
        # If the file exists, it will be replaced.
        with Connection(
            endpoint=hyper.endpoint,
            database=path_to_database,
            create_mode=CreateMode.CREATE_AND_REPLACE,
            parameters=connection_parameters,
        ) as connection:

            # Step 7: Define the table schema.
            # Column order must match the DataFrame's column order.
            customer_table = TableDefinition(
                table_name="Customer",  # Name of the table
                columns=[
                    TableDefinition.Column(
                        "Customer ID", SqlType.text(), NOT_NULLABLE
                    ),
                    TableDefinition.Column(
                        "Customer Name", SqlType.text(), NOT_NULLABLE
                    ),
                    TableDefinition.Column(
                        "Loyalty Reward Points", SqlType.big_int(), NOT_NULLABLE
                    ),
                    TableDefinition.Column("Segment", SqlType.text(), NOT_NULLABLE),
                ],
            )

            # Step 8: Create the table in the Hyper file.
            connection.catalog.create_table(table_definition=customer_table)

            # Step 9: Use the Inserter to insert data into the table.
            with Inserter(connection, customer_table) as inserter:
                # Iterate over the DataFrame rows. With name=None, 'itertuples'
                # yields plain tuples (not named tuples) in column order.
                for row in df.itertuples(index=False, name=None):
                    inserter.add_row(row)  # Add each row to the inserter
                inserter.execute()  # Execute the insertion into the Hyper file

            # Step 10: Verify the number of rows inserted.
            row_count = connection.execute_scalar_query(
                f"SELECT COUNT(*) FROM {customer_table.table_name}"
            )
            print(
                f"The number of rows in table {customer_table.table_name} is {row_count}."
            )
            print("Data has been successfully inserted into the Hyper file.")

        # The connection is automatically closed when exiting the 'with' block.
        print("The connection to the Hyper file has been closed.")

    # The Hyper process is automatically shut down when exiting the 'with' block.
    print("The Hyper process has been shut down.")


if __name__ == "__main__":
    try:
        run_create_hyper_file_from_dataframe()
    except HyperException as ex:
        print(ex)
        sys.exit(1)
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
tableauhyperapi
pandas
