1 change: 1 addition & 0 deletions presto-docs/src/main/sphinx/presto-cpp.rst
@@ -7,6 +7,7 @@ Note: Presto C++ is in active development. See :doc:`Limitations </presto_cpp/li
.. toctree::
    :maxdepth: 1

    presto_cpp/installation
    presto_cpp/features
    presto_cpp/sidecar
    presto_cpp/limitations
263 changes: 263 additions & 0 deletions presto-docs/src/main/sphinx/presto_cpp/installation.rst
@@ -0,0 +1,263 @@
=======================
Presto C++ Installation
=======================

.. contents::
    :local:
    :backlinks: none
    :depth: 1

This shows how to install and run a lightweight Presto cluster utilizing a PrestoDB Java Coordinator and Prestissimo (C++) Workers using Docker.
Contributor

Suggested change
- This shows how to install and run a lightweight Presto cluster utilizing a PrestoDB Java Coordinator and Prestissimo (C++) Workers using Docker.
+ This shows how to install and run a lightweight Presto cluster utilizing a PrestoDB Java Coordinator and Prestissimo (Presto C++) Workers using Docker.
+ For more information about Presto C++, see the :ref:`presto-cpp:overview`.

This new page in the Presto documentation does not have to explain what Prestissimo is. We can refer to the existing doc for that, and focus this page on how to deploy Presto with Prestissimo workers.


The setup uses Meta's high-performance Velox engine for worker-side query execution. We will configure a cluster and run a test query with the built-in TPCH connector.

Introducing Prestissimo (Presto C++ Worker)
Contributor

Please delete lines 14-23, as we are leveraging the existing doc with the new sentence in line 10.

-------------------------------------------

Prestissimo is the C++ native implementation of the Presto :ref:`overview/concepts:worker`. It is designed to be a drop-in replacement for the traditional Java worker. It is built using Velox, a high-performance, open-source C++ database acceleration library created by Meta.

A C++ execution engine offers significant advantages for data lake analytics:

* **Massive Performance Boost:** Prestissimo improves CPU efficiency and reduces query latency by leveraging native C++ execution, vectorization, and SIMD (Single Instruction, Multiple Data) instructions.
* **Eliminates Java Garbage Collection Issues:** By moving the execution engine out of the Java Virtual Machine (JVM), this architecture removes performance spikes and pauses associated with Java Garbage Collection, resulting in more consistent and stable query times.
* **Explicit Memory Control:** The Velox memory management framework offers explicit memory accounting and arbitration, providing finer control over resource consumption than in the JVM.

Prerequisites
-------------

To follow this tutorial, you need:

* Docker installed.
* Basic familiarity with the terminal and shell commands.

Setup Guide
Contributor

Delete this heading, and the formatting line below it.

-----------

The recommended directory structure uses ``presto-lab`` as the root directory.
Contributor

Move this sentence to become the first sentence after the heading Create a Working Directory.


Create a Working Directory
^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a clean root directory to hold all necessary configuration files and the ``docker-compose.yml`` file.

.. code-block:: bash

    mkdir -p ~/presto-lab
    cd ~/presto-lab

Configure the Presto Java Coordinator
Contributor

This is good: separating the steps to configure the Presto Java Coordinator from the steps to configure the Presto C++ worker. Putting all the configuration steps into one topic could make sense from one perspective, but it makes the topic larger than it needs to be. This way, the reader can configure the coordinator, have a breakpoint in the steps to pause, then do the next topic of configuring the Presto C++ worker.

No change suggested, just remarking that I think this way to group the steps is a good choice.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Coordinator requires configuration to define its role, enable the discovery service, and set up a catalog for querying.

A. Create Configuration Directory
Contributor

Suggested change
- A. Create Configuration Directory
+ 1. Create Configuration Directory

"""""""""""""""""""""""""""""""""

.. code-block:: bash

    mkdir -p coordinator/etc/catalog

This command creates the necessary directories for the coordinator and its catalogs.
Contributor

Suggested change
- This command creates the necessary directories for the coordinator and its catalogs.
+ To create the necessary directories for the coordinator and its catalogs, run the following command:

Make this an imperative, and move it to before mkdir -p coordinator/etc/catalog.


B. Create ``coordinator/etc/config.properties``
Contributor

Suggested change
- B. Create ``coordinator/etc/config.properties``
+ 2. Create the Coordinator Configuration File

Also, edit the formatting line for the exact number of characters in this line.

"""""""""""""""""""""""""""""""""""""""""""""""

This file enables the coordinator mode, the discovery server, and sets the HTTP port to ``8080``.
Contributor

Suggested change
- This file enables the coordinator mode, the discovery server, and sets the HTTP port to ``8080``.
+ Create the file ``coordinator/etc/config.properties`` with the following contents. This file enables the coordinator mode, the discovery server, and sets the HTTP port to ``8080``.


.. code-block:: properties

    # coordinator/etc/config.properties
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    discovery-server.enabled=true
    discovery.uri=http://localhost:8080

* ``coordinator=true``: Enables the coordinator mode.
* ``discovery-server.enabled=true``: Designates the coordinator as the host for the worker discovery service.
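
One way to create this file from the shell is a heredoc. The following is a minimal sketch, assuming it is run from the ``presto-lab`` root directory. The quoted ``'EOF'`` delimiter keeps the shell from expanding anything inside the file body.

```shell
# Sketch: write coordinator/etc/config.properties from the current directory.
# Assumes the current directory is the presto-lab root.
mkdir -p coordinator/etc/catalog

cat > coordinator/etc/config.properties <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
EOF

# Print the file back to confirm what was written.
cat coordinator/etc/config.properties
```

The same pattern can be used for the other configuration files in this guide.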

Contributor

Consider adding an explanation of http-server.http.port=8080.

C. Create ``coordinator/etc/jvm.config``
Contributor

Suggested change
- C. Create ``coordinator/etc/jvm.config``
+ 3. Create the JVM Configuration File

""""""""""""""""""""""""""""""""""""""""

These are standard **Java 17** flags for Presto, optimizing the JVM.
Contributor

Suggested change
- These are standard **Java 17** flags for Presto, optimizing the JVM.
+ Create the file ``coordinator/etc/jvm.config`` with the following content. These are standard **Java 17** flags for Presto, optimizing the JVM.

"optimizing the JVM" ...for what?

Author

Should I add the explanation of JVM optimization? @steveburnett

Contributor

@saurabhmahawar That is a good question! I could have explained myself better, let me try again.

Optimizing has a goal or purpose. I'm asking for only something short that will fit in the rest of the same sentence.

What, in general, do these settings optimize the JVM for?

"for better performance with SQL queries" or something like that.


.. code-block:: text

    # coordinator/etc/jvm.config
    -server
    -Xmx1G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -Djdk.attach.allowAttachSelf=true
    --add-opens=java.base/java.io=ALL-UNNAMED
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.lang.ref=ALL-UNNAMED
    --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
    --add-opens=java.base/java.net=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
    --add-opens=java.base/java.security=ALL-UNNAMED
    --add-opens=java.base/javax.security.auth=ALL-UNNAMED
    --add-opens=java.base/javax.security.auth.login=ALL-UNNAMED
    --add-opens=java.base/java.text=ALL-UNNAMED
    --add-opens=java.base/java.util=ALL-UNNAMED
    --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
    --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
    --add-opens=java.base/java.util.regex=ALL-UNNAMED
    --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED
    --add-opens=java.base/sun.security.action=ALL-UNNAMED
    --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED

D. Create ``coordinator/etc/node.properties``
Contributor

Suggested change
- D. Create ``coordinator/etc/node.properties``
+ 4. Create the Node Properties File

"""""""""""""""""""""""""""""""""""""""""""""

This file sets the node environment and the data directory.
Contributor

Suggested change
- This file sets the node environment and the data directory.
+ Create the file ``coordinator/etc/node.properties`` with the following content. This file sets the node environment and the data directory.


.. code-block:: properties

    # coordinator/etc/node.properties
    node.id=${ENV:HOSTNAME}
    node.environment=test
    node.data-dir=/var/lib/presto/data

E. Add TPCH Catalog Configuration
Contributor

Suggested change
- E. Add TPCH Catalog Configuration
+ 5. Create the TPC-H Catalog Configuration File

Add a dash following the example in https://www.tpc.org/tpch/. It can help to check product and project names for correct capitalization and such. A good example of an area in the Presto documentation this checking helps is in the connector doc, such as PostgreSQL, Delta Lake, MongoDB, and MySQL.

"""""""""""""""""""""""""""""""""

The TPCH catalog enables running test queries against an in-memory dataset.
Contributor

Suggested change
- The TPCH catalog enables running test queries against an in-memory dataset.
+ The TPC-H catalog enables running test queries against an in-memory dataset.

Contributor

Suggested change
- The TPCH catalog enables running test queries against an in-memory dataset.
+ Create the file ``coordinator/etc/catalog/tpch.properties`` with the following content. The TPCH catalog enables running test queries against an in-memory dataset.


.. code-block:: properties

    # coordinator/etc/catalog/tpch.properties
    connector.name=tpch

Configure the Prestissimo (C++) Worker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Worker must be configured to locate the Coordinator or Discovery service and identify itself within the network.

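Repeat the steps in this section for each additional worker, such as ``worker-2``, changing the directory name and the node name in ``node.properties`` to match.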

A. Create Worker Configuration Directory
Contributor

Suggested change
- A. Create Worker Configuration Directory
+ 1. Create Worker Configuration Directory

""""""""""""""""""""""""""""""""""""""""

.. code-block:: bash

    mkdir -p worker-1/etc/catalog

B. Create ``worker-1/etc/config.properties``
Contributor

Suggested change
- B. Create ``worker-1/etc/config.properties``
+ 2. Create ``worker-1/etc/config.properties``

Please revise these steps in the same way that I suggested revising the coordinator headings: describe the action in the heading, and put the imperative instruction as the first sentence of the text.

""""""""""""""""""""""""""""""""""""""""""""

This configuration points the worker to the discovery service running on the coordinator.

.. code-block:: properties

    # worker-1/etc/config.properties
    discovery.uri=http://coordinator:8080
    presto.version=0.288-15f14bb
    http-server.http.port=7777
    shutdown-onset-sec=1
    runtime-metrics-collection-enabled=true

* ``discovery.uri=http://coordinator:8080``: This uses the coordinator service name as defined in the ``docker-compose.yml`` file for network communication within Docker.

C. Configure ``worker-1/etc/node.properties``
Contributor

Suggested change
- C. Configure ``worker-1/etc/node.properties``
+ 3. Configure ``worker-1/etc/node.properties``

"""""""""""""""""""""""""""""""""""""""""""""

This defines the worker's internal address for reliable registration.

.. code-block:: properties

    # worker-1/etc/node.properties
    node.environment=test
    node.internal-address=worker-1
    node.location=docker
    node.id=worker-1

* ``node.internal-address=worker-1``: This setting matches the service name defined in Docker Compose.

D. Add TPCH Catalog Configuration
Contributor

Suggested change
- D. Add TPCH Catalog Configuration
+ 4. Add TPC-H Catalog Configuration

"""""""""""""""""""""""""""""""""

The worker requires the same catalog definition as the coordinator to execute the query stages.

.. code-block:: properties

    # worker-1/etc/catalog/tpch.properties
    connector.name=tpch
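
The ``docker-compose.yml`` file in this guide also defines a ``worker-2`` service, so a second configuration directory is needed. The following is a minimal sketch of repeating the worker steps, assuming the ``worker-1`` files already exist under the current directory and that only the node name differs between workers. The ``clone_worker`` function is a hypothetical helper for illustration, not part of Presto.

```shell
# Hypothetical helper: copy worker-1's configuration to a new worker directory
# and rewrite the node name in every copied file.
clone_worker() {
    new_name="$1"
    mkdir -p "$new_name/etc/catalog"
    for rel in etc/config.properties etc/node.properties etc/catalog/tpch.properties; do
        # sed replaces each occurrence of "worker-1" (node.id,
        # node.internal-address, and the path comment) with the new name.
        sed "s/worker-1/$new_name/g" "worker-1/$rel" > "$new_name/$rel"
    done
}

# Example: create worker-2 only if worker-1 has been configured.
if [ -f worker-1/etc/node.properties ]; then
    clone_worker worker-2
fi
```

Each worker needs its own ``node.id`` and ``node.internal-address``, which is why the name is rewritten rather than copied as-is.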

Step 4: Create ``docker-compose.yml``
Contributor

Suggested change
- Step 4: Create ``docker-compose.yml``
+ Create ``docker-compose.yml``

Please also edit the formatting line below this one.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This file orchestrates both the Java Coordinator and the C++ Worker containers. Create the file ``docker-compose.yml`` in your ``~/presto-lab`` directory.

.. code-block:: yaml

    # docker-compose.yml
    services:
      coordinator:
        image: public.ecr.aws/oss-presto/presto:latest
        platform: linux/amd64
        container_name: presto-coordinator
        hostname: coordinator
        ports:
          - "8080:8080"
        volumes:
          - ./coordinator/etc:/opt/presto-server/etc:ro
        restart: unless-stopped

      worker-1:
        image: public.ecr.aws/oss-presto/presto-native:latest
        platform: linux/amd64
        container_name: prestissimo-worker-1
        hostname: worker-1
        depends_on:
          - coordinator
        volumes:
          - ./worker-1/etc:/opt/presto-server/etc:ro
        restart: unless-stopped

      worker-2:
        image: public.ecr.aws/oss-presto/presto-native:latest
        platform: linux/amd64
        container_name: prestissimo-worker-2
        hostname: worker-2
        depends_on:
          - coordinator
        volumes:
          - ./worker-2/etc:/opt/presto-server/etc:ro
        restart: unless-stopped

* The **coordinator** service uses the standard **Java Presto image** (presto:latest).
Contributor

remove bold

* The **worker-1** and **worker-2** services use the **Prestissimo (C++ Native) image** (presto-native:latest).
Contributor

remove bold

* The setting ``platform: linux/amd64`` is essential for users running on Apple Silicon Macs.
* The ``volumes`` section mounts your local configuration directories (``./coordinator/etc``, ``./worker-1/etc``, ``./worker-2/etc``) read-only into the container's expected path (``/opt/presto-server/etc``).
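
Because each ``volumes`` entry points at a local directory, a missing configuration file only shows up when a container fails to start. The following is a minimal sketch of a pre-flight check, assuming it is run from the ``presto-lab`` root and that the file list matches the files created in this guide. The ``check_lab_layout`` function is a hypothetical helper, not a Presto or Docker command.

```shell
# Hypothetical pre-flight check: confirm that every directory mounted by
# docker-compose.yml holds the configuration files described in this guide.
check_lab_layout() {
    missing=0
    for cfg in \
        coordinator/etc/config.properties \
        coordinator/etc/jvm.config \
        coordinator/etc/node.properties \
        coordinator/etc/catalog/tpch.properties \
        worker-1/etc/config.properties \
        worker-1/etc/node.properties \
        worker-1/etc/catalog/tpch.properties \
        worker-2/etc/config.properties \
        worker-2/etc/node.properties \
        worker-2/etc/catalog/tpch.properties
    do
        if [ ! -f "$cfg" ]; then
            echo "missing: $cfg" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Report the result without aborting the shell session.
if check_lab_layout; then
    echo "layout looks complete"
else
    echo "layout is incomplete; see messages above"
fi
```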

Step 5: Start the Cluster and Verify
Contributor

Suggested change
- Step 5: Start the Cluster and Verify
+ Start the Cluster and Verify

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A. Start the Cluster
Contributor

Suggested change
- A. Start the Cluster
+ 1. Start the Cluster

""""""""""""""""""""

Use Docker Compose to start the cluster in detached mode (``-d``).

.. code-block:: bash

    docker compose up -d
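
In detached mode the command returns before the coordinator has finished starting, so the Web UI may not respond right away. The following is a minimal sketch of a retry loop, assuming a POSIX shell. The ``wait_until`` function is a hypothetical helper, not part of Docker or Presto.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the given command quietly; stop as soon as it succeeds.
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # Sleep only when another attempt remains.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, ``wait_until 30 2 curl -fsS http://localhost:8080/v1/info`` would poll the coordinator for up to about a minute. The ``/v1/info`` path is assumed here to be the coordinator's status endpoint. If the wait fails, ``docker compose ps`` shows the state of each container.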

B. Verify
Contributor

Suggested change
- B. Verify
+ 2. Verify

"""""""""

1. **Check the Web UI:** Open the Presto Web UI at http://localhost:8080.

* *Verification Result:* You should see the UI displaying 3 Active Workers (1 Coordinator and 2 Workers).
Contributor

Suggested change
- * *Verification Result:* You should see the UI displaying 3 Active Workers (1 Coordinator and 2 Workers).
+ You should see the UI displaying 3 Active Workers (1 Coordinator and 2 Workers).


2. **Check Detailed Node Status (SQL Query):** Run the following query to check the detailed status and metadata about every node (Coordinator and Workers).
Contributor

Suggested change
- 2. **Check Detailed Node Status (SQL Query):** Run the following query to check the detailed status and metadata about every node (Coordinator and Workers).
+ 2. **Check Detailed Node Status**: Run the following SQL query to check the detailed status and metadata about every node (Coordinator and Workers).


.. code-block:: sql

    select * from system.runtime.nodes;

This confirms the cluster nodes are registered and active.