Skip to content

Conversation

hqbhoho
Copy link
Contributor

@hqbhoho hqbhoho commented Sep 19, 2025

Description

Support exchange spooling on Alluxio

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## General
* Support exchange spooling on Alluxio when fault-tolerant execution is enabled.

@cla-bot cla-bot bot added the cla-signed label Sep 19, 2025
@hqbhoho hqbhoho marked this pull request as draft September 19, 2025 02:08
@hqbhoho hqbhoho force-pushed the feature/support_alluxio_in_fte branch 7 times, most recently from d8008ca to 9084874 Compare September 23, 2025 10:34
@hqbhoho hqbhoho marked this pull request as ready for review September 23, 2025 11:03
@wendigo wendigo requested a review from losipiuk September 23, 2025 11:40
return this;
}

public Optional<@FileExists String> getAlluxioSiteConfPath()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this one? We do not have it for io.trino.filesystem.alluxio.
If we need to be able to change some specific Alluxio config parameters it is better to expose those as specific getters/setters in ExchangeAlluxioConfig

Copy link
Contributor Author

@hqbhoho hqbhoho Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback!

In Alluxio filesystem doc, the Alluxio cluster connection is configured in the alluxio-site.properties properties file. The same config file must be located in /opt/alluxio/conf on all Trino cluster nodes.

Since the Alluxio client automatically loads configurations from /opt/alluxio/conf/alluxio-site.properties, a custom configuration is likely necessary for exchange spooling, as its requirements may differ from those of the Alluxio filesystem used for the catalog. So if Alluixo is used for exchange spooling, should we avoid loading the configuration from /opt/alluxio/conf/alluxio-site.properties which is only used for catalog?

This configuration is similar to hdfs.config.resources which would be used if HDFS serves as the exchange storage.

​​Due to the vast number of configuration settings available in Alluxio, It seems that using configuration files would be more convenient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It makes sense. Thanks

@hqbhoho hqbhoho force-pushed the feature/support_alluxio_in_fte branch from 9084874 to d5375e2 Compare September 23, 2025 16:37
Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.
I am a bit concerned about using syncronous APIs for implementing asynchronous SPI but let's see how it works in practice.

Would you be fine adding some minimal documentation to fault-tolerant-execution.md?

@github-actions github-actions bot added the docs label Sep 24, 2025
@hqbhoho
Copy link
Contributor Author

hqbhoho commented Sep 24, 2025

This looks good. I am a bit concerned about using syncronous APIs for implementing asynchronous SPI but let's see how it works in practice.

Would you be fine adding some minimal documentation to fault-tolerant-execution.md?

Currently, I use Alluxio as exchange storage by HDFS client, yet the P99 query latency decreased by 1/3 compared to using HDFS.

@hqbhoho hqbhoho requested a review from losipiuk September 24, 2025 09:54
@losipiuk losipiuk merged commit e94d266 into trinodb:master Sep 24, 2025
191 of 192 checks passed
@losipiuk
Copy link
Member

Thanks

@github-actions github-actions bot added this to the 477 milestone Sep 24, 2025
@raunaqmorarka
Copy link
Member

Now that this is supported natively, do we still need exchange.hdfs.skip-directory-scheme-validation ?
Since using that requires dropping jars into Trino, that's a more error prone way of doing the same thing

@hqbhoho
Copy link
Contributor Author

hqbhoho commented Sep 25, 2025

Now that this is supported natively, do we still need exchange.hdfs.skip-directory-scheme-validation ? Since using that requires dropping jars into Trino, that's a more error prone way of doing the same thing

The configuration exchange.hdfs.skip-directory-scheme-validation is used to ​​enable the use of Hadoop-Compatible File Systems with the HDFS client​​.
Not only for Alluxio, it enables users with more choices. In that case, I think it should be retained.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants