Skip to content

Conversation

@Peiyingy
Copy link
Member

@Peiyingy Peiyingy commented Dec 4, 2025

Description

This PR adds strictRouting flag for routing as proposed in #797.
Routing rules can now support strictRouting: True to force pinning the query to the routing group

When strictRouting = false (default), the routing behavior remains unchanged.
When strictRouting = true, the gateway would not fall back to default clusters if the pinned backend becomes unavailable. Instead, the query should fail immediately with a 404 error.

Testing

  • mvn clean install
  • Added new test cases in TestStochasticRoutingManager
  • Local test to verify the strictRouting flag works
trino> select 1;
Error starting query at http://localhost:8080/v1/statement returned an invalid response: JsonResponse{statusCode=404, headers={content-length=[96], content-type=[text/plain], date=[Thu, 04 Dec 2025 03:45:04 GMT], vary=[Accept-Encoding]}, hasValue=false} [Error: No healthy backends available for routing group 'adhoc1' under strict routing for user 'pye']

clean up code

wip

add tests
@cla-bot cla-bot bot added the cla-signed label Dec 4, 2025
@Peiyingy Peiyingy marked this pull request as ready for review December 4, 2025 03:57
@RoeyoOgen
Copy link
Contributor

Just a few remarks:

  1. why "enforceIsolation" and not strictRouting? or noFallback?
  2. does this differ if the routing group is a single cluster rather than a group? - does this clash with your other PR #796?

@xkrogen
Copy link
Member

xkrogen commented Dec 4, 2025

why "enforceIsolation" and not strictRouting? or noFallback?

+1 strictRouting is a good name

@Peiyingy
Copy link
Member Author

Peiyingy commented Dec 4, 2025

Just a few remarks:

  1. why "enforceIsolation" and not strictRouting? or noFallback?
  2. does this differ if the routing group is a single cluster rather than a group? - does this clash with your other PR #796?
  1. Thanks for the suggesting. I'll change the name to strictRouting
  2. This would also support routing cluster rules. After either of the PR is merged, we just need to rebase the other PR to make sure both strictRouting and cluster routing are supported.

@Peiyingy Peiyingy changed the title Implement Stronger Isolation Semantics for Routing Rules Implement Strict Routing Semantics for Routing Rules Dec 4, 2025
List<ProxyBackendConfiguration> backends = gatewayBackendManager.getActiveBackends(routingGroup).stream()
.filter(backEnd -> isBackendHealthy(backEnd.getName()))
.toList();
if (backends.isEmpty() && strictRouting) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a nitpick, but evaluating strictRouting first can short circuit the backends.isEmpty() one, in case it's false. A teeny performance optimization would be to re-order these so it's if (strictRouting && backends.isEmpty())

.filter(backEnd -> isBackendHealthy(backEnd.getName()))
.toList();
if (strictRouting && backends.isEmpty()) {
throw new WebApplicationException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why WebApplicationException and not IllegalStateException like in IllegalStateException("Number of active backends found zero"))

or maybe a custom Explicit Exception inherited from RouterException?

Copy link
Contributor

@felicity3786 felicity3786 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also update the official doc?

.filter(backEnd -> isBackendHealthy(backEnd.getName()))
.toList();
if (strictRouting && backends.isEmpty()) {
throw new WebApplicationException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a WARN or ERROR with context when the 404 is triggered? Also +1 on maybe making it a NoBackendAvailableException?

Copy link
Member

@oneonestar oneonestar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ambiguous what you mean strictRouting.

There are two fallback logics in routing.

  1. If RoutingGroupSelector failed to select a routing group for a query, fallback to defaultRoutingGroup.

    String routingGroup = !isNullOrEmpty(routingDestination.routingGroup())
    ? routingDestination.routingGroup()
    : defaultRoutingGroup;

  2. If there are no available active healthy cluster for the selected routing group, fallback to ActiveDefaultBackends (ie. active healthy cluster in defaultRoutingGroup).

    return selectBackend(backends, user).orElseGet(() -> provideDefaultBackendConfiguration(user));

It's not clear which fallback logic (or both) are disabled. If you decided to disable both (currently you only handled 2), please make sure the error messages are clearly distinguishable for these cases. Docs are also required to explain the exact affect of this config.

.toList();
if (strictRouting && backends.isEmpty()) {
throw new WebApplicationException(
Response.status(NOT_FOUND)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client did nothing wrong and shouldn't receive a 4xx error. We should return a 5XX indicate a server side error, maybe 503 Service Unavailable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

6 participants