[🚀 Feature]: Selenium 4 Grid Slot Matching RFC #15481

Closed
sbabcoc opened this issue Mar 22, 2025 · 23 comments · Fixed by #15574
Labels
A-needs-triaging A Selenium member will evaluate this soon! I-enhancement Something could be better

Comments

@sbabcoc
Contributor

sbabcoc commented Mar 22, 2025

Selenium 4 Grid Slot Matching RFC

Background

A core feature of Selenium 4 Grid is slot matching – the process of pairing a client “new session” request with a node that can satisfy the request. This process involves evaluating the client’s desired capabilities against the stereotypes of available nodes to determine which (if any) matches the request. The default implementation is defined in a class named DefaultSlotMatcher.

Implementation Details of DefaultSlotMatcher

  • Step 1: Initial match – evaluate request against defined stereotype values:
    • Filter out extension capabilities (names that contain “:”)
    • Filter out platformName capability
    • Evaluate stereotype value against desired capability value:
      • If desired value is a string, perform case-insensitive comparison…
      • … else if desired value is defined, perform object comparison…
      • … else ignore capability with undefined desired value
  • Step 2: Evaluate managed download capability:
    • Match if desired value is undefined or ‘false’…
    • … else match if stereotype supports managed downloads
  • Step 3: Evaluate platformVersion capability:
    • Evaluate desired capabilities with names containing platformVersion
    • Match if a stereotype capability with identical name has the same value
  • Step 4: Evaluate request against defined extension capability values:
    • Filter out capabilities containing goog:, moz:, ms:, or se:
    • Evaluate stereotype value against desired capability value:
      • If desired value is a string, perform case-insensitive comparison…
      • … else if desired value is defined, perform object comparison…
      • … else ignore capability with undefined desired value
  • Step 5: Evaluate browserName capability:
    • Match if desired browserName value is undefined or empty
    • Match if stereotype browserName capability has the desired value
  • Step 6: Evaluate browserVersion capability:
    • Match if desired browserVersion value is undefined, empty, or “stable”
    • Match if desired value is equivalent to stereotype browserVersion
  • Step 7: Evaluate platformName capability:
    • Match if desired platformName value is undefined
    • Match if stereotype platformName capability has the desired value
    • Match if desired value is equivalent to stereotype platformName
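The comparison rule shared by Steps 1 and 4 can be sketched roughly as follows. This is a simplified illustration that uses plain `Map` objects instead of Selenium's `Capabilities` type; the class name `InitialMatchSketch` and its method are hypothetical and are not the actual DefaultSlotMatcher code.

```java
import java.util.Map;
import java.util.Objects;

public class InitialMatchSketch {

  // Step 1 sketch: every non-extension stereotype capability (other than
  // platformName) must agree with the desired value, if one was supplied.
  public static boolean initialMatch(
      Map<String, Object> stereotype, Map<String, Object> desired) {
    return stereotype.entrySet().stream()
        .filter(e -> !e.getKey().contains(":"))            // skip extension capabilities
        .filter(e -> !"platformName".equals(e.getKey()))   // evaluated later, in Step 7
        .allMatch(e -> {
          Object want = desired.get(e.getKey());
          if (want == null) {
            return true;                                    // undefined desired value: ignore
          }
          if (want instanceof String) {
            // string values are compared case-insensitively
            return ((String) want).equalsIgnoreCase(String.valueOf(e.getValue()));
          }
          return Objects.equals(want, e.getValue());        // everything else: object comparison
        });
  }

  public static void main(String[] args) {
    Map<String, Object> stereotype = Map.of("browserName", "chrome", "platformName", "linux");
    Map<String, Object> sameBrowser = Map.of("browserName", "Chrome", "platformName", "windows");
    Map<String, Object> otherBrowser = Map.of("browserName", "firefox");
    System.out.println(initialMatch(stereotype, sameBrowser));
    System.out.println(initialMatch(stereotype, otherBrowser));
  }
}
```

In this sketch the platformName mismatch does not affect the result, because that capability is filtered out here and only evaluated in a later step.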

Current Issue – Extension Capability Handling

As currently implemented, DefaultSlotMatcher has one obvious issue at Step 4 in its handling of extension capabilities. It ignores the extension capabilities of three vendors (Google, Mozilla, and Microsoft), but considers the extension capabilities of every other vendor (including Appium and Apple). This produces inconsistent slot matching behavior, with Appium extension capabilities causing slot matching to fail while similar Google extension capabilities would have no ill effect.
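To make the inconsistency concrete, here is a minimal sketch of the Step 4 filter as described above, again using plain maps. The class `ExtensionMatchSketch` and the capability `goog:exampleSetting` are made up for illustration; only the list of skipped prefixes comes from the description in this RFC.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ExtensionMatchSketch {

  // Prefixes that Step 4 skips, as listed in the RFC text.
  static final List<String> IGNORED_PREFIXES = List.of("goog:", "moz:", "ms:", "se:");

  public static boolean extensionMatch(
      Map<String, Object> stereotype, Map<String, Object> desired) {
    return stereotype.entrySet().stream()
        .filter(e -> e.getKey().contains(":"))  // extension capabilities only
        .filter(e -> IGNORED_PREFIXES.stream().noneMatch(p -> e.getKey().contains(p)))
        .allMatch(e -> {
          Object want = desired.get(e.getKey());
          if (want == null) {
            return true;                         // undefined desired value: ignore
          }
          if (want instanceof String) {
            return ((String) want).equalsIgnoreCase(String.valueOf(e.getValue()));
          }
          return Objects.equals(want, e.getValue());
        });
  }

  public static void main(String[] args) {
    Map<String, Object> stereotype =
        Map.of("goog:exampleSetting", "a", "appium:newCommandTimeout", 60);

    // A differing Google-prefixed value is skipped, so the match succeeds.
    System.out.println(extensionMatch(stereotype, Map.of("goog:exampleSetting", "b")));

    // A differing Appium-prefixed value is compared, so the match fails.
    System.out.println(extensionMatch(stereotype, Map.of("appium:newCommandTimeout", 120)));
  }
}
```

The two requests differ from the stereotype in the same way, yet only the Appium one is rejected.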

Solution – Capability Type Differentiation

The existing handling of extension capabilities for the three “special” vendors suggests that capabilities might be divided into two distinct categories: “identity” values that are considered for slot matching and configuration options that are transmitted to the matched node without analysis.

Patterns from Current Usage

In current usage, values defined within “options” extension capability objects should be ignored for slot matching (i.e. – configuration options). Two additional extension capabilities are also currently ignored: <prefix>:debuggerAddress and <prefix>:loggingPrefs. Every other defined capability for the three “special” vendors is treated as an “identity” value. However, these categories are merely incidental, not formally defined within the specifications. Consequently, there’s no defined mechanism for vendors to add new “identity” values or configuration options.

Proposal: Formally Define Capability Type Differentiation

Formalizing the current implied categorization of capabilities into “identity” values and configuration options will provide two distinct benefits:

  1. Consistent treatment of extension capabilities will provide consistent behavior for all vendors.
  2. Defining explicit patterns for “identity” values and configuration options enables each vendor to select the desired treatment for their extension capabilities, allowing them to add new capabilities and options as needed and get the expected handling without the need for custom slot matchers.

Definition of configuration options

The proposed definition for configuration options, which will be ignored for purposes of slot matching:

  • Any extension capability with prefix se: (Selenium).
  • Any extension capability with one of the following suffixes:
    • options
    • Options
    • loggingPrefs
    • debuggerAddress

Definition of “identity” values

All capabilities that don’t conform with the definition of configuration options as defined above will be treated as “identity” values, which will be considered for purposes of slot matching. This includes extension capabilities with “special” prefixes that would previously have been ignored. This change is merely academic, though, given that there are no currently defined “special” extension capabilities that are affected by the revised behavior.
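The two definitions above could be expressed as a small classifier along these lines. This is only a sketch of the proposed rule, not code from the linked PR; `CapabilityClassifier` and its method names are hypothetical.

```java
import java.util.List;

public class CapabilityClassifier {

  // Suffixes that mark an extension capability as a configuration option.
  static final List<String> CONFIG_SUFFIXES =
      List.of("options", "Options", "loggingPrefs", "debuggerAddress");

  // Configuration options are ignored for slot matching.
  public static boolean isConfigurationOption(String name) {
    if (!name.contains(":")) {
      return false;                 // standard capabilities are never configuration options
    }
    if (name.startsWith("se:")) {
      return true;                  // Selenium's own extension capabilities
    }
    return CONFIG_SUFFIXES.stream().anyMatch(name::endsWith);
  }

  // Everything else is an "identity" value, considered for slot matching.
  public static boolean isIdentityValue(String name) {
    return !isConfigurationOption(name);
  }

  public static void main(String[] args) {
    for (String name : List.of(
        "browserName", "se:downloadsEnabled", "goog:chromeOptions",
        "moz:debuggerAddress", "appium:options", "appium:platformVersion")) {
      System.out.println(name + " -> "
          + (isConfigurationOption(name) ? "configuration option" : "identity value"));
    }
  }
}
```

Under this rule a vendor chooses the treatment purely through naming: placing a setting inside an `options` object removes it from matching, while a root-level extension capability participates in matching.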


Usage example

This feature will be implemented in DefaultSlotMatcher. Consequently, anyone using Selenium 4 Grid without explicitly configuring a custom slot matcher will encounter the revised behavior.

@sbabcoc sbabcoc added I-enhancement Something could be better A-needs-triaging A Selenium member will evaluate this soon! labels Mar 22, 2025

@titusfortner
Member

Wait, how do we match on platformVersion if that isn't even a valid capability?

If we filter out platformName in step 1 how are we matching it in step 7?

I don't like the idea of artificially declaring some values to be identity and some to be configuration. It's an arbitrary distinction that isn't in the spec.

I'd like to go the other direction and not treat any capabilities as special. Everything that I know of implements platformName, but Appium sessions don't always have browserName and browserVersion value is completely ignored by chromedriver regardless of what you put.

My question is: why do we need to treat any of these keys as special if the user isn't going to manually put them in the stereotype? Is there some place where a more complicated stereotype is automatically generated?

@sbabcoc
Contributor Author

sbabcoc commented Mar 22, 2025

@titusfortner The check for platformVersion is performed in the existing DefaultSlotMatcher implementation. This isn't a new feature being proposed in this RFC, but it may have been added for the benefit of a specific vendor. As described in the Current Issue section, the existing implementation is inconsistent, treating extension capabilities with "special" prefixes differently than it treats extension capabilities with any other prefix.

As stated in this RFC, the purpose of capability differentiation is to enable vendors to decide which capabilities should be considered for slot matching, and which should be passed on to selected end nodes without analysis. This is currently only possible via custom slot matchers, which are mutually exclusive per hub. Another possible mechanism to enable vendor-specific behavior would be to add another method to WebDriverInfo to enable vendors to extend or override the matching performed by DefaultSlotMatcher.

Regarding the concept of formally defining two distinct capability categories versus how things are defined in the specification, I'm unaware of any detailed specifications for Selenium Grid, let alone the esoteric details of slot matching. As such, the implementation is the spec. The W3C WebDriver spec provides a lot of wiggle room regarding the handling of extension capabilities. It even presents the option of defining routing-specific capabilities to control how intermediate nodes match "new session" requests with compatible end nodes. This feature could potentially introduce an additional capability category (routing options), but I don't think this is strictly necessary.

@titusfortner
Member

I know; I'm trying to understand why the existing implementation is like that. (And we probably need to wait until after the conference for the Grid experts to have time to answer.) 😄

@sbabcoc
Contributor Author

sbabcoc commented Mar 24, 2025

Regarding the handling of platformName, each step is able to evaluate the entire set of defined capabilities. Filtering reduces the number of capabilities that must be evaluated on each step by eliminating irrelevant values.

The goal of this RFC is to replace the current arbitrary disparities in how extension capabilities are handled with a defined pattern to differentiate "identity" values from configuration options. This will enable vendors to explicitly select the treatment they need, and it will remove the ambiguities currently faced by Grid creators and Grid users.

@VietND96
Member

In DefaultSlotMatcher, there is a comment for platformVersion

private Boolean platformVersionMatch(Capabilities stereotype, Capabilities capabilities) {
/*
This platform version match is not W3C compliant but users can add Appium servers as
Nodes, so we avoid delaying the match until the Slot, which makes the whole matching
process faster.
*/

If you read the Appium v2 docs, you will see two ways to pass platformVersion:

  1. { "appium:platformVersion": "14" }
  2. { "appium:options": { "platformVersion": "14" } }

I think we should remove this platformVersionMatch to avoid confusion. Let users define it explicitly as above.

@VietND96
Member

> Filter out capabilities containing goog:, moz:, ms:, or se:

This is valid. For example, if you consider the case of Apple, we can add its vendor prefix too.
Moreover, I think we had a long discussion in PR #14485, where we proposed skipping capabilities such as $cloud:options and $vendor:options in the slot matcher. Whether users set them via node stereotypes or request capabilities, those options will be merged and sent to the particular vendor only.

@VietND96
Member

Another thing I can see: if you are running Appium tests through Grid using a Relay Node, the matching also involves another implementation in RelaySessionFactory.java. So, if the request passes DefaultSlotMatcher and is assigned to a Relay Node but does not match the logic in test(), you can also see the error SessionNotCreatedException("New session request capabilities do not " + "match the stereotype.") from RelaySessionFactory.

@sbabcoc
Contributor Author

sbabcoc commented Mar 31, 2025

@VietND96 The platformVersion capability should be treated as an "identity" value, which means that it must be declared as appium:platformVersion under the specifications defined by this RFC. There would be no need for Step 3, so the platformVersionMatch method would be eliminated.

To treat all prefixes uniformly, only extension capabilities beginning with se: will be treated as configuration options on the basis of their prefix; these will not be considered for slot matching. Extension capabilities ending with options, Options, loggingPrefs, or debuggerAddress will also be treated as configuration options, and every other extension capability will be treated as an "identity" value.

The matching performed by RelaySessionFactory will also be revised to remove Appium-specific implementation. The updated matching performed by DefaultSlotMatcher will no longer require this implementation.

@VietND96
Member

I am not sure how you can test with platformVersion. Immediately, Grid will throw the error

selenium.common.exceptions.WebDriverException: Message: Illegal key values seen in w3c capabilities: [platformVersion]
Stacktrace:
java.lang.IllegalArgumentException: Illegal key values seen in w3c capabilities: [platformVersion]
at org.openqa.selenium.remote.NewSessionPayload.lambda$validate$5(NewSessionPayload.java:163)

Here are the Appium tests through Grid that I tried locally; they work for both a hybrid browser session and a native app - https://github.com/NDViet/test-grid-relay-appium

@sbabcoc
Contributor Author

sbabcoc commented Mar 31, 2025

The extension capability appium:platformVersion is totally W3C compliant. This value cannot be specified without its appium: prefix. Since this capability will be treated as an "identity" value, it will be evaluated against available slot stereotypes to determine which node matches the desired capabilities.

I have an open PR that implements the patterns of this RFC here: #14485

@sbabcoc
Contributor Author

sbabcoc commented Apr 2, 2025

@VietND96 Appium works with Grid through Relay, but the way we've implemented extension capability support is inconsistent and inflexible. My local-grid-parent project is able to stand up Grid instances that support all flavors of Appium engines. Configuration of such instances is not trivial, and much of the complexity is rooted in the handling of extension capabilities.

@titusfortner
Member

I agree on removing platformVersionMatch and having Appium users namespace their values with appium:

I still am confused about why we need to treat any of these keywords as "non-identity" values. Aren't we iterating over the stereotype, not the capabilities? In which case if it isn't in the stereotype it doesn't matter?

@sbabcoc
Contributor Author

sbabcoc commented Apr 3, 2025

@titusfortner As currently implemented, all extension capabilities with "special" prefixes are ignored for slot matching (configuration options), while all other extension capabilities are considered for slot matching ("identity" values). This is why RelaySessionFactory currently contains Appium-specific implementation - to filter out a value merged from the relay stereotype that would otherwise cause session creation to fail.

@sbabcoc
Contributor Author

sbabcoc commented Apr 3, 2025

The merge operation enables the relay to define common configuration options (e.g. - BrowserStack access tokens) so the client doesn't need to specify these, or even know that they exist. These serve as overridable default settings, which configure the selected node without affecting the slot matching process.
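As a rough illustration of that merge, the sketch below treats the relay stereotype as a set of defaults that client-supplied values override. It uses plain maps rather than Selenium's `Capabilities.merge`, and the names `RelayDefaultsSketch`, `example:options`, and `accessToken` are placeholders invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelayDefaultsSketch {

  // Start from the relay stereotype, then let the client's values win.
  public static Map<String, Object> mergeWithDefaults(
      Map<String, Object> stereotype, Map<String, Object> requested) {
    Map<String, Object> merged = new LinkedHashMap<>(stereotype);
    merged.putAll(requested);  // shallow merge: a requested key replaces the default wholesale
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stereotype = Map.of(
        "platformName", "android",
        "example:options", Map.of("accessToken", "relay-default"));

    // The client knows nothing about the token; the relay default is kept.
    System.out.println(mergeWithDefaults(stereotype, Map.of("platformName", "android")));

    // The client overrides the default with its own options object.
    System.out.println(mergeWithDefaults(stereotype, Map.of(
        "platformName", "android",
        "example:options", Map.of("accessToken", "client-supplied"))));
  }
}
```

Note that this sketch replaces a nested options object as a whole; whether nested options should be merged key by key is a separate design decision that this RFC does not settle.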

@diemol
Member

diemol commented Apr 4, 2025

Capabilities from Google, Mozilla, and Microsoft need to be passed along to the browser driver, which can use them to match or extract values for session creation. That is why they do not need to be matched. Defining any capability with a Google, Mozilla, or Microsoft prefix does not make sense because the driver will use the values present. For example, debuggerAddress and loggingPrefs are values the driver uses when the session is created.

Apple-specific capabilities are uncommon in session requests, so we are not excluding them. This is probably a mistake, and we could change our approach and start ignoring them.

Appium-specific capabilities are also not entirely ignored because the Appium server does not create the session but uses them to validate and find the correct driver, which then uses those capabilities to create the session. The Appium server is also an intermediary node. That is why it makes sense to help with the matching early enough and avoid delaying this process.

I think it makes sense to ignore capabilities with an options suffix. The other two will still be ignored because we will continue ignoring browser vendor capabilities for matching.

In addition, custom slot matchers exist because a group of users relies on them to handle their infrastructure without the need to tell the users to specify more capabilities.

@diemol
Member

diemol commented Apr 4, 2025

> In DefaultSlotMatcher, there is a comment for platformVersion
>
> private Boolean platformVersionMatch(Capabilities stereotype, Capabilities capabilities) {
> /*
> This platform version match is not W3C compliant but users can add Appium servers as
> Nodes, so we avoid delaying the match until the Slot, which makes the whole matching
> process faster.
> */
>
> If you read the Appium v2 docs, you will see two ways to pass platformVersion:
>
> 1. `{ "appium:platformVersion": "14" }`
> 2. `{ "appium:options": { "platformVersion": "14" } }`
>
> I think we should remove this platformVersionMatch to avoid confusion. Let users define it explicitly as above.

We need to do that match early because, if we skip it, a session can be matched with a Relay Node that connects to an Appium server that does not support the requested platformVersion. This would be expensive and would use a slot from a Node that could be serving a valid request.

@diemol
Member

diemol commented Apr 4, 2025

> @VietND96 The platformVersion capability should be treated as an "identity" value, which means that it must be declared as appium:platformVersion under the specifications defined by this RFC. There would be no need for Step 3, so the platformVersionMatch method would be eliminated.
>
> To treat all prefixes uniformly, only extension capabilities beginning with se: will be treated as configuration options on the basis of their prefix; these will not be considered for slot matching. Extension capabilities ending with options, Options, loggingPrefs, or debuggerAddress will also be treated as configuration options, and every other extension capability will be treated as an "identity" value.
>
> The matching performed by RelaySessionFactory will also be revised to remove Appium-specific implementation. The updated matching performed by DefaultSlotMatcher will no longer require this implementation.

Removing the platformVersion match will be very expensive and it can create sessions that potentially do not match the version requested.

@sbabcoc
Contributor Author

sbabcoc commented Apr 4, 2025

@diemol Appium and Safari both define "identity" values that should be considered for slot matching and configuration options that should not be considered for slot matching. In the Appium case, appium:platformVersion is an "identity" value that should be considered, and appium:newCommandTimeout is a configuration option that should not be considered. The current implementation of DefaultSlotMatcher does not allow for the possibility of the relay stereotype defining a default setting for appium:newCommandTimeout that the client could override in its desired capabilities. The client must either omit appium:newCommandTimeout or specify the same setting, otherwise the slot match will fail.

For Chrome, browserVersion is an "identity" value that should be considered. For HtmlUnit, browserVersion is a configuration option that should not be considered. The current implementation of DefaultSlotMatcher does not provide any mechanism for defining an HtmlUnit node with a default specification for browserVersion that clients could override in their desired capabilities.

The proposal of this RFC is that we eliminate the inconsistent handling of extension capabilities and define patterns to differentiate "identity" values from configuration options.

  • For HtmlUnit...
    • ... the browser version specification migrates into garg:htmlunitOptions, which will be ignored for slot matching.
    • NOTE: The current implementation of DefaultSlotMatcher treats garg:htmlunitOptions as an "identity" value, so there's no way to define a node that defines a default specification for browserVersion that clients could override in their desired capabilities.
  • For Appium...
    • ... platformVersion must be defined as appium:platformVersion at the root of the relay stereotype or desired capabilities object, where it will be treated as an "identity" value. Defining platformVersion in the appium:options object will cause it to be treated as a configuration option.
    • ... newCommandTimeout must be defined in the appium:options object, where it will be treated as a configuration option. Defining newCommandTimeout as appium:newCommandTimeout at the root of the relay stereotype or desired capabilities object will cause it to be treated as an "identity" value.
    • NOTE: The rigid delineation of "identity" values and configuration options conflicts with the "either/or" format indicated by the Appium documentation. Their imprecise characterization of this issue is a direct consequence of the lack of precision in the core Selenium specification.
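The Appium case above can be sketched as follows, assuming the classification rule proposed in this RFC. The class `ProposedMatchSketch` is hypothetical and works on plain maps; it is not the implementation from the linked PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ProposedMatchSketch {

  static final List<String> CONFIG_SUFFIXES =
      List.of("options", "Options", "loggingPrefs", "debuggerAddress");

  static boolean isConfigurationOption(String name) {
    return name.startsWith("se:") || CONFIG_SUFFIXES.stream().anyMatch(name::endsWith);
  }

  // Compare only extension capabilities classified as "identity" values.
  public static boolean identityMatch(
      Map<String, Object> stereotype, Map<String, Object> desired) {
    return stereotype.entrySet().stream()
        .filter(e -> e.getKey().contains(":"))
        .filter(e -> !isConfigurationOption(e.getKey()))
        .allMatch(e -> {
          Object want = desired.get(e.getKey());
          if (want == null) {
            return true;
          }
          if (want instanceof String) {
            return ((String) want).equalsIgnoreCase(String.valueOf(e.getValue()));
          }
          return Objects.equals(want, e.getValue());
        });
  }

  public static void main(String[] args) {
    Map<String, Object> stereotype = Map.of(
        "appium:platformVersion", "14",                        // identity value at the root
        "appium:options", Map.of("newCommandTimeout", 60));    // overridable default

    // Same platform version, different timeout: still a match.
    System.out.println(identityMatch(stereotype, Map.of(
        "appium:platformVersion", "14",
        "appium:options", Map.of("newCommandTimeout", 120))));

    // Different platform version: no match.
    System.out.println(identityMatch(stereotype, Map.of("appium:platformVersion", "13")));
  }
}
```

If newCommandTimeout were instead declared at the root as appium:newCommandTimeout, the same comparison would treat it as an "identity" value and a differing client value would cause the match to fail.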

@diemol
Member

diemol commented Apr 4, 2025

> The current implementation of DefaultSlotMatcher does not allow for the possibility of the relay stereotype defining a default setting for appium:newCommandTimeout that the client could override in its desired capabilities.

Because this is not the responsibility of the DefaultSlotMatcher. If that is desired, you can always implement your own Matcher.

> For HtmlUnit, browserVersion is a configuration option that should not be considered.

Again, that is why you can implement your custom Matcher.

> The proposal of this RFC is that we eliminate the inconsistent handling of extension capabilities and define patterns to differentiate "identity" values from configuration options.

The inconsistency you claim is based on assumptions you have. The use cases you mention are not the responsibility of the DefaultSlotMatcher, and again, that is why you can always extend it and implement your own.

@sbabcoc
Contributor Author

sbabcoc commented Apr 4, 2025

@diemol The core issue with custom matchers is that each hub can only specify one matcher. There's no mechanism to extend the default matcher; it can only be replaced. This sledgehammer approach should be reserved for uncommon scenarios, not normal cases like the ones highlighted in my examples. The inconsistencies cited in this RFC have been clearly defined. We treat extension capabilities with "special" prefixes one way and every other extension capability in precisely the opposite way.

The primary mechanism I've defined to resolve the existing inconsistencies is to differentiate "identity" values from configuration options, eliminating the "special" status of a hardcoded set of vendors. A secondary mechanism could be provided by extending the WebDriverInfo interface so that each vendor could introduce their own evaluations into the slot matching process.

@diemol
Member

diemol commented Apr 4, 2025

> The core issue with custom matchers is that each hub can only specify one matcher.

How many matchers do you need? You can implement your own with all the logic you need to handle each case. Your examples are not the common use cases, no one has mentioned them in over 2-3 years.

@sbabcoc
Contributor Author

sbabcoc commented Apr 4, 2025

@diemol The current mechanism forces arbitrary separations between hubs that support sessions from different vendors. Each vendor that requires matching behavior that deviates from "default" must be vended by a separate grid instance. The patterns specified in this RFC eliminate these arbitrary separations.

Regarding mention of vendor-specific slot matching issues, there have been several revisions in the last three years to resolve these sorts of issues. Also, the ability to run remote HtmlUnit sessions in Selenium 4 Grid was introduced less than a year ago. This is the point at which I first fully recognized the inconsistencies in slot matching behavior, and I've been working on solutions ever since.


4 participants