Add Node.js version selection strategy framework with EOL policy support #5422

rishabhmalikMS · 2025-12-08T10:41:43Z

Context

This PR introduces the Node.js version selection strategy framework as part of modernizing Azure Pipelines Agent's Node.js handling. This is the first implementation step following comprehensive test specifications established in previous work. The strategy pattern provides a foundation for implementing EOL (End-of-Life) Node.js version policies and node handler selection across both host and container environments.

Related Work: This builds on test specifications for Node Handler behavior and enables future EOL policy enforcement and node selection logic.

Workitem: AB#2339020

Description

Implemented a comprehensive strategy pattern for Node.js version selection with the following components:

Core Strategy Framework:

INodeVersionStrategy.cs - Strategy interface defining CanHandle() and GetNodePath() contracts
NodeContext.cs - Context model containing environment state, glibc compatibility flags, and handler data
NodePathResult.cs - Result model with node path, version, selection reason, and optional warnings

Strategy Implementations:

Node24Strategy.cs - Latest Node24 support with glibc fallback logic and EOL policy compliance
Node20Strategy.cs - Current default Node20 with glibc error handling
Node16Strategy.cs - Legacy Node16 support with EOL warnings
Node10Strategy.cs - Node10 support for Alpine compatibility with EOL policies
Node6Strategy.cs - Legacy Node6 support with EOL enforcement

Key Features:

EOL Policy Support - Configurable blocking of end-of-life Node.js versions via AGENT_ENABLE_EOL_NODE_VERSION_POLICY
Glibc Compatibility - Automatic fallback handling for Node20/24 glibc errors (requires glibc 2.17+/2.28+)
Container/Host Unification - Single strategy pattern works for both container and host execution
Priority-Based Selection - Strategies evaluated in optimal order: Node24 → Node20 → Node16 → Node10 → Node6
Comprehensive Logging - Detailed debug output for selection decisions and fallback reasoning
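
The priority-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchContext`, `SketchResult`, `ISketchStrategy`, `FixedVersionStrategy`, and `SketchSelector` are hypothetical stand-ins, the handler names and the `externals/<folder>/bin/node` layout are assumptions, and the sketch only models blocking of EOL versions (no upgrade and no glibc fallback). Only the `CanHandle` / `GetNodePath` method names come from the description.

```csharp
using System;

// Hypothetical, simplified stand-in for the PR's NodeContext.
public sealed class SketchContext
{
    public string HandlerName { get; set; }     // handler declared by the task, e.g. "Node16" (assumed naming)
    public bool EolPolicyEnabled { get; set; }  // assumed flag; the PR reads the policy from a knob
}

// Hypothetical stand-in for NodePathResult.
public sealed class SketchResult
{
    public string NodePath { get; set; }
    public string Reason { get; set; }
}

public interface ISketchStrategy
{
    bool CanHandle(SketchContext context);
    SketchResult GetNodePath(SketchContext context);
}

public sealed class FixedVersionStrategy : ISketchStrategy
{
    private readonly string _handler;
    private readonly string _folder;
    private readonly bool _isEol;

    public FixedVersionStrategy(string handler, string folder, bool isEol)
    {
        _handler = handler;
        _folder = folder;
        _isEol = isEol;
    }

    public bool CanHandle(SketchContext context)
    {
        bool matches = string.Equals(context.HandlerName, _handler, StringComparison.OrdinalIgnoreCase);
        // This sketch only blocks EOL versions when the policy is on.
        return matches && !(_isEol && context.EolPolicyEnabled);
    }

    public SketchResult GetNodePath(SketchContext context)
    {
        return new SketchResult
        {
            NodePath = "externals/" + _folder + "/bin/node", // hypothetical layout
            Reason = "Selected for " + _handler + " task handler"
        };
    }
}

public static class SketchSelector
{
    // Array order is the priority order: newest version first.
    private static readonly ISketchStrategy[] Strategies =
    {
        new FixedVersionStrategy("Node24", "node24", false),
        new FixedVersionStrategy("Node20_1", "node20_1", false),
        new FixedVersionStrategy("Node16", "node16", true),
        new FixedVersionStrategy("Node10", "node10", true),
        new FixedVersionStrategy("Node", "node", true),
    };

    public static SketchResult Select(SketchContext context)
    {
        foreach (ISketchStrategy strategy in Strategies)
        {
            if (strategy.CanHandle(context))
            {
                return strategy.GetNodePath(context);
            }
        }
        throw new NotSupportedException("No node version available for handler " + context.HandlerName);
    }
}
```

Because the first strategy whose `CanHandle` returns true wins, adding a newer Node version in this sketch means inserting it at the front of the array.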

Risk Assessment (Low / Medium / High)

Low Risk - This PR introduces only new strategy implementation code without modifying existing NodeHandler logic. The strategies remain dormant until integrated via the feature flag in subsequent PRs. All existing NodeHandler behavior continues unchanged.

Unit Tests Added or Updated (Yes / No)

No unit tests added in this PR. Test infrastructure and comprehensive scenario validation will be implemented in the subsequent integration PR to validate equivalence between the legacy and strategy implementations.
The previous PR #5421 covers all scenarios for the legacy behavior. The same tests will be run on the new implementation once integration is complete, to confirm that behavior is unchanged except where it is supposed to differ.
Further tests will be added for equivalence and divergence to ensure the legacy and new strategy approaches work as expected. These will reuse the existing test scenarios added in PR #5421.

Additional Testing Performed

List manual or automated tests performed beyond unit tests (e.g., integration, scenario, regression).

Change Behind Feature Flag (Yes / No)

Yes - The strategy framework will be activated via a feature flag in the integration PR (to be shared). This allows for safe rollback and testing between the legacy and new strategy approaches.

Tech Design / Approach

Design has been written and reviewed.
Any architectural decisions, trade-offs, and alternatives are captured.
Design doc: End-of-Life Node.js Version Enforcement in Azure Pipelines Agent

Documentation Changes Required (Yes/No)

Logging Added/Updated (Yes/No)

Yes. Appropriate logs have been added.

Telemetry Added/Updated (Yes/No)

Custom telemetry (e.g., counters, timers, error tracking) is added as needed.
Events are tagged with proper metadata for filtering and analysis.
Telemetry is validated in staging or test environments.

Rollback Scenario and Process (Yes/No)

Rollback plan is documented.

Dependency Impact Assessed and Regression Tested (Yes/No)

All impacted internal modules, APIs, services, and third-party libraries are analyzed.
Results are reviewed and confirmed to not break existing functionality.

azure-pipelines · 2025-12-09T04:22:43Z

Azure Pipelines successfully started running 1 pipeline(s).

rishabhmalikMS · 2025-12-09T04:24:10Z

/azp run

azure-pipelines · 2025-12-09T04:24:23Z

Azure Pipelines successfully started running 1 pipeline(s).

rishabhmalikMS · 2025-12-09T04:41:01Z

/azp run

azure-pipelines · 2025-12-09T04:41:16Z

Azure Pipelines successfully started running 1 pipeline(s).

tarunramsinghani · 2025-12-09T12:33:18Z

src/Agent.Sdk/Knob/AgentKnobs.cs

+            nameof(EnableEOLNodeVersionPolicy),
+            "When enabled, automatically upgrades tasks using end-of-life Node.js versions (6, 10, 16) to supported versions (Node 20.1 or Node 24). Throws error if no supported versions are available on the agent.",
+            new PipelineFeatureSource("AGENT_RESTRICT_EOL_NODE_VERSIONS"),
+            new EnvironmentKnobSource("AGENT_RESTRICT_EOL_NODE_VERSIONS"),


Do we need both sources for the toggle to work?

How are we planning to do E2E testing for this?

One knob is used as a feature flag to enable/disable the feature altogether. The other is used to read the value of the UI toggle injected from the server side. The same feature flag can be used to enable/disable both the UI toggle and the agent-side functionality, to maintain consistency on both ends.

Regarding E2E testing: Planning to use devfabric for updated server-side logic along with updated agent logic to test integrated functionality.

tarunramsinghani · 2025-12-09T12:39:06Z

src/Agent.Worker/NodeVersionStrategies/NodeContext.cs

+        /// <summary>
+        /// True if Node20 has glibc compatibility errors (requires glibc 2.17+).
+        /// </summary>
+        public bool Node20HasGlibcError { get; set; }


IMO these are node-specific and should not be part of the common context.

This makes sense, but from an implementation perspective, removing the glibc compatibility flags from NodeContext and moving them inside the individual Node 24 and Node 20 strategies would cause the following issues:

Duplicate runtime checks: these are required in both the legacy and new approaches, and in both the Node 24 and Node 20 strategies. We would end up with the same runtime checks in 3 places instead of 1.

The same checks would run multiple times in both Node 24 and Node 20 handling. If we tried to reuse them by keeping them in a single place, we would need to pass them as context to the Node 24 and Node 20 strategies along with the legacy code, which means they would end up outside the individual strategy code boundary anyway.

I would prefer a clean design here instead of optimizing for implementation. Three checks should not be an issue IMO.
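
One way to reconcile the two positions is to run each glibc probe once and hand the strategies an immutable result, so the flags do not live on the shared context. The sketch below is only an illustration under stated assumptions: `GlibcInfoSketch` and `GlibcCheckerSketch` are hypothetical names, the probe delegate stands in for whatever actually detects a glibc error, and the folder names `node24` and `node20_1` are taken from the examples quoted elsewhere in this review.

```csharp
using System;

// Hypothetical immutable result; property names mirror the flags discussed above.
public sealed class GlibcInfoSketch
{
    public static readonly GlibcInfoSketch Compatible = new GlibcInfoSketch(false, false);

    public GlibcInfoSketch(bool node20HasGlibcError, bool node24HasGlibcError)
    {
        Node20HasGlibcError = node20HasGlibcError;
        Node24HasGlibcError = node24HasGlibcError;
    }

    public bool Node20HasGlibcError { get; }
    public bool Node24HasGlibcError { get; }
}

// Runs the (assumed expensive) probes once and caches the outcome.
// Not thread-safe; a real implementation would need to consider concurrency.
public sealed class GlibcCheckerSketch
{
    // Assumed contract: returns true when the node binary in the given folder starts successfully.
    private readonly Func<string, bool> _probe;
    private GlibcInfoSketch _cached;

    public GlibcCheckerSketch(Func<string, bool> probe)
    {
        _probe = probe ?? throw new ArgumentNullException(nameof(probe));
    }

    public GlibcInfoSketch Check()
    {
        if (_cached == null)
        {
            bool node24Ok = _probe("node24");
            bool node20Ok = _probe("node20_1");
            _cached = new GlibcInfoSketch(!node20Ok, !node24Ok);
        }
        return _cached;
    }
}
```

A caller would invoke `Check()` once per selection and pass the returned object to whichever strategies need it; repeated calls reuse the cached instance rather than re-running the probes.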

tarunramsinghani · 2025-12-09T12:39:49Z

src/Agent.Worker/NodeVersionStrategies/NodeContext.cs

+        /// <summary>
+        /// The selected node version determined by CanHandle().
+        /// Examples: "node24", "node20_1", "node16", "node10", "node"
+        /// </summary>


These are outputs, and I assumed that this class is the input context, so we should not merge the two into a single object.

Here is the proposed fix for this:
Remove 3 fields from NodeContext: SelectedNodeVersion, SelectionReason and SelectionWarning.
Since these are used in NodeRunnerInfo, I will remove them from NodeContext and reuse them from NodeRunnerInfo in the strategy code.

As this requires updates in the Orchestrator, which runs the strategies and uses the results of the strategy functions, I will incorporate this change in the next PR (orchestrator + node handler integration), immediately following this PR's closure.

Changes for this are in the next PR, #5425.

tarunramsinghani · 2025-12-09T12:43:44Z

src/Agent.Worker/NodeVersionStrategies/IUnifiedNodeVersionStrategy.cs

+        /// </summary>
+        /// <param name="context">Context with environment, task, and glibc information</param>
+        /// <returns>True if this strategy can handle the context, false otherwise</returns>
+        bool CanHandle(UnifiedNodeContext context);


Input should be TaskContext and output should be NodePathResult.

Currently the CanHandle function returns a boolean indicating whether the corresponding node version can be selected. Each strategy has a separate GetNodePath function that builds the node path for the selected version. This keeps two operations separate:

Whether a node version can be used or not

If it can be used, the GetNodePath function generates the path based on the host/container environment and returns a result containing it

If CanHandle returned NodePathResult (now NodeRunnerInfo after the naming convention updates), CanHandle and GetNodePath would no longer each have a single responsibility.
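
To make the trade-off in this thread concrete, here is a small sketch of the two interface shapes. All type names, the string/bool parameters, and both paths are hypothetical simplifications; the PR's real methods take a context object. The adapter only shows that a two-step strategy can be exposed through a single call that returns null when the strategy does not apply.

```csharp
using System;

// Hypothetical stand-in for NodePathResult / NodeRunnerInfo.
public sealed class RunnerInfoSketch
{
    public string NodePath { get; set; }
}

// Shape A: two methods, one responsibility each (the approach described in the reply above).
public interface ITwoStepStrategy
{
    bool CanHandle(string handlerName);
    RunnerInfoSketch GetNodePath(bool inContainer);
}

// Shape B: one method that takes the input and returns the result; null means "not applicable".
public interface IOneStepStrategy
{
    RunnerInfoSketch TrySelect(string handlerName, bool inContainer);
}

// Wraps a two-step strategy behind the one-step shape.
public sealed class OneStepAdapter : IOneStepStrategy
{
    private readonly ITwoStepStrategy _inner;

    public OneStepAdapter(ITwoStepStrategy inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public RunnerInfoSketch TrySelect(string handlerName, bool inContainer)
    {
        return _inner.CanHandle(handlerName) ? _inner.GetNodePath(inContainer) : null;
    }
}

public sealed class Node16Sketch : ITwoStepStrategy
{
    public bool CanHandle(string handlerName)
    {
        return string.Equals(handlerName, "Node16", StringComparison.OrdinalIgnoreCase);
    }

    public RunnerInfoSketch GetNodePath(bool inContainer)
    {
        // Both paths are made up for illustration.
        string path = inContainer ? "/azp/node16/bin/node" : "externals/node16/bin/node";
        return new RunnerInfoSketch { NodePath = path };
    }
}
```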

tarunramsinghani · 2025-12-09T12:45:52Z

src/Agent.Worker/NodeVersionStrategies/UnifiedNodeContext.cs

+    /// Unified context for both host and container node selection.
+    /// Contains runtime data - strategies read their own knobs via ExecutionContext.
+    /// </summary>
+    public sealed class UnifiedNodeContext


I am assuming the node is selected based on task.json, so this should be TaskContext IMO? Also, does it need anything else apart from what is defined in task.json?

TaskContext might be misleading here, as the data model contains the details required for the node selection process, not for general task execution.
Regarding "I am assuming the node is selected based on task.json": yes, the node is selected based on the handler from task.json, which the task runner processes and passes to the node handler. Both the existing logic and the new node strategy classes use it to make decisions.

raujaiswal · 2025-12-10T04:33:05Z

src/Agent.Worker/NodeVersionStrategies/UnifiedNode20Strategy.cs

+
+            if (eolPolicyEnabled)
+            {
+                throw new NotSupportedException(StringUtil.Loc("NodeEOLFallbackBlocked", "Node20", "Node16"));


Since we are deliberately throwing an exception from the worker process, have we validated this through a pipeline run to confirm whether the worker actually crashes or returns a failure code? My main concern is whether all failure scenarios have been thoroughly tested.

Failure scenarios are the ones where no node version is available, either due to compatibility issues or because end-of-life versions are disabled. In these cases we throw a NotSupportedException with a relevant message. Currently L0 tests are available. E2E tests and an actual pipeline run will also be performed as part of further testing work.

raujaiswal · 2025-12-10T04:36:40Z

src/Agent.Worker/NodeVersionStrategies/Node20Strategy.cs

+            string systemType = context.IsContainer ? "container" : "agent";
+            context.SelectedNodeVersion = "node16";
+            context.SelectionReason = $"{baseReason}, fallback to Node16 due to Node20 glibc compatibility issue";
+            context.SelectionWarning = StringUtil.Loc("NodeGlibcFallbackWarning", systemType, "Node20", "Node16");


Since we are using Node 20.1, there are references to both Node20 and Node20_1 in various places. Can you please check if we can make these references consistent across all locations?

Will take this as a part of next PR.

raujaiswal · 2025-12-10T04:43:40Z

@rishabhmalikMS, can you please check if we can add additional Kusto telemetry to capture the customer’s Node version? This would allow us to diagnose issues more quickly without relying on log checks, improving investigation speed and accuracy.

rishabhmalikMS · 2025-12-10T04:53:23Z

@rishabhmalikMS, can you please check if we can add additional Kusto telemetry to capture the customer’s Node version? This would allow us to diagnose issues more quickly without relying on log checks, improving investigation speed and accuracy.

Will check on which information would be relevant and helpful to publish in telemetry. Will take this as part of future pull request. Created a work item (https://mseng.visualstudio.com/AzureDevOps/_workitems/edit/2340945) for the same.

tarunramsinghani · 2025-12-10T08:45:42Z

src/Agent.Worker/NodeVersionStrategies/NodeContext.cs

+        /// <summary>
+        /// The handler data from the task definition.
+        /// </summary>
+        public BaseNodeHandlerData HandlerData { get; set; }


Why is this needed ?

This contains the handler type available to the task.

tarunramsinghani · 2025-12-10T08:47:22Z

src/Agent.Worker/NodeVersionStrategies/NodeContext.cs

+        /// <summary>
+        /// Host context for directory lookups.
+        /// </summary>
+        public IHostContext HostContext { get; set; }
+
+        /// <summary>
+        /// Execution context for logging, warnings, and knob reading.
+        /// </summary>
+        public IExecutionContext ExecutionContext { get; set; }


Why do we need this in NodeContext? The decision on which node to use is based on Container/OS and task.json, so ideally we should not have anything beyond these in this class.

HostContext and ExecutionContext are removed from TaskContext.

tarunramsinghani · 2025-12-10T08:48:19Z

src/Agent.Worker/NodeVersionStrategies/NodeContext.cs

+    /// Context for both host and container node selection.
+    /// Contains runtime data - strategies read their own knobs via ExecutionContext.
+    /// </summary>
+    public sealed class NodeContext


Why is it called NodeContext? This is created per task, if I am not wrong, so can it be called TaskContext in that case?

NodeContext is renamed to TaskContext.

dassayantan24 · 2025-12-11T01:58:24Z

src/Agent.Worker/NodeVersionStrategies/NodeRunnerInfo.cs

+        /// Explanation of why this version was selected.
+        /// Used for debugging and telemetry.
+        /// </summary>
+        public string Reason { get; set; }


Can we make this variable name more relevant?

dassayantan24 · 2025-12-11T02:04:53Z

src/Agent.Worker/NodeVersionStrategies/Node6Strategy.cs

+
+    public sealed class Node6Strategy : INodeVersionStrategy
+    {
+        public string Name => "Node6";


In my opinion, we should move these string literals to a constants file. This way, we can reuse them consistently, as I see they are being used here as well as in the logs. It would make tracking and maintenance much easier.

This Name field is removed. We are now using strategy.GetType().Name instead. It was essentially used for logging purposes and is no longer needed.

dassayantan24 · 2025-12-11T02:15:49Z

src/Agent.Worker/NodeVersionStrategies/Node24Strategy.cs

+            return false;
+        }
+
+        private bool DetermineNodeVersionAndSetContext(NodeContext context, bool eolPolicyEnabled, string baseReason)


Let me know your thoughts on whether we should move this to a common unified location. It is currently declared four times.

DetermineNodeVersionAndSetContext is required in each strategy because it contains code specific to that strategy, so it cannot be moved to a common location.

… logic
- Passing execution and host context to constructors of strategies
- Add GlibcCompatibilityChecker for unified glibc compatibility testing across host and container environments
- Introduce GlibcCompatibilityInfo data model to encapsulate Node20/24 glibc compatibility results
- Simplified NodeContext (now TaskContext) to provide runtime context (handler data, container info and step target only) to strategies
- Support both host glibc testing and container pre-computed glibc flags (NeedsNode20Redirect/NeedsNode16Redirect)
- Remove redundant IsContainer field, use Container != null for cleaner logic
- Moved GetNodePath to orchestrator as CreateNodeRunnerInfoWithPath

rajmishra1997 · 2025-12-12T10:10:44Z

src/Agent.Worker/NodeVersionStrategies/NodeVersionOrchestrator.cs

+
+            GlibcCompatibilityInfo glibcInfo = GlibcCompatibilityInfo.Compatible;
+
+            if (context.Container == null) 


NodeVersionOrchestrator.cs is a selector class, responsible for selecting an appropriate strategy. Can the creation of glibcInfo be moved to the concrete strategy class (or a parent class) so that this class keeps a single responsibility?

sanjuyadav24 · 2025-12-12T11:07:17Z

src/Agent.Worker/NodeVersionStrategies/GlibcCompatibilityChecker.cs

+            bool node20HasGlibcError = false;
+            bool node24HasGlibcError = false;
+
+            if (!useNode20InUnsupportedSystem)


Shouldn't these checks be reversed: first validate Node24, then Node20?

sanjuyadav24 · 2025-12-12T11:08:22Z

src/Agent.Worker/NodeVersionStrategies/GlibcCompatibilityChecker.cs

+            {
+                if (_supportsNode20.HasValue)
+                {
+                    node20HasGlibcError = !_supportsNode20.Value;


Where is this _supportsNode20 initialized?

sanjuyadav24 · 2025-12-12T11:17:47Z

src/Agent.Worker/NodeVersionStrategies/Node20Strategy.cs

+                return DetermineNodeVersionSelection(context, eolPolicyEnabled, "Selected for Node20 task handler", glibcInfo);
+            }
+
+            if (eolPolicyEnabled)


Shouldn't the EOL policy be checked before hasNode20Handler? Or should it be the first check?

sanjuyadav24 · 2025-12-12T11:19:54Z

src/Agent.Worker/NodeVersionStrategies/NodeVersionOrchestrator.cs

+            _strategies.Add(new Node20Strategy());
+            _strategies.Add(new Node16Strategy());
+            _strategies.Add(new Node10Strategy());
+            _strategies.Add(new Node6Strategy());


In my understanding, we moved from a priority defined in each strategy to this insertion order.
Shouldn't we add a comment here to make sure new node versions are added in the proper priority order?

Adding Strategies for node 24 to node 6 | Adding Interfaces created for node strategy and response data models for each strategy result

68d2621

rishabhmalikMS added the enhancement label Dec 8, 2025

Merge branch 'master' into users/rishabhmalikMS/NodehandlerStrategies

0854386

rishabhmalikMS changed the title ~~Adding Strategies for node 24 to node 6 | Adding Interfaces created f…~~ Add unified Node.js version selection strategy framework with EOL policy support Dec 8, 2025

rishabhmalikMS added 6 commits December 8, 2025 16:48

adding EnableEOLNodeVersionPolicy knob

e3b655a

Merge branch 'users/rishabhmalikMS/NodehandlerStrategies' of https://github.com/microsoft/azure-pipelines-agent into users/rishabhmalikMS/NodehandlerStrategies

32e38b9

adding agent knob AGENT_RESTRICT_EOL_NODE_VERSIONS

11d1b64

Minor fix

b5a899b

Adding localized strings in string.json.

a52fbdf

Added NodeVersionNotAvailable string

3e73fe2

Added IUnifiedNodeVersionStrategy in ServiceInterfaceL0 test

baec4a9

microsoft deleted a comment from azure-pipelines bot Dec 9, 2025

Removed NodeVersionNotAvailable string temporarily.

79110c8

rishabhmalikMS marked this pull request as ready for review December 9, 2025 05:02

rishabhmalikMS requested review from a team as code owners December 9, 2025 05:02

tarunramsinghani reviewed Dec 9, 2025

src/Agent.Worker/NodeVersionStrategies/IUnifiedNodeVersionStrategy.cs (outdated)

tarunramsinghani reviewed Dec 9, 2025

src/Agent.Worker/NodeVersionStrategies/NodePathResult.cs (outdated)

tarunramsinghani reviewed Dec 9, 2025

rishabhmalikMS added internal and removed enhancement labels Dec 10, 2025

Updating nomenclature

7ae51f0

raujaiswal reviewed Dec 10, 2025

rishabhmalikMS mentioned this pull request Dec 10, 2025

Node handler integration with Node Version Orchestrator #5425

Closed

rishabhmalikMS changed the title ~~Add unified Node.js version selection strategy framework with EOL policy support~~ Add Node.js version selection strategy framework with EOL policy support Dec 10, 2025

tarunramsinghani reviewed Dec 10, 2025

gidad mentioned this pull request Dec 10, 2025

[BUG]: EOL/Obsolete Software: Node.js 16.x Detected #5214

Open

4 tasks

dassayantan24 reviewed Dec 11, 2025

rajmishra1997 reviewed Dec 11, 2025

src/Agent.Worker/NodeVersionStrategies/Node6Strategy.cs (outdated)

rajmishra1997 reviewed Dec 12, 2025

sanjuyadav24 reviewed Dec 12, 2025

- Added integration for node strategy orchestrator in nodehandler to run using a feature flag knob value
- Added test integration to run Test scenarios on both original and strategy code

90b4c7a

