@jtnord
Member

@jtnord jtnord commented Jan 5, 2026

diagnosis for #995 (comment) / jenkins-infra/helpdesk#4939

Testing done


@jtnord
Member Author

jtnord commented Jan 5, 2026

17:41:58  Provider: SecureRandom.DRBG algorithm from: SUN
17:41:58  provider: Failed to use operating system seed generator: java.io.IOException: Required native CryptoAPI features not  available on this machine
17:41:58  provider: Using default threaded seed generator

which I do not see locally.
https://stackoverflow.com/questions/49322948/slow-securerandom-initialization/49322949#49322949 implies this was resolved, but I guess there is something about our container infrastructure that is broken? Not exactly sure what OS/hosts we run for Windows, @timja
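
To compare a local machine with an agent, it may help to print which SecureRandom implementation the JVM picks by default and which ones are registered at all. The sketch below is only an illustration: the class and method names (`SecureRandomInventory`, `describeDefault`, `secureRandomServices`) are hypothetical and not part of this PR, and it uses only standard `java.security` calls.

```java
import java.security.Provider;
import java.security.SecureRandom;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from this PR): lists SecureRandom implementations.
final class SecureRandomInventory {

    // Algorithm and provider of the SecureRandom the JVM selects by default.
    static String describeDefault() {
        SecureRandom rng = new SecureRandom();
        return rng.getAlgorithm() + " from " + rng.getProvider().getName();
    }

    // Every registered service of type "SecureRandom", as "provider: algorithm".
    static List<String> secureRandomServices() {
        List<String> found = new ArrayList<>();
        for (Provider provider : Security.getProviders()) {
            for (Provider.Service service : provider.getServices()) {
                if ("SecureRandom".equals(service.getType())) {
                    found.add(provider.getName() + ": " + service.getAlgorithm());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("default: " + describeDefault());
        secureRandomServices().forEach(System.out::println);
    }
}
```

Running it with `-Djava.security.debug=provider` on both machines should show whether the seed-generator warning appears alongside the same default algorithm.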

@jtnord
Member Author

jtnord commented Jan 5, 2026

https://github.com/openjdk/jdk21u-dev/blob/bad21fbe258402e7697279fdbdf7d67e02d20c03/src/java.base/windows/native/libjava/WinCAPISeedGenerator.c#L49C13-L52 alas this code does not call GetLastError to find out why it fails... 😮‍💨

Could be some missing libraries in the container, could be permissions, could be something entirely different...
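
One way to narrow this down from the Java side, without native code, might be to request the SunMSCAPI provider's `Windows-PRNG` directly, since that is another path into the Windows crypto API. This is a sketch under assumptions: the class name `WindowsPrngCheck` is hypothetical, and nothing in this thread establishes whether this call succeeds or fails on the affected agents. On non-Windows JVMs the provider is absent and the check simply reports that.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical diagnostic (not from this PR): probes the SunMSCAPI PRNG.
final class WindowsPrngCheck {

    // Returns a one-line status instead of throwing, so it can be logged as-is.
    static String check() {
        try {
            SecureRandom rng = SecureRandom.getInstance("Windows-PRNG", "SunMSCAPI");
            rng.nextBytes(new byte[16]); // forces a real call into the native API
            return "Windows-PRNG OK";
        } catch (GeneralSecurityException | RuntimeException e) {
            // Missing provider/algorithm (e.g. not Windows) or a native failure.
            return "Windows-PRNG unavailable: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(check());
    }
}
```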

@timja
Member

timja commented Jan 5, 2026

Cc @dduportal / @lemeurherve otherwise I’ll check later on.

@jtnord
Member Author

jtnord commented Jan 5, 2026

I would guess it is either NTE_BAD_KEY_STATE (given there is some potential funkiness around passwords in containers) or NTE_PROV_TYPE_NO_MATCH, because we are using a slim container (?) without the required support (the JDK requests PROV_RSA_FULL as the type).
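
For matching a numeric code from `GetLastError` against these guesses, a small lookup can help. The sketch below is an illustration only: the class name `CryptoApiErrors` is hypothetical, and the table holds just a handful of values from `winerror.h` that are relevant to `CryptAcquireContext`, not a complete list.

```java
import java.util.Map;

// Hypothetical helper (not from this PR): names a few Win32/CryptoAPI codes.
final class CryptoApiErrors {

    // Subset of winerror.h; keys are the unsigned 32-bit values.
    static final Map<Long, String> NAMES = Map.of(
            5L, "ERROR_ACCESS_DENIED",
            0x8009000BL, "NTE_BAD_KEY_STATE",
            0x80090016L, "NTE_BAD_KEYSET",
            0x8009001BL, "NTE_PROV_TYPE_NO_MATCH");

    // Accepts the code as printed with %lu or as a signed 32-bit int.
    static String describe(long code) {
        long unsigned = code & 0xFFFFFFFFL;
        String name = NAMES.getOrDefault(unsigned, "unknown");
        return String.format("%s (0x%08X)", name, unsigned);
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(describe(Long.parseLong(arg)));
        }
    }
}
```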

@timja
Member

timja commented Jan 5, 2026

I don’t think it’s in a container but will check when back to my laptop

@timja
Member

timja commented Jan 5, 2026

These are the labels on the VM:

[image: VM labels]

It's a Windows 2019 VM on AWS

@lemeurherve
Member

It's a Windows 2019 VM on AWS

And here is how it's provisioned: https://github.com/jenkins-infra/packer-images/blob/main/provisioning/windows-provision.ps1

@timja
Member

timja commented Jan 5, 2026

I logged into a running Windows VM and ran:

pwsh

cd C:\tools\jdk-17\bin
$url = 'https://gist.githubusercontent.com/timja/18b75ae57ecc2d517a8fce0811c98bdd/raw/83c73c7a59a8a3ef3b094166cdb9b35bf51ce7cb/gistfile1.txt'
Invoke-WebRequest $url -Outfile Main.java
.\javac Main.java
.\java '-Djava.security.debug="provider"' Main
provider: Failed to use operating system seed generator: java.io.IOException: Required native CryptoAPI features not  available on this machine
provider: Using default threaded seed generator
Provider: MessageDigest.SHA algorithm from: SUN
Provider: MessageDigest.SHA algori

I tried it on Java 25 as well with the same result.
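
The gist's Main.java is not reproduced in this thread. A minimal stand-in, assuming all the reproducer needs to do is force SecureRandom to seed itself, might look like the sketch below. The class name `SeedTimingProbe` and its method are hypothetical.

```java
import java.security.SecureRandom;

// Hypothetical stand-in for the reproducer (the real gist is not shown here).
final class SeedTimingProbe {

    // Creates a fresh SecureRandom and times its first nextBytes call,
    // which is where the default implementation has to obtain seed material.
    static long timeFirstBytesMillis(int count) {
        long start = System.nanoTime();
        byte[] buffer = new byte[count];
        new SecureRandom().nextBytes(buffer);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("first nextBytes took " + timeFirstBytesMillis(32) + " ms");
        System.out.println("second nextBytes took " + timeFirstBytesMillis(32) + " ms");
    }
}
```

Run with `-Djava.security.debug=provider`, as in the session above, to see whether the seed-generator fallback message is printed. A large gap between the first and second timing would be consistent with a slow seed source, though this sketch does not prove the cause.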

@timja
Member

timja commented Jan 5, 2026

I think best to just write a little C program and run it on the VM.

@timja
Member

timja commented Jan 6, 2026

Going to try this:
debug-secure-random-2.zip

@timja
Member

timja commented Jan 6, 2026

This code (from Microsoft Learn, on how to use the API):

#include <Windows.h>
#include <wincrypt.h>
#include <stdio.h>
#include <stdlib.h>  // exit

int main() {
    printf("Debug Secure Random 2\n");

    //-------------------------------------------------------------------
    // Declare and initialize variables.

    HCRYPTPROV hCryptProv = 0;     // handle for a cryptographic provider context
    LPCSTR UserName = "J2SETest";  // name of the key container to be used

    //-------------------------------------------------------------------
    // Attempt to acquire a context and a key
    // container. The context will use the default CSP
    // for the RSA_FULL provider type. dwFlags is set to zero
    // to attempt to open an existing key container.

    if (CryptAcquireContext(
        &hCryptProv,               // handle to the CSP
        UserName,                  // container name 
        NULL,                      // use the default provider
        PROV_RSA_FULL,             // provider type
        0))                        // flag values
    {
        printf("A cryptographic context with the %s key container \n",
            UserName);
        printf("has been acquired.\n\n");
    }
    else
    {
        //-------------------------------------------------------------------
        // An error occurred in acquiring the context. This could mean
        // that the key container requested does not exist. In this case,
        // the function can be called again to attempt to create a new key 
        // container. Error codes are defined in Winerror.h.
        DWORD errCode = GetLastError();
        char errMsg[512];
        FormatMessageA(
            FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
            NULL,
            errCode,
            0,
            errMsg,
            sizeof(errMsg),
            NULL);
        if (errCode == NTE_BAD_KEYSET)
        {
            if (CryptAcquireContext(
                &hCryptProv,
                UserName,
                NULL,
                PROV_RSA_FULL,
                CRYPT_NEWKEYSET))
            {
                printf("A new key container has been created.\n");
            }
            else
            {
                DWORD createErrCode = GetLastError();
                char createErrMsg[512];
                FormatMessageA(
                    FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                    NULL,
                    createErrCode,
                    0,
                    createErrMsg,
                    sizeof(createErrMsg),
                    NULL);
                printf("Could not create a new key container. Error %lu: %s\n", createErrCode, createErrMsg);
                fflush(stdout);
                exit(1);
            }
        }
        else
        {
            printf("A cryptographic service handle could not be acquired. Error %lu: %s\n", errCode, errMsg);
            fflush(stdout);
            exit(1);
        }

    } // End of else.
    //-------------------------------------------------------------------
    // A cryptographic context and a key container are available. Perform
    // any functions that require a cryptographic provider handle.

    //-------------------------------------------------------------------
    // When the handle is no longer needed, it must be released.

    if (CryptReleaseContext(hCryptProv, 0))
    {
        printf("The handle has been released.\n");
    }
    else
    {
        printf("The handle could not be released.\n");
    }
}

Works on my Windows machine and fails on the VM with:
"Could not create a new key container"

@timja
Member

timja commented Jan 6, 2026

Another iteration:
debug-secure-random-2.zip

@jtnord
Member Author

jtnord commented Jan 6, 2026

The error handling there is questionable (no better than the JDK!).
The failure path should call GetLastError and output the result.
On phone, so cannot provide the code yet.

@timja
Member

timja commented Jan 6, 2026

debug-secure-random-2.zip

I've copiloted this:

                DWORD errCode = GetLastError();
                char errMsg[512];
                FormatMessageA(
                    FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                    NULL,
                    errCode,
                    0,
                    errMsg,
                    sizeof(errMsg),
                    NULL);
                printf("Could not create a new key container. Error %lu: %s\n", errCode, errMsg);
                exit(1);

trying now

@timja
Member

timja commented Jan 6, 2026

Debug Secure Random 2
Could not create a new key container. Error 5: Access is denied.

@timja
Member

timja commented Jan 6, 2026

The user is running as an admin:

PS Z:\jenkins\debug> $user = [Security.Principal.WindowsIdentity]::GetCurrent();
PS Z:\jenkins\debug> (New-Object Security.Principal.WindowsPrincipal $user).IsInRole([Security.Principal.WindowsBuiltinRole]::Administrator)
True

@timja
Member

timja commented Jan 6, 2026

PS Z:\jenkins\debug> icacls "$env:APPDATA\Microsoft\Crypto\RSA"
Z:\jenkins\AppData\Roaming\Microsoft\Crypto\RSA NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
                                                BUILTIN\Administrators:(I)(OI)(CI)(F)
                                                EC2AMAZ-44RG0FU\jenkins:(I)(OI)(CI)(F)

Successfully processed 1 files; Failed processing 0 files

@timja
Member

timja commented Jan 6, 2026

With a different key container name:
debug-secure-random-2.zip

@timja
Member

timja commented Jan 6, 2026

With more error handling:
debug-secure-random-2.zip

@timja
Member

timja commented Jan 6, 2026

Hmm, weird: I thought I'd try on windows-2022, but my exe doesn't even run there =/

No error message or anything, but nothing is output

@jtnord
Member Author

jtnord commented Jan 6, 2026

Hmm, weird: I thought I'd try on windows-2022, but my exe doesn't even run there =/

No error message or anything, but nothing is output

missing the specific vc runtime?

@jtnord
Member Author

jtnord commented Jan 6, 2026

Are you logged in via RDP to the console (for servers, historically you needed to use /console on Remote Desktop to get the console session as opposed to a new session; not sure that is still needed), or via SSH or WinRM? WinRM is by its nature restricted, and I think SSH has some restrictions.

@timja
Member

timja commented Jan 6, 2026

Are you logged in via RDP to the console (for servers, historically you needed to use /console on Remote Desktop to get the console session as opposed to a new session; not sure that is still needed), or via SSH or WinRM? WinRM is by its nature restricted, and I think SSH has some restrictions.

I'm logged in over SSH (which is the same as Jenkins is doing)


missing the specific vc runtime?

Probably

@jtnord
Member Author

jtnord commented Jan 6, 2026

missing the specific vc runtime?

Probably

image

@timja
Member

timja commented Jan 6, 2026

New version should be statically linked:
debug-secure-random-2.zip

@dduportal

Yes my ssh was always using password. @timja was also always using a password IIRC?

So you did not use the EC2 plugin to spin up your machine, is that correct?

@timja
Member

timja commented Jan 9, 2026

So you did not use the EC2 plugin to spin up your machine, is that correct?

On AWS I used the EC2 plugin.

On Azure I used the portal

@dduportal

So you did not use the EC2 plugin to spin up your machine, is that correct?

On AWS I used the EC2 plugin.

On Azure I used the portal

Sorry, it was aimed at James, my bad

@dduportal

dduportal commented Jan 9, 2026

After a pairing session with James:

  • It looks like the (crypto) problem happens if the first login is done through SSH with a public key
  • If you log in with a password the first time, crypto works. If you set up the key after the first login and log in with the key, it still works
  • If you log in with the key the first time, crypto does not work for the user. If you log in again with a password, it works again

Other considerations:

  • Delegating the user profile to the Z: drive does not change the described behavior (so the external disk is not a problem for crypto)
  • (maybe not: see debug SecureRandom #999 (comment)) We cannot use Docker if the jenkins user is not an admin.
  • (Wrong assertion: admin membership is unrelated, it's only the SSH auth. method - see debug SecureRandom #999 (comment)) We cannot use cryptoApi with Administrator or with a user part of the admin group.
    • Do we really need Docker (Windows containers) for the ci.jenkins.io plugins or Java builds? Or is it only required for building the Windows Docker images?
  • We'll proceed to improve the current cloud init setup of our Windows agents by setting up the local user with a UserProfile moved to Z: (tested with success). At the least, it might improve user performance and Docker's "weird" behaviors

Code we used:

# Set up Windows default user profile location to Z: drive
$userpath = 'Z:\Users'
$regpath = "HKLM:\Software\Microsoft\Windows NT\CurrentVersion\ProfileList"
$regname = "ProfilesDirectory"
set-itemproperty -path $regpath -name $regname -value $userpath

# Create user and init its profile
$pw = ConvertTo-SecureString -String 'wibble1234WIBBLE@@@' -AsPlainText -Force
$username="jenkins8"
New-LocalUser -Name $username -Password $pw
Add-LocalGroupMember -Group "openssh users" -Member $username
# Add-LocalGroupMember -Group "Administrators" -Member $username
$cred = New-Object System.Management.Automation.PSCredential ($username,$pw)
# Ensure user profile is created
Start-Process -FilePath cmd.exe -ArgumentList "/c","echo initialized" -Credential $cred -LoadUserProfile -Wait

# Setup SSH key for user in its profile
$userSSHDir = "$userpath\$username\.ssh"
New-Item -ItemType Directory -Path $userSSHDir -Force | Out-Null

$authorizedKeysFile = "$userSSHDir\authorized_keys"
$keyUrl = 'https://raw.githubusercontent.com/jenkins-infra/aws/main/ec2_agents_authorized_keys'
Write-Host "Downloading SSH key from $keyUrl to $authorizedKeysFile"
Invoke-WebRequest $keyUrl -OutFile $authorizedKeysFile

# Test crypto api
$url = 'https://github.com/user-attachments/files/24455803/debug-secure-random-2.zip'
Invoke-WebRequest $url -Outfile debug-secure-random-2.zip
Expand-Archive debug-secure-random-2.zip
cd ./debug-secure-random-2/
./debug-secure-random-2.exe

@jtnord
Member Author

jtnord commented Jan 9, 2026

Asking about this on the OpenSSH Windows port: PowerShell/Win32-OpenSSH#2420

@timja
Member

timja commented Jan 9, 2026

We cannot use crypto if the jenkins user is admin (or part of the admin group).

Could you clarify this? If you log in with RDP, then it is possible to use crypto from key-based auth on an admin account.
But do you mean that logging in with username:password and then key-based auth fails if admin?

Do we really need Docker (Windows containers) for the ci.jenkins.io plugins or Java builds? Or is it only required for building the Windows Docker images?

Um, I'm not sure. I feel at some point someone in one of the Docker plugins was trying to get Windows containers working; I'm not sure if any plugins actually rely on it though.

@jtnord
Member Author

jtnord commented Jan 9, 2026

We cannot use Docker if the jenkins user is not an admin

tested this locally and verified it works (server 2022)

  • in C:\ProgramData\Docker\config\daemon.json add { "group": "docker-users" } (will need the right json syntax!)
  • create a local group called docker-users and add the non admin jenkins user to it
  • restart the docker service (docker-engine).

my daemon.json after editing looks like

{
    "hosts":  [
                  "npipe://"
              ],
    "group": "docker-users"
}

(You will possibly need to give the user the ability to create symlinks, for Git, for any Jenkins plugins whose source uses symlinks. I believe it's already enabled in the Git config?)

@dduportal

dduportal commented Jan 9, 2026

We cannot use crypto if the jenkins user is admin (or part of the admin group).

Could you clarify this? If you log in with RDP, then it is possible to use crypto from key-based auth on an admin account. But do you mean that logging in with username:password and then key-based auth fails if admin?

Do we really need Docker (Windows containers) for the ci.jenkins.io plugins or Java builds? Or is it only required for building the Windows Docker images?

Um, I'm not sure. I feel at some point someone in one of the Docker plugins was trying to get Windows containers working; I'm not sure if any plugins actually rely on it though.

Good point, I've updated the comment and I'm trying to verify the following scenario: a user jenkins can be used to use the crypto, with SSH password login and admin group.

(edit) Thanks for the external view. I confirm that the user can be in the Administrator group without any issue. It's only the SSH authentication technique which makes the crypto fail or not.

PS C:\Users\jenkins\debug-secure-random-2> ./debug-secure-random-2.exe
Debug Secure Random 2
A new key container has been created.
The handle has been released.
PS C:\Users\jenkins\debug-secure-random-2>
PS C:\Users\jenkins\debug-secure-random-2> docker info
Client:
 Version:    28.5.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.1
    Path:     C:\ProgramData\Docker\cli-plugins\docker-buildx.exe

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 28.5.2
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Default Isolation: process
 Kernel Version: 10.0 26100 (26100.1.amd64fre.ge_release.240331-1435)
 Operating System: Microsoft Windows Server Version 24H2 (OS Build 26100.7171)
 OSType: windows
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.71GiB
 Name: EC2AMAZ-AT061VF
 ID: b3e30832-04a7-4a3f-b600-664128bf5f9b
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

@jtnord
Member Author

jtnord commented Jan 9, 2026

Yes my ssh was always using password. @timja was also always using a password IIRC?

So you did not use the EC2 plugin to spin up your machine, is that correct?

I was doing all my testing on a pure EC2 VM created via the AWS console with the init script in user data

@jtnord
Member Author

jtnord commented Jan 13, 2026

so it may be we are not specifying the user correctly. As the machine is a domain controller the accounts need to be fully qualified (otherwise some things may not work).
e.g. ssh -l user@domain host (note I have no idea what this domain would be called)
New-LocalUser appears to still create domain accounts!

@dduportal

so it may be we are not specifying the user correctly. As the machine is a domain controller the accounts need to be fully qualified (otherwise some things may not work). e.g. ssh -l user@domain host (note I have no idea what this domain would be called) New-LocalUser appears to still create domain accounts!

Oh oh, interesting one, thanks! I'm trying manually once to verify if it works

@timja
Member

timja commented Jan 13, 2026

More info on PowerShell/Win32-OpenSSH#2420 @dduportal

@dduportal

More info on PowerShell/Win32-OpenSSH#2420 @dduportal

Yup, I was checking PowerShell/Win32-OpenSSH#1597 (comment) :)

@dduportal

so it may be we are not specifying the user correctly. As the machine is a domain controller the accounts need to be fully qualified (otherwise some things may not work). e.g. ssh -l user@domain host (note I have no idea what this domain would be called) New-LocalUser appears to still create domain accounts!

At first try, it did not work:

As admin:

# Set up Windows default user profile location to Z: drive
$userpath = 'Z:\Users'
$regpath = "HKLM:\Software\Microsoft\Windows NT\CurrentVersion\ProfileList"
$regname = "ProfilesDirectory"
set-itemproperty -path $regpath -name $regname -value $userpath

# Create user and init its profile
$pw = ConvertTo-SecureString -String 'wibble1234WIBBLE@@@' -AsPlainText -Force
$username="jenkins"
New-LocalUser -Name $username -Password $pw
Add-LocalGroupMember -Group "openssh users" -Member $username
# Add-LocalGroupMember -Group "Administrators" -Member $username
$cred = New-Object System.Management.Automation.PSCredential ($username,$pw)
# Ensure user profile is created
Start-Process -FilePath cmd.exe -ArgumentList "/c","echo initialized" -Credential $cred -LoadUserProfile -Wait

# Setup SSH key for user in its profile
$userSSHDir = "$userpath\$username\.ssh"
New-Item -ItemType Directory -Path $userSSHDir -Force | Out-Null

$authorizedKeysFile = "$userSSHDir\authorized_keys"
$keyUrl = 'https://raw.githubusercontent.com/jenkins-infra/aws/main/ec2_agents_authorized_keys'
Write-Host "Downloading SSH key from $keyUrl to $authorizedKeysFile"
Invoke-WebRequest $keyUrl -OutFile $authorizedKeysFile

Then retrieved the domain with [System.Security.Principal.WindowsIdentity]::GetCurrent().Name (or echo $env:COMPUTERNAME in powershell) and logged in (with success):

ssh -l 'EC2AMAZ-941RC2B/jenkins' 10.0.202.26
  • With public key auth, it still fails to run the "exe" (Could not create a new key container. Error 5: Access is denied.)
  • With password auth: it works again
  • Subsequent public key auth logins then work

@jtnord
Member Author

jtnord commented Jan 13, 2026

Then retrieved the domain with [System.Security.Principal.WindowsIdentity]::GetCurrent().Name (or echo $env:COMPUTERNAME in powershell)

Are you sure that gets the domain? The domain and the computer name are distinct (a domain can have multiple controllers, and having a domain with the same name as the computer may well break something, so it would surprise me if this was the case)

@jtnord
Member Author

jtnord commented Jan 13, 2026

Then retrieved the domain with [System.Security.Principal.WindowsIdentity]::GetCurrent().Name (or echo $env:COMPUTERNAME in powershell)

Are you sure that gets the domain? The domain and the computer name are distinct (a domain can have multiple controllers, and having a domain with the same name as the computer may well break something, so it would surprise me if this was the case)

Well, I launched an EC2 instance, and it is not a single-controller domain but a workgroup, so that would be right.. 🤷
And

As the machine is a domain controller

is incorrect for our setup.

@dduportal

The commands described in https://learn.microsoft.com/en-us/powershell/module/activedirectory/get-addomain?view=windowsserver2025-ps#example-2-get-domain-information-of-the-current-local-computer-domain are failing alas:

Get-ADDomain : The term 'Get-ADDomain' is not recognized as the name of a cmdlet, function, script 
file, or operable program. Check the spelling of the name, or if a path was included, verify that the 
path is correct and try again.

The command does not seem to be present out of the box.

@dduportal

  • Trying with WORKGROUP as domain (both UPN and Netlogon syntaxes) does not work. SSH refuses the auth. on both password and public key
  • However, both syntaxes work when using the value from $env:COMPUTERNAME

I'm now playing around with using the VM hostname instead of its IPv4

@dduportal

  • Trying with WORKGROUP as domain (both UPN and Netlogon syntaxes) does not work. SSH refuses the auth. on both password and public key

    • However, both syntaxes work when using the value from $env:COMPUTERNAME

I'm now playing around with using the VM hostname instead of its IPv4

Using the hostname does not work either: same behavior. I'm going back to the non-interactive password init

@dduportal

AH I might have found a technique with the Posh-SSH module. Let me retry with cloudinit

@dduportal

YEEEPEEKAY

21:55:20  Running on [EC2 (aws-us-east-2) - Windows Infra Test (i-01e25c18c0cd94c7d)](https://ci.jenkins.io/computer/EC2%20%28aws%2Dus%2Deast%2D2%29%20%2D%20Windows%20Infra%20Test%20%28i%2D01e25c18c0cd94c7d%29/) in Z:/agent/workspace/ra_acceptance-tests_infra-checks
21:55:20  [Pipeline] {
21:55:20  [Pipeline] powershell
21:55:33  Debug Secure Random 2
21:55:36  A cryptographic context with the J2SETest key container 
21:55:36  has been acquired.
21:55:36  
21:55:36  The handle has been released.

https://ci.jenkins.io/job/Infra/job/acceptance-tests/job/infra-checks/736/console

# Requires using YAML for the Windows "Cloud Init" stuff. Multipart upload of a powershell script does not work.
version: 1.1
tasks:
- task: executeScript
  inputs:
  - frequency: always
    type: powershell
    runAs: localSystem
    content: |-
      ## Set up permissions context (as you are Administrator here)
      Set-ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore
      # Don't set this before Set-ExecutionPolicy as it throws an error
      $ErrorActionPreference = "stop"

      ## Setup NVMe(s) and map it to the Z: drive
      $nb = Get-Disk | Where-Object PartitionStyle -eq 'RAW' | tee -Variable Disks | measure
      Write-Output "$($nb.Count) disk found."
      Switch ($nb.Count)
      {
        0 {Write-Output "No RAW disk found."}
        1 {
            Write-Output "1 disk found."
            $Disks | Initialize-Disk -PartitionStyle MBR
            $Disks | New-Partition -UseMaximumSize -MbrType IFS
            $Partition = Get-Partition -DiskNumber $Disks.Number
            $Partition | Format-Volume -FileSystem NTFS -Confirm:$false
            $Partition | Add-PartitionAccessPath -AccessPath "Z:"
            Get-WmiObject Win32_Volume | Format-Table Name, Label, FreeSpace, Capacity
        }
        default {
            Write-Output "$($nb.Count) disks found."
            $Disks | ForEach-Object -Begin {Get-Date} -Process {
                    Initialize-Disk -PartitionStyle MBR -PassThru -DiskNumber $_.Number
                    New-Partition -DiskNumber $_.Number -UseMaximumSize -MbrType IFS
                    $Partition = Get-Partition -DiskNumber $_.Number
                    $Partition | Format-Volume -FileSystem NTFS -Confirm:$false
                    $Partition | Add-PartitionAccessPath -AccessPath "Z:"
                } -End {Get-Date}
            Get-WmiObject Win32_Volume | Format-Table Name, Label, FreeSpace, Capacity
        }
      }

      # Set up Windows default user profile location to Z: drive
      $userpath = 'Z:\Users'
      $regpath = "HKLM:\Software\Microsoft\Windows NT\CurrentVersion\ProfileList"
      $regname = "ProfilesDirectory"
      $username="jenkins"
      $userHome = "$userpath\$username"
      $userSSHDir = "$userHome\.ssh"
      $authorizedKeysFile = "$userSSHDir\authorized_keys"


      set-itemproperty -path $regpath -name $regname -value $userpath
      Write-Output "Set up default user profiles to $userpath"

      # Create user and init its profile
      $pw = ConvertTo-SecureString -String '<redacted>' -AsPlainText -Force
      New-LocalUser -Name $username -Password $pw
      Add-LocalGroupMember -Group "openssh users" -Member $username
      Write-Output "Created the user $username"
      Install-Module -Name Posh-SSH -Force
      Write-Output "Installed PoshSSH module"
      $cred = New-Object System.Management.Automation.PSCredential ($username,$pw)
      $SessionID = New-SSHSession -ComputerName "localhost" -AcceptKey -Credential $cred
      Write-Output "Got SSH Session"
      Invoke-SSHCommand -Index $SessionID.Sessionid -Command 'whoami'

      $cryptoDebugUrl = 'https://github.com/user-attachments/files/24455803/debug-secure-random-2.zip'
      Invoke-SSHCommand -Index $SessionID.Sessionid -Command "powershell Invoke-WebRequest $cryptoDebugUrl -OutFile debug-secure-random-2.zip"
      Write-Output "Downloaded $cryptoDebugUrl in debug-secure-random-2.zip"
      Invoke-SSHCommand -Index $SessionID.Sessionid -Command "powershell Expand-Archive debug-secure-random-2.zip"
      Write-Output "Expanded debug-secure-random-2.zip"
      Invoke-SSHCommand -Index $SessionID.Sessionid -Command "powershell .\debug-secure-random-2\debug-secure-random-2.exe"
      Write-Output "Ran .\debug-secure-random-2\debug-secure-random-2.exe"

      # Set up the SSH key in the user's profile once the initial SSH password login has been performed to init CryptoAPI
      
      $keyUrl = 'https://raw.githubusercontent.com/jenkins-infra/aws/main/ec2_agents_authorized_keys'
      

      Write-Output "Starting setting up SSH key"
      Invoke-SSHCommand -Index $SessionID.Sessionid -Command "powershell New-Item -ItemType Directory -Path $userSSHDir -Force"
      Invoke-SSHCommand -Index $SessionID.Sessionid -Command "powershell Invoke-WebRequest $keyUrl -OutFile $authorizedKeysFile"
      Write-Output "Finished setting up SSH key"

      ## Setup datadog
      (Get-Content C:\ProgramData\Datadog\datadog.yaml -Raw) -Replace 'api_key:', 'api_key: <redacted>' | Set-Content C:\ProgramData\Datadog\datadog.yaml
      & "$env:ProgramFiles\Datadog\Datadog Agent\bin\agent.exe" restart-service

      ## Disable WinRM
      Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
      cmd.exe /c net stop winrm

      ## Mark cloud init as finished using a marker file
      New-Item -Path "Z:/Temp" -ItemType "Directory"
      New-Item -Path "Z:/Temp/.cloud-init.done" -ItemType "File" -Value "Cloud Init"

@timja
Member

timja commented Jan 14, 2026

Nice!

Is Posh-SSH needed? Isn't the fix to do an SSH to localhost when the user is created using password auth first and then key based auth?

The server should have SSH pre-installed on it?

I guess it makes it easier to e.g. work with a password and use powershell objects

@dduportal

The server should have SSH pre-installed on it?

Is Posh-SSH needed?

Could be. I focused on validating the "patch" (e.g. verifying that the CryptoAPI can be used in the pipeline by performing a first SSH login with password in non interactive) and I wanted to rule out the "how to pass password to ssh client".

I guess it makes it easier to e.g. work with a password and use powershell objects

Exactly, that's why I started with it. ssh client does not support reading password from stdin.
I can try with the SSH_ASKPASS environment variable (which I usually do on Unix); but only now that we are sure it is worth it ;)).
If it does not work, I'll propose a PR to have the module pre-installed in our AMI to gain some time during init (as it takes ~4 min end to end before the agent is ready).

Isn't the fix to do an SSH to localhost when the user is created using password auth first and then key based auth?

Absolutely. But the sequence of tasks is important:

  • The privileged operations can only be run by the "cloud init" ("EC2Launchv2"), i.e. during the boot phase. The "init script" of the ec2 plugin is expected to be run by the (almost) non-privileged user jenkins
  • However, since the ec2 plugin retries its SSH connection in a loop, the jenkins user's SSH session is opened "asap": the cloud-init might not be (and usually is not) finished when the connection happens. To fix this, I've moved the configuration of authorized_keys to the last step (after we are sure that the user is properly set up).
  • We have not baked the jenkins user into the AMIs: doing so used to provide faster agent startup (less than 1 min), but it prevented us from having anything on the Z: (NVMe) drive, because the Jenkins ec2 plugin had already connected the (prebaked) jenkins user before cloud-init had even finished formatting and mounting Z:.

@jtnord
Member Author

jtnord commented Jan 14, 2026

So I'm not clear if the fix is just "ssh with user:pass" or if you need to "run something that sets up the CryptoAPI that has a user/PW token"
(or if it's a combination of both)

If it's the latter, then Start-Process will probably work to just run our debug tool, with no need for Posh-SSH

@dduportal

So I'm not clear if the fix is just "ssh with user:pass" or if you need to "run something that sets up the CryptoAPI that has a user/PW token" (or if it's a combination of both)

If it's the latter, then Start-Process will probably work to just run our debug tool, with no need for Posh-SSH

I initially thought we ran this scenario (setup CryptoAPI with the Start-Process), but not sure. Retrying to be sure

@dduportal

dduportal commented Jan 14, 2026

So I'm not clear if the fix is just "ssh with user:pass" or if you need to "run something that sets up the CryptoAPI that has a user/PW token" (or if it's a combination of both)

If it's the latter, then Start-Process will probably work to just run our debug tool, with no need for Posh-SSH

Start-Process is really an awful function. Can't find how to show its stdout / stderr
/me is tired: https://learn.microsoft.com/fr-fr/powershell/module/microsoft.powershell.management/start-process?view=powershell-7.5#-redirectstandardoutput

@jtnord
Member Author

jtnord commented Jan 14, 2026

So I'm not clear if the fix is just "ssh with user:pass" or if you need to "run something that sets up the CryptoAPI that has a user/PW token" (or if it's a combination of both)
If it's the latter, then Start-Process will probably work to just run our debug tool, with no need for Posh-SSH

Start-Process is really an awful function. Can't find how to show its stdout / stderr :'(

You don't really need to. Just check manually with ssh -i ... in the resulting VM?

@dduportal

So I'm not clear if the fix is just "ssh with user:pass" or if you need to "run something that sets up the CryptoAPI that has a user/PW token" (or if it's a combination of both)
If it's the latter, then Start-Process will probably work to just run our debug tool, with no need for Posh-SSH

Start-Process is really an awful function. Can't find how to show its stdout / stderr :'(

You don't really need to. Just check manually with ssh -i ... in the resulting VM?

Actually I do: it does not seem to do anything at all. I'm not sure what i'm missing in the call.

Start-Process -FilePath "cmd.exe" -ArgumentList "/c pwd" -Credential $cred -LoadUserProfile -Wait -RedirectStandardOutput 'Z:\output.log' -RedirectStandardError 'Z:\err.log'

generates the log files but they are empty (both of them). I can't even get an exit code. What is wrong with my instruction?
