Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] Diagnose Windows cert creation failures #3237

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

paulmedynski
Copy link
Contributor

@paulmedynski paulmedynski commented Mar 24, 2025

Diagnostic changes for #3223. We're now emitting the PowerShell script content used to generate a self-signed cert and add it to the Windows system cert store. This may help figure out why the cert operations fail intermittently.

Created this draft PR to run the CI pipelines where the failures sometimes occur and acquire the PowerShell content for further investigation.

@edwardneal
Copy link
Contributor

I think I introduced this in #3034, which used PowerShell to generate certificates more frequently. Apologies if so!

The only difference in the PS invocation which I can see between my version and the preceding one is that we no longer load the user profile. Does setting ProcessStartInfo.LoadUserProfile to true fix this?

@paulmedynski
Copy link
Contributor Author

I think I introduced this in #3034, which used PowerShell to generate certificates more frequently. Apologies if so!

The only difference in the PS invocation which I can see between my version and the preceding one is that we no longer load the user profile. Does setting ProcessStartInfo.LoadUserProfile to true fix this?

LoadUserProfile is already being set to true, so likely not the issue. I've run some manual tests in a Windows Server container, and the PS script works fine. Since the failures are intermittent, I suspect a flaky test environment.

We're having some ADO CI pipeline issues that are preventing me from running CI reliably right now, so I will circle back once the overall CI stuff settles down.

Copy link

codecov bot commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.82%. Comparing base (1247ca4) to head (1336f73).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3237      +/-   ##
==========================================
+ Coverage   72.69%   72.82%   +0.12%     
==========================================
  Files         303      298       -5     
  Lines       59718    59612     -106     
==========================================
- Hits        43414    43412       -2     
+ Misses      16304    16200     -104     
Flag Coverage Δ
addons 92.58% <ø> (ø)
netcore 75.27% <ø> (+0.17%) ⬆️
netfx 71.41% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@paulmedynski paulmedynski force-pushed the dev/paul/issue-3223-pwsh-test-failure branch from 11d4009 to b1bb29f Compare March 25, 2025 10:57
- Added powershell script content to test error output for disgnostics.
- Replaced #if NET9_0 with #if NET9_0_OR_GREATER to future-proof our directives.
- Removed unnecessary NETFRAMEWORK symbol.
- Updated download URL for .NET x86 installer for tests pipelines.
@paulmedynski paulmedynski force-pushed the dev/paul/issue-3223-pwsh-test-failure branch from b65989c to 5a29f5b Compare March 31, 2025 16:34
- Added some retries to the PowerShell script execution to see if that helps avoid overall test failures.
@paulmedynski
Copy link
Contributor Author

The latest commit adds a retry within the CertificateFixtureBase.CreateCertificate that seems to work around the ACCESS DENIED issue. We now see this in the test logs:

2025-03-31T17:22:11.0473456Z   PowerShell command failed with exit code 1 on attempt 1 of 3; retrying in 5 seconds...
...
2025-03-31T17:22:19.7322237Z     Passed Microsoft.Data.SqlClient.Tests.AlwaysEncryptedTests.ExceptionsAlgorithmErrors.TestInvalidCipherText [44 ms]

There are no warnings or errors for the CI runs.

@paulmedynski paulmedynski marked this pull request as ready for review April 1, 2025 11:05
@paulmedynski paulmedynski requested review from Copilot and a team April 1, 2025 11:06
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates various preprocessor directives from NET9_0 to NET9_0_OR_GREATER and introduces enhanced diagnostics for Windows certificate creation failures via retry logic and additional output details. Key changes include updating build conditions in multiple certificate and SQL components, adding retry logic with delay and logging for PowerShell command execution in certificate creation, and updating a pipeline URL to a newer dotnet-install.ps1 script location.

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
ColumnEncryptionCertificateFixture.cs Updated preprocessor condition to reflect newer target frameworks
CertificateFixtureBase.cs Replaced NET9_0 with NET9_0_OR_GREATER and added retry logic with delay and diagnostic output for certificate generation
CertificateTestWithTdsServer.cs Updated preprocessor condition for certificate loading
CertificateUtility.cs Updated preprocessor condition for certificate loading and RSA decryption/signature verification
SqlDbTypeExtensions.cs Updated preprocessor condition for exposing the Json SqlDbType constant
VirtualSecureModeEnclaveProvider.cs Updated preprocessor condition for loading a certificate from binary payload
SNICommon.cs Updated preprocessor condition for validating SSL server certificates
ci-run-tests-job.yml Updated the URL to the latest dotnet-install.ps1 script
Files not reviewed (1)
  • src/Microsoft.Data.SqlClient/netfx/src/Microsoft.Data.SqlClient.csproj: Language not supported
Comments suppressed due to low confidence (1)

src/Microsoft.Data.SqlClient/tests/tools/Microsoft.Data.SqlClient.TestUtilities/Fixtures/CertificateFixtureBase.cs:173

  • Consider adding tests to verify the retry logic and ensure correct behavior on repeated PowerShell failures.
for (int attempt = 1; attempt <= retries; ++attempt)

Copy link
Contributor Author

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for review, with my comments added for context.

@@ -236,7 +236,7 @@ jobs:
- powershell: |
dotnet --info

Invoke-WebRequest https://dot.net/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1
Invoke-WebRequest https://builds.dotnet.microsoft.com/dotnet/scripts/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids Invoke-WebRequest failures due to some dead IPs in the DNS results for dot.net, and is the proper canonical DNS name for the dotnet scripts.

@@ -174,7 +174,7 @@ internal static bool ValidateSslServerCertificate(Guid connectionId, string targ
{
try
{
#if NET9_0
#if NET9_0_OR_GREATER
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're future-proofed!

The docs for the preprocessor symbols were missing NET9_0_OR_GREATER, so I notified the owners and they will be updated accordingly. The symbol itself has been present since .NET 9.0 release though.

@@ -16,7 +16,6 @@
We should remove ResolveComReferenceSilent as soon as we can remove the dependency on mscoree. -->
<ResolveComReferenceSilent>True</ResolveComReferenceSilent>
<EnableDefaultCompileItems>false</EnableDefaultCompileItems>
<DefineConstants>$(DefineConstants);NETFRAMEWORK;</DefineConstants>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to explicitly define NETFRAMEWORK. It has been a built-in symbol for ages.

//
// https://sqlclientdrivers.visualstudio.com/ADO.Net/_workitems/edit/34304
//
const int delay = 5;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like reasonable values. I observed 2 retries across all of the jobs, so it's not adding significant delay.

@paulmedynski paulmedynski added this to the 6.1-preview1 milestone Apr 2, 2025
@mdaigle mdaigle self-assigned this Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants