Contributor

@ArafatKhan2198 ArafatKhan2198 commented Jan 13, 2026

What changes were proposed in this pull request?

This PR fixes a flaky test failure in TestOmDBInsightEndPoint, where tests would randomly fail with
RocksDatabaseException: Rocks Database is closed.

The issue was that ReconDBProvider and OzoneStorageContainerManager were created for each test but were never properly cleaned up. When a test finished, these resources remained open, leaving the database files locked. As a result, the next test could fail.

The fix includes:

  • Adding class-level fields to store references to OzoneStorageContainerManager and ReconDBProvider
  • Updating tearDown() to properly stop and close these resources after each test

This ensures all database connections are closed before the next test runs, preventing the “database is closed” error.
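The shape of the fix can be sketched as follows. This is an illustration only: `FakeScm`, `FakeDbProvider`, and `InsightTestLifecycle` are hypothetical stand-ins for the real Recon classes and the JUnit test class, and `setUp()` and `tearDown()` are plain methods here, where the real test would presumably use `@BeforeEach` and `@AfterEach`.

```java
// Hypothetical stand-in for OzoneStorageContainerManager.
class FakeScm {
  boolean running = true;

  void stop() {
    running = false;
  }
}

// Hypothetical stand-in for ReconDBProvider.
class FakeDbProvider implements AutoCloseable {
  boolean open = true;

  @Override
  public void close() {
    open = false;
  }
}

// Hypothetical test lifecycle: fields keep the references alive so that
// tearDown() can release exactly what setUp() created.
class InsightTestLifecycle {
  FakeScm scm;
  FakeDbProvider dbProvider;

  void setUp() {
    scm = new FakeScm();
    dbProvider = new FakeDbProvider();
  }

  void tearDown() {
    // Null checks cover the case where setUp() failed part-way through.
    if (scm != null) {
      scm.stop();
    }
    if (dbProvider != null) {
      dbProvider.close();
    }
  }
}
```

Without the fields, the objects created during setup would be unreachable from the teardown method, which matches the leak described above.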

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14210

How was this patch tested?

Ran the test 100 times; it passed every time: https://github.com/ArafatKhan2198/ozone/actions/runs/21200116059/job/60984087538#logs

@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review January 13, 2026 08:41
Contributor

@adoroszlai adoroszlai left a comment

Thanks @ArafatKhan2198 for working on this. If it's a test problem, please fix it in the test. We should not ignore closed database globally.

@ArafatKhan2198 ArafatKhan2198 marked this pull request as draft January 13, 2026 09:10
@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review January 21, 2026 07:09
@ArafatKhan2198
Contributor Author

> Thanks @ArafatKhan2198 for working on this. If it's a test problem, please fix it in the test. We should not ignore closed database globally.

I agree with you, @adoroszlai: it's better to fix the test.
I have made the changes. Could you please take a look?

      ozoneStorageContainerManager.stop();
    }
    if (reconDBProvider != null) {
      reconDBProvider.close();
Contributor

Thanks @ArafatKhan2198 for updating the patch.

ReconTestInjector is created with withContainerDB() in the @BeforeEach method of some other tests. Don't they need to close it as well?

Examples:

  • TestClusterStateEndpoint
  • TestDeletedKeysSearchEndpoint
  • TestOpenKeysSearchEndpoint

(There may be more.)

Also, it would be better to add ReconTestInjector.close() to clean up all resources it created, and let tests call that single method instead of storing and closing reconDBProvider, which is otherwise not directly used in the test.
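A rough sketch of what such a single cleanup method could look like. `SketchInjector` and `track` are hypothetical names, not the actual ReconTestInjector API; the assumption is only that the injector records each closeable resource as it creates it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical injector that remembers every closeable resource it creates.
class SketchInjector implements AutoCloseable {
  private final Deque<AutoCloseable> resources = new ArrayDeque<>();

  // Registers a resource and returns it so callers can keep using it.
  <T extends AutoCloseable> T track(T resource) {
    resources.push(resource);
    return resource;
  }

  // Closes resources in reverse creation order. Every resource is attempted;
  // the first failure is rethrown at the end, with later ones suppressed.
  @Override
  public void close() throws Exception {
    Exception first = null;
    while (!resources.isEmpty()) {
      try {
        resources.pop().close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

With something like this, a test's teardown would only need to call `close()` on the injector, and the test would not have to hold a reference to the DB provider at all.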

@ArafatKhan2198
Contributor Author

Thanks for the review comment, @adoroszlai.

Based on my findings, here is the list of test classes that initialise ReconTestInjector with .withContainerDB() (which creates the RocksDB instance) but are missing the cleanup logic.

These 13 files need the same fix:

  • TestClusterStateEndpoint.java
  • TestDeletedKeysSearchEndpoint.java
  • TestOpenKeysSearchEndpoint.java
  • TestBlocksEndPoint.java
  • TestNSSummaryEndpointWithOBSAndLegacy.java
  • TestNSSummaryEndpointWithFSO.java
  • TestEndpoints.java
  • TestNSSummaryDiskUsageOrdering.java
  • TestFeaturesEndPoint.java
  • TestTriggerDBSyncEndpoint.java
  • TestNSSummaryEndpointWithLegacy.java
  • TestOpenContainerCount.java
  • TestContainerEndpoint.java

Would you suggest that we fix only TestClusterStateEndpoint in this JIRA,
and fix the remaining tests together in a separate JIRA?

@adoroszlai
Contributor

I think this change can wait for a proper fix. I have not seen this test fail on master even once.

On the other hand, TestContainerEndpoint and acceptance-unsecure are frequently failing, so it would be nice if you could fix those first (HDDS-14414, HDDS-14178).

@ArafatKhan2198 ArafatKhan2198 marked this pull request as draft January 21, 2026 11:31
        reconDBProvider.close();
      }
    } catch (Exception e) {
    }

In case of an exception here, we should fail the test.
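One possible way to do that, sketched with hypothetical stand-in classes (`FlakyResource` and `StrictTearDown` are not part of the patch): drop the empty `catch` block, let the teardown method declare `throws Exception`, and use `try`/`finally` so that the second resource is still closed when the first cleanup step fails. In JUnit 5, an exception that escapes an `@AfterEach` method causes the test to be reported as failed.

```java
// Hypothetical resource whose close() can fail, standing in for a DB provider.
class FlakyResource implements AutoCloseable {
  private final boolean failOnClose;
  boolean closed;

  FlakyResource(boolean failOnClose) {
    this.failOnClose = failOnClose;
  }

  @Override
  public void close() throws Exception {
    closed = true;
    if (failOnClose) {
      throw new IllegalStateException("close failed");
    }
  }
}

class StrictTearDown {
  // No catch block: any exception escapes to the caller (the test framework).
  static void tearDown(FlakyResource first, FlakyResource second) throws Exception {
    try {
      if (first != null) {
        first.close();
      }
    } finally {
      // Runs even when first.close() throws, so second is not leaked.
      if (second != null) {
        second.close();
      }
    }
  }
}
```

The trade-off is that a cleanup failure now shows up as a test failure instead of being hidden, which is the behavior the comment asks for.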
