Skip to content

feat(integration): Parallelize integration tests #1086

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 137 commits into
base: master
Choose a base branch
from

Conversation

emizzle
Copy link
Contributor

@emizzle emizzle commented Jan 22, 2025

Runs integration tests in parallel, by running each integration test file specified in testIntegration in a separate process. Note, this is all still single threaded, using chronos' AsyncProcess.

The result is that the integration tests run about 60% faster locally (~26 mins on my machine) than if run serially.
image

I will update this description with the results in CI..

Configuration

There are few options for configuring the TestManager, which can be set in the following ways:

  1. Set the value when running the make testIntegration command, eg:
make DEBUG_CODEXNODES=1 testIntegration
  1. By defining the value and running testIntegration directly. This option is probably faster assuming that the codex binary does not need to re-built each time. For example:
./env.sh nim c -d:DebugCodexNodes=true -r tests/testIntegration.nim
  1. Changing the value(s) directly in tests/testIntegration.nim

Chronicles logs

For chronicles logs

Option definitions

Default values indicate what values are set when make testIntegration is run.

Option Make param Default Description
debugTestHarness DEBUG_TESTHARNESS false Echoes stderr if there's a test failure (eg test failed, compilation error) or error (eg test manager error)
debugHardhat DEBUG_HARDHAT false Echoes stdout from Hardhat process
debugCodexNodes DEBUG_CODEXNODES true Echoes stdout from the integration test file process. Codex process logs can also be output if a test uses a multinodesuite, requires CodexConfig.debug to be enabled. Note: default is true so that the output of unittest/unittest2 in stdout will be shown, ie [Suite] <suite name> and [Test] <test name>, eg image
showContinuousStatusUpdates DEBUG_UPDATES false Shows test status updates at time intervals. Useful for running locally with active terminal interaction. Set to false for unattended runs, eg CI. Example: image
testTimeout TEST_TIMEOUT 60.minutes Timeout duration (in minutes) for EACH integration test file.

@emizzle emizzle force-pushed the fix/build/macos-compile-with-nat-traversal branch from 9c3c2dd to 3b79379 Compare January 28, 2025 04:50
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch 3 times, most recently from f402de4 to 59cf143 Compare February 7, 2025 01:23
@emizzle emizzle changed the base branch from fix/build/macos-compile-with-nat-traversal to master February 7, 2025 01:23
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch from 59cf143 to 2a81eb4 Compare February 7, 2025 04:20
@emizzle emizzle changed the base branch from master to chore/integration/simplify-block-expiration-tests February 7, 2025 04:21
@emizzle emizzle force-pushed the chore/integration/simplify-block-expiration-tests branch 3 times, most recently from 384d28b to d3a2668 Compare February 19, 2025 02:42
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch from 2a81eb4 to e8b0356 Compare February 20, 2025 00:49
Base automatically changed from chore/integration/simplify-block-expiration-tests to master February 20, 2025 07:00
@veaceslavdoina veaceslavdoina force-pushed the feat/integration/parallelize-tests branch from 50b6bee to 5b6c2aa Compare February 21, 2025 17:09
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch 2 times, most recently from edd082d to 7e14753 Compare February 28, 2025 12:03
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch from f24bcee to 95804c6 Compare March 6, 2025 08:57
@emizzle emizzle marked this pull request as draft March 10, 2025 00:14
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch 2 times, most recently from d7d80e8 to 17b84b2 Compare March 18, 2025 04:48
emizzle added 11 commits March 25, 2025 17:13
- need to test with longer tests to ensure the parallelisation is truly happening
- is the +10 hardhat port needed?
- try with more integration tests

# Conflicts:
#	tests/integration/hardhatprocess.nim
#	tests/integration/multinodes.nim
#	tests/integration/testcli.nim
#	tests/testIntegration.nim
# Conflicts:
#	tests/testIntegration.nim
# Conflicts:
#	tests/testIntegration.nim
# Conflicts:
#	tests/testIntegration.nim
# Conflicts:
#	tests/integration/hardhatprocess.nim
prevents showing error in the logs when an expected process exit code is encountered

# Conflicts:
#	tests/integration/testcli.nim
# Conflicts:
#	tests/integration/testcli.nim
emizzle and others added 22 commits March 25, 2025 17:22
Windows hangs when attempting to hardhat's process streams, so try to kill the process externally.
TODO: This doesn't actually kill the process, as the pid given by chronos is an msys2 pid, and the command is used to kill a windows process. `ps -ef` in msys2 also doesn't show hardhat running as a process, so the only way to kill the process is to kill it with the windows pid. So we need to figure out a way to get a windows pid from the msys2 pid.
On windows, termination of hardhat processes would not actually kill the process, and then closing the process' streams would then hang the calling nim process. To get around this, the process is now killed externally using a script, winkillhardhat.sh. This script first queries open processes by inspecting the command line value of all "node.exe" processes, searching for "vendor/codex-contracts-eth" and for the port parameter it was started with. After querying, the process is killed using the `Stop-Process` powershell command (passing the pid of the windows process).
… params of eventually

With this fix in, there is no need to use the asynctest update that sets longer defaults for eventually, so downgrade asynctest
Since the HttpClient now supports async, re-enable debug logging in the Codex nodes
…ebugHardhat=true

Hardhat output is logged to file in hardhat.log for each test, and printing to screen is not necessarily needed as it is already logged to file and can create clutter in the output, so stop writing logging output to memory and writing to screen.
In situations like timeouts, windows will hang when attempting to close the test process streams. In this case, we have to force kill the test process externally. This is the same process as force killing hardhat nodes after they are already terminated, but windows refuses hangs when closing their process streams. This commit creates a forceKillProcess utility that allows a process to be killed by its process name and matching commandline criteria, like TestId (for test process) or --port (for hardhat)
Because `eventuallySafe` calls the symbol `eventually`, it should be declared before `proc eventually` is declared to avoid ambiguous symbol lookups.
Also adds a condition such that a hardhat node is not started for parallel integration tests
@emizzle emizzle force-pushed the feat/integration/parallelize-tests branch from f55c792 to b3bc587 Compare March 25, 2025 08:56
emizzle added 7 commits March 26, 2025 12:29
It didn't fix the error with invalid proofs in windows in ci
Windows CI is consistently failing with invalid proof, so increasing the period duration may help alleviate this, allowing SPs to fill slots when they're bogged down with proving multiple slots.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants