Conversation

@agriyakhetarpal
Member

@agriyakhetarpal agriyakhetarpal commented May 16, 2025

In this PR, I'm investigating whether we can shorten the time taken to run the test suite using pytest-xdist; the suite can take upwards of ~25 minutes in the case of a full build.

@agriyakhetarpal
Member Author

Of course, we can only get an accurate estimate from a full build, but if all the tests work correctly and pass in this parallel context, that should be a sign that there aren't any (many?) problems.

@agriyakhetarpal
Member Author

Based on https://github.com/pyodide/pyodide-recipes/actions/runs/15079325254/job/42394053702?pr=92, this seems to work! 409 tests executed in 1 minute and 51 seconds, whereas the logs for an older job show 380 tests running in 3 minutes. The time savings will be more pronounced, and more useful, for full builds.

@hoodmane
Member

Previously we had xdist on, but our CI was pretty flaky, and @rth turned it off because he thought it was oversubscribed, since the browser uses several threads. Maybe start with xdist on Node?

@agriyakhetarpal
Member Author

agriyakhetarpal commented May 17, 2025

Oh, I didn't know! I'll see if I can restrict this to just Node through a global-level marker and let the rest of the tests run serially, but I don't immediately see it in the matrix. I thought at first that we were testing only with Chrome, but that doesn't seem to be the case. I'll check.
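For reference, one way this could be done (a hypothetical sketch, not the actual change in this PR) is with pytest-xdist's `xdist_group` marker and `--dist loadgroup` mode: tests sharing a group are pinned to the same worker, so pinning all non-Node tests to a single group keeps them effectively serial while Node tests spread across the remaining workers. This assumes the runtime name (e.g. `chrome`, `node`) appears in each test's node ID, as in Pyodide's parametrised test suite.

```python
# conftest.py — hypothetical sketch; assumes pytest is invoked with
# "-n auto --dist loadgroup" and that the runtime name is in the node ID.
import pytest


def pytest_collection_modifyitems(config, items):
    for item in items:
        if "node" not in item.nodeid:
            # All tests in the same xdist_group run on one worker under
            # --dist loadgroup, so browser tests stay serial.
            item.add_marker(pytest.mark.xdist_group("browser-serial"))
```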

@github-actions
Contributor

github-actions bot commented May 17, 2025

Package Build Results

Total packages built: 292
Total build time: 2:04:26

Package Build Times

Package	Build Time
opencv-python 37m 12s
flint 29m 0s
ffmpeg 24m 16s
mypy 20m 1s
igraph 17m 0s
geos 13m 18s
scipy 12m 25s
python-sat 11m 10s
coolprop 10m 46s
lightgbm 10m 43s
libproj 10m 5s
python-flint 10m 4s
gdal 8m 34s
openblas 7m 38s
pandas 7m 30s
yt 7m 19s
libhdf5 7m 14s
nlopt 6m 54s
b2d 6m 53s
xgboost 6m 48s
scikit-image 6m 38s
statsmodels 6m 10s
lxml 5m 39s
clingo 5m 38s
pysam 4m 52s
openssl 4m 24s
boost-cpp 4m 14s
boost-histogram 4m 11s
numpy 4m 7s
rustworkx 4m 2s
libgsl 3m 58s
pynacl 3m 57s
numcodecs 3m 53s
scikit-learn 3m 50s
pydantic_core 3m 48s
libgmp 3m 42s
awkward-cpp 3m 27s
libde265 3m 2s
suitesparse 2m 32s
cramjam 2m 27s
sourmash 2m 14s
apsw 2m 10s
rasterio 2m 9s
astropy 2m 5s
libmpfr 2m 3s
blosc2 1m 58s
primesieve 1m 56s
liblzma 1m 53s
pyerfa 1m 52s
libnetcdf 1m 46s
matplotlib 1m 46s
sisl 1m 46s
river 1m 40s
h5py 1m 35s
libcst 1m 35s
glpk 1m 33s
libiconv 1m 32s
Pillow 1m 32s
rateslib 1m 31s
libmagic 1m 30s
libtiff 1m 25s
libheif 1m 20s
cryptography 1m 17s
libwebp 1m 16s
libxml 1m 11s
shapely 1m 2s
pcodec 1m 2s
sqlite3 1m 2s
contourpy 58s
pywavelets 56s
lakers-python 56s
libxslt 54s
fiona 54s
pyxirr 54s
python-solvespace 54s
libzfp 51s
pycryptodome 48s
libmpc 45s
fastparquet 45s
jiter 45s
protobuf 45s
tiktoken 44s
nh3 44s
netcdf4 41s
orjson 40s
aiohttp 40s
soxr 39s
cffi_example 37s
zstandard 37s
freesasa 36s
cysignals 36s
ewah_bool_utils 35s
bcrypt 35s
kiwisolver 34s
cbor-diag 32s
cytoolz 32s
reboundx 31s
h3 31s
libf2c 28s
rpds-py 27s
test 25s
pyclipper 25s
brotli 25s
msprime 23s
logbook 23s
rust-panic-test 23s
galpy 22s
zlib 21s
rebound 21s
biopython 21s
sqlalchemy 18s
cftime 17s
frozenlist 17s
rust-abi-test 16s
memory-allocator 15s
tskit 14s
peewee 14s
yarl 14s
gmpy2 14s
zfpy 13s
tree-sitter-java 13s
ssl 13s
cffi 12s
tree-sitter 12s
tree-sitter-go 12s
wrapt 12s
wordcloud 12s
swiglpk 12s
regex 12s
gsw 12s
hashlib 11s
pillow-heif 11s
pyyaml 10s
simplejson 10s
traits 10s
msgspec 10s
bitarray 10s
msgpack 9s
fpcast-test 9s
ndindex 9s
mmh3 9s
libyaml 8s
bokeh 8s
pyiceberg 8s
autograd 7s
uncertainties 7s
cpp-exceptions-test 7s
pi-heif 6s
argon2-cffi-bindings 6s
tree-sitter-python 6s
crc32c 5s
coverage 5s
sympy 5s
xxhash 5s
lzma 5s
pyinstrument 5s
lazy-object-proxy 5s
sparseqr 4s
pydecimal 4s
screed 4s
cpp-exceptions-test2 4s
pyheif 4s
pydoc_data 4s
joblib 4s
multidict 4s
zengl 4s
sharedlib-test-py 4s
pycparser 4s
future 4s
xyzservices 4s
pyrsistent 4s
asciitree 4s
packaging 3s
sharedlib-test 3s
strictyaml 3s
pytest-benchmark 3s
argon2-cffi 3s
urllib3 3s
python-magic 3s
patsy 3s
buffer-test 3s
MarkupSafe 3s
pyshp 3s
httpx 3s
retrying 3s
termcolor 3s
donfig 2s
tqdm 2s
Jinja2 2s
tblib 2s
prompt_toolkit 2s
atomicwrites 2s
matplotlib-pyodide 2s
bitstring 2s
anyio 2s
tzdata 2s
decorator 2s
typing-extensions 2s
mne 2s
xlrd 2s
bleach 2s
iniconfig 2s
html5lib 2s
pydantic 2s
mpmath 2s
fsspec 2s
pyodide-unix-timezones 2s
py 2s
jedi 2s
altair 2s
setuptools 1s
pyparsing 1s
pytest 1s
xarray 1s
affine 1s
pytz 1s
astropy_iers_data 1s
fonttools 1s
click 1s
imageio 1s
python-dateutil 1s
traitlets 1s
openai 1s
vega-datasets 1s
smart-open 1s
asttokens 1s
h11 1s
ipython 1s
micropip 1s
pkgconfig 1s
cmyt 1s
Pygments 1s
aiosignal 1s
rich 1s
tomli-w 1s
certifi 1s
toolz 1s
nltk 1s
narwhals 1s
tomli 1s
svgwrite 1s
networkx 1s
requests 1s
ruamel.yaml 1s
optlang 1s
beautifulsoup4 1s
more-itertools 1s
charset-normalizer 1s
jsonschema 1s
distlib 1s
soupsieve 1s
sniffio 1s
wcwidth 1s
parso 1s
exceptiongroup 1s
cloudpickle 1s
httpcore 1s
cssselect 1s
cligj 1s
six 1s
sortedcontainers 1s
pluggy 1s
aiohappyeyeballs 1s
colorspacious 1s
attrs 1s
docutils 1s
annotated-types 1s
threadpoolctl 1s
webencodings 1s
cycler 1s
pytest_httpx 0s
lazy_loader 0s
demes 0s
newick 0s
munch 0s
propcache 0s
platformdirs 0s
texttable 0s
unyt 0s
idna 0s
pytest-asyncio 0s
referencing 0s
async-timeout 0s
deprecation 0s
matplotlib-inline 0s
pure-eval 0s
pyodide-http 0s
executing 0s
jsonschema_specifications 0s
distro 0s
cachetools 0s
stack-data 0s

Longest build: opencv-python (37m 12s)
Packages built in more than 10 minutes: 12

@agriyakhetarpal
Member Author

Okay, I notice that we aren't running tests on Node.js in CI at the moment; it's just that the conftest.py file was ported and includes some references to be used later. I assume a decent amount of time has passed since pytest-xdist was last used in the main Pyodide repo. I could trigger the tests a few more times to check for flakes, if you want me to, but we could also go ahead and revert the change if we find a problem that isn't easily fixable. What do you think?

Thanks!

@hoodmane
Member

I'm fine with merging it and seeing what happens. Maybe don't set the concurrency very high?

We should also add node tests ideally.

@agriyakhetarpal
Member Author

I'm fine with merging it and seeing what happens. Maybe don't set the concurrency very high?

Great! Yes, since this is on GitHub Actions runners and we've set it to -n auto, the number of jobs is 2 (and 3 on macOS, if we were to add it later). We would have more cores on a larger runner, should we opt for one.

@agriyakhetarpal
Member Author

We should also add node tests ideally.

Yes, and also Safari and Firefox, as we just need to download the artifacts in a matrix of jobs. I can take it up in a follow-up.

@agriyakhetarpal
Member Author

Let's wait for #91 before we go ahead with this; I'll rebase after that is done.

@rth
Member

rth commented May 17, 2025

-n auto, so the number of jobs is 2

Just worth double-checking that this is creating the number of jobs you expect. "auto" may not work properly in some virtualized environments, including CI (pytest-dev/pytest-xdist#1103), as it doesn't handle some of the edge cases that, say, joblib.cpu_count does...

@ryanking13
Member

Thanks. Yeah, I think it was flaky before, but it was a long time ago, so we can try it again. We can revert it anytime if it doesn't work.

@agriyakhetarpal
Member Author

-n auto, so the number of jobs is 2

Just worth double-checking that this is creating the number of jobs you expect. "auto" may not work properly in some virtualized environments, including CI (pytest-dev/pytest-xdist#1103), as it doesn't handle some of the edge cases that, say, joblib.cpu_count does...

Thanks for that link, @rth. I tried to find the CPU specifications for the Linux runners, but I don't see any, unfortunately. If these are Intel runners, then we need to check for hyper-threading, and the CPU affinity might be a better value to use. If they are AMD CPUs, we have SMT instead, which isn't well defined either, as we know nothing about the make/model/architecture.

It looks like there are now 4 cores across all runners, bumped up from the previous 2 (except M-series macOS runners, which have 3): https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories. The page doesn't say anything about the number of physical/logical/virtual cores. In the meantime, we can change to -n ${nproc}; hopefully that helps.
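The nproc-based approach can also be approximated in Python (a sketch, not part of this PR): on Linux, `os.sched_getaffinity` reflects the CPUs the process is actually allowed to use, which is what `nproc` reports and what `-n auto` can get wrong in virtualized environments.

```python
# Sketch: compute an explicit pytest-xdist worker count instead of "-n auto",
# which may over-report usable cores in virtualized CI environments.
import os


def xdist_worker_count() -> int:
    try:
        # Respects the affinity mask (e.g. cgroup/CI limits), like `nproc`.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # macOS/Windows have no sched_getaffinity; fall back to cpu_count.
        return os.cpu_count() or 1


print(xdist_worker_count())
```

One could then invoke the suite with `pytest -n "$(python -c '...')"` or, on Linux, simply `pytest -n "$(nproc)"`, since `nproc` also honours the affinity mask.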

@agriyakhetarpal agriyakhetarpal changed the title Try out parallel testing Try out parallel testing [full build] May 19, 2025
@agriyakhetarpal
Member Author

I've triggered a full build and full test suite run, let's see if we have any issues.

@agriyakhetarpal agriyakhetarpal requested review from rth and ryanking13 May 19, 2025 22:54
@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review May 19, 2025 22:54
@ryanking13
Member

Thanks! The failing tests are related to Cython, so I think they are okay.

The output looks a little noisier than before (two lines per test?). It would be nice to fix that, but it's not a big problem; otherwise, looks good to me.

@agriyakhetarpal
Member Author

Yes, I'm not really sure why that is happening. I think we can try again; I'm happy to wait a little. It could be a bug with how pytest-xdist interacts with the terminal and reports test successes/failures.

@agriyakhetarpal
Member Author

Based on multiple outputs in the logs, here's an example:

[gw2] [ 10%] PASSED packages/blosc2/test_blosc2.py::test_large_typesize[chrome-shape0-261-False] 
packages/blosc2/test_blosc2.py::test_large_typesize[chrome-shape0-65536-True]

What I think is happening here is that it uses multiple lines because it adds the [ 10%] PASSED text to the first test case of a parametrised test, but not to the rest of the parametrised test cases that come after it?

@agriyakhetarpal
Member Author

Okay, after a bit of experimentation, I think this is expected. I ran the pyodide-build test suite, and this is what we get on a few of the tests:

[gw0] [ 59%] PASSED pyodide_build/tests/test_venv.py::test_venv_cli_args[options2-expected_calls2]
pyodide_build/tests/test_venv.py::test_venv_cli_args[options4-expected_calls4]
pyodide_build/tests/test_venv.py::test_venv_cli_args[options5-expected_calls5]
pyodide_build/tests/test_venv.py::test_supported_virtualenv_options
pyodide_build/tests/test_venv.py::test_venv_creation[default]

This means that something else is going on, and it's not a problem with this repository alone – I checked the logs for other tests, and this pattern appears everywhere. Perhaps pytest-dev/pytest-xdist#605?

A worker $X$ operating on a test method $Y$ with $n$ parametrized cases reports the first case with its status, then lists the remaining $(n - 1)$ cases it is about to run on the following lines. However, I confirmed that no tests are missing from the test report.

Hence we should be fine to merge this, as it's easily a close-to-2x improvement (31 minutes ➡️ 16 minutes).

Thanks for the reviews!

@agriyakhetarpal agriyakhetarpal merged commit 3bf8f4a into pyodide:main May 20, 2025
3 checks passed
@agriyakhetarpal agriyakhetarpal deleted the try-parallel-testing branch May 20, 2025 18:41