Demo julienne #236
This commit adds instructions for building and running neural-fortran (specifically the test suite) using the experimental multi-image capabilities of LLVM flang 22 + the Caffeine parallel runtime library as an alternative to gfortran + OpenCoarrays.
This commit demonstrates running the linear_2d_layer unit tests using the Julienne correctness-checking framework (https://go.lbl.gov/julienne). The commit adds

* test/driver.f90 - main program
* test/linear_2d_layer_test_m.f90 - test module

and adds the Julienne 3.2.1 release as a development dependency so that it is only downloaded and built if the tests are being run.
Fortran 2008 allows a procedure name to be passed as the actual argument corresponding to a dummy argument that is a procedure pointer. Support for this feature was added in gfortran 14.3. This commit works around the lack of the feature in older gfortran versions.
@milancurcic all tests pass now, including the Julienne test, with
whereupon it seems that the use of
After reviewing the original test that is failing, it's clear that zero-tolerance equality should not be expected: the weights being tested are updated during the backward pass, which performs multiple floating-point operations on them. So this test can be expected to fail with some compilers and whenever operations are reordered. Let's increase the tolerance within reason to make it pass.
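To illustrate the point about tolerances, here is a minimal sketch contrasting exact equality with a tolerance-based comparison. It is not the test from this PR: the program name, the array values, and the `tolerance` value are all hypothetical, chosen only to show the pattern.

```fortran
program tolerance_check_sketch
  ! Hypothetical illustration; names and values are assumptions, not neural-fortran code.
  implicit none
  real, parameter :: tolerance = 1.e-5  ! assumed value; choose per test
  real :: expected(3), actual(3)

  expected = [0.1, 0.2, 0.3]
  ! Mimic the small rounding differences that reordered floating-point
  ! operations can introduce into updated weights.
  actual = expected + [1.e-7, -2.e-7, 0.]

  ! Exact equality is fragile under such differences ...
  print *, "exactly equal:    ", all(actual == expected)
  ! ... whereas an absolute-tolerance comparison absorbs them.
  print *, "within tolerance: ", all(abs(actual - expected) <= tolerance)
end program tolerance_check_sketch
```

An absolute tolerance is used here for simplicity; a relative tolerance may be a better fit when the magnitudes of the weights vary widely.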
Regarding Julienne, thank you for demoing it. I need to sit with this and think for some time. I also appreciate your offer to rewrite other suites (maybe all of them?) in Julienne. However, I think this would be unproductive because it wouldn't give me a chance to evaluate it and learn it for myself. So I think I'll need to rewrite one suite myself to get a taste for it.
@milancurcic that makes sense. There's not a lot more to learn than what's in this PR, so just let me know if you have any questions once you go through it. Two useful things that we didn't have time to discuss today are testing parallel runs and the option to skip a test. For parallel tests, Julienne uses a collective subroutine to ensure that a test is reported as passing only if it passes on all images. (If the test only exercises a subset of images, then the images outside that subset can simply be hardwired to pass.) To skip a test, simply omit the function when invoking the

We'll submit the camera-ready version of our first Julienne paper by this Friday and then I'll present it at the US-RSE Conference next week. I'll be happy to share a copy of the final paper and the talk slides once done.
Oh... and only image 1 prints results, which of course is important when testing with a large number of images.
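The "passes only if it passes on all images" idea, together with reporting from image 1 only, can be sketched with a standard Fortran 2018 collective subroutine. This is not Julienne's implementation, whose internals are not shown in this conversation; `local_check` is a hypothetical stand-in for a real test, and `co_min` is just one way to combine per-image outcomes.

```fortran
program all_images_must_pass_sketch
  ! Hypothetical illustration; this is not Julienne's code.
  implicit none
  integer :: passes

  ! Encode this image's outcome as 1 (pass) or 0 (fail).
  passes = merge(1, 0, local_check())

  ! Collective minimum: afterwards every image holds 0 if any image failed.
  call co_min(passes)

  ! Report once, from image 1 only, regardless of the number of images.
  if (this_image() == 1) then
    if (passes == 1) then
      print *, "test passes on all", num_images(), "image(s)"
    else
      print *, "test FAILS on at least one image"
    end if
  end if

contains

  logical function local_check()
    ! Stand-in for a real per-image test.
    local_check = (1 + 1 == 2)
  end function local_check

end program all_images_must_pass_sketch
```

Building this requires a compiler with multi-image (coarray) support enabled, such as the gfortran + OpenCoarrays or flang + Caffeine combinations mentioned in this PR.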
This PR includes the commits from #235 and adds a demonstration of running the `linear_2d_layer` unit tests using the Julienne correctness-checking framework. The PR adds two files, the initial versions of which were automatically generated by Julienne's `scaffold` app:

* test/driver.f90 (`scaffold` output)
* test/linear_2d_layer_test_m.f90 (`scaffold` output edited to incorporate code from the pre-existing test)

The PR also edits the `fpm.toml` file to add Julienne 3.2.1 as a development dependency so that it is only downloaded and built if the tests are being built and run.
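For readers unfamiliar with fpm development dependencies, the kind of `fpm.toml` entry described above might look like the following fragment. This is a sketch rather than the exact lines from the PR: the repository URL and the tag spelling are assumptions and should be checked against the Julienne project page.

```toml
# Sketch only: the URL and tag format below are assumptions, not copied from this PR.
# Entries under [dev-dependencies] are fetched and built only for tests,
# not when neural-fortran is built as a library.
[dev-dependencies]
julienne = { git = "https://github.com/BerkeleyLab/julienne", tag = "3.2.1" }
```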