rouson
Collaborator

@rouson rouson commented Sep 29, 2025

This PR includes the commits from #235 and adds a demonstration of running the linear_2d_layer unit tests using the Julienne correctness-checking framework. The PR adds two files, the initial versions of which were automatically generated by Julienne's scaffold app:

  • test/driver.f90 - main program (unmodified scaffold output)
  • test/linear_2d_layer_test_m.f90 - test module (scaffold output edited to incorporate code from the pre-existing test)

The PR also edits the fpm.toml file to add Julienne 3.2.1 as a development dependency so that it is only downloaded and built if the tests are being built and run.

This commit adds instructions for building and running
neural-fortran (specifically the test suite) using the experimental
multi-image capabilities of LLVM flang 22 + the Caffeine parallel
runtime library as an alternative to gfortran + OpenCoarrays.

This commit demonstrates running the linear_2d_layer unit tests
using the Julienne correctness-checking framework
(https://go.lbl.gov/julienne). The commit adds

* test/driver.f90 - main program
* test/linear_2d_layer_test_m.f90 - test module

and adds the Julienne 3.2.1 release as a development dependency
so that it is only downloaded and built if the tests are being
run.

Fortran 2008 allowed for a procedure name to be passed as the
actual argument to a dummy argument that is a procedure pointer.
This feature was added to gfortran 14.3. This commit works around
the lack of the feature in older versions.

@rouson
Collaborator Author

rouson commented Sep 29, 2025

@milancurcic all tests pass now, including the Julienne test, with gfortran and flang-new with a few caveats:

  1. Even the pre-existing test seg faults with gfortran 15 but not with versions 13 or 14.
  2. In the case of flang-new 22, any optimization level above -O0 causes the following test failure in the Julienne test:
A linear_2d_layer
   FAILS  on updating gradients.
      diagnostics: 
        expected 0.8160000443459 within a tolerance of 0.000000000000; actual value is 0.8159999847412
        expected 0.2124000191689 within a tolerance of 0.000000000000; actual value is 0.2124000042677 (updated weights)
 0 of 1 tests passed. 0 tests were skipped.

It seems that the use of tolerance = 0. is the culprit: the tests would pass with, say, tolerance=0.001. Are we certain that this is a case for which the answer should be exact? The only thing Julienne is doing is checking whether the absolute error is within the tolerance. If the numbers really were exactly the same, I would expect the test to pass, but apparently the values aren't exactly the same at higher optimization levels.
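
As a rough illustration of why a zero tolerance fails here, the sketch below applies an absolute-error check to the two value pairs from the diagnostics above. It is written in Python rather than Fortran for brevity; `within_tolerance` is a hypothetical helper, not Julienne's API, and the `<=` comparison is an assumption about how the tolerance is applied. Assuming the tested values are default 32-bit reals, each pair differs by roughly one unit in the last place (2^-24 and 2^-26, respectively), which is consistent with the optimizer reordering or fusing single-precision operations.

```python
# Value pairs (expected, actual) copied from the failing test's diagnostics.
pairs = [
    (0.8160000443459, 0.8159999847412),
    (0.2124000191689, 0.2124000042677),
]

def within_tolerance(expected, actual, tolerance):
    """Hypothetical helper: True if the absolute error does not exceed tolerance."""
    return abs(actual - expected) <= tolerance

# Spacing of adjacent 32-bit reals in [0.5, 1) and in [0.125, 0.25).
ulps = [2.0**-24, 2.0**-26]

for (expected, actual), ulp in zip(pairs, ulps):
    error = abs(actual - expected)
    print(
        f"error = {error:.3e} (about {error / ulp:.2f} single-precision ulp); "
        f"passes with tolerance 0: {within_tolerance(expected, actual, 0.0)}; "
        f"passes with tolerance 1e-3: {within_tolerance(expected, actual, 1e-3)}"
    )
```

Any tolerance at or above a few single-precision ulps would absorb a discrepancy of this size; 0.001 is far looser than strictly needed.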

@milancurcic
Member

After reviewing the original test that is failing, it's clear that zero-tolerance equality should not be expected because the weights being tested are updated during the backward pass, which performs multiple floating-point operations on them. So this test is expected to fail with some compilers and under operation reordering. Let's increase the tolerance within reason to make it pass.
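
A minimal sketch of the reordering effect mentioned above, using Python's double-precision floats rather than the single-precision Fortran reals in the actual test (the values are illustrative, not taken from neural-fortran): floating-point addition is not associative, so an optimizer that regroups operations can change the last bits of a result.

```python
# Illustrative values only; not taken from the neural-fortran test.
a, b, c = 0.1, 0.2, 0.3

left_grouping = (a + b) + c   # evaluate a + b first
right_grouping = a + (b + c)  # evaluate b + c first

difference = abs(left_grouping - right_grouping)
print(f"(a + b) + c = {left_grouping!r}")
print(f"a + (b + c) = {right_grouping!r}")
print(f"bitwise equal: {left_grouping == right_grouping}, difference = {difference:.3e}")
```

The two groupings agree to within rounding error but are not bitwise equal, which is why a small nonzero tolerance is the appropriate check for values produced by several floating-point operations.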

@milancurcic
Member

Regarding Julienne, thank you for demoing it. I need to sit with this and think for some time. I also appreciate your offer to rewrite other suites (maybe all of them?) in Julienne. However, I think this would be unproductive because it wouldn't give me a chance to evaluate it and learn it for myself, so I'll need to rewrite one suite myself to get a taste for it.

@rouson
Collaborator Author

rouson commented Sep 30, 2025

@milancurcic that makes sense. There's not a lot more to learn than what's in this PR, so just let me know any questions once you go through it. Two useful things that we didn't have time to discuss today are testing parallel runs and the option to skip a test. For parallel tests, Julienne uses a collective subroutine to ensure that a test is reported as passing only if it passes on all images. (If the test only exercises a subset of images, then the images outside that subset can simply be hardwired to pass.) To skip a test, simply omit the function when invoking the test_description_t() constructor. This is useful when a test is known to crash with a specific compiler or compiler version, as is the case for one or more neural-fortran tests with GCC 15. Importantly, Julienne reports and tallies skips so that ultimately the condition "passes + skips = total number of tests" is sufficient to consider all tests as passing. Using a preprocessor macro, a test can be skipped with the problematic compilers or compiler versions, which also serves as a form of live documentation of which features don't work with a given compiler.
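
The reporting rules described above can be modeled roughly as follows. This is a Python sketch of the logic only, not Julienne's Fortran API: `Outcome`, `summarize`, and `suite_ok` are hypothetical names, the per-image results stand in for what the collective subroutine would combine, and `None` stands in for a test whose function was omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Outcome:
    """Hypothetical record of one test; not a Julienne type."""
    description: str
    # One pass/fail flag per image, or None if the test was skipped.
    per_image_passed: Optional[List[bool]]

def summarize(outcomes: List[Outcome]) -> Tuple[int, int, int]:
    """Return (passes, skips, total); a test passes only if every image passes."""
    passes = sum(
        1 for o in outcomes
        if o.per_image_passed is not None and all(o.per_image_passed)
    )
    skips = sum(1 for o in outcomes if o.per_image_passed is None)
    return passes, skips, len(outcomes)

def suite_ok(outcomes: List[Outcome]) -> bool:
    """The suite is acceptable when passes + skips equals the number of tests."""
    passes, skips, total = summarize(outcomes)
    return passes + skips == total

# Example descriptions are made up for illustration.
outcomes = [
    Outcome("updating gradients", [True, True, True, True]),
    Outcome("known to crash with one compiler", None),  # skipped
]
print(summarize(outcomes), suite_ok(outcomes))
```

A single failing image turns the whole test into a failure, so it counts as neither a pass nor a skip and the tally no longer adds up to the total.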

We'll submit the camera-ready version of our first Julienne paper by this Friday and then I'll present it at the US-RSE Conference next week. I'll be happy to share a copy of the final paper and the talk slides once done.

@rouson
Collaborator Author

rouson commented Sep 30, 2025

Oh... and only image 1 prints results, which of course is important when testing with a large number of images.
