Skip to content

Conversation

@blackwer
Copy link
Collaborator

@blackwer blackwer commented Jul 23, 2025

Description

This PR aims to add the same level of backend support for float types as double types. This work is branched from #103 and addresses issue #108.

  • Add all level BLAS routines currently implemented, including MKL batching
  • Test all level BLAS routines properly
  • Add all level LAPACK routines currently implemented
  • Test all level LAPACK routines properly
  • Ensure all tests are not sneakily doing type promotion
  • Incorporate solid concept requirements, including their template macros (this is currently verbose, and somewhat inconsistent)
  • Document all changes, ensuring each API member properly reports the expect type requirements and behaviors
  • Resolve issues with answer tolerances in various tests (absolute tolerances on broad values, some randomness giving spontaneously failing tests, etc)

Notes

  • I had to change the tolerances on the tests to work for both float and double types. I did some very basic error analysis to get rough bounds on the error for a given matrix. Someone who is more knowledgeable about this kind of thing should check to make sure it's sensible.
  • Some matrices are generated randomly. This means that changing the seed might increase their error. Notably anything using the syhe_matrix calls in the lapack/linear_algebra tests. Those matrices aren't generally applied, so they can't really amplify error, so I wasn't sure what to use to bound the error. I chose 100*machine_eps.

Thoemi09 added 30 commits July 25, 2025 16:19
- Allow general nda::MemoryArray types of rank 1 or 2
- do not allow conjugate expressions
- update docs and tests
- Remove outer_product from nda::blas namespace
- Update docs and tests
- move outer_product from blas/ger.hpp to linalg/outer_product.hpp
- move the generic dot routines from the blas to the linalg namespace
- blas::dot and blas::dotc
  - thin wrappers around the BLAS routines
  - restrict to nda::MemoryVectors
- linalg::dot linalg::dotc
  - allow scalar-scalar and vector-vector dot products
  - only call blas::dot and blas::dotc if it is possible without making a copy
- move matvecmul into linalg namespace and into its own linalg/matvecmul.hpp header
- move blas::gemv_generic into linalg::detail namespace in linalg/matvecmul.hpp
- linalg::matvecmul
  - allow non-contiguous matrices
  - move various helper lambdas into linalg::detail namespace
  - avoid unnecessary vector copies
- move blas::gemm_generic into linalg::detail namespace in linalg/matmul.hpp
- linalg::matvecmul
  - allow non-contiguous matrices
  - move various helper lambdas into linalg::detail namespace
- remove explicit throw statements --> always return the LAPACK info
- consistent checks and function layout
- remove explicit throw statement --> always return the LAPACK info
- allow C-layout matrices
- require F-layout for B matrices
- lapack::getrf
  - allow C-layout matrices
- lapack::getri
  - restrict to host (there is not getri for cuda)
- lapack::getrs
  - allow conjugate lazy expressions
  - allow B to be either a vector or matrix
  - restrict B to F-layout
- move norm into linalg namespace
- allow vectors with different value types
- require vectors to have a host compatible address space
- rename inverse to inv, inverse_in_place to inv_in_place, inverse1_in_place to inv_in_place_1d,
inverse2_in_place to inv_place_2d, inverse3_in_place to inv_in_place_3d
- allow in place functions for host compatible address spaces
- use getrf + getrs to allow device matrices in inv
- rename determinant to det, determinant_in_place to det_in_place
- introduce det_1d, det_2d and det_3d
- only allow matrices on host compatible address spaces and with double or complex value types
- call optimized routines for matrices with less than 4 rows/columns
- replace throw statements with std::terminate()
- wrap the BLAS gerc routine
- restrict to Fortran layout matrices
- rename linalg::eigenelements to linalg::eigh
- rename linalg::eigenvalues to linalg::eigvalsh
- rename linalg/eigenelements.hpp to linalg/eigh.hpp
- call the LAPACK wrappers lapack::syev and lapack::heev
Thoemi09 and others added 19 commits July 25, 2025 16:19
- increase overall test coverage
- add tests for unified memory
- cusolverDn?gesvd only supports matrices with m >= n
- increase overall test coverage
- add (some) tests for unified memory
- try to get compile time errors instead of runtime errors whenever
possible
Supported functions in this patch:
sdot, cdotu, cdotc, sgemv, cgemv, sgemm, cgemm

Still need to add LAPACK routines, and device support
This was a bit hairier than the blas patch before, as there were a lot of implicit assumptions
about a double underlying type in many of the temporaries and null returns.  For tests I
factored out some logic to make the tests run in float and double, which also required grabbing
separate tolerances and passing them to the checkers. I've added some more tests to teh blas
and linear algebra test runners. Also, one of the eigenvector/value decompositions gives
eigenvectors with a different sign, which is something that probably needs to be handled better
than my approach. This isn't wrong, and there are means to ensure the same eigenvector sign, if
that's of interest in the future.

Some other functions were touched, but these are the main functions in this patch:
geqp3, heev, orgqr, syev, ungqr, eigh
gelss, gesvd, hegv, sygv, svd
blackwer added 2 commits July 28, 2025 12:29
I'm not a numerics guy, but calculating the condition number of the input matrix and bounding
the error by

coefficient_magnitude * condition_number * machine_epsilon

gives very reasonable results
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants