
Adding kernel for Newton-Schulz inverse #107

Merged
asobczyk merged 5 commits into main from dev-port-tri-inv-ns on Apr 17, 2026

@asobczyk
Collaborator

@asobczyk commented on Apr 16, 2026

$ pytest -s tests/test_tri_inv_ns.py 
=============================================== test session starts ================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.6.0
rootdir: /home/asobczyk/software/pto-kernels
configfile: pytest.ini
collected 400 items                                                                                                

tests/test_tri_inv_ns.py [W416 13:34:00.010460655 TensorFactories.cpp:338] Warning: Cannot create tensor with interal format while allow_internel_format=False, tensor will be created with base format. (function operator())
................................................................................................................................................................................................................................................................................................................................................................................................................

=============================================== 400 passed in 16.56s ===============================================

Collaborator

@gioelegott left a comment

Tested locally and everything passes. I trust @asobczyk on the choice of absolute and relative error tolerances.

I left some minor comments, but feel free to merge.
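The review above refers to absolute and relative error tolerances. The sketch below shows what such a check can look like. It is a hypothetical stand-in and not the code in tests/test_tri_inv_ns.py. `within_tolerance` applies the same elementwise rule as torch.allclose, |actual - expected| <= atol + rtol * |expected|. `relative_frobenius_error` returns a single matrix-level relative error. The function names and default tolerances are assumptions for illustration only.

```python
import math


def within_tolerance(actual, expected, atol=1e-5, rtol=1e-3):
    """Elementwise |actual - expected| <= atol + rtol * |expected|.

    Same rule as torch.allclose. The default atol/rtol values are
    placeholders, not the thresholds used by the PR's tests.
    """
    if len(actual) != len(expected):
        return False  # shape mismatch: rows differ
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False  # shape mismatch: columns differ
        for a, e in zip(row_a, row_e):
            if abs(a - e) > atol + rtol * abs(e):
                return False
    return True


def relative_frobenius_error(actual, expected):
    """Matrix-level relative error ||actual - expected||_F / ||expected||_F."""
    num = math.sqrt(sum((a - e) ** 2
                        for row_a, row_e in zip(actual, expected)
                        for a, e in zip(row_a, row_e)))
    den = math.sqrt(sum(e ** 2 for row_e in expected for e in row_e))
    return num / den
```

The mixed atol + rtol form keeps the check meaningful both for entries near zero, where atol dominates, and for large entries, where rtol dominates. The structural zeros of a triangular inverse fall in the first group.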

Comment thread csrc/kernel/kernel_tri_inv_ns.cpp
Comment thread tests/test_tri_inv_ns.py
Comment thread tests/test_tri_inv_ns.py Outdated
Comment thread csrc/host/torch_tri_inv_ns.h
Comment thread csrc/host/torch_tri_inv_ns.h Outdated
Comment thread csrc/host/torch_tri_inv_ns.h Outdated
* X = Y @ X + 2 * X
* return X
*/
template <typename InputT, typename OutputT, uint32_t MatrixSize>
Collaborator

Nitpick for doxygen docs. I thought CI should complain about it :-(
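The doc-comment pseudocode in the hunk above (X = Y @ X + 2 * X, return X) matches the Newton-Schulz update X_{k+1} = X_k (2I - A X_k) if Y holds -X @ A. The hunk does not show the lines that define Y, so that reading is an assumption. The minimal pure-Python sketch below also assumes a starting guess X_0 = diag(A)^{-1}; the kernel's actual initialization is not visible here. Under that guess, the residual I - X_k A is squared at every step and starts out strictly triangular, hence nilpotent. So ceil(log2 n) iterations give the exact inverse in exact arithmetic. The names `matmul` and `newton_schulz_tri_inverse` are hypothetical and are not part of the PR's C++ kernel.

```python
import math


def matmul(a, b):
    """Dense product of two n x n matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]


def newton_schulz_tri_inverse(a, num_iters=None):
    """Invert a triangular matrix with the Newton-Schulz iteration.

    Mirrors the reviewed pseudocode, assuming Y = -X @ A:
        X = Y @ X + 2 * X   i.e.   X_{k+1} = 2 X_k - X_k A X_k
    Assumption: X_0 = diag(a)^{-1}, so every diagonal entry must be nonzero.
    """
    n = len(a)
    x = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    if num_iters is None:
        # I - X_k A == (I - X_0 A)^(2^k), and I - X_0 A is strictly triangular,
        # so it vanishes once 2^k >= n, i.e. after ceil(log2 n) steps.
        num_iters = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(num_iters):
        y = [[-v for v in row] for row in matmul(x, a)]  # Y = -X @ A
        yx = matmul(y, x)                                # Y @ X = -X A X
        x = [[yx[i][j] + 2.0 * x[i][j] for j in range(n)]  # X = Y @ X + 2 * X
             for i in range(n)]
    return x


a = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 4.0, 0.0, 0.0],
     [3.0, -1.0, 0.5, 0.0],
     [0.5, 2.0, 1.0, 1.0]]
x = newton_schulz_tri_inverse(a)  # two iterations for n = 4
```

Each iteration costs two matrix products. For example, a 64 x 64 triangular block would need at most ceil(log2 64) = 6 iterations, or 12 products. The same code handles upper triangular input, because I - X_0 A is then strictly upper triangular and still nilpotent.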

Collaborator

@zouzias left a comment

Minor nitpicks, feel free to merge.

@asobczyk merged commit b5dc94f into main on Apr 17, 2026
16 checks passed
@asobczyk deleted the dev-port-tri-inv-ns branch on April 17, 2026 08:37
3 participants