minfer-python

Benchmarking inference for decoder-only LLMs.

Kernels are written with Triton and CUDA as the underlying backends.

The kernels are currently stubs; they compile successfully, as tested on my university cluster setup.
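For a sense of what a compiling stub might look like, here is a minimal sketch of a Triton kernel (an element-wise copy used purely as a placeholder). The kernel name, signature, and launch parameters are illustrative assumptions, not taken from this repository:

```python
import torch
import triton
import triton.language as tl

# Hypothetical placeholder kernel: copies input to output element-wise.
@triton.jit
def copy_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x, mask=mask)

def copy(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # One program instance per BLOCK_SIZE-sized chunk of the input.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    copy_kernel[grid](x, out, n, BLOCK_SIZE=1024)
    return out
```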

Information you may find useful:

The environment variable TORCH_EXTENSIONS_DIR determines where the compiled kernels are stored, and TORCH_CUDA_ARCH_LIST limits which compute capabilities the kernels are compiled for.
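As a sketch of how these variables might be set from Python before the CUDA extensions are built (the cache path, architecture list, extension name, and source file below are hypothetical examples, not paths from this repository):

```python
import os

# Cache compiled extensions in a project-local directory instead of the
# default location (useful on shared cluster filesystems).
os.environ["TORCH_EXTENSIONS_DIR"] = "/scratch/username/torch_extensions"

# Compile only for the compute capabilities you actually have,
# e.g. 8.0 for A100; this shortens build times.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"

# Both variables must be set before the extension is compiled.
from torch.utils.cpp_extension import load

kernels = load(
    name="example_kernels",          # hypothetical extension name
    sources=["kernels/example.cu"],  # hypothetical source file
    verbose=True,
)
```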

