bolairookie/megatile

Megatile

Mega kernel implementation based on TileLang. Inspired by Triton-distributed's mega_triton_kernel.

What it does

Fuses multiple GPU kernels into a single mega kernel using TileLang.
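
As a rough illustration of what fusing means here, the sketch below simulates it in plain Python: instead of one launch per operation, a single launch walks through a list of tasks. It is a conceptual model only, not Megatile's implementation; `Task`, `LaunchCounter`, `run_unfused`, and `run_fused` are hypothetical names invented for this example, and no GPU or TileLang code is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Buffers = Dict[str, List[int]]


@dataclass
class Task:
    """One unit of work that would otherwise be its own kernel launch (hypothetical)."""
    name: str
    fn: Callable[[Buffers], None]  # reads and writes the shared buffers


class LaunchCounter:
    """Counts simulated kernel launches (hypothetical helper)."""

    def __init__(self) -> None:
        self.launches = 0


def run_unfused(tasks: List[Task], buffers: Buffers, counter: LaunchCounter) -> None:
    # Unfused: every task pays for its own launch.
    for task in tasks:
        counter.launches += 1
        task.fn(buffers)


def run_fused(tasks: List[Task], buffers: Buffers, counter: LaunchCounter) -> None:
    # Fused: a single launch, with the tasks executed in order inside it.
    counter.launches += 1
    for task in tasks:
        task.fn(buffers)


def scale(buffers: Buffers) -> None:
    buffers["y"] = [2 * v for v in buffers["x"]]


def shift(buffers: Buffers) -> None:
    buffers["z"] = [v + 1 for v in buffers["y"]]


tasks = [Task("scale", scale), Task("shift", shift)]

unfused_counter, fused_counter = LaunchCounter(), LaunchCounter()
unfused: Buffers = {"x": [1, 2, 3]}
fused: Buffers = {"x": [1, 2, 3]}

run_unfused(tasks, unfused, unfused_counter)
run_fused(tasks, fused, fused_counter)

print(unfused_counter.launches, fused_counter.launches)  # 2 1
print(unfused["z"] == fused["z"])  # True
```

Both paths produce the same result; the fused path simply issues one launch instead of one per task, which is the overhead a mega kernel aims to avoid.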

Install

# Install the TileLang dependency
pip install tilelang
# Run from the root of a local clone of this repository
pip install -e .

Example

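import torch
from megatile import ModelBuilder

# Requires a CUDA-capable GPU: all tensors are allocated on the "cuda" device.
builder = ModelBuilder(num_warps=4)

# input: (1024, 512), weight: (1024, 512), output: (1024, 1024).
# These shapes suggest output = input @ weight.T (inferred, not stated by the project).
input = torch.randn(1024, 512, dtype=torch.bfloat16, device="cuda").contiguous()
weight = torch.randn(1024, 512, dtype=torch.bfloat16, device="cuda").contiguous()
output = torch.empty(1024, 1024, dtype=torch.bfloat16, device="cuda").contiguous()

# Register a linear layer, build the mega kernel, then execute it.
builder.make_linear(input, weight, output, layer_id=0)
builder.compile()
builder.run()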
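
The tensor shapes in the example above are consistent with the `torch.nn.functional.linear` convention, where the weight is stored as (out_features, in_features) and the result is `input @ weight.T`. That `make_linear` follows this convention is an assumption inferred from the shapes, not something the project states. A minimal pure-Python sketch for working out the output buffer size under that assumption (`linear_output_shape` is a hypothetical helper, not part of megatile):

```python
from typing import Tuple


def linear_output_shape(
    input_shape: Tuple[int, int], weight_shape: Tuple[int, int]
) -> Tuple[int, int]:
    """Return the output shape of input @ weight.T (assumed convention).

    input_shape is (M, K); weight_shape is (N, K); the result is (M, N).
    """
    m, k = input_shape
    n, k_weight = weight_shape
    if k != k_weight:
        raise ValueError(
            f"inner dimensions differ: input has K={k}, weight has K={k_weight}"
        )
    return (m, n)


# Shapes from the example: input (1024, 512), weight (1024, 512).
print(linear_output_shape((1024, 512), (1024, 512)))  # (1024, 1024)
```

Under this convention the example's `output` tensor of shape (1024, 1024) is exactly the size the layer needs.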

Structure

  • core/ - task management, scheduling, code generation
  • tasks/ - task builders (e.g., linear)
  • kernels/ - TileLang kernel implementations
  • models/ - ModelBuilder API

Credits

This project is based on Triton-distributed's mega_triton_kernel implementation. Thanks to the ByteDance Seed team for their excellent work.
