Flash attention 140 TFlops builder with barriers removal and experimental TILE_S1=512 version #138

Open: MirkoDeVita98 wants to merge 1 commit into main from new_flash


MirkoDeVita98 (Collaborator) commented on May 12, 2026

The following performance was measured with:

```bash
bash compile.sh --remove-vec-barriers 1264,1267,1272,1275,1279,1282,1311,1313,1316,1320,1322,1325,1328,1330,1333,1362,1364,1367,1371,1373,1376,1379,1381,1384,1390
python run.py
```
[Plot: naive_tpush_dsl_plot]
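The PR does not say how run.py turns kernel timings into the TFLOPS figure in the title. As a hedged sketch, assuming the widely used FlashAttention convention (count only the two matmuls, Q·Kᵀ and P·V, at 2 FLOPs per multiply-add, and halve the count under causal masking), the conversion could look like this. `attention_fwd_flops` and `tflops` are hypothetical helpers, not part of this repository.

```python
def attention_fwd_flops(batch: int, heads: int, seq_q: int, seq_kv: int,
                        head_dim: int, causal: bool = False) -> float:
    """FLOPs of one attention forward pass (assumed convention, see lead-in).

    Two matmuls per (batch, head): S = Q @ K^T and O = P @ V, each costing
    2 * seq_q * seq_kv * head_dim FLOPs. Softmax work is ignored.
    """
    flops = 4.0 * batch * heads * seq_q * seq_kv * head_dim
    # Assumption: a causal mask skips roughly half of the score matrix.
    return flops / 2 if causal else flops


def tflops(flops: float, seconds: float) -> float:
    """Convert a FLOP count and a measured kernel time into TFLOP/s."""
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical shape chosen only for illustration.
    work = attention_fwd_flops(batch=1, heads=32, seq_q=8192, seq_kv=8192, head_dim=128)
    print(f"{work:.3e} FLOPs, {tflops(work, 7.85e-3):.1f} TFLOP/s at 7.85 ms")
```

Under this convention a non-causal pass with 32 heads, sequence length 8192 and head dimension 128 is about 1.1e12 FLOPs, so finishing it in roughly 7.85 ms corresponds to about 140 TFLOP/s.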

… Also includes the possibility to remove barriers during compilation.
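The PR does not show how `--remove-vec-barriers` is implemented. Purely as an illustration, assuming the flag takes 1-indexed line numbers of the generated kernel source (as the comma-separated list in the compile command suggests) and drops the barrier on each listed line, a filter could look like the sketch below. `parse_line_list`, `remove_barriers`, and the `barrier` marker string are hypothetical names, not the project's actual code.

```python
def parse_line_list(spec: str) -> set[int]:
    """Parse a comma-separated list such as "1264,1267,1272" into line numbers."""
    return {int(tok) for tok in spec.split(",") if tok.strip()}


def remove_barriers(source: str, spec: str, marker: str = "barrier") -> str:
    """Drop the 1-indexed lines listed in `spec` from `source`.

    `marker` is a hypothetical stand-in for however the generated code spells
    a vector barrier. A listed line that does not contain it raises an error
    instead of being silently deleted.
    """
    targets = parse_line_list(spec)
    kept = []
    for lineno, line in enumerate(source.splitlines(keepends=True), start=1):
        if lineno in targets:
            if marker not in line:
                raise ValueError(f"line {lineno} is not a barrier: {line.strip()!r}")
            continue  # skip the listed barrier line
        kept.append(line)
    return "".join(kept)
```

Refusing to drop a listed line that does not look like a barrier is one way to catch stale line numbers after the generated code changes, since a hard-coded list like the one above silently goes out of date otherwise.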
MirkoDeVita98 changed the title from "Flash attention 140 builder with barriers removal and experimental TILE_S1=512 version" to "Flash attention 140 TFlops builder with barriers removal and experimental TILE_S1=512 version" on May 13, 2026.
MirkoDeVita98 (Collaborator, Author) commented:

2026/05/13

Commit 5176daa

Examples

```
$ python examples/validate_all_examples.py
FAILED aot/flash_attention/experimental [0.00s]
PASSED aot/activations/geglu_dynamic_multicore [37.29s]
PASSED aot/activations/relu_dynamic_multicore [29.13s]
PASSED aot/batch_matmul/matmul_dynbatch_multicore [22.61s]
PASSED aot/batch_matmul/matmul_dynbatch_multicore_2buf [19.84s]
PASSED aot/batch_matmul/matmul_dynbatch_multicore_opt [25.23s]
PASSED aot/elementwise/add_dynamic_multicore [41.64s]
PASSED aot/fast_hadamard [112.13s]
PASSED aot/fast_inverse/basic_dense [66.37s]
PASSED aot/fast_inverse/block_inversion [44.29s]
PASSED aot/flash_attention/140tflops [27.42s]
PASSED aot/flash_attention [44.51s]
PASSED aot/matmul_optimization_guide [163.03s]
PASSED aot/matmul_optimization_guide/experimental [74.86s]
PASSED aot/print_tile [24.72s]
PASSED aot/simple_static/add_static_multicore [22.02s]
PASSED aot/simple_static/matmul_static_singlecore [21.51s]
PASSED aot/sinkhorn_demo [19.06s]
PASSED aot/topk [53.56s]
PASSED aot/tpushpop/mix-kernel_mlir [83.47s]
PASSED jit/add_dynamic_multicore [19.91s]
PASSED jit/add_static_multicore [38.26s]
PASSED jit/matmul_dynamic_multicore [21.08s]
PASSED jit/scan [21.20s]
==============================================================================
23 passed, 1 failed in 1033.18s
```
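The summary line can be cross-checked against the per-example lines. A minimal sketch, assuming the `PASSED|FAILED <name> [<seconds>s]` format shown in the log; `tally` is a hypothetical helper, not part of examples/validate_all_examples.py.

```python
import re

# One result line per example, e.g. "PASSED jit/scan [21.20s]".
RESULT_RE = re.compile(r"^(PASSED|FAILED)\s+(\S+)\s+\[([\d.]+)s\]$")


def tally(log: str) -> dict:
    """Count PASSED/FAILED lines, list failures, and sum per-example times."""
    counts = {"PASSED": 0, "FAILED": 0}
    failed, total_s = [], 0.0
    for line in log.splitlines():
        m = RESULT_RE.match(line.strip())
        if not m:
            continue  # skip the command, separator and summary lines
        status, name, secs = m.group(1), m.group(2), float(m.group(3))
        counts[status] += 1
        total_s += secs
        if status == "FAILED":
            failed.append(name)
    return {"passed": counts["PASSED"], "failed": counts["FAILED"],
            "failed_names": failed, "sum_seconds": total_s}
```

Applied to the full log, it should report 23 passed and 1 failed, with aot/flash_attention/experimental as the only failure, matching the summary line.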

Unit tests

```
$ pytest -v ./tests
======================================================= 615 passed, 21 skipped in 464.11s (0:07:44) ========================================================
```
