Current code

For now, I just have a row-major impl of TSS set up in tss-demo/tss.cpp. There are some issues that need to be fixed -- feel free to go in and fix them if you see them.

Run g++ tss-demo/tss.cpp and run ./a.out <cache_size> <cache_line_size> <row_length> <column_length> to get the best row and column for a given set of parameters.

Issues

[DONE] uncertain if line 34 makes sense
[DONE] uncertain how to init row_size on line 48
- we thought cache_size % row_length was fine, but this causes us never to enter the while loop if the cache is smaller than the array
[DONE] current impl does not correctly handle working set size greater than cache

Todos

Create a pass in MLIR to apply TSS
- set up MLIR on server
- copy over relevant parts of the MLIR loop tiling passes
- get MLIR pass running successfully
- change algorithm to use tss
- decide which edge cases we don't want to handle
- do da coding
[DONE] Fix case where you return cols_per_set + 1 (Minkyoung: no need to modify since it is correct)
[DONE] Fix initialization of row_size (Minkyoung: check required)
[DONE] Fix WS issue (Elanor)
Write tests and walk through algorithm to determine correct solutions

Test Cases

Tests from Paper

Cache sizes: 8K (512 element) and 64K (4096 element)
Cache line sizes in bytes: 32 (2 element), 64 (4 element), 128 (8 element)
Array sizes: 256 x 256, 300 x 300, 301 x 301

CS	CLS	Matrix	Tile	WS	Pass?
512	2	300 x 300	16 x 29	482	Y
512	8	300 x 300	16 x 29	Not given	Y
512	2	256 x 256	170 x 2	512	Y
512	2	301 x 301	28 x 17	506	Y
4096	8	256 x 256	240 x 16	4088	Y
4096	8	300 x 300	88 x 41	3704	Y
4096	8	301 x 301	112 x 26	3086	Test invalid

Note: Elanor and Minkyoung both think the last row dimensions are not possible given the algorithm

Custom Cases

1024, 200, 200, 200 (PASS)
18, 8, 10, 10
1024, ___, 200, 200
cache smaller than array: 500 50 500000 100000 (PASS)
WS never fits in cache: 256 8 240 240 (PASS)
300x300, 248, 7
500x500, 408, 4

Commands

Useful Commands for Running MLIR Code

/home/llvm-project/build/bin/mlir-opt matmul-tiling.mlir -split-input-file -affine-loop-tile="tile-size=32"
/home/llvm-project/build/bin/mlir-opt matmul-tiling.mlir -split-input-file -affine-loop-tile="cache-size=512"

Using the Makefile Pipeline

"make loop_tile_size" runs the affine dialect loop tile optimization from mlir to mlir
"make convert_mlir_to_llvm" runs the entire pipeline for the affine loop optimization from mlir to llvm, outputting pipeline/convert_mlir_to_llvm.mlir which should only contain llvm instructions
The make commands are intended to be run in your home directory on server and create a pipeline directory to store output
You can run "make clean" to get rid of the pipeline directory and run it fresh
"make get_executable" runs the MLIR conversion pipeline and also converts the output to llvm bc, makes an object file, and creates an executable from that
"make run_cachegrind" runs the cachegrind command; it assumes the executable is called basic_test.exe
"make tss" gives the executable tss. Then run ./tss <cache size> <cache line size> <number of rows> <number of columns>
For "make loop_tile_size" we should probably have a better way of specifying the tile size; currently it is hardcoded

Parameters on the server

Cache size: 64 KB (2048 elements if elements are 32 B) (use command lscpu)
Cache line size: 64 B (2 elements if elements are 32 B) (use command getconf LEVEL1_DCACHE_LINESIZE)

Cachegrind

After running the "make convert_mlir_to_llvm" command, run /home/llvm-project/build/bin/mlir-translate --mlir-to-llvmir ~/mlir-loop-tiling/pipeline/convert_mlir_to_llvm.mlir -o main_llvm.bc to get the bitcode file
Then run llc -filetype=obj -opaque-pointers=1 main_llvm.bc which should generate main_llvm.o
Then run gcc main_llvm.o -g -o basic_test.exe
Last step is valgrind --tool=cachegrind ./basic_test.exe

Results

Output from cachegrind is stored in the results folder
Output from running the tiling pass is stored in the tiled-matmul folder
Default test cases are named by the dimensions of the matrix multiply _ default (square matrices are just named with a single number)
TSS test cases for an (axb)*(bxc) matrix are named as follows
- after TSS is run (by running `./tss 2048 2 b c), it outputs tile_rowsize and tile_colsize
- input to tiling pass is a, tile-colsize, tile-rowsize (col comes before row because the for loop iterating over columns of the special matrix comes before the for loop iterating over rows)
- test cases are named a_[tile-colsize]_[tile-rowsize]
- sorry it is annoying to map a test case back to the original dimensions; this made sense as I was running the cases. You can figure it out by looking at the mlir file for that test case and seeing the parameters of the matrices

Running a test case

Assume (axb)*(bxc) matrix multiply

Modify matmul-tiling.mlir
1. arg0 should be of dimension axb, arg1 of dimension bxc, and arg2 of dimension axc
2. i should go to a, j should go to c, and k should go to b
If running a TSS case:
1. Run the tss algorithm with input 2048 2 b c (assuming the matrixes have 32B elements); get output tile-rowsize and tile-colsize.
2. Modify TILE_SIZES in the Makefile to be a, tile-colsize, tile-rowsize.
Modify TEST_CASE in the Makefile to be the name of your test case.
Tile the matrix
1. If default, run make tile_default.
2. If TSS, run make loop_tile_size.
Run mlir code through pipeline to get executable: make executable
Profile with cachegrind to see number of cache misses: make run_cachegrind

Notes

Currently am running tss, then manually changing the Makefile tiling arguments. This could definitely be better.
I also don't know how to get the cachegrind to send the output to a file (> doesn't seem to work) so am manually copying that. Should also be better.

TODO

Figure out how to add our pass to the rest of them
Modify LoopTiling code and LoopUtils to support rectangular tiles
Figure out what modifications to getWorkingSetSize() and CIR() are necessary with the current tiling pass
Tiling for the most referenced matrix not all matrix [ideally but maybe too hard]

[1] getTileSizes(): How about just using TSS.cpp and set its resulting tilesize as commend-line tileSize?

void LoopTiling::getTileSizes(ArrayRef<AffineForOp> band,
                              SmallVectorImpl<unsigned> *tileSizes) {
  if (band.empty())
    return;
 
  **// Use command-line tileSize for all loops if specified.**
  if (**tileSize**) {
    tileSizes->assign(band.size(), tileSize);
    return;
  }

[2] Should check if this func supports rectangular tile → Seems not yet supported

constructTiledLoopNest()
tilePerfectlyNested()
separateFullTiles()

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
examples		examples
logs		logs
old		old
results		results
tiled-matmul		tiled-matmul
tss-demo		tss-demo
Makefile		Makefile
README.md		README.md
simple_test.sh		simple_test.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Current code

Issues

Todos

Test Cases

Tests from Paper

Custom Cases

Commands

Useful Commands for Running MLIR Code

Using the Makefile Pipeline

Parameters on the server

Cachegrind

Results

Running a test case

Notes

TODO

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Current code

Issues

Todos

Test Cases

Tests from Paper

Custom Cases

Commands

Useful Commands for Running MLIR Code

Using the Makefile Pipeline

Parameters on the server

Cachegrind

Results

Running a test case

Notes

TODO

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages