- Vector addition and dot product with shared memory.
- Matrix multiplication, Gaussian elimination, inversion, and exponentiation.
- 2D convolution with shared-memory tiling for fast image processing.
- Thread and block hierarchy, memory types (global, shared, constant).
- Memory coalescing, bank conflict avoidance, synchronization, and warp efficiency.
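
As an illustration of how several of the topics above fit together, here is a minimal sketch of a shared-memory dot product. It is not taken from this project's sources: the names `dot_kernel`, `dot_product`, and `kThreads` are hypothetical, error checking is omitted, and the inputs are assumed to be non-empty and of equal length.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size; must be a power of two for the reduction below.
constexpr int kThreads = 256;

// Each block accumulates a partial sum of a[i] * b[i] in shared memory,
// then thread 0 adds the block's total to *result.
__global__ void dot_kernel(const float* a, const float* b, float* result, int n) {
    __shared__ float cache[kThreads];

    // Grid-stride loop: neighbouring threads read neighbouring elements,
    // so global-memory loads are coalesced.
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        sum += a[i] * b[i];
    }
    cache[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction with sequential addressing, which avoids
    // shared-memory bank conflicts.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        }
        __syncthreads();  // reached by every thread in the block
    }

    if (threadIdx.x == 0) {
        atomicAdd(result, cache[0]);
    }
}

// Hypothetical host wrapper: copies the inputs to the device, launches
// the kernel, and returns the dot product.
float dot_product(const std::vector<float>& a, const std::vector<float>& b) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_result = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(float));

    // Cap the grid size; the grid-stride loop covers the remaining elements.
    int blocks = (n + kThreads - 1) / kThreads;
    if (blocks > 64) blocks = 64;
    dot_kernel<<<blocks, kThreads>>>(d_a, d_b, d_result, n);

    float result = 0.0f;
    cudaMemcpy(&result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_result);
    return result;
}
```

Calling `dot_product(a, b)` on two `std::vector<float>` inputs of the same length returns their dot product. Because the per-block totals are combined with `atomicAdd`, the floating-point summation order can vary between runs, so results on large inputs may differ in the last few bits.
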
TODO: add a Makefile and polish the project.
  /$$$$$$  /$$   /$$ /$$$$$$$   /$$$$$$      /$$      /$$             /$$     /$$      /$$           /$$
 /$$__  $$| $$  | $$| $$__  $$ /$$__  $$    | $$$    /$$$            | $$    | $$$    /$$$          | $$
| $$  \__/| $$  | $$| $$  \ $$| $$  \ $$    | $$$$  /$$$$  /$$$$$$  /$$$$$$  | $$$$  /$$$$ /$$   /$$| $$
| $$      | $$  | $$| $$  | $$| $$$$$$$$    | $$ $$/$$ $$ |____  $$|_  $$_/  | $$ $$/$$ $$| $$  | $$| $$
| $$      | $$  | $$| $$  | $$| $$__  $$    | $$  $$$| $$  /$$$$$$$  | $$    | $$  $$$| $$| $$  | $$| $$
| $$    $$| $$  | $$| $$  | $$| $$  | $$    | $$\  $ | $$ /$$__  $$  | $$ /$$| $$\  $ | $$| $$  | $$| $$
|  $$$$$$/|  $$$$$$/| $$$$$$$/| $$  | $$    | $$ \/  | $$|  $$$$$$$  |  $$$$/| $$ \/  | $$|  $$$$$$/| $$
 \______/  \______/ |_______/ |__/  |__/    |__/     |__/ \_______/   \___/  |__/     |__/ \______/ |__/