Commit fb335f6

CUTLASS 2.0 (NVIDIA#62)
CUTLASS 2.0

Substantially refactored for
- Better performance, particularly for native Turing Tensor Cores
- Robust and durable templates spanning the design space
- Encapsulated functionality embodying modern C++11 programming techniques
- Optimized containers and data types for efficient, generic, portable device code

Updates to:
- Quick start guide
- Documentation
- Utilities
- CUTLASS Profiler

Native Turing Tensor Cores
- Efficient GEMM kernels targeting Turing Tensor Cores
- Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands

Coverage of existing CUTLASS functionality:
- GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs
- Volta Tensor Cores through native mma.sync and through WMMA API
- Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions
- Batched GEMM operations
- Complex-valued GEMMs

Note: this commit and all that follow require a host compiler supporting C++11 or greater.
1 parent b5cab17 commit fb335f6
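
The commit message notes that a host compiler supporting C++11 or greater is required from this commit onward. As a minimal sketch of how a project consuming these headers might fail early on an older compiler, the guard below checks the language-standard macro at preprocessing time. The `EXAMPLE_CPLUSPLUS` macro and the `example_*` functions are hypothetical names chosen for illustration; they are not part of CUTLASS, and the commit itself does not show such a guard.

```cpp
#include <cassert>

// Hypothetical guard; none of these names come from CUTLASS.

// MSVC reports 199711L in __cplusplus unless /Zc:__cplusplus is passed,
// so prefer _MSVC_LANG when the compiler defines it.
#if defined(_MSVC_LANG)
#  define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
#  define EXAMPLE_CPLUSPLUS __cplusplus
#endif

// 201103L is the value of __cplusplus specified by the C++11 standard.
// #error is used instead of static_assert so the guard itself still
// parses on a pre-C++11 compiler.
#if EXAMPLE_CPLUSPLUS < 201103L
#  error "A host compiler supporting C++11 or greater is required."
#endif

// Returns the language-standard value seen at compile time (e.g. for logging).
inline long example_host_cplusplus() { return EXAMPLE_CPLUSPLUS; }

// Runtime counterpart of the preprocessor guard, usable from a unit test.
inline void example_require_cxx11() {
  assert(example_host_cplusplus() >= 201103L);
}
```

Placing a guard like this in a commonly included header turns a missing `-std=c++11` (or newer) flag into a single readable error instead of a cascade of template diagnostics.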

5,434 files changed, +599678 -250055 lines changed

.gitmodules

-3
@@ -1,3 +0,0 @@
-[submodule "tools/external/googletest"]
-	path = tools/external/googletest
-	url = https://github.com/google/googletest.git

CHANGELOG.md

+27 -1
@@ -1,5 +1,31 @@
 # NVIDIA CUTLASS Changelog
 
+# CUTLASS 2.0
+
+## [2.0.0](https://github.com/NVIDIA/cutlass/releases/tag/v2.0.0) (2019-11-19)
+ * Substantially refactored for
+   * Better performance, particularly for native Turing Tensor Cores
+   * Robust and durable templates spanning the design space
+   * Encapsulated functionality embodying modern C++11 programming techniques
+   * Optimized containers and data types for efficient, generic, portable device code
+ * Updates to:
+   * [Quick start guide](/media/docs/quickstart.md)
+   * [Documentation](/README.md#documentation)
+   * [Utilities](/media/docs/utilities.md)
+   * [CUTLASS Profiler](/media/docs/profiler.md)
+ * Native Turing Tensor Cores
+   * Efficient GEMM kernels targeting Turing Tensor Cores
+   * Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands
+ * Coverage of existing CUTLASS functionality
+   * GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs
+   * Volta Tensor Cores through native mma.sync and through WMMA API
+   * Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions
+   * Batched GEMM operations
+   * Complex-valued GEMMs
+ * Note: a host compiler supporting C++11 or greater is required.
+
+# CUTLASS 1.x
+
 ## [1.3.2](https://github.com/NVIDIA/cutlass/releases/tag/v1.3.2) (2019-07-09)
  * Performance improvement for Volta Tensor Cores TN and TT layouts.
 

@@ -50,7 +76,7 @@
 
 ## Copyright
 
-Copyright (c) 2017-2018, NVIDIA CORPORATION. All rights reserved.
+Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
 
 ```
 Redistribution and use in source and binary forms, with or without modification, are permitted
