1 | 1 | # NVIDIA CUTLASS Changelog
2 | 2 |
| 3 | +# CUTLASS 2.0 |
| 4 | + |
| 5 | +## [2.0.0](https://github.com/NVIDIA/cutlass/releases/tag/v2.0.0) (2019-11-19) |
| 6 | + * Substantially refactored for |
| 7 | +   * Better performance, particularly for native Turing Tensor Cores |
| 8 | +   * Robust and durable templates spanning the design space |
| 9 | +   * Encapsulated functionality embodying modern C++11 programming techniques |
| 10 | +   * Optimized containers and data types for efficient, generic, portable device code |
| 11 | + * Updates to: |
| 12 | +   * [Quick start guide](/media/docs/quickstart.md) |
| 13 | +   * [Documentation](/README.md#documentation) |
| 14 | +   * [Utilities](/media/docs/utilities.md) |
| 15 | +   * [CUTLASS Profiler](/media/docs/profiler.md) |
| 16 | + * Native Turing Tensor Cores |
| 17 | +   * Efficient GEMM kernels targeting Turing Tensor Cores |
| 18 | +   * Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands |
| 19 | + * Coverage of existing CUTLASS functionality |
| 20 | +   * GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs |
| 21 | +   * Volta Tensor Cores through native mma.sync and through WMMA API |
| 22 | +   * Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions |
| 23 | +   * Batched GEMM operations |
| 24 | +   * Complex-valued GEMMs |
| 25 | + * Note: a host compiler supporting C++11 or greater is required. |
| 26 | + |
| 27 | +# CUTLASS 1.x |
| 28 | + |
3 | 29 | ## [1.3.2](https://github.com/NVIDIA/cutlass/releases/tag/v1.3.2) (2019-07-09)
4 | 30 | * Performance improvement for Volta Tensor Cores TN and TT layouts.
5 | 31 |
50 | 76 |
51 | 77 | ## Copyright
52 | 78 |
53 | | -Copyright (c) 2017-2018, NVIDIA CORPORATION. All rights reserved. |
| 79 | +Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved. |
54 | 80 |
55 | 81 | ```
56 | 82 | Redistribution and use in source and binary forms, with or without modification, are permitted