Skip to content
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ void dispatchMoeGemmFinalDispatchTmaWarpSpecialized(
"MXFPX is not supported for the selected weight combination");
}

if constexpr (Arch::kMinComputeCapability >= 100 && Arch::kMinComputeCapability < 120) {
if constexpr (Arch::kMinComputeCapability >= 100 && Arch::kMinComputeCapability < 110) {
bool const dynamic_cga =
gemm_config.dynamic_cluster_shape != cutlass_extensions::ClusterShape::Undefined;
bool const swap_ab = hopper_input.swap_ab;
Expand Down Expand Up @@ -204,7 +204,8 @@ void dispatchMoeGemmFinalDispatchTmaWarpSpecialized(
gemm_config.epilogue_schedule, dynamic_cga, swap_ab);
selected_func(hopper_input, num_experts, multi_processor_count, stream, occupancy,
workspace_size, cluster_shape_cute, cluster_shape_cute_fallback);
} else if constexpr (Arch::kMinComputeCapability >= 120 || Arch::kMinComputeCapability == 90) {
} else if constexpr (Arch::kMinComputeCapability >= 120 || Arch::kMinComputeCapability == 90 ||
Arch::kMinComputeCapability == 110) {
using EpilogueSchedule = void; // These are hardcoded in the launcher
constexpr bool dynamic_cga = false;
auto selected_func =
Expand All @@ -225,10 +226,12 @@ void dispatchMoeGemmFinalDispatchTmaWarpSpecialized(
template <typename Arch, typename CtaShape, typename ClusterShape, typename DataType,
typename WeightType>
constexpr bool are_tile_shapes_supported_sm100() {
// We use a runtime cluster shape for SM100, so we only support 1x1x1 and 2x1x1 cluster shapes.
if (cute::size<0>(ClusterShape{}) > 2 || cute::size<1>(ClusterShape{}) != 1 ||
cute::size<2>(ClusterShape{}) != 1) {
return false;
if constexpr (Arch::kMinComputeCapability != 110) {
// We use a runtime cluster shape for SM100, so we only support 1x1x1 and 2x1x1 cluster shapes.
if (cute::size<0>(ClusterShape{}) > 2 || cute::size<1>(ClusterShape{}) != 1 ||
cute::size<2>(ClusterShape{}) != 1) {
return false;
}
}
Comment on lines +229 to 235
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this change fixes the regression for SM110, special-casing it within are_tile_shapes_supported_sm100 highlights a structural inconsistency that could affect maintainability.

The changes in dispatchMoeGemmFinalDispatchTmaWarpSpecialized correctly group SM110 with SM90 and SM120+, separating it from the SM100/103 path. However, the validation logic in the calling function are_tile_shapes_supported still groups SM110 with SM100/103, which necessitates this patch.

This discrepancy makes the code harder to reason about, as the logic for SM110 is fragmented. For better long-term code health, the validation paths in are_tile_shapes_supported should be refactored to align with the dispatch paths. I recommend creating a follow-up technical debt issue to address this.


using namespace cute;
Expand Down