
Commit 8fcbac1

Document known issues (#29)
This adds a markdown file documenting some known issues. The three included here account for the vast majority of the errors we encounter when attempting to import novel `nn.Module` instances, and each comes with a minimal reproducible example. Most of the remaining errors (at least in the tests I have sampled) stem from unimplemented ops, which can be handled either by including decompositions or by implementing the ops in upstream torch-mlir.

Note: I included a small tweak to the importer, adding the ability to convert `None` to the appropriate `!torch.none` in our `TypeSubclassMap`, because 1) the missing conversion obscures the real issue in one of these cases and 2) it is probably something we want there anyway.

---------

Co-authored-by: brucekimrokcmu <[email protected]>
1 parent 2fd61ed commit 8fcbac1

2 files changed: +63 −2 lines changed

python/shark_turbine/dynamo/importer.py (+4 −2)
```diff
@@ -7,6 +7,7 @@
 import logging
 import operator
 import re
+from types import NoneType
 from typing import Any, Dict, List, Optional, Sequence, Set, Tuple
 
 from iree.compiler.ir import (
@@ -490,7 +491,7 @@ def _import_list_argument(self, loc: Location, arg):
         operand_type = type(operand)
         if not isinstance(operand, arg_type):
             raise TypeError(
-                f"Lists with multiple types are not supported, got: {arg_type}, {operand_type}"
+                f"Heterogeneous lists are not supported: expected {arg_type}, got {operand_type}"
             )
 
         if isinstance(operand, torch.fx.Node):
@@ -588,7 +589,7 @@ def _make_constant_op(
 
 LITERAL_CONVERTER_MAP = TypeSubclassMap()
 LITERAL_CONVERTER_MAP.map(
-    type(None),
+    NoneType,
     lambda arg, gni, cc: Operation.create(
         "torch.constant.none", results=[cc.torch_none_type]
     ).result,
@@ -654,6 +655,7 @@ def _make_constant_op(
     float: "!torch.float",
     str: "!torch.str",
     bool: "!torch.bool",
+    NoneType: "!torch.none",
 }
 
 # AOT-autograd sometimes falsely emit tensor version op with scalar arguments.
```
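For context on how the `NoneType` registration takes effect, here is a minimal sketch of a subclass-aware type map in the spirit of `TypeSubclassMap` (the real class in importer.py may be implemented differently; this toy version only illustrates MRO-based lookup):

```py
from types import NoneType  # Python 3.10+

class TypeSubclassMapSketch:
    """Toy stand-in for TypeSubclassMap: resolves entries along the MRO."""

    def __init__(self):
        self._map = {}

    def map(self, ty, value):
        self._map[ty] = value

    def lookup(self, ty):
        # Walk the method resolution order so subclass instances also match.
        for base in ty.__mro__:
            if base in self._map:
                return self._map[base]
        return None

converters = TypeSubclassMapSketch()
converters.map(NoneType, lambda arg: "torch.constant.none")

# With NoneType registered, a literal None resolves to a converter instead
# of falling through and masking the underlying import error.
assert converters.lookup(type(None))(None) == "torch.constant.none"
```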

python/shark_turbine/known_issues.md (new file, +59)

# Known Issues in SHARK-Turbine

## Handling lists of optional types

```py
from torch import nn

class foomod(nn.Module):
    def __init__(self):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

    def forward(self, x):
        return self.up(x)
```
```
# occurring in importer -> _import_list_argument
compiler_fn raised TypeError: Heterogeneous lists are not supported: expected <class 'NoneType'>, got <class 'torch.fx.node.Node'>
```
An example is attempting to import `nn.Upsample`. This module internally calls `F.interpolate`, which eventually
calls `aten.index.Tensor`, whose [second argument](https://github.com/llvm/torch-mlir/blob/50f5b658b6dc50f664d78c89c403149b064fb59b/include/torch-mlir/Dialect/Torch/IR/GeneratedTorchOps.td#L7389C46-L7389C46) is an
optional list of tensors. When indices for some dimensions are omitted in favor of `None`, we get the error above.
In reality these values should carry an `AnyTorchOptionalTensorType`, so we need a way to assign optional types when
importing lists in this scenario; a sketch of that direction follows.

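As a rough illustration of the direction a fix could take (this is not the actual `_import_list_argument` implementation; the helper name and return shape here are invented for the sketch), the homogeneity check could tolerate `None` entries and flag the list's element type as optional:

```py
from typing import Optional, Sequence, Tuple

def check_list_element_type(operands: Sequence) -> Tuple[Optional[type], bool]:
    """Return (element_type, is_optional) for a list argument.

    None entries no longer count as a second element type; instead they mark
    the list as optional, so each element could be imported with an optional
    torch type rather than tripping the TypeError above.
    """
    element_type: Optional[type] = None
    is_optional = False
    for operand in operands:
        if operand is None:
            is_optional = True
            continue
        if element_type is None:
            element_type = type(operand)
        elif not isinstance(operand, element_type):
            raise TypeError(
                f"Heterogeneous lists are not supported: "
                f"expected {element_type}, got {type(operand)}"
            )
    return element_type, is_optional
```
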
## Dealing with functional variants of Torch Ops

```py
import torch.nn.functional as F

def forward(self, x):
    return F.max_pool2d(x, 8)
```
```
# occurring in importer -> _import_list_argument
compiler_fn raised IndexError: list index out of range
```

Currently, we have trouble with functional variants of torch operations that do not define meaningful defaults for
their arguments. Two common operations where this arises are `F.avg_pool2d` and `F.max_pool2d`.
Taking `max_pool2d` as an example, the [functional version](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool2d.html) defaults to `stride=None` (which reaches the importer as an empty list),
but the intended default is `stride=kernel_size`. The issue does not occur with the corresponding `nn.Module` wrapper `MaxPool2d`, because
it [manually sets the intended default value](https://pytorch.org/docs/stable/_modules/torch/nn/modules/pooling.html#_MaxPoolNd). The same issue is at play in `avg_pool2d`. The comparison below makes the difference concrete.

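A small eager-mode comparison (illustrative only, using standard PyTorch API rather than importer code): the module wrapper resolves `stride` to `kernel_size` before dispatch, while the functional form carries `stride=None` into the traced call.

```py
import torch
import torch.nn.functional as F
from torch import nn

x = torch.randn(1, 1, 8, 8)

# nn.MaxPool2d eagerly substitutes stride=kernel_size when stride is None...
via_module = nn.MaxPool2d(kernel_size=2)(x)
# ...while F.max_pool2d leaves stride=None for downstream code to interpret.
via_functional = F.max_pool2d(x, kernel_size=2)

assert torch.equal(via_module, via_functional)  # same math in eager mode
```
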
## Ephemeral Tensor objects from `aten.lift_fresh_copy`

```py
def forward(self, x, y):
    x[y == 1] = 2
```
```
# in importer -> import_argument
torch._dynamo.exc.BackendCompilerFailed: compiler_fn raised KeyError: (_tensor_constant0, 0)
```
This error arises from an odd case in FX graph generation: the graph module produced for our code contains a node
`_tensor_constant0 = self._tensor_constant0` with no traceable origin within the graph, so the lookup for the
appropriate `MlirValue` in the importer's `_v` table fails. This consistently occurs when the graph generates an
intermediate `aten.lift_fresh_copy`, as in the boolean-indexing example above.
The same error occurs in the `expectedFailure` test cases for `list(tensor_data)` and `tensor_data.tolist()`.

There is an existing PyTorch issue tracking this problem in the `aot-eager` backend: https://github.com/pytorch/pytorch/issues/105327.
It arises because this particular op is not handled in the PyTorch dispatch logic and is instead suppressed [here](https://github.com/pytorch/pytorch/blob/ddf36c82b83b2db3be7ce7a85d4aea3507c9d7ef/torch/_dispatch/python.py#L108).
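
One way to surface the offending node is to dump the FX graph that dynamo hands to the backend. This is a sketch under the assumption of a recent PyTorch with `torch.compile`; the backend function and names are ad hoc, written only for inspection:

```py
import torch

def assign_by_mask(x, y):
    x[y == 1] = 2
    return x

def dump_graph_backend(gm: torch.fx.GraphModule, example_inputs):
    # The printed graph should contain a get_attr node of the form
    # `_tensor_constant0 = self._tensor_constant0` feeding lift_fresh_copy.
    print(gm.graph)
    return gm.forward  # fall back to eager execution after printing

compiled = torch.compile(assign_by_mask, backend=dump_graph_backend)
compiled(torch.zeros(4), torch.tensor([0, 1, 0, 1]))
```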
