diff --git a/README.md b/README.md index fe816e3..dc227e1 100644 --- a/README.md +++ b/README.md @@ -26,197 +26,237 @@ # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. --> +# PyTorch (LibTorch) Backend + [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause) -# PyTorch (LibTorch) Backend +The Triton backend for +[PyTorch](https://github.com/pytorch/pytorch) +is designed to run +[TorchScript](https://pytorch.org/docs/stable/jit.html) +models using the PyTorch C++ API. +All models created in PyTorch using the python API must be traced/scripted to produce a TorchScript model. + +You can learn more about Triton backends in the +[Triton Backend](https://github.com/triton-inference-server/backend) +repository. + +Ask questions or report problems using +[Triton Server issues](https://github.com/triton-inference-server/server/issues). -The Triton backend for [PyTorch](https://github.com/pytorch/pytorch). -You can learn more about Triton backends in the [backend -repo](https://github.com/triton-inference-server/backend). Ask -questions or report problems on the [issues -page](https://github.com/triton-inference-server/server/issues). -This backend is designed to run [TorchScript](https://pytorch.org/docs/stable/jit.html) -models using the PyTorch C++ API. All models created in PyTorch -using the python API must be traced/scripted to produce a TorchScript -model. - -Where can I ask general questions about Triton and Triton backends? -Be sure to read all the information below as well as the [general -Triton documentation](https://github.com/triton-inference-server/server#triton-inference-server) -available in the main [server](https://github.com/triton-inference-server/server) -repo. If you don't find your answer there you can ask questions on the -main Triton [issues page](https://github.com/triton-inference-server/server/issues). +Be sure to read all the information below as well as the +[general Triton documentation](https://github.com/triton-inference-server/server#triton-inference-server) +available in the [Triton Server](https://github.com/triton-inference-server/server) repository. ## Build the PyTorch Backend -Use a recent cmake to build. First install the required dependencies. +Use a recent cmake to build. +First install the required dependencies. -``` -$ apt-get install rapidjson-dev python3-dev python3-pip -$ pip3 install patchelf==0.17.2 +```bash +apt-get install rapidjson-dev python3-dev python3-pip +pip3 install patchelf==0.17.2 ``` -An appropriate PyTorch container from [NGC](https://ngc.nvidia.com) must be used. -For example, to build a backend that uses the 23.04 version of the PyTorch -container from NGC: +An appropriate PyTorch container from [NVIDIA NGC Catalog](https://ngc.nvidia.com) must be used. +For example, to build a backend that uses the 23.04 version of the PyTorch container from NGC: -``` -$ mkdir build -$ cd build -$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_PYTORCH_DOCKER_IMAGE="nvcr.io/nvidia/pytorch:23.04-py3" .. -$ make install +```bash +mkdir build +cd build +cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_PYTORCH_DOCKER_IMAGE="nvcr.io/nvidia/pytorch:23.04-py3" .. +make install ``` -The following required Triton repositories will be pulled and used in -the build. By default, the "main" branch/tag will be used for each repo -but the listed CMake argument can be used to override. +The following required Triton repositories will be pulled and used in the build. 
+By default, the `main` head will be used for each repository but the listed CMake argument can be used to override the value. -* triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=[tag] -* triton-inference-server/core: -DTRITON_CORE_REPO_TAG=[tag] -* triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=[tag] +* triton-inference-server/backend: `-DTRITON_BACKEND_REPO_TAG=[tag]` +* triton-inference-server/core: `-DTRITON_CORE_REPO_TAG=[tag]` +* triton-inference-server/common: `-DTRITON_COMMON_REPO_TAG=[tag]` ## Build the PyTorch Backend With Custom PyTorch -Currently, Triton requires that a specially patched version of -PyTorch be used with the PyTorch backend. The full source for -these PyTorch versions are available as Docker images from -[NGC](https://ngc.nvidia.com). For example, the PyTorch version -compatible with the 25.09 release of Triton is available as -nvcr.io/nvidia/pytorch:25.09-py3. +Currently, Triton requires that a specially patched version of PyTorch be used with the PyTorch backend. +The full source for these PyTorch versions are available as Docker images from +[NGC](https://ngc.nvidia.com). -Copy over the LibTorch and Torchvision headers and libraries from the +For example, the PyTorch version compatible with the 25.09 release of Triton is available as `nvcr.io/nvidia/pytorch:25.09-py3` which supports PyTorch version `2.9.0a0`. + +> [!NOTE] +> Additional details and version information can be found in the container's +> [release notes](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-09.html#rel-25-09). + +Copy over the LibTorch and TorchVision headers and libraries from the [PyTorch NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) -into local directories. You can see which headers and libraries -are needed/copied from the docker. +into local directories. +You can see which headers and libraries are needed/copied from the docker. -``` -$ mkdir build -$ cd build -$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_PYTORCH_INCLUDE_PATHS="/torch;/torch/torch/csrc/api/include;/torchvision" -DTRITON_PYTORCH_LIB_PATHS="" .. -$ make install +```bash +mkdir build +cd build +cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_PYTORCH_INCLUDE_PATHS="/torch;/torch/torch/csrc/api/include;/torchvision" -DTRITON_PYTORCH_LIB_PATHS="" .. +make install ``` ## Using the PyTorch Backend -### Parameters +### PyTorch 2.0 Models -Triton exposes some flags to control the execution mode of the TorchScript models through -the Parameters section of the model's `config.pbtxt` file. +The model repository should look like: -* `DISABLE_OPTIMIZED_EXECUTION`: Boolean flag to disable the optimized execution -of TorchScript models. By default, the optimized execution is always enabled. +```bash +model_repository/ +`-- model_directory + |-- 1 + | |-- model.py + | `-- [model.pt] + `-- config.pbtxt +``` -The initial calls to a loaded TorchScript model take extremely long. Due to this longer -model warmup [issue](https://github.com/pytorch/pytorch/issues/57894), Triton also allows -execution of models without these optimizations. In some models, optimized execution -does not benefit performance as seen [here](https://github.com/pytorch/pytorch/issues/19978) -and in other cases impacts performance negatively, as seen [here](https://github.com/pytorch/pytorch/issues/53824). +The `model.py` contains the class definition of the PyTorch model. 
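+A minimal sketch of such a `model.py` is shown below; the class name and its logic are purely illustrative, and the requirements the class must satisfy follow.
+
+```python
+import torch
+
+
+class AddSubModel(torch.nn.Module):
+    """Toy model returning the element-wise sum and difference of two input tensors."""
+
+    def forward(self, input0: torch.Tensor, input1: torch.Tensor):
+        # The two returned tensors become the two model outputs.
+        return input0 + input1, input0 - input1
+```
+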
+The class should extend the +[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module). +The `model.pt` may be optionally provided which contains the saved +[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference) +of the model. -The section of model config file specifying this parameter will look like: +### TorchScript Models -``` -parameters: { -key: "DISABLE_OPTIMIZED_EXECUTION" - value: { - string_value: "true" - } -} +The model repository should look like: + +```bash +model_repository/ +`-- model_directory + |-- 1 + | `-- model.pt + `-- config.pbtxt ``` -* `INFERENCE_MODE`: Boolean flag to enable the Inference Mode execution -of TorchScript models. By default, the inference mode is enabled. +The `model.pt` is the TorchScript model file. -[InferenceMode](https://pytorch.org/cppdocs/notes/inference_mode.html) is a new -RAII guard analogous to NoGradMode to be used when you are certain your operations -will have no interactions with autograd. Compared to NoGradMode, code run under -this mode gets better performance by disabling autograd. +## Configuration -Please note that in some models, InferenceMode might not benefit performance -and in fewer cases might impact performance negatively. +Triton exposes some flags to control the execution mode of the TorchScript models through the `Parameters` section of the model's `config.pbtxt` file. -The section of model config file specifying this parameter will look like: +### Parameters -``` -parameters: { -key: "INFERENCE_MODE" - value: { - string_value: "true" - } -} -``` +* `DISABLE_OPTIMIZED_EXECUTION`: + Boolean flag to disable the optimized execution of TorchScript models. + By default, the optimized execution is always enabled. -* `DISABLE_CUDNN`: Boolean flag to disable the cuDNN library. By default, cuDNN is enabled. + The initial calls to a loaded TorchScript model take a significant amount of time. + Due to this longer model warmup + ([pytorch #57894](https://github.com/pytorch/pytorch/issues/57894)), + Triton also allows execution of models without these optimizations. + In some models, optimized execution does not benefit performance + ([pytorch #19978](https://github.com/pytorch/pytorch/issues/19978)) + and in other cases impacts performance negatively + ([pytorch #53824](https://github.com/pytorch/pytorch/issues/53824)). -[cuDNN](https://developer.nvidia.com/cudnn) is a GPU-accelerated library of primitives for -deep neural networks. cuDNN provides highly tuned implementations for standard routines. + The section of model config file specifying this parameter will look like: -Typically, models run with cuDNN enabled are faster. However there are some exceptions -where using cuDNN can be slower, cause higher memory usage or result in errors. + ```proto + parameters: { + key: "DISABLE_OPTIMIZED_EXECUTION" + value: { string_value: "true" } + } + ``` +* `INFERENCE_MODE`: -The section of model config file specifying this parameter will look like: + Boolean flag to enable the Inference Mode execution of TorchScript models. + By default, the inference mode is enabled. -``` -parameters: { -key: "DISABLE_CUDNN" - value: { - string_value: "true" - } -} -``` + [InferenceMode](https://pytorch.org/cppdocs/notes/inference_mode.html) is a new RAII guard analogous to `NoGradMode` to be used when you are certain your operations will have no interactions with autograd. 
+ Compared to `NoGradMode`, code run under this mode gets better performance by disabling autograd. -* `ENABLE_WEIGHT_SHARING`: Boolean flag to enable model instances on the same device to -share weights. This optimization should not be used with stateful models. If not specified, -weight sharing is disabled. + Please note that in some models, InferenceMode might not benefit performance and in fewer cases might impact performance negatively. -The section of model config file specifying this parameter will look like: + To enable inference mode, use the configuration example below: -``` -parameters: { -key: "ENABLE_WEIGHT_SHARING" - value: { - string_value: "true" - } -} -``` + ```proto + parameters: { + key: "INFERENCE_MODE" + value: { string_value: "true" } + } + ``` -* `ENABLE_CACHE_CLEANING`: Boolean flag to enable CUDA cache cleaning after each model execution. -If not specified, cache cleaning is disabled. This flag has no effect if model is on CPU. -Setting this flag to true will negatively impact the performance due to additional CUDA cache -cleaning operation after each model execution. Therefore, you should only use this flag if you -serve multiple models with Triton and encounter CUDA out of memory issue during model executions. +* `DISABLE_CUDNN`: -The section of model config file specifying this parameter will look like: + Boolean flag to disable the cuDNN library. + By default, cuDNN is enabled. -``` -parameters: { -key: "ENABLE_CACHE_CLEANING" - value: { - string_value:"true" - } -} -``` + [cuDNN](https://developer.nvidia.com/cudnn) is a GPU-accelerated library of primitives for deep neural networks. + It provides highly tuned implementations for standard routines. + + Typically, models run with cuDNN enabled execute faster. + However there are some exceptions where using cuDNN can be slower, cause higher memory usage, or result in errors. + + To disable cuDNN, use the configuration example below: + + ```proto + parameters: { + key: "DISABLE_CUDNN" + value: { string_value: "true" } + } + ``` + +* `ENABLE_WEIGHT_SHARING`: + + Boolean flag to enable model instances on the same device to share weights. + This optimization should not be used with stateful models. + If not specified, weight sharing is disabled. + + To enable weight sharing, use the configuration example below: + + ```proto + parameters: { + key: "ENABLE_WEIGHT_SHARING" + value: { string_value: "true" } + } + ``` + +* `ENABLE_CACHE_CLEANING`: + + Boolean flag to enable CUDA cache cleaning after each model execution. + If not specified, cache cleaning is disabled. + This flag has no effect if model is on CPU. + + Setting this flag to true will likely negatively impact the performance due to additional CUDA cache cleaning operation after each model execution. + Therefore, you should only use this flag if you serve multiple models with Triton and encounter CUDA out-of-memory issues during model executions. + + To enable cleaning of the CUDA cache after every execution, use the configuration example below: + + ```proto + parameters: { + key: "ENABLE_CACHE_CLEANING" + value: { string_value: "true" } + } + ``` * `INTER_OP_THREAD_COUNT`: -PyTorch allows using multiple CPU threads during TorchScript model inference. -One or more inference threads execute a model's forward pass on the given -inputs. Each inference thread invokes a JIT interpreter that executes the ops -of a model inline, one by one. This parameter sets the size of this thread -pool. The default value of this setting is the number of cpu cores. 
Please refer -to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html) -document on how to set this parameter properly. + PyTorch allows using multiple CPU threads during TorchScript model inference. + One or more inference threads execute a model’s forward pass on the given inputs. + Each inference thread invokes a JIT interpreter that executes the ops of a model inline, one by one. -The section of model config file specifying this parameter will look like: + This parameter sets the size of this thread pool. + The default value of this setting is the number of cpu cores. -``` -parameters: { -key: "INTER_OP_THREAD_COUNT" - value: { - string_value:"1" - } -} -``` + > [!TIP] + > Refer to + > [CPU Threading TorchScript](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html) + > on how to set this parameter properly. + + To set the inter-op thread count, use the configuration example below: + + ```proto + parameters: { + key: "INTER_OP_THREAD_COUNT" + value: { string_value: "1" } + } + ``` > [!NOTE] > This parameter is set globally for the PyTorch backend. @@ -225,70 +265,68 @@ key: "INTER_OP_THREAD_COUNT" * `INTRA_OP_THREAD_COUNT`: -In addition to the inter-op parallelism, PyTorch can also utilize multiple threads -within the ops (intra-op parallelism). This can be useful in many cases, including -element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and -others. The default value for this setting is the number of CPU cores. Please refer -to [this](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html) -document on how to set this parameter properly. + In addition to the inter-op parallelism, PyTorch can also utilize multiple threads within the ops (intra-op parallelism). + This can be useful in many cases, including element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and others. -The section of model config file specifying this parameter will look like: + The default value for this setting is the number of CPU cores. -``` -parameters: { -key: "INTRA_OP_THREAD_COUNT" - value: { - string_value:"1" - } -} -``` + > [!TIP] + > Refer to + > [CPU Threading TorchScript](https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html) + > on how to set this parameter properly. -> [!NOTE] -> This parameter is set globally for the PyTorch backend. -> The value from the first model config file that specifies this parameter will be used. -> Subsequent values from other model config files, if different, will be ignored. + To set the intra-op thread count, use the configuration example below: + + ```proto + parameters: { + key: "INTRA_OP_THREAD_COUNT" + value: { string_value: "1" } + } + ``` -* Additional Optimizations: Three additional boolean parameters are available to disable -certain Torch optimizations that can sometimes cause latency regressions in models with -complex execution modes and dynamic shapes. If not specified, all are enabled by default. +* **Additional Optimizations**: + + Three additional boolean parameters are available to disable certain Torch optimizations that can sometimes cause latency regressions in models with complex execution modes and dynamic shapes. + If not specified, all are enabled by default. 
`ENABLE_JIT_EXECUTOR` `ENABLE_JIT_PROFILING` -### PyTorch 2.0 Models +### Model Instance Group Kind -The model repository should look like: +The PyTorch backend supports the following kinds of +[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups) +where the input tensors are placed as follows: -```bash -model_repository/ -`-- model_directory - |-- 1 - | |-- model.py - | `-- [model.pt] - `-- config.pbtxt -``` +* `KIND_GPU`: -The `model.py` contains the class definition of the PyTorch model. -The class should extend the -[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module). -The `model.pt` may be optionally provided which contains the saved -[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference) -of the model. + Inputs are prepared on the GPU device associated with the model instance. -### TorchScript Models +* `KIND_CPU`: -The model repository should look like: + Inputs are prepared on the CPU. -```bash -model_repository/ -`-- model_directory - |-- 1 - | `-- model.pt - `-- config.pbtxt -``` +* `KIND_MODEL`: -The `model.pt` is the TorchScript model file. + Inputs are prepared on the CPU. + When loading the model, the backend does not choose the GPU device for the model; + instead, it respects the device(s) specified in the model and uses them as they are during inference. + + This is useful when the model internally utilizes multiple GPUs, as demonstrated in + [this example model](https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_instance_group_kind_model/gen_models.py). + + > [!IMPORTANT] + > If a device is not specified in the model, the backend uses the first available GPU device. + +To set the model instance group, use the configuration example below: + +```proto +instance_group { + count: 2 + kind: KIND_GPU +} +``` ### Customization @@ -329,69 +367,46 @@ parameters: { } ``` -### Support +## Important Notes -#### Model Instance Group Kind +* The execution of PyTorch model on GPU is asynchronous in nature. + See + [CUDA Asynchronous Execution](https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution) + for additional details. + Consequently, an error in PyTorch model execution may be raised during the next few inference requests to the server. + Setting environment variable `CUDA_LAUNCH_BLOCKING=1` when launching server will help in correctly debugging failing cases by forcing synchronous execution. -The PyTorch backend supports the following kinds of -[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups) -where the input tensors are placed as follows: + * The PyTorch model in such cases may or may not recover from the failed state and a restart of the server may be required to continue serving successfully. -* `KIND_GPU`: Inputs are prepared on the GPU device associated with the model -instance. - -* `KIND_CPU`: Inputs are prepared on the CPU. - -* `KIND_MODEL`: Inputs are prepared on the CPU. When loading the model, the -backend does not choose the GPU device for the model; instead, it respects the -device(s) specified in the model and uses them as they are during inference. 
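+
+  For example, a debugging run of the server might look like this (the model repository path is illustrative):
+
+  ```bash
+  # Force synchronous CUDA execution so errors are reported at the failing call
+  CUDA_LAUNCH_BLOCKING=1 tritonserver --model-repository=/models
+  ```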
-This is useful when the model internally utilizes multiple GPUs, as demonstrated
-in this
-[example model](https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_instance_group_kind_model/gen_models.py).
-If no device is specified in the model, the backend uses the first available
-GPU device. This feature is available starting in the 23.06 release.
-
-### Important Notes
-
-* The execution of PyTorch model on GPU is asynchronous in nature. See
-  [here](https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution)
-  for more details. Consequently, an error in PyTorch model execution may
-  be raised during the next few inference requests to the server. Setting
-  environment variable `CUDA_LAUNCH_BLOCKING=1` when launching server will
-  help in correctly debugging failing cases by forcing synchronous execution.
-  * The PyTorch model in such cases may or may not recover from the failed
-    state and a restart of the server may be required to continue serving
-    successfully.
-
-* PyTorch does not support Tensor of Strings but it does support models that
-accept a List of Strings as input(s) / produces a List of String as output(s).
-For these models Triton allows users to pass String input(s)/receive String
-output(s) using the String datatype. As a limitation of using List instead of
-Tensor for String I/O, only for 1-dimensional input(s)/output(s) are supported
-for I/O of String type.
+* PyTorch does not support Tensor of Strings, but it does support models that accept a List of Strings as input(s) / produce a List of Strings as output(s).
+  For these models, Triton allows users to pass String input(s) and receive String output(s) using the String datatype.
+  As a limitation of using a List instead of a Tensor for String I/O, only 1-dimensional input(s)/output(s) of String type are supported.

 * In a multi-GPU environment, a potential runtime issue can occur when using
-[Tracing](https://pytorch.org/docs/stable/generated/torch.jit.trace.html)
-to generate a
-[TorchScript](https://pytorch.org/docs/stable/jit.html) model. This issue
-arises due to a device mismatch between the model instance and the tensor. By
-default, Triton creates a single execution instance of the model for each
-available GPU. The runtime error occurs when a request is sent to a model
-instance with a different GPU device from the one used during the TorchScript
-generation process. To address this problem, it is highly recommended to use
-[Scripting](https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script)
-instead of Tracing for model generation in a multi-GPU environment. Scripting
-avoids the device mismatch issue and ensures compatibility with different GPUs
-when used with Triton. However, if using Tracing is unavoidable, there is a
-workaround available. You can explicitly specify the GPU device for the model
-instance in the
-[model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
-to ensure that the model instance and the tensors used for inference are
-assigned to the same GPU device as on which the model was traced.
-
-* Python functions optimizable by `torch.compile` may not be served directly in the `model.py` file, they need to be enclosed by a class extending the
-  [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
+  [Tracing](https://pytorch.org/docs/stable/generated/torch.jit.trace.html)
+  to generate a
+  [TorchScript](https://pytorch.org/docs/stable/jit.html)
+  model.
+  This issue arises due to a device mismatch between the model instance and the tensor.

-* Model weights cannot be shared across multiple instances on the same GPU device.
+  By default, Triton creates a single execution instance of the model for each available GPU.
+  The runtime error occurs when a request is sent to a model instance with a different GPU device from the one used during the TorchScript generation process.
+
+  To address this problem, it is highly recommended to use
+  [Scripting](https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script)
+  instead of Tracing for model generation in a multi-GPU environment.
+  Scripting avoids the device mismatch issue and ensures compatibility with different GPUs when used with Triton.
+
+  However, if using Tracing is unavoidable, there is a workaround available.
+  You can explicitly specify the GPU device for the model instance in the
+  [model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
+  to ensure that the model instance and the tensors used for inference are assigned to the same GPU device on which the model was traced.

 * When using `KIND_MODEL` as model instance kind, the default device of the first parameter on the model is used.
+
+> [!WARNING]
+>
+> * Python functions optimizable by `torch.compile` may not be served directly in the `model.py` file; they need to be enclosed by a class extending the
+>   [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
+>
+> * Model weights cannot be shared across multiple instances on the same GPU device.
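+
+As an illustration of the first warning above, a plain Python function intended for `torch.compile` can be wrapped in a module before it is served; the function and class names below are hypothetical.
+
+```python
+import torch
+
+
+def add_scaled(x: torch.Tensor, y: torch.Tensor, scale: float = 2.0) -> torch.Tensor:
+    # A free function like this cannot be served directly from `model.py`.
+    return x + scale * y
+
+
+class AddScaledModel(torch.nn.Module):
+    """Wraps the free function in a `torch.nn.Module` subclass so it can be served."""
+
+    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
+        return add_scaled(x, y)
+```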