Skip to content

sagemaker: Support serverless variants for endpoints #23148

@petermeansrock

Description

@petermeansrock

Describe the feature

As described in the SageMaker Endpoint L2 construct RFC:

Serverless Inference: By default, upon endpoint deployment, SageMaker will provision EC2 instances (managed by SageMaker) for hosting purposes. To shield customers from the complexity of forecasting fleet sizes, the ServerlessConfig attribute was added to the ProductionVariant CloudFormation structure of an endpoint config resource. This configuration removes the need for customers to specify instance-specific settings (e.g., instance count, instance type), abstracting the runtime compute from customers, much in the same way Lambda does for its customers.

Please 👍 this issue to help with the prioritization of this feature.

Use Case

"Amazon SageMaker Serverless Inference is ideal for applications with intermittent or unpredictable traffic." (link)

Proposed Solution

As described in the SageMaker Endpoint L2 construct RFC:

In preparation for the addition of this feature into the CDK, all concrete production variant related classes and attributes have been prefixed with the string [Ii]nstance to designate that they are only associated with instance-based hosting. When later adding serverless support to the SageMaker module, [Ss]erverless-prefixed analogs can be created with attributes appropriate for the use-case with appropriate plumbing to the L1 constructs. Note, there are a number of features which do not yet work with serverless variants, so it may be necessary to incorporate a number of new synthesis-time checks or compile-time contracts to guard against mixing incompatible features. For example, as discussed with the bar raiser, alongside the proposed EndpointConfigProps attribute instanceProductionVariants?: InstanceProductionVariantProps[], a new mutually exclusive attribute serverlessProductionVariant?: ServerlessProductionVariantProps (as only a single variant is supported with serverless inference) could be added with a synthesis-time check confirming that the customer hasn't configured both instance-based and serverless production variants.

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.54.0-alpha.0

Environment details (OS name and version, etc.)

macOS Ventura

Metadata

Metadata

Assignees

No one assigned

    Labels

    @aws-cdk/aws-sagemakerRelated to AWS SageMakereffort/mediumMedium work item – several days of effortfeature-requestA feature should be added or improved.p1

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions