
Commit 04fc444 (parent 78ef21c)

docs(sagemaker): add README section and enhance integ test for serverless inference

- Add comprehensive serverless inference documentation to SageMaker alpha README
- Update integration test with serverless endpoint configuration examples
- Include verification comments for both instance-based and serverless endpoints
- Generate CloudFormation snapshots with proper ServerlessConfig properties

Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.

File tree: 12 files changed, +1043 −1334 lines changed


packages/@aws-cdk/aws-sagemaker-alpha/README.md

Lines changed: 32 additions & 0 deletions
@@ -214,6 +214,38 @@ const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
});
```

### Serverless Inference

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.

To create a serverless endpoint configuration, use the `serverlessProductionVariant` property:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});
```

Serverless inference is ideal for workloads with intermittent or unpredictable traffic patterns. You can configure:

- `maxConcurrency`: Maximum concurrent invocations (1-200)
- `memorySizeInMB`: Memory allocation in 1GB increments (1024, 2048, 3072, 4096, 5120, or 6144 MB)
- `provisionedConcurrency`: Optional pre-warmed capacity to reduce cold starts

**Note**: Provisioned concurrency incurs charges even when the endpoint is not processing requests. Use it only when you need to minimize cold start latency.

You cannot mix serverless and instance-based variants in the same endpoint configuration.
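As an illustration of the parameter constraints listed above, the bounds can be sketched as a small standalone validation helper. This function is hypothetical, not part of `@aws-cdk/aws-sagemaker-alpha` (the construct library performs its own validation), and the rule that provisioned concurrency cannot exceed max concurrency is an assumption based on SageMaker's serverless inference limits:

```typescript
// Hypothetical helper illustrating the serverless variant constraints
// described above; NOT part of @aws-cdk/aws-sagemaker-alpha.
const VALID_MEMORY_SIZES_MB = [1024, 2048, 3072, 4096, 5120, 6144];

function validateServerlessVariant(props: {
  maxConcurrency: number;
  memorySizeInMB: number;
  provisionedConcurrency?: number;
}): string[] {
  const errors: string[] = [];
  // maxConcurrency must fall in the documented 1-200 range.
  if (props.maxConcurrency < 1 || props.maxConcurrency > 200) {
    errors.push(`maxConcurrency must be between 1 and 200, got ${props.maxConcurrency}`);
  }
  // Memory is only allocatable in the fixed 1 GB increments above.
  if (!VALID_MEMORY_SIZES_MB.includes(props.memorySizeInMB)) {
    errors.push(`memorySizeInMB must be one of ${VALID_MEMORY_SIZES_MB.join(', ')}, got ${props.memorySizeInMB}`);
  }
  // Assumption: pre-warmed capacity cannot exceed the concurrency ceiling.
  if (props.provisionedConcurrency !== undefined && props.provisionedConcurrency > props.maxConcurrency) {
    errors.push('provisionedConcurrency cannot exceed maxConcurrency');
  }
  return errors;
}
```

For example, `validateServerlessVariant({ maxConcurrency: 10, memorySizeInMB: 2048, provisionedConcurrency: 5 })` returns no errors, matching the README example above.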
### Endpoint

When you create an endpoint from an `EndpointConfig`, Amazon SageMaker launches the ML compute

packages/@aws-cdk/aws-sagemaker-alpha/test/integ.endpoint-config.js.snapshot/aws-cdk-sagemaker-endpointconfig.assets.json

Lines changed: 12 additions & 9 deletions
