
Commit 04fc444 (parent 78ef21c)

docs(sagemaker): add README section and enhance integ test for serverless inference

- Add comprehensive serverless inference documentation to SageMaker alpha README
- Update integration test with serverless endpoint configuration examples
- Include verification comments for both instance-based and serverless endpoints
- Generate CloudFormation snapshots with proper ServerlessConfig properties

Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.

File tree: 12 files changed, +1043 −1334 lines changed


packages/@aws-cdk/aws-sagemaker-alpha/README.md

Lines changed: 32 additions & 0 deletions
@@ -214,6 +214,38 @@ const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
});
```

### Serverless Inference

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.

To create a serverless endpoint configuration, use the `serverlessProductionVariant` property:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});
```

Serverless inference is ideal for workloads with intermittent or unpredictable traffic patterns. You can configure:

- `maxConcurrency`: Maximum concurrent invocations (1-200)
- `memorySizeInMB`: Memory allocation in 1GB increments (1024, 2048, 3072, 4096, 5120, or 6144 MB)
- `provisionedConcurrency`: Optional pre-warmed capacity to reduce cold starts

**Note**: Provisioned concurrency incurs charges even when the endpoint is not processing requests. Use it only when you need to minimize cold start latency.

You cannot mix serverless and instance-based variants in the same endpoint configuration.
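As an illustration of the parameter constraints listed above, the bounds can be sketched as a small standalone validation helper. This function is hypothetical, not part of `@aws-cdk/aws-sagemaker-alpha` (the construct library performs its own validation), and the rule that provisioned concurrency cannot exceed max concurrency is an assumption based on SageMaker's serverless inference limits:

```typescript
// Hypothetical helper illustrating the serverless variant constraints
// described above; NOT part of @aws-cdk/aws-sagemaker-alpha.
const VALID_MEMORY_SIZES_MB = [1024, 2048, 3072, 4096, 5120, 6144];

function validateServerlessVariant(props: {
  maxConcurrency: number;
  memorySizeInMB: number;
  provisionedConcurrency?: number;
}): string[] {
  const errors: string[] = [];
  // maxConcurrency must fall in the documented 1-200 range.
  if (props.maxConcurrency < 1 || props.maxConcurrency > 200) {
    errors.push(`maxConcurrency must be between 1 and 200, got ${props.maxConcurrency}`);
  }
  // Memory is only allocatable in the fixed 1 GB increments above.
  if (!VALID_MEMORY_SIZES_MB.includes(props.memorySizeInMB)) {
    errors.push(`memorySizeInMB must be one of ${VALID_MEMORY_SIZES_MB.join(', ')}, got ${props.memorySizeInMB}`);
  }
  // Assumption: pre-warmed capacity cannot exceed the concurrency ceiling.
  if (props.provisionedConcurrency !== undefined && props.provisionedConcurrency > props.maxConcurrency) {
    errors.push('provisionedConcurrency cannot exceed maxConcurrency');
  }
  return errors;
}
```

For example, `validateServerlessVariant({ maxConcurrency: 10, memorySizeInMB: 2048, provisionedConcurrency: 5 })` returns no errors, matching the README example above.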
### Endpoint

When you create an endpoint from an `EndpointConfig`, Amazon SageMaker launches the ML compute

packages/@aws-cdk/aws-sagemaker-alpha/test/integ.endpoint-config.js.snapshot/aws-cdk-sagemaker-endpointconfig.assets.json

Lines changed: 12 additions & 9 deletions
