47 changes: 24 additions & 23 deletions README.md
@@ -7,7 +7,7 @@ This repository is part of [Intelligent Document Processing with AWS AI Services

Documents contain valuable information and come in various shapes and forms. In most cases, you are manually processing these documents which is time consuming, prone to error, and expensive. Not only do you want this information extracted quickly but you also want to automate business processes that presently rely on manual inputs and intervention across various file types and formats.

To help you overcome these challenges, AWS Machine Learning (ML) now provides you choices when it comes to extracting information from complex content in any document format such as insurance claims, mortgages, healthcare claims, contracts, and legal contracts.
To help you overcome these challenges, AWS Machine Learning (ML) now provides you choices when it comes to extracting information from complex content in any document format such as insurance claims, mortgages, healthcare claims, contracts, and legal contracts.

## Different phases of Intelligent Document Processing pipeline

@@ -25,55 +25,56 @@ In order to be able to execute all the Jupyter Notebooks in this sample, we will
> :warning: Your AWS account **must have a default VPC** for this CloudFormation template to work.
> Your AWS account may incur some nominal charges for SageMaker Studio domain, Amazon Textract, and Amazon Comprehend. However, Amazon Textract, Comprehend, and SageMaker are free to try as part of [AWS Free Tier](https://aws.amazon.com/free/).
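
The default-VPC requirement above can be checked before launching the stack. The following is a minimal sketch, not part of this change: `find_default_vpc_id` is a hypothetical helper, and it assumes a boto3 EC2 client (or any object with the same `describe_vpcs` method) is passed in.

```python
from typing import Any, Optional


def find_default_vpc_id(ec2_client: Any) -> Optional[str]:
    """Return the ID of the region's default VPC, or None if there is none.

    `ec2_client` is assumed to behave like boto3.client("ec2").
    """
    response = ec2_client.describe_vpcs(
        Filters=[{"Name": "is-default", "Values": ["true"]}]
    )
    vpcs = response.get("Vpcs", [])
    return vpcs[0]["VpcId"] if vpcs else None


# Usage sketch (not executed here; needs boto3 and AWS credentials):
#
#   import boto3
#   vpc_id = find_default_vpc_id(boto3.client("ec2"))
#   print(vpc_id or "No default VPC in this region")
```

If the helper returns `None`, the template's default VPC lookup would presumably fail, so a default VPC should be created in the target region first.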

* Navigate to AWS Console
* Search for CloudFormation in the "Services" search bar
* Once in the CloudFormation console, click on the "Create Stack" button (use the "With new resources option")
* In the "Create Stack" wizard, chose "Template is ready", then select "Upload a template file"
- Navigate to AWS Console
- Search for CloudFormation in the "Services" search bar
- Once in the CloudFormation console, click on the "Create Stack" button (use the "With new resources" option)
- In the "Create Stack" wizard, choose "Template is ready", then select "Upload a template file"
<p align="center">
<img src="./images/cfn1.png" alt="cfn1"/>
<img src="./images/cfn1.png" alt="cfn1"/>
</p>

* Upload the [provided](./dist/idp-deploy.yaml) `yaml` file, click "Next"
* In the "Specify stack details" screen, enter "Stack name". Click "Next"
- Upload the [provided](./dist/idp-deploy.yaml) `yaml` file, click "Next"
- In the "Specify stack details" screen, enter "Stack name". Click "Next"
<p align="center">
<img src="./images/cfn2.png" alt="cfn2"/>
</p>

* In the "Configure Stack options" screen, leave the configurations as-is. Click "Next"
* In the "Review" screen, scroll down to the bottom of the page to the "Capabilities" section and acknowledge the notice that the stack is going to create required IAM Roles by checking the check box. Click "Create stack".
- In the "Configure Stack options" screen, scroll down to the bottom of the page to the "Capabilities" section and acknowledge the notice that the stack is going to create required IAM Roles by checking the check box. Click "Next".
- In the "Review and create" screen, leave the configurations as-is. Click "Submit"
<p align="center">
<img src="./images/cfn3.png" alt="cfn3"/>
</p>

The stack creation can take upto 30 minutes. Once your SageMaker domain is created, you can navigate to the SageMaker console and click on "Amazon SageMaker Studio" on the left pane of the screen. Choose the default user created "SageMakerUser" and Click on "Launch Studio". This will open the SageMaker Studio IDE in a new browser tab. NOTE: If this is your first time using SageMaker Studio then it may take some time for the IDE to fully launch.
Stack creation can take up to 30 minutes. Once your SageMaker domain is created, navigate to the SageMaker AI console and click "Studio" in the left pane. Choose the default user, "SageMakerUser", and click "Open Studio". This opens the SageMaker Studio IDE in a new browser tab. NOTE: If this is your first time using SageMaker Studio, the IDE may take some time to fully launch.

<p align="center">
<img src="./images/cfn4.png" alt="cfn4"/>
</p>
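
The console steps above could also be scripted. The sketch below is an illustration under stated assumptions rather than part of this change: `build_create_stack_args` and the stack name `idp-workshop` are hypothetical, the template's parameters are assumed to have usable defaults, and the template is assumed to be small enough to pass inline as `TemplateBody`.

```python
from typing import Any, Dict, List


def build_create_stack_args(stack_name: str, template_body: str) -> Dict[str, Any]:
    """Assemble keyword arguments for CloudFormation's CreateStack call."""
    # The console asks you to acknowledge IAM role creation; the API equivalent
    # is a capability flag. CAPABILITY_NAMED_IAM also covers templates whose
    # IAM resources have custom names.
    capabilities: List[str] = ["CAPABILITY_NAMED_IAM"]
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Capabilities": capabilities,
    }


# Usage sketch (not executed here; needs boto3, credentials, and a default VPC):
#
#   import boto3
#   from pathlib import Path
#
#   cfn = boto3.client("cloudformation")
#   template = Path("dist/idp-deploy.yaml").read_text()
#   cfn.create_stack(**build_create_stack_args("idp-workshop", template))
#   cfn.get_waiter("stack_create_complete").wait(StackName="idp-workshop")
```

Keeping the argument assembly separate from the AWS call makes the capability acknowledgement easy to inspect without touching an account.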

## Setup SageMaker Studio

Once the SageMaker Studio IDE has fully loaded in your browser, you can clone this repository into the SageMaker Domain instance and start working on the provided Jupyter Notebooks. To clone this repository-
Once the SageMaker Studio IDE has fully loaded in your browser, navigate to the JupyterLab application in the top left of the IDE window:

<p align="center">
<img src="./images/apps.png" alt="apps"/>
</p>

And then you can start the JupyterLab environment:

* On the SageMaker Studio IDE, click on "File menu > New > Terminal". This will open a terminal window within SageMaker Studio.
<p align="center">
<img src="./images/sm1.png" alt="sm1"/>
<img src="./images/jupyterlab.png" alt="jupyterlab"/>
</p>

* By default, the terminal launches at the root of the SageMaker Studio IDE workspace.
* Next, clone this repository using
During deployment, the repository is cloned automatically, but you can also clone manually if needed by using the following command in the JupyterLab terminal:

```
git clone https://github.com/aws-samples/aws-ai-intelligent-document-processing idp_workshop
```

* Once the repository is cloned, a direcotry named `idp_workshop` will appear in the "File Browser" on the left panel of SageMaker Studio IDE
* You can now access the Jupyter Notebooks inside the directory and start working on them.
Once the repository is cloned, a directory named `idp_workshop` will appear in the "File Browser" on the left panel of SageMaker Studio IDE

You're all set to begin the workshop!
You can now access the Jupyter Notebooks inside the directory and start working on them. You're all set to begin the workshop!

## License

This library is licensed under the MIT-0 License. See the LICENSE file.



161 changes: 39 additions & 122 deletions dist/idp-deploy.yaml
@@ -12,85 +12,7 @@ Parameters:
Description: The domain name of the Sagemaker studio instance
Default: 'IDPSagemakerDomain'

Mappings:
JupyterMap:
us-east-1:
jupyterimage: "arn:aws:sagemaker:us-east-1:081325390199:image/jupyter-server-3"
us-east-2:
jupyterimage: "arn:aws:sagemaker:us-east-2:429704687514:image/jupyter-server-3"
us-west-1:
jupyterimage: "arn:aws:sagemaker:us-west-1:742091327244:image/jupyter-server-3"
us-west-2:
jupyterimage: "arn:aws:sagemaker:us-west-2:236514542706:image/jupyter-server-3"
af-south-1:
jupyterimage: "arn:aws:sagemaker:af-south-1:559312083959:image/jupyter-server-3"
ap-east-1:
jupyterimage: "arn:aws:sagemaker:ap-east-1:493642496378:image/jupyter-server-3"
ap-south-1:
jupyterimage: "arn:aws:sagemaker:ap-south-1:394103062818:image/jupyter-server-3"
ap-northeast-2:
jupyterimage: "arn:aws:sagemaker:ap-northeast-2:806072073708:image/jupyter-server-3"
ap-southeast-1:
jupyterimage: "arn:aws:sagemaker:ap-southeast-1:492261229750:image/jupyter-server-3"
ap-southeast-2:
jupyterimage: "arn:aws:sagemaker:ap-southeast-2:452832661640:image/jupyter-server-3"
ap-northeast-1:
jupyterimage: "arn:aws:sagemaker:ap-northeast-1:102112518831:image/jupyter-server-3"
ca-central-1:
jupyterimage: "arn:aws:sagemaker:ca-central-1:310906938811:image/jupyter-server-3"
eu-central-1:
jupyterimage: "arn:aws:sagemaker:eu-central-1:936697816551:image/jupyter-server-3"
eu-west-1:
jupyterimage: "arn:aws:sagemaker:eu-west-1:470317259841:image/jupyter-server-3"
eu-west-2:
jupyterimage: "arn:aws:sagemaker:eu-west-2:712779665605:image/jupyter-server-3"
eu-west-3:
jupyterimage: "arn:aws:sagemaker:eu-west-3:615547856133:image/jupyter-server-3"
eu-north-1:
jupyterimage: "arn:aws:sagemaker:eu-north-1:243637512696:image/jupyter-server-3"
eu-south-1:
jupyterimage: "arn:aws:sagemaker:eu-south-1:592751261982:image/jupyter-server-3"
sa-east-1:
jupyterimage: "arn:aws:sagemaker:sa-east-1:782484402741:image/jupyter-server-3"
RegionMap:
us-east-1:
datascience: "arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
us-east-2:
datascience: "arn:aws:sagemaker:us-east-2:429704687514:image/datascience-1.0"
us-west-1:
datascience: "arn:aws:sagemaker:us-west-1:742091327244:image/datascience-1.0"
us-west-2:
datascience: "arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0"
af-south-1:
datascience: "arn:aws:sagemaker:af-south-1:559312083959:image/datascience-1.0"
ap-east-1:
datascience: "arn:aws:sagemaker:ap-east-1:493642496378:image/datascience-1.0"
ap-south-1:
datascience: "arn:aws:sagemaker:ap-south-1:394103062818:image/datascience-1.0"
ap-northeast-2:
datascience: "arn:aws:sagemaker:ap-northeast-2:806072073708:image/datascience-1.0"
ap-southeast-1:
datascience: "arn:aws:sagemaker:ap-southeast-1:492261229750:image/datascience-1.0"
ap-southeast-2:
datascience: "arn:aws:sagemaker:ap-southeast-2:452832661640:image/datascience-1.0"
ap-northeast-1:
datascience: "arn:aws:sagemaker:ap-northeast-1:102112518831:image/datascience-1.0"
ca-central-1:
datascience: "arn:aws:sagemaker:ca-central-1:310906938811:image/datascience-1.0"
eu-central-1:
datascience: "arn:aws:sagemaker:eu-central-1:936697816551:image/datascience-1.0"
eu-west-1:
datascience: "arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
eu-west-2:
datascience: "arn:aws:sagemaker:eu-west-2:712779665605:image/datascience-1.0"
eu-west-3:
datascience: "arn:aws:sagemaker:eu-west-3:615547856133:image/datascience-1.0"
eu-north-1:
datascience: "arn:aws:sagemaker:eu-north-1:243637512696:image/datascience-1.0"
eu-south-1:
datascience: "arn:aws:sagemaker:eu-south-1:488287956546:image/sagemaker-data-wrangler-1.0"
sa-east-1:
datascience: "arn:aws:sagemaker:sa-east-1:782484402741:image/datascience-1.0"


Resources:
LambdaExecutionRole:
@@ -306,43 +228,46 @@ Resources:
import cfnresponse
sagemaker = boto3.client('sagemaker')
def lambda_handler(event, context):
print(event)
script = textwrap.dedent('''\
#!/bin/bash
set -eux
set -eu
echo "Cloning IDP repository"
export REPOSITORY_URL="https://github.com/aws-samples/aws-ai-intelligent-document-processing"
git -C /home/sagemaker-user clone $REPOSITORY_URL
echo "Cloning complete"''')
cd /home/sagemaker-user
if [ ! -d "idp_workshop" ]; then
git clone https://github.com/aws-samples/aws-ai-intelligent-document-processing.git idp_workshop || {
echo "Git clone failed, continuing without repository"
exit 0
}
else
echo "Repository already exists, skipping clone"
fi
echo "Setup complete"''')
script_byte = script.encode("ascii")
base64_bytes = base64.b64encode(script_byte)
base64_string = base64_bytes.decode("ascii")
if 'RequestType' in event and event['RequestType'] == 'Create':
domain_id = event['ResourceProperties']['DomainID']
user_profile = event['ResourceProperties']['UserProfileName']
try:
resp = sagemaker.create_studio_lifecycle_config(StudioLifecycleConfigName='idp-git-bootstrap',
StudioLifecycleConfigContent=base64_string,
StudioLifecycleConfigAppType='JupyterServer')
StudioLifecycleConfigAppType='JupyterLab')
lcc_config_arn = resp['StudioLifecycleConfigArn']
jupyter_setting = {'JupyterServerAppSettings': {
jupyter_setting = {'JupyterLabAppSettings': {
'DefaultResourceSpec': {
'LifecycleConfigArn': lcc_config_arn,
'InstanceType': 'system'
'InstanceType': 'ml.t3.medium'
},
'LifecycleConfigArns': [ lcc_config_arn ]
}
}

resp_d = sagemaker.update_domain(DomainId=domain_id,
DefaultUserSettings=jupyter_setting)
sagemaker.update_domain(DomainId=domain_id, DefaultUserSettings=jupyter_setting)
cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, '')
except Exception as e:
print(e)
cfnresponse.send(event, context, cfnresponse.FAILED, {'Error': 'Unable to create Lifecycle Config'}, '')
elif 'RequestType' in event and event['RequestType'] == 'Delete':
try:
resp = sagemaker.delete_studio_lifecycle_config(StudioLifecycleConfigName='idp-git-bootstrap')
sagemaker.delete_studio_lifecycle_config(StudioLifecycleConfigName='idp-git-bootstrap')
cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, '')
except Exception as e:
print(e)
@@ -355,14 +280,13 @@ Resources:
Role: !GetAtt LambdaExecutionRole.Arn
Runtime: python3.9
Timeout: 5

DefaultLcc:
Type: Custom::ResourceForLcc
DependsOn:
- StudioDomain
DependsOn: StudioDomain
Properties:
ServiceToken: !GetAtt LccLambda.Arn
DomainID: !Ref StudioDomain
UserProfileName: !Ref UserProfileName

StudioDomain:
Type: AWS::SageMaker::Domain
@@ -371,16 +295,14 @@ Resources:
AuthMode: IAM
DefaultUserSettings:
ExecutionRole: !GetAtt SageMakerExecutionRole.Arn
JupyterServerAppSettings:
DefaultResourceSpec:
InstanceType: system
SageMakerImageArn: !FindInMap
- JupyterMap
- !Ref 'AWS::Region'
- jupyterimage
DefaultSpaceSettings:
ExecutionRole: !GetAtt SageMakerExecutionRole.Arn
DomainName: !Ref DomainName
SubnetIds: !GetAtt DefaultVpcFinder.Subnets
VpcId: !GetAtt DefaultVpcFinder.VpcId




UserProfile:
Type: AWS::SageMaker::UserProfile
@@ -390,30 +312,25 @@ Resources:
UserProfileName: !Ref UserProfileName
UserSettings:
ExecutionRole: !GetAtt SageMakerExecutionRole.Arn
JupyterLabAppSettings:
DefaultResourceSpec:
InstanceType: ml.t3.medium

JupyterApp:
Type: AWS::SageMaker::App
DependsOn: UserProfile
Properties:
AppName: default
AppType: JupyterServer
DomainId: !GetAtt StudioDomain.DomainId
UserProfileName: !Ref UserProfileName

DataScienceApp:
Type: AWS::SageMaker::App
JupyterSpace:
Type: AWS::SageMaker::Space
DependsOn: UserProfile
Properties:
AppName: instance-event-engine-datascience-ml-t3-medium
AppType: KernelGateway
DomainId: !GetAtt StudioDomain.DomainId
ResourceSpec:
InstanceType: ml.t3.medium
SageMakerImageArn: !FindInMap
- RegionMap
- !Ref 'AWS::Region'
- datascience
UserProfileName: !Ref UserProfileName
SpaceName: IDPDeployJupyterSpace
OwnershipSettings:
OwnerUserProfileName: !Ref UserProfileName
SpaceSharingSettings:
SharingType: Private
SpaceSettings:
AppType: JupyterLab
JupyterLabAppSettings:
DefaultResourceSpec:
InstanceType: ml.t3.medium

### S3 Bucket For A2I
A2IBucket:
Binary file added images/apps.png
Binary file modified images/cfn1.png
Binary file modified images/cfn2.png
Binary file modified images/cfn3.png
Binary file modified images/cfn4.png
Binary file added images/jupyterlab.png
Binary file removed images/sm1.png
Binary file not shown.