(release): A2I custom and IDP Mortgage
anjanvb committed Jul 27, 2022
1 parent 087f06c commit fcef49e
Showing 46 changed files with 46,542 additions and 1 deletion.
882 changes: 882 additions & 0 deletions 04-idp-document-a2i.ipynb

Large diffs are not rendered by default.

912 changes: 912 additions & 0 deletions 04.01-idp-a2i-with-custom-rules.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ Once the SageMaker Studio IDE has fully loaded in your browser, you can clone th
* Next, clone this repository using

```
-git clone <repo_url> idp_workshop
+git clone https://github.com/aws-samples/aws-ai-intelligent-document-processing idp_workshop
```

* Once the repository is cloned, a directory named `idp_workshop` will appear in the "File Browser" on the left panel of SageMaker Studio IDE
Binary file added a2idata/990-sample-page-1.jpg
Empty file added a2idata/__init__.py
Empty file.
36 changes: 36 additions & 0 deletions a2idata/a2i-bi-sample-data.csv
@@ -0,0 +1,36 @@
timestamp,doc_id,condition_category,condition_setting,field_name,field_value,human_loop_name,reviewer,process_method
2022/07/01,9.34932E+13,LengthCheck,^[0-9a-zA-Z]{16}$,dln,9.34932E+13,custom-loop-a8e97e82-2b71-43cc-9b9f-9f3cc1f17bc5,Reviewer 1,manu
2022/07/01,93493188018523,Confidence,99,d.employer_id,98.06,custom-loop-a8e97e82-2b71-43cc-9b9f-9f3cc1f17bc5,Reviewer 1,manu
2022/07/02,93493188018524,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1d2,Reviewer 2,manu
2022/07/02,93493188018525,LengthCheck,,e.phone_number,123,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1w3,Reviewer 1,manu
2022/07/02,93493188018525,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1w3,Reviewer 1,manu
2022/07/03,93493188018526,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a13d,Reviewer 1,manu
2022/07/03,93493188018527,Confidence,,dln,94,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a134,Reviewer 3,manu
2022/07/04,93493188018528,LengthCheck,,omb_no,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1xs,Reviewer 3,manu
2022/07/01,93493188018529,,,,,,,auto
2022/07/01,93493188018530,,,,,,,auto
2022/07/01,93493188018531,,,,,,,auto
2022/07/01,93493188018532,,,,,,,auto
2022/07/01,93493188018533,,,,,,,auto
2022/07/02,93493188018534,,,,,,,auto
2022/07/02,93493188018535,,,,,,,auto
2022/07/02,93493188018536,,,,,,,auto
2022/07/02,93493188018537,,,,,,,auto
2022/07/02,93493188018538,,,,,,,auto
2022/07/02,93493188018539,,,,,,,auto
2022/07/03,93493188018540,,,,,,,auto
2022/07/03,93493188018541,,,,,,,auto
2022/07/03,93493188018542,,,,,,,auto
2022/07/03,93493188018543,,,,,,,auto
2022/07/03,93493188018544,,,,,,,auto
2022/07/03,93493188018545,,,,,,,auto
2022/07/03,93493188018546,,,,,,,auto
2022/07/03,93493188018546,,,,,,,auto
2022/07/03,93493188018547,,,,,,,auto
2022/07/03,93493188018548,,,,,,,auto
2022/07/03,93493188018549,,,,,,,auto
2022/07/04,93493188018550,,,,,,,auto
2022/07/04,93493188018551,,,,,,,auto
2022/07/04,93493188018552,,,,,,,auto
2022/07/04,93493188018553,,,,,,,auto
2022/07/04,93493188018554,,,,,,,auto
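The sample rows can be summarized before they are loaded into a BI tool. The following is a minimal sketch, assuming the column names from the header row; the three data rows are copied from the file, and the helper name `count_by` is hypothetical.

```python
import csv
import io
from collections import Counter

# Header and three rows copied from a2i-bi-sample-data.csv. A real run would
# open the file instead of parsing an inline string.
SAMPLE = """\
timestamp,doc_id,condition_category,condition_setting,field_name,field_value,human_loop_name,reviewer,process_method
2022/07/01,93493188018523,Confidence,99,d.employer_id,98.06,custom-loop-a8e97e82-2b71-43cc-9b9f-9f3cc1f17bc5,Reviewer 1,manu
2022/07/02,93493188018524,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1d2,Reviewer 2,manu
2022/07/01,93493188018529,,,,,,,auto
"""


def count_by(rows, column):
    """Count rows per distinct value of `column`, skipping empty cells."""
    return Counter(row[column] for row in rows if row[column])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_method = count_by(rows, "process_method")        # human-reviewed vs. automatic
by_category = count_by(rows, "condition_category")  # which rule triggered a review
```

Rows with `process_method` set to `auto` leave the condition columns empty, which is why the helper skips empty cells instead of counting them as a category.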
159 changes: 159 additions & 0 deletions a2idata/a2i-custom-ui.html
@@ -0,0 +1,159 @@
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<link rel="stylesheet" href="https://s3.amazonaws.com/smgtannotation/web/static/css/1.3fc3007b.chunk.css">
<link rel="stylesheet" href="https://s3.amazonaws.com/smgtannotation/web/static/css/main.9504782e.chunk.css">
<link href="/static/css/1.fe2e351b.chunk.css" rel="stylesheet">
<link href="/static/css/main.2b80d815.chunk.css" rel="stylesheet">
<style>
.wrapper {
position:relative;
display:block; /* <= shrinks container to image size */
overflow-y: scroll;
max-height:1000px;
background-color: #e9ecec;
padding: 30px;
border:red 10px;
}
.img-overlay-wrap {
position: relative;
display: inline-block; /* <= shrinks container to image size */
transition: transform 150ms ease-in-out;
overflow-y: scroll;
background-color: #e9ecec;
}

.img-overlay-wrap img { /* <= optional, for responsiveness */
display: block;
max-width: 800px;
height: auto;
box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
}

.img-overlay-wrap svg {
position: absolute;
top: 0;
left: 0;
}

.img-overlay-wrap svg rect {
stroke:#009879;
stroke-width: 2;
fill: #009879;
fill-opacity: 20%;
}

.styled-table input {
width:250px;
height: 100px;
vertical-align: top;
}
.styled-table {
border-collapse: collapse;
margin: 10px 0;
font-size: 0.9em;
font-family: sans-serif;
width:100%;
box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
}
.styled-table thead tr {
background-color: #009879;
color: #ffffff;
text-align: left;
}
.styled-table th,
.styled-table td {
padding: 12px 15px;
vertical-align: top;
}
.styled-table tbody tr {
border-bottom: 1px solid #dddddd;
}

.styled-table tbody tr:nth-of-type(even) {
background-color: #f3f3f3;
}

.styled-table tbody tr:last-of-type {
border-bottom: 2px solid #009879;
}
.styled-table tbody tr.active-row {
font-weight: bold;
color: #009879;
}
</style>
<script>
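function condition_over(idx) {
// Highlight the bounding box (absent when the condition has no block) and its table row.
var rect = document.getElementById("rectm_" + idx);
if (rect) rect.style = "stroke-width: 2px; stroke: #9e4064; fill: #c5a7be;";
document.getElementById("tr_" + idx).className = "active-row";
}
function condition_out(idx) {
// Restore the default bounding-box style and clear the row highlight.
var rect = document.getElementById("rectm_" + idx);
if (rect) rect.style = "stroke-width: 2; fill: #009879; fill-opacity: 20%;";
document.getElementById("tr_" + idx).className = "";
}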
</script>
<div id='document-text' style="display: none;">
{{ task.input.text }}
</div>
<div id='document-image' style="display: none;">
{{ task.input.s3.url | grant_read_access }}
</div>

<table>
<tr>
<td style="vertical-align: top;">
<div class="wrapper">
<div class="img-overlay-wrap">
<img src="{{ task.input.s3.url | grant_read_access }}">
<svg viewBox="0 0 {{task.input.s3.image_width}} {{task.input.s3.image_height}}">
{% for b in task.input.Results.ConditionMissed %}
{% if b.block != null %}
<rect id="rectm_{{b.index}}" width="{{ b.block.Geometry.BoundingBox.Width | times: task.input.s3.image_width }}" height="{{ b.block.Geometry.BoundingBox.Height | times: task.input.s3.image_height }}" x="{{ b.block.Geometry.BoundingBox.Left | times: task.input.s3.image_width }}" y="{{ b.block.Geometry.BoundingBox.Top | times: task.input.s3.image_height }}"></rect>
{% endif %}
{% endfor %}
</svg>
</div>
</div>
</td>
<td>&nbsp;&nbsp;&nbsp;</td>
<td style="vertical-align: top; padding: 20px;">
<crowd-form>
<div>
<h3>Instructions</h3>
<p>Please review the extracted result, and make corrections where appropriate. </p>
</div>
<br>
<h3> Missed Conditions </h3>
<table class="styled-table">
<thead>
<tr>
<th style="width:250px">DESCRIPTION</th>
<th>ACTUAL VALUE</th>
<th>YOUR VALUE</th>
<th>CHANGE REASON</th>
</tr>
</thead>
<tbody>
{% for r in task.input.Results.ConditionMissed %}

<tr id="tr_{{r.index}}" onmouseover="javascript:condition_over( {{r.index}} )" onmouseout="javascript:condition_out({{r.index}})">
<td title="Field name: {{r.field_name}} ({{ r.condition_category }})">{{ r.message }}</td>
<td>{{ r.field_value }}</td>
<td>
<p>
<input type="text" name="True Value {{r.index}}" placeholder="Enter your value" />
</p>
</td>
<td>
<p>
<input type="text" name="Change Reason {{r.index}}" placeholder="Explain why you changed the value" />
</p>
</td>
</tr>

{% endfor %}
</tbody>
</table>
</crowd-form>
</td>
</tr>
</table>
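The template reads `task.input.s3` for the image and `task.input.Results.ConditionMissed` for the rows and rectangles, scaling each ratio-based bounding box by the image size with the `times` filter. The sketch below mirrors that scaling in Python and shows one possible input shape. The bucket URL, image dimensions, field values, and the helper name `to_pixels` are assumptions made for illustration.

```python
import json


def to_pixels(bounding_box, image_width, image_height):
    """Scale a ratio-based bounding box to pixel coordinates,
    as the template does with the `times` filter."""
    return {
        "x": bounding_box["Left"] * image_width,
        "y": bounding_box["Top"] * image_height,
        "width": bounding_box["Width"] * image_width,
        "height": bounding_box["Height"] * image_height,
    }


# Hypothetical task input, shaped after the fields the template reads.
task_input = {
    "s3": {
        "url": "s3://example-bucket/page-1.jpg",  # placeholder location
        "image_width": 1000,
        "image_height": 2000,
    },
    "Results": {
        "ConditionMissed": [
            {
                "index": 1,
                "message": "The required field [dln] is missing.",
                "field_name": "dln",
                "field_value": None,
                "condition_category": "Required",
                "block": None,  # no block, so the template draws no <rect>
            },
            {
                "index": 2,
                "message": "Phone number has an unexpected length.",
                "field_name": "e.phone_number",
                "field_value": "123",
                "condition_category": "LengthCheck",
                "block": {"Geometry": {"BoundingBox": {
                    "Left": 0.25, "Top": 0.5, "Width": 0.125, "Height": 0.0625}}},
            },
        ]
    },
}

width = task_input["s3"]["image_width"]
height = task_input["s3"]["image_height"]

# One rectangle per missed condition that carries a block.
rects = {
    m["index"]: to_pixels(m["block"]["Geometry"]["BoundingBox"], width, height)
    for m in task_input["Results"]["ConditionMissed"]
    if m["block"] is not None
}

# The human loop input is passed as a JSON string.
payload = json.dumps(task_input)
```

A missed condition without a block still gets a table row but no rectangle, so any hover handler that looks up `rectm_<index>` has to tolerate a missing element.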
100 changes: 100 additions & 0 deletions a2idata/condition.py
@@ -0,0 +1,100 @@
from enum import Enum
import re

class Condition:
    _data = None
    _conditions = None
    _result = None

    def __init__(self, data, conditions):
        self._data = data
        self._conditions = conditions

    def check(self, field_name, obj):
        # r collects broken conditions, s collects satisfied ones
        r,s = [],[]
        for c in self._conditions:
            # Matching field_name or field_name_regex
            condition_setting = c.get("condition_setting")
            if c.get("field_name") == field_name \
                    or (c.get("field_name") is None and c.get("field_name_regex") is not None and re.search(c.get("field_name_regex"), field_name)):
                # Default to None so a missing field cannot raise a NameError below
                field_value, block, confidence = None, None, None
                if obj is not None:
                    field_value = obj.get("value")
                    block = obj.get("block")
                    confidence = obj.get("confidence")

                if c["condition_type"] == "Required" \
                        and (obj is None or field_value is None or len(str(field_value)) == 0):
                    r.append({
                        "message": f"The required field [{field_name}] is missing.",
                        "field_name": field_name,
                        "field_value": field_value,
                        "condition_type": str(c["condition_type"]),
                        "condition_setting": condition_setting,
                        "condition_category":c["condition_category"],
                        "block": block
                    })
                elif c["condition_type"] == "ConfidenceThreshold" \
                        and condition_setting is not None and confidence is not None \
                        and float(confidence) < float(condition_setting):
                    r.append({
                        "message": f"The field [{field_name}] confidence score {confidence} is lower than the threshold {condition_setting}",
                        "field_name": field_name,
                        "field_value": field_value,
                        "condition_type": str(c["condition_type"]),
                        "condition_setting": condition_setting,
                        "condition_category":c["condition_category"],
                        "block": block
                    })
                elif field_value is not None and c["condition_type"] == "ValueRegex" and condition_setting is not None \
                        and re.search(condition_setting, str(field_value)) is None:
                    r.append({
                        "message": f"{c['description']}",
                        "field_name": field_name,
                        "field_value": field_value,
                        "condition_type": str(c["condition_type"]),
                        "condition_setting": condition_setting,
                        "condition_category":c["condition_category"],
                        "block": block
                    })
                else:
                    # field has condition defined and satisfied
                    s.append({
                        "message": f"{c['description']}",
                        "field_name": field_name,
                        "field_value": field_value,
                        "condition_type": str(c["condition_type"]),
                        "condition_setting": condition_setting,
                        "condition_category":c["condition_category"],
                        "block": block
                    })

        return r, s

    def check_all(self):
        if self._data is None or self._conditions is None:
            return None

        broken_conditions = []
        satisfied_conditions = []
        for key, obj in self._data.items():
            # NOTE: the whitespace-stripped value is computed but not passed to check()
            value = None
            if obj is not None:
                value = obj.get("value")

            if value is not None and type(value)==str:
                value = value.replace(' ','')

            r, s = self.check(key, obj)
            if r and len(r) > 0:
                broken_conditions += r
            if s and len(s) > 0:
                satisfied_conditions += s

        # apply a 1-based index to each broken condition
        idx = 0
        for r in broken_conditions:
            idx += 1
            r["index"] = idx
        return broken_conditions, satisfied_conditions
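The class takes a `data` dictionary keyed by field name and a list of rule dictionaries. The sketch below shows one plausible shape for both inputs, inferred from the keys that condition.py reads, and restates the three rule types as standalone functions. It is a simplified illustration rather than the class itself: the field values, rule settings, and the helper names `matches` and `is_broken` are assumptions.

```python
import re

# Hypothetical extracted fields: one entry per field name.
data = {
    "dln": {"value": "", "confidence": 99.5, "block": None},
    "d.employer_id": {"value": "12-3456789", "confidence": 98.06, "block": None},
    "e.phone_number": {"value": "123", "confidence": 99.9, "block": None},
}

# Hypothetical rules: each targets a field by exact name or by regex.
conditions = [
    {"field_name": "dln", "condition_type": "Required",
     "condition_category": "Required", "condition_setting": None,
     "description": "dln is required"},
    {"field_name_regex": r"^d\.", "condition_type": "ConfidenceThreshold",
     "condition_category": "Confidence", "condition_setting": 99,
     "description": "confidence must be at least 99"},
    {"field_name": "e.phone_number", "condition_type": "ValueRegex",
     "condition_category": "LengthCheck", "condition_setting": r"^[0-9]{10}$",
     "description": "phone number must be 10 digits"},
]


def matches(condition, field_name):
    """Exact name match when field_name is set; otherwise try the regex."""
    if condition.get("field_name") is not None:
        return condition["field_name"] == field_name
    pattern = condition.get("field_name_regex")
    return pattern is not None and re.search(pattern, field_name) is not None


def is_broken(condition, obj):
    """Return True when the field violates the rule."""
    value = None if obj is None else obj.get("value")
    confidence = None if obj is None else obj.get("confidence")
    kind = condition["condition_type"]
    setting = condition.get("condition_setting")
    if kind == "Required":
        return value is None or len(str(value)) == 0
    if kind == "ConfidenceThreshold":
        return (setting is not None and confidence is not None
                and float(confidence) < float(setting))
    if kind == "ValueRegex":
        return (value is not None and setting is not None
                and re.search(setting, str(value)) is None)
    return False


# Pair every field with every rule that targets it and keep the violations.
broken = [
    (name, c["condition_type"])
    for name, obj in data.items()
    for c in conditions
    if matches(c, name) and is_broken(c, obj)
]
```

Keeping the field-matching step separate from the rule evaluation makes it easier to see that a rule without `field_name` only applies through `field_name_regex`.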
8 changes: 8 additions & 0 deletions dist/idp-deploy.yaml
@@ -106,10 +106,18 @@ Resources:
- textract:GetDocumentTextDetection
- textract:GetDocumentAnalysis
- textract:AnalyzeDocument
+- textract:AnalyzeID
+- textract:AnalyzeExpense
- textract:DetectDocumentText
- textract:StartDocumentAnalysis
- textract:StartDocumentTextDetection
- comprehend:DetectEntities
+- comprehend:DetectPiiEntities
+- comprehend:ContainsPiiEntities
+- comprehend:DescribePiiEntitiesDetectionJob
+- comprehend:ListPiiEntitiesDetectionJobs
+- comprehend:StartPiiEntitiesDetectionJob
+- comprehend:StopPiiEntitiesDetectionJob
- comprehend:StartEntitiesDetectionJob
- comprehend:ClassifyDocument
- comprehend:DescribeDocumentClassificationJob
Binary file removed images/.DS_Store
Binary file not shown.
Binary file added images/a2i-custom-rule.png
Binary file added images/a2i-custom-ui.png
Binary file added images/a2i-page1-data-model-mapping.png
Binary file added images/a2i-quicksight-dashboard.png
Binary file added images/a2i-quicksight-dataset.gif
Binary file added images/a2i-quicksight-init.gif
Binary file added images/a2i-quicksight-publish.gif
Binary file added images/a2i-quicksight-visual-histogram.gif
Binary file added images/a2i-quicksight-visual-pie.gif
Binary file added images/a2i-quicksight-visual-wordcloud.gif