TBBC-AI/doc-classifier-aws

any-doc-classifier - TBBC - DeeLab

This project contains source code and supporting files for a serverless application that you can deploy with the AWS Serverless Application Model (AWS SAM). The project exposes an API to manage document information and classify documents with Amazon Comprehend. It includes the following files and folders:

  • src - Code for the application's Lambda functions.
  • template.yaml - A template that defines the application's AWS resources and references the stack deployed from the main project.

To get started, deploy the CloudFormation template:

  aws cloudformation deploy --template-file infra.yaml --stack-name your-stack-name --capabilities CAPABILITY_NAMED_IAM

Deploy the sample application

To use the AWS SAM CLI, you need the following tools:

To build and deploy the application for the first time, refer to the main project:

If you just want to deploy the APIs and apply new changes to this project, run the following in your shell:

Initial Architecture

[Figure: initial architecture diagram]

Documentation

You can read the accompanying article on the TBBC website.

sam build --use-container
cd .aws-sam/build/
# first deployment
sam deploy --guided
# subsequent deployments
sam deploy --config-file ../../samconfig.toml

The sam build command builds the source of your application. The sam deploy command packages and deploys your application to AWS. The API Gateway endpoint URL is displayed in the outputs when the deployment is complete.

Project Structure

  • file-manager: this Lambda function exposes an Express API with the following routes:
  • extract-data: S3 trigger that takes the uploaded file and invokes the Textract API.
  • textract-checker: because uploaded files can contain multiple pages, text extraction runs asynchronously; this SQS-triggered function processes the extraction results and persists the data in DynamoDB.
  • classify-document: S3 trigger that takes the file and classifies it with Amazon Comprehend (batch process). The file name must carry a prefix (classify_<file_name>.extension).
  • training-files: Lambda function triggered by EventBridge that generates the CSV files from the DynamoDB records that are pending synchronization.
  • classify-extractor: S3 trigger that takes the classification output from Amazon Comprehend, extracts the information, and persists the classification results (file_name, score, label) in a new DynamoDB table.
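
The prefix check used by classify-document and the result extraction done by classify-extractor could look roughly like the sketch below. It is not taken from the project's source: the helper names isClassifyKey and parsePredictions are hypothetical, and the input shape (one JSON object per line with File and Classes fields, each class having Name and Score) is an assumption about the Comprehend batch classification output. Downloading the output from S3 and writing to DynamoDB are left out.

```javascript
// Hypothetical helpers for illustration only; names and shapes are assumptions.

const CLASSIFY_PREFIX = "classify_";

// Returns true when the object's base name follows classify_<file_name>.extension.
function isClassifyKey(key) {
  const baseName = key.split("/").pop();
  return (
    baseName.startsWith(CLASSIFY_PREFIX) &&
    baseName.length > CLASSIFY_PREFIX.length
  );
}

// Converts JSON Lines prediction output (assumed format) into
// { file_name, label, score } records, keeping only the
// highest-scoring class for each document.
function parsePredictions(jsonLines) {
  return jsonLines
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => JSON.parse(line))
    .filter((entry) => Array.isArray(entry.Classes) && entry.Classes.length > 0)
    .map((entry) => {
      // Pick the class with the highest score.
      const best = entry.Classes.reduce((a, b) => (b.Score > a.Score ? b : a));
      return { file_name: entry.File, label: best.Name, score: best.Score };
    });
}
```

Under these assumptions, a handler would call isClassifyKey on the S3 object key before starting a classification job, and call parsePredictions on the extracted output text before persisting each record.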

About

any doc classifier endpoints
