Getting started with Amazon Bedrock, RAG, and Vector database in Python
Learn how to build a comprehensive search engine that understands text, images, and video using Amazon Titan Embeddings, Amazon Bedrock, Amazon Nova models, and LangChain.
Through Jupyter notebooks, the repository guides you through the process of video understanding, ingesting text from PDFs, generating text and image embeddings, and segmenting the text into meaningful chunks using LangChain. These embeddings are then stored in a FAISS vector database and an Amazon Aurora PostgreSQL database, enabling efficient search and retrieval operations.
Using Amazon Aurora PostgreSQL, you'll store and manage all vector embeddings in one place, making your content searchable through natural language queries.
This project guides you through creating:
- A text and document processing pipeline with image understanding and search, using the AWS Cloud Development Kit (CDK) to create four AWS Lambda functions.
- A video content analysis solution with a unified vector database for semantic search, using the AWS Cloud Development Kit (CDK) to deploy a scalable, modular architecture that processes audio/video content on Amazon Elastic Container Service (ECS).
By completing this project, you'll know how to:
- Process and analyze text documents using Amazon Titan Embeddings
- Generate embeddings for images and enable visual search
- Extract insights from videos using Amazon Nova models
- Create semantic chunks from content using LangChain
- Build vector databases with FAISS and Aurora PostgreSQL
- Deploy serverless functions for content processing
- Implement multimodal search capabilities
Get ready to unlock the power of multimodal search and open up new possibilities in your apps!
Requirements:
- Install boto3, the AWS SDK for Python, used to call AWS services: `pip install boto3`
- Configure AWS credentials. Boto3 needs credentials to make API calls to AWS.
- Install LangChain, a framework for developing applications powered by large language models (LLMs): `pip install langchain`
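With boto3 installed and credentials configured, calls to Bedrock go through the `bedrock-runtime` client. A minimal sketch of the request shape for Titan text embeddings; the region and sample text are placeholders, and the live `invoke_model` call is shown in comments because it requires AWS credentials:

```python
import json

def titan_embedding_request(text: str) -> str:
    """Build the JSON body for the Amazon Titan Embeddings G1 - Text model."""
    return json.dumps({"inputText": text})

# Invoking the model (requires boto3 and configured AWS credentials):
# import boto3
# bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = bedrock.invoke_model(
#     modelId="amazon.titan-embed-text-v1",
#     body=titan_embedding_request("Hello, Bedrock!"),
# )
# embedding = json.loads(resp["body"].read())["embedding"]

body = titan_embedding_request("Hello, Bedrock!")
```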
💰 Cost to complete:
Jupyter notebook for loading documents from PDFs, extracting and splitting the text into semantically meaningful chunks using LangChain, generating embeddings for those chunks with the Amazon Titan Embeddings G1 - Text model, and storing the embeddings in a FAISS vector database for retrieval.
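The chunking step works on a size-plus-overlap principle. A simplified stand-in for LangChain's splitter (the real `RecursiveCharacterTextSplitter` also respects separators such as paragraphs and sentences; this sketch only shows the chunk-size/overlap mechanics):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows of at most chunk_size."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# The notebook uses LangChain's splitter instead, roughly:
# from langchain_text_splitters import RecursiveCharacterTextSplitter
# splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
# chunks = splitter.split_text(pdf_text)

chunks = chunk_text("a" * 1200, chunk_size=500, overlap=50)
# Consecutive chunks share `overlap` characters so sentence fragments at
# chunk boundaries still appear intact in at least one chunk.
```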
This notebook demonstrates how to combine Titan Multimodal Embeddings, LangChain and FAISS to build a capable image search application. Titan's embeddings allow representing images and text in a common dense vector space, enabling natural language querying of images. FAISS provides a fast, scalable way to index and search those vectors. And LangChain offers abstractions to hook everything together and surface relevant image results based on a user's query.
By following the steps outlined, you'll be able to preprocess images, generate embeddings, load them into FAISS, and write a simple application that takes in a natural language query, searches the FAISS index, and returns the most semantically relevant images. It's a great example of the power of combining modern AI technologies to build applications.
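Under the hood, the search step is nearest-neighbor ranking in the shared embedding space. FAISS does this at scale with optimized indexes; the core idea, sketched in plain Python with toy 3-d vectors standing in for Titan's dense embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search(query_vec, index, k=2):
    """Rank stored image embeddings by similarity to the query embedding."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy index: in the notebook these vectors come from Titan Multimodal
# Embeddings, and the query vector from embedding the user's text query.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
results = search([0.85, 0.2, 0.0], index, k=1)
```

Because text and images live in the same vector space, the text query's embedding lands nearest the most relevant image; the notebook delegates this ranking to a FAISS index via LangChain rather than a linear scan.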
In this Jupyter Notebook, you'll explore how to store vector embeddings in a vector database using Amazon Aurora and the pgvector extension. This approach is particularly useful for applications that require efficient similarity searches on high-dimensional data, such as natural language processing, image recognition, and recommendation systems.
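The pgvector setup boils down to enabling the extension, declaring a vector column, and ordering by a distance operator. A sketch of the SQL involved; the table and column names here (`items`, `embedding`) are illustrative, not necessarily the notebook's, and 1024 is one of Titan Multimodal Embeddings' output dimensions:

```python
# DDL: enable pgvector and create a table with a vector column.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1024)
);
"""

# Similarity search using pgvector's cosine-distance operator <=>.
QUERY = """
SELECT content
FROM items
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""
# With a psycopg cursor, roughly:
# cur.execute(QUERY, (str(query_embedding),))
```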
This Jupyter notebook contains the code to process a video with Amazon Nova models for video understanding. If the video is smaller than 25 MB, it is converted to base64; if it is larger, it is uploaded to an Amazon S3 bucket, whose name must be set in the `you_bucket` variable.
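The size check the notebook describes can be sketched as follows. The payload shape mirrors the Amazon Nova `invoke_model` video schema (inline base64 `bytes` vs. an `s3Location` URI), but treat the exact field names here as an assumption; the S3 upload itself is commented out since it needs boto3 and credentials:

```python
import base64

MAX_INLINE_BYTES = 25 * 1024 * 1024  # ~25 MB inline limit described above

def prepare_video(video_bytes: bytes, bucket: str, key: str) -> dict:
    """Return a Nova-style video source: inline base64 if small, S3 URI if large."""
    if len(video_bytes) < MAX_INLINE_BYTES:
        return {"format": "mp4", "source": {"bytes": base64.b64encode(video_bytes).decode()}}
    # Large videos are uploaded first (sketch; requires boto3 and credentials):
    # boto3.client("s3").upload_fileobj(io.BytesIO(video_bytes), bucket, key)
    return {"format": "mp4", "source": {"s3Location": {"uri": f"s3://{bucket}/{key}"}}}
```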
This notebook demonstrates how to process video and audio content using Amazon Bedrock with the Amazon Titan Multimodal Embeddings G1 model to generate embeddings and store them in an existing Amazon Aurora PostgreSQL database with pgvector for similarity search.
Create Amazon Aurora PostgreSQL with this Amazon CDK Stack
| App | Description | Diagram |
|---|---|---|
| Building an Amazon Aurora PostgreSQL vector database | Learn how to set up an Amazon Aurora PostgreSQL vector database to store multimodal vector embeddings, enabling semantic search, using the AWS Cloud Development Kit (CDK) for Python. | ![]() |
| Serverless AWS Lambda Vector Database System for Multi-Modal Document/Image Processing | This serverless solution creates, manages, and queries vector databases for PDF documents and images with Amazon Bedrock embeddings. You can use FAISS vector stores or Aurora PostgreSQL with pgvector for efficient similarity searches across multiple data types. | ![]() |
| Ask Your Video: Audio/Video Processing Pipeline with Vector Search | Build a serverless solution that processes video content and makes it searchable using natural language. This solution extracts meaningful information from both audio and video, allowing you to find specific moments using simple queries. | ![]() |
- Getting started with Amazon Bedrock, RAG, and Vector database in Python
- Building with Amazon Bedrock and LangChain
- How To Choose Your LLM
- Working With Your Live Data Using LangChain
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.