Skip to content
This repository has been archived by the owner on Nov 2, 2024. It is now read-only.

Latest commit

 

History

History
executable file
·
344 lines (255 loc) · 13.1 KB

112_aws_database.md

File metadata and controls

executable file
·
344 lines (255 loc) · 13.1 KB

AWS Databases

This is not an exhaustive documentation of all the existing AWS Services. These are summarized notes for the AWS Certifications.
To see the complete documentation, please go to: AWS documentation

Choosing the Right Database

The following are some of the guideline questions to ask when choosing the database to use:

  • Is the workload read-heavy, write-heavy or balanced?
  • What are the throughput needs?
  • Will the throughput change, fluctuate or will we have to scale the DB over time?
  • How much data do we store and for how long?
  • Will the size of the data grow?
  • What is the average size of an object in the DB? How frequently are these objects accessed?
  • Should the data be stored for a week or forever?
  • Is the database going to be a source of truth?
  • What are the latency concerns?
  • What is the data model?
  • How will the data be queried?
  • Will the data be structured, semi-structured or unstructured?
  • Do we need strong schema or flexible schema?
  • Do we need reporting? Do we need advanced search?
  • What are the license costs?
  • Can we switch to a cloud native database such as Aurora, DynamoDB, etc?

Back to the top

Database Types on AWS

Database Types AWS Service Description
RDBMS (SQL/OLTP) RDS
Aurora
Great for joins and normalized data
NoSQL DynamoDB
ElastiCache (key/value pairs)
Neptune (graphs)
DocumentDB (json)
No joins, no SQL
Object Store S3 (for big objects)
Glacier (for backups, archive)
For object storage and archival
Data Warehouse Redshift (OLAP)
Athena
SQL Analytics / BI Use cases
Search ElasticSearch (JSON) Free text, unstructured searches
Graphs Neptune Displays relationships between data

Amazon Relational Databases (RDS)

Amazon RDS is a managed database service for relational databases

  • Provides cost-efficient, resizable capacity
  • AWS manages common database administration tasks.
  • AWS provisions an EC2 instance behind the scenes and EBS Volume

DB Engines supported

A DB engine is the specific relational database software that runs on your DB instance. Amazon RDS currently supports the following engines:

  • MySQL
  • PostreSQL
  • Oracle
  • Microsoft SQL Server
  • Aurora
  • MariaDB

Advantages and Disadvantages

Advantages

  • Security through IAM, security groups, KMS, SSL
  • Backup/Snapshot/Point in time restore functionality
  • Support for multi AZ deployment
  • Read replicas
  • Monitoring happens through CloudWatch
  • Managed and scheduled maintenance
  • Pay per hour based on provisioned EC2 and EBS
  • Scaling capability (vertical and horizontal)
  • Storage backed by EBS (GP2 or IO)

Disadvantages

  • No SSH into the instance which hosts the database

Use cases

  • Store relational datasets
  • Perform SQL queries
  • Transactional inserts, deletes, updates

RDS Backups

  • Backups are automatically enabled in RDS
  • AWS RDS provides automated backups:
    • Daily fill backup of the database (during the maintenance window)
    • Transaction logs are backed-up by RDS every 5 minutes which provides the ability to do point in time restores
    • There is a 7 day retention for the backups which can be increased to 35 days
  • DB Snapshots:
    • There are manually triggered backups by the users
    • Retention can be as long as the user wants
    • Helpful for retaining the state of the database for longer period of time

RDS Read Replicas

  • Read replicas helps to scale the read operations
  • We can create up to 5 read replicas
  • These replicas can be within AZ, cross AZ or in different regions
  • The data between the main database and the read replicas is replicated asynchronously => reads are eventually consistent
  • Read replicas can be promoted into their own database
  • Use case for read replicas:
    • Production database is up and running taking on normal load
    • There is a new feature for running some reporting for analytics which may cause slow downs and may overload the database
    • To fix this we can create read replicas for reporting
  • Read replicas are used for SELECT operations (not INSERT, UPDATE, DELETE)
  • Network cost for read replicas:
    • In AWS there is network cost if data goes from one AZ to another
    • In case of cross AZ replication, additional costs may incur because of network traffic
    • To reduce costs, we could have the read replicas in the same AZ

RDS Multi AZ (Disaster Recovery)

  • RDS Multi AZ replication is done using synchronous replication
  • In case of multi AZ configuration we get one DNS name
  • In case of the main database goes down, the traffic is automatically re-routed to the failover database
  • Multi AZ is not used for scaling
  • The read replicas can be set up as Multi AZ for Disaster Recovery

Back to the top

RDS Security

Encryption at rest

  • Possibility to encrypt the master and read replicas with AWS KMS - AES-256 encryption
  • Encryption has to be defined at the launch time
  • If the master is not encrypted, the read replicas cannot be encrypted
  • Transparent Data Encryption (TDE) is available for Oracle and SQL Server

Encryption in flight

  • Uses SSL certificates to encrypt data from client to RDS in flight
  • It is required SSL a trust certificate when connecting to database
  • To enforce SSL:
    • PostgeSQL:

      rds.force_ssl=1 in the AWS RDS Console (Parameter Groups)
    • MySQL:

      GRANT USAGE ON *.* To 'user'@'%' REQUIRE SSL; 

Encrypting RDS backups

  • Snapshots of un-encrypted RDS databases are un-encrypted
  • Snapshots of encrypted RDS databases are encrypted
  • We can copy an un-encrypted snapshot into an encrypted one

Encrypt an un-encrypted RDS database

  • Create a snapshot
  • Copy the snapshot and enable encryption for the snapshot
  • Restore the database from the encrypted snapshot
  • Migrate application from the old database to the new one and delete the old database

Network Security and IAM

Network security

  • RDS databases are usually deployed within a private subnet
  • RDS security works by leveraging security groups (similar to EC2), they control who can communicate with the database instance

Access management

  • There are IAM policies which help control who can manage an AWS RDS database (through the RDS API)
  • Traditional username/password can be used to login into the database
  • IAM-based authentication can be used to login into MySQL and PostgreSQL

IAM authentication

  • IAM database authentication works with MySQL and PostgreSQL
  • We don't need a password to authenticate, just an authentication token obtained through IAM and RDS API calls
  • The token has a lifetime of 15 minutes
  • Benefits:
    • Network in/out must be encrypted using SSL
    • IAM is used to centrally manage users instead of DB credentials
    • We can manage IAM roles and EC2 instance profiles for easy integration

Back to the top

Aurora

Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL.

  • Data is held in 6 replicas across 3 AZs
  • Up to 15 read replicas (only 5 for RDS)
  • It has some auto healing capabilities
  • Auto scaling read replicas which can be global
  • Can be global, for disaster recovery purposes or latency purposes
  • Auto scaling of storage form 10GB to 64TB
  • We define an EC2 instance for Aurora instances
  • Same security, monitoring and maintenance features as RDS
  • Aurora has a serverless option

Use cases

  • Similar to RDS, but with less maintenance, more flexibly, more performant at a higher cost

Back to the top

ElastiCache

Managed Redis or Memcached which provides a high performance, resizable, and cost-effective in-memory cache, while removing complexity associated with deploying and managing a distributed cache environment.

  • In-memory data store, sub-millisecond latency
  • Must provision and EC2 instance type
  • Offers support for clustering (Redis)
  • Multi AZ
  • Read replicas (sharding)
  • Security through IAM, SG, KMS, Redis Auth
  • Backups, snapshots and point in time restore (for Redis)
  • It has managed and scheduled maintenance
  • Monitoring happens through CloudWatch

Uses cases

  • Key/value store
  • Frequent reads, less writes
  • Cache results for DB queries
  • Store session data for websites

Back to the top

DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance.

  • Managed NoSQL database, proprietary to AWS
  • Serverless, provisioned capacity
  • Supports auto scaling, provides on demand capacity as well
  • Can replace ElastiCache as a key/value store
  • Highly available, Multi AZ by default\
  • Reads and writes are decoupled, DAX for read cache
  • Reads can be eventually consistent or strongly consistent
  • Security, authentication and authorization is done through IAM
  • Enable DynamoDB Streams to stream all the changes in DynamoDB and integrate with AWS Lambda
  • Provides backup/restore features, Global Table feature
  • Monitoring is done through CloudWatch
  • The tables can only be queried on primary key, sort key or indexes

Use cases

  • Serverless application development
  • Distributed serverless cache

Back to the top

Redshift

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

  • Redshift is based on PostgreSQL, but it's not used for OLTP
  • It's OLAP - only analytical processing (analytics and data warehousing)
  • 10x better performance than other data warehouses, scales to PBs of data
  • Columnar database - data is stored in columns
  • Massively parallel query execution (MPP), highly available
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries
  • Great to use with BI tools such as AWS Quicksight or Tableau
  • Data is loaded form S3, DynamoDB, Database Migration Service (DMS), other DBs
  • Can scale from 1 node up to 128 nodes, up to 160 GB of space per node
  • Node types: compute nodes for performing queries, result is sent to leader nodes
  • Redshift Spectrum: perform queries directly against S3 (no need to load the data)
  • Provides backup and restore
  • Security is accomplished with VPC/IAM/KMS
  • Redshift Enhanced VPC Routing: COPY/UNLOAD goes through VPC

Redshift Snapshots and Disaster Recovery

  • Snapshots are point-in-time backups of a cluster, stored internally in S3
  • Snapshots are incremental (only what has changed is saved)
  • Snapshots can be restored into new Redshift clusters
  • Snapshots can be automated: every 8 hours, every 5 GB, or on schedule
  • Manual snapshots: snapshots are retained until we delete them
  • We can configure Redshift to automatically copy snapshots to another regions

Redshift Spectrum

  • Data can be queried from S3 without the need to load it into Redshift
  • We must have a Redshift cluster available to start the query
  • The query is then submitted to thousands of Redshift Spectrum nodes

Back to the top

Neptune

Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.

  • Highly available across 3 AZ, up to 15 read replicas
  • Provides point-in-time recovery, continuous back-up to S3
  • Support for KMS encryption at rest + HTTPS

Use cases

  • Highly relational data
  • Social networking
  • Knowledge graphs

Back to the top

OpenSearch

OpenSearch is a fully open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis.

  • Can search any field, as well as partial matches
  • Complement to another database
  • We can provision a cluster of instances
  • Built-in integration with:
    • Amazon Kinesis Data Firehose
    • AWS IoT
    • Amazon CloudWatch Logs for data ingestion
  • Security through Cognito and IAM, KMS encryption, SSL and VPC
  • Comes with Kibana and Logstahs - ELK stack

Back to the top