All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- s3 other bucket public access restrictions.
- Added `liveness_probe` and `readiness_probe` for HMS readwrite and HMS readonly.
- Add `restrict_public_buckets = true` to s3 bucket public access settings.
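For context, this setting maps onto Terraform's `aws_s3_bucket_public_access_block` resource. A minimal sketch, assuming a hypothetical data bucket named `apiary_data`:

```hcl
# Sketch: enforce full public-access blocking on an Apiary-style bucket.
# The bucket reference "aws_s3_bucket.apiary_data" is hypothetical.
resource "aws_s3_bucket_public_access_block" "apiary_data" {
  bucket = aws_s3_bucket.apiary_data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```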
- Add variable to configure read-write metastore service ingress.
- Attach service account to s3_inventory job when using IRSA.
- Rename s3_inventory cronjob to match service account name, required on new internal clusters.
- Fixed problem with s3_inventory_repair cronjob when apiary instance_name is not empty.
- Changed bucket policy for `deny_iamroles` to only deny "dangerous" actions, including `GetObject`.
- Variable to enable RDS encryption.
- Add support for configuring k8s pods IAM using IRSA.
- Add support to split customer policy condition.
- Added `disallow_incompatible_col_type_changes` variable to disable Hive validation on schema changes. This variable helps Apache Iceberg perform schema evolution.
- Add support for cross account access to system schema.
- Added apiary_consumer_iamroles variable to grant cross account access to IAM roles.
- Added apiary_customer_condition variable to restrict access using S3 object tags.
- Add support for cross account access to s3 inventory.
- Add support for Apiary-specific RDS parameter groups.
- Add variable to specify the RDS/MySQL parameter value for `max_allowed_packet` (default 128MB).
- If the S3 bucket specifies an expiration TTL in days that is <= the Intelligent-Tiering transition days, don't create a lifecycle transition policy. This will prevent errors like: `Error: Error putting S3 lifecycle: InvalidArgument: 'Days' in the Expiration action for filter '(prefix=)' must be greater than 'Days' in the Transition action`
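The constraint above can be illustrated with a lifecycle rule sketch, using the inline `lifecycle_rule` syntax of that provider era (bucket name hypothetical):

```hcl
# Sketch only: S3 rejects a rule like this, because expiration (30 days)
# is not strictly greater than the Intelligent-Tiering transition (30 days).
resource "aws_s3_bucket" "example" {
  bucket = "example-apiary-schema-bucket" # hypothetical name

  lifecycle_rule {
    enabled = true

    transition {
      days          = 30
      storage_class = "INTELLIGENT_TIERING"
    }

    expiration {
      days = 30 # must be > transition days, or S3 returns InvalidArgument
    }
  }
}
```

Skipping the `transition` block whenever the expiration TTL is less than or equal to the transition days avoids the API error.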
- Added `DenyUnsecureCommunication` policy for `s3-other.tf` buckets.
- Add variables to configure s3-sqs defaults for spark streaming.
- Disable k8s loadbalancer and route53 entries along with vpc endpoint services.
- S3 HTTPS bucket policy requirements are now properly enforced.
- Only publish S3 Create events to managed logs SQS queue.
- Variable to disable creating s3 logs hive database.
- Terraform 0.12+ formatting.
- Add required version (1.x) for the kubernetes provider, to fix issues with the 2.x provider.
- Fix colliding Grafana dashboard names for multiple Apiary instances.
- Fix managed bucket policy with empty_customer_accounts.
- Support to override customer accounts per managed schema.
- Add managed_database_host output.
- Configure bucket ownership controls on Apiary managed buckets; cross-account object writes will be owned by the bucket owner instead of the writer.
- Add metastore load balancer outputs.
- Enable SQS events on managed logs bucket.
- Issue 165 Configure metastore IAM roles using apiary bucket prefix.
- Fix init container deployment.
- Issue 165 Use init containers instead of `mysql` commands to initialize MySQL users, removing the `mysql` dependency for this Terraform module.
- Issue 169 Added `s3:GetBucketAcl` to cross-account shared buckets.
- Variable to disable metastore VPC endpoint services.
- Add `abort_incomplete_multipart_upload_days` to all S3 buckets.
- Issue 167 Fix gluesync in ECS deployments.
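The `abort_incomplete_multipart_upload_days` setting mentioned above is a lifecycle-rule argument; a hedged sketch using the inline `lifecycle_rule` syntax (bucket name and the 7-day value are illustrative, not the module's actual values):

```hcl
resource "aws_s3_bucket" "example" {
  bucket = "example-apiary-schema-bucket" # hypothetical name

  lifecycle_rule {
    id      = "abort-incomplete-uploads"
    enabled = true

    # Clean up multipart uploads that were started but never completed.
    abort_incomplete_multipart_upload_days = 7
  }
}
```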
- Issue 162 Add explicit dependency for S3 public access block to resolve race condition.
- Create `apiary_system` database and buckets. This is pre-work for Ranger access logs Hive tables and other system data. Requires `apiary-metastore-docker` version `1.15.0` or above.
- Added support for SSE-KMS encryption in Apiary managed S3 buckets.
- Optional `customer_principal` and `producer_iamroles` in Apiary managed bucket policies.
- Variable to deny IAM roles access to Apiary managed S3 buckets.
- Set min/max size of HMS thread pool based on memory. Max will be set to 1 connection for every 2MB RAM. Min will be 0.25% of max. This will prevent large HMS instances from not having enough threads/connections available.
- Change type of `apiary_managed_schemas` from `list(any)` to `list(map(string))` to support dynamically-generated schema lists. This type is backward-compatible with previous schema lists. Schema lists were already lists of maps of strings, but this change makes TF 0.12 work in certain circumstances that were causing a fatal TF error.
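The type change amounts to tightening the variable declaration; a minimal sketch (the default shown is illustrative, not the module's actual default):

```hcl
variable "apiary_managed_schemas" {
  # Was list(any); list(map(string)) accepts the same lists of string maps,
  # but also works with dynamically generated values under TF 0.12.
  type = list(map(string))

  default = [
    {
      schema_name = "example_db" # hypothetical schema entry
    },
  ]
}
```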
- Fix multiple instance deployment on k8s.
- If Apiary's default S3 access log management is enabled (i.e., `var.apiary_log_bucket` is not set by the user), signal the Hive metastore to create the Hive database `s3_logs_hive` on startup. This is pre-work to prepare for S3 access-log Hive tables in a future version of Apiary. Requires `apiary-metastore-docker` version `1.13.0` or above.
- Per-schema option to send S3 data notifications to an SQS queue. See `enable_data_events_sqs` in the `apiary_managed_schemas` section of VARIABLES.md.
- Changed AWS resources created on a per-schema basis to use Terraform `for_each` instead of `count`. This includes S3 and SNS resources.
  - This was done to fix the issue of removing a schema in a later deployment. If the schema removed is not at the end of the `apiary_managed_schemas` list, then when using `count`, Terraform will see different indexes in the state file for the other resources and will want to delete and recreate them. Using `for_each` references them by `schema_name` in the state file and fixes this issue.
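The `count` to `for_each` change can be sketched as follows (resource bodies trimmed; everything except the `schema_name` key is hypothetical):

```hcl
# Before: index-based addressing. Removing a middle list element shifts
# every later resource's index and forces destroy/recreate.
# resource "aws_s3_bucket" "schema" {
#   count  = length(var.apiary_managed_schemas)
#   bucket = var.apiary_managed_schemas[count.index]["schema_name"]
# }

# After: each resource is addressed by its schema_name key, so removing
# one schema leaves the other resources' state addresses untouched.
resource "aws_s3_bucket" "schema" {
  for_each = { for s in var.apiary_managed_schemas : s["schema_name"] => s }
  bucket   = each.key
}
```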
- The following variables changed type from `string` to `bool` since the `string` was acting as a boolean pre-TF 0.12: `db_apply_immediately`, `enable_hive_metastore_metrics`, `enable_gluesync`, `enable_metadata_events`, `enable_data_events`, `enable_s3_paid_metrics`
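In a module caller, the change looks roughly like this (module source path and chosen values are hypothetical):

```hcl
module "apiary" {
  source = "../apiary-data-lake" # hypothetical path

  # Pre-6.0.0 these were strings acting as booleans:
  #   enable_gluesync    = "1"
  #   enable_data_events = ""
  # From 6.0.0 they are real bools:
  enable_gluesync    = true
  enable_data_events = false
}
```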
- Removed variable `s3_block_public_access`. Blocking of public access to Apiary S3 buckets is now mandatory.
- Removed quoted variable types in `variables.tf` to follow Terraform 0.12 standards and remove warnings.
- THIS IS A BREAKING CHANGE. When deploying `6.0.0` on an existing Apiary deployment, the following procedure must be followed:
  - See the `migrate.py` script in the `scripts` folder.
    - This script is used to migrate an Apiary Terraform state file from using `count` for resource indexing to using `for_each`, which is how apiary-data-lake v6.0.0+ handles indexed resources. Without this script, doing an `apply` will want to destroy all your S3 resources and then recreate them because they are stored in the `.tfstate` file differently.
    - The migration script needs some external packages installed (see `migrate_requirements.txt`) and then should run in either Python 2.7+ or Python 3.6+.
  - This procedure assumes you have a Terraform app called `apiary-terraform-app` that is the application using this module.
  - Upgrade `apiary-terraform-app` to `apiary-data-lake` v5.3.2. This will necessitate using Terraform 0.12+ and resolving any TF 0.12 incompatibilities in your application code. TF 0.12.21+ is recommended (it will be required later).
  - Plan and apply your Terraform app to make sure it is working and up-to-date.
  - Install Python 3 if you don't yet have a Python installation.
  - Install the requirements for the migration script with `pip install -r migrate_requirements.txt`.
  - Run the script pointing to your Terraform state file. The script can read the state file from either the file system or S3. Run it first with `--dryrun`, then live. Example:
    ```
    python migrate.py --dryrun --statefile s3://<bucket_name>/<path_to_statefile>/terraform.tfstate
    python migrate.py --statefile s3://<bucket_name>/<path_to_statefile>/terraform.tfstate
    ```
    - Note that appropriate AWS credentials will be needed for S3: `AWS_PROFILE`, `AWS_DEFAULT_REGION`, etc.
  - Upgrade `apiary-terraform-app` to use `apiary-data-lake` v6.0.0. If you are not yet using TF 0.12.21+, please upgrade to 0.12.21.
  - Make only the following changes to the `.tf` file that references the `apiary-data-lake` module. Don't make any additions or other changes:
    - If your app is setting `s3_block_public_access`, remove the reference to that variable. Public access blocks are now mandatory.
    - If your app is setting any of the following variables that changed type to `bool`, change the passed value to `true` or `false`: `db_apply_immediately`, `enable_hive_metastore_metrics`, `enable_gluesync`, `enable_metadata_events`, `enable_data_events`, `enable_s3_paid_metrics`
      - If current code is setting those to `"1"` (or anything non-blank), change to `true`. If setting to `""`, change to `false`.
  - Now run a plan of your `apiary-terraform-app` that is using `apiary-data-lake` v6.0.0. It should show no changes needed.
  - Now run an apply of the code.
  - Now you can make changes to use any other v6.0.0 features or make any other changes you want, e.g. setting `enable_data_events_sqs` in schemas.
- This version of `apiary-data-lake` requires at least Terraform `0.12.21`.
- Add S3 replication permissions to producer bucket policy.
- Configuration to delete incomplete multi-part S3 uploads.
- Add additional tags to Apiary data buckets using json instead of terraform map.
- Added a tags map to the Apiary S3 data buckets to have additional tags as required.
- Property `s3_object_expiration_days` to `apiary_managed_schemas`, which sets the number of days after which objects in the Apiary S3 buckets expire.
- Documentation in `VARIABLES.md` for the `apiary_managed_schemas` variable.
- If S3 inventory is enabled, Hive tables will be created for each Apiary schema bucket. They will be updated on a scheduled basis each day.
- Note that the scheduled job is currently only implemented for Kubernetes deployments of Apiary.
- Variable to configure the S3 inventory table update schedule: `s3_inventory_update_schedule`.
- Variable to configure `apiary_assume_roles` cross-region S3 access.
- Documentation in `VARIABLES.md` for the `apiary_assume_roles` variable.
- `apiary_assume_roles[i].max_session_duration` renamed to `apiary_assume_roles[i].max_role_session_duration_seconds`.
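A sketch of the rename in a caller's variable value (the role name and the other attribute values shown are hypothetical):

```hcl
apiary_assume_roles = [
  {
    name = "analytics-reader" # hypothetical role entry
    # was: max_session_duration = 3600
    max_role_session_duration_seconds = 3600
  },
]
```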
- Variable to configure S3 inventory output format.
- Include Size, LastModifiedDate, StorageClass, ETag, IntelligentTieringAccessTier optional fields in S3 inventory.
- Managed logs S3 bucket to capture data bucket access logs; the logs bucket will be created when the `apiary_log_bucket` variable is not set.
- `apiary_log_bucket` variable is now optional.
- Added Prometheus scrape annotations to Kubernetes deployments.
- Disable CloudWatch dashboard when running on Kubernetes.
- Variable to enable Apiary Kafka metastore listener.
- Templates to configure a Grafana dashboard through the `grafana-dashboard` config map.
- Variable `atlas_cluster_name` to configure the Atlas cluster name for the Atlas hive-bridge.
- Reduce k8s Hive Metastore process heapsize from 90 to 85 percent of container memory limit.
- Variable to enable Atlas hive-bridge.
- Support for running Hive Metastore on Kubernetes.
- Upgrade to Terraform version 0.12.
- Configuration variable for `apiary_extensions_version`.
- Variable to grant cross-account AWS IAM roles write access to Apiary managed S3 buckets using an assume policy.
- Variable to enable S3 inventory configuration.
- Variable to enable S3 Block Public Access.
- `hms_readwrite` VPC endpoint whitelisted principals list now filters out empty elements.
- Tag VPC endpoint services.
- Add ansible handler to restart hive metastore services on changes to hive-site.xml and hive-env.sh.
- Add `TABLE_PARAM_FILTER` environment variable to `hive-env.sh` on EC2 to fix Beekeeper.
- Support for running Hive Metastore on EC2 nodes.
- Support for configuring read-only HMS with Ranger audit-only mode.
- Hive Metastore IAM role names changed from using `ecs-task` to `hms` as the name root; the variable `iam_name_root` can be used to keep the old names.
- Replace hardcoded `us-west-2` region with the variable `${var.aws_region}` in `cloudwatch.tf` - see #112.
- Pass `var.aws_region` to `null_resource.mysql_ro_user` as the `region` flag to the `mysql_user.sh` script.
- Option to configure S3 storage class for cost optimization.
- Change in structure of the `apiary_managed_schemas` variable from a list to a list of maps.
- Support for docker private registry.
- A new variable to specify TABLE_PARAM_FILTER regex for Hive Metastore listener.
- Support for `_` in the `apiary_managed_schemas` variable. Fixes ExpediaGroup/apiary#5. Requires a version greater than `v1.1.0` of https://github.com/ExpediaGroup/apiary-metastore-docker
- Pin module to use `terraform-aws-provider v1.60.0`.
- Tag resources that were not yet applying tags - see #98.
- Updated read-only metastore whitelist environment variable name.
- Add `db_apply_immediately` variable to fix #94.
- Fixed ECS widgets in CloudWatch dashboard - see #89.
- Fixes #92.
- Option to configure shared hive databases
- Shortened the name of NLB and Target Groups to allow more characters in the instance name - see #65.
- Use MySQL script instead of Terraform provider to solve Terraform first run issue.
- Refactor ECS task definition Environment variable names.
- Migrate secrets from Hashicorp Vault to AWS SecretsManager.
- Option to enable managed S3 buckets request and data transfer metrics.
- Renamed the following variables:
  - `ecs_domain_name` to `ecs_domain_extension`
  - `hms_readonly_instance_count` to `hms_ro_ecs_task_count`
  - `hms_readwrite_instance_count` to `hms_rw_ecs_task_count`
- Optimize ECS task S3 policy.