cloudandthings/terraform-aws-s3-inventory

Terraform AWS S3 Inventory Module

A comprehensive Terraform module for managing AWS S3 inventory configurations, including automated inventory reports, Glue catalog integration, and Athena querying capabilities.

Features

  • S3 Inventory Destination Bucket: Creates a dedicated S3 bucket for storing all inventory reports
  • S3 Inventory Management: Creates and configures S3 inventory reports for multiple source buckets
  • Glue Catalog Integration: Sets up Glue database and tables for querying inventory data
  • Unified View: Optional creation of a union view across all inventory tables for cross-bucket analysis
  • Security & Compliance: Configurable encryption, Object Lock, and IAM and Lake Formation permissions
  • Lifecycle Management: Automated lifecycle rules for inventory report retention

Many features are optional and can be enabled/disabled as required.


Quick Start

module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 1.0"

  # Required variables
  inventory_bucket_name   = "my-company-s3-inventory"
  inventory_database_name = "s3_inventory_db"

  # Source buckets to inventory
  source_bucket_names = [
    "my-app-data-bucket",
    "my-logs-bucket",
    "my-backup-bucket"
  ]

  # Optional: Create a union view for cross-bucket queries
  union_view_name = "all_inventories_view"

  # Optional: Add LakeFormation permissions
  # database_admin_principals = [...]
  # database_read_principals = [...]

}

Usage

See the examples dropdown on the module's Terraform Registry page, or browse the GitHub repo.


Querying Your Inventory Data

Once deployed, you can query your S3 inventory data using Amazon Athena.

Query a single bucket's inventory:

SELECT bucket, key, size, last_modified_date, storage_class
FROM s3_inventory_db.my_app_data_bucket
WHERE dt = '2024-08-29-00-00'
ORDER BY size DESC
LIMIT 100;

Query across all buckets (using the union view):

SELECT bucket,
       COUNT(*) AS object_count,
       SUM(size) AS total_size,
       AVG(size) AS avg_size
FROM s3_inventory_db.all_inventories_view
WHERE dt >= '2024-08-01-00-00'
GROUP BY bucket
ORDER BY total_size DESC;

Important Considerations

Athena Partition Date Projection

As of 2025, Amazon Athena does not properly support dynamic range projection with the S3 inventory partitioning scheme. With a dynamic range such as "NOW-3MONTHS,NOW", queries against the Glue tables created by this module return zero rows.

To work around this limitation, the module defaults the start of the range to the beginning of the previous year. The year is calculated when the Terraform plan runs. For example, if the plan runs on 2025-08-25, the date range defaults to "2024-01-01-00-00,NOW".

Important: This approach causes Terraform state drift annually when the year changes.
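To illustrate why the generated value changes each year, here is a minimal sketch of how such a default could be derived. It is not the module's actual implementation: the local names are hypothetical, and it assumes Terraform 1.5 or later for plantimestamp().

```hcl
locals {
  # Assumption: illustrative only, not the module's real code.
  # plantimestamp() returns the time of the plan as an RFC 3339 string.
  previous_year = tonumber(formatdate("YYYY", plantimestamp())) - 1

  # Start of the previous year through to the current time,
  # e.g. "2024-01-01-00-00,NOW" for a plan run during 2025.
  default_projection_dt_range = "${local.previous_year}-01-01-00-00,NOW"
}

output "default_projection_dt_range" {
  value = local.default_projection_dt_range
}
```

Because the start year is recomputed on every plan, the first plan after 1 January produces a different string and therefore a diff on the Glue tables.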

Workaround

To avoid state drift, provide a fixed start date for partition projection, such as:

athena_projection_dt_range = "2025-08-01-00-00,NOW"

Choose your start date based on either:

  • Your specific requirements
  • The date when your S3 inventories were first deployed
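As a sketch, the fixed range can be set alongside the Quick Start settings, and the module's athena_projection_dt_range output can be surfaced to confirm what was applied. The bucket and database names are the placeholder values from the Quick Start, not real resources.

```hcl
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 1.0"

  inventory_bucket_name   = "my-company-s3-inventory"
  inventory_database_name = "s3_inventory_db"
  source_bucket_names     = ["my-app-data-bucket"]

  # Fixed start date, so the value does not change when the year rolls over.
  athena_projection_dt_range = "2025-08-01-00-00,NOW"
}

# Expose the value the module used for projection.dt.range on the Glue tables.
output "athena_projection_dt_range" {
  value = module.s3_inventory.athena_projection_dt_range
}
```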

Costs

  • S3 inventory reports are charged per million objects listed
  • Additional S3 storage costs for inventory files
  • Athena charges apply when querying the data
  • Consider lifecycle rules to manage long-term storage costs
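The last point could look roughly like the sketch below, which expires inventory files after 90 days. The shape of each rule is an assumption: it follows the lifecycle_rule format of terraform-aws-modules/s3-bucket, which this module uses for the inventory bucket, and it assumes the rules are passed through unchanged. The rule id and the 90-day period are hypothetical, so check the module source before relying on this.

```hcl
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 1.0"

  inventory_bucket_name   = "my-company-s3-inventory"
  inventory_database_name = "s3_inventory_db"
  source_bucket_names     = ["my-app-data-bucket"]

  # Assumption: rule shape mirrors terraform-aws-modules/s3-bucket's lifecycle_rule input.
  inventory_bucket_lifecycle_rules = [
    {
      id      = "expire-old-inventory-reports" # hypothetical rule name
      enabled = true

      # Delete inventory files 90 days after they are written.
      expiration = {
        days = 90
      }
    }
  ]
}
```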

Contributing

Direct contributions are welcome.

See CONTRIBUTING.md for further information.


License

This project is currently unlicensed. Please contact the maintaining team to add a license.


This module was created from terraform-aws-template


Documentation


Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| apply_default_inventory_lifecyle_rules | Whether to attach default lifecycle rules to the S3 inventory bucket | bool | true | no |
| athena_projection_dt_range | Date range for Athena partition projection (format: START_DATE,END_DATE). If null then a value will be generated, see README for more information. | string | null | no |
| attach_default_inventory_bucket_policy | Whether to attach a default bucket policy to the S3 inventory bucket | bool | true | no |
| create_inventory_bucket | Whether to create the S3 inventory bucket | bool | true | no |
| create_inventory_database | Whether to create the Glue database for S3 inventory | bool | true | no |
| database_admin_principals | List of principal ARNs that will be allowed to manage (create, update, delete) the Glue database and its tables | list(string) | [] | no |
| database_read_principals | List of principal ARNs that will be allowed to read from the Glue database (query tables, describe metadata) | list(string) | [] | no |
| enable_bucket_inventory_configs | Whether to create S3 inventory configurations for the specified buckets | bool | true | no |
| inventory_bucket_encryption_config | Map containing server-side encryption configuration for the S3 inventory bucket. | any | {} | no |
| inventory_bucket_lifecycle_rules | List of lifecycle rules to apply to the S3 inventory bucket | any | [] | no |
| inventory_bucket_name | Name of the S3 inventory bucket | string | n/a | yes |
| inventory_bucket_object_lock_mode | Object Lock mode for the S3 inventory bucket (GOVERNANCE or COMPLIANCE) | string | "GOVERNANCE" | no |
| inventory_bucket_object_lock_retention_days | Number of days to retain objects with Object Lock (null to disable Object Lock) | number | null | no |
| inventory_config_encryption | Map containing encryption settings for the S3 inventory configuration. | any | {} | no |
| inventory_config_frequency | Frequency of the S3 inventory report generation | string | "Daily" | no |
| inventory_config_name | Name identifier for the S3 inventory configuration | string | "daily" | no |
| inventory_config_object_versions | Which object versions to include in the inventory report | string | "All" | no |
| inventory_database_name | Name of the S3 inventory Glue database | string | n/a | yes |
| inventory_optional_fields | List of optional fields to include in the S3 inventory report | list(string) | ["Size", "LastModifiedDate", "IsMultipartUploaded", "ReplicationStatus", "EncryptionStatus", "BucketKeyStatus", "StorageClass", "IntelligentTieringAccessTier", "ETag", "ChecksumAlgorithm", "ObjectLockRetainUntilDate", "ObjectLockMode", "ObjectLockLegalHoldStatus", "ObjectAccessControlList", "ObjectOwner"] | no |
| source_bucket_names | List of S3 bucket names to create inventory reports for | list(string) | [] | no |
| union_view_name | Name for the Athena view over S3 inventory data from all the source buckets | string | null | no |
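As a sketch of how several of the optional inputs above combine, the following configuration switches to weekly reports of current object versions, enables Object Lock on the inventory bucket, and grants read access to one principal. The account ID and role name are hypothetical placeholders, and the 30-day retention is an arbitrary example value.

```hcl
module "s3_inventory" {
  source  = "cloudandthings/terraform-aws-s3-inventory/aws"
  version = "~> 1.0"

  inventory_bucket_name   = "my-company-s3-inventory"
  inventory_database_name = "s3_inventory_db"
  source_bucket_names     = ["my-app-data-bucket", "my-logs-bucket"]

  # Weekly reports covering only the current version of each object.
  inventory_config_name            = "weekly"
  inventory_config_frequency       = "Weekly"
  inventory_config_object_versions = "Current"

  # Object Lock stays disabled while retention days is null; setting a number enables it.
  inventory_bucket_object_lock_mode           = "GOVERNANCE"
  inventory_bucket_object_lock_retention_days = 30

  # Hypothetical principal allowed to query the inventory tables.
  database_read_principals = [
    "arn:aws:iam::123456789012:role/inventory-analyst"
  ]
}
```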

Modules

| Name | Source | Version |
|------|--------|---------|
| inventory_bucket | terraform-aws-modules/s3-bucket/aws | 4.6.0 |

Outputs

| Name | Description |
|------|-------------|
| athena_projection_dt_range | The value used for projection.dt.range on the Glue table |

Providers

| Name | Version |
|------|---------|
| aws | >= 5, < 7 |

Requirements

| Name | Version |
|------|---------|
| terraform | ~> 1.0 |
| aws | >= 5, < 7 |

Resources

| Name | Type |
|------|------|
| aws_glue_catalog_database.s3_inventory | resource |
| aws_glue_catalog_table.s3_inventory | resource |
| aws_glue_catalog_table.view | resource |
| aws_lakeformation_permissions.inventory_database_admin | resource |
| aws_lakeformation_permissions.inventory_database_read | resource |
| aws_lakeformation_permissions.inventory_tables_admin | resource |
| aws_lakeformation_permissions.inventory_tables_read | resource |
| aws_s3_bucket_inventory.this | resource |
| aws_caller_identity.current | data source |
| aws_iam_policy_document.inventory_bucket_policy | data source |
| aws_region.current | data source |
