Add migration scripts integration tests #55
base: integ-migration-scripts-integration-tests
Refining structure for migration scripts
Changes:
- The cardinality calculation script has been added.
  - `main.py` and `cardinality.py` have been combined into a single `cardinality.py` file.
  - A README has been added for the script.
- `timestream_utils.py` has been updated.
  - The class `timestreamUtility` has been renamed to `TimestreamUtility`.
  - Static functions for validating dimension, database, and table names have been added to `TimestreamUtility`.
  - The function `comma_separated_list` has been added to `TimestreamUtility`.
  - Broken `utils` imports in `timestream_utils.py` have been fixed.
  - `timestream_client` has been renamed to `timestream_write_client`, for clarity.
  - On line 105, the undefined variable `parition_count` has been corrected to `partition_count`.
  - `timestream_utils.py` has been reformatted with `ruff`.
- `s3_utils.py` has been updated.
  - Broken `utils` import has been fixed.
  - The class `s3Utility` has been changed to `S3Utility`.
- `unload.py` has been updated.
  - References to `s3Utility` and `timestreamUtility` have been updated.
  - The wording of various CLI options has been fixed.
  - All CLI options use `-` instead of `_` between words.
  - The `datetime` object is now used properly.
  - Reformatted with `ruff`.
* Initial commit for integrating ingestion script.
  Signed-off-by: forestmvey <[email protected]>
* Fixing ingestion_test naming for ingestion script.
  Signed-off-by: forestmvey <[email protected]>
* Removing unnecessary deprecation warning removal.
  Signed-off-by: forestmvey <[email protected]>
* Fixing dead references to tests.
  Signed-off-by: forestmvey <[email protected]>
* Update to using python3 for virtual environment.
  Signed-off-by: forestmvey <[email protected]>
* Adding directories to the appropriate migration scripts in parent README.
  Signed-off-by: forestmvey <[email protected]>
* Configuring docker to delete influx image after successful execution. Minor documentation revisions.
  Signed-off-by: forestmvey <[email protected]>
* Adding test for resume functionality.
  Signed-off-by: forestmvey <[email protected]>
---------
Signed-off-by: forestmvey <[email protected]>
Adding validation script
Changes:
- `transform.py` has been added, allowing already-unloaded Timestream for LiveAnalytics data to be transformed into line protocol using Athena.
- `athena_utils.py` has been added.
- `TimestreamUtility` now has a wrapper function for `list_tables`.
- `S3Utility` now has the method `s3_bucket_exists` for checking the existence of an S3 bucket.
- The region for the `S3Utility` constructor defaults to `None`, causing the AWS configured default region to be used. An `S3Utility` object can now be created simply with `S3Utility()`.
- A single `requirements.txt` file has been added to `liveanalytics_migration_scripts/` and is intended to be used for all scripts within.
- All other `requirements.txt` files have been removed.
- The path where data would be unloaded by `unload.py` used to include a space, for example `unload-2025-05-07 22:31:43`. This space has been replaced with `-`.
… README (#49)
* Initial commit for migration README, starting timestream for InfluxDB migration README.
  Signed-off-by: forestmvey <[email protected]>
* Initial version of the Timestream for InfluxDB migration README.
  Signed-off-by: forestmvey <[email protected]>
* Minor revisions to workflow stages and updating script names and parameters.
  Signed-off-by: forestmvey <[email protected]>
---------
Signed-off-by: forestmvey <[email protected]>
Changes:
- Add `--start-time` and `--end-time` to `cardinality.py`.
- Fix typos in `validation/README.md`.
Consolidating installation docs, minor cleanup
…eam for InfluxDB (awslabs#213)
* Initial structure for Timestream to InfluxDB migration
  Refining structure for migration scripts
* Add cardinality calculation script (#45)
  Changes:
  - The cardinality calculation script has been added.
    - `main.py` and `cardinality.py` have been combined into a single `cardinality.py` file.
    - A README has been added for the script.
  - `timestream_utils.py` has been updated.
    - The class `timestreamUtility` has been renamed to `TimestreamUtility`.
    - Static functions for validating dimension, database, and table names have been added to `TimestreamUtility`.
    - The function `comma_separated_list` has been added to `TimestreamUtility`.
    - Broken `utils` imports in `timestream_utils.py` have been fixed.
    - `timestream_client` has been renamed to `timestream_write_client`, for clarity.
    - On line 105, the undefined variable `parition_count` has been corrected to `partition_count`.
    - `timestream_utils.py` has been reformatted with `ruff`.
  - `s3_utils.py` has been updated.
    - Broken `utils` import has been fixed.
    - The class `s3Utility` has been changed to `S3Utility`.
  - `unload.py` has been updated.
    - References to `s3Utility` and `timestreamUtility` have been updated.
    - The wording of various CLI options has been fixed.
    - All CLI options use `-` instead of `_` between words.
    - The `datetime` object is now used properly.
    - Reformatted with `ruff`.
* Adding InfluxDB ingestion script (#46)
  * Initial commit for integrating ingestion script.
    Signed-off-by: forestmvey <[email protected]>
  * Fixing ingestion_test naming for ingestion script.
    Signed-off-by: forestmvey <[email protected]>
  * Removing unnecessary deprecation warning removal.
    Signed-off-by: forestmvey <[email protected]>
  * Fixing dead references to tests.
    Signed-off-by: forestmvey <[email protected]>
  * Update to using python3 for virtual environment.
    Signed-off-by: forestmvey <[email protected]>
  * Adding directories to the appropriate migration scripts in parent README.
    Signed-off-by: forestmvey <[email protected]>
  * Configuring docker to delete influx image after successful execution. Minor documentation revisions.
    Signed-off-by: forestmvey <[email protected]>
  * Adding test for resume functionality.
    Signed-off-by: forestmvey <[email protected]>
  ---------
  Signed-off-by: forestmvey <[email protected]>
* Add migration validation script (#47)
  Adding validation script
* Add transform script (#48)
  Changes:
  - `transform.py` has been added, allowing already-unloaded Timestream for LiveAnalytics data to be transformed into line protocol using Athena.
  - `athena_utils.py` has been added.
  - `TimestreamUtility` now has a wrapper function for `list_tables`.
  - `S3Utility` now has the method `s3_bucket_exists` for checking the existence of an S3 bucket.
  - The region for the `S3Utility` constructor defaults to `None`, causing the AWS configured default region to be used. An `S3Utility` object can now be created simply with `S3Utility()`.
  - A single `requirements.txt` file has been added to `liveanalytics_migration_scripts/` and is intended to be used for all scripts within.
  - All other `requirements.txt` files have been removed.
  - The path where data would be unloaded by `unload.py` used to include a space, for example `unload-2025-05-07 22:31:43`. This space has been replaced with `-`.
* Add LiveAnalytics migration readme and Timestream for InfluxDB target README (#49)
  * Initial commit for migration README, starting timestream for InfluxDB migration README.
    Signed-off-by: forestmvey <[email protected]>
  * Initial version of the Timestream for InfluxDB migration README.
    Signed-off-by: forestmvey <[email protected]>
  * Minor revisions to workflow stages and updating script names and parameters.
    Signed-off-by: forestmvey <[email protected]>
  ---------
  Signed-off-by: forestmvey <[email protected]>
* additional rebase cleanup
* Consolidating env, fixing links
* Add time range to cardinality script (#52)
  Changes:
  - Add `--start-time` and `--end-time` to `cardinality.py`.
  - Fix typos in `validation/README.md`.
* Consolidating installation docs, minor cleanup (#54)
  Consolidating installation docs, minor cleanup
* restructuring
---------
Signed-off-by: forestmvey <[email protected]>
Co-authored-by: Trevor Bonas <[email protected]>
Co-authored-by: Forest Vey <[email protected]>
…eam-tools into dev-migration-scripts-integration-tests
Signed-off-by: forestmvey <[email protected]>
…eam-tools into dev-migration-scripts-integration-tests
…mazon-timestream-tools into dev-migration-scripts-integration-tests
… into dev-migration-scripts-integration-tests
tools/python/liveanalytics_migration_scripts/test_scripts/influxdb-restart.sh
tools/python/liveanalytics_migration_scripts/integration_test.py
…uxdb-restart.sh Co-authored-by: Forest Vey <[email protected]>
tools/python/liveanalytics_migration_scripts/integration_test.py
@@ -0,0 +1 @@
+from .unload import *
I am having issues executing any of the scripts using the non-module method:
python3 unload/unload.py --export-table --database InfluxDBMetrics --table boltdb_reads_total --start-time '2020-01-01 11:11:11' --enable-dynamodb-logger true
Traceback (most recent call last):
File "/Users/.../liveanalytics_migration_scripts/unload/unload.py", line 11, in <module>
from unload.utils.logger_utils import create_logger
File "/Users/.../liveanalytics_migration_scripts/unload/unload.py", line 11, in <module>
from unload.utils.logger_utils import create_logger
ModuleNotFoundError: No module named 'unload.utils'; 'unload' is not a package
I am, however, able to run the scripts as modules:
python3 -m unload.unload --export-table --database InfluxDBMetrics --table boltdb_reads_total --start-time '2020-01-01 11:11:11' --enable-dynamodb-logger true
This should be fixed now. All scripts should be executable directly, from any directory. Let me know if you still have any issues.
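The thread does not show how the fix was made. For context, the traceback above is what Python produces when a file inside a package directory is run directly: the script's own directory lands first on `sys.path`, so `import unload` resolves to `unload.py` itself rather than to the `unload/` package. One common workaround, sketched here as an assumption and not as the change actually made in this PR, is to put the package's parent directory at the front of `sys.path` before the package-style imports run. The helper name `ensure_importable` is hypothetical.

```python
import sys
from pathlib import Path


def ensure_importable(script_file: str, levels_up: int = 1) -> str:
    """Put an ancestor directory of the script at the front of sys.path.

    Hypothetical helper, not taken from the PR. With levels_up=1 and a
    script at <root>/unload/unload.py, <root> is added, so that
    `from unload.utils... import ...` resolves to the package directory.
    """
    script_path = Path(script_file).resolve()
    # parents[0] is the script's own directory; parents[1] is one level above it.
    root = str(script_path.parents[levels_up])
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

A script would call `ensure_importable(__file__)` at the top, before any `from unload...` imports. Running scripts with `python3 -m unload.unload` avoids the need for this, at the cost of requiring a specific working directory.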
@@ -81,3 +81,40 @@ Create a virtual environment using `venv` and install required dependencies.
 - [Timestream for InfluxDB](./targets/timestream_for_influxdb/README.md)
 - [RDS for PostgreSQL](./targets/rds_for_postgresql/README.md)

+## Testing
Thoughts on moving this testing section out into its own README in the tests directory?
Also, I think it might make more sense to separate the integration tests for common utils (cardinality, unload) from the migration integration tests. Then we could add end-to-end tests that combine unload + migration. This kind of structure makes it easier for me to understand which flow is being tested, while drawing a distinction between integration/functional tests and end-to-end/smoke tests.
For additional context on how I'm defining integration/e2e tests: something like `integ/common.py` (which tests cardinality, unload) and `integ/timestream_for_influxdb` (which tests transform + ingestion + validation), and in the future maybe `integ/postgres`. For end-to-end, have `e2e/la_to_influx` (cardinality + unload + transform/ingestion/validation) and maybe `e2e/la_to_postgres`. This is more a suggestion than anything, and I'm happy to discuss further. Hopefully it will make it easier for us to think about how we design the wrapper script, if it comes down to it.
Sure, I can create a `README.md` in `tests/`.
We can separate tests by creating multiple test cases. Currently, there is only one, `MigrationTest`.
I have created a `README.md` for testing, and tests have now been separated into test cases.
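As an illustration of the split discussed in this thread, here is a minimal `unittest` sketch with one test case for the common utilities and one for the InfluxDB migration flow. Everything in it is an assumption for illustration: the class names, the stand-in `*_main` functions, and the `load_group` helper are hypothetical and are not the test cases added in this PR.

```python
import unittest


# Hypothetical stand-ins for the scripts' entry points; each returns an exit code.
def cardinality_main(argv):
    return 0


def unload_main(argv):
    return 0


def transform_main(argv):
    return 0


class CommonUtilsTest(unittest.TestCase):
    """Integration tests for utilities shared by every migration target."""

    def test_cardinality(self):
        self.assertEqual(cardinality_main(["--database", "db", "--table", "t"]), 0)

    def test_unload(self):
        self.assertEqual(unload_main(["--database", "db", "--table", "t"]), 0)


class InfluxMigrationTest(unittest.TestCase):
    """Integration tests for the transform, ingestion, and validation flow."""

    def test_transform(self):
        self.assertEqual(transform_main(["--database", "db", "--table", "t"]), 0)


def load_group(*cases):
    """Build a suite from the given TestCase classes so a group can run alone."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in cases:
        suite.addTests(loader.loadTestsFromTestCase(case))
    return suite
```

With this layout, `unittest.TextTestRunner().run(load_group(CommonUtilsTest))` runs only the common-utility tests, and an end-to-end group could be assembled from several cases in the same way.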
Changes:
- `main()` function that accepts a list of arguments. This enables testing and a single wrapper script in the future.
- `validator.py` now returns exit code `1` if any exceptions occur or if the counts don't match.
- `sync_line_protocol_to_storage` has been added to `s3_utils.py` and allows line protocol files to be downloaded to a local directory. This is crucial for integration tests to work.
- `test_scripts` directory has been added, with scripts that set up a local InfluxDB Docker container for testing.
- `timestream-export-logs` added to `.gitignore`.
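The first two items can be illustrated with a minimal sketch of a `main()` that takes an argument list and returns an exit code. This is an assumption about the general pattern, not the code in `validator.py`: the flags `--expected-rows` and `--actual-rows` and the helper `check_counts` are hypothetical.

```python
import argparse
import sys


def check_counts(expected: int, actual: int) -> None:
    """Raise if the two row counts differ (hypothetical validation step)."""
    if expected != actual:
        raise ValueError(f"row count mismatch: expected {expected}, got {actual}")


def main(argv=None) -> int:
    """Parse argv (defaults to sys.argv[1:]) and return a process exit code."""
    parser = argparse.ArgumentParser(description="Hypothetical validator sketch")
    parser.add_argument("--expected-rows", type=int, required=True)
    parser.add_argument("--actual-rows", type=int, required=True)
    args = parser.parse_args(argv)
    try:
        check_counts(args.expected_rows, args.actual_rows)
    except Exception as exc:
        # Any exception, including a count mismatch, maps to exit code 1.
        print(f"validation failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

Because `main()` accepts the argument list explicitly, a test or a wrapper script can call `main(["--expected-rows", "10", "--actual-rows", "10"])` in-process and check the returned code. A script would typically pass the result to `sys.exit()` under an `if __name__ == "__main__":` guard.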