
Add migration scripts integration tests #55

Open. Wants to merge 74 commits into base branch `integ-migration-scripts-integration-tests`.

@trevorbonas commented May 20, 2025

Changes:

  • Integration tests that run all scripts end-to-end have been added.
  • All scripts now use absolute import paths, so they can be run from any directory.
  • All scripts have a new `main()` function that accepts a list of arguments. This enables testing and, in the future, a single wrapper script.
  • `validator.py` now returns exit code 1 if any exception occurs or if the counts don't match.
  • The method `sync_line_protocol_to_storage` has been added to `s3_utils.py`. It downloads line protocol files to a local directory, which the integration tests depend on.
  • A `test_scripts` directory has been added, with scripts that set up a local InfluxDB Docker container for testing.
  • `timestream-export-logs` has been added to `.gitignore`.
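The `main()` and exit-code changes work together: a test can call a script in-process with an argument list and check the return value, while the shell sees that same value as the process exit code. The block below is a minimal sketch of that pattern, not the actual `validator.py`. The `--table` option, the hypothetical `count_source_rows`/`count_target_rows` stand-ins, and the extra injectable `main()` parameters are all assumptions.

```python
import argparse
import sys
from typing import Callable, Optional, Sequence


def count_source_rows(table: str) -> int:
    """Hypothetical stand-in for a Timestream for LiveAnalytics row count."""
    return 100


def count_target_rows(table: str) -> int:
    """Hypothetical stand-in for an InfluxDB row count."""
    return 100


def main(
    args: Optional[Sequence[str]] = None,
    source_counter: Callable[[str], int] = count_source_rows,
    target_counter: Callable[[str], int] = count_target_rows,
) -> int:
    """Parse an explicit argument list and return a process exit code.

    When args is None, argparse falls back to sys.argv[1:], so the same
    function serves both tests and command-line use.
    """
    parser = argparse.ArgumentParser(description="Compare row counts (sketch).")
    parser.add_argument("--table", required=True)
    parsed = parser.parse_args(args)

    try:
        source = source_counter(parsed.table)
        target = target_counter(parsed.table)
    except Exception as error:  # any failure during validation -> exit code 1
        print(f"Validation failed: {error}", file=sys.stderr)
        return 1

    if source != target:  # mismatched counts -> exit code 1
        print(f"Count mismatch: source={source} target={target}", file=sys.stderr)
        return 1
    return 0


# As a script, the return value becomes the exit code:
# if __name__ == "__main__":
#     sys.exit(main())
```

A test can then assert on `main(["--table", "cpu"])` directly instead of spawning a subprocess and parsing its output.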

bobigbal and others added 30 commits May 2, 2025 21:45
Refining structure for migration scripts
Changes:
- The cardinality calculation script has been added.
    - `main.py` and `cardinality.py` have been combined into a single `cardinality.py` file.
    - A README has been added for the script.
- `timestream_utils.py` has been updated.
    - The class `timestreamUtility` has been renamed to `TimestreamUtility`.
    - Static functions for validating dimension, database, and table names have been added to `TimestreamUtility`.
    - The function `comma_separated_list` has been added to `TimestreamUtility`.
    - Broken `utils` imports in `timestream_utils.py` have been fixed.
    - `timestream_client` renamed to `timestream_write_client`, for clarity.
    - On line 105, the undefined variable `parition_count` has been corrected to `partition_count`.
    - `timestream_utils.py` has been reformatted with `ruff`.
- `s3_utils.py` has been updated.
    - Broken `utils` import has been fixed.
    - The class `s3Utility` has been renamed to `S3Utility`.
- `unload.py` has been updated.
    - References to `s3Utility` and `timestreamUtility` have been updated.
    - The wording of various CLI options has been fixed.
    - All CLI options use `-` instead of `_` between words.
    - `datetime` objects are now used correctly.
    - Reformatted with `ruff`.
* Initial commit for integrating ingestion script.

Signed-off-by: forestmvey <[email protected]>

* Fixing ingestion_test naming for ingestion script.

Signed-off-by: forestmvey <[email protected]>

* Removing unnecessary deprecation warning removal.

Signed-off-by: forestmvey <[email protected]>

* Fixing dead references to tests.

Signed-off-by: forestmvey <[email protected]>

* Update to using python3 for virtual environment.

Signed-off-by: forestmvey <[email protected]>

* Adding directories to the appropriate migration scripts in parent README.

Signed-off-by: forestmvey <[email protected]>

* Configuring docker to delete influx image after successful execution. Minor documentation revisions.

Signed-off-by: forestmvey <[email protected]>

* Adding test for resume functionality.

Signed-off-by: forestmvey <[email protected]>

---------

Signed-off-by: forestmvey <[email protected]>
Adding validation script
Changes:
- `transform.py` has been added, allowing already-unloaded Timestream for LiveAnalytics data to be transformed into line protocol using Athena.
- `athena_utils.py` has been added.
- `TimestreamUtility` now has a wrapper function for `list_tables`.
- `S3Utility` now has the method `s3_bucket_exists` for checking the existence of an S3 bucket.
- The region for the `S3Utility` constructor defaults to `None`, causing the AWS configured default region to be used. An `S3Utility` object can now be created simply with `S3Utility()`.
- A single `requirements.txt` file has been added to `liveanalytics_migration_scripts/` and is intended to be used for all scripts within.
- All other `requirements.txt` files have been removed.
- The path where `unload.py` unloads data used to include a space, for example `unload-2025-05-07 22:31:43`. This space has been replaced with `-`.
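The new `list_tables` wrapper is not shown in this PR. As a minimal sketch, assuming it collects every page of the boto3 `timestream-write` `list_tables` response by following `NextToken`, it could look like the following. `list_all_tables` is a hypothetical name, and the client is passed in rather than created inside the function.

```python
from typing import Any, Dict, List


def list_all_tables(write_client: Any, database_name: str) -> List[str]:
    """Return every table name in a database, following NextToken pagination.

    Hypothetical helper: write_client is expected to behave like
    boto3.client("timestream-write").
    """
    table_names: List[str] = []
    request: Dict[str, str] = {"DatabaseName": database_name}
    while True:
        response = write_client.list_tables(**request)
        table_names.extend(table["TableName"] for table in response.get("Tables", []))
        next_token = response.get("NextToken")
        if not next_token:
            return table_names
        # Request the next page using the token from the previous response.
        request["NextToken"] = next_token
```

Injecting the client keeps AWS out of unit tests: any object with a compatible `list_tables` method can stand in for the real client.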
… README (#49)

* Initial commit for migration README, starting timestream for InfluxDB migration README.

Signed-off-by: forestmvey <[email protected]>

* Initial version of the Timestream for InfluxDB migration README.

Signed-off-by: forestmvey <[email protected]>

* Minor revisions to workflow stages and updating script names and parameters.

Signed-off-by: forestmvey <[email protected]>

---------

Signed-off-by: forestmvey <[email protected]>
Changes:
- Add `--start-time` and `--end-time` to `cardinality.py`.
- Fix typos in `validation/README.md`.
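The reviewer's `unload.py` command later in this thread passes `--start-time '2020-01-01 11:11:11'`, so this sketch assumes the same `YYYY-MM-DD HH:MM:SS` format for `cardinality.py`. It shows one way such options could be parsed and sanity-checked with `argparse`; the `parse_timestamp` and `parse_time_range` helpers and the ordering check are illustrative assumptions, not code from `cardinality.py`.

```python
import argparse
from datetime import datetime
from typing import Optional, Sequence

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, e.g. '2020-01-01 11:11:11'


def parse_timestamp(value: str) -> datetime:
    """argparse 'type' callable that turns a string into a datetime."""
    try:
        return datetime.strptime(value, TIMESTAMP_FORMAT)
    except ValueError as error:
        raise argparse.ArgumentTypeError(
            f"expected '{TIMESTAMP_FORMAT}', got {value!r}"
        ) from error


def parse_time_range(args: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse optional --start-time/--end-time and reject inverted ranges."""
    parser = argparse.ArgumentParser(description="Time-range options (sketch).")
    parser.add_argument("--start-time", type=parse_timestamp, default=None)
    parser.add_argument("--end-time", type=parse_timestamp, default=None)
    parsed = parser.parse_args(args)
    if (
        parsed.start_time is not None
        and parsed.end_time is not None
        and parsed.start_time > parsed.end_time
    ):
        parser.error("--start-time must not be later than --end-time")  # exits with code 2
    return parsed
```

Converting in the `type` callable means malformed timestamps are reported by `argparse` with a usage message, before any query runs.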
Consolidating installation docs, minor cleanup
…eam for InfluxDB (awslabs#213)

* Initial structure for Timestream to InfluxDB migration

Refining structure for migration scripts

* Add cardinality calculation script (#45)

* Adding InfluxDB ingestion script (#46)

* Add migration validation script  (#47)

Adding validation script

* Add transform script (#48)

* Add LiveAnalytics migration readme and Timestream for InfluxDB target README (#49)

* additional rebase cleanup

* Consolidating env, fixing links

* Add time range to cardinality script (#52)

* Consolidating installation docs, minor cleanup (#54)

Consolidating installation docs, minor cleanup

* restructuring

---------

Signed-off-by: forestmvey <[email protected]>
Co-authored-by: Trevor Bonas <[email protected]>
Co-authored-by: Forest Vey <[email protected]>
…eam-tools into dev-migration-scripts-integration-tests
@trevorbonas marked this pull request as ready for review May 20, 2025 22:33
@@ -0,0 +1 @@
from .unload import *

I am having issues executing any of the scripts using the non-module method:

python3 unload/unload.py --export-table --database InfluxDBMetrics --table boltdb_reads_total --start-time '2020-01-01 11:11:11' --enable-dynamodb-logger true
Traceback (most recent call last):
  File "/Users/.../liveanalytics_migration_scripts/unload/unload.py", line 11, in <module>
    from unload.utils.logger_utils import create_logger
  File "/Users/.../liveanalytics_migration_scripts/unload/unload.py", line 11, in <module>
    from unload.utils.logger_utils import create_logger
ModuleNotFoundError: No module named 'unload.utils'; 'unload' is not a package

However, I am able to run the scripts as modules:

python3 -m unload.unload --export-table --database InfluxDBMetrics --table boltdb_reads_total --start-time '2020-01-01 11:11:11' --enable-dynamodb-logger true

@trevorbonas (author), May 28, 2025

This should be fixed now. All scripts should now run directly, from any directory. Let me know if you still have any issues.
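The traceback above is consistent with how Python sets `sys.path[0]` when a file is run directly: for `python3 unload/unload.py` it is the `unload/` directory itself, so `from unload.utils...` resolves `unload` to the plain module `unload.py` rather than the `unload` package. That module then re-executes its own imports, which is why the same frame appears twice. The PR does not show how the scripts were changed; one common approach, sketched here under that assumption, is to put the scripts' root directory (`liveanalytics_migration_scripts/` in the traceback) at the front of `sys.path` before the package imports. The helper name and its `levels` parameter are hypothetical.

```python
import sys
from pathlib import Path


def add_scripts_root_to_path(script_file: str, levels: int = 1) -> Path:
    """Put the directory `levels` above the script's folder at the front of sys.path.

    Hypothetical helper: for .../liveanalytics_migration_scripts/unload/unload.py,
    levels=1 yields .../liveanalytics_migration_scripts, so `import unload` finds
    the package directory instead of the unload.py module next to the script.
    """
    root = Path(script_file).resolve().parents[levels]
    root_str = str(root)
    if root_str in sys.path:
        sys.path.remove(root_str)  # avoid duplicate entries; re-insert at the front
    sys.path.insert(0, root_str)
    return root


# In a script this would run before the package imports, e.g.:
# add_scripts_root_to_path(__file__)
# from unload.utils.logger_utils import create_logger
```

Running as a module (`python3 -m unload.unload`) avoids the problem without any path changes, because the current directory, not the script's folder, is placed on `sys.path`.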

@@ -81,3 +81,40 @@ Create a virtual environment using `venv` and install required dependencies.
- [Timestream for InfluxDB](./targets/timestream_for_influxdb/README.md)
- [RDS for PostgreSQL](./targets/rds_for_postgresql/README.md)

## Testing

Thoughts on moving this testing section out into its own readme in the tests directory?

Also, I think it might make more sense to separate the integration tests for common utilities (cardinality, unload) from the migration integration tests. Then we could add end-to-end tests that combine unload and migration. This structure makes it easier for me to understand which flow is being tested, and it draws a distinction between integration/functional tests and end-to-end/smoke tests.

For additional context on how I'm defining integration/e2e tests: something like `integ/common.py` (which tests cardinality and unload) and `integ/timestream_for_influxdb` (which tests transform + ingestion + validation), and in the future maybe `integ/postgres`. For end-to-end tests, `e2e/la_to_influx` (cardinality + unload + transform/ingestion/validation) and maybe `e2e/la_to_postgres`. This is more a suggestion than anything, and I'm happy to discuss further. Hopefully it will make it easier for us to think about how to design the wrapper script, if it comes down to it.

@trevorbonas (author)

Sure, I can create a `README.md` in `tests/`.

We can separate tests by creating multiple test cases. Currently, there is only one, `MigrationTest`.

@trevorbonas (author), May 28, 2025

I have created a `README.md` for testing, and the tests have now been separated into test cases.
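The thread does not show the resulting test layout. This is a minimal sketch of splitting a single `MigrationTest` into per-flow `unittest` test cases along the reviewer's integration/end-to-end lines. The class names and placeholder test bodies are hypothetical; real tests would call each script's `main()` with an argument list.

```python
import unittest


class CommonIntegrationTest(unittest.TestCase):
    """Hypothetical: flows shared by all targets (cardinality, unload)."""

    def test_cardinality(self) -> None:
        self.assertTrue(True)  # placeholder for calling the cardinality script's main([...])

    def test_unload(self) -> None:
        self.assertTrue(True)  # placeholder for calling the unload script's main([...])


class InfluxDBMigrationTest(unittest.TestCase):
    """Hypothetical: transform, ingestion, and validation for Timestream for InfluxDB."""

    def test_transform_ingest_validate(self) -> None:
        self.assertTrue(True)  # placeholder for the target-specific flow


def build_suite(*case_classes: type) -> unittest.TestSuite:
    """Collect only the requested test cases, e.g. to run one flow at a time."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case_class in case_classes:
        suite.addTests(loader.loadTestsFromTestCase(case_class))
    return suite


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(build_suite(CommonIntegrationTest))
```

`unittest` can also select a single class from the command line, for example `python3 -m unittest <test_module>.CommonIntegrationTest`, where `<test_module>` stands for whichever module holds the tests.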


5 participants