Skip to content
This repository was archived by the owner on May 1, 2024. It is now read-only.

GA360 Importing into Snowflake #715

Closed
wants to merge 1 commit into from
Closed

Conversation

pwnage101
Copy link
Contributor

@pwnage101 pwnage101 commented Apr 3, 2019

DE-1374

analytics-secure PR: https://github.com/edx-ops/analytics-secure/pull/234

Analytics Pipeline Pull Request

Make sure that the following steps are done before merging:

  • If you have a migration please contact data engineering team before merging.
  • Before merging run full acceptance tests suite and provide URL for the acceptance tests run.
  • A member of data engineering team has approved the pull request.

@pwnage101 pwnage101 force-pushed the pwnage101/snowflake branch 14 times, most recently from 0009464 to 3a66285 Compare April 11, 2019 18:20
@pwnage101 pwnage101 force-pushed the pwnage101/snowflake branch 4 times, most recently from dbc02ba to d2497bf Compare April 18, 2019 16:58
@pwnage101 pwnage101 changed the title Pwnage101/snowflake GA360 Importing into Snowflake Apr 18, 2019
@pwnage101 pwnage101 force-pushed the pwnage101/snowflake branch 2 times, most recently from c1b5ba7 to 0ea353b Compare April 18, 2019 17:50
@pwnage101 pwnage101 assigned macdiesel and unassigned macdiesel Apr 18, 2019
@pwnage101 pwnage101 requested a review from macdiesel April 18, 2019 17:53
"""
The null sequence in the data to be copied. Default is Hive NULL (\\N).
"""
return r'\\N'
Copy link
Contributor Author

@pwnage101 pwnage101 Apr 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewers: I just moved these two methods into an abstract subclass.

@@ -75,6 +75,7 @@ edx.analytics.tasks =
run-vertica-sql-scripts = edx.analytics.tasks.warehouse.run_vertica_sql_scripts:RunVerticaSqlScriptTask
test-vertica-sqoop = edx.analytics.tasks.common.vertica_export:VerticaSchemaToBigQueryTask
load-ga-permissions = edx.analytics.tasks.warehouse.load_ga_permissions:LoadGoogleAnalyticsPermissionsWorkflow
ga-imports = edx.analytics.tasks.warehouse.ga_imports:SnowflakeLoadGATask
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure which class to reference here. It seems to work as-is, despite the fact that i'm calling a different task from the command line (SnowflakeLoadGAIntervalTask).

@pwnage101 pwnage101 force-pushed the pwnage101/snowflake branch from 57c5d25 to bff93e8 Compare April 18, 2019 18:37
@pwnage101 pwnage101 force-pushed the pwnage101/snowflake branch from bff93e8 to 25c64d2 Compare April 18, 2019 18:40
@codecov
Copy link

codecov bot commented Apr 18, 2019

Codecov Report

Merging #715 into master will decrease coverage by 0.23%.
The diff coverage is 49.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #715      +/-   ##
==========================================
- Coverage   75.18%   74.95%   -0.24%     
==========================================
  Files         203      205       +2     
  Lines       22888    23073     +185     
==========================================
+ Hits        17209    17294      +85     
- Misses       5679     5779     +100
Impacted Files Coverage Δ
edx/analytics/tasks/common/bigquery_load.py 25.75% <0%> (ø) ⬆️
edx/analytics/tasks/util/tests/test_s3_util.py 100% <100%> (ø) ⬆️
...lytics/tasks/warehouse/load_warehouse_snowflake.py 47.71% <100%> (ø) ⬆️
...x/analytics/tasks/warehouse/load_ga_permissions.py 48.19% <100%> (ø) ⬆️
edx/analytics/tasks/util/s3_util.py 53.62% <27.27%> (-10.56%) ⬇️
edx/analytics/tasks/common/snowflake_load.py 31.39% <38.46%> (+1.58%) ⬆️
edx/analytics/tasks/util/url.py 76.74% <40%> (-15.06%) ⬇️
edx/analytics/tasks/common/bigquery_extract.py 42.2% <42.2%> (ø)
edx/analytics/tasks/warehouse/ga_imports.py 75.67% <75.67%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad0d76e...25c64d2. Read the comment docs.

if not aws_secret_access_key:
aws_secret_access_key = self._get_s3_config('aws_secret_access_key')
if 'host' not in kwargs:
kwargs['host'] = self._get_s3_config('host') or 's3.amazonaws.com'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewers: All of this logic was removed because:

  1. The aws access/secret key was duplicate code (this logic already exists inside of the luigi.contrib.s3 library).
  2. The host argument can be supplied in the luigi .cfg file instead of in code. Besides, once we upgrade to boto3 there will be no need for setting this default.

@property
def output_format(self):
"""
The type of compression to use for the output files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean the output format here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the output file format, or the compression format for the output file, same thing right?

Copy link
Contributor

@HassanJaveed84 HassanJaveed84 Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found it a little ambiguous, as both output_compression_type and output_format methods have the same docstrings. One is the exported file format and the other is the compression type to use for exported files. Just thought we could use different docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, whoops, fixing

@pwnage101
Copy link
Contributor Author

Per a conversation with @brianhw I'm going to split this up into two branches: one including the updates that don't require a newer Luigi, and one including updates that do require a newer Luigi.

@pwnage101
Copy link
Contributor Author

Closing in favor of https://github.com/edx/edx-analytics-pipeline/pull/721

@pwnage101 pwnage101 closed this Apr 19, 2019
@pwnage101 pwnage101 deleted the pwnage101/snowflake branch April 19, 2019 17:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants