Here are the high-level steps to build a data integration pipeline:
- Add new data integration pipeline
- Configure source
- Configure destination
- Run pipeline and start sync
- Open Mage in your browser and click the [+ New pipeline] button.
- Click the dropdown menu option Data integration.
- Click the dropdown menu under Select source and choose the option you want to load data from (e.g. Amplitude).
- Depending on the chosen source, you’ll need to enter credentials and options into the section labeled Configuration. For example, if you chose Amplitude, you’ll need to enter credentials like this:
      api_key: abc456
      secret_key: "{{ env_var('SECRET_KEY') }}"
  Best practices: you can interpolate values in the configuration using the following syntax:
  - "{{ env_var('SECRET_KEY') }}": this will extract the value from the SECRET_KEY key in your environment variables.
  - "{{ variables('SECRET_KEY') }}": this will extract the value from the SECRET_KEY key in your runtime variables.
- After you enter all the credentials, click the button [Fetch list of streams] under the section labeled Select stream.
- Shortly after clicking the above button, click the new dropdown menu under the section labeled Select stream. Then, choose the stream (aka table) you want to load data from.
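The interpolation syntax from the Best practices note above can be illustrated with a short Python sketch. This is not Mage's own implementation, which handles interpolation internally. The function `resolve_placeholders` and its `runtime_variables` and `environ` arguments are hypothetical names used only for this example, and the value `s3cr3t` is made up.

```python
import os
import re

# Matches "{{ env_var('NAME') }}" or "{{ variables('NAME') }}".
PLACEHOLDER = re.compile(
    r"\{\{\s*(env_var|variables)\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)


def resolve_placeholders(value, runtime_variables=None, environ=None):
    """Replace each placeholder in `value` with the looked-up string."""
    runtime_variables = runtime_variables or {}
    environ = os.environ if environ is None else environ

    def lookup(match):
        source, key = match.group(1), match.group(2)
        # env_var reads environment variables; variables reads runtime variables.
        table = environ if source == "env_var" else runtime_variables
        if key not in table:
            raise KeyError(f"{source}('{key}') is not defined")
        return str(table[key])

    return PLACEHOLDER.sub(lookup, value)


# The Amplitude example configuration, expressed as a Python dict.
config = {
    "api_key": "abc456",
    "secret_key": "{{ env_var('SECRET_KEY') }}",
}
# A stand-in environment is passed explicitly so the example is repeatable.
resolved = {
    key: resolve_placeholders(val, environ={"SECRET_KEY": "s3cr3t"})
    for key, val in config.items()
}
print(resolved)
```

Plain values such as `api_key` pass through unchanged, while the placeholder is swapped for the value found under `SECRET_KEY`. A missing key raises an error here rather than silently producing an empty credential; that is a design choice of this sketch.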
After selecting a stream (aka table), you’ll need to configure the schema.
Configuring the schema tells your pipeline which fields to synchronize, how to determine if a record is unique, and what to do if there are conflicts (aka duplicate records).
Here are the steps you can optionally go through:
- Selected field(s)
  - Check the box next to the field name to include the field in your synchronization.
  - Uncheck the ones you don’t want to sync.
- Field type(s)
  - Each field will have a default field type.
  - Add additional field types or remove them if they don’t fit your needs.
- Unique field(s)
  - On the right of the field names, there is a box you can check that will determine which field(s) need to have unique values.
  - If the box cannot be checked, that means you cannot use that field as a unique field.
- Bookmark field(s)
  - Under the column labeled Bookmark, check the box to use the field as a way to keep track of progress during synchronization.
  - Upon every synchronization, these columns are used to pick up from where the previous synchronization left off. In addition, if a synchronization fails midway, these bookmark columns are used to track the record that was most recently synchronized successfully.
- Replication method
  - FULL_TABLE: synchronize the entire set of records from the source.
  - INCREMENTAL: synchronize the records starting after the most recent bookmarked record from the previous synchronization run.
- Unique conflict method: choose how to handle duplicate records
  - IGNORE: skip the new record if it’s a duplicate of an existing record.
  - UPDATE: update the existing record with the new record’s properties.
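To make the interaction between bookmarks, the replication method, and the unique conflict method more concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not Mage's actual synchronization logic: the `sync` function is hypothetical, a plain dict stands in for the destination table, and bookmark values are assumed to be comparable (for example timestamps or increasing integers).

```python
def sync(source_rows, destination, unique_key, bookmark_field,
         replication_method="FULL_TABLE", conflict_method="UPDATE",
         last_bookmark=None):
    """Copy rows into `destination` (a dict keyed by the unique field).

    Returns the bookmark value to store for the next run.
    """
    if replication_method == "INCREMENTAL" and last_bookmark is not None:
        # Only records after the most recent bookmarked record are considered.
        rows = [r for r in source_rows if r[bookmark_field] > last_bookmark]
    else:
        # FULL_TABLE (or a first incremental run): take every record.
        rows = list(source_rows)

    for row in sorted(rows, key=lambda r: r[bookmark_field]):
        key = row[unique_key]
        if key not in destination:
            destination[key] = dict(row)
        elif conflict_method == "UPDATE":
            # Overwrite the existing record's properties with the new ones.
            destination[key].update(row)
        # With IGNORE, the existing record is left untouched.
        last_bookmark = row[bookmark_field]  # track progress row by row
    return last_bookmark


# Made-up sample data for the example.
source = [
    {"id": 1, "name": "Ada", "updated_at": 10},
    {"id": 2, "name": "Bob", "updated_at": 20},
]
destination = {}
bookmark = sync(source, destination, "id", "updated_at")

# A changed record arrives; the incremental run only looks past the bookmark.
source.append({"id": 2, "name": "Robert", "updated_at": 30})
bookmark = sync(source, destination, "id", "updated_at",
                replication_method="INCREMENTAL",
                conflict_method="UPDATE", last_bookmark=bookmark)
print(destination[2]["name"], bookmark)
```

Because the bookmark advances after each row, a run that fails midway could resume from the last value returned instead of starting over.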
- Click the dropdown menu under Select destination and choose the option you want to export data to (e.g. Snowflake).
- Depending on the chosen destination, you’ll need to enter credentials and options into the section labeled Configuration. For example, if you chose Snowflake, you’ll need to enter credentials like this:
      account: ...
      database: ...
      password: "{{ env_var('PASSWORD') }}"
      schema: ...
      table: ...
      username: ...
      warehouse: ...
  Best practices: you can interpolate values in the configuration using the following syntax:
  - "{{ env_var('PASSWORD') }}": this will extract the value from the PASSWORD key in your environment variables.
  - "{{ variables('PASSWORD') }}": this will extract the value from the PASSWORD key in your runtime variables.
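One way to follow the best practice above is to check a configuration for secrets that were typed in as plain text before saving it. The sketch below is only an illustration: `hardcoded_secrets` and `SENSITIVE_KEYS` are hypothetical names that are not part of Mage, the choice of which keys count as sensitive is an assumption, and the sample values are made up.

```python
import re

# Assumption for this sketch: these key names are treated as secrets.
SENSITIVE_KEYS = {"password", "secret_key"}

# A value counts as interpolated if it is a single env_var(...) or
# variables(...) placeholder wrapped in double curly braces.
INTERPOLATED = re.compile(r"^\s*\{\{\s*(env_var|variables)\(.+\)\s*\}\}\s*$")


def hardcoded_secrets(config):
    """Return the sensitive keys whose values are written as plain literals."""
    return sorted(
        key for key, value in config.items()
        if key in SENSITIVE_KEYS and not INTERPOLATED.match(str(value))
    )


good = {"username": "loader", "password": "{{ env_var('PASSWORD') }}"}
bad = {"username": "loader", "password": "hunter2"}
print(hardcoded_secrets(good))
print(hardcoded_secrets(bad))
```

The first call reports no problems, while the second flags `password` because its value is a literal rather than a placeholder.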
Once you’re done configuring your pipeline, go back to the pipeline’s trigger page by clicking the name of your pipeline in your header.
The breadcrumbs in your header could look like this: Pipelines / pipeline name / Edit.
Once you’re on the pipeline triggers page, create a new scheduled trigger and choose the @once interval. For more schedules, read the other options here.
After you create a scheduled trigger, click the [Start trigger] button at the top of the page.
You’ll see a new pipeline run appear shortly on the screen.
You can click the logs for that pipeline run to view the progress of your synchronization.
If you get stuck, run into problems, or just want someone to walk you through these steps, please join our Slack and someone will help you ASAP.