- Pull the mage-ai repository and cd into it.
- Run Mage in a development Docker container using ./scripts/dev.sh [PROJECT NAME]. This starts Mage and allows us to make changes in real time. See this page for more details.
- Open another terminal and run:
  docker exec -it mage-ai-server-1 bash
  This will open a shell in the Docker container and allow us to interact with the integrations.
- Uninstall the existing mage-integrations package using pip:
  pip3 uninstall -y mage-integrations
- cd into mage_integrations/:
  cd mage_integrations/
Run the following commands:
touch ./mage_integrations/TEST_CATALOG.json &&
touch ./mage_integrations/TEST_CONFIG_S.json &&
touch ./mage_integrations/TEST_CONFIG_D.json &&
touch ./mage_integrations/TEST_STATE.json &&
touch ./mage_integrations/TEST_OUTPUT &&
echo "{}" >> ./mage_integrations/TEST_STATE.json
This creates the following files:
TEST_CATALOG.json
TEST_CONFIG_S.json
TEST_CONFIG_D.json
TEST_STATE.json
TEST_OUTPUT
We'll be using these files to test the integration.
Populate TEST_CONFIG_S.json with a sample configuration. A template for each source is found at:
mage_integrations/mage_integrations/sources/[INTEGRATION]/templates/config.json
For the GitHub integration, this is:
{
  "access_token": "abcdefghijklmnopqrstuvwxyz1234567890ABCD",
  "repository": "mage-ai/mage-ai",
  "start_date": "2021-01-01T00:00:00Z",
  "request_timeout": 300,
  "base_url": "https://api.github.com"
}
Run the following command to discover streams for your source:
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--discover \
--discover_streams
You should now see a list of streams that you can sync from the source. Grab a few of interest!
The output from GitHub looks like this:
[
  {
    "stream": "commits",
    "tap_stream_id": "commits"
  },
  {
    "stream": "comments",
    "tap_stream_id": "comments"
  },
  ...
]
Now test grabbing schemas for a few of the streams above and write the result to our catalog file. Pass the stream names to --selected_streams as a single string containing a list of strings, e.g. '["commits", "comments"]'.
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--discover \
--selected_streams SCHEMAS > mage_integrations/TEST_CATALOG.json
For example, for the GitHub source:
python3 mage_integrations/sources/github/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--discover \
--selected_streams '["commits"]' > mage_integrations/TEST_CATALOG.json
Your catalog will now contain the schemas for the streams you selected. We now need to enable those streams. This is usually handled by the Mage UI, but we can do it ourselves.
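Enabling streams by hand gets tedious when the catalog holds several of them, so the edit can also be scripted. The sketch below is only an illustration: it assumes the catalog file is a JSON object with a top-level "streams" list (the usual Singer layout, not confirmed by this page) and that the stream-level metadata entry is the one with an empty "breadcrumb", as in the commits example. The function name select_streams is hypothetical.

```python
import json
from pathlib import Path


def select_streams(catalog: dict) -> int:
    """Set "selected": true on each stream's top-level metadata entry.

    Returns the number of metadata entries that were updated.
    """
    updated = 0
    # Assumption: streams live under a top-level "streams" key.
    for stream in catalog.get("streams", []):
        for entry in stream.get("metadata", []):
            # The stream-level entry is the one with an empty breadcrumb.
            if entry.get("breadcrumb") == []:
                entry.setdefault("metadata", {})["selected"] = True
                updated += 1
    return updated


if __name__ == "__main__":
    catalog_path = Path("mage_integrations/TEST_CATALOG.json")
    if catalog_path.exists():
        catalog = json.loads(catalog_path.read_text())
        count = select_streams(catalog)
        catalog_path.write_text(json.dumps(catalog, indent=2))
        print(f"Marked {count} stream(s) as selected")
    else:
        print(f"{catalog_path} not found; run the discover step first")
```

The script only touches stream-level entries. Column-level selection is left as a manual step.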
For each stream in TEST_CATALOG.json, find the nested metadata key and add "selected": true:
...
"stream": "commits",
"metadata": [
  {
    "breadcrumb": [],
    "metadata": {
      "table-key-properties": [
        "sha"
      ],
      "forced-replication-method": "INCREMENTAL",
      "valid-replication-keys": "updated_at",
      "inclusion": "available",
      "selected": true
    }
  },
...
Additionally, for SQL sources, add "selected": true to at least one column in each stream.
{
  "stream": "commits",
  "tap_stream_id": "commits",
  "selected": true
}
Finally! It's time to test our stream execution. Run the following command to execute the stream and save the output to a file:
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json > mage_integrations/TEST_OUTPUT
Check TEST_OUTPUT to see the results!
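TEST_OUTPUT can get long, so a quick tally of what it contains may be easier than reading it line by line. The sketch below assumes the file holds one JSON message per line with a "type" field (Singer-style messages such as RECORD, SCHEMA, or STATE), which this page does not spell out. Lines that are not JSON, such as log output, are counted separately. The function name summarize_messages is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def summarize_messages(lines: Iterable[str]) -> Counter:
    """Count output lines by their "type" field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            counts["<non-json>"] += 1  # e.g. plain log lines
            continue
        if isinstance(message, dict):
            counts[str(message.get("type", "<no type>"))] += 1
        else:
            counts["<non-object>"] += 1
    return counts


if __name__ == "__main__":
    output_path = Path("mage_integrations/TEST_OUTPUT")
    if output_path.exists():
        with output_path.open() as handle:
            for kind, count in summarize_messages(handle).most_common():
                print(f"{kind}: {count}")
    else:
        print(f"{output_path} not found; run the sync step first")
```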
Now, let's test writing our output to a destination. Populate the destination config file with a sample configuration in a similar manner to the source config, then run:
python3 mage_integrations/destinations/postgresql/__init__.py \
--config mage_integrations/TEST_CONFIG_D.json \
--state mage_integrations/TEST_STATE.json \
--input_file_path mage_integrations/TEST_OUTPUT \
--debug
This writes TEST_OUTPUT to your destination. Note: you will need a sample destination data store to write to.
You can also test pulling from the source and writing to the destination in a single step:
python3 mage_integrations/sources/[SOURCE_INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json | python3 mage_integrations/destinations/[TARGET_INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_D.json \
--state mage_integrations/TEST_STATE.json \
--debug
For example, an end-to-end GitHub to Postgres data integration:
python3 mage_integrations/sources/github/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json | python3 mage_integrations/destinations/postgresql/__init__.py \
--config mage_integrations/TEST_CONFIG_D.json \
--state mage_integrations/TEST_STATE.json \
--debug
Once you've tested your tap in the terminal, it's time to test it in Mage.
First, return to your terminal and run pip install -U mage_integrations/ in your mage_integrations directory. That will build our new mage-integrations package and make the changes you made available to the UI.
Open up Mage (localhost:3000 in dev) and create a new data integration pipeline:
Select your source from the list:
Now, perform the following in the Mage UI to verify a working source:
- Test the connection
- View and select streams
- Sync one stream to a destination
- If you're adding a tap in a PR, be sure to add logs of the source and show data in the destination table to the PR description.
- If incremental sync is supported, please also test it: check if the state is updated and fetched correctly.
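For the incremental sync check, one option is to keep a copy of the state before a run and compare it with the state afterwards. The sketch below assumes only that each state is a JSON object (TEST_STATE.json starts as {}), and it compares top-level keys. The function name changed_keys and the sample keys in the usage lines are hypothetical, and how the two snapshots are captured is left open.

```python
import json


def changed_keys(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for top-level keys whose values differ.

    A key missing on one side is reported with None for that side.
    """
    missing = object()  # sentinel, so a stored null still compares correctly
    changes = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key, missing)
        new = after.get(key, missing)
        if old != new:
            changes[key] = (
                None if old is missing else old,
                None if new is missing else new,
            )
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots; real state contents depend on the source.
    state_before = json.loads("{}")
    state_after = json.loads('{"example_key": "example_value"}')
    diff = changed_keys(state_before, state_after)
    print(diff if diff else "State did not change")
```

An empty result after a sync that returned new records would suggest the state is not being updated.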
You can count the number of records in your stream with the following command:
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json \
--count_records \
--selected_streams '["your_stream"]'
Use this template to perform a sample query of your data:
python3 mage_integrations/sources/freshdesk/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--query_json '{"_end_date": null, "_execution_date": "2022-11-17T21:05:53.341319", "_execution_partition": "444/20221117T210443", "_start_date": null, "_limit": 1000, "_offset": 0}' \
--state mage_integrations/TEST_STATE.json
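Hand-writing the --query_json string invites quoting mistakes, so it may help to generate it. The sketch below builds the same keys shown in the template above with json.dumps and shell-quotes the result with shlex.quote. The helper name build_query_json and its defaults are hypothetical, and the date and partition values are simply copied from the template.

```python
import json
import shlex
from typing import Optional


def build_query_json(
    execution_date: str,
    execution_partition: str,
    limit: int = 1000,
    offset: int = 0,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> str:
    """Serialize the query parameters from the template as a JSON string."""
    query = {
        "_end_date": end_date,
        "_execution_date": execution_date,
        "_execution_partition": execution_partition,
        "_start_date": start_date,
        "_limit": limit,
        "_offset": offset,
    }
    return json.dumps(query)


if __name__ == "__main__":
    payload = build_query_json(
        execution_date="2022-11-17T21:05:53.341319",
        execution_partition="444/20221117T210443",
    )
    # shlex.quote wraps the JSON so it survives as one shell argument.
    print("--query_json " + shlex.quote(payload))
```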
