
Integration step-by-step tutorial

I prepared several tools for this tutorial. All tools/files are located in the root directory of this project. In case of any issues, do not hesitate to contact me.

1. Start services

All services used for this project are described in the docker-compose.yaml file. Run the following command to start all of them:

docker compose up -d

After all services are up, verify that the GoodData.CN and MindsDB web interfaces are reachable.

Credentials for GoodData.CN are:

  • username: [email protected]
  • password: demo123
  • token (for REST APIs or SDKs): YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz

2. Check MindsDB and GoodData.CN content (optional)

See MindsDB Documentation and GoodData.CN Documentation or GoodData University for a further overview.

3. Connect to SQL gateway and import insights

The gooddata-fdw service exposes a Postgres database that allows you to import insights from GoodData.CN as foreign tables using the Foreign Data Wrapper technology and make them available to other data consumers. I recommend using a database manager to connect to the Postgres database; DBeaver was used for this demo.

The Postgres database is accessible with the following settings:

  • host: localhost
  • port: 2543
  • username: gooddata
  • password: gooddata123
  • database: gooddata

Execute the following command to create a connection to the GoodData.CN instance:

CREATE SERVER multicorn_gooddata FOREIGN DATA WRAPPER multicorn
  OPTIONS (
    wrapper 'gooddata_fdw.GoodDataForeignDataWrapper',
    host 'http://gooddata-cn-ce:3000',
    token 'YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz',
    headers_host 'localhost'
  );
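
To confirm that the foreign server was registered, you can, for example, query the Postgres system catalog (this check is not part of the original setup, just a quick sanity check):

-- list registered foreign servers and the options they were created with
SELECT srvname, srvoptions
FROM pg_foreign_server;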

To import insights into the Postgres database as foreign tables, execute the following command:

CALL import_gooddata(workspace := 'demo', object_type := 'insights');
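
Before querying the imported insights, you can optionally list the foreign tables that were created; the import should place them in a schema named after the workspace (demo in this case):

-- list foreign tables created by the import
SELECT foreign_table_schema, foreign_table_name
FROM information_schema.foreign_tables
WHERE foreign_table_schema = 'demo';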

After these steps, you should be able to access the insight revenue in time from the Postgres database. Let us try to query it!

SELECT * FROM gooddata.demo.revenue_in_time;

4. MindsDB

First, we need to add the data source we want to work with. We will add the Postgres database (the SQL gateway) set up in the previous step by executing the following command in the running MindsDB instance, which is accessible at http://127.0.0.1:47334/.

CREATE DATABASE fdw_gooddata
    WITH ENGINE = "postgres",
    PARAMETERS = {
        "user": "gooddata",
        "password": "gooddata123",
        "host": "gooddata-fdw", 
        "port": "5432",
        "database": "gooddata"
};

Let us check if the data source was added successfully.

SELECT * FROM fdw_gooddata.demo.revenue_in_time;

We will also connect the original data source that feeds GoodData, so that we can later import the predictions back into it.

CREATE DATABASE ds_gooddata
    WITH ENGINE = "postgres",
    PARAMETERS = {
        "user": "demouser",
        "password": "demopass",
        "host": "gooddata-cn-ce",
        "port": "5432",
        "database": "demo"
};
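
As an optional check (not part of the original steps), you can list the data sources connected to MindsDB; depending on your MindsDB version, listing the tables of a connected data source should work as well:

-- list data sources connected to MindsDB
SHOW DATABASES;

-- list tables visible through the new connection (syntax may vary by MindsDB version)
SHOW TABLES FROM ds_gooddata;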

Now we are ready to create a model for predicting revenue. We want to forecast revenue for each region at monthly granularity. The query below creates such a model.

CREATE PREDICTOR mindsdb.forecast_revenue
FROM fdw_gooddata
    (SELECT date_month, region, revenue FROM demo.revenue_in_time)
PREDICT revenue
ORDER BY date_month
GROUP BY region
WINDOW 50
HORIZON 12;

The command triggers the training of the model. Two keywords are worth explaining: WINDOW specifies the number of past rows the model "looks back" at when making a prediction, and HORIZON specifies the number of future values to predict. We can check the progress of training in the following table; wait until the predictor's status is complete.

SELECT *
FROM mindsdb.predictors
WHERE name = 'forecast_revenue';
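
Once the status is complete, you can optionally inspect the trained model in more detail; depending on your MindsDB version, a DESCRIBE statement along these lines should work:

-- show details of the trained predictor (output varies by MindsDB version)
DESCRIBE mindsdb.forecast_revenue;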

When training of the predictor is finished, let us show the revenue forecast for the South region.

SELECT m.date_month as date,
m.revenue as revenue
FROM mindsdb.forecast_revenue as m
JOIN fdw_gooddata.demo.revenue_in_time as t
WHERE t.date_month > LATEST AND t.region = 'South'
LIMIT 4;

As we can see, we receive a forecast for the South region. Let us visualize it back in GoodData. For that purpose, we will create a table with the predictions in the original data source "ds_gooddata".

CREATE OR REPLACE TABLE ds_gooddata.demo.forecast (
    SELECT m.date_month AS prediction_date_month,
           m.revenue AS prediction,
           m.region AS prediction_region,
           t.revenue AS revenue
    FROM mindsdb.forecast_revenue AS m
    JOIN fdw_gooddata.demo.revenue_in_time AS t
);
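
You can verify from within MindsDB that the table was written to the original data source (assuming it landed in the demo schema, as specified above):

-- peek at the newly created forecast table through the ds_gooddata connection
SELECT * FROM ds_gooddata.demo.forecast LIMIT 10;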

5. Prediction embedding

Our predictions are stored in the original data source. The original data source is accessible with the following settings:

  • host: localhost
  • port: 5432
  • username: demouser
  • password: demopass
  • database: demo

Let us check that the forecast table was created back in the original data source.

SELECT * FROM demo.demo.forecast f;

We can see that the structure of the imported predictions is not ideal. Let us create a view that we can use for visualization.

CREATE OR REPLACE VIEW demo.revenue_forecast AS
SELECT
    CAST(f.prediction_date_month AS DATE) AS prediction_date,
    f.prediction_region AS prediction_region,
    CAST(f.revenue AS FLOAT) AS revenue,
    CASE
        WHEN f.revenue IS NULL THEN f.prediction
        ELSE NULL
    END AS revenue_forecast
FROM
    demo.demo.forecast f
WHERE
    f.prediction_region IS NOT NULL;
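
As a quick sanity check, preview the view in the same Postgres connection:

-- preview the cleaned-up data that GoodData.CN will consume
SELECT * FROM demo.revenue_forecast LIMIT 10;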

6. Visualization of predictions in GoodData.CN

There is a Jupyter notebook, setup_data.ipynb, for setting up a new LDM (logical data model), creating an insight with the predictions, and updating a dashboard. Execute the code in the notebook, and you should see the following insight with predictions in GoodData.CN. The actual revenue is distinguished from the predicted revenue by color.

Visualization of past and predicted revenue