From 23d73b8a21d3760a5f6f88982bd8875575bf5d52 Mon Sep 17 00:00:00 2001
From: Vineet Kumar
Date: Wed, 13 Oct 2021 09:59:08 +0000
Subject: [PATCH] GitBook: [#59] v0.3

---
 README.md              |   1 -
 anomalies.md           |   1 -
 anomaly-definitions.md |  14 ++---
 anomaly-detection.md   |   1 -
 datasets.md            |   3 +-
 development.md         | 140 ++++++----------------------------------
 getting-started.md     |  16 +++--
 installation.md        |  38 +++++------
 root-cause-analysis.md |   7 +--
 settings.md            |  27 ++++++++
 sources.md             |   1 -
 why-cueobserve.md      |   1 -
 12 files changed, 81 insertions(+), 169 deletions(-)

diff --git a/README.md b/README.md
index e7cffa5..b970801 100644
--- a/README.md
+++ b/README.md
@@ -54,4 +54,3 @@ When an anomaly detection job runs, CueObserve does the following:
 ## Support
 
 For general help using CueLake, read the documentation, or go to [Github Discussions](https://github.com/cuebook/CueObserve/discussions). To report a bug or request a feature, open a [Github issue](https://github.com/cuebook/CueObserve/issues).
-
diff --git a/anomalies.md b/anomalies.md
index b267148..0faaf9e 100644
--- a/anomalies.md
+++ b/anomalies.md
@@ -13,4 +13,3 @@ Anomaly cards follow a template. If you want, you can modify the templates.
 ![Hourly Anomaly card](.gitbook/assets/anomalycard_hourly_cropped.png)
 
 ![Daily Anomaly card](.gitbook/assets/anomalycard_daily_cropped.png)
-
diff --git a/anomaly-definitions.md b/anomaly-definitions.md
index 7c7ff34..49a1137 100644
--- a/anomaly-definitions.md
+++ b/anomaly-definitions.md
@@ -6,14 +6,14 @@ To define an anomaly job, you
 
 1. Select a dataset
 2. Select a measure from the dataset
-3. Select a dimension to split the measure _\(optional\)_
+3. Select a dimension to split the measure _(optional)_
 4. Select an anomaly rule
 
 ![](.gitbook/assets/anomalydefinitions.png)
 
 ## Split Measure by Dimension
 
-`Measure` \[`Dimension` `Limit` \] \[`High/Low`\]
+`Measure` [`Dimension` `Limit`] [`High/Low`]
 
 To split a measure by a dimension, select the dimension and then limit the number of unique dimension values you want to split into.
 
@@ -47,7 +47,7 @@ Minimum Average Value limits the number of dimension values based on the measure
 
 ![](.gitbook/assets/minavgvalue.png)
 
-In the example above, only states where _average\(Orders\) >= 10_ will be selected. If your granularity is daily, this means daily average orders. If your granularity is hourly, this means hourly average orders.
+In the example above, only states where _average(Orders) >= 10_ will be selected. If your granularity is daily, this means daily average orders. If your granularity is hourly, this means hourly average orders.
 
 ## Anomaly Detection Algorithms
 
 CueObserve offers the following algorithms for anomaly detection.
 
@@ -60,9 +60,9 @@ CueObserve offers the following algorithms for anomaly detection.
 
 ### Prophet
 
-This algorithm uses the open-source [Prophet](https://github.com/facebook/prophet) procedure to generate a forecast for the timeseries. It then compares the actual value with the forecasted value. If the actual value is outside the forecast's confidence range \(_grey band in the image below_\), it marks the actual value as an anomalous data point.
+This algorithm uses the open-source [Prophet](https://github.com/facebook/prophet) procedure to generate a forecast for the timeseries. It then compares the actual value with the forecasted value. If the actual value is outside the forecast's confidence range (_grey band in the image below_), it marks the actual value as an anomalous data point.
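+
+For orientation, the band check can be sketched in a few lines of Python — the column names and the 95% interval width below are illustrative assumptions, not CueObserve's exact implementation:
+
+```python
+import pandas as pd
+from prophet import Prophet
+
+def flag_anomalies(df: pd.DataFrame) -> pd.DataFrame:
+    """Expects columns `ds` (timestamp) and `y` (measure value)."""
+    model = Prophet(interval_width=0.95)  # width of the grey confidence band
+    model.fit(df)
+    forecast = model.predict(df[["ds"]])  # yields yhat, yhat_lower, yhat_upper
+    out = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
+    # A point is anomalous when the actual value falls outside the band
+    out["is_anomaly"] = (out["y"] < out["yhat_lower"]) | (out["y"] > out["yhat_upper"])
+    return out
+```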
 
-The metric's percentage deviation \(_45% in the image below_\) is calculated with respect to the threshold of the forecast's confidence range.
+The metric's percentage deviation (_45% in the image below_) is calculated with respect to the threshold of the forecast's confidence range.
 
 ![](.gitbook/assets/anomalydeviation.png)
 
@@ -84,7 +84,5 @@ _Anomaly when Value greater than `X`_
 
 _Anomaly when Value not between `X` and `Y`_
 
-\_\_
-
-
diff --git a/anomaly-detection.md b/anomaly-detection.md
index f2b09a1..70ab82f 100644
--- a/anomaly-detection.md
+++ b/anomaly-detection.md
@@ -29,4 +29,3 @@ Next CueObserve combines the actual data with the forecasted data from Prophet a
 CueObserve saves the actual data with the bands and the forecast in its database. If the latest anomalous data point is not older than a certain time threshold, CueObserve publishes it as an anomaly and saves the dimension value and its contribution. The aforementioned time threshold depends on the granularity. It is 5 days if the granularity is daily and 1 day if the granularity is hourly.
 
 Finally, CueObserve stores all the individual results of the process along with the metadata in a format for easy visual representation in the UI.
-
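+
+The publish rule above can be read as a small predicate — the timedelta values come straight from the paragraph itself, while the function and argument names are illustrative:
+
+```python
+from datetime import datetime, timedelta
+
+# Maximum age for the latest anomalous point to still be published
+PUBLISH_THRESHOLDS = {"day": timedelta(days=5), "hour": timedelta(days=1)}
+
+def should_publish(latest_anomalous_ts: datetime, granularity: str) -> bool:
+    """Publish only if the latest anomalous data point is recent enough."""
+    return datetime.utcnow() - latest_anomalous_ts <= PUBLISH_THRESHOLDS[granularity]
+```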
diff --git a/datasets.md b/datasets.md
index 04519b1..95038a3 100644
--- a/datasets.md
+++ b/datasets.md
@@ -10,7 +10,7 @@ You write a SQL GROUP BY query with aggregate functions to roll-up your data. Yo
 
 1. Dataset must have only one timestamp column. This timestamp column is used to generate timeseries data for anomaly detection.
 2. Dataset must have at least one aggregate column. CueObserve currently supports only COUNT or SUM as aggregate functions. Aggregate columns must be mapped as measures.
-3. Dataset can have one or more dimension columns \(optional\).
+3. Dataset can have one or more dimension columns (optional).
 
 ## SQL GROUP BY Query
 
@@ -30,4 +30,3 @@
 ORDER BY 1
 ```
 
 Since the last time bucket might be partial, CueObserve ignores the last time bucket when generating timeseries.
-
diff --git a/development.md b/development.md
index 2f53425..a4f050b 100644
--- a/development.md
+++ b/development.md
@@ -9,64 +9,34 @@ description: >-
 
 ### Overview
 
-CueObserve has 5 basic components:
+CueObserve has a multi-service architecture, with the following services:
 
-1. Frontend single-page application written on [ReactJS](https://reactjs.org/).
-2. Backend based on [Django](https://www.djangoproject.com/) \(python framework\), which is responsible for the communication with the frontend application via REST APIs.
-3. [Celery](https://docs.celeryproject.org/) to execute the tasks asynchronously. Tasks like anomaly detection are handled by Celery.
-4. [Celery beat](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) scheduler to trigger the scheduled tasks.
-5. [Redis](https://redis.io/documentation) to handle the task queue of Celery.
+1. `Frontend` single-page application written in [ReactJS](https://reactjs.org). Its code is in the `ui` folder and it runs on [http://localhost:3000/](http://localhost:3000/).
+2. `API`, based on [Django](https://www.djangoproject.com) (a Python framework), which exposes REST APIs. It is the main service, responsible for connections, authentication and anomaly detection.
+3. `Alerts` microservice, currently responsible for sending alert notifications to Slack only. Its code is in the `alerts-api` folder and it runs on [localhost:8100](http://localhost:8100).
+4. [Celery](https://docs.celeryproject.org) to execute tasks asynchronously. Tasks like anomaly detection are handled by Celery.
+5. [Celery beat](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) scheduler to trigger the scheduled tasks.
+6. [Redis](https://redis.io/documentation) to handle the task queue of Celery.
 
-### Getting code
+### Getting code & starting development servers
 
 Get the code by cloning our open source [github repo](https://github.com/cuebook/cueobserve)
 
-```text
+```
 git clone https://github.com/cuebook/CueObserve.git
 cd CueObserve
+docker-compose -f docker-compose-dev.yml --env-file .env up --build
 ```
 
-### Frontend Development
-
-The code for frontend is in `/ui` directory. CueObserve uses `npm` as the package manager.
-
-**Prerequisites:**
-
-1. Node >= 12
-2. npm >= 6
-
-```bash
-cd ui
-npm install # install dependencies
-npm start # start development server
-```
-
-This starts the frontend server on [http://localhost:3000/](https://reactjs.org/)
+The `docker-compose` build will pull several components and install them locally, so it can take a few minutes to complete.
 
 ### Backend Development
 
 The code for the backend is in `/api` directory. As mentioned in the overview it is based on Django framework.
 
-**Prerequisite:**
-
-1. Python 3.7
-2. PostgreSQL Server running locally or on server \(Optional\)
-
-#### Setup Virtual Environment & Install Dependencies
-
-Setting up a virtual environment is necessary to have your python libraries for this project stored separately so that there is no conflict with other projects.
-
-```bash
-cd api
-python3 -m virtualenv myenv # Create Python3 virtual environment
-source myenv/bin/activate # Activate virtual environment
-
-pip install -r requirements.txt # Install project dependencies
-```
-
 #### Configure environment variables
 
-The environment variables required to run the backend server can be found in `api/.env.dev`. The file looks like below:
+Configure the environment variables needed by the backend server:
 
 ```bash
 export ENVIRONMENT=dev
@@ -84,97 +54,23 @@ export DJANGO_SUPERUSER_PASSWORD="admin"
 export DJANGO_SUPERUSER_EMAIL="admin@domain.com"
 
 ## AUTHENTICATION
 export IS_AUTHENTICATION_REQUIRED=False
 ```
 
 Change the values based on your running PostgreSQL instance. If you do not wish to use PostgreSQL as your database for development, comment lines 4-8 and CueObserve will create a SQLite database file at the location `api/db/db.sqlite3`.
 
-After changing the values, source the file to initialize all the environment variables.
-
-```text
-source .env.dev
-```
-
-Then run the following commands to migrate the schema to your database and load static data required by CueObserve:
-
-```bash
-python manage.py migrate # Migrate db schema
-python manage.py loaddata seeddata/*.json # Load seed data in database
-```
-
-After the above steps are completed successfully, we can start our backend server by running:
-
-```text
-python manage.py runserver
-```
-
-This starts the backend server on [http://localhost:8000/](https://reactjs.org/).
+The backend server can be accessed on [http://localhost:8000/](http://localhost:8000/).
 
 #### Celery Development
 
-CueObserve uses Celery for executing asynchronous tasks like anomaly detection. There are three components needed to run an asynchronous task, i.e. Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so before starting Celery services, Redis server should be running. Celery Beat is used as the scheduler and is responsible to trigger the scheduled tasks.
-Celery workers are used to execute the tasks.
-
-**Starting Redis Server**
-
-Redis server can be easily started by its official docker image.
-
-```bash
-docker run -dp 6379:6379 redis # Run redis docker on port 6379
-```
-
-#### Start Celery Beat
-
-To start celery beat service, activate the virtual environment created for the backend server and then source the .env.dev file to export all required environment variables.
-
-```bash
-cd api
-source myenv/bin/activate # Activate virtual environment
-source .env.dev # Export environment variables.
-celery -A app beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach # Run celery beat service
-```
-
-#### Start Celery
-
-To start the celery service, its same as backend or celery beat, first activate the virual env created and then source .env.dev file to export all required environment variables. Celery service doesn't reloads on code changes so we have to install some additional libraries to make it happen.
-
-```text
-cd api
-source myenv/bin/activate # Activate virtual environment
-source .env.dev # Export environment variables
-
-pip install watchdog pyyaml argh # Additional libraries to reload celery on code changes
-watchmedo auto-restart -- celery -A app worker -l info --purge # Run celery
-```
-
-After these three services are running, you can trigger a task or wait for a scheduled task to run.
-
-### Building Docker Image
-
-To build the docker image, run the following command in root directory:
-
-```text
-docker build -t .
-```
-
-To run the built image exposed on port 3000:
-
-```text
-docker run -dp 3000:3000 
-```
+CueObserve uses Celery for executing asynchronous tasks like anomaly detection. There are three components needed to run an asynchronous task, i.e. Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so before starting the Celery services, the Redis server should be running. Celery Beat is used as the scheduler and is responsible for triggering the scheduled tasks. Celery workers are used to execute the tasks.
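+
+For orientation, a task in this stack is just a decorated Python function — the sketch below is illustrative (the task name and its body are hypothetical, not CueObserve's actual code):
+
+```python
+from celery import shared_task
+
+@shared_task
+def run_anomaly_detection(anomaly_definition_id: int) -> None:
+    """Executed by a Celery worker; Celery beat triggers it on a schedule."""
+    # load the dataset, build the timeseries, run the detection algorithm ...
+```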
 
 ### Testing
 
 At the moment, we have test cases only for the backend service, test cases for UI are in our roadmap.
 
-Backend test environment is light and doesn't depend on services like Redis, Celery or Celery-Beat, they are mocked instead. Backend for API and services is tested using [PyTest](https://docs.pytest.org/en/6.2.x/).
-
- To run the test cases virtual environment should be activated and then source .env.dev file to export all required environment variables.
-
-```text
-cd api
-source myenv/bin/activate # Activate virtual environment
-source .env.dev # Export environment variables
-
-pytest # Run tests
-```
+Backend for API and services is tested using [PyTest](https://docs.pytest.org/en/6.2.x/).
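+
+A backend test might look like the minimal sketch below (assuming the `pytest-django` plugin; the endpoint path is hypothetical):
+
+```python
+import pytest
+
+@pytest.mark.django_db
+def test_health_endpoint(client):
+    # `client` is the Django test client fixture provided by pytest-django
+    response = client.get("/api/health/")
+    assert response.status_code == 200
+```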
+To run the test cases, `exec` into the `cueo-backend` container and run:
 
+```
+pytest
+```
diff --git a/getting-started.md b/getting-started.md
index b620368..ff3510c 100644
--- a/getting-started.md
+++ b/getting-started.md
@@ -2,21 +2,26 @@
 
 ## Install via Docker-Compose
 
+Download the docker-compose file and start the services:
+
+```
+wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
+docker-compose -f cueobserve-docker-compose.yml up -d
+```
+
 **Development Mode:**
 
-```text
+```
 docker-compose -f docker-compose-dev.yml up -d
 ```
 
 **OR Production Mode:**
 
-```text
+```
 docker-compose up -d
 ```
 
-**OR** Install via Docker **\(Deprecated Method\)**
+**OR** Install via Docker **(Deprecated Method)**
 
-```text
+```
 docker run -p 3000:3000 cuebook/cueobserve
 ```
 
@@ -26,7 +31,7 @@ Now visit [localhost:3000](http://localhost:3000) in your browser.
 
 Go to the Connections screen to create a connection.
 
-![](.gitbook/assets/addconnection%20%281%29.png)
+![](<.gitbook/assets/addconnection (1).png>)
 
 ## Add Dataset
 
@@ -41,4 +46,3 @@ Once you have created an anomaly job, click on the \`Run\` icon button to trigge
 ![](.gitbook/assets/anomalydefinitions.png)
 
 Once the job is successful, go to the Anomalies screen to view your anomalies.
-
diff --git a/installation.md b/installation.md
index 087f5f7..d796f02 100644
--- a/installation.md
+++ b/installation.md
@@ -2,25 +2,20 @@
 
 ## Install via Docker
 
-```text
-docker run -p 3000:3000 cuebook/cueobserve
+```
+wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
+docker-compose -f cueobserve-docker-compose.yml up -d
 ```
 
 Now visit [localhost:3000](http://localhost:3000) in your browser.
 
-By default, CueObserve uses sqlite as its database \(not recommended for production use, please refer below to use Postgres as the database for CueObserve\). If you want data to persist across runs, specify a local folder location \(as below\) where db.sqlite3 file can be stored.
-
-```text
-docker run -v :/code/db -p 3000:3000 cuebook/cueobserve
-```
-
 ## Use Postgres as the application database
 
 SQLite is the default storage database for CueObserve. However, it might not be suitable for production. To use Postgres instead, do the following:
 
 Create a `.env` file with given variables:
 
-```text
+```
 POSTGRES_DB_SCHEMA=cueobserve
 POSTGRES_DB_USERNAME=postgres
 POSTGRES_DB_PASSWORD=postgres
@@ -28,40 +23,40 @@ POSTGRES_DB_HOST=localhost
 POSTGRES_DB_PORT=5432
 ```
 
-```text
-docker run --env-file .env -dp 3000:3000 cuebook/cueobserve
-```
-
-In case your Postgres is hosted locally, pass the flag `--network="host"` to connect docker to the localhost of the machine.
+```
+wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
+docker-compose --env-file .env -f cueobserve-docker-compose.yml up -d
+```
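+
+For orientation, these variables typically feed Django's database settings roughly as sketched below — the exact wiring in CueObserve's settings may differ:
+
+```python
+import os
+
+DATABASES = {
+    "default": {
+        "ENGINE": "django.db.backends.postgresql",
+        "NAME": os.environ.get("POSTGRES_DB_SCHEMA", "cueobserve"),
+        "USER": os.environ.get("POSTGRES_DB_USERNAME", "postgres"),
+        "PASSWORD": os.environ.get("POSTGRES_DB_PASSWORD", "postgres"),
+        "HOST": os.environ.get("POSTGRES_DB_HOST", "localhost"),
+        "PORT": os.environ.get("POSTGRES_DB_PORT", "5432"),
+    }
+}
+```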
 
 ## Authentication
 
-CueObserve comes with built-in authentication \(powered by Django\). By default authentication is disabled, to enable authentication create a `.env` file with the given variables or add these variables in the already created `.env` file with Postgres credentials.
+CueObserve comes with built-in authentication (powered by Django). By default, authentication is disabled. To enable it, create a `.env` file with the given variables, or add these variables to the `.env` file already created with the Postgres credentials.
 
-```text
+```
 DJANGO_SUPERUSER_USERNAME=
 DJANGO_SUPERUSER_PASSWORD=
 DJANGO_SUPERUSER_EMAIL=
 IS_AUTHENTICATION_REQUIRED=True
 ```
 
-```text
-docker run --env-file .env -dp 3000:3000 cuebook/cueobserve
-```
+```
+wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
+docker-compose --env-file .env -f cueobserve-docker-compose.yml up -d
+```
 
 If authentication is enabled you can access the [Django Admin](https://docs.djangoproject.com/en/3.2/ref/contrib/admin/) console to do the database operations with a nice UI. To access Django Admin go to [http://localhost:3000/admin](http://localhost:3000/admin) and enter the username and password provided in the `.env` file.
 
 ## Email Notification
 
-CueObserve comes with built-in email alert notification system\(powered by Django\). By default email notifications are disabled, to enable notifications create a `.env` file with the given variables or add these variables in the already created `.env` file.
+CueObserve comes with a built-in email alert notification system (powered by Django). By default, email notifications are disabled. To enable them, create a `.env` file with the given variables, or add these variables to the already created `.env` file.
 
-```text
+```
 EMAIL_HOST="smtp.gmail.com"
 EMAIL_HOST_USER=
 EMAIL_HOST_PASSWORD=
 ```
 
-Allow less secure apps: ON for your given EMAIL\_HOST\_USER email Id, click on [enable access to less secure app](https://myaccount.google.com/lesssecureapps?pli=1&rapt=AEjHL4N7wse3vhCsvRv-aWy8kKeEGDZS2YDbW1SfTL17HVhtemi7zZW5gzbZSBnrNgknL_gPBDn3xVo0qUj-W6NuaYTSU7agQQ)
+Turn "Allow less secure apps" ON for the given EMAIL_HOST_USER email id: click [enable access to less secure apps](https://myaccount.google.com/lesssecureapps?pli=1\&rapt=AEjHL4N7wse3vhCsvRv-aWy8kKeEGDZS2YDbW1SfTL17HVhtemi7zZW5gzbZSBnrNgknL_gPBDn3xVo0qUj-W6NuaYTSU7agQQ)
 
 Unlock Captcha for your gmail account, click on [Unlock Captcha](https://accounts.google.com/b/0/UnlockCaptcha)
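+
+Once these variables are set, alerts go out through Django's standard mail API — a minimal sketch (the recipient address is illustrative; CueObserve's own alert code may differ):
+
+```python
+from django.core.mail import send_mail
+
+# Uses EMAIL_HOST, EMAIL_HOST_USER and EMAIL_HOST_PASSWORD from settings
+send_mail(
+    subject="CueObserve anomaly alert",
+    message="An anomaly was detected in your dataset.",
+    from_email=None,  # falls back to DEFAULT_FROM_EMAIL
+    recipient_list=["you@example.com"],
+)
+```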
 
 ## Infra Requirements
 
-The minimum infrastructure requirement for CueObserve is _1 GB RAM/ 1 CPU_. If Multiple CPUs\(cores\) are provided, they can be utilized by tasks like Anomaly Detection & Root Cause Analysis for faster processing.
-
+The minimum infrastructure requirement for CueObserve is _1 GB RAM / 1 CPU_. If multiple CPUs (cores) are provided, they can be utilized by tasks like Anomaly Detection & Root Cause Analysis for faster processing.
diff --git a/root-cause-analysis.md b/root-cause-analysis.md
index dbf66db..59bbb84 100644
--- a/root-cause-analysis.md
+++ b/root-cause-analysis.md
@@ -8,7 +8,7 @@ To do root cause analysis on an anomaly card, click the `Analyze` button in the
 
 CueObserve picks the latest anomalous data point as the parent anomaly. It then starts looking for child anomalies across all other dimensions in the dataset.
 
-Say you an anomaly card \(below\) where Orders measure for state = TX had an anomaly on 2021-08-15. The actual number of orders was 1770, which was 45% higher than the expected value. When I click the `Analyze` button, CueObserve starts analyzing the dataset.
+Say you have an anomaly card (below) where the Orders measure for state = TX had an anomaly on 2021-08-15. The actual number of orders was 1770, which was 45% higher than the expected value. When you click the `Analyze` button, CueObserve starts analyzing the dataset.
 
 ![](.gitbook/assets/rca_analyze.png)
 
@@ -24,7 +24,7 @@ The orders dataset has 2 additional dimensions - _Brand_ and _Color_. It splits
 
 In the RCA results table, each child anomaly appears as a row.
 
-In the example above, the anomalous segment of _Brand = None_ is equivalent to the dataset filter of _\(state = TX and Brand = None\)_.
+In the example above, the anomalous segment of _Brand = None_ is equivalent to the dataset filter of _(state = TX and Brand = None)_.
 
 The contribution percentages displayed are with respect to the parent anomaly. Remember the parent anomaly had Orders as 1770.
 
@@ -32,7 +32,7 @@ The contribution percentages displayed are with respect to the parent anomaly. R
 
 Under the hood, CueObserve does the following:
 
-It takes the latest anomalous data point from the card. The anomalous data point is defined by its \(X, Y\) values:
+It takes the latest anomalous data point from the card. The anomalous data point is defined by its (X, Y) values:
 
 * X value: time period
 * Y value: measure for a dimension value. e.g. Orders where state = TX
 
 It applies the dimension value filter on the original dataset. It then runs anomaly detection jobs on every other dimension in this filtered dataset. It splits the filtered dataset by each dimension. It inherits the split limit from the original anomaly definition. If the original anomaly definition doesn't have a split, it limits the split to dimension values that have a minimum contribution of 1% to the filtered measure value.
-
diff --git a/settings.md b/settings.md
index 3eee75d..356f1ee 100644
--- a/settings.md
+++ b/settings.md
@@ -27,5 +27,32 @@ Next, create two channels in Slack. Add the app to these two channels.
 
 ![](.gitbook/assets/screenshot-from-2021-08-26-17-52-09.png)
 
+## Webhook URL
+
+CueObserve supports a Webhook URL for receiving alert messages. There are two types of alerts:
+
+1. Anomaly alerts, which are sent when an anomaly is detected in the data. The payload carries JSON data (as below) plus a _base64_-encoded image.
+
+   ```
+   {
+       "subject": subject,
+       "message": message,
+       "details": details,
+       "AnomalyDefinitionId": anomalyDefId,
+       "AnomalyId": anomalyId,
+   }
+   ```
+
+2. App Monitoring alerts, which are sent when an anomaly detection job fails. The payload carries JSON data formatted as:
+
+   ```
+   {
+       "subject": subject,
+       "message": message
+   }
+   ```
+
+To subscribe to these alerts, configure your Webhook URL on the CueObserve _Settings_ screen.
diff --git a/sources.md b/sources.md
index 205331b..4eb4bed 100644
--- a/sources.md
+++ b/sources.md
@@ -122,4 +122,3 @@ WHERE CreatedTS >= DATEADD(DAY, -400, cast(GETDATE() as date)) -- limit historic
 GROUP BY format(CreatedTS,'yyyy-MM-dd 00:00:00'), City, State
 ORDER BY 1
 ```
-
diff --git a/why-cueobserve.md b/why-cueobserve.md
index 1f47647..674eaaa 100644
--- a/why-cueobserve.md
+++ b/why-cueobserve.md
@@ -19,4 +19,3 @@ Since the data consumer can choose which metrics to monitor and how deep to moni
 ## One-click Root Cause Analysis
 
 Data consumers can do root cause analysis on anomalies with just one click. This reduces time to action.
-