Merge pull request #214 from cuebook/master
Merging updated gitbook documentation
vincue authored Dec 23, 2021
2 parents eb995e4 + ba28e1e commit a305178
Showing 11 changed files with 81 additions and 168 deletions.
1 change: 0 additions & 1 deletion anomalies.md
@@ -13,4 +13,3 @@ Anomaly cards follow a template. If you want, you can modify the templates.
![Hourly Anomaly card](.gitbook/assets/anomalycard_hourly_cropped.png)

![Daily Anomaly card](.gitbook/assets/anomalycard_daily_cropped.png)

14 changes: 6 additions & 8 deletions anomaly-definitions.md
@@ -6,14 +6,14 @@ To define an anomaly job, you

1. Select a dataset
2. Select a measure from the dataset
3. Select a dimension to split the measure _\(optional\)_
3. Select a dimension to split the measure _(optional)_
4. Select an anomaly rule

![](.gitbook/assets/anomalydefinitions.png)

## Split Measure by Dimension

`Measure` \[`Dimension` `Limit` \] \[`High/Low`\]
`Measure` \[`Dimension` `Limit` ] \[`High/Low`]

To split a measure by a dimension, select the dimension and then limit the number of unique dimension values you want to split into.
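
For illustration only, the split could be sketched in Python roughly as below. This is a hedged example, not CueObserve's implementation; the DataFrame, column names and variable names are assumed. With `Limit 2` and `High`, the two states with the highest total Orders would each get their own timeseries.

```python
# Hedged sketch of "split measure by dimension" (illustrative, not CueObserve's code).
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-12-01"] * 3 + ["2021-12-02"] * 3),
    "state": ["CA", "NY", "TX"] * 2,      # dimension
    "Orders": [30, 20, 5, 25, 22, 7],     # measure
})

limit, direction = 2, "High"              # Limit and High/Low from the definition
totals = df.groupby("state")["Orders"].sum()
top_values = totals.nlargest(limit) if direction == "High" else totals.nsmallest(limit)

# One timeseries per selected dimension value.
split = {
    state: df[df["state"] == state].set_index("timestamp")["Orders"]
    for state in top_values.index
}
print(list(split))  # ['CA', 'NY']
```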

@@ -47,7 +47,7 @@ Minimum Average Value limits the number of dimension values based on the measure

![](.gitbook/assets/minavgvalue.png)

In the example above, only states where _average\(Orders\) >= 10_ will be selected. If your granularity is daily, this means daily average orders. If your granularity is hourly, this means hourly average orders.
In the example above, only states where _average(Orders) >= 10_ will be selected. If your granularity is daily, this means daily average orders. If your granularity is hourly, this means hourly average orders.
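
A minimal sketch of that filter in pandas, assuming the rolled-up data is available as a DataFrame (the column names and threshold are illustrative, not CueObserve's actual code):

```python
# Hedged sketch of the "Minimum Average Value" filter (illustrative only).
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-12-01", "2021-12-01", "2021-12-02", "2021-12-02"]),
    "state": ["CA", "NY", "CA", "NY"],
    "Orders": [12, 4, 18, 6],
})

min_avg_value = 10  # Minimum Average Value from the anomaly definition

# Average the measure per dimension value over the timeseries ...
avg_per_state = df.groupby("state")["Orders"].mean()
# ... and keep only the dimension values that clear the threshold.
selected = avg_per_state[avg_per_state >= min_avg_value].index
filtered = df[df["state"].isin(selected)]
print(filtered)  # only CA rows remain (average 15 >= 10)
```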

## Anomaly Detection Algorithms

@@ -60,9 +60,9 @@ CueObserve offers the following algorithms for anomaly detection.

### Prophet

This algorithm uses the open-source [Prophet](https://github.com/facebook/prophet) procedure to generate a forecast for the timeseries. It then compares the actual value with the forecasted value. If the actual value is outside the forecast's confidence range \(_grey band in the image below_\), it marks the actual value as an anomalous data point.
This algorithm uses the open-source [Prophet](https://github.com/facebook/prophet) procedure to generate a forecast for the timeseries. It then compares the actual value with the forecasted value. If the actual value is outside the forecast's confidence range (_grey band in the image below_), it marks the actual value as an anomalous data point.

The metric's percentage deviation \(_45% in the image below_\) is calculated with respect to the threshold of the forecast's confidence range.
The metric's percentage deviation (_45% in the image below_) is calculated with respect to the threshold of the forecast's confidence range.

![](.gitbook/assets/anomalydeviation.png)
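
A rough sketch of this check using the Prophet package is shown below. It is an assumption-laden illustration: the `ds`/`y` column names follow Prophet's convention, `interval_width` stands in for the confidence band, and the deviation formula is assumed rather than taken from CueObserve's source.

```python
# Hedged sketch of a Prophet-based anomaly check (not CueObserve's actual code).
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2021-11-01", periods=60, freq="D"),  # timestamps
    "y": range(60),                                           # measure values
})

model = Prophet(interval_width=0.95)    # width of the grey confidence band
model.fit(df)
forecast = model.predict(df[["ds"]])    # yhat, yhat_lower, yhat_upper per point

merged = df.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
latest = merged.iloc[-1]

# A value outside the band is treated as anomalous; the percentage deviation is
# measured against the band edge it crossed (formula assumed for illustration).
if latest["y"] > latest["yhat_upper"]:
    deviation = 100 * (latest["y"] - latest["yhat_upper"]) / latest["yhat_upper"]
elif latest["y"] < latest["yhat_lower"]:
    deviation = 100 * (latest["yhat_lower"] - latest["y"]) / latest["yhat_lower"]
else:
    deviation = 0.0
print(f"deviation vs. confidence band: {deviation:.0f}%")
```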

@@ -84,7 +84,5 @@ _Anomaly when Value greater than `X`_

_Anomaly when Value not between `X` and `Y`_

\_\_


__

1 change: 0 additions & 1 deletion anomaly-detection.md
@@ -29,4 +29,3 @@ Next CueObserve combines the actual data with the forecasted data from Prophet a
CueObserve saves the actual data with the bands and the forecast in its database. If the latest anomalous data point is not older than a certain time threshold, CueObserve publishes it as an anomaly and saves the dimension value and its contribution. The aforementioned time threshold depends on the granularity. It is 5 days if the granularity is daily and 1 day if the granularity is hourly.
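
A hedged sketch of that recency check (the function name and the use of UTC are assumptions made for illustration):

```python
# Illustrative sketch: publish an anomaly only if its latest point is recent enough.
from datetime import datetime, timedelta

def is_publishable(latest_anomalous_ts: datetime, granularity: str) -> bool:
    threshold = timedelta(days=5) if granularity == "daily" else timedelta(days=1)
    return datetime.utcnow() - latest_anomalous_ts <= threshold

print(is_publishable(datetime.utcnow() - timedelta(days=2), "daily"))   # True
print(is_publishable(datetime.utcnow() - timedelta(days=2), "hourly"))  # False
```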

Finally, CueObserve stores all the individual results of the process along with the metadata in a format for easy visual representation in the UI.

3 changes: 1 addition & 2 deletions datasets.md
@@ -10,7 +10,7 @@ You write a SQL GROUP BY query with aggregate functions to roll-up your data. Yo

1. Dataset must have only one timestamp column. This timestamp column is used to generate timeseries data for anomaly detection.
2. Dataset must have at least one aggregate column. CueObserve currently supports only COUNT or SUM as aggregate functions. Aggregate columns must be mapped as measures.
3. Dataset can have one or more dimension columns \(optional\).
3. Dataset can have one or more dimension columns (optional).

## SQL GROUP BY Query

@@ -30,4 +30,3 @@ ORDER BY 1
```

Since the last time bucket might be partial, CueObserve ignores the last time bucket when generating timeseries.
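
For illustration, the roll-up and the partial-bucket handling might look like this in pandas (a sketch with assumed column names, not the actual implementation):

```python
# Hedged sketch: roll raw rows up into a daily timeseries and drop the last,
# possibly partial, time bucket.
import pandas as pd

raw = pd.DataFrame({
    "CreatedAt": pd.to_datetime(
        ["2021-12-01 03:00", "2021-12-01 18:00", "2021-12-02 09:00", "2021-12-03 01:00"]
    ),
    "Orders": [1, 1, 1, 1],
})

timeseries = (
    raw.set_index("CreatedAt")["Orders"]
       .resample("D").sum()   # GROUP BY day, SUM(Orders)
       .iloc[:-1]             # ignore the last bucket: it may be partial
)
print(timeseries)
```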

140 changes: 18 additions & 122 deletions development.md
@@ -9,64 +9,34 @@ description: >-

### Overview

CueObserve has 5 basic components:
CueObserve has a multi-service architecture, with the following services:

1. Frontend single-page application written on [ReactJS](https://reactjs.org/).
2. Backend based on [Django](https://www.djangoproject.com/) \(python framework\), which is responsible for the communication with the frontend application via REST APIs.
3. [Celery](https://docs.celeryproject.org/) to execute the tasks asynchronously. Tasks like anomaly detection are handled by Celery.
4. [Celery beat](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) scheduler to trigger the scheduled tasks.
5. [Redis](https://redis.io/documentation) to handle the task queue of Celery.
1. `Frontend` is a single-page application written in [ReactJS](https://reactjs.org). Its code lives in the `ui` folder and runs on [http://localhost:3000/](http://localhost:3000/).
2. `API` is based on [Django](https://www.djangoproject.com) (a Python framework) and exposes REST APIs. It is the main service, responsible for connections, authentication and anomalies.
3. `Alerts` is a microservice, currently responsible for sending alerts/notifications to Slack only. Its code is in the `alerts-api` folder and runs on [localhost:8100](http://localhost:8100).
4. [Celery](https://docs.celeryproject.org) to execute the tasks asynchronously. Tasks like anomaly detection are handled by Celery.
5. [Celery beat](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) scheduler to trigger the scheduled tasks.
6. [Redis](https://redis.io/documentation) to handle the task queue of Celery.

### Getting code
### Getting code & starting development servers

Get the code by cloning our open source [github repo](https://github.com/cuebook/cueobserve)

```text
```
git clone https://github.com/cuebook/CueObserve.git
cd CueObserve
docker-compose -f docker-compose-dev.yml --env-file .env up --build
```

### Frontend Development

The code for frontend is in `/ui` directory. CueObserve uses `npm` as the package manager.

**Prerequisites:**

1. Node >= 12
2. npm >= 6

```bash
cd ui
npm install # install dependencies
npm start # start development server
```

This starts the frontend server on [http://localhost:3000/](https://reactjs.org/)
The `docker-compose` build command pulls several components and installs them locally, so it will take a few minutes to complete.

### Backend Development

The code for the backend is in `/api` directory. As mentioned in the overview it is based on Django framework.

**Prerequisite:**

1. Python 3.7
2. PostgreSQL Server running locally or on server \(Optional\)

#### Setup Virtual Environment & Install Dependencies

Setting up a virtual environment is necessary to have your python libraries for this project stored separately so that there is no conflict with other projects.

```bash
cd api
python3 -m virtualenv myenv # Create Python3 virtual environment
source myenv/bin/activate # Activate virtual environment

pip install -r requirements.txt # Install project dependencies
```

#### Configure environment variables

The environment variables required to run the backend server can be found in `api/.env.dev`. The file looks like below:
Configure the environment variables as needed for the backend server:

```bash
export ENVIRONMENT=dev
@@ -84,97 +84,23 @@ export DJANGO_SUPERUSER_PASSWORD="admin"
export DJANGO_SUPERUSER_EMAIL="[email protected]"

## AUTHENTICATION
export IS_AUTHENTICATION_REQUIRED=False
export IS_AUTHENTICATION_REQUIRED=False
```

Change the values based on your running PostgreSQL instance. If you do not wish to use PostgreSQL as your database for development, comment lines 4-8 and CueObserve will create a SQLite database file at the location `api/db/db.sqlite3`.

After changing the values, source the file to initialize all the environment variables.

```text
source .env.dev
```

Then run the following commands to migrate the schema to your database and load static data required by CueObserve:

```bash
python manage.py migrate # Migrate db schema
python manage.py loaddata seeddata/*.json # Load seed data in database
```

After the above steps are completed successfully, we can start our backend server by running:

```text
python manage.py runserver
```

This starts the backend server on [http://localhost:8000/](https://reactjs.org/).
The backend server can be accessed at [http://localhost:8000/](http://localhost:8000/).

#### Celery Development

CueObserve uses Celery for executing asynchronous tasks like anomaly detection. There are three components needed to run an asynchronous task, i.e. Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so before starting Celery services, Redis server should be running. Celery Beat is used as the scheduler and is responsible to trigger the scheduled tasks. Celery workers are used to execute the tasks.

**Starting Redis Server**

Redis server can be easily started by its official docker image.

```bash
docker run -dp 6379:6379 redis # Run redis docker on port 6379
```

#### Start Celery Beat

To start celery beat service, activate the virtual environment created for the backend server and then source the .env.dev file to export all required environment variables.

```bash
cd api
source myenv/bin/activate # Activate virtual environment
source .env.dev # Export environment variables.
celery -A app beat -l info --scheduler django_celery_beat.schedulers:DatabaseScheduler --detach # Run celery beat service
```

#### Start Celery

To start the celery service, its same as backend or celery beat, first activate the virual env created and then source .env.dev file to export all required environment variables. Celery service doesn't reloads on code changes so we have to install some additional libraries to make it happen.

```text
cd api
source myenv/bin/activate # Activate virtual environment
source .env.dev # Export environment variables
pip install watchdog pyyaml argh # Additional libraries to reload celery on code changes
watchmedo auto-restart -- celery -A app worker -l info --purge # Run celery
```

After these three services are running, you can trigger a task or wait for a scheduled task to run.

### Building Docker Image

To build the docker image, run the following command in root directory:

```text
docker build -t <YOUR_TAG_NAME> .
```

To run the built image exposed on port 3000:

```text
docker run -dp 3000:3000 <YOUR_TAG_NAME>
```
CueObserve uses Celery to execute asynchronous tasks like anomaly detection. Three components are needed to run an asynchronous task: Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so the Redis server should be running before the Celery services are started. Celery Beat is the scheduler, responsible for triggering scheduled tasks. Celery workers execute the tasks.
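
A minimal sketch of how such a Celery task and its Beat schedule could be wired up, assuming Redis on its default port; the task name and schedule below are illustrative, not CueObserve's actual definitions.

```python
# Hedged sketch of the Celery wiring described above (illustrative only).
from celery import Celery

app = Celery("app", broker="redis://localhost:6379/0")  # Redis as the task queue

@app.task
def run_anomaly_detection(anomaly_definition_id: int) -> None:
    # Placeholder body; a real task would fetch data and run detection.
    print(f"detecting anomalies for definition {anomaly_definition_id}")

# Celery Beat entry that triggers the task on a schedule (a plain interval in
# seconds here; crontab-style schedules are also possible).
app.conf.beat_schedule = {
    "hourly-anomaly-detection": {
        "task": run_anomaly_detection.name,
        "schedule": 3600.0,
        "args": (1,),
    }
}
```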

### Testing

At the moment, we have test cases only for the backend service; test cases for the UI are on our roadmap.

The backend test environment is light and doesn't depend on services like Redis, Celery or Celery Beat; they are mocked instead. The backend API and services are tested using [PyTest](https://docs.pytest.org/en/6.2.x/).

To run the test cases virtual environment should be activated and then source .env.dev file to export all required environment variables.
The backend API and services are tested using [PyTest](https://docs.pytest.org/en/6.2.x/). To run the test cases, `exec` into the `cueo-backend` container and run:

```text
cd api
source myenv/bin/activate # Activate virtual environment
source .env.dev # Export environment variables
pytest # Run tests
```

pytest
```
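
For reference, a PyTest test in this spirit might look like the sketch below; the function under test and the test names are made up for illustration and are not part of the CueObserve test suite.

```python
# Hedged sketch of a PyTest-style test (illustrative, not from the repo).
import pytest

def percentage_deviation(value: float, band_edge: float) -> float:
    # Toy stand-in for a service function under test.
    if band_edge == 0:
        raise ValueError("band edge must be non-zero")
    return 100 * (value - band_edge) / band_edge

def test_deviation_above_band():
    assert percentage_deviation(145, 100) == pytest.approx(45.0)

def test_zero_band_edge_raises():
    with pytest.raises(ValueError):
        percentage_deviation(10, 0)
```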
16 changes: 10 additions & 6 deletions getting-started.md
@@ -2,21 +2,26 @@

## Install via Docker-Compose

```
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
docker-compose -f cueobserve-docker-compose.yml up -d
```

**Development Mode:**

```text
```
docker-compose -f docker-compose-dev.yml up -d
```

**OR Production Mode:**

```text
```
docker-compose up -d
```

**OR** Install via Docker **\(Deprecated Method\)**
**OR** Install via Docker **(Deprecated Method)**

```text
```
docker run -p 3000:3000 cuebook/cueobserve
```

@@ -26,7 +31,7 @@ Now visit [localhost:3000](http://localhost:3000) in your browser.

Go to the Connections screen to create a connection.

![](.gitbook/assets/addconnection%20%281%29.png)
![](<.gitbook/assets/addconnection (1).png>)

## Add Dataset

@@ -41,4 +46,3 @@ Once you have created an anomaly job, click on the \`Run\` icon button to trigger
![](.gitbook/assets/anomalydefinitions.png)

Once the job is successful, go to the Anomalies screen to view your anomalies.

38 changes: 16 additions & 22 deletions installation.md
@@ -2,72 +2,66 @@

## Install via Docker

```text
docker run -p 3000:3000 cuebook/cueobserve
```
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
docker-compose -f cueobserve-docker-compose.yml up -d
```

Now visit [localhost:3000](http://localhost:3000) in your browser.

By default, CueObserve uses sqlite as its database \(not recommended for production use, please refer below to use Postgres as the database for CueObserve\). If you want data to persist across runs, specify a local folder location \(as below\) where db.sqlite3 file can be stored.

```text
docker run -v <local folder location>:/code/db -p 3000:3000 cuebook/cueobserve
```

## Use Postgres as the application database

SQLite is the default storage database for CueObserve. However, it might not be suitable for production. To use Postgres instead, do the following:

Create a `.env` file with given variables:

```text
```
POSTGRES_DB_SCHEMA=cueobserve
POSTGRES_DB_USERNAME=postgres
POSTGRES_DB_PASSWORD=postgres
POSTGRES_DB_HOST=localhost
POSTGRES_DB_PORT=5432
```

```text
docker run --env-file .env -dp 3000:3000 cuebook/cueobserve
```

In case your Postgres is hosted locally, pass the flag `--network="host"` to connect docker to the localhost of the machine.
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
docker-compose --env-file .env -f cueobserve-docker-compose.yml up -d
```

## Authentication

CueObserve comes with built-in authentication \(powered by Django\). By default authentication is disabled, to enable authentication create a `.env` file with the given variables or add these variables in the already created `.env` file with Postgres credentials.
CueObserve comes with built-in authentication (powered by Django). By default, authentication is disabled; to enable it, create a `.env` file with the variables given below, or add these variables to the `.env` file already created with the Postgres credentials.

```text
```
DJANGO_SUPERUSER_USERNAME=<USER_NAME>
DJANGO_SUPERUSER_PASSWORD=<PASSWORD>
DJANGO_SUPERUSER_EMAIL=<[email protected]>
IS_AUTHENTICATION_REQUIRED=True
```

```text
docker run --env-file .env -dp 3000:3000 cuebook/cueobserve
```
wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
docker-compose --env-file .env -f cueobserve-docker-compose.yml up -d
```

If authentication is enabled you can access the [Django Admin](https://docs.djangoproject.com/en/3.2/ref/contrib/admin/) console to do the database operations with a nice UI. To access Django Admin go to [http://localhost:3000/admin](http://localhost:3000/admin) and enter the username and password provided in the `.env` file.

## Email Notification

CueObserve comes with built-in email alert notification system\(powered by Django\). By default email notifications are disabled, to enable notifications create a `.env` file with the given variables or add these variables in the already created `.env` file.
CueObserve comes with a built-in email alert notification system (powered by Django). By default, email notifications are disabled; to enable them, create a `.env` file with the variables given below, or add these variables to the already created `.env` file.

```text
```
EMAIL_HOST="smtp.gmail.com"
EMAIL_HOST_USER=<[email protected]>
EMAIL_HOST_PASSWORD=<YOUR_EMAIL_PASSWORD>
```

Allow less secure apps: ON for your given EMAIL\_HOST\_USER email Id, click on [enable access to less secure app](https://myaccount.google.com/lesssecureapps?pli=1&rapt=AEjHL4N7wse3vhCsvRv-aWy8kKeEGDZS2YDbW1SfTL17HVhtemi7zZW5gzbZSBnrNgknL_gPBDn3xVo0qUj-W6NuaYTSU7agQQ)
Turn "Allow less secure apps" ON for the given EMAIL_HOST_USER email ID: click on [enable access to less secure apps](https://myaccount.google.com/lesssecureapps?pli=1\&rapt=AEjHL4N7wse3vhCsvRv-aWy8kKeEGDZS2YDbW1SfTL17HVhtemi7zZW5gzbZSBnrNgknL_gPBDn3xVo0qUj-W6NuaYTSU7agQQ)

To unlock the captcha for your Gmail account, click on [Unlock Captcha](https://accounts.google.com/b/0/UnlockCaptcha)



## Infra Requirements

The minimum infrastructure requirement for CueObserve is _1 GB RAM/ 1 CPU_. If Multiple CPUs\(cores\) are provided, they can be utilized by tasks like Anomaly Detection & Root Cause Analysis for faster processing.

The minimum infrastructure requirement for CueObserve is _1 GB RAM / 1 CPU_. If multiple CPUs (cores) are provided, they can be utilized by tasks like Anomaly Detection & Root Cause Analysis for faster processing.