
Commit 20f5595

Merge pull request #217 from cuebook/master

GitBook: [#63] Non Rollup & Lambda docs

2 parents: c63d8c7 + 7a91553

39 files changed (+143, −106 lines)

Binary assets changed (`.gitbook/assets/`):

- AddConnection (1).png (76.1 KB)
- AddConnection.png (37.3 KB)
- Anomalies.png (134 KB)
- AnomalyCard_Daily.png (121 KB)
- AnomalyDeviation.png (86.8 KB)
- Dataset_SQL.png (125 KB)
- MinAvgValue.png (11.6 KB)
- MinContribution.png (11.9 KB)
- Overview.gif (149 KB)
- Overview_Anomaly.png (84.6 KB)
- Overview_RCA (1).png (106 KB)
- Overview_RCA.png (106 KB)
- RCA_Analyze.png (103 KB)
- RCA_Logs.png (39.1 KB)
- RCA_Result.png (110 KB)
- TopN.png (10.7 KB)
- cueObserve.png (18.1 KB)
- new.png (4.1 KB)

(The commit changes further image assets whose filenames were not captured above.)

README.md (+38 −43)

@@ -1,78 +1,73 @@
-<p align="center">
-<a href="https://cueobserve.cuebook.ai" target="_blank">
-<img alt="CueObserve Logo" width="300" src="docs/images/cueObserve.png">
-</a>
-</p>
-<p align="center">
-<a href="https://codeclimate.com/github/cuebook/CueObserve/maintainability"><img src="https://api.codeclimate.com/v1/badges/a70e071b59d5dbc38846/maintainability" /></a>
-<a href="https://codeclimate.com/github/cuebook/CueObserve/test_coverage"><img src="https://api.codeclimate.com/v1/badges/a70e071b59d5dbc38846/test_coverage" /></a>
-<a href="https://github.com/cuebook/cueobserve/actions/workflows/pr_checks.yml">
-<img src="https://github.com/cuebook/cueobserve/actions/workflows/pr_checks.yml/badge.svg" alt="Test Coverage">
-</a>
-<a href="https://github.com/cuebook/cueobserve/blob/main/LICENSE.md">
-<img src="https://img.shields.io/github/license/cuebook/cueobserve" alt="License">
-</a>
-</p>
-<br>
+# Overview
+
+[![CueObserve Logo](.gitbook/assets/cueObserve.png)](https://cueobserve.cuebook.ai)
+
+[![](https://api.codeclimate.com/v1/badges/a70e071b59d5dbc38846/maintainability)](https://codeclimate.com/github/cuebook/CueObserve/maintainability) [![](https://api.codeclimate.com/v1/badges/a70e071b59d5dbc38846/test\_coverage)](https://codeclimate.com/github/cuebook/CueObserve/test\_coverage) [![Test Coverage](https://github.com/cuebook/cueobserve/actions/workflows/pr\_checks.yml/badge.svg)](https://github.com/cuebook/cueobserve/actions/workflows/pr\_checks.yml) [![License](https://img.shields.io/github/license/cuebook/cueobserve)](https://github.com/cuebook/cueobserve/blob/main/LICENSE.md)
 
 CueObserve helps you monitor your metrics. Know when, where, and why a metric isn't right.
 
 CueObserve uses **timeseries Anomaly detection** to find **where** and **when** a metric isn't right. It then offers **one-click Root Cause analysis** so that you know **why** a metric isn't right.
 
 CueObserve works with data in your SQL data warehouses and databases. It currently supports Snowflake, BigQuery, Redshift, Druid, Postgres, MySQL, SQL Server and ClickHouse.
 
+![CueObserve Anomaly](<.gitbook/assets/Overview\_Anomaly (1).png>) ![CueObserve RCA](<.gitbook/assets/Overview\_RCA (1).png>)
 
-![CueObserve Anomaly](docs/images/Overview_Anomaly.png)
-![CueObserve RCA](docs/images/Overview_RCA.png)
+### Getting Started
 
-
-## Getting Started
 Install via Docker
 
 ```
 wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
 docker-compose -f cueobserve-docker-compose.yml up -d
 ```
-Now visit [http://localhost:3000](http://localhost:3000) in your browser.
 
-## Demo Video
-<a href="http://www.youtube.com/watch?feature=player_embedded&v=VZvgNa65GQU" target="_blank">
-<img src="http://img.youtube.com/vi/VZvgNa65GQU/hqdefault.jpg" alt="Watch CueObserve video"/>
-</a>
+Now visit [http://localhost:3000](http://localhost:3000) in your browser.
+
+### Demo Video
+
+[![Watch CueObserve video](http://img.youtube.com/vi/VZvgNa65GQU/hqdefault.jpg)](http://www.youtube.com/watch?feature=player\_embedded\&v=VZvgNa65GQU)
+
+### How it works
 
-## How it works
 You write a SQL GROUP BY query, map its columns as dimensions and measures, and save it as a virtual Dataset.
 
-![Dataset SQL](docs/images/Dataset_SQL_cropped.png)
+![Dataset SQL](<.gitbook/assets/Dataset\_SQL\_cropped (1).png>)
 
-![Dataset Schema Map](docs/images/Dataset_Mapping_cropped.png)
+![Dataset Schema Map](<.gitbook/assets/Dataset\_Mapping\_cropped (1).png>)
 
 You then define one or more anomaly detection jobs on the dataset.
 
-![Anomaly Definition](docs/images/AnomalyDefinitions.png)
+![Anomaly Definition](<.gitbook/assets/AnomalyDefinitions (1).png>)
 
 When an anomaly detection job runs, CueObserve does the following:
+
 1. Executes the SQL GROUP BY query on your data warehouse and stores the result as a Pandas dataframe.
 2. Generates one or more timeseries from the dataframe, as defined in your anomaly detection job.
 3. Generates a forecast for each timeseries using [Prophet](https://github.com/facebook/prophet).
 4. Creates a visual card for each timeseries. Marks the card as an anomaly if the last data point is anomalous.
 
-## Features
-- Automated SQL to timeseries transformation.
-- Run anomaly detection on the aggregate metric or split it by any dimension. Limit the split to significant dimension values.
-- Use Prophet or simple mathematical rules to detect anomalies.
-- In-built Scheduler. CueObserve uses Celery as the executor and celery-beat as the scheduler.
-- Slack alerts when anomalies are detected.
-- Monitoring. Slack alert when a job fails. CueObserve maintains detailed logs.
+### Features
+
+* Automated SQL to timeseries transformation.
+* Run anomaly detection on the aggregate metric or split it by any dimension. Limit the split to significant dimension values.
+* Use Prophet or simple mathematical rules to detect anomalies.
+* In-built Scheduler. CueObserve uses Celery as the executor and celery-beat as the scheduler.
+* Slack alerts when anomalies are detected.
+* Monitoring. Slack alert when a job fails. CueObserve maintains detailed logs.
 
-### Limitations
-- Currently supports Prophet for timeseries forecasting.
-- Not being built for real-time anomaly detection on streaming data.
+#### Limitations
 
-## Support
-For general help using CueObserve, read the [documentation](https://cueobserve.cuebook.ai/), or go to [Github Discussions](https://github.com/cuebook/cueobserve/discussions).
+* Currently supports Prophet for timeseries forecasting.
+* Not being built for real-time anomaly detection on streaming data.
+
+### Support
+
+For general help using CueObserve, read the [documentation](https://cueobserve.cuebook.ai), or go to [Github Discussions](https://github.com/cuebook/cueobserve/discussions).
 
 To report a bug or request a feature, open an [issue](https://github.com/cuebook/cueobserve/issues).
 
-## Contributing
-We'd love contributions to CueObserve. Before you contribute, please first discuss the change you wish to make via an [issue](https://github.com/cuebook/cueobserve/issues) or a [discussion](https://github.com/cuebook/cueobserve/discussions). Contributors are expected to adhere to our [code of conduct](https://github.com/cuebook/cueobserve/blob/main/CODE_OF_CONDUCT.md).
+### Contributing
+
+We'd love contributions to CueObserve. Before you contribute, please first discuss the change you wish to make via an [issue](https://github.com/cuebook/cueobserve/issues) or a [discussion](https://github.com/cuebook/cueobserve/discussions). Contributors are expected to adhere to our [code of conduct](https://github.com/cuebook/cueobserve/blob/main/CODE\_OF\_CONDUCT.md).
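The README's four "How it works" steps map to a short pipeline. Below is a minimal Python sketch of that flow, not CueObserve's actual code: the `Date`, `State`, and `Orders` column names are hypothetical, and pandas plus Prophet are used only because the steps name them.

```python
# A minimal sketch of the four detection steps, assuming the GROUP BY
# result is already a dataframe with datetime Date, State, and Orders.
import pandas as pd
from prophet import Prophet  # pip install prophet

def detect_anomaly(df: pd.DataFrame, state: str) -> bool:
    # Step 2: generate one timeseries from the GROUP BY dataframe.
    ts = (
        df.loc[df["State"] == state, ["Date", "Orders"]]
        .rename(columns={"Date": "ds", "Orders": "y"})
        .sort_values("ds")
    )
    # Step 3: forecast the timeseries with Prophet.
    model = Prophet(interval_width=0.95)
    model.fit(ts)
    forecast = model.predict(ts[["ds"]])
    # Step 4: flag the card if the last actual point falls outside
    # the forecast's confidence range.
    last_actual = ts["y"].iloc[-1]
    band = forecast.iloc[-1]
    return not (band["yhat_lower"] <= last_actual <= band["yhat_upper"])
```

Splitting by a dimension amounts to repeating this once per dimension value.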

anomalies.md (+3 −3)

@@ -2,14 +2,14 @@
 
 Anomalies screen lists all published anomalies. Click on a row to view its anomaly card.
 
-![](.gitbook/assets/anomalies.png)
+![](.gitbook/assets/Anomalies.png)
 
 Daily anomalies automatically unpublish if there's no anomaly for the next 5 days. Hourly anomalies unpublish after 1 day.
 
 ## Anomaly Cards
 
 Anomaly cards follow a template. If you want, you can modify the templates.
 
-![Hourly Anomaly card](.gitbook/assets/anomalycard_hourly_cropped.png)
+![Hourly Anomaly card](.gitbook/assets/AnomalyCard\_Hourly\_cropped.png)
 
-![Daily Anomaly card](.gitbook/assets/anomalycard_daily_cropped.png)
+![Daily Anomaly card](.gitbook/assets/AnomalyCard\_Daily\_cropped.png)
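The unpublish windows this page describes (5 days for daily anomalies, 1 day for hourly) reduce to a one-line check. A sketch with hypothetical names, not CueObserve's actual fields:

```python
# Sketch of the auto-unpublish rule above; names are hypothetical.
from datetime import datetime, timedelta

def should_unpublish(last_anomalous_point: datetime, granularity: str) -> bool:
    # Daily anomalies get a 5-day grace window, hourly anomalies 1 day.
    grace = timedelta(days=5) if granularity == "day" else timedelta(days=1)
    return datetime.utcnow() - last_anomalous_point > grace
```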

anomaly-definitions.md (+7 −7)

@@ -2,14 +2,14 @@
 
 You can define one or more anomaly detection jobs on a dataset. The anomaly detection job can monitor a measure at an aggregate level or split the measure by a dimension.
 
-To define an anomaly job, you
+To define an anomaly job, you&#x20;
 
 1. Select a dataset
 2. Select a measure from the dataset
 3. Select a dimension to split the measure _(optional)_
 4. Select an anomaly rule
 
-![](.gitbook/assets/anomalydefinitions.png)
+![](.gitbook/assets/AnomalyDefinitions.png)
 
 ## Split Measure by Dimension
 

@@ -19,7 +19,7 @@ To split a measure by a dimension, select the dimension and then limit the numbe
 
 Choose the optional **High/Low** to detect only one type of anomalies. Choose **High** for an increase in measure or **Low** for a drop in measure.
 
-![](.gitbook/assets/anomalydefinition_cuel.gif)
+![](.gitbook/assets/AnomalyDefinition\_CueL.gif)
 
 ### Limit Dimension Values
 

@@ -31,21 +31,21 @@
 
 Say you want to monitor Orders measure. But you want to monitor it for your top 10 states only. You would then define anomaly something like below:
 
-![](.gitbook/assets/topn.png)
+![](.gitbook/assets/TopN.png)
 
 #### Min % Contribution
 
 Minimum % Contribution limits the number of dimension values based on the dimension value's contribution to the measure.
 
 Say you want to monitor Orders measure for every state that contributed at least 2% to the total Orders, your anomaly definition would look something like below:
 
-![](.gitbook/assets/mincontribution.png)
+![](.gitbook/assets/MinContribution.png)
 
 #### Min Avg Value
 
 Minimum Average Value limits the number of dimension values based on the measure's average value.
 
-![](.gitbook/assets/minavgvalue.png)
+![](.gitbook/assets/MinAvgValue.png)
 
 In the example above, only states where _average(Orders) >= 10_ will be selected. If your granularity is daily, this means daily average orders. If your granularity is hourly, this means hourly average orders.
 

@@ -64,7 +64,7 @@ This algorithm uses the open-source [Prophet](https://github.com/facebook/prophe
 
 The metric's percentage deviation (_45% in the image below_) is calculated with respect to the threshold of the forecast's confidence range.
 
-![](.gitbook/assets/anomalydeviation.png)
+![](.gitbook/assets/AnomalyDeviation.png)
 
 ### Percentage Change
 
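The three "Limit Dimension Values" options this page documents are all filters over the grouped measure. A pandas sketch using the doc's Orders-by-State example; the column names are hypothetical and this is not CueObserve's implementation:

```python
# Hedged sketch of Top N, Min % Contribution, and Min Avg Value.
import pandas as pd

def select_dimension_values(df: pd.DataFrame, top_n=None,
                            min_pct=None, min_avg=None) -> list:
    totals = df.groupby("State")["Orders"].sum()
    if top_n is not None:
        # Top N: keep the N values contributing most to the measure.
        return list(totals.nlargest(top_n).index)
    if min_pct is not None:
        # Min % Contribution: keep values with >= min_pct of the total.
        share = totals / totals.sum() * 100
        return list(share[share >= min_pct].index)
    if min_avg is not None:
        # Min Avg Value: keep values whose average per time bucket
        # (daily or hourly, per the dataset granularity) is >= min_avg.
        averages = df.groupby("State")["Orders"].mean()
        return list(averages[averages >= min_avg].index)
    return list(totals.index)
```

For the page's last example, `select_dimension_values(df, min_avg=10)` would keep only states where average(Orders) >= 10.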

datasets.md (+13 −2)

@@ -2,15 +2,26 @@
 
 Datasets are similar to aggregated SQL VIEWS of your data. When you run an anomaly detection job, the associated dataset's SQL query is run and the results are stored as a Pandas dataframe in memory.
 
-![](.gitbook/assets/dataset_sql.png)
+![](.gitbook/assets/Dataset\_SQL.png)
 
 You write a SQL GROUP BY query with aggregate functions to roll-up your data. You then map the columns as dimensions or measures.
 
-![](.gitbook/assets/dataset_mapping_cropped.png)
+![](.gitbook/assets/Dataset\_Mapping\_cropped.png)
 
 1. Dataset must have only one timestamp column. This timestamp column is used to generate timeseries data for anomaly detection.
 2. Dataset must have at least one aggregate column. CueObserve currently supports only COUNT or SUM as aggregate functions. Aggregate columns must be mapped as measures.
 3. Dataset can have one or more dimension columns (optional).
+4. Dataset can be classified as a non-rollup dataset, details are provided below.
+
+### **Non-Rollup Datasets**
+
+A dataset can be created as a non-rollup dataset using a switch to inform the system that it does not need to roll up aggregate the data during the pre-processing of the data.
+
+![Non Roll-up switch](.gitbook/assets/new.png)
+
+By default, all datasets are "rolled up" i.e. metric data points are aggregated(summed up) on the timestamp buckets for a specific dimension value.
+
+But for metrics like percentage etc. such aggregation might not be relevant, so one can specify to the system that it is a non-rollup dataset. Currently we support only single dimension on Non-rollup datasets to avoid duplicate timestamp values after pre-processing.
 
 ## SQL GROUP BY Query
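The rollup behaviour this new section describes, and what the non-rollup switch turns off, can be sketched as one pre-processing step. The column names are hypothetical and the logic is inferred from the prose above, not taken from CueObserve's source:

```python
# Sketch of the rollup vs. non-rollup pre-processing difference.
import pandas as pd

def preprocess(df: pd.DataFrame, non_rollup: bool) -> pd.DataFrame:
    if non_rollup:
        # Non-rollup: keep values (e.g. percentages) as-is. With a single
        # dimension, (Date, State) pairs stay unique, so no duplicate
        # timestamps appear in the generated timeseries.
        return df
    # Default rollup: sum the measure per timestamp bucket
    # for each dimension value.
    return df.groupby(["Date", "State"], as_index=False)["Orders"].sum()
```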

development.md (+8 −8)

@@ -12,7 +12,7 @@ description: >-
 CueObserve has multi-service architecture, with services as mentioned:
 
 1. `Frontend` single-page application written on [ReactJS](https://reactjs.org). It's code can be found in `ui` folder and runs on [http://localhost:3000/](https://reactjs.org).
-2. `API` is based on [Django](https://www.djangoproject.com) (python framework) & uses REST API. It is the main service, responsible for connections, authentication and anomaly.
+2. `API` is based on [Django](https://www.djangoproject.com) (python framework) & uses REST API. It is the main service, responsible for connections, authentication and anomaly.&#x20;
 3. `Alerts` micro-service, currently responsible for sending alerting/notifications only to slack. It's code is in `alerts-api` folder and runs on [localhost:8100](http://localhost:8100).
 4. [Celery](https://docs.celeryproject.org) to execute the tasks asynchronously. Tasks like anomaly detection are handled by Celery.
 5. [Celery beat](https://docs.celeryproject.org/en/stable/userguide/periodic-tasks.html) scheduler to trigger the scheduled tasks.

@@ -25,14 +25,14 @@ Get the code by cloning our open source [github repo](https://github.com/cuebook
 ```
 git clone https://github.com/cuebook/CueObserve.git
 cd CueObserve
-docker-compose -f docker-compose-dev.yml --env-file .env up --build
+docker-compose -f docker-compose-dev.yml --env-file .env.dev up --build
 ```
 
 `docker-compose`'s build command will pull several components and install them on local, so this will take a few minutes to complete.
 
 ### Backend Development
 
-The code for the backend is in `/api` directory. As mentioned in the overview it is based on Django framework.
+The code for the backend is in `/api` directory. As mentioned in the overview it is based on Django framework.&#x20;
 
 #### Configure environment variables
 

@@ -57,17 +57,17 @@ export DJANGO_SUPERUSER_EMAIL="[email protected]"
 export `=False
 ```
 
-Change the values based on your running PostgreSQL instance. If you do not wish to use PostgreSQL as your database for development, comment lines 4-8 and CueObserve will create a SQLite database file at the location `api/db/db.sqlite3`.
+Change the values based on your running PostgreSQL instance. If you do not wish to use PostgreSQL as your database for development, comment lines 4-8 and CueObserve will create a SQLite database file at the location `api/db/db.sqlite3`.&#x20;
 
-The backend server can be accessed on [http://localhost:8000/](https://www.djangoproject.com).
+The backend server can be accessed on [http://localhost:8000/](https://www.djangoproject.com).&#x20;
 
-#### Celery Development
+#### Celery Development&#x20;
 
-CueObserve uses Celery for executing asynchronous tasks like anomaly detection. There are three components needed to run an asynchronous task, i.e. Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so before starting Celery services, Redis server should be running. Celery Beat is used as the scheduler and is responsible to trigger the scheduled tasks. Celery workers are used to execute the tasks.
+CueObserve uses Celery for executing asynchronous tasks like anomaly detection. There are three components needed to run an asynchronous task, i.e. Redis, Celery and Celery Beat. Redis is used as the message queue by Celery, so before starting Celery services, Redis server should be running. Celery Beat is used as the scheduler and is responsible to trigger the scheduled tasks. Celery workers are used to execute the tasks.&#x20;
 
 ### Testing
 
-At the moment, we have test cases only for the backend service, test cases for UI are in our roadmap.
+At the moment, we have test cases only for the backend service, test cases for UI are in our roadmap.&#x20;
 
 Backend for API and services is tested using [PyTest](https://docs.pytest.org/en/6.2.x/). To run test cases `exec` into cueo-backend and run command
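The Redis/Celery/celery-beat split that development.md describes is the standard Celery wiring. A minimal sketch with hypothetical app, task, and schedule names, not CueObserve's actual modules:

```python
# Minimal Celery wiring matching the description above: Redis as the
# message broker, workers execute tasks, celery-beat triggers schedules.
from celery import Celery
from celery.schedules import crontab

app = Celery("cueobserve", broker="redis://localhost:6379/0")

@app.task(name="tasks.run_anomaly_detection")
def run_anomaly_detection(anomaly_definition_id: int):
    """Execute the dataset SQL, build timeseries, forecast, store cards."""
    ...

# celery-beat schedule: trigger the job at the top of every hour.
app.conf.beat_schedule = {
    "hourly-anomaly-job": {
        "task": "tasks.run_anomaly_detection",
        "schedule": crontab(minute=0),
        "args": (1,),
    },
}
```

With a Redis server running, `celery -A <module> worker` starts the executors and `celery -A <module> beat` starts the scheduler.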

getting-started.md (+8 −20)

@@ -3,35 +3,23 @@
 ## Install via Docker-Compose
 
 ```
-wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose.yml -q -O cueobserve-docker-compose.yml
-docker-compose -f cueobserve-docker-compose.yml up -d
+mkdir -p ~/cuebook
+wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/docker-compose-prod.yml -q -O ~/cuebook/docker-compose-prod.yml
+wget https://raw.githubusercontent.com/cuebook/CueObserve/latest_release/.env -q -O ~/cuebook/.env
+cd ~/cuebook
 ```
 
-**Development Mode:**
-
-```
-docker-compose -f docker-compose-dev.yml up -d
-```
-
-**OR Production Mode:**
-
-```
-docker-compose up -d
-```
-
-**OR** Install via Docker **(Deprecated Method)**
-
 ```
-docker run -p 3000:3000 cuebook/cueobserve
+docker-compose -f docker-compose-prod.yml --env-file .env up -d
 ```
 
-Now visit [localhost:3000](http://localhost:3000) in your browser.
+Now visit [localhost:3000](http://localhost:3000) in your browser.&#x20;
 
 ## Add Connection
 
 Go to the Connections screen to create a connection.
 
-![](<.gitbook/assets/addconnection (1).png>)
+![](<.gitbook/assets/AddConnection (1).png>)
 
 ## Add Dataset
 

@@ -43,6 +31,6 @@ Create an anomaly detection job on your dataset. See [Anomaly Definitions](anoma
 
 Once you have created an anomaly job, click on the \`Run\` icon button to trigger the anomaly job. It might take a few seconds for the job to execute.
 
-![](.gitbook/assets/anomalydefinitions.png)
+![](.gitbook/assets/AnomalyDefinitions.png)
 
 Once the job is successful, go to the Anomalies screen to view your anomalies.
