Commit 6d01dc0

author
Bapt Abl
committed
Update readme
1 parent ae3b752 commit 6d01dc0

File tree

1 file changed

+164
-61
lines changed

README.md

Lines changed: 164 additions & 61 deletions
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,48 @@
1-
Elasticsearch Geo Point clustering aggregation plugin
2-
=====================================================
1+
## Elasticsearch Geo Point clustering aggregation plugin
32

4-
This aggregations computes a geohash precision from a `zoom` and a `distance` (in pixel).
5-
It groups points (from `field` parameter) into buckets that represent geohash cells and computes each bucket's center.
6-
Then it merges these cells if the distance between two clusters' centers is lower than the `distance` parameter.
3+
This plugin extends Elasticsearch with a `geo_point_clustering` aggregation, which returns [geo_point](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/geo-point.html) documents grouped into clusters of points.
4+
It works much like the official [geohash_grid aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/search-aggregations-bucket-geohashgrid-aggregation.html), except that the final clusters are not bound to the geohash grid.
75

6+
For example, at zoom level 1 with points across France, the `geohash_grid` aggregation outputs 3 clusters bound to geohash cells `u`, `e`, and `s`, while `geo_point_clustering` merges these clusters into one.
7+
This merge happens during the reduce phase.
8+
9+
Unlike the `geohash_grid` aggregation, bucket keys are a (centroid, geohash cells) tuple rather than a single geohash cell, because the merge process during the reduce phase can link one cluster to several geohash cells.
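
To illustrate why one bucket can carry several geohash cells, here is a minimal Python sketch of merging two clusters. It is not the plugin's implementation: the `Cluster` class, the `merge` helper, and the doc-count-weighted centroid are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical in-memory view of one bucket (not a plugin class)."""
    lat: float
    lon: float
    doc_count: int
    geohash_grids: List[str]


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Merge two clusters, weighting each centroid by its doc count (assumed rule)."""
    total = a.doc_count + b.doc_count
    return Cluster(
        lat=(a.lat * a.doc_count + b.lat * b.doc_count) / total,
        lon=(a.lon * a.doc_count + b.lon * b.doc_count) / total,
        doc_count=total,
        # The merged cluster keeps track of every geohash cell it came from.
        geohash_grids=a.geohash_grids + b.geohash_grids,
    )


# Two clusters sitting in neighbouring zoom-1 cells "u" and "e".
merged = merge(Cluster(48.0, 2.0, 3, ["u"]), Cluster(44.0, -2.0, 1, ["e"]))
print(merged)
```

The merged cluster belongs to no single geohash cell, which is why its key has to combine a centroid with a list of cells. The plain average used here ignores longitude wrap-around at the antimeridian.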
10+
11+
Please note that the [geo_shape data type](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/geo-shape.html) is not supported.
12+
13+
14+
## Usage
15+
### Install
16+
17+
Install the plugin with:
18+
`./bin/elasticsearch-plugin install https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.10.2.0/geopoint-clustering-aggregation-7.10.2.0.zip`
19+
20+
The first three numbers of the plugin version match the Elasticsearch version. The last number is the plugin's own revision for that Elasticsearch version.
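
As a rough sketch of this versioning scheme, the Python helper below derives a plugin version and its download URL from an Elasticsearch version. The helper names `plugin_version` and `release_url` are hypothetical; the URL pattern is copied from the install command above. A release is not guaranteed to exist for every version, so check the releases table first.

```python
RELEASE_URL = (
    "https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering"
    "/releases/download/v{version}/geopoint-clustering-aggregation-{version}.zip"
)


def plugin_version(es_version: str, plugin_revision: int = 0) -> str:
    """Append the plugin revision to a three-part Elasticsearch version."""
    parts = es_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected an x.y.z Elasticsearch version, got {es_version!r}")
    return f"{es_version}.{plugin_revision}"


def release_url(es_version: str, plugin_revision: int = 0) -> str:
    """Build the download URL following the pattern used in this README."""
    return RELEASE_URL.format(version=plugin_version(es_version, plugin_revision))


print(release_url("7.10.2"))
```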
21+
22+
Available releases:
23+
| elasticsearch version | plugin version | plugin url |
24+
| --------------------- | -------------- | ---------- |
25+
| 6.0.1 | 6.0.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.0.1.0/geopoint-clustering-aggregation-6.0.1.0.zip|
26+
| 6.1.4 | 6.1.4.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.1.4.0/geopoint-clustering-aggregation-6.1.4.0.zip|
27+
| 6.2.4 | 6.2.4.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.2.4.0/geopoint-clustering-aggregation-6.2.4.0.zip|
28+
| 6.3.2 | 6.3.2.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.3.2.0/geopoint-clustering-aggregation-6.3.2.0.zip|
29+
| 6.4.3 | 6.4.3.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.4.3.0/geopoint-clustering-aggregation-6.4.3.0.zip|
30+
| 6.5.4 | 6.5.4.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.5.4.0/geopoint-clustering-aggregation-6.5.4.0.zip|
31+
| 6.6.2 | 6.6.2.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.6.2.0/geopoint-clustering-aggregation-6.6.2.0.zip|
32+
| 6.7.1 | 6.7.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.7.1.0/geopoint-clustering-aggregation-6.7.1.0.zip|
33+
| 6.8.2 | 6.8.2.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.8.2.0/geopoint-clustering-aggregation-6.8.2.0.zip|
34+
| 7.0.1 | 7.0.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.0.1.0/geopoint-clustering-aggregation-7.0.1.0.zip|
35+
| 7.1.1 | 7.1.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.1.1.0/geopoint-clustering-aggregation-7.1.1.0.zip|
36+
| 7.2.0 | 7.2.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.2.0.0/geopoint-clustering-aggregation-7.2.0.0.zip|
37+
| 7.4.0 | 7.4.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.4.0.0/geopoint-clustering-aggregation-7.4.0.0.zip|
38+
| 7.5.1 | 7.5.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.5.1.0/geopoint-clustering-aggregation-7.5.1.0.zip|
39+
| 7.6.0 | 7.6.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.6.0.0/geopoint-clustering-aggregation-7.6.0.0.zip|
40+
| 7.7.0 | 7.7.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.7.0.0/geopoint-clustering-aggregation-7.7.0.0.zip|
41+
42+
43+
44+
### Quickstart
45+
#### Intro
846
```json
947
{
1048
"aggregations": {
@@ -18,76 +56,141 @@ Then it merges these cells if the distance between two clusters' centers is lowe
1856
}
1957
```
2058
Input parameters:
21-
- `field`: must be of type geo_point.
22-
- `zoom`: mandatory integer parameter between 0 and 20. It represents the zoom level used in the request to aggregate geo points.
23-
- `radius`: radius in pixel. It is used to compute a geohash precision then to merge cluters based on this distance. Default to `50`.
24-
- `ratio`: ratio used to make a second merging pass. If the value is `0`, no second pass is made. Default to `2`.
25-
- `extent`: Extent of the tiles. Default to `256`
59+
- `field`: must be of type [geo_point](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/geo-point.html)
60+
- `zoom`: mandatory integer parameter between 0 and 25. It represents the zoom level used in the request to aggregate geo points
61+
- `radius`: radius in pixels. It is used during the reduce phase to merge close clusters. Defaults to `40`
62+
- `ratio`: ratio used to make a second merging pass during the reduce phase. If the value is `0`, no second pass is made. Defaults to `0`
63+
- `extent`: extent of the tiles. Defaults to `256`
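
The parameters above can be assembled into a request body programmatically. The following Python sketch is one possible way to do it; the function name `clustering_aggregation` and the aggregation name `clusters` are illustrative choices, while the defaults and the zoom range are taken from the list above.

```python
import json
from typing import Any, Dict


def clustering_aggregation(
    field: str,
    zoom: int,
    radius: float = 40,
    ratio: float = 0,
    extent: int = 256,
    name: str = "clusters",
) -> Dict[str, Any]:
    """Build a search body containing one geo_point_clustering aggregation."""
    if not 0 <= zoom <= 25:
        raise ValueError("zoom must be between 0 and 25")
    return {
        "aggregations": {
            name: {
                "geo_point_clustering": {
                    "field": field,
                    "zoom": zoom,
                    "radius": radius,
                    "ratio": ratio,
                    "extent": extent,
                }
            }
        }
    }


body = clustering_aggregation("location", zoom=9)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a `_search` request with any HTTP client.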
2664

27-
For example :
2865

66+
#### Real-life example
67+
68+
Create an index:
2969
```json
70+
PUT test
3071
{
31-
"aggregations" : {
32-
"my_cluster_aggregation" : {
33-
"geo_point_clustering": {
34-
"field": "geo_point",
35-
"zoom": 1,
36-
"radius": 50
37-
}
38-
}
72+
"settings": {
73+
"number_of_shards": 1,
74+
"number_of_replicas": 0
75+
},
76+
"mappings": {
77+
"properties": {
78+
"location": {
79+
"type": "geo_point"
80+
}
3981
}
82+
}
4083
}
4184
```
4285

86+
Push some points:
87+
```json
88+
POST test/_bulk?refresh
89+
{"index":{"_id":1}}
90+
{"location":[2.454929, 48.821578]}
91+
{"index":{"_id":2}}
92+
{"location":[2.245858, 48.86914]}
93+
{"index":{"_id":3}}
94+
{"location":[2.240358, 48.863481]}
95+
{"index":{"_id":4}}
96+
{"location":[2.25292, 48.847176]}
97+
{"index":{"_id":5}}
98+
{"location":[2.279111, 48.872383]}
99+
{"index":{"_id":6}}
100+
{"location":[2.336267, 48.822021]}
101+
{"index":{"_id":7}}
102+
{"location":[2.338677, 48.822672]}
103+
{"index":{"_id":8}}
104+
{"location":[2.336643, 48.822493]}
105+
{"index":{"_id":9}}
106+
{"location":[2.438465, 48.84204]}
107+
{"index":{"_id":10}}
108+
{"location":[2.381554, 48.835382]}
109+
{"index":{"_id":11}}
110+
{"location":[2.407744, 48.83733]}
111+
{"index":{"_id":12}}
112+
{"location":[2.34521, 48.849358]}
113+
{"index":{"_id":13}}
114+
{"location":[2.252938, 48.846041]}
115+
{"index":{"_id":14}}
116+
{"location":[2.279715, 48.871775]}
117+
{"index":{"_id":15}}
118+
{"location":[2.380629, 48.879757]}
119+
```
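
For larger datasets, a bulk body like the one above can be generated rather than written by hand. The Python sketch below assumes the same `location` field; the `bulk_body` helper is hypothetical. Points are given as `(lon, lat)` pairs because the array form of a `geo_point` lists longitude first.

```python
import json
from typing import Iterable, Tuple


def bulk_body(points: Iterable[Tuple[float, float]], start_id: int = 1) -> str:
    """Return a newline-delimited bulk body with one index action per point."""
    lines = []
    for doc_id, (lon, lat) in enumerate(points, start=start_id):
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps({"location": [lon, lat]}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


ndjson = bulk_body([(2.454929, 48.821578), (2.245858, 48.86914)])
print(ndjson, end="")
```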
120+
121+
Perform an aggregation:
43122
```json
123+
POST test/_search?size=0
44124
{
45-
"aggregations": {
46-
"my_cluster_aggregation": {
47-
"buckets": [
48-
{
49-
"geohash_grids": [
50-
"u0"
51-
],
52-
"doc_count": 90293,
53-
"centroid": {
54-
"lat": 48.8468417795375,
55-
"lon": 2.331401154398918
56-
}
57-
}
58-
]
59-
}
60-
}
125+
"aggregations": {
126+
"clusters": {
127+
"geo_point_clustering": {
128+
"field": "location",
129+
"zoom": 9
130+
}}}}
131+
```
61132

62-
}
133+
Result:
134+
```json
135+
"aggregations" : {
136+
"clusters" : {
137+
"buckets" : [
138+
{
139+
"geohash_grids" : [
140+
"u09wn",
141+
"u09tz",
142+
"u09ty",
143+
"u09tx",
144+
"u09tv",
145+
"u09tt"
146+
],
147+
"doc_count" : 9,
148+
"centroid" : {
149+
"lat" : 48.83695897646248,
150+
"lon" : 2.380013056099415
151+
}
152+
},
153+
{
154+
"geohash_grids" : [
155+
"u09w5",
156+
"u09tg",
157+
"u09tf"
158+
],
159+
"doc_count" : 6,
160+
"centroid" : {
161+
"lat" : 48.86166598415002,
162+
"lon" : 2.258483301848173
163+
}
164+
}
165+
]
166+
}
63167
```
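
A client can walk the buckets of such a response as in the Python sketch below. The `response` dictionary reproduces the result shown above, wrapped in an outer object; the `summarize` helper is a hypothetical name used only for this example.

```python
from typing import Any, Dict, List, Tuple

response = {
    "aggregations": {
        "clusters": {
            "buckets": [
                {
                    "geohash_grids": ["u09wn", "u09tz", "u09ty", "u09tx", "u09tv", "u09tt"],
                    "doc_count": 9,
                    "centroid": {"lat": 48.83695897646248, "lon": 2.380013056099415},
                },
                {
                    "geohash_grids": ["u09w5", "u09tg", "u09tf"],
                    "doc_count": 6,
                    "centroid": {"lat": 48.86166598415002, "lon": 2.258483301848173},
                },
            ]
        }
    }
}


def summarize(resp: Dict[str, Any], name: str = "clusters") -> List[Tuple[float, float, int, int]]:
    """Return (lat, lon, doc_count, number of geohash cells) for each bucket."""
    buckets = resp["aggregations"][name]["buckets"]
    return [
        (b["centroid"]["lat"], b["centroid"]["lon"], b["doc_count"], len(b["geohash_grids"]))
        for b in buckets
    ]


rows = summarize(response)
total_docs = sum(row[2] for row in rows)
for lat, lon, count, cells in rows:
    print(f"cluster at ({lat:.5f}, {lon:.5f}): {count} docs from {cells} geohash cells")
```

The two doc counts add up to the 15 points indexed earlier, and each cluster spans several geohash cells.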
64168

65-
Installation
66-
------------
67169

68-
Plugin versions are available for (at least) all minor versions of Elasticsearch since 6.0.
170+
## Development environment setup
171+
### Build
69172

70-
The first 3 digits of plugin version is Elasticsearch versioning. The last digit is used for plugin versioning under an elasticsearch version.
173+
Requires Java 14 or 15.
174+
Requires Gradle 6.6.1 (but you should use the packaged gradlew included in this repo anyway).
71175

72-
To install it, launch this command in Elasticsearch directory replacing the url by the correct link for your Elasticsearch version (see table)
73-
`./bin/elasticsearch-plugin install https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.7.0.0/geopoint-clustering-aggregation-7.7.0.0.zip`
176+
### Development Environment Setup
74177

75-
| elasticsearch version | plugin version | plugin url |
76-
| --------------------- | -------------- | ---------- |
77-
| 6.0.1 | 6.0.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.0.1.0/geopoint-clustering-aggregation-6.0.1.0.zip|
78-
| 6.1.4 | 6.1.4.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.1.4.0/geopoint-clustering-aggregation-6.1.4.0.zip|
79-
| 6.2.4 | 6.2.4.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.2.4.0/geopoint-clustering-aggregation-6.2.4.0.zip|
80-
| 6.3.2 | 6.3.2.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.3.2.0/geopoint-clustering-aggregation-6.3.2.0.zip|
81-
| 6.4.3 | 6.4.3.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.4.3.0/geopoint-clustering-aggregation-6.4.3.0.zip|
82-
| 6.5.4 | 6.5.4.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.5.4.0/geopoint-clustering-aggregation-6.5.4.0.zip|
83-
| 6.6.2 | 6.6.2.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.6.2.0/geopoint-clustering-aggregation-6.6.2.0.zip|
84-
| 6.7.1 | 6.7.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.7.1.0/geopoint-clustering-aggregation-6.7.1.0.zip|
85-
| 6.8.2 | 6.8.2.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v6.8.2.0/geopoint-clustering-aggregation-6.8.2.0.zip|
86-
| 7.0.1 | 7.0.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.0.1.0/geopoint-clustering-aggregation-7.0.1.0.zip|
87-
| 7.1.1 | 7.1.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.1.1.0/geopoint-clustering-aggregation-7.1.1.0.zip|
88-
| 7.2.0 | 7.2.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.2.0.0/geopoint-clustering-aggregation-7.2.0.0.zip|
89-
| 7.4.0 | 7.4.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.4.0.0/geopoint-clustering-aggregation-7.4.0.0.zip|
90-
| 7.5.1 | 7.5.1.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.5.1.0/geopoint-clustering-aggregation-7.5.1.0.zip|
91-
| 7.6.0 | 7.6.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.6.0.0/geopoint-clustering-aggregation-7.6.0.0.zip|
92-
| 7.7.0 | 7.7.0.0 | https://github.com/opendatasoft/elasticsearch-aggregation-geoclustering/releases/download/v7.7.0.0/geopoint-clustering-aggregation-7.7.0.0.zip|
93-
178+
Build the plugin using gradle:
179+
``` shell
180+
./gradlew build
181+
```
182+
183+
or
184+
``` shell
185+
./gradlew assemble # (to avoid the test suite)
186+
```
187+
188+
Then the following command will start a dockerized ES and will install the previously built plugin:
189+
``` shell
190+
docker-compose up
191+
```
192+
193+
Please be careful during development: you'll need to manually rebuild the .zip using `./gradlew build` on each code
194+
change before running `docker-compose up` again.
195+
196+
> NOTE: In `docker-compose.yml` you can uncomment the debug env and attach a remote JVM debugger on `*:5005` to debug the plugin.
