📍 command-line tool for clustering geolocations.
- Uses DBSCAN or OPTICS to perform clustering.
- Outputs clustering results as json, txt and geojson.
- Creates a kepler.gl visualization of clusters.
A cluster is created when a certain number of points (defined with --size) are each within a given distance (defined with --distance) of at least one other point in the cluster.
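The rule above can be illustrated with a small standard-library sketch. It is not the tool's implementation (the tool uses DBSCAN or OPTICS); it is a simplified connected-components pass that chains points together when they are within the distance threshold and keeps groups that reach the minimum size. The names `haversine_km` and `simple_clusters` are hypothetical and chosen only for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def simple_clusters(points, distance_km, min_size):
    """Group points chained together by links of at most distance_km.

    Illustrative only: a connected-components pass, not DBSCAN or OPTICS.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        # Grow one group outward from an arbitrary seed point.
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [i for i in unvisited
                    if haversine_km(points[current], points[i]) <= distance_km]
            unvisited.difference_update(near)
            group.extend(near)
            frontier.extend(near)
        # Keep the group only if it reaches the minimum size.
        if len(group) >= min_size:
            clusters.append(sorted(group))
    return sorted(clusters)


# Three points a few dozen metres apart in Berlin, plus one in Paris.
points = [(52.5200, 13.4050), (52.5205, 13.4060), (52.5210, 13.4070), (48.8566, 2.3522)]
clusters = simple_clusters(points, distance_km=0.5, min_size=3)
```

With a 0.5 km threshold and a minimum size of 3, the three Berlin points form one group and the Paris point is left out.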
Install with pip:
# with kepler.gl visualization support
pip install "geoclustering[full]"
# only text-based output
pip install geoclustering
If the full install fails, you might need to install kepler.gl build dependencies:
# macos
brew install proj gdal
Usage: geoclustering [OPTIONS] FILENAME
Tool to cluster geolocations. A cluster is created when a certain number of
points (defined with --size) each are within a given distance (defined with
--distance) of at least one other point in the cluster. Input is supplied as
a csv file. At a minimum, each row needs to have a 'lat' and a 'lon' column.
Other rows are reflected to the output.
Options:
-d, --distance FLOAT (in km) Max. distance between two points in
a cluster. [required]
-s, --size INTEGER Min. number of points in a cluster.
[required]
-o, --output PATH Output directory for results. Default:
./output
-a, --algorithm [dbscan|optics]
Clustering algorithm to be used. `optics`
produces tighter clusters but is slower.
Default: dbscan
--open Open the generated visualization in the
default browser automatically.
--debug Print debug output.
--help Show this message and exit.
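When the tool is called from a script, the options above can be assembled into an argument list. The following is a minimal sketch: `build_command` and the file name `locations.csv` are hypothetical, and only flags listed in the help text are used.

```python
import shlex


def build_command(filename, distance_km, size, output=None, algorithm=None, open_browser=False):
    """Assemble a geoclustering invocation as an argument list (hypothetical helper)."""
    args = ["geoclustering", "--distance", str(distance_km), "--size", str(size)]
    if output is not None:
        args += ["--output", output]
    if algorithm is not None:
        args += ["--algorithm", algorithm]
    if open_browser:
        args.append("--open")
    args.append(filename)
    return args


# Cluster points within 0.1 km of each other, at least 10 per cluster, using OPTICS.
command = build_command("locations.csv", 0.1, 10, algorithm="optics")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the tool is installed.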
Input is supplied as a .csv file. At a minimum, each row needs to have a lat and a lon column. Other columns are reflected to the output.
id,name,lat,lon
1,Bonnibelle Mathwen,40.1324085,64.4911086
...
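Before running the tool, it can help to check that an input file has the required columns. The sketch below is one way to do that with the standard library; `read_points` is a hypothetical helper and does not reflect how the tool parses its input.

```python
import csv
import io


def read_points(csv_file):
    """Read rows from an open CSV file, requiring 'lat' and 'lon' columns."""
    reader = csv.DictReader(csv_file)
    missing = {"lat", "lon"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    rows = []
    for row in reader:
        # Convert coordinates to floats; every other column is kept untouched.
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        rows.append(row)
    return rows


# In-memory stand-in for a real file, using the sample row above.
sample = io.StringIO("id,name,lat,lon\n1,Bonnibelle Mathwen,40.1324085,64.4911086\n")
points = read_points(sample)
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.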
If at least one cluster was found, the tool outputs a folder with the results as json, geojson, txt, and csv files. A kepler.gl html file is generated as well.
Encodes an array of clusters, each containing an array of points.
[
{
"cluster_id": 0,
"points": [
{
"id": 9,
"name": "Rosanna Foggo",
"lat": -6.2074293,
"lon": 106.8915948
}
]
}
]
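A short sketch of consuming this structure in Python, assuming the JSON has exactly the shape shown above. The document is inlined as a string here so the example is self-contained; in practice it would be read from the json file in the output folder.

```python
import json

raw = """
[
  {
    "cluster_id": 0,
    "points": [
      {"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}
    ]
  }
]
"""

clusters = json.loads(raw)

# Map each cluster id to the number of points it contains.
sizes = {cluster["cluster_id"]: len(cluster["points"]) for cluster in clusters}

# Collect the names of all clustered points.
names = [point["name"] for cluster in clusters for point in cluster["points"]]
```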
Encodes a single FeatureCollection, containing all points as Feature objects.
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
106.891595,
-6.207429
]
},
"properties": {
"id": 9,
"name": "Rosanna Foggo",
"cluster_id": 0
}
}
]
}
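The relationship between the two formats can be sketched as a conversion from the cluster array to a FeatureCollection. This is an illustration, not the tool's code: `to_feature_collection` is a hypothetical helper, and the rounding to six decimals is an assumption inferred from the example above. Note that GeoJSON orders coordinates as longitude first, then latitude.

```python
def to_feature_collection(clusters, ndigits=6):
    """Convert the cluster array into a GeoJSON FeatureCollection (illustrative)."""
    features = []
    for cluster in clusters:
        for point in cluster["points"]:
            # Everything except the coordinates becomes a feature property.
            properties = {k: v for k, v in point.items() if k not in ("lat", "lon")}
            properties["cluster_id"] = cluster["cluster_id"]
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON order is [longitude, latitude].
                    "coordinates": [round(point["lon"], ndigits), round(point["lat"], ndigits)],
                },
                "properties": properties,
            })
    return {"type": "FeatureCollection", "features": features}


clusters = [
    {
        "cluster_id": 0,
        "points": [{"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}],
    }
]
collection = to_feature_collection(clusters)
```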
Encodes clusters as blocks separated by a newline, where each line in a cluster block contains one point.
Cluster 0
id 9, name Rosanna Foggo, lat -6.2074293, lon 106.8915948
// ...
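As a rough sketch of how this layout relates to the cluster array, the function below renders clusters into the same text form. `render_txt` is a hypothetical helper, and separating blocks with a single blank line is an assumption about the exact spacing.

```python
def render_txt(clusters):
    """Render clusters as text blocks, one point per line (illustrative)."""
    blocks = []
    for cluster in clusters:
        lines = [f"Cluster {cluster['cluster_id']}"]
        for point in cluster["points"]:
            # Each point becomes "key value" pairs joined by commas.
            lines.append(", ".join(f"{key} {value}" for key, value in point.items()))
        blocks.append("\n".join(lines))
    # Assumption: blocks are separated by one blank line.
    return "\n\n".join(blocks)


clusters = [
    {
        "cluster_id": 0,
        "points": [{"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}],
    }
]
text = render_txt(clusters)
```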
Encodes each point on one line, with its cluster_id attached.
cluster_id,id,name,lat,lon
0,9,Rosanna Foggo,-6.2074293,106.8915948
...
Development assumes Python 3.9+. Setting up a virtualenv is encouraged.
# install dependencies & dev-dependencies
# PIP
pip install -e ".[dev,full]"
# PIPENV
pipenv install --dev -e .
# install a git hook that runs the code formatter before each commit.
pre-commit install
We use Black as our code formatter. If you don't want to use the pre-commit
hook, you can run the formatter manually or via an editor plugin.
- Update version.py
- Run scripts/release.sh
- Confirm that the GitHub Action completed successfully