diff --git a/docs-overrides/main.html b/docs-overrides/main.html
index f7d1cd6462..48ab95c1bd 100644
--- a/docs-overrides/main.html
+++ b/docs-overrides/main.html
@@ -165,7 +165,7 @@

Install SedonaDB, or run Sedona on distributed systems when you need additional scale.
- + Install SedonaDB @@ -176,6 +176,130 @@

+ +
+
+

Deploy Sedona where you need it

+
Choose the right runtime for your infrastructure, from local setups to distributed and cloud-native systems.
+
+ + + +
+
+ +
+ Batch +
+
+

SedonaSpark

+
Distributed batch processing on Apache Spark clusters.
+ +
+ +
+
+ +
+ Streaming +
+
+

SedonaFlink

+
Real-time spatial analytics using Apache Flink.
+ +
+ +
+
+ +
+ Batch +
+
+

SedonaSnow

+
+ Native spatial support inside Snowflake environments. +
+ +
+ +
+
+ +
+ Cloud +
+
+

Sedona in the Cloud

+
Integrated spatial support in your preferred cloud environment.
+ +
+ +
+
+
+ +
@@ -378,128 +502,6 @@

Familiar

--> - -
-
-

Deploy Sedona where you need it

-
Choose the right runtime for your infrastructure, from local setups to distributed and cloud-native systems.
-
- -
-
- -
- Local -
-
-

SedonaDB

-
Standalone runtime for local processing and development.
- -
- -
-
- -
- Batch -
-
-

SedonaSpark

-
Distributed batch processing on Apache Spark clusters.
- -
- -
-
- -
- Streaming -
-
-

SedonaFlink

-
Real-time spatial analytics using Apache Flink.
- -
- -
-
- -
- Batch -
-
-

SedonaSnow

-
- Native spatial support inside Snowflake environments. -
- -
- -
-
- -
- Cloud -
-
-

Sedona in the Cloud

-
Integrated spatial support in your preferred cloud environment
- -
- -
-
-
diff --git a/docs/image/nyc_base_water.png b/docs/image/nyc_base_water.png
new file mode 100644
index 0000000000..7cea91b9bd
Binary files /dev/null and b/docs/image/nyc_base_water.png differ
diff --git a/docs/sedonaflink.md b/docs/sedonaflink.md
new file mode 100644
index 0000000000..a555aad5f4
--- /dev/null
+++ b/docs/sedonaflink.md
@@ -0,0 +1,111 @@
+
+
+# SedonaFlink
+
+SedonaFlink integrates geospatial functions into Apache Flink, making it an excellent option for streaming pipelines that use geospatial data.
+
+Here are some example SedonaFlink use cases:
+
+* Read geospatial data from Kafka and write to Iceberg
+* [Analyze real-time traffic density](https://www.alibabacloud.com/help/en/flink/realtime-flink/use-cases/analyze-traffic-density-with-flink-and-apache-sedona)
+* Real-time network planning and optimization for telecommunications
+
+Here are some example code snippets:
+
+=== "Java"
+
+    ```java
+    sedona.createTemporaryView("myTable", tbl);
+    Table geomTbl = sedona.sqlQuery("SELECT ST_GeomFromWKT(geom_polygon) AS geom_polygon, name_polygon FROM myTable");
+    geomTbl.execute().print();
+    ```
+
+=== "PyFlink"
+
+    ```python
+    table_env.sql_query("SELECT ST_AsBinary(ST_Point(1.0, 2.0))").execute().collect()
+    ```
+
+## Key features
+
+* **Real-time geospatial stream processing** for low-latency processing needs.
+* **Scalable** processing suitable for large streaming pipelines.
+* **Event-time processing** with Flink's time windowing.
+* **Exactly-once** processing guarantees.
+* **Portable** and easy to run in any Flink runtime.
+* **Open source** and managed according to the Apache Software Foundation's guidelines.
+
+## Why Sedona on Flink?
+
+Flink is built for streaming data, and Sedona enhances it with geospatial functionality.
+
+Most geospatial processing happens in batch systems such as Spark or PostGIS, which is fine for use cases that can tolerate higher latency.
+
+Sedona on Flink shines when you need to process geospatial data in real time.
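Since the Flink snippets above need a running cluster, here is the per-event filtering pattern sketched in plain, dependency-free Python. This is a toy illustration, not the Flink or Sedona API; `inside_bbox`, `filter_stream`, and the bounding box are invented stand-ins for an `ST_Contains` test against a real polygon.

```python
from typing import Iterator, Tuple

Event = Tuple[str, float, float]  # (id, lon, lat)

def inside_bbox(lon: float, lat: float,
                min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> bool:
    """Cheap stand-in for ST_Contains(bbox, point)."""
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def filter_stream(events: Iterator[Event]) -> Iterator[Event]:
    # Each event is handled as it arrives -- no micro-batching.
    for ev in events:
        if inside_bbox(ev[1], ev[2], -74.17, 40.51, -73.64, 40.94):
            yield ev

stream = iter([("a", -74.0, 40.7), ("b", -70.0, 41.0), ("c", -73.9, 40.6)])
print([e[0] for e in filter_stream(stream)])  # ['a', 'c']
```

In a real SedonaFlink job the same predicate would be an `ST_Contains` call inside a continuous SQL query, and Flink would parallelize it across the cluster.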
+
+Flink can deliver millisecond-level latency for geospatial queries.
+
+Flink has solid fault tolerance, so your geospatial pipelines won't lose data, even when things break.
+
+Sedona on Flink runs anywhere Flink runs, including Kubernetes, YARN, and standalone clusters.
+
+## How it works
+
+Sedona integrates directly into Flink's Table API and SQL engine.
+
+You register Sedona's spatial functions when you set up your Flink environment. Then you can use functions such as `ST_Point`, `ST_Contains`, and `ST_Distance` in your SQL queries.
+
+Sedona works with both Flink's DataStream API and Table API. Use whichever fits your workflow.
+
+The spatial operations run as part of Flink's distributed execution, so your geospatial computations are automatically parallelized across your cluster.
+
+Sedona stores geometries as binary data in Flink's internal format. This keeps memory usage low and processing fast.
+
+When you perform spatial joins, Sedona uses spatial indexing under the hood, enabling it to execute queries quickly.
+
+Flink's checkpointing system handles fault tolerance. If a node crashes, your geospatial state is restored from the last checkpoint.
+
+You read geospatial data from sources such as Kafka or file systems, process it with Sedona's spatial functions, and write the results to sinks such as Iceberg.
+
+The entire SedonaFlink pipeline runs continuously, so new events flow through your spatial transformations in real time.
+
+## Comparison with alternatives
+
+For small datasets, you may not need a distributed cluster and can use SedonaDB.
+
+For large batch pipelines, you can use SedonaSpark.
+
+Here are some direct comparisons of SedonaFlink vs. streaming alternatives.
+
+**SedonaFlink vs. Sedona on Spark Structured Streaming**
+
+Spark Structured Streaming uses micro-batches, whereas Flink processes events one at a time. This can give Flink lower latency for some workloads.
+
+Flink's state management is also more sophisticated.
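The spatial-indexing idea described above (prune candidate pairs cheaply before running the exact geometry test) can be sketched with a uniform grid in plain Python. This is illustrative only; Sedona's real indexes are quadtree/R-tree based, and every name below is invented for the example.

```python
from collections import defaultdict
from math import floor

CELL = 1.0  # grid cell size in degrees

def cell_of(lon, lat):
    return (floor(lon / CELL), floor(lat / CELL))

def build_index(boxes):
    """boxes: {name: (min_lon, min_lat, max_lon, max_lat)} -> mapping cell -> names."""
    index = defaultdict(list)
    for name, (x0, y0, x1, y1) in boxes.items():
        for cx in range(floor(x0 / CELL), floor(x1 / CELL) + 1):
            for cy in range(floor(y0 / CELL), floor(y1 / CELL) + 1):
                index[(cx, cy)].append(name)
    return index

def join(points, boxes):
    index = build_index(boxes)
    for pid, lon, lat in points:
        # Only boxes sharing this point's grid cell are candidates.
        for name in index.get(cell_of(lon, lat), []):
            x0, y0, x1, y1 = boxes[name]
            if x0 <= lon <= x1 and y0 <= lat <= y1:  # exact check after pruning
                yield (pid, name)

boxes = {"nyc": (-74.2, 40.5, -73.6, 40.9), "sf": (-122.6, 37.6, -122.3, 37.9)}
points = [("p1", -73.9, 40.7), ("p2", -122.4, 37.7), ("p3", 0.0, 0.0)]
print(sorted(join(points, boxes)))  # [('p1', 'nyc'), ('p2', 'sf')]
```

The pruning step is what makes distributed spatial joins scale: most point/polygon pairs are never compared at all.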
+
+Use Spark if you're already invested in the Spark ecosystem and Spark Structured Streaming's latency is low enough for your use case. Use Flink if you need very low latency.
+
+**SedonaFlink vs. PostGIS**
+
+PostGIS is great for storing and querying geospatial data in OLTP workflows, but it's not built for streaming.
+
+If you use PostGIS for streaming workflows, you need to constantly query the database from your stream processor, which adds latency and puts load on your database.
+
+SedonaFlink processes geospatial data in flight, eliminating the need for database round-trips.
diff --git a/docs/sedonasnow.md b/docs/sedonasnow.md
new file mode 100644
index 0000000000..28a5021ed5
--- /dev/null
+++ b/docs/sedonasnow.md
@@ -0,0 +1,55 @@
+
+
+# SedonaSnow
+
+SedonaSnow brings 200+ Apache Sedona geospatial functions directly into your Snowflake environment to complement Snowflake's native spatial functions.
+
+## Key advantages
+
+* **200+ spatial functions**: such as 3D distance, geometry validation, and precision reduction
+* **Fast spatial joins**: Sedona has special optimizations for performant spatial joins
+* **Seamless integration**: works alongside Snowflake's native functions
+* **No data movement**: everything stays in Snowflake
+
+## Get started
+
+Here's an example of how to run some queries on Snowflake tables with SedonaSnow:
+
+```sql
+USE DATABASE SEDONASNOW;
+
+SELECT SEDONA.ST_GeomFromWKT(wkt) AS geom
+FROM your_table;
+
+SELECT SEDONA.ST_3DDistance(geom1, geom2) FROM spatial_data;
+```
+
+Here's an example of a spatial join:
+
+```sql
+SELECT * FROM lefts, rights
+WHERE SEDONA.ST_Intersects(lefts.geom, rights.geom);
+```
+
+You can see how SedonaSnow integrates seamlessly into your current Snowflake environment.
+
+## Next steps
+
+SedonaSnow is an excellent option if you're doing serious spatial analysis in Snowflake. It is fast and provides a wide range of spatial functions.
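As a concrete illustration of one of those functions: for two 3D point geometries, `ST_3DDistance` is the straight-line Euclidean distance that includes the z coordinate. Here is a plain-Python sketch of that computation (in SedonaSnow you would call `SEDONA.ST_3DDistance` in SQL rather than write this yourself):

```python
import math

def st_3d_distance(p1, p2):
    """Euclidean distance between two (x, y, z) points, like ST_3DDistance on point geometries."""
    (x1, y1, z1), (x2, y2, z2) = p1, p2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)

# The 2D distance between these points is 5.0; including z gives 13.0.
print(st_3d_distance((0.0, 0.0, 0.0), (3.0, 4.0, 12.0)))  # 13.0
```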
SedonaSnow removes the limitations of Snowflake's built-in spatial functions without forcing you to move your data to another platform.
diff --git a/docs/sedonaspark.md b/docs/sedonaspark.md
new file mode 100644
index 0000000000..3d67573c5e
--- /dev/null
+++ b/docs/sedonaspark.md
@@ -0,0 +1,137 @@
+
+
+# SedonaSpark
+
+SedonaSpark extends Apache Spark with a rich set of out-of-the-box distributed Spatial Datasets and functions that efficiently load, process, and analyze large-scale spatial data across machines. SedonaSpark is an excellent option for datasets that are too large for a single machine.
+
+=== "SQL"
+
+    ```sql
+    SELECT superhero.name
+    FROM city, superhero
+    WHERE ST_Contains(city.geom, superhero.geom)
+    AND city.name = 'Gotham'
+    ```
+
+=== "PySpark"
+
+    ```python
+    sedona.sql(
+        """
+        SELECT superhero.name
+        FROM city, superhero
+        WHERE ST_Contains(city.geom, superhero.geom)
+        AND city.name = 'Gotham'
+        """
+    )
+    ```
+
+=== "Java"
+
+    ```java
+    Dataset<Row> result = spark.sql(
+        "SELECT superhero.name "
+            + "FROM city, superhero "
+            + "WHERE ST_Contains(city.geom, superhero.geom) "
+            + "AND city.name = 'Gotham'"
+    );
+    ```
+
+=== "Scala"
+
+    ```scala
+    sedona.sql("""
+      SELECT superhero.name
+      FROM city, superhero
+      WHERE ST_Contains(city.geom, superhero.geom)
+      AND city.name = 'Gotham'
+    """)
+    ```
+
+=== "R"
+
+    ```r
+    result <- sql("
+      SELECT superhero.name
+      FROM city, superhero
+      WHERE ST_Contains(city.geom, superhero.geom)
+      AND city.name = 'Gotham'
+    ")
+    ```
+
+## Key features
+
+* **Blazing fast**: SedonaSpark executes computations in parallel on many nodes in a cluster, so large computations run fast.
+* Supports **various file formats**, including GeoJSON, Shapefile, GeoParquet, STAC, JDBC, OSM PBF, CSV, and PostGIS.
+* Exposes several **language APIs**, including SQL, Python, Java, Scala, and R.
+* **Scalable**: horizontally scale to tens, hundreds, or thousands of nodes depending on the size of your data.
You can process massive spatial datasets with SedonaSpark.
+* **Portable**: easy to run in a custom environment, locally, or in the cloud with AWS EMR, Microsoft Fabric, or Google Dataproc.
+* **Extensible**: you can extend SedonaSpark with custom logic that suits your specific geospatial data analysis needs.
+* **Open source**: Apache Sedona is an open-source project managed in accordance with the Apache Software Foundation's guidelines.
+* **Extra functionality** like [nearest neighbor searching](https://sedona.apache.org/latest/api/sql/NearestNeighbourSearching/) and geostats like [DBSCAN](https://sedona.apache.org/latest/tutorial/sql/#cluster-with-dbscan).
+
+## Portability
+
+It's easy to run SedonaSpark locally, with Docker, or on any popular cloud.
+
+SedonaSpark is designed to run in any environment where Spark can run. Many cloud vendors have Spark runtimes, and Sedona can be added as a library dependency.
+
+Running Sedona locally is handy because it lets you iterate on code before deploying it against production datasets.
+
+## Spark and Sedona example with vector data
+
+Let's take a look at how to run a workflow on a vector dataset with Spark and Sedona.
+
+Let's use the base water data supplied by the Overture Maps Foundation to map all the bodies of water in the New York City area. Start by reading the data and creating a view:
+
+```python
+base_water = sedona.table("open_data.overture_maps_foundation.base_water")
+base_water.createOrReplaceTempView("base_water_view")
+```
+
+Now filter the dataset to the bodies of water in the New York City area:
+
+```python
+spot = "POLYGON ((-74.174194 40.509623, -73.635864 40.509623, -73.635864 40.93634, -74.174194 40.93634, -74.174194 40.509623))"
+query = f"""
+SELECT id, geometry FROM base_water_view
+WHERE ST_Contains(ST_GeomFromWKT('{spot}'), geometry)
+"""
+res = sedona.sql(query)
+```
+
+Sedona integrates seamlessly with popular visualization libraries, making it easy to build maps from a Sedona DataFrame.
You can build a map with just two lines of code:
+
+```python
+kepler_map = SedonaKepler.create_map()
+SedonaKepler.add_df(kepler_map, df=res, name="Tri-state water")
+```
+
+The map looks amazing!
+
+![New York City water](../image/nyc_base_water.png)
+
+You can easily see all of the rivers, lakes, and swimming pools in the New York City area with this map.
+
+## Have questions?
+
+Feel free to start a GitHub Discussion or join the Discord community to ask the developers any questions you may have.
+
+We look forward to collaborating with you!
diff --git a/mkdocs.yml b/mkdocs.yml
index b87c504fc0..e53a0231f3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -20,19 +20,10 @@ site_url: https://sedona.apache.org
 site_description: Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark, Apache Flink, and Snowflake, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.
nav: - Home: index.md - - Setup: - - Overview: setup/overview.md - - Supported platforms: - - Sedona with Apache Spark: - - Modules: setup/modules.md - - Language wrappers: setup/platform.md - - Sedona with Apache Flink: - - Modules: setup/flink/modules.md - - Language wrappers: setup/flink/platform.md - - Sedona with Snowflake: - - Modules: setup/snowflake/modules.md - - Maven Central coordinate: setup/maven-coordinates.md - - Install with Apache Spark: + - SedonaDB: 'https://sedona.apache.org/sedonadb/' + - SedonaSpark: + - Home: sedonaspark.md + - Install: - Install Sedona Scala/Java: setup/install-scala.md - Install Sedona Python: setup/install-python.md - Install Sedona R: api/rdocs @@ -46,15 +37,11 @@ nav: - Install on Microsoft Fabric: setup/fabric.md - Set up Spark cluster manually: setup/cluster.md - Install on Azure Synapse Analytics: setup/azure-synapse-analytics.md - - Install with Apache Flink: - - Install Sedona Scala/Java: setup/flink/install-scala.md - - Install Sedona Python: setup/flink/install-python.md - - Install with Snowflake: - - Install Sedona SQL: setup/snowflake/install.md - - Release notes: setup/release-notes.md - - Download: download.md - - Programming Guides: - - Sedona with Apache Spark: + - Overview: setup/overview.md + - Modules: setup/modules.md + - Language wrappers: setup/platform.md + - Maven Central coordinate: setup/maven-coordinates.md + - Programming Guide: - Spatial DataFrame / SQL app: tutorial/sql.md - Raster DataFrame / SQL app: tutorial/raster.md - Pure SQL environment: tutorial/sql-pure-sql.md @@ -81,16 +68,7 @@ nav: - Benchmark: tutorial/benchmark.md - Tune RDD application: tutorial/Advanced-Tutorial-Tune-your-Application.md - Storing large raster geometries in Parquet files: tutorial/storing-blobs-in-parquet.md - - Sedona with Apache Flink: - - Spatial SQL app (Flink): tutorial/flink/sql.md - - Spatial SQL app (PyFlink): tutorial/flink/pyflink-sql.md - - Sedona with Snowflake: - - Spatial SQL app (Snowflake): 
tutorial/snowflake/sql.md - - Examples: - - Scala/Java: tutorial/demo.md - - Python: tutorial/jupyter-notebook.md - - API Docs: - - Sedona with Apache Spark: + - API: - SQL: - Quick start: api/sql/Overview.md - Vector data: @@ -130,24 +108,47 @@ nav: - RDD: api/viz/java-api.md - Sedona R: api/rdocs - Sedona Python: api/pydocs - - Sedona with Apache Flink: + + - SedonaFlink: + - Home: sedonaflink.md + - Install: + - Install Sedona Scala/Java: setup/flink/install-scala.md + - Install Sedona Python: setup/flink/install-python.md + - Modules: setup/flink/modules.md + - Language wrappers: setup/flink/platform.md + - Programming Guides: + - Sedona with Apache Flink: + - Spatial SQL app (Flink): tutorial/flink/sql.md + - Spatial SQL app (PyFlink): tutorial/flink/pyflink-sql.md + - Examples: + - Scala/Java: tutorial/demo.md + - Python: tutorial/jupyter-notebook.md + - API: - SQL: - Overview (Flink): api/flink/Overview.md - Constructor (Flink): api/flink/Constructor.md - Function (Flink): api/flink/Function.md - Aggregator (Flink): api/flink/Aggregator.md - Predicate (Flink): api/flink/Predicate.md - - Sedona with Snowflake: + + - SedonaSnow: + - Home: sedonasnow.md + - Install: + - Install Sedona SQL: setup/snowflake/install.md + - Spatial SQL app (Snowflake): tutorial/snowflake/sql.md + - Modules: setup/snowflake/modules.md + - API: - SQL: - Overview (Snowflake): api/snowflake/vector-data/Overview.md - Constructor (Snowflake): api/snowflake/vector-data/Constructor.md - Function (Snowflake): api/snowflake/vector-data/Function.md - Aggregate Function (Snowflake): api/snowflake/vector-data/AggregateFunction.md - Predicate (Snowflake): api/snowflake/vector-data/Predicate.md - - SedonaDB: 'https://sedona.apache.org/sedonadb/' - SpatialBench: 'https://sedona.apache.org/spatialbench/' - Blog: blog/index.md - Community: + - Download: download.md + - Release notes: setup/release-notes.md - Compile the code: setup/compile.md - Community: community/contact.md - Contributor Guide: