Commit f647480

Merge pull request #56 from jiayuasu/master
Push GeoSpark 0.5.1
2 parents 5117a95 + b53433e commit f647480

24 files changed: +328 -224 lines changed

README.md

+29-132
Original file line number | Diff line number | Diff line change
@@ -3,142 +3,48 @@
33
[![Build Status](https://travis-ci.org/jiayuasu/GeoSpark.svg?branch=master)](https://travis-ci.org/jiayuasu/GeoSpark) [![Maven Central](https://maven-badges.herokuapp.com/maven-central/org.datasyslab/geospark/badge.svg)](https://maven-badges.herokuapp.com/maven-central/org.datasyslab/geospark)
44
[![Join the chat at https://gitter.im/geospark-datasys/Lobby](https://badges.gitter.im/geospark-datasys/Lobby.svg)](https://gitter.im/geospark-datasys/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
55

6-
GeoSpark is listed as **Infrastructure Project** on **Apache Spark Official Third Party Project Page** ([http://spark.apache.org/third-party-projects.html](http://spark.apache.org/third-party-projects.html))
7-
8-
GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) that efficiently load, process, and analyze large-scale spatial data across machines. GeoSpark provides APIs for Apache Spark programmer to easily develop their spatial analysis programs with Spatial Resilient Distributed Datasets (SRDDs) which have in house support for geometrical and Spatial Queries (Range, K Nearest Neighbors, Join).
6+
``` Supported Apache Spark version: 2.0+(Master branch), 1.0+(1.X branch) ```
97

8+
GeoSpark is listed as **Infrastructure Project** on [**Apache Spark Official Third Party Project Page**](http://spark.apache.org/third-party-projects.html)
109

10+
GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) that efficiently load, process, and analyze large-scale spatial data across machines. GeoSpark provides APIs for Apache Spark programmer to easily develop their spatial analysis programs with Spatial Resilient Distributed Datasets (SRDDs) which have in house support for geometrical and Spatial Queries (Range, K Nearest Neighbors, Join).
1111

12-
GeoSpark artifacts are hosted in Maven Central. You can add a Maven dependency with the following coordinates:
1312

14-
The following version supports Apache Spark 2.X versions:
1513

16-
```
17-
groupId: org.datasyslab
18-
artifactId: geospark
19-
version: 0.5.0
20-
```
14+
GeoSpark artifacts are hosted in Maven Central: [**Maven Central Coordinates**](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Maven-Central-Coordinates)
2115

22-
The following version supports Apache Spark 1.X versions:
2316

24-
```
25-
groupId: org.datasyslab
26-
artifactId: geospark
27-
version: 0.5.0-spark-1.x
28-
```
2917

30-
## Version information ([Full List](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Full-Version-Release-notes))
18+
# Version information ([more](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Full-Version-Release-notes))
3119

3220

3321
| Version | Summary |
3422
|:----------------: |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
35-
| 0.5.0| **Major updates:** We are pleased to announce the initial version of [Babylon](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon) a large-scale in-memory geospatial visualization system extending GeoSpark. Babylon and GeoSpark are integrated together. You can just import GeoSpark and enjoy! More details are available here: [Babylon GeoSpatial Visualization](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon);
36-
| 0.4.0| **Major updates:** ([Example](https://github.com/DataSystemsLab/GeoSpark/blob/master/src/main/java/org/datasyslab/geospark/showcase/Example.java)) 1. Refactor constructor API usage. 2. Simplify Spatial Join Query API. 3. Add native support for LineStringRDD; **Functionality enhancement:** 1. Release the persist function back to users. 2. Add more exception explanations.|
37-
38-
##News
39-
* GeoSpark Gitter Chat is now online! Chat with our GeoSpark users and ask questions!
40-
* **Babylon Visualization Framework** on GeoSpark is now available!
41-
Babylon is a large-scale in-memory geospatial visualization system. More details are available here: [Babylon GeoSpatial Visualization](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon)
42-
43-
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/usrail.png" width="250">
44-
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/nycheatmap.png" width="250">
45-
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/ustweet.png" width="250">
46-
47-
## How to get started (For GeoSpark Scala and Java developers)
48-
49-
50-
### Prerequisites
51-
52-
1. Apache Spark 2.X releases (Apache Spark 1.X releases support available in GeoSpark for Spark 1.X branch)
53-
2. JDK 1.7 or later
54-
3. You might need to modify the dependencies in "POM.xml" and make it consistent with your environment.
55-
56-
Note: GeoSpark Master branch supports Apache Spark 2.X releases and GeoSpark for Spark 1.X branch supports Apache Spark 1.X releases. Please refer to the proper branch you need.
57-
58-
### How to use GeoSpark APIs in an interactive Spark shell (Scala)
59-
60-
1. Have your Spark cluster ready.
61-
2. Download [pre-compiled GeoSpark jar](https://github.com/DataSystemsLab/GeoSpark/releases) under "Release" tag.
62-
3. Run Spark shell with GeoSpark as a dependency.
63-
64-
`
65-
./bin/spark-shell --jars GeoSpark_COMPILED.jar
66-
`
67-
68-
3. You can now call GeoSpark APIs directly in your Spark shell!
69-
70-
### How to use GeoSpark APIs in a self-contained Spark application (Scala and Java)
71-
72-
1. Create your own Apache Spark project in Scala or Java
73-
2. Add GeoSpark Maven coordinates into your project dependencies.
74-
4. You can now use GeoSpark APIs in your Spark program!
75-
5. Use spark-submit to submit your compiled self-contained Spark program.
76-
77-
### GeoSpark Programming Examples (Scala)
78-
79-
[GeoSpark Scala Example for GeoSpark 0.4 or later](https://gist.github.com/jiayuasu/e3571e982c518bb522e6c6c962207255)
80-
81-
[GeoSpark Scala Example for GeoSpark 0.3.x](https://gist.github.com/jiayuasu/bcecaa2e9e6f280a0f9a72bb7549ffaa)
82-
83-
[Test Data](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/test/resources)
84-
85-
### GeoSpark Programming Examples (Java)
86-
87-
[GeoSpark Java Example](https://github.com/DataSystemsLab/GeoSpark/blob/master/src/main/java/org/datasyslab/geospark/showcase/Example.java)
88-
89-
[Test Data](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/test/resources)
90-
91-
## Scala and Java API usage
92-
93-
Note: Scala can call Java APIs seamlessly. That means GeoSpark Scala users use the same APIs with GeoSpark Java users.
94-
95-
Please refer to [GeoSpark Scala and Java API Usage](http://www.public.asu.edu/~jiayu2/geospark/javadoc/)
96-
23+
|0.5.1| **Bug fix:** (1) GeoSpark: Fix inaccurate KNN result when K is large (2) GeoSpark: Replace incompatible Spark API call [Issue #55](https://github.com/DataSystemsLab/GeoSpark/issues/55); (3) Babylon: Remove JPG output format temporarily due to the lack of OpenJDK support|
24+
| 0.5.0| **Major updates:** We are pleased to announce the initial version of [Babylon](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon) a large-scale in-memory geospatial visualization system extending GeoSpark. Babylon and GeoSpark are integrated together. You can just import GeoSpark and enjoy! More details are available here: [Babylon GeoSpatial Visualization](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon)|
9725

26+
# Important features ([more](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Important-Features))
9827
## Spatial Resilient Distributed Datasets (SRDDs)
99-
100-
GeoSpark extends RDDs to form Spatial RDDs (SRDDs) and efficiently partitions SRDD data elements across machines and introduces novel parallelized spatial (geometric operations that follow the Open Geospatial Consortium (OGC) standard) transformations and actions (for SRDD) that provide a more intuitive interface for users to write spatial data analytics programs. Moreover, GeoSpark extends the SRDD layer to execute spatial queries (e.g., Range query, KNN query, and Join query) on large-scale spatial datasets. After geometrical objects are retrieved in the Spatial RDD layer, users can invoke spatial query processing operations provided in the Spatial Query Processing Layer of GeoSpark which runs over the in-memory cluster, decides how spatial object-relational tuples could be stored, indexed, and accessed using SRDDs, and returns the spatial query results required by user.
101-
102-
**Supported Spatial RDDs: PointRDD, RectangleRDD, PolygonRDD, LineStringRDD**
28+
Supported Spatial RDDs: PointRDD, RectangleRDD, PolygonRDD, LineStringRDD
10329

10430
## Supported data format
105-
**Native input format support**
106-
107-
Comma-Separated Values (**FileDataSplitter.CSV**), Tab-separated values (**FileDataSplitter.TSV**), Well-Known Text (**FileDataSplitter.WKT**), and GeoJSON (**FileDataSplitter.GeoJSON**) as the input formats. Users only need to specify input format as Splitter and the start and end offset (if necessary) of spatial fields in one row when call Constructors.
108-
109-
**User-supplied input format mapper**
110-
111-
Examples: [user-supplied input format mapper](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/geospark/showcase)
112-
113-
You can load your mapper like this:
114-
115-
```
116-
mySpatialRDD = PointRDD(sparkContext, InputLocation, userSuppliedMapper);
117-
mySpatialRDD = PointRDD(sparkContext, InputLocation, PartitionNum, userSuppliedMapper);
118-
119-
mySpatialRDD = LineStringRDD(sparkContext, InputLocation, userSuppliedMapper);
120-
mySpatialRDD = LineStringRDD(sparkContext, InputLocation, PartitionNum, userSuppliedMapper);
121-
...
122-
123-
```
124-
125-
## Important features
126-
127-
### Spatial partitioning
128-
129-
GeoSpark supports R-Tree (**GridType.RTREE**) and Voronoi diagram (**GridType.VORONOI**) spatial partitioning methods. Spatial partitioning repartitions an RDD according to objects' spatial locations. Spatial join on a spatially partitioned RDD will be very fast.
31+
**Native input format support**: CSV, TSV, WKT, GeoJSON
13032

131-
### Spatial Index
33+
**User-supplied input format mapper**: Any input formats
13234

133-
GeoSpark supports two Spatial Indexes, Quad-Tree (**IndexType.QUADTREE**) and R-Tree (**IndexType.RTREE**). Quad-Tree doesn't support Spatial K Nearest Neighbors query.
35+
## Spatial Partitioning
36+
Supported Spatial Partitioning techniques: R-Tree, Voronoi diagram
13437

135-
### Geometrical operation
38+
## Spatial Index
39+
Supported Spatial Indexes: Quad-Tree and R-Tree. Quad-Tree doesn't support Spatial K Nearest Neighbors query.
13640

137-
GeoSpark currently provides native support for Inside, Overlap, DatasetBoundary, Minimum Bounding Rectangle and Polygon Union in SRDDS following [Open Geospatial Consortium (OGC) standard](http://www.opengeospatial.org/standards).
41+
## Geometrical operation
42+
Inside, Overlap, DatasetBoundary, Minimum Bounding Rectangle, Polygon Union
13843

139-
### Spatial Operation
44+
## Spatial Operation
45+
Spatial Range Query, Spatial Join Query, and Spatial K Nearest Neighbors Query.
14046

141-
GeoSpark so far provides **Spatial Range Query**, **Spatial Join Query**, and **Spatial K Nearest Neighbors Query**.
47+
# GeoSpark Tutorial ([more](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Tutorial))
14248

14349
# Babylon Visualization Framework on GeoSpark
14450
Babylon is a large-scale in-memory geospatial visualization system.
@@ -152,51 +58,42 @@ Babylon and GeoSpark are integrated together. You just need to import GeoSpark a
15258
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/nycheatmap.png" width="250">
15359
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/ustweet.png" width="250">
15460

155-
## Publication
61+
# Publication
15662

15763
Jia Yu, Jinxuan Wu, Mohamed Sarwat. ["A Demonstration of GeoSpark: A Cluster Computing Framework for Processing Big Spatial Data"](). (demo paper) In Proceedings of IEEE International Conference on Data Engineering ICDE 2016, Helsinki, FI, May 2016
15864

15965
Jia Yu, Jinxuan Wu, Mohamed Sarwat. ["GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data"](http://www.public.asu.edu/~jiayu2/geospark/publication/GeoSpark_ShortPaper.pdf). (short paper) In Proceedings of the ACM International Conference on Advances in Geographic Information Systems ACM SIGSPATIAL GIS 2015, Seattle, WA, USA, November 2015
16066

16167

162-
## Acknowledgement
68+
# Acknowledgement
16369

16470
GeoSpark makes use of JTS Plus (An extended JTS Topology Suite Version 1.14) for some geometrical computations.
16571

16672
Please refer to [JTS Topology Suite website](http://tsusiatsoftware.net/jts/main.html) and [JTS Plus](https://github.com/jiayuasu/JTSplus) for more details.
16773

16874

16975

170-
## Contact
76+
# Contact
17177

172-
### Questions
78+
## Questions
17379

17480
* Please join [![Join the chat at https://gitter.im/geospark-datasys/Lobby](https://badges.gitter.im/geospark-datasys/Lobby.svg)](https://gitter.im/geospark-datasys/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
17581

17682
* Email us!
17783

178-
### Contributors
84+
## Contributors
17985
* [Jia Yu](http://www.public.asu.edu/~jiayu2/) (Email: [email protected])
18086

18187
* [Jinxuan Wu](http://www.public.asu.edu/~jinxuanw/) (Email: [email protected])
18288

18389
* [Mohamed Sarwat](http://faculty.engineering.asu.edu/sarwat/) (Email: [email protected])
18490

185-
### Project website
91+
## Project website
18692
Please visit [GeoSpark project website](http://geospark.datasyslab.org) for the latest news and releases.
18793

188-
### Data Systems Lab
94+
## Data Systems Lab
18995
GeoSpark is one of the projects under [Data Systems Lab](http://www.datasyslab.org/) at Arizona State University. The mission of Data Systems Lab is designing and developing experimental data management systems (e.g., database systems).
19096

191-
## Thanks for the help from GeoSpark community
192-
We appreciate the help and suggestions from the following GeoSpark users (The list is growing..):
193-
194-
* @gaufung
195-
* @lrojas94
196-
* @mdespriee
197-
* @sabman
198-
* @samchorlton
199-
* @Tsarazin
200-
* @TBuc
201-
* ...
97+
# Thanks for the help from GeoSpark community
98+
We appreciate the help and suggestions from GeoSpark users: [**Thanks List**](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Community-Thanks-List)
20299

pom.xml

+2-2
@@ -3,7 +3,7 @@
33
<modelVersion>4.0.0</modelVersion>
44
<groupId>org.datasyslab</groupId>
55
<artifactId>geospark</artifactId>
6-
<version>0.5.0</version>
6+
<version>0.5.1</version>
77

88
<name>${project.groupId}:${project.artifactId}</name>
99
<description>Geospatial extension for Apache Spark</description>
@@ -58,7 +58,7 @@
5858
<dependency>
5959
<groupId>org.datasyslab</groupId>
6060
<artifactId>JTSplus</artifactId>
61-
<version>0.1.0</version>
61+
<version>0.1.1</version>
6262
</dependency>
6363

6464
<dependency>

src/main/java/org/datasyslab/babylon/README.md

+7-5
@@ -12,7 +12,7 @@
1212

1313
## Main Features
1414

15-
### Extensible Visualization operator
15+
### Extensible Visualization operator (just like playing LEGO bricks)!
1616

1717
* Support super high resolution image generation: parallel map image rendering
1818
* Visualize Spatial RDD and Spatial Queries (Spatial Range, Spatial K Nearest Neighbors, Spatial Join)
@@ -22,7 +22,7 @@
2222
### Overlay Operator
2323
Overlay one map layer with many other map layers!
2424

25-
### Various Image filter
25+
### Various Image Filter
2626
* Gaussian Blur
2727
* Box Blur
2828
* Emboss
@@ -34,10 +34,12 @@ You also can buld your new image filter by easily extending the photo filter!
3434

3535
### Various Image Type
3636
* PNG
37-
* JPEG
37+
* JPG
3838
* GIF
39+
* More!
40+
41+
You also can support your desired image type by easily extending image generator! (JPG format is temporarily unavailable due to the lack of OpenJDK support)
3942

40-
You also can support your desired image type by easily extending the photo filter!
4143

4244

4345
### Current Visualization effect
@@ -50,7 +52,7 @@ You also can support your desired image type by easily extending the photo filte
5052
You also can build your new self-designed effects by easily extending the visualization operator!
5153

5254
### Example
53-
Here is [a runnable single-machine example](https://github.com/jiayuasu/GeoSpark/blob/master/src/main/java/org/datasyslab/babylon/showcase/Example.java). You can clone this repository and run it directly on your local machine!
55+
Here is [a runnable single-machine example](https://github.com/DataSystemsLab/GeoSpark/blob/master/src/main/java/org/datasyslab/babylon/showcase/Example.java). You can clone this repository and run it directly on your local machine!
5456

5557
### Scala and Java API
5658
Please refer to [Babylon Scala and Java API](http://www.public.asu.edu/~jiayu2/geospark/javadoc/latest/).

src/main/java/org/datasyslab/babylon/core/ImageGenerator.java

+6-3
@@ -11,6 +11,7 @@
1111
import java.util.List;
1212

1313
import org.apache.spark.api.java.JavaPairRDD;
14+
import org.datasyslab.babylon.utils.ImageType;
1415

1516
import scala.Tuple2;
1617

@@ -24,15 +25,16 @@ public abstract class ImageGenerator implements Serializable{
2425
*
2526
* @param distributedPixelImage the distributed pixel image
2627
* @param outputPath the output path
28+
* @param imageType the image type
2729
* @return true, if successful
2830
* @throws Exception the exception
2931
*/
30-
public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distributedPixelImage, String outputPath) throws Exception
32+
public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distributedPixelImage, String outputPath, ImageType imageType) throws Exception
3133
{
3234
List<Tuple2<Integer,ImageSerializableWrapper>> imagePartitions = distributedPixelImage.collect();
3335
for(Tuple2<Integer,ImageSerializableWrapper> imagePartition:imagePartitions)
3436
{
35-
this.SaveAsFile(imagePartition._2.image, outputPath+"-"+imagePartition._1);
37+
this.SaveAsFile(imagePartition._2.image, outputPath+"-"+imagePartition._1, imageType);
3638
}
3739
return true;
3840
}
@@ -42,8 +44,9 @@ public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distribu
4244
*
4345
* @param pixelImage the pixel image
4446
* @param outputPath the output path
47+
* @param imageType the image type
4548
* @return true, if successful
4649
* @throws Exception the exception
4750
*/
48-
public abstract boolean SaveAsFile(BufferedImage pixelImage, String outputPath) throws Exception;
51+
public abstract boolean SaveAsFile(BufferedImage pixelImage, String outputPath, ImageType imageType) throws Exception;
4952
}

src/main/java/org/datasyslab/babylon/extension/imageGenerator/NativeJavaImageGenerator.java

+4-3
@@ -13,6 +13,7 @@
1313
import javax.imageio.ImageIO;
1414

1515
import org.datasyslab.babylon.core.ImageGenerator;
16+
import org.datasyslab.babylon.utils.ImageType;
1617

1718
/**
1819
* The Class NativeJavaImageGenerator.
@@ -23,11 +24,11 @@ public class NativeJavaImageGenerator extends ImageGenerator{
2324
* @see org.datasyslab.babylon.core.ImageGenerator#SaveAsFile(java.awt.image.BufferedImage, java.lang.String)
2425
*/
2526
@Override
26-
public boolean SaveAsFile(BufferedImage pixelImage, String outputPath) {
27-
File outputImage = new File(outputPath+".png");
27+
public boolean SaveAsFile(BufferedImage pixelImage, String outputPath, ImageType imageType) {
28+
File outputImage = new File(outputPath+"."+imageType.getTypeName());
2829
outputImage.getParentFile().mkdirs();
2930
try {
30-
ImageIO.write(pixelImage,"png",outputImage);
31+
ImageIO.write(pixelImage,imageType.getTypeName(),outputImage);
3132
} catch (IOException e) {
3233
e.printStackTrace();
3334
}
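
Reviewer note: the `SaveAsFile` change above derives both the file extension and the `ImageIO` format name from `imageType.getTypeName()`. The standalone sketch below illustrates that pattern outside of Spark. It is not the project's code: the `ImageType` enum here is a hypothetical stand-in for `org.datasyslab.babylon.utils.ImageType` (only its `getTypeName()` method is visible in this diff), and `ImageSaveSketch` and `saveAsFile` are names made up for illustration. One difference is deliberate: the sketch returns the boolean result of `ImageIO.write`, which is `false` when no writer is registered for the requested format.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

public class ImageSaveSketch {

    // Hypothetical stand-in for org.datasyslab.babylon.utils.ImageType.
    // Only getTypeName() is taken from the diff; the constants are assumptions.
    enum ImageType {
        PNG("png"),
        GIF("gif");

        private final String typeName;

        ImageType(String typeName) {
            this.typeName = typeName;
        }

        String getTypeName() {
            return typeName;
        }
    }

    // Writes the image to "<outputPath>.<typeName>" and reports whether
    // ImageIO found a writer for that format.
    static boolean saveAsFile(BufferedImage pixelImage, String outputPath, ImageType imageType)
            throws IOException {
        File outputImage = new File(outputPath + "." + imageType.getTypeName());
        // Resolve to an absolute path first so getParentFile() is not null
        // when outputPath has no directory component.
        File parent = outputImage.getAbsoluteFile().getParentFile();
        if (parent != null) {
            parent.mkdirs();
        }
        return ImageIO.write(pixelImage, imageType.getTypeName(), outputImage);
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        File dir = Files.createTempDirectory("babylon-sketch").toFile();
        // Mirrors the "-<partition id>" suffix used in ImageGenerator.SaveAsFile.
        String basePath = new File(dir, "tile-0").getPath();
        boolean written = saveAsFile(image, basePath, ImageType.PNG);
        System.out.println("written=" + written + " path=" + basePath + ".png");
    }
}
```

Checking the return value may be worth considering in the real generator as well, since the README in this commit notes that JPG output is temporarily unavailable on OpenJDK.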
