Proposal for Thalassa v0.2 #16

pmav99 · 2021-10-07T12:45:41Z

After discussing with @brey I've started to implement what, at least we hope, can be the basis for version 0.2 of Thalassa. The end result looks something like this:

This is pretty much a complete rewrite. At the moment only the "max elevation" view has been implemented, so some work is needed before we can have feature parity with version 0.1. Nevertheless, we think that:

the code is quite simpler to work with
the design is much cleaner
the UI is more intuitive
it addresses Dynamic data source selection #15
it gives greater flexibility in designing the UI (e.g. it should be possible to dynamically add/remove the Wireframe by selecting a checkbox etc).

An additional feature that should be mentioned is that with the way the code has been structured, it is possible to display the visual components (i.e. the graphs) on jupyterlab, too, which can be particularly helpful when developing/debugging. it looks like this:

How to check it out?

I've pushed the code on the v0.2 branch. I've removed all the non-related code, so no binder and docker dirs etc. I've also updated the README so please check that out too.

For the record, if you have uncommitted changes in your local repo, you might find it easier to create a new git clone and a new virtualenv/conda env.

Known problems

As you will probably notice there is no WMS layer. The reason why I have removed it is the impact it had on performance. In a nutshell, holoviews seems to be doing some type of reprojection when you create an overlay that includes a WMS which seems to be really slow and which happens every time you make a change to the viewport (e.g. pan or zoom). I believe that it will be possible to avoid the slow-down but I haven't had the time to dig in deeper. For the record, this also affects Thalassa v0.1 so let's discuss this on a separate issue.

The text was updated successfully, but these errors were encountered:

pmav99 · 2021-10-07T12:49:10Z

pinging @brey @yosoyjay @cuill @josephzhang8 @saeed-moghimi-noaa @SorooshMani-NOAA (I hope I didn't forget anyone)

any comments are welcome

saeed-moghimi-noaa · 2021-10-07T13:23:05Z

Thanks @pmav99

capability to choose time series line plots from drop down menu

Location / station name
variable to plot (elevation)

Back end cashing to decrease response time

Possibility to convert max elev and mesh to parq file.

Seems like this file format being advertised as the scalable read write for unstructured data similar to zarr for gridded data
it let you to subset lat and lon for specific area. Need more research

SorooshMani-NOAA · 2021-10-07T13:43:02Z

Thank you @pmav99 for the changes and the detailed explanation. I haven't read the code yet, so I comment based on the high level description here and in the read me file.

In the readme I noticed panel serve is used to start Thalassa. Is cli removed and also is Deployment approach #14 considered (issue of launching new server every time a connection is made to the original port)?
Is there any data persistency or each app will have its own data source selector?

WMS performance issue was a good catch. Do you think it's time to talk with PyViz developers about that as @brey has suggested?

pmav99 · 2021-10-07T15:18:57Z

parq file

I am not familiar with this format, any links? Unless you mean https://parquet.apache.org/

saeed-moghimi-noaa · 2021-10-07T16:20:24Z

@pmav99

I think you are right:
https://docs.dask.org/en/latest/generated/dask.dataframe.read_parquet.html

saeed-moghimi-noaa · 2021-10-07T16:22:37Z

@pmav99 Also see here:
holoviz/spatialpandas#1

saeed-moghimi-noaa · 2021-10-07T16:32:54Z

All,

I think there would be great if we can meet with @jbednar. I hope we can get more information on the scalability of pyviz (e.g. holoview) and how datashader supports back end for multiple front-ends access to the same figure e.g. zooming into different regions of the plot.

Thanks,
-Saeed

See this one as well.
https://summit.dask.org/schedule/presentation/22/scaling-geospatial-vector-data/

brey · 2021-10-07T16:37:53Z

I agree that at this stage and before we settle into the new paradigm that the new approach on Thalassa presents it would be great if you can engage the upstream community for more info in order to fix ideas.

jbednar · 2021-10-07T16:56:37Z

holoviews seems to be doing some type of reprojection ... I believe that it will be possible to avoid the slow-down

Yes! The easiest way to avoid the slowdown is to project your data into the display coordinate system (normally Web Mercator) before you view it. That's especially important if you will be working interactively with the data many times in each session; much cheaper to project it once and then view it many times at different zoom levels and locations. If your data is in bare lat/lon values to start with, you can use holoviews.util.transform.lon_lat_to_easting_northing; otherwise try gv.operation.project.

I hope we can get more information on the scalability of pyviz (e.g. holoview) and how datashader supports back end for multiple front-ends access to the same figure e.g. zooming into different regions of the plot.

Here I think you're asking about what happens if you host a dashboard or app that uses Datashader which is then visited by multiple users? By default, each such user will get their own copy of every attribute created by the app, including the data being plotted, which can quickly use up all the available memory. Instead you should use pn.state to cache the dataset so that all visitors share the same copy of the underlying data structures; see pn.state in https://examples.pyviz.org/exoplanets. Once you've done that, each new visitor will be cheap to support, storing just a few extra bits of information specific to their session and allowing the server to host many users simultaneously.

BTW, "PyViz" represents all viz libraries ever (see pyviz.org); here I think you're discussing HoloViz.org.

saeed-moghimi-noaa · 2021-10-08T01:48:33Z

@jbednar

Thanks a lot for taking the time to read, respond to this issue and providing very good pointers. Let us do our homework and get back to you. Best, @saeed-moghimi-noaa

yosoyjay · 2021-10-12T16:43:11Z

The easiest way to avoid the slowdown is to project your data into the display coordinate system (normally Web Mercator) before you view it.

@jbednar Thanks a lot for that explanation! It was clear from profiling that Thalassa was doing a lot of work on pan + zooms, but nothing I saw in the trace suggested it was reprojecting.

@jbednar

This commit is quite messy, but we are still at an early stage of the development and the API is not set yet. Anyhow, it mostly addresses the remarks made by @jbednar at: #16 (comment) In a nutshell: - We follow the recommendation of the [FAQ](https://holoviews.org/FAQ.html) and we stop mixing holoviews and geoviews objects. - We use `pn.State` in order to cache the `xr.Dataset` objects among different requests. - We fix cartopy to < 0.20 since we are affected by this issue: holoviz/geoviews#529

pmav99 · 2021-10-13T16:53:14Z

@jbednar thank you for the suggestions, they were really helpful

I did experiment with both of them and I think that we don't need to do any reprojections after all. Since our input data is in lon/lat we just need to avoid mixing holoviews and geoviews and just stick with geoviews instead. If we do that then performance is great. There is also a cartopy/proj bug that affects performance but we can circumvent it by downgrading to cartopy < 0.20.

Other than that, my preliminary tests show that pn.State seems to be working great, i.e. each worker loads the data just once. The only issue I can see is that bokeh/tornado does not necessarily distribute new requests to idle workers and, occasionally, this can make response slower. But this is probably not related to the holoviz universe

@ all the rest
I (force) pushed an update in the v0.2 branch. This is how things are looking now when working locally. The second dataset has a million nodes and 2 millions faces. I still haven't found the time to deploy this on a server to see how it goes over the network. It would be great if you could test it out with bigger grids

thalassa-2021-10-13_19.26.43.mp4

@jbednar

This commit is quite messy, but we are still at an early stage of the development and the API is not set yet. Anyhow, it mostly addresses the remarks made by @jbednar at: #16 (comment) In a nutshell: - We follow the recommendation of the [FAQ](https://holoviews.org/FAQ.html) and we stop mixing holoviews and geoviews objects. - We use `pn.State` in order to cache the `xr.Dataset` objects among different requests. - We fix cartopy to < 0.20 since we are affected by this issue: holoviz/geoviews#529

cuill · 2021-10-19T21:34:08Z

Thank you @pmav99 for rewriting Thalassa v0.2.

I was able to run it on a virtual machine after making some changes:

ModuleNotFoundError: No module named 'thalassa'
It was fixed after modifyingfrom thalassa.web_ui import Thalassa to from web_ui import Thalassa
ImportError: attempted relative import with no known parent package
This was caused by from . import utils from . import visuals in web_ui.py
So I changed it to from utils import can_be_opened_by_xarray, open_dataset `from visuals import get_trimesh, get_max_elevation, get_wireframe'
PermissionError: [Errno 13] Permission denied: '/tmp/thalassa.log'
I changed path to the current directory in 'config.yml'

Here is what the end results look like for ICOGS3D:

saeed-moghimi-noaa · 2021-11-09T19:12:03Z

All,

Here is a related post:

https://discourse.holoviz.org/t/panel-webapp-with-dask-bokeh-tornado-and-asynchronous-tasks/2388

pmav99 · 2021-11-19T18:38:01Z

The code has been merged to master, so I guess this can be closed now

pmav99 closed this as completed Nov 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for Thalassa v0.2 #16

Proposal for Thalassa v0.2 #16

pmav99 commented Oct 7, 2021

pmav99 commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021 •

edited

Loading

SorooshMani-NOAA commented Oct 7, 2021

pmav99 commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021

brey commented Oct 7, 2021

jbednar commented Oct 7, 2021 •

edited

Loading

saeed-moghimi-noaa commented Oct 8, 2021

yosoyjay commented Oct 12, 2021

pmav99 commented Oct 13, 2021 •

edited

Loading

cuill commented Oct 19, 2021

saeed-moghimi-noaa commented Nov 9, 2021

pmav99 commented Nov 19, 2021

Proposal for Thalassa v0.2 #16

Proposal for Thalassa v0.2 #16

Comments

pmav99 commented Oct 7, 2021

How to check it out?

Known problems

pmav99 commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021 • edited Loading

capability to choose time series line plots from drop down menu

Back end cashing to decrease response time

Possibility to convert max elev and mesh to parq file.

SorooshMani-NOAA commented Oct 7, 2021

pmav99 commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021

saeed-moghimi-noaa commented Oct 7, 2021

brey commented Oct 7, 2021

jbednar commented Oct 7, 2021 • edited Loading

saeed-moghimi-noaa commented Oct 8, 2021

yosoyjay commented Oct 12, 2021

pmav99 commented Oct 13, 2021 • edited Loading

cuill commented Oct 19, 2021

saeed-moghimi-noaa commented Nov 9, 2021

pmav99 commented Nov 19, 2021

saeed-moghimi-noaa commented Oct 7, 2021 •

edited

Loading

jbednar commented Oct 7, 2021 •

edited

Loading

pmav99 commented Oct 13, 2021 •

edited

Loading