Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for Thalassa v0.2 #16

Closed
pmav99 opened this issue Oct 7, 2021 · 15 comments
Closed

Proposal for Thalassa v0.2 #16

pmav99 opened this issue Oct 7, 2021 · 15 comments

Comments

@pmav99
Copy link
Collaborator

pmav99 commented Oct 7, 2021

Hi @ALL

After discussing with @brey I've started to implement what, at least we hope, can be the basis for version 0.2 of Thalassa. The end result looks something like this:

image

This is pretty much a complete rewrite. At the moment only the "max elevation" view has been implemented, so some work is needed before we can have feature parity with version 0.1. Nevertheless, we think that:

  • the code is quite simpler to work with
  • the design is much cleaner
  • the UI is more intuitive
  • it addresses Dynamic data source selection #15
  • it gives greater flexibility in designing the UI (e.g. it should be possible to dynamically add/remove the Wireframe by selecting a checkbox etc).

An additional feature that should be mentioned is that with the way the code has been structured, it is possible to display the visual components (i.e. the graphs) on jupyterlab, too, which can be particularly helpful when developing/debugging. it looks like this:

image

How to check it out?

I've pushed the code on the v0.2 branch. I've removed all the non-related code, so no binder and docker dirs etc. I've also updated the README so please check that out too.

For the record, if you have uncommitted changes in your local repo, you might find it easier to create a new git clone and a new virtualenv/conda env.

Known problems

As you will probably notice there is no WMS layer. The reason why I have removed it is the impact it had on performance. In a nutshell, holoviews seems to be doing some type of reprojection when you create an overlay that includes a WMS which seems to be really slow and which happens every time you make a change to the viewport (e.g. pan or zoom). I believe that it will be possible to avoid the slow-down but I haven't had the time to dig in deeper. For the record, this also affects Thalassa v0.1 so let's discuss this on a separate issue.

@pmav99
Copy link
Collaborator Author

pmav99 commented Oct 7, 2021

pinging @brey @yosoyjay @cuill @josephzhang8 @saeed-moghimi-noaa @SorooshMani-NOAA (I hope I didn't forget anyone)

any comments are welcome

@saeed-moghimi-noaa
Copy link
Collaborator

saeed-moghimi-noaa commented Oct 7, 2021

Thanks @pmav99

capability to choose time series line plots from drop down menu

  • Location / station name
  • variable to plot (elevation)

Back end cashing to decrease response time

Possibility to convert max elev and mesh to parq file.

  • Seems like this file format being advertised as the scalable read write for unstructured data similar to zarr for gridded data
  • it let you to subset lat and lon for specific area. Need more research

@SorooshMani-NOAA
Copy link
Collaborator

Thank you @pmav99 for the changes and the detailed explanation. I haven't read the code yet, so I comment based on the high level description here and in the read me file.

  • In the readme I noticed panel serve is used to start Thalassa. Is cli removed and also is Deployment approach #14 considered (issue of launching new server every time a connection is made to the original port)?
  • Is there any data persistency or each app will have its own data source selector?

WMS performance issue was a good catch. Do you think it's time to talk with PyViz developers about that as @brey has suggested?

@pmav99
Copy link
Collaborator Author

pmav99 commented Oct 7, 2021

parq file

I am not familiar with this format, any links? Unless you mean https://parquet.apache.org/

@saeed-moghimi-noaa
Copy link
Collaborator

@saeed-moghimi-noaa
Copy link
Collaborator

@pmav99 Also see here:
holoviz/spatialpandas#1

@saeed-moghimi-noaa
Copy link
Collaborator

All,

I think there would be great if we can meet with @jbednar. I hope we can get more information on the scalability of pyviz (e.g. holoview) and how datashader supports back end for multiple front-ends access to the same figure e.g. zooming into different regions of the plot.

Thanks,
-Saeed

See this one as well.
https://summit.dask.org/schedule/presentation/22/scaling-geospatial-vector-data/

@brey
Copy link
Collaborator

brey commented Oct 7, 2021

I agree that at this stage and before we settle into the new paradigm that the new approach on Thalassa presents it would be great if you can engage the upstream community for more info in order to fix ideas.

@jbednar
Copy link

jbednar commented Oct 7, 2021

holoviews seems to be doing some type of reprojection ... I believe that it will be possible to avoid the slow-down

Yes! The easiest way to avoid the slowdown is to project your data into the display coordinate system (normally Web Mercator) before you view it. That's especially important if you will be working interactively with the data many times in each session; much cheaper to project it once and then view it many times at different zoom levels and locations. If your data is in bare lat/lon values to start with, you can use holoviews.util.transform.lon_lat_to_easting_northing; otherwise try gv.operation.project.

I hope we can get more information on the scalability of pyviz (e.g. holoview) and how datashader supports back end for multiple front-ends access to the same figure e.g. zooming into different regions of the plot.

Here I think you're asking about what happens if you host a dashboard or app that uses Datashader which is then visited by multiple users? By default, each such user will get their own copy of every attribute created by the app, including the data being plotted, which can quickly use up all the available memory. Instead you should use pn.state to cache the dataset so that all visitors share the same copy of the underlying data structures; see pn.state in https://examples.pyviz.org/exoplanets. Once you've done that, each new visitor will be cheap to support, storing just a few extra bits of information specific to their session and allowing the server to host many users simultaneously.

BTW, "PyViz" represents all viz libraries ever (see pyviz.org); here I think you're discussing HoloViz.org.

@saeed-moghimi-noaa
Copy link
Collaborator

@jbednar

Thanks a lot for taking the time to read, respond to this issue and providing very good pointers. Let us do our homework and get back to you. Best, @saeed-moghimi-noaa

@yosoyjay
Copy link
Collaborator

The easiest way to avoid the slowdown is to project your data into the display coordinate system (normally Web Mercator) before you view it.

@jbednar Thanks a lot for that explanation! It was clear from profiling that Thalassa was doing a lot of work on pan + zooms, but nothing I saw in the trace suggested it was reprojecting.

pmav99 added a commit that referenced this issue Oct 13, 2021
This commit is quite messy, but we are still at an early stage of the
development and the API is not set yet. Anyhow, it mostly  addresses the
remarks made by @jbednar at:

#16 (comment)

In a nutshell:

- We follow the recommendation of the [FAQ](https://holoviews.org/FAQ.html)
  and we stop mixing holoviews and geoviews objects.
- We use `pn.State` in order to cache the `xr.Dataset` objects among
  different requests.
- We fix cartopy to < 0.20 since we are affected by this issue:
  holoviz/geoviews#529
@pmav99
Copy link
Collaborator Author

pmav99 commented Oct 13, 2021

@jbednar thank you for the suggestions, they were really helpful

I did experiment with both of them and I think that we don't need to do any reprojections after all. Since our input data is in lon/lat we just need to avoid mixing holoviews and geoviews and just stick with geoviews instead. If we do that then performance is great. There is also a cartopy/proj bug that affects performance but we can circumvent it by downgrading to cartopy < 0.20.

Other than that, my preliminary tests show that pn.State seems to be working great, i.e. each worker loads the data just once. The only issue I can see is that bokeh/tornado does not necessarily distribute new requests to idle workers and, occasionally, this can make response slower. But this is probably not related to the holoviz universe

@ all the rest
I (force) pushed an update in the v0.2 branch. This is how things are looking now when working locally. The second dataset has a million nodes and 2 millions faces. I still haven't found the time to deploy this on a server to see how it goes over the network. It would be great if you could test it out with bigger grids

thalassa-2021-10-13_19.26.43.mp4

pmav99 added a commit that referenced this issue Oct 13, 2021
This commit is quite messy, but we are still at an early stage of the
development and the API is not set yet. Anyhow, it mostly  addresses the
remarks made by @jbednar at:

#16 (comment)

In a nutshell:

- We follow the recommendation of the [FAQ](https://holoviews.org/FAQ.html)
  and we stop mixing holoviews and geoviews objects.
- We use `pn.State` in order to cache the `xr.Dataset` objects among
  different requests.
- We fix cartopy to < 0.20 since we are affected by this issue:
  holoviz/geoviews#529
@cuill
Copy link
Collaborator

cuill commented Oct 19, 2021

Thank you @pmav99 for rewriting Thalassa v0.2.

I was able to run it on a virtual machine after making some changes:

  1. ModuleNotFoundError: No module named 'thalassa'
    It was fixed after modifyingfrom thalassa.web_ui import Thalassa to from web_ui import Thalassa

  2. ImportError: attempted relative import with no known parent package
    This was caused by from . import utils from . import visuals in web_ui.py
    So I changed it to from utils import can_be_opened_by_xarray, open_dataset `from visuals import get_trimesh, get_max_elevation, get_wireframe'

  3. PermissionError: [Errno 13] Permission denied: '/tmp/thalassa.log'
    I changed path to the current directory in 'config.yml'

Here is what the end results look like for ICOGS3D:
Screen Shot 2021-10-19 at 5 18 20 PM

@saeed-moghimi-noaa
Copy link
Collaborator

@pmav99
Copy link
Collaborator Author

pmav99 commented Nov 19, 2021

The code has been merged to master, so I guess this can be closed now

@pmav99 pmav99 closed this as completed Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants