Be polite with external services #60
Comments
Please also consider this: https://blog.r-hub.io/2021/07/30/cache/
Two points on this issue. Caching would be useful in some scenarios, e.g. during workshops. A different approach would be to implement a reverse proxy with caching capabilities. I installed on the CNR-IREA internal swarm server a version (with some customization) of the Docker image from https://github.com/djmaze/docker-caching-proxy. I ran some benchmarks by setting ReLTER::set_deims_base_url("") and invoking ReLTER::get_ilter_observedProperties(sitesNum = 100), and the first results seem promising (about 50% of the time of the first, non-cached, request).
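The benchmark described above could be sketched roughly as follows. This is an illustration, not a reproducible script: the proxy URL is a placeholder, and the exact base-URL value used in the real benchmark is not shown in this thread.

```r
# Hypothetical benchmark of the caching reverse proxy described above.
# "http://caching-proxy.local/" is a placeholder, not a real service.
library(ReLTER)

ReLTER::set_deims_base_url("http://caching-proxy.local/")

# First run populates the proxy cache (cold cache).
cold <- system.time(ReLTER::get_ilter_observedProperties(sitesNum = 100))

# Second run should be served from the proxy cache (warm cache).
warm <- system.time(ReLTER::get_ilter_observedProperties(sitesNum = 100))

# In the first results reported above, the warm run took roughly
# half the elapsed time of the cold run.
cold["elapsed"]; warm["elapsed"]
```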
Hi Paolo: in a broader view, the audience for the ReLTER package would be researchers working on a particular site. They will generally get the site metadata from DEIMS once, then go on to search for real data elsewhere. Is that your view also? More importantly, following Christoph's request, we need to always use his API to get site data and never request the whole list of sites. Unless I am missing something, we have not yet implemented that. If I am wrong, please correct me!
Hi, I agree with Micha, as I'm struggling to see the use case here. I think that if we optimise the way the API is queried, and add additional filter options to the API, we should have covered most (potential) performance issues. Setting up a dedicated caching layer seems like a little too much. Most users are probably not interested in querying all site records and their contents at once. We can, of course, discuss this during the upcoming eLTER Software workshop. Best,
Dear @ptagliolato, @micha-silver and @stopopol. The function get_ilter_generalinfo() was designed to give an overall idea of the global network and to answer network-management questions (e.g. what is the altimetric distribution of sites in the ILTER network, what is the distribution of sites with respect to biogeographical regions, etc.). These queries will be made with low temporal frequency, and during training they could be made simultaneously by a high number (20?) of people. In both cases, caching could be a good solution. If we feel that the functions dedicated to ILTER are not important, we can also delete them, but we cannot touch them before the workshop.
I've just had a look at the get_ilter_generalinfo function, but its description is not entirely correct (https://docs.ropensci.org/ReLTER/reference/get_ilter_generalinfo.html). There are only ~750 ILTER sites; if the function returns much more than that, it is actually querying all of DEIMS. Looking at the source code (https://github.com/ropensci/ReLTER/blob/main/R/get_ilter_generalinfo.R), the function is indeed doing that. The query parameter should be something like this: https://deims.org/api/sites?network=1aa7ccb2-a14b-43d6-90ac-5e0a6bc1d65b&verified=true

EDIT: This also applies to the other ILTER functions, e.g. https://github.com/ropensci/ReLTER/blob/main/R/get_ilter_observedProperties.R. Those functions are querying all of DEIMS instead of only the ILTER sites. Please update these functions.

EDIT 2: From what I can see, the functions also don't capitalise on the filter functionality of the API. The API can be filtered by site name (https://deims.org/api/sites?name=lago) and country (https://deims.org/api/sites?country=it).
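A minimal sketch of how such a filtered query could look from R. This is not ReLTER code; it assumes the httr and jsonlite packages and uses only the DEIMS API parameters mentioned above:

```r
# Sketch: query the DEIMS /api/sites endpoint with filter parameters
# instead of downloading all site records. Assumes httr and jsonlite.
library(httr)
library(jsonlite)

# Restrict the result to verified ILTER member sites via the
# network UUID given above.
resp <- GET(
  "https://deims.org/api/sites",
  query = list(
    network  = "1aa7ccb2-a14b-43d6-90ac-5e0a6bc1d65b",
    verified = "true"
  )
)
stop_for_status(resp)
sites <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# The same endpoint also filters by site name or country:
# GET("https://deims.org/api/sites", query = list(name = "lago"))
# GET("https://deims.org/api/sites", query = list(country = "it"))
```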
@stopopol The last commits (see #83) exploit the DEIMS API filter functionality you suggested (get_ilter_generalinfo). Moreover, it is now mandatory to set at least one of the two filters, and the function now avoids any additional per-site request (except when the first response from /sites/? returns only 3 sites; this is for temporary compatibility with old tests). I hope this solves the issue for the first function. The other get_ilter* functions must be modified or discarded (their "questioning" status accounts for this).
@ptagliolato : mentioning here so I won't forget :-) |
To avoid repeating the same communication mistake, I would involve @stopopol and ask him for an appropriate API endpoint to retrieve network-only information (networkId, name, and related networks, for example). We can discuss this further online.
@ptagliolato Good idea. But it will be after Lyon in any case. |
We can discuss feature requests in detail in Lyon or in other dedicated (virtual) meetings. In principle, if you need additional filter and query functionality for the REST API, I'll be happy to implement it, but it depends on the field/entity type. Networks should be more straightforward to do, but there are different ways to realise it.
@stopopol Looking at the workshop agenda, the best window for a meeting among us will be Wednesday. |
Consider creating a caching mechanism for queries against external services in order to reduce the number of requests.
This could be done by encapsulating the logic of do_Q and the subsequent statements (e.g. in functions like "get_activity_info") in one function accepting, for example, the id and type of an entity; that function could implement the caching mechanism.
We could also consider packages like rcache to implement the mechanism.
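A minimal sketch of such a wrapper, assuming the memoise package is used for the caching layer; get_entity() and get_entity_raw() are hypothetical names, not part of ReLTER:

```r
# Sketch of the proposed caching layer, not an actual ReLTER
# implementation. get_entity_raw() stands in for the centralised
# do_Q-style query logic; memoise caches its results so repeated
# queries for the same entity hit the external service only once.
library(memoise)
library(httr)
library(jsonlite)

get_entity_raw <- function(entity_type, entity_id) {
  url  <- sprintf("https://deims.org/api/%s/%s", entity_type, entity_id)
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8")
  )
}

# Filesystem-backed cache, so cached responses survive across
# R sessions; cache_filesystem() is part of memoise.
get_entity <- memoise::memoise(
  get_entity_raw,
  cache = memoise::cache_filesystem("~/.cache/ReLTER")
)
```

An on-disk cache (rather than memoise's default in-memory one) would also help the workshop scenario, where many participants repeat the same queries in fresh sessions.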