Be polite with external services #60
Comments
Please also consider this: https://blog.r-hub.io/2021/07/30/cache/
Two points on this issue. Caching would be useful in some scenarios, e.g. during workshops. A different approach would be to implement a reverse proxy with caching capabilities. I installed on the CNR-IREA internal swarm server a version (with some customization) of the Docker image from https://github.com/djmaze/docker-caching-proxy. I ran some benchmarks by setting ReLTER::set_deims_base_url("") and invoking ReLTER::get_ilter_observedProperties(sitesNum = 100), and the first results seem promising (about 50% of the time of the first, non-cached, request).
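The benchmark described above could be sketched roughly as follows. This is an illustration, not a reproducible script: the proxy URL is a placeholder, and the exact base-URL value used in the real benchmark is not shown in this thread.

```r
# Hypothetical benchmark of the caching reverse proxy described above.
# "http://caching-proxy.local/" is a placeholder, not a real service.
library(ReLTER)

ReLTER::set_deims_base_url("http://caching-proxy.local/")

# First run populates the proxy cache (cold cache).
cold <- system.time(ReLTER::get_ilter_observedProperties(sitesNum = 100))

# Second run should be served from the proxy cache (warm cache).
warm <- system.time(ReLTER::get_ilter_observedProperties(sitesNum = 100))

# In the first results reported above, the warm run took roughly
# half the elapsed time of the cold run.
cold["elapsed"]; warm["elapsed"]
```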
Hi Paolo: in a broader view, the audience for the ReLTER package would be researchers working on a particular site. They will generally get the site metadata from DEIMS once, then go on to search for real data elsewhere. Is that your view also? More importantly, following Christoph's request, we need to always use his API to get site data and never request the whole list of sites. Unless I am missing something, we have not yet implemented that. If I am wrong, please correct me!
Hi, I agree with Micha, as I'm struggling to see the use case here. I think that if we optimise the way the API is queried, and add additional filter options to the API, we should have covered most (potential) performance issues. Setting up a dedicated caching layer seems like a little too much. Most users are probably not interested in querying all site records and their contents at once. We can, of course, discuss this during the upcoming eLTER Software workshop. Best,
Dear @ptagliolato, @micha-silver and @stopopol. The function get_ilter_generalinfo() was designed to give an overall idea of the global network and to answer network-management questions (e.g. what is the altimetric distribution of sites in the ILTER network, what is the distribution of sites with respect to biogeographical regions, etc.). These queries will be made with low temporal frequency, and during training they could be made simultaneously by a high number (20?) of people. In both cases, caching could be a good solution. If we feel that the functions dedicated to ILTER are not important, we can also delete them, but we cannot touch them before the workshop.
I've just had a look at the get_ilter_generalinfo function, but its description is not entirely correct (https://docs.ropensci.org/ReLTER/reference/get_ilter_generalinfo.html). There are only ~750 ILTER sites; if the function returns much more than that, it is actually querying all of DEIMS. Looking at the source code (https://github.com/ropensci/ReLTER/blob/main/R/get_ilter_generalinfo.R), the function is indeed doing that. The query parameter should be something like this: https://deims.org/api/sites?network=1aa7ccb2-a14b-43d6-90ac-5e0a6bc1d65b&verified=true

EDIT: This also applies to the other ILTER functions, e.g. https://github.com/ropensci/ReLTER/blob/main/R/get_ilter_observedProperties.R. Those functions are querying all of DEIMS instead of only the ILTER sites. Please update these functions.

EDIT 2: From what I can see, the functions also don't capitalise on the filter functionality of the API. The API can be filtered by site name (https://deims.org/api/sites?name=lago) and country (https://deims.org/api/sites?country=it).
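A minimal sketch of how such a filtered query could look from R. This is not ReLTER code; it assumes the httr and jsonlite packages and uses only the DEIMS API parameters mentioned above:

```r
# Sketch: query the DEIMS /api/sites endpoint with filter parameters
# instead of downloading all site records. Assumes httr and jsonlite.
library(httr)
library(jsonlite)

# Restrict the result to verified ILTER member sites via the
# network UUID given above.
resp <- GET(
  "https://deims.org/api/sites",
  query = list(
    network  = "1aa7ccb2-a14b-43d6-90ac-5e0a6bc1d65b",
    verified = "true"
  )
)
stop_for_status(resp)
sites <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# The same endpoint also filters by site name or country:
# GET("https://deims.org/api/sites", query = list(name = "lago"))
# GET("https://deims.org/api/sites", query = list(country = "it"))
```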
@stopopol The last commits (see #83) exploit the DEIMS API filter functionality you suggested (get_ilter_generalinfo). Moreover, it is now mandatory to set at least one of the two filters, and the function now avoids any additional per-site request (except when the first response from /sites/? returns only 3 sites; this is for temporary compatibility with old tests). I hope this solves the issue for the first function. The other get_ilter* functions must be modified or discarded (their "questioning" status accounts for this).
@ptagliolato : mentioning here so I won't forget :-) |
To avoid repeating the same communication mistake, I would involve @stopopol and ask him for an appropriate API endpoint to retrieve network-only information (networkId, name, and related networks, for example). We can discuss this further online.
@ptagliolato Good idea. But it will be after Lyon in any case. |
We can discuss feature requests in detail in Lyon or in other dedicated (virtual) meetings. In principle, if you need additional filter and query functionality for the REST API, I'll be happy to implement it, but it depends on the field/entity type. Networks should be more straightforward to do, but there are different ways to realise it.
@stopopol Looking at the workshop agenda, the best window for a meeting among us will be Wednesday. |
Consider creating a caching mechanism for queries against external services in order to reduce the number of requests.
This could be done by encapsulating the logic of do_Q and the subsequent statements (e.g. in functions like "get_activity_info") in one function accepting, for example, the id and type of an entity; that function could implement the caching mechanism.
We could also consider packages like rcache to implement the mechanism.
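A minimal sketch of such a wrapper, assuming the memoise package is used for the caching layer; get_entity() and get_entity_raw() are hypothetical names, not part of ReLTER:

```r
# Sketch of the proposed caching layer, not an actual ReLTER
# implementation. get_entity_raw() stands in for the centralised
# do_Q-style query logic; memoise caches its results so repeated
# queries for the same entity hit the external service only once.
library(memoise)
library(httr)
library(jsonlite)

get_entity_raw <- function(entity_type, entity_id) {
  url  <- sprintf("https://deims.org/api/%s/%s", entity_type, entity_id)
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  jsonlite::fromJSON(
    httr::content(resp, as = "text", encoding = "UTF-8")
  )
}

# Filesystem-backed cache, so cached responses survive across
# R sessions; cache_filesystem() is part of memoise.
get_entity <- memoise::memoise(
  get_entity_raw,
  cache = memoise::cache_filesystem("~/.cache/ReLTER")
)
```

An on-disk cache (rather than memoise's default in-memory one) would also help the workshop scenario, where many participants repeat the same queries in fresh sessions.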