First draft of 0004-greenstand-search-engine#7
kparikh9 wants to merge 1 commit into Greenstand:main
| * ElasticSearch - can integrate well with other products of the Elastic Stack like Kibana, Logstash. Easiest to experiment with, since there are free trials available for Elastic Cloud (managed ElasticSearch deployment) | ||
| ## Considered Options |
Did you consider Sphinx? https://stackshare.io/stackups/lucene-vs-sphinx It's great for docs search, as far as I know.
Or maybe something from this list? https://www.educba.com/elasticsearch-alternatives/
Thanks for the links, @mckornfield! I'll check these out and see if any fit better than ES
| * Steep learning curve? | ||
| * Requires more experimentation on what architecture is the best for Greenstand's use case (i.e. search over multiple indexes vs. one index) | ||
| * Heavy memory usage (requires 4.0 GB RAM just for ElasticSearch, probably more for Kibana and Logstash) - can be expensive since it requires larger compute servers and this would need to remain on at all times. |
Definitely costs a lot as far as resources. Also there's no good auth support in the free versions of ELK
For ELK, I believe we can set up service accounts to request and use tokens for authorization when passing requests to the Elastic cluster: https://www.elastic.co/guide/en/elasticsearch/reference/current/token-authentication-services.html. This doesn't seem to be limited to Elastic Cloud (which is just a managed deployment of the ELK stack).
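To make the token idea concrete, here is a minimal sketch of how a client might attach such a token. It only builds the request and does not send it. The host `localhost:9200` and the token value are placeholders I made up for illustration; the `Authorization: Bearer` header and the `_security/_authenticate` endpoint are the ones described in the Elastic docs linked above.

```python
from urllib.request import Request


def build_authenticated_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a request that presents a bearer token.

    base_url and token are hypothetical placeholders, not values from
    Greenstand's infrastructure.
    """
    # _security/_authenticate reports which identity the credentials map to,
    # which makes it a convenient smoke test for a newly created token.
    url = base_url.rstrip("/") + "/_security/_authenticate"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


req = build_authenticated_request("https://localhost:9200", "PLACEHOLDER_TOKEN")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual call against a running cluster.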
| ## Considered Options | ||
| * ElasticSearch |
Did you spike these with a sample dataset?
Yes, I took about 20 rows from the public.planters and public.trees tables and all the rows from the public.organizations table in the treetracker database. I tested autocomplete/search hinting queries on three separate indexes (1 for each table) and on one single index that contained all three types of data rows (planters, trees, organizations).
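For anyone who wants to picture the two layouts, here is a rough sketch of how the search requests could differ. It only builds the request path and JSON body and sends nothing. The index names, the `name` field, the combined index name `greenstand`, and the `type` discriminator field are all assumptions for illustration, not the mappings used in the spike. The query uses Elasticsearch's `match_phrase_prefix`, which is one of several ways to do search-as-you-type.

```python
import json

# Hypothetical index names, one per source table.
INDEXES = ["planters", "trees", "organizations"]


def multi_index_search(prefix, field="name"):
    """Three-index layout: query all indexes at once via a comma-separated path."""
    path = "/" + ",".join(INDEXES) + "/_search"
    body = {"query": {"match_phrase_prefix": {field: {"query": prefix}}}}
    return path, body


def single_index_search(prefix, doc_type=None, index="greenstand",
                        field="name", type_field="type"):
    """Single-index layout: one index, optionally narrowed by a type field."""
    bool_query = {"must": [{"match_phrase_prefix": {field: {"query": prefix}}}]}
    if doc_type is not None:
        # A term filter restricts hits to one kind of row without affecting scoring.
        bool_query["filter"] = [{"term": {type_field: doc_type}}]
    return f"/{index}/_search", {"query": {"bool": bool_query}}


multi_path, multi_body = multi_index_search("gre")
single_path, single_body = single_index_search("gre", doc_type="organization")
print(multi_path, json.dumps(multi_body))
print(single_path, json.dumps(single_body))
```

The trade-off this highlights: separate indexes keep mappings independent per table, while a single index needs a discriminator field but keeps every query pointed at one place.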
| ## Decision Drivers | ||
| * ElasticSearch - can integrate well with other products of the Elastic Stack like Kibana, Logstash. Easiest to experiment with, since there are free trials available for Elastic Cloud (managed ElasticSearch deployment) |
So we just got rid of our ELK stack, which we were using for consolidated logging of microservices. It was very difficult for the current cloud team to manage and to keep deployed in our cluster. I presume we would not need the whole ELK stack to achieve what you are looking to do here? Kibana really stressed our cloud resources. However, maybe there is a more stripped-down deployment option that would meet your use case.
| * ElasticSearch | ||
| * Apache Solr | ||
| * Apache Lucene |
Can you say more about why the Apache projects were not chosen? I don't have experience with either, but I do know that CKAN (our chosen data portal) uses Solr.
@kparikh9 We generally prefer to build before we buy and to self-manage our application platform; however, it seems like you have a quick solution here that adds some nice value. I am landing on cautious support for this plan, but I'd like to ask that we incorporate into this ADR some longer-range thinking about bringing the search engine into our cloud in the future, without using Kibana. If both the philosophy at the start of this paragraph and the longer-term plan to in-house the solution are articulated in the ADR, I would be happy to support and accept this decision.
@kparikh9 Sorry for the delay. Do you also want to try Solr a bit? I deployed a small Solr node, and it seems pretty interesting: https://dev-k8s.treetracker.org/search/solr/#/mycoll/query?q=publisher_s:*am*&q.op=OR&indent=true
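The link above opens the Solr admin UI. As a hedged sketch, the same wildcard query could be issued against Solr's HTTP `select` handler roughly as below. The base URL `http://localhost:8983/solr` is a placeholder (Solr's default port), not the dev cluster address; the collection `mycoll`, the field `publisher_s`, and the `q`, `q.op`, and `indent` parameters are taken from the link.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, field, fragment):
    """Build a Solr select URL that matches `fragment` anywhere in `field`."""
    params = {
        "q": f"{field}:*{fragment}*",  # leading and trailing wildcard, as in the linked query
        "q.op": "OR",                  # default operator between query terms
        "indent": "true",              # pretty-print the response
    }
    # urlencode percent-escapes ':' and '*' so the query survives as one parameter.
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983/solr", "mycoll", "publisher_s", "am")
print(url)
```

Note that leading-wildcard queries like `*am*` can get slow on large indexes, so this form is more of a quick experiment than a recommended autocomplete design.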
I think Solr is more suitable for our case, IMO, because:
Our main goal here is full-text search over planter info, species, orgs, and other content (including searching across fields), plus autocompletion. Both Solr and ES can do the job, but Solr is a more dedicated search engine with advanced features (ES is more focused on log analysis, I think), as the creator of ES admits:
Here is another opinion:
I think these two have different focuses and use cases.
Because our goal is to index all Greenstand content, I don't think the scale of the data is very large, so I don't think we need the highly scalable, distributed solution that ES is good at, which comes at the cost of maintenance and complexity.
Solr is more open source than ES.
CC: @dadiorchen