First draft of 0004-greenstand-search-engine by kparikh9 · Pull Request #7 · Greenstand/treetracker-decisions

kparikh9 · 2022-05-19T00:36:23Z

mckornfield · 2022-05-20T23:01:25Z

search_engine.md

+
+* ElasticSearch - can integrate well with other products of the Elastic Stack like Kibana, Logstash. Easiest to experiment with, since there are free trials available for Elastic Cloud (managed ElasticSearch deployment)
+
+## Considered Options


Did you consider Sphinx? https://stackshare.io/stackups/lucene-vs-sphinx It's great for docs search, as far as I know.

Or maybe something from this list? https://www.educba.com/elasticsearch-alternatives/

Thanks for the links, @mckornfield! I'll check these out and see if any fit better than ES

mckornfield · 2022-05-20T23:02:14Z

search_engine.md

+
+* Steep learning curve?
+* Requires more experimentation on what architecture is the best for Greenstand's use case (i.e. search over multiple indexes vs. one index)
+* Heavy memory usage (requires 4.0 GB RAM just for ElasticSearch, probably more for Kibana and Logstash) - can be expensive since it requires larger compute servers and this would need to remain on at all times.


Definitely costs a lot as far as resources. Also there's no good auth support in the free versions of ELK

For ELK, I believe we can set up service accounts to request and use tokens for authorization when passing requests to the Elastic cluster: https://www.elastic.co/guide/en/elasticsearch/reference/current/token-authentication-services.html. This doesn't seem to be limited to Elastic Cloud (which is just a managed deployment of the ELK stack).
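
As a rough illustration of the token approach described in this reply, the sketch below attaches a bearer token to a request for an Elasticsearch cluster. The host `http://localhost:9200` and the token value are placeholders, not real Greenstand settings, and the request is only built, never sent.

```python
import urllib.request

# Hypothetical values for illustration only; not real Greenstand settings.
ES_HOST = "http://localhost:9200"
SERVICE_TOKEN = "example-service-token"


def build_authorized_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a bearer token."""
    return urllib.request.Request(
        url=f"{ES_HOST}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_authorized_request("/_cluster/health", SERVICE_TOKEN)
print(request.full_url)
print(request.get_header("Authorization"))
```

How the token is issued and rotated is left out here; that part depends on the deployment and on which Elastic security features are enabled.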

mckornfield · 2022-05-20T23:02:41Z

search_engine.md

+
+## Considered Options
+
+* ElasticSearch


Did you spike these with a sample dataset?

Yes, I took about 20 rows from the public.planters and public.trees tables and all the rows from the public.organizations table in the treetracker database. I tested autocomplete/search hinting queries on three separate indexes (1 for each table) and on one single index that contained all three types of data rows (planters, trees, organizations).
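
To make the two layouts from the spike concrete, here is a minimal sketch of how the same autocomplete-style query could target three separate indexes or one combined index. The index names, field names, and the choice of a `multi_match` query with `phrase_prefix` are assumptions for illustration; the queries actually used in the spike are not shown in this thread.

```python
import json

# Hypothetical index and field names; the real spike may have used others.
SEPARATE_INDEXES = ["planters", "trees", "organizations"]
COMBINED_INDEX = "greenstand_search"
SEARCH_FIELDS = ["name", "first_name", "last_name"]


def build_autocomplete_request(prefix, indexes):
    """Return the search path and JSON body for a prefix-style query."""
    # Elasticsearch accepts a comma-separated list of index names in the path.
    path = "/" + ",".join(indexes) + "/_search"
    body = {
        "size": 5,
        "query": {
            "multi_match": {
                "query": prefix,
                "type": "phrase_prefix",
                "fields": SEARCH_FIELDS,
            }
        },
    }
    return path, body


multi_path, multi_body = build_autocomplete_request("gre", SEPARATE_INDEXES)
single_path, single_body = build_autocomplete_request("gre", [COMBINED_INDEX])
print(multi_path)
print(single_path)
print(json.dumps(multi_body, indent=2))
```

In this sketch the query body is identical in both cases and only the request path changes, so the choice between layouts mostly affects mappings, relevance tuning, and how results of different types are told apart.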

ZavenArra · 2022-05-20T21:59:55Z

search_engine.md

+
+## Decision Drivers
+
+* ElasticSearch - can integrate well with other products of the Elastic Stack like Kibana, Logstash. Easiest to experiment with, since there are free trials available for Elastic Cloud (managed ElasticSearch deployment)


So we just got rid of our ELK stack, which we were using for consolidated logging of microservices. It was very difficult for the current cloud team to manage and to keep deployed in our cluster. I presume we would not need the whole ELK stack to achieve what you are looking to do here? Kibana really stressed our cloud resources. However, maybe there is a more stripped-down deployment option that would meet your use case.

ZavenArra · 2022-05-20T22:00:36Z

search_engine.md

+
+* ElasticSearch
+* Apache Solr
+* Apache Lucene


Can you say more about why the Apache projects were not chosen? I don't have experience with either, but I do know that CKAN (our chosen data portal) uses Solr.

ZavenArra · 2022-06-07T22:27:25Z

@kparikh9 We generally seek to pursue build before buy and self-management of our application platform; however, it seems like you have a quick solution here that adds some nice value. I am landing on cautious support for this plan, but I'd like to ask that we incorporate into this ADR some longer-range thinking about bringing the search engine into our cloud, without Kibana, in the future. If the philosophy at the start of this paragraph and the longer-term plan to in-house the solution are both articulated in the ADR, I would be happy to support and accept this decision.

dadiorchen · 2022-06-30T09:04:58Z

@kparikh9 sorry for the delay. Do you also want to try Solr a bit? I deployed a small Solr node, and it seems pretty interesting: https://dev-k8s.treetracker.org/search/solr/#/mycoll/query?q=publisher_s:*am*&q.op=OR&indent=true
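
For anyone who wants to run the same query outside the admin UI, the sketch below builds an equivalent request URL. The query parameters come from the link above; the `/select` handler path is Solr's usual search endpoint, but whether it is exposed at this base URL on the dev node is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL and collection name are taken from the admin link above; the
# /select path is an assumption about how this node is exposed.
SOLR_BASE = "https://dev-k8s.treetracker.org/search/solr"
COLLECTION = "mycoll"


def build_select_url(query: str, operator: str = "OR") -> str:
    """Build a Solr select URL with the same parameters as the admin link."""
    params = {"q": query, "q.op": operator, "indent": "true"}
    # urlencode percent-encodes characters such as ':' and '*' in the query.
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


url = build_select_url("publisher_s:*am*")
print(url)
# Decode the query string again to confirm the parameters round-trip.
print(parse_qs(urlsplit(url).query)["q"][0])
```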

dadiorchen · 2022-06-30T09:15:42Z

I think Solr is more suitable for our case, IMO, because

Our goal

Our main goal here is full-text search: searching planter info, species, orgs, and other content (and being able to search across fields), plus autocompletion. Both Solr and ES can do the job, but Solr is a more dedicated search engine with advanced features (ES is more focused on log analysis, I think), as the creator of ES admits:

Solr is also a solution for exposing an indexing/search server over HTTP, but I would argue that ElasticSearch provides a much superior distributed model and ease of use (though currently lacking on some of the search features, but not for long, and in any case, the plan is to get all Compass features into ElasticSearch)

(source: https://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage)

Here is another opinion:

Solr has more advantages when it comes to the static data, because of its caches and the ability to use an uninverted reader for faceting and sorting – for example, e-commerce. On the other hand, Elasticsearch is better suited – and much more frequently used – for timeseries data use cases, like log analysis use cases.

I think these two have different focuses and use cases.

Our scale

Because our goal is to index all Greenstand content, I don't think the scale of the data is huge, so I don't think we need the highly scalable, distributed solution that ES is good at, which comes at the cost of maintenance and complexity.

Open source

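Solr is more open source than ES: Solr is an Apache Software Foundation project under the Apache 2.0 license, whereas Elasticsearch releases since 7.11 are no longer published under Apache 2.0.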

docs: added proposal for search engine

8280847

mckornfield reviewed May 20, 2022

ZavenArra reviewed Jun 7, 2022

kparikh9 wants to merge 1 commit into Greenstand:main from kparikh9:0004-greenstand-search-engine

kparikh9 · May 24, 2022 (edited)

4 participants

