Implement a distributed evolutionary algorithm to evolve query answers #216

andre-senna · 2025-01-27T18:30:18Z

The idea is to provide an API in DAS to evolve answers to a query using an evolutionary algorithm.

There are a lot of evolutionary algorithms in the literature (see this wikipedia topic) largely used as optimization techniques for a number of different problems. The algorithm we propose here is also an optimization algorithm that can be used to evolve query answers that best match a given query.

Why evolving a query answer instead of just processing the query?

Pattern Matching queries may have millions or billions of answers. Sometimes we're interested in just a few particular answers that satisfy a given criteria but in order to find them we'd need to iterate through (potentially) all the query answers in order to find them.

With an optimization algorithm, we use the given criteria as fitness function in a way to guide the search through the space of all possible query answers with the goal of finding answers with high fitness value without the need to iterate through all possibilities.

How to use users' criteria to select query answers?

The user will pass a query and will provide a function to evaluate a given query answer for this query. But how we can use this fitness function to select query answers?

The more intuitive way of doing this is to process the query, evaluate all its results using the fitness functions and sort them before delivering the iterator to the user. So the user would iterate through query answers from the one with highest fitness value to the one with lowest. But this is bad because we'd need to visit all the query answers which is exactly what we need to avoid.

Hence, the idea is to sort query answers in advance somehow in order to visit only the most promising ones.

DAS already have means to do this: the Attention Broker. It uses atoms' importance value in a given context in order to sort the answers of a query before delivering them to the caller. However, Attention Broker uses solely atoms' importance as sorting criteria and atoms' importance is related to the frequency which each atom have been relevant to previous queries in the same context, it isn't related to any specific fitness measure.

So the idea here is to provide a way to use the "feedback" we implicitly get from user regarding the quality of each query answer (i.e. the passed fitness function) in order to stimulate the importance of atoms in such a way that better answers' components (i.e. atoms) receive importance boosts, making the better answers being more likely of being selected earlier.

The algorithm

The algorithm we propose here is inspired by BOA: the Bayesian Optimization Algorithm in the sense that it is an evolutionary algorithm which doesn't actually mix individuals from one generation to the next using genetic operators (e.g. crossover, mutation etc) but, instead, it samples the individuals of the population using probabilistic criteria at the beginning of each new generation.

We don't use anything like a Bayesian network, as in BOA. We just use the attention allocation performed by the Attention Broker (which uses Hebbian networks) and bias this allocation by stimuli spreading based on the quality of each query answer from one generation to the next. Here's a high level description of the algorithm.

Input:

Q: Pattern matching query
F: Fitness function mapping from QueryAnswer objects to numbers in [0, 1] (0: useless answer, 1: best possible answer)
N: Number of QueryAnswers to be returned

Output:

An iterator to the best N distinct QueryAnswers found by the algorithm

Determine evolution parameters
a. P - Population size
b. M - Number of individuals selected from population at each generation in order to update Attention Broker before the next generation is selected.
c. Stop criteria (number of generations, target fitness value, minimal evolution in last n cycles, etc)
Sample a population Population[] by executing Q and iterating P times in the result.
Evaluate F(Population[i]) for all individual Population[I].
Select M individuals from current population using their fitness value in any selection method (best M, roulette, tournament, etc)
Use selected individuals to update AttentionBroker.
Iterate steps 2., 3., 4. and 5. until Stop criteria is match. While Iterating, keep record of the best N QueryAnswers.
Return an iterator to the best N QueryAnswers recorded during the evolution process.
In the first generation, we select the first P QueryAnswers to compose our initial population. At this point, we rely on the current relative importance of the atoms stored in the AttentionBroker.

Distributed version of the algorithm.

We propose to implement a distributed version of the algorithm using S servers.

One of the servers (referred to as the leader) submit Q to the Query Agent, asking it to to generate a key that will be used by each of the S servers to iterate through different results of Q.
Each server starts a new generation by sampling its local population using QueryAnswer objects for Q. Because of the way the query were submitted, each server will get different QueryAnswer objects meaning that local population in each server has ZERO intersection with population in other servers.
Each server computes the fitness function for all individuals in the local population. As fitness values are computed, each server shares their individuals (and their respective fitness value) with the other servers.
When all individuals of the population have a fitness value, the leader, which knows all individuals from ll servers, select M individuals using any selection method.
Selected individuals are used to spread stimuli in AttentionBroker by counting all handles in their respective QueryAnswers and passing this handle count to Attention Broker in a STIMULI request. When doing this request, we also need to define that EVALUATION, IMPLICATION and EQUIVALENCE links are also supposed to be used as Hebbian links.
Steps 1. to 5. are repeated until stop criteria is match.

A state machine illustrating the behavior of each node in the algorithm is presented below.

Suggested implementation breakdown

I suggest implementing this card step-by-step in the following order (order is important because it reflects our team's priorities)

Implement non-distributed version of the algorithm
Implement the node control algorithm separately (not integrated with the algorithm above). Here, we can assume a simpler scenario where processing begins with a fixed list of nodes and these nodes never fail. Also, no new node joins after processing begins.
Integrate the algorithm in the distributed nodes using the distributed version of the algorithm described above.
Implement realistic cases in the distributed network. Processing starts with a single node (leader) which assumes it's supposed to do all the job. During the processing, other nodes join and are integrated in the network receiving their shares of computation. Once joined, nodes are reliable and never fail.
Implement fail-proof measurements for non-leader nodes. The distributed algorithm must consider the possibility of one or more nodes fail during the processing. Failed nodes may eventually re-join.
Implement fail-proof measurements for the case where the leader may also fail.

The text was updated successfully, but these errors were encountered:

andre-senna changed the title ~~Implement a basic sketch of a distributed evolutionary algorithm to evolve query answers~~ Implement a distributed evolutionary algorithm to evolve query answers Jan 28, 2025

andre-senna assigned marcocapozzoli Jan 28, 2025

andre-senna added this to the Distributed AI Algorithm milestone Jan 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a distributed evolutionary algorithm to evolve query answers #216

Implement a distributed evolutionary algorithm to evolve query answers #216

andre-senna commented Jan 27, 2025 •

edited

Loading

Implement a distributed evolutionary algorithm to evolve query answers #216

Implement a distributed evolutionary algorithm to evolve query answers #216

Comments

andre-senna commented Jan 27, 2025 • edited Loading

Why evolving a query answer instead of just processing the query?

How to use users' criteria to select query answers?

The algorithm

Distributed version of the algorithm.

Suggested implementation breakdown

andre-senna commented Jan 27, 2025 •

edited

Loading