Commit 7e1c36a

API change: merger_context is gone

Merger context was removed from the API. A source and a merger no longer
reallocate tuples: previously a source / a merger created a new tuple when the
stored tuple format differed from the acquired one. The idea was to ensure
that comparisons are fast. This, however, had its cost: a tuple had to be
re-created with another format. Now a source / a merger does not do that, but
a user should ensure that tuples have the needed offsets. Recommendations on
how to keep comparisons in a merge process fast were added.
Parent: e65141b
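In a nutshell, the change for API users looks as follows (a sketch based on
the README examples in this diff; `key_def_inst` and `sources` are assumed to
be set up as shown there):

```lua
-- Before this commit: a merger was created from a merger context
-- wrapping a key_def instance.
local ctx = merger.context.new(key_def_inst)
local merger_inst = merger.new(ctx, sources)

-- After this commit: the key_def instance is passed directly.
local merger_inst = merger.new(key_def_inst, sources)
```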

4 files changed: +74 -43

README.md (+52 -17)
@@ -38,10 +38,6 @@ if not index.unique then
     key_def_inst = key_def_inst:merge(key_def.new(space.index[0].parts))
 end
 
--- Create a merger context.
--- NB: It worth to cache it.
-local ctx = merger.context.new(key_def_inst)
-
 -- Prepare M sources.
 local sources = {}
 for _, conn in ipairs(connects) do
@@ -52,13 +48,13 @@ for _, conn in ipairs(connects) do
 end
 
 -- Merge.
-local merger_inst = merger.new(ctx, sources)
+local merger_inst = merger.new(key_def_inst, sources)
 local res = merger_inst:select()
 ```
 
 ## How to form key parts
 
-The merger expects that each input tuple stream is sorted in the order that
+The merger expects that each input tuple stream is sorted in the order that is
 acquired for a result (via key parts and the `reverse` flag). It performs a
 kind of the merge sort: chooses a source with a minimal / maximal tuple on each
 step, consumes a tuple from this source and repeats.
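To make the described behaviour concrete, one step of such a k-way merge could
look like the sketch below (illustrative only: the real merger is implemented
in C; the `peek()` / `next()` source methods are hypothetical stand-ins, while
`key_def_object:compare()` is the documented key_def API):

```lua
-- Choose the source with the minimal (or maximal, when `reverse` is
-- set) current tuple and consume one tuple from it.
local function merge_step(sources, key_def_inst, reverse)
    local best
    for _, source in ipairs(sources) do
        local tuple = source:peek()
        if tuple ~= nil then
            if best == nil then
                best = source
            else
                local cmp = key_def_inst:compare(tuple, best:peek())
                if (reverse and cmp > 0) or (not reverse and cmp < 0) then
                    best = source
                end
            end
        end
    end
    -- Returns nil when all sources are exhausted.
    return best and best:next() or nil
end
```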
@@ -202,10 +198,10 @@ limit and GT iterator (with a key extracted from a last fetched tuple).
 Note: such way to implement a cursor / a pagination will work smoothly only
 with unique indexes. See also #3898.
 
-More complex scenarious are possible: using futures (`is_async = true`
-parameters of net.box methods) to fetch a next chunk while merge a current one
-or, say, call a function with several return values (some of them need to be
-skipped manually in a `gen` function to let merger read tuples).
+More complex scenarios are possible: using futures (the `is_async = true`
+option of net.box methods) to fetch a next chunk while merging a current one
+or, say, calling a function with several return values (some of them need to
+be skipped manually in a `gen` function to let the merger read tuples).
 
 Note: When using `is_async = true` net.box option one can lean on the fact that
 net.box writes an answer w/o yield: a partial result cannot be observed.
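The future-based prefetching mentioned above can be sketched as a `gen`
function for a table source (a sketch: the storage function name `fetch` and
the `param` / `state` layout are assumptions; `conn:call()` with
`is_async = true` and `future:wait_result()` are the documented net.box API):

```lua
-- Fetch the next chunk in background while the merger consumes the
-- current one; to be passed to merger.new_table_source(gen, param, {}).
local function gen(param, state)
    -- Reuse the future started on the previous step, or issue the
    -- first request on the first call.
    local future = state.future
        or param.conn:call('fetch', {param.key}, {is_async = true})
    local chunk = future:wait_result()[1]
    if chunk == nil then
        return nil -- No more data.
    end
    -- Start fetching the next chunk before returning the current one.
    -- A real implementation would also advance the cursor key here.
    local next_future = param.conn:call('fetch', {param.key},
        {is_async = true})
    return {future = next_future}, chunk
end
```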
@@ -250,6 +246,9 @@ indexes) and use vshard API on a client.
 -- See chunked_example_fast/frontend.lua.
 ```
 
+In this example we also cache key_def instances to reuse them when processing
+results from the same space and index.
+
 ## Multiplexing requests
 
 Consider the case when a network latency between storage machines and frontend
Consider the case when a network latency between storage machines and frontend
@@ -261,7 +260,7 @@ one network request. We'll consider approach when a storage function returns
261260
many box.space.<...>:select(<...>) results instead of one.
262261

263262
One need to skip iproto_data header, two array headers and then run a merger N
264-
times on the same buffers (with the same or different contexts). No extra data
263+
times on the same buffers (with the same or different key_defs). No extra data
265264
copies, no tuples decoding into a Lua memory.
266265

267266
```lua
@@ -278,7 +277,7 @@ copies, no tuples decoding into a Lua memory.
 
 ## Cascading mergers
 
-The idea is simple: a merger instance itself is a merger source.
+The idea is simple: a merger instance itself is a merge source.
 
 The example below is synthetic to be simple. Real cases when cascading can be
 profitable likely involve additional layers of Tarantool instances between a
@@ -291,7 +290,7 @@ behaviour for a source and a merger looks as the good property of the API.
 <...requires...>
 
 local sources = <...100 sources...>
-local ctx = merger.context.new(key_def.new(<...>))
+local key_def_inst = key_def.new(<...>)
 
 -- Create 10 mergers with 10 sources in each.
 local middleware_mergers = {}
@@ -300,10 +299,46 @@ for i = 1, 10 do
     for j = 1, 10 do
         current_sources[j] = sources[(i - 1) * 10 + j]
     end
-    middleware_mergers[i] = merger.new(ctx, current_sources)
+    middleware_mergers[i] = merger.new(key_def_inst, current_sources)
 end
 
--- Note: Using different contexts will lead to extra copying of
--- tuples.
-local res = merger.new(ctx, middleware_mergers):select()
+local res = merger.new(key_def_inst, middleware_mergers):select()
 ```
+
+## When are comparisons fast?
+
+### In short
+
+If tuples are from a local space and a key_def for a merger is created using
+parts of an index from the space (see the 'How to form key parts' section
+above), then comparisons will be fast (and no extra tuple creations occur).
+
+If tuples are received from net.box, stored into a buffer and created with a
+buffer source, then everything is okay too.
+
+When tuples are created from Lua tables, comparisons will be fast too, but
+this case possibly means that extra work is performed to decode a tuple into
+a Lua table (say, in net.box) and then to encode it into a new tuple in a
+merge source.
+
+When tuples are created with `box.tuple.new()`, comparisons will likely be
+slow.
+
+### In detail
+
+First, some background information. Tuples can be created with different
+tuple formats. A format in particular defines which fields have precalculated
+offsets (these offsets are stored within a tuple). When there is a
+precalculated offset, reading the field is faster: it does not require
+decoding the whole msgpack data up to the field. When a tuple is obtained
+from a space, all indexed fields (all fields that are part of some index of
+this space) have offsets. When a tuple is created with `box.tuple.new(<...>)`,
+it has no offsets.
+
+Merge sources differ in how tuples are obtained. A buffer source always
+creates tuples itself. A tuple or a table source can pass existing tuples
+through or create tuples from Lua tables.
+
+When a merger acquires a tuple from a source, it passes a tuple format that
+can be used to create the tuple. So when a tuple is created by a source,
+field accesses will be fast and so comparisons will be fast. When a tuple is
+passed through a source, it may lack some offsets, and then comparisons can
+be slow. In this case it is the user's responsibility to provide tuples with
+the needed offsets to make the merge faster.
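The fast and slow cases from the added section can be illustrated as follows
(a sketch that assumes a space `s` whose primary index covers the compared
fields; `key_def_object:compare()` is the documented key_def API):

```lua
local key_def_lib = require('key_def')

-- Build a key_def from the parts of a local space's primary index.
local key_def_inst = key_def_lib.new(box.space.s.index[0].parts)

-- Fast case: a tuple obtained from the space carries precalculated
-- offsets for all indexed fields.
local stored_tuple = box.space.s:get(1)

-- Likely slow case: a tuple created via box.tuple.new() has no
-- offsets, so a comparison decodes msgpack data to reach key fields.
local fresh_tuple = box.tuple.new({2, 'two'})

-- Both comparisons give correct results; only the cost differs.
key_def_inst:compare(stored_tuple, fresh_tuple)
```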

chunked_example/frontend.lua (+1 -2)
@@ -39,13 +39,12 @@ local conns = {
 
 local key_parts = conns[1].space.s.index.pk.parts
 local key_def_inst = key_def.new(key_parts)
-local ctx = merger.context.new(key_def_inst)
 local sources = {}
 for i, conn in ipairs(conns) do
     local param = {conn = conns[i], key_def = key_def_inst}
     sources[i] = merger.new_table_source(fetch_chunk, param, {})
 end
-local merger_inst = merger.new(ctx, sources)
+local merger_inst = merger.new(key_def_inst, sources)
 local res = merger_inst:select()
 print(yaml.encode(res))
 os.exit()

chunked_example_fast/frontend.lua (+16 -19)
@@ -3,24 +3,24 @@
 local buffer = require('buffer')
 local msgpack = require('msgpack')
 local vshard = require('vshard')
-local key_def = require('key_def')
+local key_def_lib = require('key_def')
 local merger = require('merger')
 local json = require('json')
 local yaml = require('yaml')
 local vshard_cfg = require('vshard_cfg')
 
-local merger_context_cache = {}
+local key_def_cache = {}
 
 -- XXX: Implement some cache clean up strategy and a way to manual
 -- cache purge.
-local function get_merger_context(space_name, index_name)
-    local merger_context
+local function get_key_def(space_name, index_name)
+    local key_def
 
     -- Get from the cache if exists.
-    if merger_context_cache[space_name] ~= nil then
-        merger_context = merger_context_cache[space_name][index_name]
-        if merger_context ~= nil then
-            return merger_context
+    if key_def_cache[space_name] ~= nil then
+        key_def = key_def_cache[space_name][index_name]
+        if key_def ~= nil then
+            return key_def
         end
     end

@@ -30,21 +30,18 @@ local function get_key_def(space_name, index_name)
     local index = conn.space[space_name].index[index_name]
 
     -- Create a key def.
-    local key_def_inst = key_def.new(index.parts)
+    key_def = key_def_lib.new(index.parts)
     if not index.unique then
-        key_def_inst = key_def_inst:merge(key_def.new(primary_index.parts))
+        key_def = key_def:merge(key_def_lib.new(primary_index.parts))
     end
 
-    -- Create a merger context.
-    merger_context = merger.context.new(key_def_inst)
-
     -- Write to the cache.
-    if merger_context_cache[space_name] == nil then
-        merger_context_cache[space_name] = {}
+    if key_def_cache[space_name] == nil then
+        key_def_cache[space_name] = {}
     end
-    merger_context_cache[space_name][index_name] = merger_context
+    key_def_cache[space_name][index_name] = key_def
 
-    return merger_context
+    return key_def
 end
 
 local function decode_metainfo(buf)
@@ -101,7 +98,7 @@ end
 
 local function mr_call(space_name, index_name, key, opts)
     local opts = opts or {}
-    local merger_context = get_merger_context(space_name, index_name)
+    local key_def = get_key_def(space_name, index_name)
     local call_args = {space_name, index_name, key, opts}
 
     -- Request a first data chunk and create merger sources.
@@ -126,7 +123,7 @@ local function mr_call(space_name, index_name, key, opts)
         table.insert(merger_sources, source)
     end
 
-    local merger_inst = merger.new(merger_context, merger_sources)
+    local merger_inst = merger.new(key_def, merger_sources)
     return merger_inst:select()
 end

multiplexed_example/frontend.lua (+5 -5)
@@ -3,7 +3,7 @@
 local buffer = require('buffer')
 local msgpack = require('msgpack')
 local net_box = require('net.box')
-local key_def = require('key_def')
+local key_def_lib = require('key_def')
 local merger = require('merger')
 local yaml = require('yaml')
 
@@ -58,10 +58,10 @@ local conns = {
 }
 
 -- We lean on the fact that primary keys of all that spaces are
--- the same. Otherwise we would need to use different merger
--- context for each merge.
+-- the same. Otherwise we would need to use different key_defs for
+-- each merge.
 local key_parts = conns[1].space.a.index.pk.parts
-local ctx = merger.context.new(key_def.new(key_parts))
+local key_def = key_def_lib.new(key_parts)
 
 -- The idea modelled here is that we have requests for several
 -- spaces and acquire results in one net.box call.
@@ -76,7 +76,7 @@ local res = {}
 for _ = 1, #requests do
     -- Merge ith result from each storage. On the first step they
     -- are results from space 'a', one the second from 'b', etc.
-    local tuples = merger.new(ctx, sources):select()
+    local tuples = merger.new(key_def, sources):select()
     table.insert(res, tuples)
 end
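The header skipping described in the 'Multiplexing requests' section of the
README can be sketched as follows (assumptions: the response layout is as
described there, and the msgpack module provides `decode_map_header()` /
`decode_array_header()`; check that your Tarantool version has them):

```lua
local msgpack = require('msgpack')

-- Assume `buf` is a buffer.ibuf() filled by conn:call(<...>,
-- {buffer = buf}) and the storage function returned an array of
-- N select() results.
local function skip_headers(buf)
    -- Skip the iproto_data header: a one-entry map {[0x30] = ...}.
    local map_len
    map_len, buf.rpos = msgpack.decode_map_header(buf.rpos, buf:size())
    assert(map_len == 1)
    local _
    _, buf.rpos = msgpack.decode(buf.rpos, buf:size()) -- The map key.
    -- Skip the array of return values, then the array of N results.
    local n
    n, buf.rpos = msgpack.decode_array_header(buf.rpos, buf:size())
    n, buf.rpos = msgpack.decode_array_header(buf.rpos, buf:size())
    -- buf.rpos now points at the first result; each merger run with
    -- a buffer source consumes exactly one result from the buffer.
    return n
end
```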
