Commit 7e1c36a

API change: merger_context is gone

Merger context was removed from the API. A source and a merger no longer
reallocate tuples: previously a source / a merger created a new tuple when the
stored tuple format differed from the acquired one. The idea was to ensure
that comparisons are fast. This, however, had its cost: a tuple had to be
re-created with another format. Now a source / a merger does not do that, but
a user should ensure that tuples have the needed offsets. Recommendations on
how to keep comparisons in a merge process fast were added.
Parent: e65141b
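In a nutshell, the change for API users looks as follows (a sketch based on
the README examples in this diff; `key_def_inst` and `sources` are assumed to
be set up as shown there):

```lua
-- Before this commit: a merger was created from a merger context
-- wrapping a key_def instance.
local ctx = merger.context.new(key_def_inst)
local merger_inst = merger.new(ctx, sources)

-- After this commit: the key_def instance is passed directly.
local merger_inst = merger.new(key_def_inst, sources)
```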

4 files changed: +74 -43

README.md (+52 -17)
@@ -38,10 +38,6 @@ if not index.unique then
     key_def_inst = key_def_inst:merge(key_def.new(space.index[0].parts))
 end
 
--- Create a merger context.
--- NB: It worth to cache it.
-local ctx = merger.context.new(key_def_inst)
-
 -- Prepare M sources.
 local sources = {}
 for _, conn in ipairs(connects) do
@@ -52,13 +48,13 @@ for _, conn in ipairs(connects) do
 end
 
 -- Merge.
-local merger_inst = merger.new(ctx, sources)
+local merger_inst = merger.new(key_def_inst, sources)
 local res = merger_inst:select()
 ```
 
 ## How to form key parts
 
-The merger expects that each input tuple stream is sorted in the order that
+The merger expects that each input tuple stream is sorted in the order that is
 acquired for a result (via key parts and the `reverse` flag). It performs a
 kind of the merge sort: chooses a source with a minimal / maximal tuple on each
 step, consumes a tuple from this source and repeats.
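To make the described behaviour concrete, one step of such a k-way merge could
look like the sketch below (illustrative only: the real merger is implemented
in C; the `peek()` / `next()` source methods are hypothetical stand-ins, while
`key_def_object:compare()` is the documented key_def API):

```lua
-- Choose the source with the minimal (or maximal, when `reverse` is
-- set) current tuple and consume one tuple from it.
local function merge_step(sources, key_def_inst, reverse)
    local best
    for _, source in ipairs(sources) do
        local tuple = source:peek()
        if tuple ~= nil then
            if best == nil then
                best = source
            else
                local cmp = key_def_inst:compare(tuple, best:peek())
                if (reverse and cmp > 0) or (not reverse and cmp < 0) then
                    best = source
                end
            end
        end
    end
    -- Returns nil when all sources are exhausted.
    return best and best:next() or nil
end
```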
@@ -202,10 +198,10 @@ limit and GT iterator (with a key extracted from a last fetched tuple).
 Note: such way to implement a cursor / a pagination will work smoothly only
 with unique indexes. See also #3898.
 
-More complex scenarious are possible: using futures (`is_async = true`
-parameters of net.box methods) to fetch a next chunk while merge a current one
-or, say, call a function with several return values (some of them need to be
-skipped manually in a `gen` function to let merger read tuples).
+More complex scenarios are possible: using futures (the `is_async = true`
+option of net.box methods) to fetch a next chunk while merging a current one
+or, say, calling a function with several return values (some of them need to
+be skipped manually in a `gen` function to let the merger read tuples).
 
 Note: When using `is_async = true` net.box option one can lean on the fact that
 net.box writes an answer w/o yield: a partial result cannot be observed.
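The future-based prefetching mentioned above can be sketched as a `gen`
function for a table source (a sketch: the storage function name `fetch` and
the `param` / `state` layout are assumptions; `conn:call()` with
`is_async = true` and `future:wait_result()` are the documented net.box API):

```lua
-- Fetch the next chunk in background while the merger consumes the
-- current one; to be passed to merger.new_table_source(gen, param, {}).
local function gen(param, state)
    -- Reuse the future started on the previous step, or issue the
    -- first request on the first call.
    local future = state.future
        or param.conn:call('fetch', {param.key}, {is_async = true})
    local chunk = future:wait_result()[1]
    if chunk == nil then
        return nil -- No more data.
    end
    -- Start fetching the next chunk before returning the current one.
    -- A real implementation would also advance the cursor key here.
    local next_future = param.conn:call('fetch', {param.key},
        {is_async = true})
    return {future = next_future}, chunk
end
```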
@@ -250,6 +246,9 @@ indexes) and use vshard API on a client.
 -- See chunked_example_fast/frontend.lua.
 ```
 
+In this example we also cache key_def instances to reuse them when processing
+results from the same space and index.
+
 ## Multiplexing requests
 
 Consider the case when a network latency between storage machines and frontend
Consider the case when a network latency between storage machines and frontend
@@ -261,7 +260,7 @@ one network request. We'll consider approach when a storage function returns
261260
many box.space.<...>:select(<...>) results instead of one.
262261

263262
One need to skip iproto_data header, two array headers and then run a merger N
264-
times on the same buffers (with the same or different contexts). No extra data
263+
times on the same buffers (with the same or different key_defs). No extra data
265264
copies, no tuples decoding into a Lua memory.
266265

267266
```lua
@@ -278,7 +277,7 @@ copies, no tuples decoding into a Lua memory.
 
 ## Cascading mergers
 
-The idea is simple: a merger instance itself is a merger source.
+The idea is simple: a merger instance itself is a merge source.
 
 The example below is synthetic to be simple. Real cases when cascading can be
 profitable likely involve additional layers of Tarantool instances between a
@@ -291,7 +290,7 @@ behaviour for a source and a merger looks as the good property of the API.
 <...requires...>
 
 local sources = <...100 sources...>
-local ctx = merger.context.new(key_def.new(<...>))
+local key_def_inst = key_def.new(<...>)
 
 -- Create 10 mergers with 10 sources in each.
 local middleware_mergers = {}
@@ -300,10 +299,46 @@ for i = 1, 10 do
     for j = 1, 10 do
         current_sources[j] = sources[(i - 1) * 10 + j]
     end
-    middleware_mergers[i] = merger.new(ctx, current_sources)
+    middleware_mergers[i] = merger.new(key_def_inst, current_sources)
 end
 
--- Note: Using different contexts will lead to extra copying of
--- tuples.
-local res = merger.new(ctx, middleware_mergers):select()
+local res = merger.new(key_def_inst, middleware_mergers):select()
 ```
+
+## When are comparisons fast?
+
+### In short
+
+If tuples are from a local space and a key_def for a merger is created using
+parts of an index from the space (see the 'How to form key parts' section
+above), then comparisons will be fast (and no extra tuple creations occur).
+
+If tuples are received from net.box, stored into a buffer and created with a
+buffer source, then everything is okay too.
+
+When tuples are created from Lua tables, comparisons will be fast too, but
+this case possibly means that extra work is performed to decode a tuple into
+a Lua table (say, in net.box) and then to encode it into a new tuple in a
+merge source.
+
+When tuples are created with `box.tuple.new()`, comparisons will likely be
+slow.
+
+### In detail
+
+First, some background information. Tuples can be created with different
+tuple formats. A format in particular defines which fields have precalculated
+offsets (these offsets are stored within a tuple). When there is a
+precalculated offset, reading the field is faster: it does not require
+decoding the whole msgpack data up to the field. When a tuple is obtained
+from a space, all indexed fields (all fields that are part of some index of
+this space) have offsets. When a tuple is created with `box.tuple.new(<...>)`,
+it has no offsets.
+
+Merge sources differ in how tuples are obtained. A buffer source always
+creates tuples itself. A tuple or a table source can pass existing tuples
+through or create tuples from Lua tables.
+
+When a merger acquires a tuple from a source, it passes a tuple format that
+can be used to create the tuple. So when a tuple is created by a source,
+field accesses will be fast and so comparisons will be fast. When a tuple is
+passed through a source, it may lack some offsets, and then comparisons can
+be slow. In this case it is the user's responsibility to provide tuples with
+the needed offsets to make the merge faster.
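The fast and slow cases from the added section can be illustrated as follows
(a sketch that assumes a space `s` whose primary index covers the compared
fields; `key_def_object:compare()` is the documented key_def API):

```lua
local key_def_lib = require('key_def')

-- Build a key_def from the parts of a local space's primary index.
local key_def_inst = key_def_lib.new(box.space.s.index[0].parts)

-- Fast case: a tuple obtained from the space carries precalculated
-- offsets for all indexed fields.
local stored_tuple = box.space.s:get(1)

-- Likely slow case: a tuple created via box.tuple.new() has no
-- offsets, so a comparison decodes msgpack data to reach key fields.
local fresh_tuple = box.tuple.new({2, 'two'})

-- Both comparisons give correct results; only the cost differs.
key_def_inst:compare(stored_tuple, fresh_tuple)
```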

chunked_example/frontend.lua (+1 -2)
@@ -39,13 +39,12 @@ local conns = {
 
 local key_parts = conns[1].space.s.index.pk.parts
 local key_def_inst = key_def.new(key_parts)
-local ctx = merger.context.new(key_def_inst)
 local sources = {}
 for i, conn in ipairs(conns) do
     local param = {conn = conns[i], key_def = key_def_inst}
     sources[i] = merger.new_table_source(fetch_chunk, param, {})
 end
-local merger_inst = merger.new(ctx, sources)
+local merger_inst = merger.new(key_def_inst, sources)
 local res = merger_inst:select()
 print(yaml.encode(res))
 os.exit()

chunked_example_fast/frontend.lua (+16 -19)
@@ -3,24 +3,24 @@
 local buffer = require('buffer')
 local msgpack = require('msgpack')
 local vshard = require('vshard')
-local key_def = require('key_def')
+local key_def_lib = require('key_def')
 local merger = require('merger')
 local json = require('json')
 local yaml = require('yaml')
 local vshard_cfg = require('vshard_cfg')
 
-local merger_context_cache = {}
+local key_def_cache = {}
 
 -- XXX: Implement some cache clean up strategy and a way to manual
 -- cache purge.
-local function get_merger_context(space_name, index_name)
-    local merger_context
+local function get_key_def(space_name, index_name)
+    local key_def
 
     -- Get from the cache if exists.
-    if merger_context_cache[space_name] ~= nil then
-        merger_context = merger_context_cache[space_name][index_name]
-        if merger_context ~= nil then
-            return merger_context
+    if key_def_cache[space_name] ~= nil then
+        key_def = key_def_cache[space_name][index_name]
+        if key_def ~= nil then
+            return key_def
         end
     end

@@ -30,21 +30,18 @@ local function get_key_def(space_name, index_name)
     local index = conn.space[space_name].index[index_name]
 
     -- Create a key def.
-    local key_def_inst = key_def.new(index.parts)
+    key_def = key_def_lib.new(index.parts)
     if not index.unique then
-        key_def_inst = key_def_inst:merge(key_def.new(primary_index.parts))
+        key_def = key_def:merge(key_def_lib.new(primary_index.parts))
     end
 
-    -- Create a merger context.
-    merger_context = merger.context.new(key_def_inst)
-
     -- Write to the cache.
-    if merger_context_cache[space_name] == nil then
-        merger_context_cache[space_name] = {}
+    if key_def_cache[space_name] == nil then
+        key_def_cache[space_name] = {}
     end
-    merger_context_cache[space_name][index_name] = merger_context
+    key_def_cache[space_name][index_name] = key_def
 
-    return merger_context
+    return key_def
 end
 
 local function decode_metainfo(buf)
@@ -101,7 +98,7 @@ end
 
 local function mr_call(space_name, index_name, key, opts)
     local opts = opts or {}
-    local merger_context = get_merger_context(space_name, index_name)
+    local key_def = get_key_def(space_name, index_name)
     local call_args = {space_name, index_name, key, opts}
 
     -- Request a first data chunk and create merger sources.
@@ -126,7 +123,7 @@ local function mr_call(space_name, index_name, key, opts)
         table.insert(merger_sources, source)
     end
 
-    local merger_inst = merger.new(merger_context, merger_sources)
+    local merger_inst = merger.new(key_def, merger_sources)
     return merger_inst:select()
 end

multiplexed_example/frontend.lua (+5 -5)
@@ -3,7 +3,7 @@
 local buffer = require('buffer')
 local msgpack = require('msgpack')
 local net_box = require('net.box')
-local key_def = require('key_def')
+local key_def_lib = require('key_def')
 local merger = require('merger')
 local yaml = require('yaml')
 
@@ -58,10 +58,10 @@ local conns = {
 }
 
 -- We lean on the fact that primary keys of all that spaces are
--- the same. Otherwise we would need to use different merger
--- context for each merge.
+-- the same. Otherwise we would need to use different key_defs for
+-- each merge.
 local key_parts = conns[1].space.a.index.pk.parts
-local ctx = merger.context.new(key_def.new(key_parts))
+local key_def = key_def_lib.new(key_parts)
 
 -- The idea modelled here is that we have requests for several
 -- spaces and acquire results in one net.box call.
@@ -76,7 +76,7 @@ local res = {}
 for _ = 1, #requests do
     -- Merge ith result from each storage. On the first step they
     -- are results from space 'a', one the second from 'b', etc.
-    local tuples = merger.new(ctx, sources):select()
+    local tuples = merger.new(key_def, sources):select()
     table.insert(res, tuples)
 end
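The header skipping described in the 'Multiplexing requests' section of the
README can be sketched as follows (assumptions: the response layout is as
described there, and the msgpack module provides `decode_map_header()` /
`decode_array_header()`; check that your Tarantool version has them):

```lua
local msgpack = require('msgpack')

-- Assume `buf` is a buffer.ibuf() filled by conn:call(<...>,
-- {buffer = buf}) and the storage function returned an array of
-- N select() results.
local function skip_headers(buf)
    -- Skip the iproto_data header: a one-entry map {[0x30] = ...}.
    local map_len
    map_len, buf.rpos = msgpack.decode_map_header(buf.rpos, buf:size())
    assert(map_len == 1)
    local _
    _, buf.rpos = msgpack.decode(buf.rpos, buf:size()) -- The map key.
    -- Skip the array of return values, then the array of N results.
    local n
    n, buf.rpos = msgpack.decode_array_header(buf.rpos, buf:size())
    n, buf.rpos = msgpack.decode_array_header(buf.rpos, buf:size())
    -- buf.rpos now points at the first result; each merger run with
    -- a buffer source consumes exactly one result from the buffer.
    return n
end
```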
