Skip to content

Commit

Permalink
feat: read_span_flatbuffer: support cleaning up spare examples correctly
Browse files Browse the repository at this point in the history
Consumers of VW as a library can provide their own event pools, etc. Previous parsers were always able to predict when an even would be needed ahead of time, so would only allocate when necessary. This was done by relying on a single incoming event preallocation to let the external host deallocate in the case of nothing to be parsed.

This does not work for the FB parser due to how it handles re-entrancy, and we do not want to spend the time re-architecting it to avoid this. The fix, in this case, is to expand the API to include a callback to return spare events back to the host's event pool.
  • Loading branch information
lokitoth committed Feb 14, 2024
1 parent 7c2963e commit 87ed9a5
Show file tree
Hide file tree
Showing 2 changed files with 25 additions and 5 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,17 @@ namespace VW

class api_status;

using example_sink_f = std::function<void(VW::multi_ex&& spare_examples)>;

namespace parsers
{
namespace flatbuffer
{
int flatbuffer_to_examples(VW::workspace* all, io_buf& buf, VW::multi_ex& examples);


bool read_span_flatbuffer(
VW::workspace* all, const uint8_t* span, size_t length, example_factory_t example_factory, VW::multi_ex& examples);
VW::workspace* all, const uint8_t* span, size_t length, example_factory_t example_factory, VW::multi_ex& examples, example_sink_f example_sink = nullptr);

class parser
{
Expand Down
24 changes: 20 additions & 4 deletions vowpalwabbit/fb_parser/src/parse_example_flatbuffer.cc
Original file line number Diff line number Diff line change
Expand Up @@ -43,9 +43,12 @@ int flatbuffer_to_examples(VW::workspace* all, io_buf& buf, VW::multi_ex& exampl
return static_cast<int>(status.get_error_code() == VW::experimental::error_code::success);
}

bool read_span_flatbuffer(
VW::workspace* all, const uint8_t* span, size_t length, example_factory_t example_factory, VW::multi_ex& examples)
bool read_span_flatbuffer(VW::workspace* all, const uint8_t* span, size_t length, example_factory_t example_factory,
VW::multi_ex& examples, example_sink_f example_sink)
{
int a = 0;
a++;

// we expect context to contain a size_prefixed flatbuffer (technically a binary string)
// which means:
//
Expand Down Expand Up @@ -84,7 +87,18 @@ bool read_span_flatbuffer(
}

VW::multi_ex temp_ex;
temp_ex.push_back(&example_factory());

// There is a bit of unhappiness with the interface of the read_XYZ_<format>() functions, because they often
// expect the input multi_ex to have a single "empty" example there. This contributes, in part, to the large
// proliferation of entry points into the JSON parser(s). We want to avoid exposing that insofar as possible,
// so we will check whether we already received a perfectly good example and use that, or create a new one if
// needed.
if (examples.size() > 0)
{
temp_ex.push_back(examples[0]);
examples.pop_back();

Check warning on line 99 in vowpalwabbit/fb_parser/src/parse_example_flatbuffer.cc

View check run for this annotation

Codecov / codecov/patch

vowpalwabbit/fb_parser/src/parse_example_flatbuffer.cc#L98-L99

Added lines #L98 - L99 were not covered by tests
}
else { temp_ex.push_back(&example_factory()); }

bool has_more = true;
VW::experimental::api_status status;
Expand Down Expand Up @@ -113,7 +127,9 @@ bool read_span_flatbuffer(
}
} while (has_more);

VW::finish_example(*all, temp_ex);
if (example_sink == nullptr) { VW::finish_example(*all, temp_ex); }
else { example_sink(std::move(temp_ex)); }

Check warning on line 131 in vowpalwabbit/fb_parser/src/parse_example_flatbuffer.cc

View check run for this annotation

Codecov / codecov/patch

vowpalwabbit/fb_parser/src/parse_example_flatbuffer.cc#L131

Added line #L131 was not covered by tests

return true;
}

Expand Down

0 comments on commit 87ed9a5

Please sign in to comment.