-
Notifications
You must be signed in to change notification settings - Fork 37
Fix transaction on_close and Java and Python block on close() #792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WIP Finally fixed on_close callback - threading issue Convert close to promise and resolve it in Python and Java Add Java onclose callback test Cleanup test Add mutex
type: string | ||
steps: | ||
- run: | | ||
brew install [email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes mac build issues
- deploy-snapshot-mac-x86_64: | ||
filters: | ||
branches: | ||
only: [development, master, "3.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused
/// Closes the transaction and frees the native rust object. | ||
#[no_mangle] | ||
pub extern "C" fn transaction_close(txn: *mut Transaction) { | ||
pub extern "C" fn transaction_submit_close(txn: *mut Transaction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New paradigm i chose:
close()
returns a promise and does not take ownership.
We also have a submit_close()
, which a fire-and-forget equivalent that is used by drop
since we can't block or resolve promises there.
Note: we still use force_close
on driver for server connections. Didn't change it there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitrii-ubskii @farost important change!
* under the License. | ||
*/ | ||
|
||
%module(threads=1) typedb_driver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key point to make the python on_close callbacks work - segfaults are caused by multiple threads interacting. I think this makes SWIG interact with Python's GIL correctly
#include <iostream> | ||
#include <mutex> | ||
#include <unordered_map> | ||
static std::unordered_map<size_t, TransactionCallbackDirector*> transactionOnCloseCallbacks {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This had a race condition if the driver calls a callback at the same time as a user adding one to a new transaction.
ThreadSafeTransactionCallbacks(const ThreadSafeTransactionCallbacks&) = delete; | ||
ThreadSafeTransactionCallbacks& operator=(const ThreadSafeTransactionCallbacks&) = delete; | ||
|
||
// --- Core Operations --- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this was AI generated
%inline %{ | ||
#include <atomic> | ||
void transaction_on_close_register(const Transaction* transaction, TransactionCallbackDirector* handler) { | ||
VoidPromise* transaction_on_close_register(const Transaction* transaction, TransactionCallbackDirector* handler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on_close
callback registration also returns a promise, since we're not guaranteed that the on_close callback is registered when this returns.
|
||
void Transaction::close() { | ||
if (transactionNative != nullptr) _native::transaction_close(transactionNative.release()); | ||
void Transaction::submitClose() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably irrelevant to do this in C++ tbh
TransactionOnClose callback = new TransactionOnClose(function); | ||
callbacks.add(callback); | ||
transaction_on_close(nativeObject, callback.released()); | ||
transaction_on_close(nativeObject, callback.released()).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java blocks and resolves the promises for both on_close() and for close() for ease of use/so users don't forget to do it. I think in Rust people are more familiar with awaiting futures when they need to & the tooling around it is better
) | ||
|
||
typedb_java_test( | ||
name = "test-driver", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new place to test driver impl as an integration test
) | ||
|
||
py_test( | ||
name = "test_driver", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python integration test location
def on_close(self, function: callable): | ||
transaction_on_close(self.native_object, _Transaction.TransactionOnClose(function).__disown__()) | ||
callback = _Transaction.TransactionOnClose(function) | ||
void_promise_resolve(transaction_on_close(self.native_object, callback.__disown__())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python also actively resolves on_close and close() for the user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitrii-ubskii @farost important to know in the new settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Java's the same)
is_open: Arc<AtomicCell<bool>>, | ||
error: Arc<RwLock<Option<Error>>>, | ||
on_close_register_sink: UnboundedSender<Box<dyn FnOnce(Option<Error>) + Send + Sync>>, | ||
on_close_register_sink: UnboundedSender<(Box<dyn FnOnce(Option<Error>) + Send + Sync>, UnboundedSender<()>)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awaitable notifier to know when the on_close is guaranteed to be registered
impl Drop for TransactionTransmitter { | ||
fn drop(&mut self) { | ||
self.force_close(); | ||
self.submit_close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage of new submit_close fire and forget drop
|
||
pub(in crate::connection) fn force_close(&self) { | ||
#[cfg(not(feature = "sync"))] | ||
pub(in crate::connection) fn close(&self) -> impl Promise<'_, Result<()>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close() now returns a promise. This is a bit ugly but probably ok?
Whole thing inside needs to be in the future so that it either all runs when awaited or not at all
let close_notifier_callback = Box::new(move |error| { | ||
closed_sink.send(()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda a hack but honestly might be ok
pub(in crate::connection) fn on_close( | ||
&self, | ||
callback: impl FnOnce(Option<Error>) + Send + Sync + 'static, | ||
) -> impl Promise<'_, Result<()>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on_close is also implemented as a promise now
move || { | ||
Self::dispatch_loop(queue_source, request_sink, collector, on_close_callback_source, shutdown_signal) | ||
} | ||
move || Self::sync_dispatch_loop(queue_source, request_sink, collector, shutdown_signal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we just have to get the on_close_callback into the Collector in some way, we can choose whether we lsiten to the channel in the async listen loop or the sync dispatch loop. To minimize waits, we're going to let Tokio manage it in the Async loop.
driver_options.tls_config().clone().expect("TLS config object must be set when TLS is enabled"); | ||
builder = builder.tls_config(tls_config)?; | ||
} | ||
builder = builder.http2_keep_alive_interval(Duration::from_secs(10)).keep_alive_while_idle(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is a bit of mystery:
without it (this was AI suggested), it appears that server responses (in particular, the transaction open response) never gets delivered into our code.
Ie. we see:
Driver: send open transaction request
Server: receive open txn request OpenTransaction.Req
Server: open txn, response with OpenTransaction.Res
(confirmed the above with wireshark).
The client side actually receives _something (from stub.rs):
this
.grpc
.transaction(UnboundedReceiverStream::new(receiver))
when awaited, this actually returns a stream successfully -- however, the OpenTransaction.Res
message doesn't arrive until "something else" happens, such as the stream closing because of a transaction timeout...
It's very strange but since this solves it and it's sucked up a ton of time, I'm going to leave this in.
request = request_source.recv() => request, | ||
_ = shutdown_signal.recv() => None, | ||
} { | ||
trace!("RPC dispatcher loop received request {:?}", request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to see everything happening, on TRACE we'll basically see every message in and out
self.transaction_transmitter.on_close(callback) | ||
} | ||
|
||
pub(crate) fn close(&self) -> impl Promise<'_, Result<()>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's a question: why is it ok for close() to be a true promise, while the variants below for commit and rollback are resolved and re-emitted as promises? We should probably pick 1 approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wait for answers from the server in the variants below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And probably convert them
/// | ||
/// The logging is initialized only once using a static flag to prevent | ||
/// multiple initializations in applications that create multiple drivers. | ||
pub fn init_logging() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New feature: i added some basic but also easily extensible logging throughout. We can configure it at runtime with environment variables!
})) | ||
.await; | ||
|
||
transaction.close().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verify: if we await the close() future, we have guaranteed all thea llocated callbacks are executed.
Usage and product changes
We notice that calling
transaction.close()
does not wait until the server has freed up resource. This makes quick sequences, such as tests where transactions open and are followed by database deletes, unreliable. Further investigation that workarounds using the existingon_close
callbacks in Python and Java caused segfaults. We fix both:Transaction.close()
in Python and Java now blocks for 1 round trip. In Rust, this now returns a promise/future. In Java/Python, we pick the most relevant default and resolve the promise from Java/Python..on_close
callbacks to transactions.We also fix nondeterministic errors:
on_close
callbacks must return a promise, since the implementation injects the callback into our lowest-level listener loop which may register the callback later. Not awaiting theon_close()
registration will lead to hit or miss execution of the callback when registering on_close callbacks, not awaiting, and then closing the transaction immediatelykeepalive
to the channel, without which messages sometimes get "stuck" on the client-side receiving end of responses from the server. No further clues found as to why this happens. See comments for more detail.We also add one major feature enhancement: configurable logging. All logging should now go through the
tracing
crate. We can configure logging levels for just the driver library with theTYPEDB_DRIVER_LOG
or generalRUST_LOG
environment variables. By default we set it toinfo
.Implementation
Fix and enhance on_close callbacks:
i
layer!Make
close()
a promise in Rust, which can be awaited, and a blocking operation in Java and Python, which awaits a signal from the server that the transaction is actually closed and the resources are freed up.We add on_close callback integration tests for Python, Java, and Rust
add
keepalive
to the channel, which prevents some nondeterministic message delays/delivery failures.Further notes
Mysterious lost responses
It appears that server responses (in particular, the transaction open response) sometimes never gets delivered into our code. This only is reproducible in the localstack demo https://github.com/typedb-osi/typedb-localstack-demo, and there non-deterministically!
We see:
These are confirmed with Wireshark.
The client side actually receives something. If we add logging into
stub.rs
:This actually returns a usable grpc stream successfully -- however, the initial OpenTransaction.Res message doesn't arrive until "something else" happens, such as the stream closing, or a keepalive ping it sent.
It's very strange but the keepalive ping at being set at 3 seconds does force the message to arrive at some point...