Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic langsmith-nodejs Rust bindings #1293

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Conversation

obi1kenobi
Copy link
Collaborator

@obi1kenobi obi1kenobi commented Dec 5, 2024

Rust-based client bindings for submitting data to LangSmith.

Current state: MVP, aiming to get minimal end-to-end validation of the approach. Only implements createRun at the moment. updateRun does not have a corresponding Rust implementation.

Don't mind the failing tests at the moment — it's because there aren't any test files, so the test runner is exiting non-zero. If we have a good way to make an integration test for this, we can easily add it to the test suite. In one of the earlier commits, I tested the bindings with a simple fibonacci() function to make sure the Node.js -> Rust calling works, and all the tests passed.

Testing this code

To build the native code bindings:

cd rust/crates/langsmith-nodejs
yarn install --dev
yarn run build:debug

# To work around an undefined type bug in the generated definitions.
git restore index.d.ts

This should produce a file with a name like langsmith-nodejs.linux-x64-gnu.node (on libc x64 Linux), or langsmith-nodejs.darwin-arm64.node (on macOS arm64). That file is the native Rust code, compiled for your platform.

Then, go to js/ and run yarn install to install the native library (as a path dependency).

To have the JS code use the Rust bindings to submit data, run it with the LANGCHAIN_USE_RUST_CLIENT environment variable set to any value.

@obi1kenobi obi1kenobi force-pushed the pg/nodejs-bindings branch 5 times, most recently from 12f8b84 to dab5a29 Compare December 5, 2024 19:13
@obi1kenobi obi1kenobi force-pushed the pg/nodejs-bindings branch 3 times, most recently from f125b51 to ee51176 Compare December 5, 2024 20:20
@obi1kenobi obi1kenobi marked this pull request as ready for review December 5, 2024 20:51
@obi1kenobi obi1kenobi changed the title langsmith-nodejs Rust bindings Basic project stub for langsmith-nodejs Rust bindings Dec 5, 2024
@obi1kenobi obi1kenobi requested a review from agola11 December 5, 2024 21:00
@@ -103,6 +103,7 @@
"dependencies": {
"@types/uuid": "^10.0.0",
"commander": "^10.0.1",
"langsmith-nodejs": "file:../rust/crates/langsmith-nodejs",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging, we'll need to replace this path dependency with a dependency on a real, published JS library.

js/src/client.ts Outdated Show resolved Hide resolved
js/src/client.ts Outdated
Comment on lines 964 to 975
const response = await this.caller.call(
_getFetchImplementation(),
`${this.apiUrl}/runs`,
{
method: "POST",
headers,
body: stringifyForTracing(mergedRunCreateParam),
signal: AbortSignal.timeout(this.timeout_ms),
...this.fetchOptions,
}
);
await raiseForStatus(response, "create run", true);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the old code as-is, just moved into the else branch.

@@ -3769,7 +3844,7 @@ export class Client implements LangSmithTracingClientInterface {
if (options?.isPublic && !settings.tenant_handle) {
throw new Error(
`Cannot create a public prompt without first\n
creating a LangChain Hub handle.
creating a LangChain Hub handle.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about this noise — my editor is stripping trailing whitespace on save by default. If this is a concern, I can undo it.

Comment on lines +4121 to +4122
* Clone a public dataset to your own langsmith tenant.
* This operation is idempotent. If you already have a dataset with the given name,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about this noise — my editor is stripping trailing whitespace on save by default. If this is a concern, I can undo it.

@@ -2,7 +2,7 @@ use serde::{Deserialize, Serialize};
use serde_json::Value;

// Map attachment ref to tuple of filename, optional bytes
#[derive(Debug)]
#[derive(Debug, Serialize, Deserialize)]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made all of these Serialize, Deserialize to make it easier to create them from JS.

This is a temporary hack to get something working quickly. We can figure out a more appropriate long-term solution once we have an end-to-end working implementation.

@obi1kenobi obi1kenobi changed the title Basic project stub for langsmith-nodejs Rust bindings Basic langsmith-nodejs Rust bindings Dec 9, 2024
@@ -1,5 +1,7 @@
import * as uuid from "uuid";

import * as bindings from "langsmith-nodejs";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any compatibility issues here?

Will this install in environments that don't have Rust?

We can do some refactoring around entrypoints to make sure this works if so

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Rust shouldn't be required and users shouldn't notice we use Rust.

The usual approach (e.g. used by Sentry and esbuild) is to publish a cluster of packages like langsmith-nodejs-darwin-arm64, langsmith-nodejs-linux-x64-gnu etc. (see the npm directory in this PR) that wrap pre-compiled binaries for that specific platform: macOS on ARM, x64 linux with gnu libc, etc. We should look to make such packages available for all platforms on which users might use langsmith. Adding a new one is relatively straightforward.

When users install langsmith, it pulls in the dependency on langsmith-nodejs, which in turn has an optionalDependencies section with all the per-platform packages. Their package manager then only installs the "optional dependency" that matches their platform.

If people don't install optional dependencies, this obviously wouldn't work. There are a variety of options to give users actionable advice or fix the problem here:

  1. Using a postinstall script to determine the platform and download the required binary. Given that LangChain is used in enterprise environments, this is probably not a good idea — it would likely set off all kinds of security alerts.
  2. Instead of downloading the binary, telling the user to either use optional dependencies or manually add a dependency on the correct platform-specific package based on the platform we've determined in postinstall.
  3. Checking at langsmith initialization time that the platform-specific dependency is satisfied, and then notifying the user in the same way as above.

I think we probably want to stay away from 1., and probably do 2. with optionally adding 3.

P.S.: The package names are placeholders. langsmith-nodejs makes sense from the perspective of a Rust crate, but is probably not the right name for an npm package. Once we have something working end-to-end, we can and probably should rename.

@@ -521,6 +526,27 @@ export class Client implements LangSmithTracingClientInterface {
config.blockOnRootRunFinalization ?? this.blockOnRootRunFinalization;
this.batchSizeBytesLimit = config.batchSizeBytesLimit;
this.fetchOptions = config.fetchOptions || {};

// TODO: Do we care about syncing up the env var names between the JS and Python bindings?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we want to do LANGSMITH_USE

if (this._rustClient) {
const mergedRunCreateParam = mergeRuntimeEnvIntoRunCreate(runCreate);

// We need to massage the data shape into what Rust expects and will accept,
Copy link
Collaborator

@jacoblee93 jacoblee93 Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can remove this shim in the JS client?

Can we add something to the Rust binding that converts JS data shapes to their Rust counterparts in a more general way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I plan to move most of this into Rust. This was just an MVP just to show the approach working end-to-end.

Some shim parts might need to continue to exist here though, for the sake of performance. It's relatively expensive to jump back and forth between JS and native code many times, so it's quite likely that blocks like this one are faster in JS than in Rust:

        const io = {
          inputs: mergedRunCreateParam.inputs
            ? stringifyForTracing(mergedRunCreateParam.inputs)
            : null,
          outputs: mergedRunCreateParam.outputs
            ? stringifyForTracing(mergedRunCreateParam.outputs)
            : null,
        };

This way, Rust sees inputs and outputs as just binary buffers instead of having to move through and dereference lots of JS objects. The serialization is using a highly optimized code path within the JS engine, instead of whatever Rust can access "from the outside."

It's probably an 80-20 kind of thing: 80% can easily move into Rust, so it should. Moving the remaining 20% to Rust would be more trouble than it's worth, and may require extra performance engineering work we could have just avoided by keeping it in JS.

console.error(e);
}
return;
} else if (this.autoBatchTracing) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is that anyone using the new bindings will have this enabled right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. I can document that explicitly.

Copy link
Collaborator

@jacoblee93 jacoblee93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @obi1kenobi!

Main question on is if we can generalize serialization from JS to Rust and move it into the binding to avoid overhead on other methods

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants