feat: Go SDK with examples, CI/CD, and release-please by 0xdeafcafe · Pull Request #80 · langwatch/scenario

0xdeafcafe · 2025-06-30T11:39:47Z

Summary

Full Go SDK for scenario-based AI agent testing, aligned with the existing JS and Python SDKs.

Core SDK (30 commits)

Provider-agnostic agent testing framework with AgentAdapter interface
Built-in UserSimulatorAgent and JudgeAgent with LLM-powered evaluation
Script DSL: User(), Agent(), Judge(), Proceed(), Succeed(), Fail()
LLM provider adapters: OpenAI, Anthropic, Gemini, AWS Bedrock
LangWatch integration with event reporting and OpenTelemetry tracing
Verbose and Metadata fields on ScenarioConfig (wired into runner + events)

API alignment with JS/Python

AgentRole values changed to Title case ("Agent", "User", "Judge")
LastAssistantMessage() renamed to LastAgentMessage()
ScenarioResult uses MetCriteria/UnmetCriteria (matches JS)

Bug fix

Fixed swapped ToolMessage(content, toolCallID) arguments in OpenAI provider

Example test suite (`go/examples/`)

10 test files ported from the JS example suite:

weather_agent_test.go — tool calling + HasToolCall() assertions
vegetarian_recipe_agent_test.go — multi-turn with judge checkpoint criteria
travel_agent_test.go — multi-tool agent, recursive execution
false_assumptions_test.go — hardcoded messages + Proceed(WithProceedTurns, WithProceedOnTurn)
grouping_scenarios_test.go — echo agent, SetID, Succeed()
error_handling_test.go — agent error propagation
simple_tool_mocking_test.go — mocked tool execution, parameter verification
custom_judge_test.go — custom judge with direct LLM structured output
multiturn_10_scripted_test.go — fully scripted 10-turn conversation
mocked_weather_agent_tool_test.go — hardcoded tool call/result injection

CI/CD

go-ci.yml — vet, test, provider checks, example tests with secrets
go-publish.yml — verify + warm Go module proxy on release tag
Release-please configured for go component (release-type: go, starting at v0.1.0)

Test plan

All 10 example tests pass against live OpenAI API
go vet ./... clean on core SDK, examples, and providers
Existing internal unit tests pass (ksuid, ptr)
CI workflow runs on this PR

Extract LLM inference interface and message types into dedicated files. Update agent interfaces and user simulator with refined API.

- Prefix event types with SCENARIO_ (SCENARIO_RUN_STARTED, etc.)
- Add ScenarioRunStatus type with uppercase values (SUCCESS, ERROR, FAILED)
- Rewrite eventalert with tmpdir file coordination across processes
- Show greeting banner only when API key is missing
- Use path-based watch URL format ({setUrl}/{batchRunId})
- Add SCENARIO_HEADLESS env var to suppress browser open
- Add SCENARIO_DISABLE_SIMULATION_REPORT_INFO env var to suppress banners
- Scope watch message per scenarioSetId
- Cache batch run ID per process with sync.Once

Add go.opentelemetry.io/otel, otel/sdk, otel/trace, otel/attribute, otel/codes and github.com/langwatch/langwatch/sdk-go for full OTel tracing integration.

- SpanCollector: implements sdktrace.SpanProcessor to collect spans, filters by thread ID with parent chain walking
- SpanDigestFormatter: renders spans as plain-text hierarchy with timestamps, durations, attributes, events, and error sections
- setupObservability: creates LangWatch exporter, TracerProvider, and SpanCollector; wires into global OTel provider
- Remove TracedInference and instrumentBuiltInAgents (replaced by OTel)

- Execution: create per-turn and per-agent spans with tracer, end spans at all exit paths
- JudgeAgent: build transcript from messages, include OTel trace digest in judge prompt alongside conversation transcript
- Runner: init observability when API key present, wire span collector to judge agents and execution, default endpoint to app.langwatch.ai, default SetID to "default", shutdown observability after run

Multi-provider inference abstraction supporting OpenAI, Anthropic, Google Gemini, and AWS Bedrock with tool/function calling conversion.

…-please

API alignment:
- AgentRole values: lowercase → Title case to match JS/Python
- Rename LastAssistantMessage → LastAgentMessage for consistency
- Add Verbose and Metadata fields to ScenarioConfig
- Wire Verbose (prints failure details) and Metadata (sent in events)

Bug fix:
- Fix swapped ToolMessage arguments in OpenAI provider (content/toolCallID)

Examples (10 test files in go/examples/):
- weather-agent, vegetarian-recipe, travel-agent, false-assumptions, grouping-scenarios, error-handling, simple-tool-mocking, custom-judge, multiturn-10-scripted, mocked-weather-agent-tool

CI/CD:
- go-ci.yml: vet, test, provider checks, example tests with secrets
- go-publish.yml: verify + warm Go module proxy on release
- release-please: add go component (release-type: go, v0.1.0)
- version.go, CHANGELOG.md for release-please integration

github-actions · 2026-04-09T17:34:25Z

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

This PR's diff exceeds the size limit for automated low-risk evaluation. Manual review required.

This PR requires a manual review before merging.

Copilot

Pull request overview

Introduces an initial Scenario Go SDK (core runner/execution engine, DSL helpers, LangWatch event reporting + OTel tracing) plus provider adapters (OpenAI/Anthropic/Gemini/Bedrock), examples, and Go CI/publish workflows.

Changes:

Add Go SDK core types (agents, messages, execution, script DSL) and LangWatch event reporting / OTel tracing utilities.
Add provider-specific Inference adapters for OpenAI, Anthropic, Gemini, and AWS Bedrock.
Add Go examples/tests, release-please config entries, and GitHub Actions workflows for Go CI and publishing.

Reviewed changes

Copilot reviewed 66 out of 72 changed files in this pull request and generated 18 comments.

Show a summary per file

File	Description
go/version.go	SDK version constant.
go/utils.go	Criterion param-name normalization + message role reversal utilities.
go/tracing.go	Tracing notes / legacy placeholder.
go/tracing_setup.go	OTel + LangWatch exporter setup and handle.
go/tracing_digest.go	Plain-text span digest formatter for judge evaluation.
go/tracing_collector.go	OTel span processor/collector to attach spans to judge.
go/script.go	Script DSL helper functions (`User/Agent/Judge/Proceed/...`).
go/runner.go	`Run()` entrypoint, option handling, reporter + observability wiring.
go/execution.go	Core scenario execution engine (turn loop, agent calls, events, spans).
go/executionstate.go	Execution state tracking and helper queries (tool calls, last messages).
go/message.go	Provider-agnostic message / tool-call domain types.
go/llm.go	Provider-agnostic `Inference` interface + tool schema types.
go/domain.go	Public interfaces/types for scripts, execution, state, results, options.
go/config.go	`ScenarioConfig` definition.
go/ids.go	KSUID-based IDs for thread/scenario/batch/run.
go/events.go	Event types emitted during execution.
go/eventbus.go	In-process channel-based event bus.
go/eventreporter.go	HTTP reporter posting events to LangWatch API.
go/eventalert.go	Console banner + “follow live” URL + coordination-file logic.
go/agent.go	Agent roles, inputs/returns, config, and judge option types.
go/agent_user_simulator.go	Built-in user simulator agent (role reversal + LLM call).
go/agent_judge.go	Built-in judge agent (criteria tools, transcript + OTel digest).
go/README.md	Go SDK documentation and usage examples.
go/CHANGELOG.md	Initial Go SDK changelog entry.
go/go.mod	Go module definition for SDK.
go/go.sum	Go module dependency lockfile for SDK.
go/internal/judge_agent_tools.go	Judge tool-argument parsing helpers.
go/internal/libraries/ptr/ptr.go	Small pointer helper library.
go/internal/libraries/ptr/ptr_test.go	Tests for ptr helpers.
go/internal/libraries/ksuid/README.md	Internal KSUID library docs.
go/internal/libraries/ksuid/LICENSE	Internal KSUID library license.
go/internal/libraries/ksuid/base62.go	KSUID base62 decode implementation.
go/internal/libraries/ksuid/id.go	KSUID ID type, parsing, encoding, JSON/db integration.
go/internal/libraries/ksuid/id_test.go	KSUID ID tests/benchmarks.
go/internal/libraries/ksuid/instance_id.go	Instance ID generation (docker/hardware/random).
go/internal/libraries/ksuid/node.go	KSUID node generator.
go/internal/libraries/ksuid/node_test.go	KSUID node benchmark.
go/providers/openai/openai.go	OpenAI provider adapter implementing `Inference`.
go/providers/openai/convert.go	Scenario<->OpenAI message/tool conversion helpers.
go/providers/openai/go.mod	Provider module definition.
go/providers/openai/go.sum	Provider dependency lockfile.
go/providers/anthropic/anthropic.go	Anthropic provider adapter implementing `Inference`.
go/providers/anthropic/convert.go	Scenario<->Anthropic message/tool conversion helpers.
go/providers/anthropic/go.mod	Provider module definition.
go/providers/anthropic/go.sum	Provider dependency lockfile.
go/providers/gemini/gemini.go	Gemini provider adapter implementing `Inference`.
go/providers/gemini/convert.go	Scenario<->Gemini message/tool conversion helpers.
go/providers/gemini/go.mod	Provider module definition.
go/providers/gemini/go.sum	Provider dependency lockfile.
go/providers/bedrock/bedrock.go	Bedrock provider adapter implementing `Inference`.
go/providers/bedrock/convert.go	Scenario<->Bedrock message/tool conversion helpers.
go/providers/bedrock/go.mod	Provider module definition.
go/providers/bedrock/go.sum	Provider dependency lockfile.
go/examples/.gitignore	Ignore local env files for examples.
go/examples/.env.example	Example env variables for running examples.
go/examples/go.mod	Examples module definition.
go/examples/go.sum	Examples dependency lockfile.
go/examples/helpers_test.go	Example helper agents + tool mocking helpers.
go/examples/weather_agent_test.go	Example scenario: weather tool calling.
go/examples/travel_agent_test.go	Example scenario: multi-tool travel agent + judge criteria.
go/examples/vegetarian_recipe_agent_test.go	Example scenario: multi-turn judge checkpoints.
go/examples/simple_tool_mocking_test.go	Example scenario: tool mocking + parameter assertion.
go/examples/multiturn_10_scripted_test.go	Example scenario: fully scripted 10-turn conversation + judge.
go/examples/mocked_weather_agent_tool_test.go	Example scenario: injecting tool call/result messages.
go/examples/grouping_scenarios_test.go	Example scenario: grouping via `SetID`.
go/examples/false_assumptions_test.go	Example scenario: proceed options + bias criteria.
go/examples/error_handling_test.go	Example scenario: agent error propagation into result.
go/examples/custom_judge_test.go	Example scenario: fully custom LLM judge agent.
.release-please-manifest.json	Adds `go` component version tracking.
.release-please-config.json	Adds release-please config for Go component.
.github/workflows/go-ci.yml	Go CI workflow (vet/test + providers + examples).
.github/workflows/go-publish.yml	Go publish/indexing workflow triggered on releases.


Copilot · 2026-04-09T17:41:30Z

go/internal/judge_agent_tools.go

+func ParseJudgeAgentFinishTestToolArguments(arguments string) (*JudgeAgentFinishTestToolArguments, error) {
+	var resp *JudgeAgentFinishTestToolArguments
+	if err := json.Unmarshal([]byte(arguments), &resp); err != nil {
+		return nil, fmt.Errorf("failed to parse judge agent finish tool arguments: %w", err)
+	}
+
+	if resp.Verdict == "" {
+		resp.Verdict = "inconclusive"
+	}
+	if resp.Reasoning == "" {
+		resp.Reasoning = "No reasoning provided"
+	}
+	if resp.Criteria == nil {


json.Unmarshal is being done into a *JudgeAgentFinishTestToolArguments pointer (var resp *... then Unmarshal(&resp)). If the LLM returns null, resp remains nil and the subsequent field accesses will panic. Consider unmarshaling into a value struct (non-pointer) or explicitly handling the resp == nil case after unmarshal before setting defaults.
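
A minimal sketch of the value-struct variant suggested above. The struct here is a hypothetical stand-in for `JudgeAgentFinishTestToolArguments`: its field types and JSON tags are assumptions, since the diff hunk does not show them. With a value target, a JSON `null` payload has no effect on the struct, so the defaults below are applied to the zero value and no nil pointer is dereferenced.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finishArgs is a hypothetical stand-in for JudgeAgentFinishTestToolArguments.
// Field types and JSON tags are assumptions for illustration only.
type finishArgs struct {
	Verdict   string          `json:"verdict"`
	Reasoning string          `json:"reasoning"`
	Criteria  map[string]bool `json:"criteria"`
}

// parseFinishArgs decodes into a value rather than a pointer, so a JSON
// `null` leaves the zero struct in place and the defaults below still apply.
func parseFinishArgs(arguments string) (*finishArgs, error) {
	var resp finishArgs
	if err := json.Unmarshal([]byte(arguments), &resp); err != nil {
		return nil, fmt.Errorf("failed to parse judge agent finish tool arguments: %w", err)
	}

	if resp.Verdict == "" {
		resp.Verdict = "inconclusive"
	}
	if resp.Reasoning == "" {
		resp.Reasoning = "No reasoning provided"
	}
	if resp.Criteria == nil {
		resp.Criteria = map[string]bool{}
	}
	return &resp, nil
}

func main() {
	got, err := parseFinishArgs("null")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got.Verdict, "/", got.Reasoning)
}
```

Keeping the pointer target and adding an explicit `resp == nil` check after the unmarshal would work as well; the value form simply removes the nil state.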

Copilot · 2026-04-09T17:41:30Z

go/internal/libraries/ksuid/instance_id.go

+func getHardwareAddr(ctx context.Context) (net.HardwareAddr, error) {
+	addrs, err := net.Interfaces()
+	if err != nil {
+		return nil, err
+	}
+
+	for _, addr := range addrs {
+		// only return physical interfaces (i.e. not loopback)
+		if len(addr.HardwareAddr) >= 6 {
+			return addr.HardwareAddr, nil
+		}
+	}
+
+	return nil, fmt.Errorf("%w: %w", ErrNoHardwareAddress, err)
+}


err is guaranteed to be nil here (all earlier error returns have already happened), so fmt.Errorf("%w: %w", ErrNoHardwareAddress, err) either adds a confusing <nil> wrap or relies on multi-%w behavior. Prefer returning ErrNoHardwareAddress directly (or include a concrete context error) to avoid wrapping nil.

Copilot · 2026-04-09T17:41:31Z

go/internal/libraries/ksuid/node.go

+// NewNode returns a ID generator for the current machine.
+func NewNode(environment string, instanceID InstanceID) *Node {
+	return &Node{
+		InstanceID: instanceID,
+	}
+}
+
+// Generate returns a new ID for the machine and resource configured.
+func (n *Node) Generate(ctx context.Context, resource string) (id ID) {
+	if strings.ContainsRune(resource, '_') {
+		panic(fmt.Errorf("ksuid resource contains underscore: %s", resource))
+	}
+
+	id.Environment = Production
+	id.Resource = resource
+	id.InstanceID = n.InstanceID


NewNode(environment, instanceID) ignores the environment argument, and Generate hard-codes id.Environment = Production, making it impossible to generate non-prod/environment-prefixed KSUIDs despite the API/docs implying it. Consider storing environment on Node and using it when populating id.Environment.
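
A sketch of what storing the environment on `Node` could look like. `ID` and `Node` are simplified, hypothetical stand-ins: the real types use an `InstanceID` type and take a `context.Context`, and the `"prod"` value and the empty-string default are assumptions. The underscore check on `resource` is omitted to keep the example focused.

```go
package main

import "fmt"

// Production is an assumed value for the production environment marker.
const Production = "prod"

// ID is a simplified stand-in for the ksuid ID type.
type ID struct {
	Environment string
	Resource    string
	InstanceID  string
}

// Node is a simplified stand-in that keeps the environment it was built with.
type Node struct {
	Environment string
	InstanceID  string
}

// NewNode stores the environment instead of discarding it. Defaulting an
// empty environment to Production is an assumption of this sketch.
func NewNode(environment, instanceID string) *Node {
	if environment == "" {
		environment = Production
	}
	return &Node{Environment: environment, InstanceID: instanceID}
}

// Generate copies the node's environment onto every ID it produces.
func (n *Node) Generate(resource string) ID {
	return ID{
		Environment: n.Environment,
		Resource:    resource,
		InstanceID:  n.InstanceID,
	}
}

func main() {
	id := NewNode("dev", "instance-1").Generate("thread")
	fmt.Println(id.Environment, id.Resource)
}
```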

Copilot · 2026-04-09T17:41:31Z

go/providers/gemini/convert.go

+		case scenario.MessageRoleTool:
+			var responseData map[string]any
+			if msg.Content != "" {
+				if err := json.Unmarshal([]byte(msg.Content), &responseData); err != nil {
+					responseData = map[string]any{"result": msg.Content}
+				}
+			}
+			result = append(result, &genai.Content{
+				Role: "user",
+				Parts: []*genai.Part{
+					{
+						FunctionResponse: &genai.FunctionResponse{
+							Name:     msg.ToolCallID,
+							Response: responseData,
+						},
+					},
+				},
+			})
+


For Gemini tool results, FunctionResponse.Name should match the function/tool name, but this code sets it to msg.ToolCallID (an OpenAI-style call ID). This will break tool-call flows whenever ToolCallID != tool name (e.g. the mocked tool-call example uses IDs like call_mock_001). Consider building a toolCallID -> toolName map by scanning prior assistant messages' ToolCalls, and use that mapped tool name when creating FunctionResponse.
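
The mapping described above could be sketched as follows. `Message` and `ToolCall` are simplified, hypothetical stand-ins for the `scenario` package types, and the tool name `get_current_weather` is illustrative only. The fallback to the raw ID when no call is found is an assumption; returning an error may be preferable in the adapter.

```go
package main

import "fmt"

// ToolCall and Message are simplified stand-ins for the scenario types;
// only the fields needed for the lookup are modeled.
type ToolCall struct {
	ID   string
	Name string
}

type Message struct {
	Role       string
	ToolCallID string
	ToolCalls  []ToolCall
}

// toolNamesByCallID scans the ToolCalls on every message and records which
// function name each call ID refers to.
func toolNamesByCallID(messages []Message) map[string]string {
	names := make(map[string]string)
	for _, msg := range messages {
		for _, tc := range msg.ToolCalls {
			names[tc.ID] = tc.Name
		}
	}
	return names
}

// functionResponseName resolves the name to use for a tool result, falling
// back to the raw ID when no matching call was seen.
func functionResponseName(names map[string]string, toolCallID string) string {
	if name, ok := names[toolCallID]; ok {
		return name
	}
	return toolCallID
}

func main() {
	history := []Message{
		{Role: "assistant", ToolCalls: []ToolCall{{ID: "call_mock_001", Name: "get_current_weather"}}},
		{Role: "tool", ToolCallID: "call_mock_001"},
	}
	names := toolNamesByCallID(history)
	fmt.Println(functionResponseName(names, history[1].ToolCallID))
}
```

Building the map once per conversion pass keeps the lookup cheap even for long histories.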

Copilot · 2026-04-09T17:41:31Z

go/providers/gemini/convert.go

+		if part.FunctionCall != nil {
+			args := "{}"
+			if part.FunctionCall.Args != nil {
+				b, err := json.Marshal(part.FunctionCall.Args)
+				if err == nil {
+					args = string(b)
+				}
+			}
+			msg.ToolCalls = append(msg.ToolCalls, scenario.ToolCall{
+				ID:        part.FunctionCall.Name,
+				Name:      part.FunctionCall.Name,
+				Arguments: args,
+			})
+		}


Gemini responses don’t appear to provide a unique tool-call ID, but this conversion sets ToolCall.ID to part.FunctionCall.Name. If the model emits multiple calls to the same function, IDs will collide and downstream tool-result correlation via ToolCallID becomes ambiguous. Consider generating a deterministic unique ID per tool call (e.g., call_1, call_2, …) while preserving Name for the function name.
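
A sketch of the per-call ID scheme suggested above. `functionCall` is a hypothetical stand-in for the `genai.FunctionCall` fields used in the diff, and `ToolCall` is a simplified stand-in for `scenario.ToolCall`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// functionCall is a stand-in for the genai.FunctionCall fields used here.
type functionCall struct {
	Name string
	Args map[string]any
}

// ToolCall is a simplified stand-in for scenario.ToolCall.
type ToolCall struct {
	ID        string
	Name      string
	Arguments string
}

// toToolCalls assigns call_1, call_2, ... by position, so repeated calls to
// the same function get distinct IDs while Name keeps the function name.
func toToolCalls(calls []functionCall) []ToolCall {
	out := make([]ToolCall, 0, len(calls))
	for i, fc := range calls {
		args := "{}"
		if fc.Args != nil {
			if b, err := json.Marshal(fc.Args); err == nil {
				args = string(b)
			}
		}
		out = append(out, ToolCall{
			ID:        fmt.Sprintf("call_%d", i+1),
			Name:      fc.Name,
			Arguments: args,
		})
	}
	return out
}

func main() {
	calls := toToolCalls([]functionCall{
		{Name: "get_weather", Args: map[string]any{"city": "Paris"}},
		{Name: "get_weather", Args: map[string]any{"city": "Rome"}},
	})
	for _, c := range calls {
		fmt.Println(c.ID, c.Name, c.Arguments)
	}
}
```

These IDs are unique only within a single response. If IDs must stay unique across turns, a counter scoped to the conversation would be needed instead of the slice index.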

Copilot · 2026-04-09T17:41:34Z

go/eventalert.go

+func showWatchMessage(setURL, scenarioSetID string) {
+	if isGreetingDisabled() {
+		return
+	}
+
+	if !createCoordinationFile("watch-" + scenarioSetID) {
+		return
+	}


scenarioSetID is concatenated into fileType and used to build a temp-file path. Because SetID is user-controlled, values containing path separators (e.g., ../ or /) can change the resulting path and cause unexpected failures or collisions. Consider sanitizing scenarioSetID (e.g., replace non [A-Za-z0-9._-] chars) before using it in a filename.
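
A minimal sketch of the sanitization suggested above, using the character class from the comment. The helper name `sanitizeSetID` and the `_` replacement character are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// unsafeFileChars matches every character outside [A-Za-z0-9._-].
var unsafeFileChars = regexp.MustCompile(`[^A-Za-z0-9._-]`)

// sanitizeSetID replaces each unsafe character with an underscore so the
// result contains no path separators and stays a single filename component.
func sanitizeSetID(id string) string {
	return unsafeFileChars.ReplaceAllString(id, "_")
}

func main() {
	fmt.Println("watch-" + sanitizeSetID("../etc/passwd"))
	fmt.Println("watch-" + sanitizeSetID("my-set.v1"))
}
```

Distinct IDs can map to the same filename under this scheme (`a/b` and `a_b`, for example). If that matters for coordination, hashing the ID into the filename would avoid the collision.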

Copilot · 2026-04-09T17:41:34Z

go/internal/libraries/ptr/ptr.go

+// ValueOrNil returns the value of the pointer if it is not nil, otherwise it returns the
+// zero value of the type.
+func ValueOrNil[T any](v *T) T {
+	if v == nil {
+		var zero T
+		return zero
+	}
+
+	return *v
+}
+
+// ValueOrZero returns the value of the pointer if it is not nil, otherwise it returns
+// the zero value of the type.
+func ValueOrZero[T any](v *T) T {
+	if v == nil {
+		var zero T
+		return zero
+	}
+
+	return *v
+}


ValueOrNil and ValueOrZero have identical implementations and semantics (both return the zero value when nil). Keeping both increases API surface without adding behavior. Consider removing one of them or changing one to provide distinct semantics.

Copilot · 2026-04-09T17:41:34Z

go/internal/libraries/ksuid/README.md

+# ksuid
+
+ksuid is a Go library that generated prefixed, k-sorted globally unique identifiers.
+
+Each KSUID has a resource type and optionally an environment prefix (no environment prefix is for prod use only). They are roughly sortable down to per-second resolution.


Grammar: “ksuid is a Go library that generated …” should be “ksuid is a Go library that generates …”.

Copilot · 2026-04-09T17:41:34Z

go/providers/gemini/convert.go

+func toGeminiSchema(params map[string]any) *genai.Schema {
+	if params == nil {
+		return nil
+	}
+
+	schema := &genai.Schema{
+		Type: genai.TypeObject,
+	}
+
+	if props, ok := params["properties"].(map[string]any); ok {
+		schema.Properties = make(map[string]*genai.Schema)
+		for name, propDef := range props {
+			schema.Properties[name] = convertPropertyToSchema(propDef)
+		}
+	}
+
+	if req, ok := params["required"].([]any); ok {
+		for _, r := range req {
+			if s, ok := r.(string); ok {
+				schema.Required = append(schema.Required, s)
+			}
+		}
+	}
+


toGeminiSchema only reads required when it’s typed as []any, but callers commonly provide JSON schema required as []string (as in the examples). This means required fields will be silently dropped for Gemini tool definitions. Consider accepting both []string and []any (string elements) when populating schema.Required.
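
Accepting both shapes could be done with a type switch, sketched below. The helper name `requiredFields` is hypothetical; it only extracts the string list and leaves the `genai.Schema` assignment to the caller.

```go
package main

import "fmt"

// requiredFields reads the JSON-schema "required" entry whether the caller
// supplied it as []string (Go literals) or []any (decoded JSON).
func requiredFields(params map[string]any) []string {
	switch req := params["required"].(type) {
	case []string:
		// Copy so later mutation of the caller's slice does not leak in.
		return append([]string(nil), req...)
	case []any:
		out := make([]string, 0, len(req))
		for _, r := range req {
			if s, ok := r.(string); ok {
				out = append(out, s)
			}
		}
		return out
	default:
		return nil
	}
}

func main() {
	fromLiteral := map[string]any{"required": []string{"city"}}
	fromJSON := map[string]any{"required": []any{"city", "unit"}}
	fmt.Println(requiredFields(fromLiteral), requiredFields(fromJSON))
}
```

The same helper would cover the `required` handling in the Anthropic adapter, which has the identical `[]any`-only assertion.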

Copilot · 2026-04-09T17:41:35Z

go/providers/anthropic/convert.go

+func toAnthropicTools(tools []scenario.ToolDefinition) []anthropic.ToolUnionParam {
+	result := make([]anthropic.ToolUnionParam, 0, len(tools))
+	for _, tool := range tools {
+		tp := &anthropic.ToolParam{
+			Name:        tool.Name,
+			Description: anthropic.String(tool.Description),
+			InputSchema: anthropic.ToolInputSchemaParam{
+				Properties: tool.Parameters["properties"],
+			},
+		}
+		if req, ok := tool.Parameters["required"].([]any); ok {
+			reqStrings := make([]string, 0, len(req))
+			for _, r := range req {
+				if s, ok := r.(string); ok {
+					reqStrings = append(reqStrings, s)
+				}
+			}
+			tp.InputSchema.Required = reqStrings
+		}
+		result = append(result, anthropic.ToolUnionParam{OfTool: tp})


toAnthropicTools only reads required when it’s typed as []any, but callers commonly provide JSON schema required as []string (as in the examples). This means required fields will be silently dropped in the Anthropic tool schema. Consider accepting both []string and []any (string elements) when building tp.InputSchema.Required.

rogeriochaves force-pushed the main branch 2 times, most recently from 77a92af to 9fdb87c · December 16, 2025 15:54

0xdeafcafe force-pushed the feat/go-sdk branch from 61dfaae to 86db52d · February 23, 2026 11:17

0xdeafcafe marked this pull request as ready for review February 24, 2026 09:05

0xdeafcafe added 26 commits April 9, 2026 18:14

setup mod

297d578

setup shallow config

44b65f5

added internal libs (ksuid and ptr)

2e3e02a

added id gen for thread

78b2e49

added script

34632ec

started on runner

3f9a827

added openai

401a820

updated ptr lib

a26439a

updated structure of types

0a09030

added execution state

474fe7a

added user sim agent

35ac781

basic execution layout

2b0e559

updated user sim agent

67c638a

started on judge agent

27a5fc7

update config

c7bba9d

tools work on judge

7b0645b

update proceed arg mess to use options pattern

c323f2a

update agents to use openai client, rather than model id mess

60ebf46

finished judge agent

92fb0b1

cleanup runner

6fb5db3

internal logic to parse finish test tool

1ddd4bb

util for criterion name

5f4251a

new events api

57aa3c7

refactor core types, add llm/message abstractions, update agents

f3f7487

Extract LLM inference interface and message types into dedicated files. Update agent interfaces and user simulator with refined API.

add OpenTelemetry and LangWatch SDK dependencies

8f08c45

Add go.opentelemetry.io/otel, otel/sdk, otel/trace, otel/attribute, otel/codes and github.com/langwatch/langwatch/sdk-go for full OTel tracing integration.

0xdeafcafe added 5 commits April 9, 2026 18:14

add LLM provider implementations for OpenAI, Anthropic, Gemini, Bedrock

e873eca

Multi-provider inference abstraction supporting OpenAI, Anthropic, Google Gemini, and AWS Bedrock with tool/function calling conversion.

add Go SDK README

985e0c3

Copilot AI review requested due to automatic review settings April 9, 2026 17:34

0xdeafcafe force-pushed the feat/go-sdk branch from ebbda14 to feb39a5 · April 9, 2026 17:34

Copilot started reviewing on behalf of 0xdeafcafe · April 9, 2026 17:34

0xdeafcafe changed the title ~~feat: go sdk~~ feat: Go SDK with examples, CI/CD, and release-please Apr 9, 2026

Copilot AI reviewed Apr 9, 2026

0xdeafcafe wants to merge 31 commits into main from feat/go-sdk
