@azaddhirajkumar
Contributor

No description provided.

@azaddhirajkumar requested a review from a team as a code owner on October 14, 2025, 11:37
@gemini-code-assist
Contributor

Summary of Changes

Hello @azaddhirajkumar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, in-depth tutorial designed to help developers build semantic search applications. It focuses on integrating Microsoft's Semantic Kernel with Couchbase's robust vector search features via the Couchbase .NET Vector Store Connector. The tutorial covers the entire process, from setting up the development environment and defining data models to generating embeddings with OpenAI, ingesting data, and executing various vector search queries. It also provides extensive information on configuring different types of Couchbase vector indexes, offering a practical guide for leveraging AI-powered search capabilities.

Highlights

  • New Tutorial Added: A comprehensive tutorial has been added, guiding users through building vector search applications using the Couchbase .NET Semantic Kernel Connector and OpenAI.
  • Couchbase Integration with Semantic Kernel: The tutorial demonstrates seamless integration of Microsoft Semantic Kernel with Couchbase's vector search capabilities, covering BHIVE, Composite, and FTS index types.
  • Embedding Generation and Storage: It explains how to generate text embeddings using OpenAI's text-embedding-ada-002 model and efficiently store them within Couchbase.
  • Vector Search Operations: The tutorial details both pure and filtered vector search queries, illustrating their underlying translation to SQL++ with ANN_DISTANCE and WHERE clauses.
  • Advanced Index Configuration: Detailed guidance is provided on configuring Couchbase vector indexes, including BHIVE, Composite, and FTS, along with explanations of parameters like centroids and quantization.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new tutorial for using the Semantic Kernel with Couchbase. The tutorial is comprehensive and well-structured. However, I've found several issues that need to be addressed before merging. There are critical errors in the provided JSON and SQL++ code snippets (missing commas, trailing commas) that will prevent them from working. Additionally, some links point to temporary or internal resources (a feature branch and a test documentation server), which should be updated to stable, public URLs. There are also some invalid tags in the frontmatter that will likely fail validation, and a section on embedding generation is potentially confusing. I've left specific comments with suggestions for each of these points.

## Repository Links

- **Connector Repository**: [couchbase-semantic-kernel](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel) - The official Couchbase .NET Vector Store Connector for Microsoft Semantic Kernel
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
Contributor

high

The link to the example code points to a feature branch (Support-Bhive-and-Composite-Index). This is not ideal for a public tutorial, as feature branches are often temporary and may be deleted. It's recommended to update this link to point to the main branch (e.g., main or master) or a specific release tag once the code is merged.

Suggested change
Current:
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
Suggested:
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/main/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase

Contributor

Should the example not be in Couchbase-Examples?

Contributor

You can link the example from the framework README

Contributor Author

I'll move this example to Couchbase-Examples once the NuGet package is ready.

- Use **Composite** when scalar filters eliminate large portions of data before vector comparison
- Use **FTS** when you need hybrid search combining full-text and semantic search

For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
Contributor

high

The link to the "Couchbase Vector Index Documentation" points to a preview.docs-test.couchbase.com URL. This appears to be an internal or test documentation server. For a public tutorial, this should be updated to the final, public documentation URL.

Suggested change
Current:
For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
Suggested:
For more details, see the [Couchbase Vector Index Documentation](https://docs.couchbase.com/server/current/vector-search/vector-search-overview.html).

- `IVF1000,SQ6` - 1000 centroids, 6-bit quantization (faster, less accurate)
- `IVF,PQ32x8` - Auto centroids, product quantization (better accuracy)

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
Contributor

high

This link points to a preview.docs-test.couchbase.com URL, which appears to be an internal or staging documentation server. For a public tutorial, this should be updated to point to the official public documentation.

Suggested change
Current:
For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
Suggested:
For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-search/indexing-vectors.html) documentation.
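
As an aside on the two settings strings quoted in the tutorial excerpt above (`IVF1000,SQ6` and `IVF,PQ32x8`), the sketch below shows one way to read them. It is not part of the pull request or the connector: `parse_description` is a hypothetical helper, and reading `PQ32x8` as 32 subquantizers of 8 bits each is an assumption that should be checked against the Couchbase documentation.

```python
import re


def parse_description(description):
    """Split a settings string such as 'IVF1000,SQ6' into its parts.

    Hypothetical helper for illustration only; not a Couchbase API.
    """
    ivf_part, quant_part = description.split(",")

    # 'IVF' followed by an optional centroid count; no digits means "auto".
    ivf = re.fullmatch(r"IVF(\d*)", ivf_part)
    if ivf is None:
        raise ValueError(f"unrecognized centroid setting: {ivf_part!r}")
    centroids = int(ivf.group(1)) if ivf.group(1) else None

    # 'SQ<bits>' is scalar quantization; 'PQ<n>x<bits>' is product quantization.
    sq = re.fullmatch(r"SQ(\d+)", quant_part)
    pq = re.fullmatch(r"PQ(\d+)x(\d+)", quant_part)
    if sq:
        quantization = {"type": "scalar", "bits": int(sq.group(1))}
    elif pq:
        # Assumption: the first number counts subquantizers, the second bits.
        quantization = {
            "type": "product",
            "subquantizers": int(pq.group(1)),
            "bits": int(pq.group(2)),
        }
    else:
        raise ValueError(f"unrecognized quantization setting: {quant_part!r}")

    return {"centroids": centroids, "quantization": quantization}


print(parse_description("IVF1000,SQ6"))
print(parse_description("IVF,PQ32x8"))
```

A `centroids` value of `None` here stands for the "auto centroids" case mentioned in the excerpt.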


### 2. OpenAI API Access
- **OpenAI API Key** - Get one from: https://platform.openai.com/api-keys
- Used for generating text embeddings with `text-embedding-ada-002` model
Contributor

Is there a particular reason to use the older embedding model? `text-embedding-3-small` should be better in terms of both cost and performance.

Contributor Author

I'll update the embedding model


### 3. Configuration Setup

Update `appsettings.Development.json` with your credentials:
Contributor

Do the bucket, scope, and collection need to exist beforehand?

Contributor

You could also mention that these values can be changed, provided the code is updated to match.

"glossary",
new CouchbaseQueryCollectionOptions
{
IndexName = "bhive_glossary_index", // BHIVE index name
Contributor

Can you create the index without any data, or do you create it after inserting the data? I think this point is worth highlighting.
Also, is the index optional?

Contributor Author

For this example, the index is created outside the connector, because the connector does not support index creation yet. Support will be added in a future version, once the underlying problem is fixed on the server side.

I haven't made index creation optional here, to stay consistent across both FTS and GSI.

- **Include Fields**: Non-vector fields for faster retrieval
- **Quantization**: `IVF,SQ8` (Inverted File with 8-bit scalar quantization)

> **Note**: Composite vector indexes can be created similarly by adding scalar fields to the index definition. Use composite indexes when your queries frequently filter on scalar values before vector comparison. For this demo, we use BHIVE since we're demonstrating pure semantic search capabilities.
Contributor

Can you link to the composite index docs?

2. **Get Collection** - Use `GetCollection<TKey, TRecord>()` to get a typed collection reference
3. **Generate Embeddings** - Use Semantic Kernel's `IEmbeddingGenerator` to convert text to vectors
4. **Upsert Records** - Call `UpsertAsync()` to insert/update records with embeddings
5. **Create Index** - Set up a vector index using SQL++ for optimal search performance
Contributor

This is optional, right? Without an index, a brute-force kNN search would be performed.

Contributor Author

This is the step-by-step process the example follows when the program is run. I can make index creation optional, but I'm creating the index to stay consistent across both FTS and GSI.
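
For readers following this thread, the brute-force kNN search the reviewer mentions can be sketched as below. This is a minimal illustration of the general technique, assuming cosine distance and a small in-memory dictionary of vectors. It does not show how Couchbase or the connector actually executes a query without an index, and the names `cosine_distance`, `brute_force_knn`, and the sample records are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_knn(query, records, k):
    """Compare the query with every stored vector and keep the k closest keys."""
    scored = [(cosine_distance(query, vector), key) for key, vector in records.items()]
    scored.sort()  # smallest distance first
    return [key for _, key in scored[:k]]


# Hypothetical two-dimensional "embeddings" keyed by document ID.
records = {
    "a": [1.0, 0.1],
    "b": [0.0, 1.0],
    "c": [0.9, 0.5],
}

print(brute_force_knn([1.0, 0.0], records, k=2))
```

Every query touches every record, so the cost grows linearly with the collection size. A vector index avoids that full scan, typically by trading some accuracy for speed.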
