Semantic Kernel Tutorial #68
Summary of Changes

Hello @azaddhirajkumar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, in-depth tutorial designed to help developers build semantic search applications. It focuses on integrating Microsoft's Semantic Kernel with Couchbase's vector search features via the Couchbase .NET Vector Store Connector. The tutorial covers the entire process, from setting up the development environment and defining data models to generating embeddings with OpenAI, ingesting data, and executing various vector search queries. It also provides extensive information on configuring different types of Couchbase vector indexes, offering a practical guide for using AI-powered search capabilities.
Code Review
This pull request adds a new tutorial for using the Semantic Kernel with Couchbase. The tutorial is comprehensive and well-structured. However, I've found several issues that need to be addressed before merging. There are critical errors in the provided JSON and SQL++ code snippets (missing commas, trailing commas) that will prevent them from working. Additionally, some links point to temporary or internal resources (a feature branch and a test documentation server), which should be updated to stable, public URLs. There are also some invalid tags in the frontmatter that will likely fail validation, and a section on embedding generation is potentially confusing. I've left specific comments with suggestions for each of these points.
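The JSON problems mentioned above (missing commas, trailing commas) are the kind a strict parser catches before a tutorial is published. A minimal sketch, assuming .NET's `System.Text.Json`; the `JsonSnippetCheck` helper and the sample snippets are hypothetical and are not copied from the tutorial. It does not cover the SQL++ snippets, which would need to be run against a server to verify.

```csharp
using System;
using System.Text.Json;

public static class JsonSnippetCheck
{
    // Returns true when the text parses as strict JSON; otherwise reports the parser's message.
    public static bool TryValidate(string json, out string error)
    {
        try
        {
            // The default JsonDocumentOptions reject trailing commas and comments.
            using var doc = JsonDocument.Parse(json);
            error = string.Empty;
            return true;
        }
        catch (JsonException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical snippets for illustration only.
        var samples = new[]
        {
            "{ \"dimension\": 1536, \"similarity\": \"cosine\" }",   // well-formed
            "{ \"dimension\": 1536, \"similarity\": \"cosine\", }",  // trailing comma
            "{ \"dimension\": 1536 \"similarity\": \"cosine\" }"     // missing comma
        };

        foreach (var sample in samples)
        {
            var ok = TryValidate(sample, out var error);
            Console.WriteLine(ok ? $"OK: {sample}" : $"INVALID: {sample} ({error})");
        }
    }
}
```

Running each JSON block from the tutorial through a check like this would flag the comma errors without needing a Couchbase cluster.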
## Repository Links

- **Connector Repository**: [couchbase-semantic-kernel](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel) - The official Couchbase .NET Vector Store Connector for Microsoft Semantic Kernel
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
The link to the example code points to a feature branch (Support-Bhive-and-Composite-Index). This is not ideal for a public tutorial, as feature branches are often temporary and may be deleted. It's recommended to update this link to point to the main branch (e.g., main or master) or a specific release tag once the code is merged.
Suggested change:
Before: - **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
After: - **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/main/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
Should the example not be in Couchbase-Examples?
You can link the example from the framework README
I'll move this example to Couchbase-Examples once the NuGet package is ready.
- Use **Composite** when scalar filters eliminate large portions of data before vector comparison
- Use **FTS** when you need hybrid search combining full-text and semantic search

For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
The link to the "Couchbase Vector Index Documentation" points to a preview.docs-test.couchbase.com URL. This appears to be an internal or test documentation server. For a public tutorial, this should be updated to the final, public documentation URL.
Suggested change:
Before: For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
After: For more details, see the [Couchbase Vector Index Documentation](https://docs.couchbase.com/server/current/vector-search/vector-search-overview.html).
- `IVF1000,SQ6` - 1000 centroids, 6-bit quantization (faster, less accurate)
- `IVF,PQ32x8` - Auto centroids, product quantization (better accuracy)

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
This link points to a preview.docs-test.couchbase.com URL, which appears to be an internal or staging documentation server. For a public tutorial, this should be updated to point to the official public documentation.
Suggested change:
Before: For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
After: For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-search/indexing-vectors.html) documentation.
tutorial/markdown/aspnet/semantic-kernel/semantic-kernel-tutorial.md
### 2. OpenAI API Access
- **OpenAI API Key** - Get one from: https://platform.openai.com/api-keys
- Used for generating text embeddings with `text-embedding-ada-002` model
Is there a particular reason to use the old embedding model? text-embedding-3-small should be better from both a cost and a performance perspective.
I'll update the embedding model
### 3. Configuration Setup

Update `appsettings.Development.json` with your credentials:
Should the bucket, scope & collection exist?
You could also mention that these values can be changed, as long as the code is updated to match.
"glossary",
new CouchbaseQueryCollectionOptions
{
    IndexName = "bhive_glossary_index", // BHIVE index name
Are you able to create the index without having any data? Or do you create the index after inserting the data? I think this point is worth highlighting.
Also is the index optional?
Index creation is done outside the connector in this example, because the connector does not support creating the index yet. Support will come in a future version, once the underlying problem is fixed on the server side.
I haven't made index creation optional here, to stay consistent between FTS and GSI.
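For readers wondering what "outside the connector" could look like in practice, here is a rough sketch that issues the index definition through the Couchbase .NET SDK's query API. The index name, the `glossary` collection, and the `IVF,SQ8` description come from the excerpts in this thread; the connection string, credentials, bucket and scope names, the `DefinitionEmbedding` field, and the similarity metric are all hypothetical. The statement follows the `CREATE VECTOR INDEX` form in Couchbase's vector index documentation and should be checked against the current server docs before use.

```csharp
using System;
using System.Threading.Tasks;
using Couchbase;

public static class CreateVectorIndexSketch
{
    public static async Task Main()
    {
        // Hypothetical connection details; replace with your own.
        await using var cluster = await Cluster.ConnectAsync(
            "couchbase://localhost", "Administrator", "password");

        // Assumed statement shape. Bucket, scope, and field names are placeholders.
        // 1536 is the output dimension of text-embedding-ada-002.
        const string statement = @"
            CREATE VECTOR INDEX `bhive_glossary_index`
            ON `demo`.`semantic-kernel`.`glossary` (`DefinitionEmbedding` VECTOR)
            WITH { ""dimension"": 1536, ""similarity"": ""cosine"", ""description"": ""IVF,SQ8"" }";

        // The example's stated order is upsert first, then create the index,
        // so this would run after the records have been ingested.
        var result = await cluster.QueryAsync<dynamic>(statement);
        await foreach (var row in result.Rows)
        {
            Console.WriteLine(row);
        }
    }
}
```

This sketch requires the `CouchbaseNetClient` NuGet package and a running cluster whose version supports vector indexes.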
- **Include Fields**: Non-vector fields for faster retrieval
- **Quantization**: `IVF,SQ8` (Inverted File with 8-bit scalar quantization)

> **Note**: Composite vector indexes can be created similarly by adding scalar fields to the index definition. Use composite indexes when your queries frequently filter on scalar values before vector comparison. For this demo, we use BHIVE since we're demonstrating pure semantic search capabilities.
Can you link to the composite index docs?
2. **Get Collection** - Use `GetCollection<TKey, TRecord>()` to get a typed collection reference
3. **Generate Embeddings** - Use Semantic Kernel's `IEmbeddingGenerator` to convert text to vectors
4. **Upsert Records** - Call `UpsertAsync()` to insert/update records with embeddings
5. **Create Index** - Set up a vector index using SQL++ for optimal search performance
This is optional, right? Without an index, a brute-force kNN search would be performed.
This is the step-by-step process the example follows when the program is run. I can make index creation optional, but I'm creating the index to stay consistent between FTS and GSI.
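To make the brute-force kNN fallback raised in this thread concrete, here is a conceptual sketch of an exhaustive nearest-neighbour scan: every stored vector is compared with the query vector and the top `k` matches are kept. It is an in-memory illustration only, not how Couchbase or the connector implements search. The `BruteForceKnn` class, the three-dimensional vectors, and the record keys are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BruteForceKnn
{
    // Cosine similarity between two vectors of equal dimension.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as having no similarity to anything.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every record against the query (O(n) comparisons) and keeps the best k.
    public static List<(string Key, double Score)> Search(
        IReadOnlyDictionary<string, float[]> records, float[] query, int k)
    {
        return records
            .Select(r => (Key: r.Key, Score: CosineSimilarity(r.Value, query)))
            .OrderByDescending(hit => hit.Score)
            .Take(k)
            .ToList();
    }

    public static void Main()
    {
        // Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
        var records = new Dictionary<string, float[]>
        {
            ["API"] = new[] { 1.0f, 0.0f, 0.0f },
            ["SDK"] = new[] { 0.8f, 0.2f, 0.0f },
            ["Index"] = new[] { 0.0f, 1.0f, 0.0f }
        };

        var query = new[] { 0.9f, 0.1f, 0.0f };
        foreach (var (key, score) in Search(records, query, 2))
        {
            Console.WriteLine($"{key}: {score:F3}");
        }
    }
}
```

The cost of this scan grows linearly with the number of records, which is the trade-off a vector index with quantization such as `IVF,SQ8` is meant to reduce.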
No description provided.