
Azure Open AI, OSS LLM 🌊1. Vector storage and 🦙langchain 🔎2. Azure Search ChatGpt demo 3. Microsoft ♾️Semantic-Kernel with 🌌 Cosmos DB, etc.


FreezeSoul/azure-openai-elastic-vector-langchain

 
 


updated: 05/31/2023

LLM (Large Language Model) and Azure-related libraries

This repository contains references to open-source models similar to ChatGPT, as well as Langchain and prompt engineering libraries. It also includes related samples and research on Langchain, Vector Search (including feasibility checks on Elasticsearch, Azure Cognitive Search, Azure Cosmos DB), and more.

Rule: Keep each item brief, one or a few lines at most.

Table of contents

Section 1 : Llama-index and Vector Storage (Search)

This section was created for testing and feasibility checks of vector stores and LLM chaining libraries, specifically llama-index. These libraries are commonly used for prompt engineering and for feeding your own data into an LLM.

Opensearch/Elasticsearch setup

  • docker : OpenSearch Docker-compose
  • docker-elasticsearch : Not working for ES v8, which makes the security plug-in mandatory
  • docker-elk : Elasticsearch Docker-compose with an optimized configuration that resolves the security plug-in issues
  • es-open-search-set-analyzer.py : Applies a language analyzer to OpenSearch
  • es-open-search.py : OpenSearch sample index creation (see the sketch below)
  • es-search-set-analyzer.py : Applies a language analyzer to Elasticsearch
  • es-search.py : Usage of the Elasticsearch Python client
  • files : The sample files to be ingested
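
For reference, a minimal sketch of creating a k-NN vector index like the ones these scripts work with, assuming a local OpenSearch node with the default dev credentials and the index/field names from env.template:

from opensearchpy import OpenSearch

# Connect to a local OpenSearch node (dev-only settings; use proper certs in production).
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)

# k-NN index with a 1536-dimension vector field, matching text-embedding-ada-002.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "embedding": {"type": "knn_vector", "dimension": 1536},
        }
    },
}
client.indices.create("gpt-index-demo", body=index_body)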

llama-index

  • index.json : Local backup of vector data created by llama-index
  • index_vector_in_opensearch.json : Vector data stored in OpenSearch (source: files\all_h1.pdf)
  • llama-index-azure-elk-create.py : llama-index ElasticsearchVectorClient (unofficial file for vector search; written by me and not fully tested)
  • llama-index-lang-chain.py : Langchain memory and agent usage with llama-index
  • llama-index-opensearch-create.py : Vector index creation in OpenSearch
  • llama-index-opensearch-query-chatgpt.py : Test module to access the Azure OpenAI Embedding API
  • llama-index-opensearch-query.py : Vector index query with questions against OpenSearch
  • llama-index-opensearch-read.py : llama-index ElasticsearchVectorClient (unofficial file for vector search; written by me and not fully tested)
  • env.template : The property template. Rename it to .env once your values are set.
OPENAI_API_TYPE=azure
OPENAI_API_BASE=https://????.openai.azure.com/
OPENAI_API_VERSION=2022-12-01
OPENAI_API_KEY=<your value in azure>
OPENAI_DEPLOYMENT_NAME_A=<your value in azure>
OPENAI_DEPLOYMENT_NAME_B=<your value in azure>
OPENAI_DEPLOYMENT_NAME_C=<your value in azure>
OPENAI_DOCUMENT_MODEL_NAME=<your value in azure>
OPENAI_QUERY_MODEL_NAME=<your value in azure>

INDEX_NAME=gpt-index-demo
INDEX_TEXT_FIELD=content
INDEX_EMBEDDING_FIELD=embedding
ELASTIC_SEARCH_ID=elastic
ELASTIC_SEARCH_PASSWORD=elastic
OPEN_SEARCH_ID=admin
OPEN_SEARCH_PASSWORD=admin
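
A minimal sketch of loading these properties in Python, assuming the python-dotenv package is installed:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_base = os.getenv("OPENAI_API_BASE")
index_name = os.getenv("INDEX_NAME", "gpt-index-demo")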

Vector Storage Comparison

Vector Storage Options for Azure

Milvus Embedded

# Step 1. Start Milvus

1. Unzip the package
Unzip the package, and you will find a milvus directory, which contains all the files required.

2. Start a MinIO service
Double-click the run_minio.bat file to start a MinIO service with default configurations. Data will be stored in the subdirectory s3data.

3. Start an etcd service
Double-click the run_etcd.bat file to start an etcd service with default configurations.

4. Start Milvus service
Double-click the run_milvus.bat file to start the Milvus service.

# Step 2. Run hello_milvus.py

After starting the Milvus service, you can test by running hello_milvus.py. See Hello Milvus for more information.
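
If hello_milvus.py is not at hand, a minimal connectivity check with pymilvus might look like the following sketch (the collection name and dimension are illustrative):

from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections, utility,
)

# Connect to the Milvus service started above.
connections.connect("default", host="localhost", port="19530")

# Create a tiny collection with one primary key and one vector field.
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=8),
]
collection = Collection("hello_milvus", CollectionSchema(fields))
print(utility.has_collection("hello_milvus"))  # True if the service is reachable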

Conclusion

  • The Azure OpenAI Embedding API (text-embedding-ada-002) produces 1536-dimensional vectors. Elasticsearch, a Lucene-based engine, supports at most 1024 dimensions. OpenSearch can store vectors of up to 16,000 dimensions.

  • The Langchain interface for Azure OpenAI does not support ChatGPT yet; for that reason, alternatives such as text-davinci-003 must be used.

@OpenAI documents: text-embedding-ada-002: Smaller embedding size. The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost-effective in working with vector databases. https://openai.com/blog/new-and-improved-embedding-model

@open search documents: However, one exception to this is that the maximum dimension count for the Lucene engine is 1,024, compared with 16,000 for the other engines. https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/

@llama-index examples: However, the examples in llama-index use a vector size of 1536.
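
To confirm the 1536-dimension figure yourself, a sketch using the openai Python SDK of that era (v0.x) with the .env values above:

import os
import openai

openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION", "2022-12-01")
openai.api_key = os.getenv("OPENAI_API_KEY")

# The engine is the Azure deployment name of a text-embedding-ada-002 model.
resp = openai.Embedding.create(
    engine=os.getenv("OPENAI_DOCUMENT_MODEL_NAME"),
    input="hello vector world",
)
print(len(resp["data"][0]["embedding"]))  # expect 1536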

llama-index Deep dive

Section 2 : ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search

The files in this directory, extra_steps, were created to manage extra configurations and steps for launching the demo repository.

https://github.com/Azure-Samples/azure-search-openai-demo


additonal_steps (optional)

  • fix_from_origin : The modified files, setup-related
  • ms_internal_az_init.ps1 : PowerShell script for Azure module installation
  • ms_internal_troubleshootingt.ps1 : Sets a specific subscription ID as the default

Configuration steps

  1. (optional) Check the Azure module installation in PowerShell by running the ms_internal_az_init.ps1 script
  2. (optional) Set your Azure subscription ID as the default

Run the following commands in the ./azure-search-openai-demo directory.

  1. (deploy Azure resources) Simply run azd up

azd stores the relevant values in a .env file located at ${project_folder}\.azure\az-search-openai-tg\.env.

AZURE_ENV_NAME=<your_value_in_azure>
AZURE_LOCATION=<your_value_in_azure>
AZURE_OPENAI_SERVICE=<your_value_in_azure>
AZURE_PRINCIPAL_ID=<your_value_in_azure>
AZURE_SEARCH_INDEX=<your_value_in_azure>
AZURE_SEARCH_SERVICE=<your_value_in_azure>
AZURE_STORAGE_ACCOUNT=<your_value_in_azure>
AZURE_STORAGE_CONTAINER=<your_value_in_azure>
AZURE_SUBSCRIPTION_ID=<your_value_in_azure>
BACKEND_URI=<your_value_in_azure>
  2. Move to app with the cd app command
  3. (load sample data) Move to scripts, switch into PowerShell with the powershell command, then run prepdocs.ps1
  • console output (excerpt)
        Uploading blob for page 20 -> role_library-20.pdf
        Uploading blob for page 21 -> role_library-21.pdf
        Uploading blob for page 22 -> role_library-22.pdf
        Uploading blob for page 23 -> role_library-23.pdf
        Uploading blob for page 24 -> role_library-24.pdf
        Uploading blob for page 25 -> role_library-25.pdf
        Uploading blob for page 26 -> role_library-26.pdf
        Uploading blob for page 27 -> role_library-27.pdf
        Uploading blob for page 28 -> role_library-28.pdf
        Uploading blob for page 29 -> role_library-29.pdf
        Uploading blob for page 30 -> role_library-30.pdf
Indexing sections from 'role_library.pdf' into search index 'gptkbindex'
Splitting './data\role_library.pdf' into sections
        Indexed 60 sections, 60 succeeded
  4. Move back to app with the cd .. and cd app commands
  5. (run locally) Run start.cmd
  • console output (excerpt)
Building frontend


> [email protected] build \azure-search-openai-demo\app\frontend
> tsc && vite build

vite v4.1.1 building for production...
✓ 1250 modules transformed.
../backend/static/index.html                    0.49 kB
../backend/static/assets/github-fab00c2d.svg    0.96 kB
../backend/static/assets/index-184dcdbd.css     7.33 kB │ gzip:   2.17 kB
../backend/static/assets/index-41d57639.js    625.76 kB │ gzip: 204.86 kB │ map: 5,057.29 kB

Starting backend

 * Serving Flask app 'app'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [13/Apr/2023 14:25:31] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [13/Apr/2023 14:25:31] "GET /assets/index-184dcdbd.css HTTP/1.1" 200 -
127.0.0.1 - - [13/Apr/2023 14:25:31] "GET /assets/index-41d57639.js HTTP/1.1" 200 -
127.0.0.1 - - [13/Apr/2023 14:25:31] "GET /assets/github-fab00c2d.svg HTTP/1.1" 200 -
127.0.0.1 - - [13/Apr/2023 14:25:32] "GET /favicon.ico HTTP/1.1" 304 -
127.0.0.1 - - [13/Apr/2023 14:25:42] "POST /chat HTTP/1.1" 200 -

Running it from the second time onward

  1. Move to app with the cd .. and cd app commands
  2. (run locally) Run start.cmd

Another Reference Architecture

azure-open-ai-embeddings-qna


C# Implementation ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search

embeddin_azure_csharp

Azure Cosmos DB + OpenAI ChatGPT C# blazor and Azure Custom Template

gpt-cosmos

Azure OpenAI working with Cognitive Search acting as long-term memory

Azure Cognitive Search : Vector Search

Options: 1. Vector similarity search, 2. Pure Vector Search, 3. Hybrid Search, 4. Semantic Hybrid Search

# Semantic Hybrid Search
# Imports assume the azure-search-documents preview SDK that exposes Vector.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import Vector

# service_endpoint, index_name, key, and generate_embeddings are defined elsewhere in the sample.
query = "what is azure search?"

search_client = SearchClient(
    service_endpoint, index_name, AzureKeyCredential(key))

results = search_client.search(
    search_text=query,  # text
    vector=Vector(value=generate_embeddings(query), k=3, fields="contentVector"),  # vector
    select=["title", "content", "category"],
    query_type="semantic",
    query_language="en-us",
    semantic_configuration_name="my-semantic-config",
    query_caption="extractive",
    query_answer="extractive",  # semantic options
    top=3,
)

semantic_answers = results.get_answers()

Section 3 : Microsoft Semantic Kernel with Azure Cosmos DB

Microsoft's Langchain-equivalent library supports C# and Python and offers several features, some of which are still in development and not yet clearly documented. However, it is simple, stable, and faster than its Python-based open-source counterpart. For the feature list, see the Semantic Kernel Feature Matrix.


This section shows how to use Azure Cosmos DB for vector storage and vector search by leveraging Semantic Kernel.

Semantic-Kernel

  • appsettings.template.json : Environment value configuration file
  • ComoseDBVectorSearch.cs : Vector search using Azure Cosmos DB
  • CosmosDBKernelBuild.cs : Kernel build code (test)
  • CosmosDBVectorStore.cs : Embeds text and stores it in Azure Cosmos DB
  • LoadDocumentPage.cs : PDF splitter class; splits text into sections (C# version of azure-search-openai-demo/scripts/prepdocs.py)
  • LoadDocumentPageOutput : Output generated by the LoadDocumentPage class
  • MemoryContextAndPlanner.cs : Test code for context and planner
  • MemoryConversationHistory.cs : Test code for conversation history
  • Program.cs : Runs the demo; program entry point
  • SemanticFunction.cs : Test code for semantic functions
  • semanticKernelCosmos.csproj : C# project file
  • Settings.cs : Environment value class
  • SkillBingSearch.cs : Bing Search skill
  • SkillDALLEImgGen.cs : DALL·E skill (OpenAI only; not yet supported on Azure OpenAI)

Environment variable

{
  "Type": "azure",
  "Model": "<model_deployment_name>",
  "EndPoint": "https://<your-endpoint-value>.openai.azure.com/",
  "AOAIApiKey": "<your-key>",
  "OAIApiKey": "",
  "OrdId": "-", //The value needs only when using Open AI.
  "BingSearchAPIKey": "<your-key>",
  "aoaiDomainName": "<your-endpoint-value>",
  "CosmosConnectionString": "<cosmos-connection-string>"
}
  • Semantic Kernel has recently introduced support for Azure Cognitive Search as a memory store. However, it currently only supports Azure Cognitive Search through the Semantic Search interface and lacks any feature to store vectors in ACS.

  • According to the code comments, the strategy could be divided into two parts: one focused on Semantic Search, the other on generating embeddings with OpenAI.

Azure Cognitive Search automatically indexes your data semantically, so you don't need to worry about embedding generation. See samples/dotnet/kernel-syntax-examples/Example14_SemanticMemory.cs.

// TODO: use vectors
// @Microsoft Semantic Kernel
var options = new SearchOptions
{
    QueryType = SearchQueryType.Semantic,
    SemanticConfigurationName = "default",
    QueryLanguage = "en-us",
    Size = limit,
};
  • Semantic Kernel implementation sample to overcome the token limits of an OpenAI model: splitting text that exceeds the token limit into chunks, passing them to a skill, and combining the results (zenn.dev, in Japanese: "Semantic Kernel でトークンの限界を超える", i.e. "Exceeding the token limit with Semantic Kernel"). A sketch of the pattern follows.
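
The underlying split-summarize-combine pattern, as a plain Python sketch (summarize stands in for the actual skill/LLM call and is hypothetical):

# Sketch of the split-summarize-combine pattern for beating token limits.
def chunk(text, size=2000):
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_long(text, summarize):
    partial = [summarize(c) for c in chunk(text)]  # summarize each chunk
    return summarize("\n".join(partial))           # merge and re-summarize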

Bing search Web UI and Semantic Kernel sample code

Semantic Kernel sample code to integrate with Bing Search (apparently ReAct-style)

\ms-semactic-bing-notebook

  • gs_chatgpt.ipynb: Azure Open AI ChatGPT sample to use Bing Search
  • gs_davinci.ipynb: Azure Open AI Davinci sample to use Bing Search

Bing Search UI for demo

\bing-search-webui: (utility)


Section 4 : Langchain code

cite: @practical-ai

Langchain Quick Start: How to Use and Useful Utilities

  • Langchain_1_(믹스의_인공지능).ipynb : Langchain Get started
  • langchain_1_(믹스의_인공지능).py : -
  • Langchain_2_(믹스의_인공지능).ipynb : Langchain Utilities
  • langchain_2_(믹스의_인공지능).py : -
from langchain.chains.summarize import load_summarize_chain

# `chat` is a chat LLM instance and `docs` is a list of Documents, prepared earlier.
chain = load_summarize_chain(chat, chain_type="map_reduce", verbose=True)
chain.run(docs[:3])

Langchain chain_type

  • stuff: Sends everything to the LLM at once. If the input is too long, an error occurs.
  • map_reduce: Summarizes each chunk separately, then summarizes the combined summaries (see the chunking sketch below).
  • refine: (summary so far + next document) => new summary
  • map_rerank: Scores each chunk's result and keeps the highest-ranked, most important points.
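
For map_reduce, the input must be chunked first; a sketch using the 2023-era Langchain text splitter:

from langchain.text_splitter import CharacterTextSplitter

# `long_text` is the raw string to be summarized.
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
docs = splitter.create_documents([long_text])
# Each resulting Document is summarized separately, then the summaries are merged.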

Section 5: Prompt Engineering, and Langchain vs Semantic Kernel

Prompt Engineering

  1. Zero-shot
  2. Few-shot Learning
  3. Chain of Thought (CoT): ReAct and Self-Consistency also inherit the CoT concept (prompt examples below).
  4. Recursively Criticizes and Improves (RCI)
  5. ReAct: Grounding with external sources. (Reason + Act)
  6. Chain-of-Thought Prompting (paper)
  7. Tree of Thought (github)
  • Prompt Concept
  1. Question-Answering
  2. Role-play: Act as a [ROLE], perform [TASK] in [FORMAT]
  3. Reasoning
  4. Prompt-Chain
  5. Program-Aided Language Model
  6. Recursive Summarization: Long Text -> Chunks -> Summarize pieces -> Concatenate -> Summarize
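
To make the first three techniques concrete, here are illustrative prompt strings (the task and examples are made up):

# Zero-shot: no examples, just the task.
zero_shot = "Classify the sentiment: 'The demo worked flawlessly.'"

# Few-shot: a handful of input -> output examples before the real input.
few_shot = """Classify the sentiment.
Text: 'It crashed twice.' -> negative
Text: 'Setup took one minute.' -> positive
Text: 'The demo worked flawlessly.' ->"""

# Chain of Thought: invite the model to reason step by step.
chain_of_thought = (
    "Q: A rack holds 3 servers, each with 2 GPUs. How many GPUs in total?\n"
    "A: Let's think step by step."
)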

OpenAI Prompt Guide

DeepLearning.ai Prompt Engineering COURSE and others

Awesome ChatGPT Prompts

ChatGPT : “user”, “assistant”, and “system” messages.

To be specific, the ChatGPT API allows for differentiation between “user”, “assistant”, and “system” messages.

  1. The model should always obey "system" messages.
  2. All end-user input goes in "user" messages.
  3. "assistant" messages carry previous chat responses from the assistant.

Presumably, the model is trained to treat the user messages as human messages, system messages as some system level configuration, and assistant messages as previous chat responses from the assistant. (@https://blog.langchain.dev/using-chatgpt-api-to-evaluate-chatgpt/)
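
A minimal sketch of the three roles in one request, using the openai SDK of that era (v0.x) against the non-Azure endpoint for brevity:

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You answer in one sentence."},    # system-level configuration
        {"role": "user", "content": "What is a vector store?"},          # end-user input
        {"role": "assistant", "content": "A database for embeddings."},  # previous assistant reply
        {"role": "user", "content": "Name one option for Azure."},
    ],
)
print(response["choices"][0]["message"]["content"])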

Finetuning

PEFT: Parameter-Efficient Fine-Tuning (Youtube)

LoRA

Sparsification

@Bingchat

Sparsification is a technique for reducing the size of large language models (LLMs) by removing redundant parameters without significantly affecting their performance; it is one of the main methods used to compress LLMs.
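
A small PyTorch sketch of unstructured magnitude pruning, the simplest form of sparsification (illustrative only; sparsifying a real LLM uses more elaborate schemes):

import torch
import torch.nn.utils.prune as prune

# Zero out the 50% smallest-magnitude weights of a single linear layer.
layer = torch.nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.5)
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")  # roughly half the weights are now zero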

Langchain vs Semantic Kernel

Langchain    Semantic Kernel
---------    ---------------
Memory       Memory
Toolkit      Skill
Tool         Function (Native, Semantic)
Agent        Planner
Chain        Steps, Pipeline
Tool         Connector

Semantic Kernel : Semantic Function

A semantic function is expressed in natural language in a text file ("skprompt.txt") using SK's Prompt Template language. Each semantic function is defined by a unique prompt template file, developed using modern prompt engineering techniques.

Semantic Kernel : Prompt Template language Key takeaways

  1. Variables : use the {{$variableName}} syntax : Hello {{$name}}, welcome to Semantic Kernel!

  2. Function calls: use the {{namespace.functionName}} syntax : The weather today is {{weather.getForecast}}.

  3. Function parameters: {{namespace.functionName $varName}} and {{namespace.functionName "value"}} syntax : The weather today in {{$city}} is {{weather.getForecast $city}}.

  4. Prompts needing literal double curly braces : use the special SK sequences {{ "{{" }} and {{ "}}" }}.

  5. Values that include quotes, and escaping :

For instance:

... {{ 'no need to \"escape" ' }} ... is equivalent to:

... {{ 'no need to "escape" ' }} ...
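
To illustrate the variable syntax only (this is a hypothetical toy renderer, not SK's actual template engine):

import re

def render(template, variables):
    # Replace each {{$name}} with its value; unknown variables become empty.
    return re.sub(
        r"\{\{\s*\$(\w+)\s*\}\}",
        lambda m: variables.get(m.group(1), ""),
        template,
    )

print(render("Hello {{$name}}, welcome to Semantic Kernel!", {"name": "Ada"}))
# -> Hello Ada, welcome to Semantic Kernel!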

Langchain Agent

  1. If you're using a text LLM, first try zero-shot-react-description.

  2. If you're using a Chat Model, try chat-zero-shot-react-description.

  3. If you're using a Chat Model and want to use memory, try conversational-react-description.

  4. self-ask-with-search: self ask with search paper

  5. react-docstore: ReAct paper
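
A sketch wiring up the first option with the 2023-era Langchain API (the llm-math tool choice is illustrative):

from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)                # a text (non-chat) LLM
tools = load_tools(["llm-math"], llm=llm)  # illustrative tool choice
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("What is 2 to the power of 10?")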

Semantic Kernel Glossary

Glossary in Git

Glossary in MS Doc

Journey     Short Description
-------     -----------------
ASK         A user's goal is sent to SK as an ASK
Kernel      The kernel orchestrates a user's ASK
Planner     The planner breaks it down into steps based upon resources that are available
Resources   Planning involves leveraging available skills, memories, and connectors
Steps       A plan is a series of steps for the kernel to execute
Pipeline    Executing the steps results in fulfilling the user's ASK
GET         And the user gets what they asked for ...

Section 6 : Improvement

Math problem-solving skill

OpenAI's plans according to Sam Altman

Section 7 : List of OSS LLM

List of OSS LLM

Huggingface Open LLM Leaderboard

Section 8 : References

Langchain and Prompt engineering library

AutoGPT

picoGPT

  • An unnecessarily tiny implementation of GPT-2 in NumPy. picoGPT

Communicative Agents

  • lightaime/camel: 🐫 CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society (github.com)
  • A 1:1 conversation between two AI agents: Camel Agents - a Hugging Face Space by camel-ai (camel-agents)

Democratizing the magic of ChatGPT with open models

Hugging face Transformer

Hugging face StarCoder

MLLM (multimodal large language model)

  • Facebook: ImageBind / SAM (Just Info)
  1. facebookresearch/ImageBind: ImageBind One Embedding Space to Bind Them All (github.com)
  2. facebookresearch/segment-anything(SAM): The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. (github.com)
  • Microsoft: Kosmos-1
  1. [2302.14045] Language Is Not All You Need: Aligning Perception with Language Models (arxiv.org)
  2. Language Is Not All You Need

Generate 3D

openai/shap-e: Generate 3D objects conditioned on text or images (github.com)

DragGAN

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (paper)

string2string

The string2string library is an open-source tool that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems.

string2string

Tiktoken Alternative in C#

microsoft/Tokenizer: .NET and TypeScript implementation of the BPE tokenizer for OpenAI LLMs. (github.com) microsoft/Tokenizer

UI/UX

PDF with ChatGPT

  • The embedding step does not use OpenAI, so it can be executed locally. pdfGPT

Edge and Chrome Extension / Plugin

etc.

日本語(Japanese Materials)

Acknowledgements
