Merge pull request NVIDIA#157 from meiranp-nvidia/meiranp-nv/llm_prompt_helpers

Add new tool: LLM prompt design helper
dglogo authored Aug 16, 2024
2 parents b2fd9c5 + ff6e6ec commit b723c72
Showing 22 changed files with 1,648 additions and 1 deletion.
6 changes: 5 additions & 1 deletion experimental/README.md
@@ -58,4 +58,8 @@ Experimental examples are sample code and deployments for RAG pipelines that are

* [NVIDIA Event Driven RAG for CVE Analysis with NVIDIA Morpheus](./event-driven-rag-cve-analysis/)

This example demonstrates how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines. These pipelines will be used to automatically and scalably triage and detect Common Vulnerabilities and Exposures (CVEs) in Docker containers using references to source code, dependencies, and information about the CVEs.

* [LLM Prompt Design Helper using NIM](./llm-prompt-design-helper/)

This tool demonstrates how to utilize a user-friendly interface to interact with NVIDIA NIMs, including those available in the API catalog, self-deployed NIM endpoints, and NIMs hosted on Hugging Face. It also provides settings to integrate RAG pipelines with either local and temporary vector stores or self-hosted search engines. Developers can use this tool to design system prompts and few-shot prompts, and to configure LLM settings.
1 change: 1 addition & 0 deletions experimental/llm-prompt-design-helper/.gitattributes
@@ -0,0 +1 @@
*.gif filter=lfs diff=lfs merge=lfs -text
17 changes: 17 additions & 0 deletions experimental/llm-prompt-design-helper/Dockerfile
@@ -0,0 +1,17 @@
FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3 python3-pip

# RUN mkdir /chat_ui
# COPY chat_ui.py /chat_ui
# COPY config.yaml /chat_ui
# COPY api_request.py /chat_ui
COPY requirements.txt /chat_ui/
WORKDIR /chat_ui/

RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt

ENTRYPOINT ["python3"]
CMD ["-u", "chat_ui.py"]
132 changes: 132 additions & 0 deletions experimental/llm-prompt-design-helper/README.md
@@ -0,0 +1,132 @@
# guide_to_integrate_api_catalog

This project creates a simple UI to interact with selectable NIM endpoints (see the supported endpoints below) and to integrate a RAG pipeline.

- [API catalog](https://build.nvidia.com/explore/discover) hosted by NVIDIA
- Self-hosted NIM
- Hugging Face NIM


## Target Users
This project aims to help developers who:
- Want to evaluate different NIM LLMs with small or large datasets.
- Need to tune parameters such as temperature, top_p, etc.
- Need to do prompt engineering, such as system prompts and few-shot examples.
- Need to design simple agents based on prompt engineering.
- Want to integrate with a RAG pipeline to evaluate the designed system prompt.

## System prompt helper

![screenshot of the UI](./data/simple_ui.jpeg)

The provided interface of this project supports designing a system prompt to call the LLM. The system prompt is configured in the `config.yaml` file using the model name as the key, e.g., `"meta/llama3-70b-instruct"`. You can also add few-shot examples in the `config.yaml` file (there are commented lines describing the format) or via the UI, in a defined format, for your typical use case.

For development purposes, developers can use this interface to design the system prompt interactively. After selecting the model, you can input a new system prompt, which will override the system prompt from `config.yaml`. Once the system prompt is settled, you can write it back to `config.yaml` for the related model by clicking the `Update Yaml based on UI settings` button.

The interface will automatically load the selected model's configuration from `config.yaml` and display it in the UI. Additionally, it will list the available chat models from the API catalog via `langchain-nvidia-ai-endpoints` in a dropdown menu. To see the list from the API catalog, you need to set the API key by following the instructions in the next section. If new models are not yet available via the endpoints, or you want to test with self-hosted or Hugging Face NIM endpoints, you can insert the model manually via the UI textbox (enter the name under `Model name in API catalog`, then click the `Insert the model into list` button).

Note: To insert models deployed in the API catalog, please use the same name as defined in the API catalog.
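
As a rough illustration of the expected layout, the sketch below mirrors (in Python, as `yaml.safe_load` would return it) the per-model structure that `api_request.py` in this commit reads from `config.yaml`: settings keyed by model name, with a `default` entry as fallback. The example model entry, values, and few-shot content are placeholders, not the shipped configuration.

```python
# Hypothetical view of config.yaml after yaml.safe_load(); keys mirror what
# api_request.py reads (system_prompt, few_shot_examples, temperature, top_p,
# max_tokens, plus a "default" fallback). Values are illustrative only.
example_config = {
    "default": {
        "system_prompt": "You are a helpful assistant.",
        "temperature": 0.0,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
    "meta/llama3-70b-instruct": {
        "system_prompt": "You are a concise technical assistant.",
        # Few-shot examples are appended to the message list as-is, so each one
        # is an OpenAI-style chat message.
        "few_shot_examples": [
            {"role": "user", "content": "What is a NIM?"},
            {"role": "assistant", "content": "A NIM is an NVIDIA inference microservice."},
        ],
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
}
```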

## Integrate with RAG pipeline
![screenshot of the UI - DB](./data/simple_ui_db.jpeg)

This tool provides two methods to integrate with the RAG pipeline:
1. Generate a temporary vector store for retrieval.
2. Interact with a self-hosted retrieval engine that provides an endpoint for retrieval.

### Temporary vector store
This tool supports inserting website HTML links and downloadable PDF links, as well as uploading PDFs from local storage.

By clicking the **DataBase** tab in the UI, you can input website links or downloadable PDF links, using commas to separate multiple entries. You can also upload PDFs by clicking the `Click to Upload Local PDFs` button. Once the data sources are prepared, you can set the chunk size, chunk overlap size, and select one of the embedder models from the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval). By clicking `Embedding and Insert`, the content will be parsed, embedded, and inserted into a temporary vector store.

With this vector store set up, go back to the **LLM Prompt Designer** tab and expand `Data Base settings`; the retrieval settings will be available. You can then select one of the Reranker models available in the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval) for the RAG pipeline.

![screenshot of the Local Database settings - DB](./data/local-database-settings.jpeg)
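
As a rough sketch of the flow described above (not the tool's actual implementation), the snippet below builds a temporary, in-memory vector store with an embedder model from the API catalog, assuming `langchain-nvidia-ai-endpoints` for embeddings and FAISS for storage; the loader, embedder model name, and chunk settings are placeholders.

```python
# Minimal sketch of the temporary vector store flow, assuming LangChain
# components; the tool's real pipeline may differ. The embedder needs an
# NVIDIA API key in the environment (e.g. NVIDIA_API_KEY).
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load and chunk the source documents (a website link here; PDFs work similarly).
docs = WebBaseLoader("https://docs.nvidia.com/nim/index.html").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# 2. Embed the chunks with an embedder model from the API catalog and build the store.
embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")
store = FAISS.from_documents(chunks, embedder)

# 3. Retrieve context for a question; the tool passes this as `context` to the LLM call.
hits = store.similarity_search("How do I configure the system prompt?", k=4)
context = "\n".join(doc.page_content for doc in hits)
```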

### Self-deployed retrieval engine
This tool also supports interacting with a self-hosted retrieval engine that provides an endpoint for retrieval.

Expand `Data Base settings` -> `Self deployed vector database settings` in the **LLM Prompt Designer** tab, then input the engine endpoint and the query format string, using `{input}` as the placeholder for the query. The `self-deployed-db` option will then be available in the retrieval database selection. You can then select one of the Reranker models available in the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval) for the RAG pipeline, or disable the reranker by selecting `None`.

![screenshot of the Self-Deployed Database settings - DB](./data/self-host-database-settings.jpeg)
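
One plausible shape of that interaction, as a hedged sketch only: the query format string has `{input}` replaced with the user's question and is posted to the engine endpoint, and the returned passages are joined into the retrieval context. The endpoint URL, request body, and response schema below are assumptions; adapt them to your engine's actual API.

```python
# Hypothetical self-hosted retrieval call; endpoint and JSON schema are placeholders.
import requests

ENDPOINT = "http://localhost:9000/retrieve"          # your engine's retrieval endpoint
QUERY_FORMAT = '{{"query": "{input}", "top_k": 4}}'  # `{input}` is substituted with the user query


def retrieve(user_query: str) -> str:
    body = QUERY_FORMAT.format(input=user_query)
    resp = requests.post(ENDPOINT, data=body,
                         headers={"Content-Type": "application/json"}, timeout=10)
    resp.raise_for_status()
    # Assume the engine returns a JSON list of {"text": ...} passages.
    return "\n".join(passage.get("text", "") for passage in resp.json())


context = retrieve("How do I configure the system prompt?")
```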

## Getting started
### Prepare the docker image
Run the commands below to build the Docker image:
```bash
git clone https://github.com/NVIDIA/GenerativeAIExamples/ && cd GenerativeAIExamples/community/llm-prompt-design-helper
bash ./build_image.sh
```

### Start the project
#### API catalog NIM endpoints
Set the API key environment variables before starting the container.

```bash
export API_CATALOG_KEY="nvapi-*******************"
export NIM_INFER_URL="https://integrate.api.nvidia.com/v1"
```

If you don't have an API key, follow [these instructions](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/api-catalog.md#get-an-api-key-for-the-accessing-models-on-the-api-catalog) to sign up for an NVIDIA AI Foundation developer account and obtain access.

Run the command below to start the container:
```bash
bash ./run_container.sh
```

#### Self-hosted NIM endpoints
If you already have access to a self-hosted NIM, you can follow the [guide](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) to set it up.

To run inference via this UI, follow the [run inference](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#openai-completion-request) guide to get the base_url and api_key, then run the commands below to set the environment:

```bash
export API_CATALOG_KEY="not-used"
export NIM_INFER_URL="http://0.0.0.0:8000/v1"
```

Run the command below to start the container:
```bash
bash ./run_container.sh
```

NOTE:
1. If you have different models deployed at different IPs, you can set the environment variables once, then use UI -> Show more settings to input the specific IP and port, e.g. "http://{IP}:{PORT}/v1".
2. The **Insert model manually** feature is disabled when running inference against a self-hosted NIM endpoint.

#### Hugging Face NIM endpoints
NVIDIA has collaborated with Hugging Face to simplify generative AI model deployments. You can follow this [technical blog](https://developer.nvidia.com/blog/nvidia-collaborates-with-hugging-face-to-simplify-generative-ai-model-deployments/) to deploy a NIM on Hugging Face. After deployment, you can interact with the NIM endpoints via this project.

To run inference via this UI, get the base_url and api_key from Hugging Face, then run the commands below to set the environment:
```bash
export API_CATALOG_KEY="hf_xxxx"
export NIM_INFER_URL="{hugging face inference URL}"
```

Run the command below to start the container:
```bash
bash ./run_container.sh
```
NOTE:
1. The **Insert model manually** feature is disabled when running inference against a Hugging Face NIM endpoint.

### Access the UI
After the service starts up, you can open the UI at http://localhost:80/.

## Test with dataset
If you want to test with a local dataset once `config.yaml` is finalized, you can load your test set and run inference with that configuration. See [`test.py`](./test.py) for a sample script.
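
A minimal sketch of such a test run, using the `generate_response` interface added in this commit; the client module name, model, and test questions are placeholders, and the repository's actual `test.py` may differ.

```python
# Hedged sketch of a batch test over a finalized config.yaml; the module name
# "openai_client" is assumed for the OpenAI-compatible client in this commit.
from openai_client import OpenAIClient

client = OpenAIClient("config.yaml")
model = "meta/llama3-70b-instruct"   # any model configured in config.yaml

test_questions = [                   # replace with your local test set
    "What is a NIM?",
    "How do I set the API key?",
]

for question in test_questions:
    # chat_messages is a list of (user, assistant) pairs; the assistant slot is
    # empty for the turn being generated.
    chat_messages = [(question, "")]
    answer = "".join(client.generate_response(model, chat_messages))
    print(f"Q: {question}\nA: {answer}\n")
```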

### Demo
To change the application port number from the default of 80, do the following:
- Update the port number in `chat_ui.py` on the line `UI_SERVER_PORT = int(os.getenv("UI_SERVER_PORT", 80))`
- Update the port number in `run_container.sh` on the line `docker run -d -p80:80 ***`

See the demo: ![workflow demo](./data/llm-prompt-designer-demo.gif)

## Contributing

Please create a merge request to this repository; our team appreciates any and all contributions that add features! We will review and get back to you as soon as possible.





@@ -0,0 +1,101 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import yaml
import os
from abc import ABC,abstractmethod
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

API_CATALOG_KEY = os.getenv("API_CATALOG_KEY", "")
NIM_INFER_URL = os.getenv("NIM_INFER_URL", "https://integrate.api.nvidia.com/v1")

class APIRequest(ABC):
    def __init__(self, config_path):
        self.config_path = config_path
        self.config = {}
        with open(config_path, 'r') as file:
            self.config = yaml.safe_load(file)

    def get_model_settings(self, api_model):
        model_settings = self.config.get(api_model, None)
        if model_settings is None:
            logging.info(f"No config for {api_model}, load the default")
            model_settings = self.config.get('default')
        return model_settings

    def update_yaml(self, api_model, parameters):
        self.config.update({api_model: parameters})
        with open(self.config_path, 'w') as file:
            yaml.dump(self.config, file, default_flow_style=False, sort_keys=False)

    def get_model_configuration(self, api_model):
        model_config = self.get_model_settings(api_model)
        return model_config

    @abstractmethod
    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        pass

    def generate_response(self, api_model, chat_messages, system_prompt=None, initial_prompt=None,
                          temperature=None, top_p=None, max_tokens=None, few_shot_exampls=None,
                          base_url='', context=''):
        # Step 1: Get the model config from the configuration yaml file.
        model_config = self.get_model_settings(api_model)

        # Step 2: Get the LLM parameters, falling back to the per-model config.
        temperature = temperature if temperature is not None else model_config.get("temperature", 0.0)
        top_p = top_p if top_p is not None else model_config.get("top_p", 0.7)
        max_tokens = max_tokens if max_tokens is not None else model_config.get("max_tokens", 1024)

        # Step 3: Prepare the messages to be sent to the API catalog:
        # system prompt (optionally with retrieved context), few-shot examples, then chat history.
        oai_message = []
        system_prompt_message = system_prompt if system_prompt is not None else model_config.get('system_prompt', '')
        if context:
            system_prompt_message += f"\nUse the following pieces of retrieved context to answer the question. \n {context}"
        fewshot_examples = few_shot_exampls if few_shot_exampls else model_config.get('few_shot_examples', [])
        if system_prompt_message != '':
            oai_message.append({'role': 'system', 'content': system_prompt_message})
        for example in fewshot_examples:
            oai_message.append(example)

        for item in chat_messages:
            # Skip the initial greeting turn, which carries no user input.
            if item[0] is None and item[1] == initial_prompt:
                continue
            oai_message.append({'role': 'user', 'content': item[0]})
            if item[1] != '' and item[1] is not None:
                # Add the pure assistant response to the chat history.
                oai_message.append({'role': 'assistant', 'content': item[1]})

        request_body = {
            "model": api_model,
            "messages": oai_message,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
            "stream": True
        }
        logging.info(request_body)

        # Step 4: Send the request using an OpenAI-compatible API.
        return self.send_request(api_model, oai_message, temperature, top_p, max_tokens, base_url)



@@ -0,0 +1,43 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .api_request import API_CATALOG_KEY, APIRequest, NIM_INFER_URL
from langchain_nvidia_ai_endpoints import ChatNVIDIA


class ChatNVDIAClient(APIRequest):
    def __init__(self, config_path):
        super().__init__(config_path)

    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        # Default NIM_INFER_URL = "https://integrate.api.nvidia.com/v1"
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = ChatNVIDIA(base_url=base_url_infer, api_key=api_key, model=api_model,
                            temperature=temperature, max_tokens=max_tokens, top_p=top_p)
        try:
            completion = client.stream(oai_message, timeout=10.0)

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.content is not None:
                    yield chunk.content
        except Exception as e:
            yield "Request error:\n" + str(e)
@@ -0,0 +1,83 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .api_request import API_CATALOG_KEY, APIRequest, NIM_INFER_URL
from openai import OpenAI
from langchain_openai import ChatOpenAI


class OpenAIClient(APIRequest):
    def __init__(self, config_path):
        super().__init__(config_path)

    def send_request_chain(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = ChatOpenAI(
            base_url=base_url_infer,
            api_key=api_key,
            model=api_model,
            temperature=temperature,
            max_tokens=max_tokens,
            model_kwargs={"top_p": top_p},
            timeout=10.0
        )
        try:
            completion = client.stream(oai_message)

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.content is not None:
                    yield chunk.content
        except Exception as e:
            yield "Request error:\n" + str(e)

    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        # Default NIM_INFER_URL = "https://integrate.api.nvidia.com/v1"
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = OpenAI(
            base_url=base_url_infer,
            api_key=api_key
        )
        try:
            completion = client.chat.completions.create(
                model=api_model,
                messages=oai_message,
                temperature=temperature,
                top_p=top_p,
                max_tokens=max_tokens,
                stream=True,
                timeout=10.0
            )

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.choices[0].delta.content is not None:
                    yield chunk.choices[0].delta.content
        except Exception as e:
            yield "Request error:\n" + str(e)