Merge pull request NVIDIA#157 from meiranp-nvidia/meiranp-nv/llm_prompt_helpers

Add new tool: LLM prompt design helper
dglogo authored Aug 16, 2024
2 parents b2fd9c5 + ff6e6ec commit b723c72
Showing 22 changed files with 1,648 additions and 1 deletion.
6 changes: 5 additions & 1 deletion experimental/README.md
@@ -58,4 +58,8 @@ Experimental examples are sample code and deployments for RAG pipelines that are

* [NVIDIA Event Driven RAG for CVE Analysis with NVIDIA Morpheus](./event-driven-rag-cve-analysis/)

This example demonstrates how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines. These pipelines will be used to automatically and scalably triage and detect Common Vulnerabilities and Exposures (CVEs) in Docker containers using references to source code, dependencies, and information about the CVEs.

* [LLM Prompt Design Helper using NIM](./llm-prompt-design-helper/)

This tool demonstrates how to utilize a user-friendly interface to interact with NVIDIA NIMs, including those available in the API catalog, self-deployed NIM endpoints, and NIMs hosted on Hugging Face. It also provides settings to integrate RAG pipelines with either local and temporary vector stores or self-hosted search engines. Developers can use this tool to design system prompts and few-shot prompts, and to configure LLM settings.
1 change: 1 addition & 0 deletions experimental/llm-prompt-design-helper/.gitattributes
@@ -0,0 +1 @@
*.gif filter=lfs diff=lfs merge=lfs -text
17 changes: 17 additions & 0 deletions experimental/llm-prompt-design-helper/Dockerfile
@@ -0,0 +1,17 @@
FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3 python3-pip

# RUN mkdir /chat_ui
# COPY chat_ui.py /chat_ui
# COPY config.yaml /chat_ui
# COPY api_request.py /chat_ui
COPY requirements.txt /chat_ui/
WORKDIR /chat_ui/

RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt

ENTRYPOINT ["python3"]
CMD ["-u", "chat_ui.py"]
132 changes: 132 additions & 0 deletions experimental/llm-prompt-design-helper/README.md
@@ -0,0 +1,132 @@
# guide_to_integrate_api_catalog

This project creates a simple UI to interact with selectable NIM endpoints (see the supported endpoints below) and to integrate a RAG pipeline.

- [API catalog](https://build.nvidia.com/explore/discover) hosted by NVIDIA
- Self-hosted NIM
- Hugging Face NIM


## Target Users
This project aims to help developers who:
- Want to evaluate different NIM LLMs with small or large datasets.
- Need to tune parameters such as temperature, top_p, etc.
- Need to do prompt engineering, such as system prompts and few-shot examples.
- Need to design simple agents based on prompt engineering.
- Want to integrate with a RAG pipeline to evaluate the designed system prompt.

## System prompt helper

![screenshot of the UI](./data/simple_ui.jpeg)

The provided interface of this project supports designing a system prompt to call the LLM. The system prompt is configured in the `config.yaml` file using the model name as the key, e.g., `"meta/llama3-70b-instruct"`. You can also add few-shot examples in the `config.yaml` file (there are commented lines describing the format) or via the UI, in a defined format, for your typical use case.

For development purposes, developers can use this interface to design the system prompt interactively. After selecting the model, you can input a new system prompt, which will override the system prompt from `config.yaml`. Once the system prompt is settled, you can write it back to `config.yaml` for the related model by clicking the `Update Yaml based on UI settings` button.

The interface will automatically load the selected model's configuration from `config.yaml` and display it in the UI. Additionally, it will list the available chat models from the API catalog via `langchain-nvidia-ai-endpoints` in a dropdown menu. To see the list from the API catalog, you need to set the API key by following the instructions in the next section. If new models are not yet available via the endpoints, or you want to test with self-hosted or Hugging Face NIM endpoints, you can insert the model manually via the UI textbox (enter the name under `Model name in API catalog`, then click the `Insert the model into list` button).

Note: To insert models deployed in the API catalog, please use the same name as defined in the API catalog.
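
As a rough illustration of the expected layout, the sketch below mirrors (in Python, as `yaml.safe_load` would return it) the per-model structure that `api_request.py` in this commit reads from `config.yaml`: settings keyed by model name, with a `default` entry as fallback. The example model entry, values, and few-shot content are placeholders, not the shipped configuration.

```python
# Hypothetical view of config.yaml after yaml.safe_load(); keys mirror what
# api_request.py reads (system_prompt, few_shot_examples, temperature, top_p,
# max_tokens, plus a "default" fallback). Values are illustrative only.
example_config = {
    "default": {
        "system_prompt": "You are a helpful assistant.",
        "temperature": 0.0,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
    "meta/llama3-70b-instruct": {
        "system_prompt": "You are a concise technical assistant.",
        # Few-shot examples are appended to the message list as-is, so each one
        # is an OpenAI-style chat message.
        "few_shot_examples": [
            {"role": "user", "content": "What is a NIM?"},
            {"role": "assistant", "content": "A NIM is an NVIDIA inference microservice."},
        ],
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
    },
}
```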

## Integrate with RAG pipeline
![screenshot of the UI - DB](./data/simple_ui_db.jpeg)

This tool provides two methods to integrate with the RAG pipeline:
1. Generate a temporary vector store for retrieval.
2. Interact with a self-hosted retrieval engine that provides an endpoint for retrieval.

### Temporary vector store
This tool supports inserting website HTML links and downloadable PDF links, as well as uploading PDFs from local storage.

By clicking the **DataBase** tab in the UI, you can input website links or downloadable PDF links, using commas to separate multiple entries. You can also upload PDFs by clicking the `Click to Upload Local PDFs` button. Once the data sources are prepared, you can set the chunk size, chunk overlap size, and select one of the embedder models from the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval). By clicking `Embedding and Insert`, the content will be parsed, embedded, and inserted into a temporary vector store.

With this vector store set up, go back to the **LLM Prompt Designer** tab and expand `Data Base settings`; the retrieval settings will be available. You can then select one of the Reranker models available in the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval) for the RAG pipeline.

![screenshot of the Local Database settings - DB](./data/local-database-settings.jpeg)
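
As a rough sketch of the flow described above (not the tool's actual implementation), the snippet below builds a temporary, in-memory vector store with an embedder model from the API catalog, assuming `langchain-nvidia-ai-endpoints` for embeddings and FAISS for storage; the loader, embedder model name, and chunk settings are placeholders.

```python
# Minimal sketch of the temporary vector store flow, assuming LangChain
# components; the tool's real pipeline may differ. The embedder needs an
# NVIDIA API key in the environment (e.g. NVIDIA_API_KEY).
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load and chunk the source documents (a website link here; PDFs work similarly).
docs = WebBaseLoader("https://docs.nvidia.com/nim/index.html").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# 2. Embed the chunks with an embedder model from the API catalog and build the store.
embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")
store = FAISS.from_documents(chunks, embedder)

# 3. Retrieve context for a question; the tool passes this as `context` to the LLM call.
hits = store.similarity_search("How do I configure the system prompt?", k=4)
context = "\n".join(doc.page_content for doc in hits)
```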

### Self-deployed retrieval engine
This tool also supports interacting with a self-hosted retrieval engine that provides an endpoint for retrieval.

Expand `Data Base settings` -> `Self deployed vector database settings` in the **LLM Prompt Designer** tab, then input the engine endpoint and the query format string, using `{input}` as the placeholder for the query. The `self-deployed-db` option will then be available in the retrieval database selection. You can then select one of the Reranker models available in the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval) for the RAG pipeline, or disable the reranker by selecting `None`.

![screenshot of the Self-Deployed Database settings - DB](./data/self-host-database-settings.jpeg)
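
One plausible shape of that interaction, as a hedged sketch only: the query format string has `{input}` replaced with the user's question and is posted to the engine endpoint, and the returned passages are joined into the retrieval context. The endpoint URL, request body, and response schema below are assumptions; adapt them to your engine's actual API.

```python
# Hypothetical self-hosted retrieval call; endpoint and JSON schema are placeholders.
import requests

ENDPOINT = "http://localhost:9000/retrieve"          # your engine's retrieval endpoint
QUERY_FORMAT = '{{"query": "{input}", "top_k": 4}}'  # `{input}` is substituted with the user query


def retrieve(user_query: str) -> str:
    body = QUERY_FORMAT.format(input=user_query)
    resp = requests.post(ENDPOINT, data=body,
                         headers={"Content-Type": "application/json"}, timeout=10)
    resp.raise_for_status()
    # Assume the engine returns a JSON list of {"text": ...} passages.
    return "\n".join(passage.get("text", "") for passage in resp.json())


context = retrieve("How do I configure the system prompt?")
```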

## Getting started
### Prepare the docker image
Run the commands below to build the Docker image:
```bash
git clone https://github.com/NVIDIA/GenerativeAIExamples/ && cd GenerativeAIExamples/community/llm-prompt-design-helper
bash ./build_image.sh
```

### Start the project
#### API catalog NIM endpoints
Set the API key environment variables before starting the container.

```bash
export API_CATALOG_KEY="nvapi-*******************"
export NIM_INFER_URL="https://integrate.api.nvidia.com/v1"
```

If you don't have an API key, follow [these instructions](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/api-catalog.md#get-an-api-key-for-the-accessing-models-on-the-api-catalog) to sign up for an NVIDIA AI Foundation developer account and obtain access.

Run the command below to start the container:
```bash
bash ./run_container.sh
```

#### Self-hosted NIM endpoints
If you already have access to a self-hosted NIM, you can follow the [guide](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) to set it up.

To run inference via this UI, follow the [run inference](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#openai-completion-request) guide to get the base_url and api_key, then run the commands below to set the environment:

```bash
export API_CATALOG_KEY="not-used"
export NIM_INFER_URL="http://0.0.0.0:8000/v1"
```

Run the command below to start the container:
```bash
bash ./run_container.sh
```

NOTE:
1. If you have different models deployed at different IPs, you can set the environment variables once, then use UI -> Show more settings to input the specific IP and port, e.g. "http://{IP}:{PORT}/v1".
2. The **Insert model manually** feature is disabled when running inference against a self-hosted NIM endpoint.

#### Hugging Face NIM endpoints
NVIDIA has collaborated with Hugging Face to simplify generative AI model deployments. You can follow this [technical blog](https://developer.nvidia.com/blog/nvidia-collaborates-with-hugging-face-to-simplify-generative-ai-model-deployments/) to deploy a NIM on Hugging Face. After deployment, you can interact with the NIM endpoints via this project.

To run inference via this UI, get the base_url and api_key from Hugging Face, then run the commands below to set the environment:
```bash
export API_CATALOG_KEY="hf_xxxx"
export NIM_INFER_URL="{hugging face inference URL}"
```

Run the command below to start the container:
```bash
bash ./run_container.sh
```
NOTE:
1. The **Insert model manually** feature is disabled when running inference against a Hugging Face NIM endpoint.

### Access the UI
After the service starts up, you can open the UI at http://localhost:80/.

## Test with dataset
If you want to test with a local dataset once `config.yaml` is finalized, you can load your test set and run inference with that configuration. See [`test.py`](./test.py) for a sample script.
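
A minimal sketch of such a test run, using the `generate_response` interface added in this commit; the client module name, model, and test questions are placeholders, and the repository's actual `test.py` may differ.

```python
# Hedged sketch of a batch test over a finalized config.yaml; the module name
# "openai_client" is assumed for the OpenAI-compatible client in this commit.
from openai_client import OpenAIClient

client = OpenAIClient("config.yaml")
model = "meta/llama3-70b-instruct"   # any model configured in config.yaml

test_questions = [                   # replace with your local test set
    "What is a NIM?",
    "How do I set the API key?",
]

for question in test_questions:
    # chat_messages is a list of (user, assistant) pairs; the assistant slot is
    # empty for the turn being generated.
    chat_messages = [(question, "")]
    answer = "".join(client.generate_response(model, chat_messages))
    print(f"Q: {question}\nA: {answer}\n")
```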

### Demo
To change the application port number from the default of 80, do the following:
- Update the port number in `chat_ui.py` on the line `UI_SERVER_PORT = int(os.getenv("UI_SERVER_PORT", 80))`
- Update the port number in `run_container.sh` on the line `docker run -d -p80:80 ***`

See the demo: ![workflow demo](./data/llm-prompt-designer-demo.gif)

## Contributing

Please create a merge request to this repository; our team appreciates any and all contributions that add features! We will review and get back to you as soon as possible.





@@ -0,0 +1,101 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import yaml
import os
from abc import ABC,abstractmethod
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

API_CATALOG_KEY = os.getenv("API_CATALOG_KEY", "")
NIM_INFER_URL = os.getenv("NIM_INFER_URL", "https://integrate.api.nvidia.com/v1")

class APIRequest(ABC):
    def __init__(self, config_path):
        self.config_path = config_path
        self.config = {}
        with open(config_path, 'r') as file:
            self.config = yaml.safe_load(file)

    def get_model_settings(self, api_model):
        model_settings = self.config.get(api_model, None)
        if model_settings is None:
            logging.info(f"No config for {api_model}, load the default")
            model_settings = self.config.get('default')
        return model_settings

    def update_yaml(self, api_model, parameters):
        self.config.update({api_model: parameters})
        with open(self.config_path, 'w') as file:
            yaml.dump(self.config, file, default_flow_style=False, sort_keys=False)

    def get_model_configuration(self, api_model):
        model_config = self.get_model_settings(api_model)
        return model_config

    @abstractmethod
    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        pass

    def generate_response(self, api_model, chat_messages, system_prompt=None, initial_prompt=None,
                          temperature=None, top_p=None, max_tokens=None, few_shot_exampls=None,
                          base_url='', context=''):
        # Step 1: Get the model config from the configuration yaml file.
        model_config = self.get_model_settings(api_model)

        # Step 2: Get the LLM parameters, falling back to the per-model config.
        temperature = temperature if temperature is not None else model_config.get("temperature", 0.0)
        top_p = top_p if top_p is not None else model_config.get("top_p", 0.7)
        max_tokens = max_tokens if max_tokens is not None else model_config.get("max_tokens", 1024)

        # Step 3: Prepare the messages to be sent to the API catalog:
        # system prompt (optionally with retrieved context), few-shot examples, then chat history.
        oai_message = []
        system_prompt_message = system_prompt if system_prompt is not None else model_config.get('system_prompt', '')
        if context:
            system_prompt_message += f"\nUse the following pieces of retrieved context to answer the question. \n {context}"
        fewshot_examples = few_shot_exampls if few_shot_exampls else model_config.get('few_shot_examples', [])
        if system_prompt_message != '':
            oai_message.append({'role': 'system', 'content': system_prompt_message})
        for example in fewshot_examples:
            oai_message.append(example)

        for item in chat_messages:
            # Skip the initial greeting turn, which carries no user input.
            if item[0] is None and item[1] == initial_prompt:
                continue
            oai_message.append({'role': 'user', 'content': item[0]})
            if item[1] != '' and item[1] is not None:
                # Add the pure assistant response to the chat history.
                oai_message.append({'role': 'assistant', 'content': item[1]})

        request_body = {
            "model": api_model,
            "messages": oai_message,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
            "stream": True
        }
        logging.info(request_body)

        # Step 4: Send the request using an OpenAI-compatible API.
        return self.send_request(api_model, oai_message, temperature, top_p, max_tokens, base_url)



@@ -0,0 +1,43 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .api_request import API_CATALOG_KEY, APIRequest, NIM_INFER_URL
from langchain_nvidia_ai_endpoints import ChatNVIDIA


class ChatNVDIAClient(APIRequest):
    def __init__(self, config_path):
        super().__init__(config_path)

    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        # Default NIM_INFER_URL = "https://integrate.api.nvidia.com/v1"
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = ChatNVIDIA(base_url=base_url_infer, api_key=api_key, model=api_model,
                            temperature=temperature, max_tokens=max_tokens, top_p=top_p)
        try:
            completion = client.stream(oai_message, timeout=10.0)

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.content is not None:
                    yield chunk.content
        except Exception as e:
            yield "Request error:\n" + str(e)
@@ -0,0 +1,83 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .api_request import API_CATALOG_KEY, APIRequest, NIM_INFER_URL
from openai import OpenAI
from langchain_openai import ChatOpenAI


class OpenAIClient(APIRequest):
    def __init__(self, config_path):
        super().__init__(config_path)

    def send_request_chain(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = ChatOpenAI(
            base_url=base_url_infer,
            api_key=api_key,
            model=api_model,
            temperature=temperature,
            max_tokens=max_tokens,
            model_kwargs={"top_p": top_p},
            timeout=10.0
        )
        try:
            completion = client.stream(oai_message)

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.content is not None:
                    yield chunk.content
        except Exception as e:
            yield "Request error:\n" + str(e)

    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        # Default NIM_INFER_URL = "https://integrate.api.nvidia.com/v1"
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = OpenAI(
            base_url=base_url_infer,
            api_key=api_key
        )
        try:
            completion = client.chat.completions.create(
                model=api_model,
                messages=oai_message,
                temperature=temperature,
                top_p=top_p,
                max_tokens=max_tokens,
                stream=True,
                timeout=10.0
            )

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.choices[0].delta.content is not None:
                    yield chunk.choices[0].delta.content
        except Exception as e:
            yield "Request error:\n" + str(e)