Every day, Security Operations Center (SOC) analysts receive an overwhelming volume of security alerts, from which they must triage out false positives and identify true security breaches. In this example, we create a RAG-enabled co-pilot for this task. We cover the steps needed to build a co-pilot, which can be applied to any use case or industry where data retrieval and synthesis is simple but tedious. We cover multi-step agentic reasoning, data ingestion for RAG, speech input/output, and face model animation.
This project will walk you through a broad sample of NVIDIA technologies, and no prior knowledge is necessary:
- Morpheus SDK with the Digital Fingerprinting (DFP) autoencoder workflow for anomaly detection in users
- Riva Speech Services (Text-to-Speech and Automatic Speech Recognition)
- NeMo Retriever for Retrieval Augmented Generation (RAG)
- Omniverse Audio2Face for animating a Digital Human
This demo of the Analyst Morpheus project is running in Unreal Engine 5.4, using the default Metahuman model Omar.
The co-pilot Analyst Morpheus uses a multi-step agentic reasoning workflow via LangChain. The agent has access to several tools, including a Network Traffic Database, a User Directory, a Threat Intelligence database, and other sources that mimic what a real-world SOC analyst would use. The agent also has access to an Alert Summaries Database of real-time security alerts generated by NVIDIA LLM NIMs in combination with NVIDIA Morpheus's Digital Fingerprinting workflow for anomaly detection. The goal is to exemplify how live data can be ingested into a RAG co-pilot workflow. We use Reranking and Embedding models from NeMo Retriever.
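To make the tool-calling flow concrete, here is a minimal LangChain sketch of an agent with access to one such tool. This is purely illustrative (a generic ReAct-style agent with a placeholder tool, prompt, and model id), not the project's actual implementation, which lives in the `ragbot` scripts.

```python
# Illustrative only: one way to wire tools to an NVIDIA-hosted LLM with LangChain.
# The project's actual routing and tool code lives in ragbot/voice_ragbot.py and
# ragbot/agent_tools.py; the tool body and prompt here are placeholders.
from langchain.agents import AgentExecutor, Tool, create_react_agent
from langchain_core.prompts import PromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama3-8b-instruct")  # assumed model id; reads NVIDIA_API_KEY

def query_alert_summaries(question: str) -> str:
    # Placeholder: in the project this is a RAG query into the Alert Summaries collection.
    return "Stub alert summary for demonstration purposes."

tools = [
    Tool(name="AlertSummaries", func=query_alert_summaries,
         description="Look up per-user anomaly alert summaries."),
    # ...Network Traffic, User Directory, Threat Intelligence, Email Gateway tools go here
]

react_prompt = PromptTemplate.from_template(
    "Answer the analyst's question using the tools below.\n"
    "Tools:\n{tools}\n\n"
    "Use this format:\nThought: ...\nAction: one of [{tool_names}]\n"
    "Action Input: ...\nObservation: ...\n(repeat Thought/Action as needed)\n"
    "Final Answer: ...\n\nQuestion: {input}\n{agent_scratchpad}"
)

agent = create_react_agent(llm, tools, react_prompt)
executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)
print(executor.invoke({"input": "Why was [email protected] flagged today?"})["output"])
```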
The digital human takes voice queries, which are transcribed into text via NVIDIA Riva Automatic Speech Recognition (ASR). The RAG text response is turned into audio output via NVIDIA Riva Text-to-Speech (TTS). This output is then used by Omniverse Audio2Face to animate a Metahuman face model, which is rendered in Unreal Engine. Optionally, you can render in the Omniverse Audio2Face App instead.
Read more about this project in the Spotlight Blog post [here]. [Link to blog coming soon!]
From build.nvidia.com, generate a Llama 3.1 API key, an Embedding API key, and a Reranking API key. Add all three keys to the `.env` file.
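The scripts read these keys from the environment. For reference, a minimal way to load them in Python with python-dotenv is shown below; the variable names are placeholders, so use whichever names the repo's `.env` template defines.

```python
# Illustrative only: load API keys from the .env file with python-dotenv.
# The actual variable names are defined by the repo's .env template.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
llm_key = os.environ["NVIDIA_API_KEY"]        # placeholder name
embed_key = os.environ["EMBEDDING_API_KEY"]   # placeholder name
rerank_key = os.environ["RERANKING_API_KEY"]  # placeholder name
```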
[!NOTE]
Section 1 focuses on data preprocessing such as turning numerical data into natural language reports; specifically, we turn numerical anomaly scores into user summary reports. The RAG system will then use the summary reports as context. However, in this repo we include pre-generated summary reports, so you can choose to skip to Section 2 if you would like to just focus on the RAG and Digital Human aspects.
In order to create a co-pilot that has accurate and up-to-date knowledge of a specific organization's security operations landscape, we have to first devise a method for ingestion of alerts, logs, network traffic, and other data that an analyst would usually have access to. Taking a step back, we use the Morpheus SDK to train autoencoder models which will output numerical alerts when anomalous user activity is detected. We then use NVIDIA NIMs to create natural language summaries of such alerts on a per-user basis. Lastly, we upload the alert summaries into an asynchronous NeMo Retriever collection that serves as our RAG knowledge source.
Pull the 24.03 version of the Morpheus container image from NGC.
```bash
docker pull nvcr.io/nvidia/morpheus/morpheus:24.03-runtime
```
[!OPTIONAL]
By default, this project code is configured to use the API found at build.nvidia.com. To switch to a self-hosted NIM, generate a valid NIM key from NGC and pull the Llama-3-8b NIM 24.05 image:
```bash
docker pull nvcr.io/nim/meta/llama3-8b-instruct:1.0.0
```
Then uncomment the `nim-llm` portion of `docker-compose.yml`.
[!OPTIONAL]
In `docker-compose.yml`, you can also configure the number of GPUs to use under the `CUDA_VISIBLE_DEVICES` parameter.
Run the container.
```bash
docker compose up jupyter
```
Check that the containers spun up successfully. You should see `jupyter` and `mlflow`.
```bash
docker compose ps
```
If you elected for a self-hosted LLM NIM, you should also see `nim-llm`. If you don't, try `docker compose up nim-llm`.
[!TROUBLESHOOTING]
If any of the containers don't spin up, ensure that the GPUs you're pointing to in the `docker-compose.yml` file have enough memory available. You can use `docker compose logs [container-name]` to check for issues. Also, if you have configured a non-default port for Jupyter, be sure to manually forward your port.
Once inside the Jupyter notebook, you will find the folders you would see in the open-source Morpheus GitHub repo. Navigate to where our `security-analyst-digital-human` folder is mounted: `examples/digital_fingerprinting/production/morpheus`.
First, we will need to train the baseline autoencoder models per entity (user) so that we can run inference with these models to detect anomalies in user behavior.
Open a new terminal in the Jupyter notebook and type `bash` to open a new bash shell. Navigate to the path `/workspace/examples/digital_fingerprinting/production/morpheus`.
```bash
cd /workspace/examples/digital_fingerprinting/production/morpheus
```
From here, run the following commands to download some training data. We will be using Azure logs; you can view the log features here. If the reconstruction loss (which the Morpheus autoencoder class translates into a z-score for increased explainability) of any of these features exceeds a configurable threshold, we will trigger an alert. For example, if the `appIncrement` feature describing the number of apps the user is using is higher than expected, we consider this anomalous.
```bash
pip install s3fs
./../../fetch_example_data.py azure
```
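To make the thresholding idea above concrete, here is a small pandas sketch. The column names, threshold value, and example scores are invented for illustration; the real logic is inside the Morpheus DFP pipeline.

```python
# Sketch: flag features whose z-score exceeds a configurable threshold.
# Column names ("appIncrement_z_loss", ...) and values are placeholders.
import pandas as pd

Z_THRESHOLD = 2.0  # assumed value; the pipeline's threshold is configurable

scores = pd.DataFrame({
    "user": ["[email protected]", "[email protected]"],
    "appIncrement_z_loss": [0.4, 8.7],
    "logcount_z_loss": [0.9, 5.1],
})

z_cols = [c for c in scores.columns if c.endswith("_z_loss")]
anomalies = scores.melt(id_vars="user", value_vars=z_cols,
                        var_name="feature", value_name="z_score")
anomalies = anomalies[anomalies["z_score"] > Z_THRESHOLD]
print(anomalies)  # these rows would trigger an alert for the corresponding user
```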
Now, we can run all cells in the training pipeline notebook `dfp_azure_training.ipynb`, located in the `examples/digital_fingerprinting/production/morpheus/workspace` directory. This notebook may take several minutes to run.
Once baseline models are trained and stored in MLflow, we want to run our inference pipeline on validation Azure logs to see if we can detect unusual user behavior. The `dfp_azure_chatbot.ipynb` notebook will output each feature and the corresponding z-score. We will aggregate all unusually high z-scores belonging to a particular user and use them to generate a summary report using NIM LLMs. The goal is to provide an additional layer of natural language explainability. We will also enrich this per-user summary with Threat Intelligence pulled from the Internet (stored at `upload_intel/intel/cyber_enrichment`).
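The summary-generation step can be pictured roughly like this, using ChatNVIDIA from langchain-nvidia-ai-endpoints; the prompt wording, model id, and example scores are assumptions, not the notebook's actual code.

```python
# Sketch: turn aggregated z-scores for one user into a natural language report.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama3-8b-instruct")  # assumed model id

user = "[email protected]"
high_z = {"appIncrement": 8.7, "logcount": 5.1}  # example values only

prompt = (
    f"You are a SOC analyst assistant. User {user} triggered anomaly z-scores "
    f"{high_z}. Write a short plain-language summary of the suspicious behavior."
)
report = llm.invoke(prompt)
print(report.content)
```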
We want to populate a vector database with our Threat Intelligence. To do so, we will use LangChain and the NVIDIA reranking and embedding APIs from build.nvidia.com.
[!OPTIONAL]
You can also easily host your own reranking and embedding microservices. Read more here.
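As a rough picture of the ingestion step, the sketch below embeds a threat intelligence text file and builds a retriever. FAISS is used only as a local stand-in (the notebook targets a NeMo Retriever collection), and the file name and chunking parameters are assumptions.

```python
# Sketch: embed threat intelligence text and load it into a vector store.
# Requires faiss-cpu; FAISS is a stand-in for the NeMo Retriever collection.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = TextLoader("upload_intel/intel/cyber_enrichment/example.txt").load()  # file name is illustrative
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

embeddings = NVIDIAEmbeddings()  # uses the embedding API key from your .env
store = FAISS.from_documents(chunks, embeddings)
retriever = store.as_retriever(search_kwargs={"k": 4})
print(retriever.invoke("known malicious URLs")[0].page_content)
```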
Run all cells of the notebook. It may take several minutes. Alert summaries per user will be generated and uploaded to a new vector database.
[!TROUBLESHOOTING]
If you receive a File Does Not Exist error, create the missing file (and any missing directories along the specified file path) in the Jupyter notebook.
[!OPTIONAL]
This example project is currently configured to point to the build.nvidia.com API for the Summary Inference Morpheus pipeline step. To use your self-hosted LLM NIM instead (which you spun up in a previous optional step), edit the `chat_nvidia_service.py` file and add a `base_url` parameter to the ChatNVIDIA model instantiation. Point the `base_url` to your locally hosted model.
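For reference, the change amounts to something like the following; the model id, host, and port are placeholders that depend on your NIM deployment.

```python
# Sketch: point ChatNVIDIA at a locally hosted LLM NIM instead of build.nvidia.com.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    model="meta/llama3-8b-instruct",       # assumed model id
    base_url="http://localhost:8000/v1",   # adjust to your NIM host and port
)
```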
[!NOTE]
Since we only have one anomalous user in our validation data ([email protected]), we manually add two more example reports in the "RAG Upload to User Summaries Vector Database" section of the notebook. (You can view the example reports at `upload_intel/intel/user_summaries`.)
We will now move out of the Jupyter notebook for the remainder of this project. The `ragbot` folder contains scripts that drive the chatbot as well as give it access to the data sources a SOC analyst would typically have. This access is given in the form of LangChain Tools, which can be found in `ragbot/agent_tools.py`. There are currently five tools:
- Access to a User Directory containing user email, full name, endpoint device IP, department, city
- Access to a Network Traffic Database containing connections that detail destination url, timestamp, source IP, destination IP, source port, destination port, bytes sent, bytes received, protocol, user
- Access to a Threat Intelligence Database containing known malicious URLs
- Access to an Email Security Gateway, containing the content of emails flagged as malicious as well as the users they were sent to and the corresponding timestamps
- Access to Alert Summaries, containing the per-user summaries (generated by our `dfp_azure_chatbot.ipynb` notebook)
Each of these tools has a corresponding knowledge source text file, which can be found in the `upload_intel/intel/tools` directory. Each tool is essentially a RAG query into the specific collection in which the relevant data is stored.
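As an illustration of that pattern, a tool might look roughly like the sketch below. It is not the code in `ragbot/agent_tools.py`; `threat_intel_retriever` stands in for a retriever over the Threat Intelligence collection (for example, one built as in the ingestion sketch earlier).

```python
# Sketch: a LangChain Tool that performs a RAG query against one collection.
from langchain.agents import Tool

def lookup_threat_intel(query: str) -> str:
    """Retrieve the most relevant threat intelligence snippets for the query."""
    docs = threat_intel_retriever.invoke(query)  # retriever assumed to exist
    return "\n\n".join(d.page_content for d in docs)

threat_intel_tool = Tool(
    name="ThreatIntelligenceDatabase",
    func=lookup_threat_intel,
    description="Look up known malicious URLs and related threat intelligence.",
)
```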
[!OPTIONAL]
You can experiment with adding your own knowledge source and corresponding tool by adding your content to a new .txt file, then uploading it to a new Retriever collection. See `ragbot/agent_tools.py` for how tools are defined.
[!NOTE]
In addition to the five databases mentioned above, we will also create a "General Collection" containing the combined contents of all databases. The purpose is to achieve lower latency: if the end-user's query does not require multiple tools, the LLM will not need to build a multi-step checklist to answer it; the General Collection provides a single consolidated source for the LLM. On the other hand, we also create separate collections for each tool for the sake of modularity and retrieval accuracy during multi-step inference.
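The split between per-tool collections and the General Collection can be pictured roughly as follows. FAISS again stands in for the Retriever collections, and the glob over `upload_intel/intel/tools` is an assumption about the file layout.

```python
# Sketch: build one consolidated "General Collection" from all per-tool sources,
# alongside the individual per-tool stores. Illustrative only.
from pathlib import Path
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings()
tool_files = list(Path("upload_intel/intel/tools").glob("*.txt"))

# One store per tool (modularity / retrieval accuracy for multi-step queries)...
per_tool_stores = {
    f.stem: FAISS.from_documents([Document(page_content=f.read_text())], embeddings)
    for f in tool_files
}
# ...and one combined store for low-latency single-step queries.
general_collection = FAISS.from_documents(
    [Document(page_content=f.read_text()) for f in tool_files], embeddings
)
```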
Set up an instance of Riva by following this Quick Start Guide.
[!IMPORTANT] You must have access to a Linux machine to deploy your Riva instance. See specifics at the top of the Riva Quick Start Guide.
At this point, we recommend moving to a Windows environment if you are not in one already. Linux requires root permissions to run the keyboard library used in the `voice_ragbot.py` script, so running `voice_ragbot.py` on Linux will require additional workarounds.
We are now ready to deploy the RAG-bot. Navigate to `ragbot/voice_ragbot.py`.
At the top of the script, we need to configure the IP and port of our Riva instance.
```python
RIVA_IP_AND_PORT = "YOUR RIVA DEPLOYMENT IP AND PORT" #should be in format [IP]:[PORT]
```
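For context, connecting to that address with the nvidia-riva-client package looks roughly like this; the example address, voice name, and sample rate are assumptions, and the script's own code may differ.

```python
# Rough sketch of connecting to the Riva server configured above.
# Requires `pip install nvidia-riva-client`.
import riva.client

RIVA_IP_AND_PORT = "10.0.0.5:50051"  # example value; use your deployment's address
auth = riva.client.Auth(uri=RIVA_IP_AND_PORT)

asr = riva.client.ASRService(auth)              # used to transcribe the recorded query
tts = riva.client.SpeechSynthesisService(auth)  # used to voice the final answer

resp = tts.synthesize(
    "Hello, I am Analyst Morpheus.",
    voice_name="English-US.Female-1",  # assumed voice; check the voices on your server
    language_code="en-US",
    sample_rate_hz=44100,
)
audio_bytes = resp.audio  # raw PCM samples, ready to save or stream onward
```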
Install some requirements and run the `voice_ragbot.py` script:
```bash
pip install numpy nvidia-riva-client pandas soundfile sounddevice scipy keyboard protobuf==3.20.3 langchain langchain_openai langchain-nvidia-ai-endpoints
python3 voice_ragbot.py
```
[!NOTE] You may run into an issue where the sounddevice library requires the PortAudio library. If so, install PortAudio.
Use the 'Space' key to begin voice recording, and press 'Space' again to finish recording. Riva ASR will transcribe the speech input, which is sent to the llama3 API at build.nvidia.com. An LLM "router" decides if the query is one-step or multi-step. If the query is multi-step, it will be routed to a multi-step workflow. The "router" also decides if the query requires Retriever context, in which case a retrieval step is inserted before the final answer generation. The final text output is handed to Riva TTS to convert to audio, which is then saved as a .wav file. The script will keep running until the 'Escape' key is pressed.
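The routing decision can be sketched roughly as follows (a simplified illustration; the actual prompts, labels, and branching are in `voice_ragbot.py`).

```python
# Sketch: an LLM "router" that classifies the transcribed query before answering.
# Prompt wording and labels are assumptions, not the project's exact code.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama3-8b-instruct")  # assumed model id

def route(query: str) -> dict:
    prompt = (
        "Classify the analyst's query.\n"
        "Answer with two words: 'single' or 'multi' (number of steps), then "
        "'retrieve' or 'direct' (whether external context is needed).\n"
        f"Query: {query}"
    )
    decision = llm.invoke(prompt).content.lower()
    return {
        "multi_step": "multi" in decision,
        "needs_retrieval": "retrieve" in decision,
    }

print(route("Which users connected to the malicious URL flagged this morning?"))
```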
Using the Audio2Face App requires less setup, but the face will be a default grey-colored character. See here for an introductory tutorial to Audio2Face. The default face is pictured below.
[!IMPORTANT] You must have access to a Windows or Linux machine to use the Omniverse Audio2Face app.
Download the Omniverse App:
Within the Omniverse App, search for the Audio2Face app in the Exchange tab and download it. Launch the Audio2Face App. It may take several minutes to start up.
To run a quick test of your Riva TTS server and Audio2Face working together, navigate to Windows>Extensions in the Audio2Face App and search for the Riva TTS extension. Toggle it on.
Navigate out of the Extensions panel. You will see that a new Riva TTS Extension panel has opened. In this panel, configure your Riva server IP and port. Then, you can type in a message in the provided text box, which will be sent to your Riva server, then sent back to animate the face mesh. You can find a more detailed video tutorial here.
On the right panel of the interface, set the Audio Player dropdown to `Streaming` and click the `Get Started` button. Confirm `Yes` to creating a new Stage, and a new window will open with the default male face, Mark.
Locate the `Stage` panel on the right side of the face. In this panel, expand the `audio2face` OmniGraph and find the `Player_Streaming` OmniGraphNode; this object will take in audio streams from your Riva server. At the top of `voice_ragbot.py`, assign the variable `A2F_PLAYER_STREAMING_LOCATION` to the path of the `Player_Streaming` node. You can find this path by clicking on the node and viewing the `Prim Path` value under the `Property` panel. By default, the path is `/World/audio2face/PlayerStreaming`.
In `voice_ragbot.py`, uncomment the line under `#for Audio2Face App rendering`. Comment out the `#for Unreal Engine rendering` section (up to and including the `sf.write` line).
Then, uncomment the line containing the function call to `a2f_client.push_audio_track_stream`. This function pushes the Riva TTS output to an Audio2Face client (see `a2f_client.py`), which then animates the face in the Audio2Face app.
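For context, the streaming call looks roughly like the sketch below. The argument order mirrors the standard Audio2Face streaming client example and is therefore an assumption; check `a2f_client.py` in this repo for the exact signature, and adjust the gRPC address to your Audio2Face instance.

```python
# Sketch: push Riva TTS audio to the Audio2Face Player_Streaming node.
# The push_audio_track_stream argument order is assumed from the standard
# Audio2Face streaming client sample; see a2f_client.py for the real signature.
import numpy as np
import a2f_client  # provided in the ragbot folder

A2F_URL = "localhost:50051"  # Audio2Face gRPC endpoint (assumed default)
A2F_PLAYER_STREAMING_LOCATION = "/World/audio2face/PlayerStreaming"

def stream_to_a2f(tts_audio_bytes: bytes, sample_rate: int = 44100) -> None:
    """Convert raw 16-bit PCM from Riva TTS to float32 and stream it to Audio2Face."""
    samples = np.frombuffer(tts_audio_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    a2f_client.push_audio_track_stream(
        A2F_URL, samples, sample_rate, A2F_PLAYER_STREAMING_LOCATION
    )
```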
Re-run the script. Once audio is synthesized using Riva TTS, it will be automatically streamed to the Audio2Face app. The app will play back the audio and drive Mark's face animations in sync with the playback.
If you hear audio but the face mesh is not moving, it is possible that you have multiple `Player_Streaming` objects and the one receiving audio is not the same object driving the face movement. Double-check the objects under the Audio2Face OmniGraph, as well as the path you configured in `voice_ragbot.py`.
This option has a more involved setup but allows for greatly increased avatar customization. We will use a Metahuman model, which comes with a suite of default models that can be further customized. Some prerequisite knowledge of Unreal Engine may be helpful.
Download Unreal Engine 5.4.
Download the Kairos Example Project found here at the "Access NVIDIA ACE Unreal Engine 5 Sample" link. Follow this guide to set up a Metahuman that can be animated with Audio2Face via the API.
Once the Metahuman is working with your Riva deployment and the Audio2Face API, edit the Level Blueprint and call the `Animate Character from Wav File` function. Configure the `Path to Wav` to be the path to the `output.wav` file that Riva TTS in `voice_ragbot.py` generates. This block will send your selected audio data to the Metahuman in the scene. You can then create any event to trigger `Animate Character from Wav File`. In the example below, we use the `Comma` key press block.
For a more realistic feel, you can add random eye blinks and slight eye movement. Replace the Face_PostProcess_Anim Animation Blueprint with the `ragbot/Face_PostProcess_AnimBP.uasset` file included in this repo. This Blueprint's Event Graph is shown below. The `Current Smile Value` variable also allows for an adjustable constant smile.
The Kairos Example Project documentation linked above provides guidance on adding idle animation. The example project comes with several animation sequences, and more free animation sequences can be found online from Mixamo and ActorCore. The idle animation used in the demo video can be found in this repo at `ragbot/idle_animation.fbx`. Importing the .fbx file into Unreal Engine will create an Animation Sequence, which you can then retarget for your MetaHuman's skeletal mesh.