Qihoo360/HyperGLLM


HyperGLLM: An Efficient Framework for Endpoint Threat Detection via Hypergraph-Enhanced Large Language Models

An efficient framework that introduces hypergraph reasoning into LLMs for malicious behavior detection in EDR logs.

Overview

HyperGLLM is a novel detection framework that introduces hypergraph reasoning into LLMs. It first constructs an attribute-value level relation-aware graph to model low-order structural semantics while reducing textual redundancy. Then, it introduces a differential hypergraph module with multi-granularity clustering to capture high-order behavioral dependencies embedded in interleaved events and reinforce threat semantics. Finally, the hypergraph representations are aligned with an LLM for efficient contextual reasoning over potential malicious behaviors. We curate EDR3.6B-63F, a large-scale EDR dataset containing 3.6 billion events across 63 distinct behavior families. Extensive experiments demonstrate that HyperGLLM significantly outperforms state-of-the-art methods by reducing the false alarm rate to 1.67%, achieving 94.65% accuracy across 63 behavior families, and improving the modeling efficiency of LLMs on long EDR logs. Our framework and dataset provide a solid foundation for future research and support the development of advanced detection solutions in endpoint security.
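
To make the idea of an attribute-value level graph more concrete, here is a minimal Python sketch. It is not the authors' construction: the function name `build_attribute_value_graph`, the event fields, and the co-occurrence edge rule are all assumptions chosen for illustration. It only shows why merging repeated attribute values into shared nodes can reduce textual redundancy.

```python
from collections import defaultdict


def build_attribute_value_graph(events):
    """Hypothetical sketch: one node per distinct (attribute, value) pair,
    one weighted edge per pair of nodes that co-occur in the same event."""
    node_ids = {}
    edges = defaultdict(int)
    for event in events:
        ids = []
        for attr, value in sorted(event.items()):
            key = (attr, value)
            if key not in node_ids:
                node_ids[key] = len(node_ids)  # repeated values reuse one node
            ids.append(node_ids[key])
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = sorted((ids[i], ids[j]))
                edges[(a, b)] += 1  # count co-occurrences across events
    return node_ids, dict(edges)


# Illustrative events; the field names are assumptions, not the dataset schema.
events = [
    {"process": "powershell.exe", "action": "file_write", "user": "alice"},
    {"process": "powershell.exe", "action": "net_connect", "user": "alice"},
]
node_ids, edges = build_attribute_value_graph(events)
```

In this toy input, the two events carry six attribute values in total but produce only four nodes, because the shared process and user values are stored once.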

✨ Features

  • Framework: We propose HyperGLLM, an efficient framework that introduces hypergraph reasoning into LLMs for malicious behavior detection in EDR logs, capturing both structural semantics and long-range temporal dependencies.
  • Structural Semantics & Temporal Dependencies: We design an attribute-value level relation-aware graph and a differential hypergraph module with multi-granularity clustering to jointly model low-order and high-order behavior semantics, which enriches the semantic representation of threat behaviors.
  • EDR3.6B-63F Dataset: We construct EDR3.6B-63F, a large-scale EDR dataset that serves as a high-quality benchmark for advancing AI-driven research in endpoint security, offering diverse behavior types and detailed event records.
  • Effective and Efficient: Extensive experiments demonstrate that HyperGLLM consistently outperforms state-of-the-art baselines across multiple metrics while maintaining high inference efficiency.

📃 Changelog

[23/10/25] The project launched!

🚀 Quick Start

We conduct training on eight NVIDIA H100 (80GB) GPUs and perform evaluation on a single GPU.

Directory Overview

  • requirements.txt — Lists all dependencies required to reproduce this project.
  • datasets/ — Contains all datasets used in both the main experiments and the ablation study described in the paper.
  • experiments_main/ — Source code for reproducing the main experiments.
  • experiments_ablation/ — Source code for reproducing the ablation study experiments in the paper.
  • experiments_appendix/ — Code used for the appendix experiments included in the paper.

Install Dependencies

To reproduce our work, you need Python and the libraries listed in requirements.txt. Install the dependencies with the following command:

pip install -r requirements.txt

Reproducing the Main Experiment

To reproduce the main experimental results:

  1. Change to the main experiment directory:
cd experiments_main
  2. Run the training and inference script:
sh run.sh
  3. Obtain evaluation metrics:
python get_metrics.py
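
For readers who want to sanity-check the reported numbers by hand, the following is a minimal sketch of how accuracy and a false alarm rate could be computed from predicted labels. It does not reproduce get_metrics.py. The function name, the choice of `0` as the benign label, and the definition of a false alarm (a benign sample predicted as any non-benign family) are assumptions.

```python
def accuracy_and_false_alarm_rate(y_true, y_pred, benign_label=0):
    """Hypothetical helper: returns (accuracy, false_alarm_rate).

    Assumes a false alarm is a benign sample predicted as non-benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    benign_total = sum(t == benign_label for t, _ in pairs)
    false_alarms = sum(t == benign_label and p != benign_label for t, p in pairs)

    accuracy = correct / len(pairs)
    # With no benign samples the rate is undefined; report 0.0 by convention.
    false_alarm_rate = false_alarms / benign_total if benign_total else 0.0
    return accuracy, false_alarm_rate


# Toy labels: 0 is benign, 1 and 2 stand in for two behavior families.
acc, far = accuracy_and_false_alarm_rate([0, 0, 0, 0, 1, 2], [0, 0, 0, 1, 1, 1])
```
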

You can also obtain runtime efficiency metrics (GPU memory usage and Time-to-First-Token) by running:

python get_gpumu_tps.py

This provides performance metrics on an input of 1,024K tokens.
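
As a rough illustration of what a Time-to-First-Token measurement involves, the sketch below times how long a token stream takes to yield its first item. It is not taken from get_gpumu_tps.py. Both `time_to_first_token` and `fake_stream` are hypothetical names, and the stand-in generator simulates model latency with a sleep instead of running a real model.

```python
import time


def time_to_first_token(token_stream):
    """Hypothetical helper: returns (first_token, seconds_until_first_token)."""
    start = time.perf_counter()
    first = next(iter(token_stream))  # blocks until the first token arrives
    return first, time.perf_counter() - start


def fake_stream(delay_s=0.05):
    """Stand-in for a streaming generator; the delay mimics prefill latency."""
    time.sleep(delay_s)
    yield "hello"
    yield "world"


first_token, ttft_seconds = time_to_first_token(fake_stream())
```

In a PyTorch setting, peak GPU memory is commonly read with `torch.cuda.max_memory_allocated()` after the run. Whether the provided script uses that call is an assumption.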

Note: To reproduce other experiments (e.g., the ablation study), change to the corresponding directory (e.g., cd experiments_ablation/Analysis_DHGNN), run sh run.sh for training and inference, and then obtain evaluation metrics with python get_metrics.py.

💾 Dataset

EDR3.6B-63F Dataset: this repo.

🙌 FAQs

🔖 License

Our project is licensed under the MIT License.

Citation

If you use the EDR3.6B-63F dataset in your research or find our method HyperGLLM inspiring, please consider citing our paper:

@inproceedings{
  title     = {HyperGLLM: An Efficient Framework for Endpoint Threat Detection via Hypergraph-Enhanced Large Language Models},
  year      = {2025},
}
