* update 2024-10-31 06:20:39

yuriufo · Oct 30, 2024 · 72895c7 · 72895c7
1 parent 482ce23
commit 72895c7
Show file tree

Hide file tree

Showing 2 changed files with 37 additions and 1 deletion.
diff --git a/arXiv_db/Malware/2024.md b/arXiv_db/Malware/2024.md
@@ -3246,3 +3246,39 @@
 
 </details>
 
+<details>
+
+<summary>2024-10-29 04:22:28 - Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection</summary>
+
+- *Md Abu Sayed, Asif Rahman, Christopher Kiekintveld, Sebastian Garcia*
+
+- `2410.21723v1` - [abs](http://arxiv.org/abs/2410.21723v1) - [pdf](http://arxiv.org/pdf/2410.21723v1)
+
+> Domain Generation Algorithms (DGAs) are malicious techniques used by malware to dynamically generate seemingly random domain names for communication with Command & Control (C&C) servers. Due to the fast and simple generation of DGA domains, detection methods must be highly efficient and precise to be effective. Large Language Models (LLMs) have demonstrated their proficiency in real-time detection tasks, making them ideal candidates for detecting DGAs. Our work validates the effectiveness of fine-tuned LLMs for detecting DGAs and DNS exfiltration attacks. We developed LLM models and conducted comprehensive evaluation using a diverse dataset comprising 59 distinct real-world DGA malware families and normal domain data. Our LLM model significantly outperformed traditional natural language processing techniques, especially in detecting unknown DGAs. We also evaluated its performance on DNS exfiltration datasets, demonstrating its effectiveness in enhancing cybersecurity measures. To the best of our knowledge, this is the first work that empirically applies LLMs for DGA and DNS exfiltration detection.
+
+</details>
+
+<details>
+
+<summary>2024-10-29 10:52:43 - LogSHIELD: A Graph-based Real-time Anomaly Detection Framework using Frequency Analysis</summary>
+
+- *Krishna Chandra Roy, Qian Chen*
+
+- `2410.21936v1` - [abs](http://arxiv.org/abs/2410.21936v1) - [pdf](http://arxiv.org/pdf/2410.21936v1)
+
+> Anomaly-based cyber threat detection using deep learning is on a constant growth in popularity for novel cyber-attack detection and forensics. A robust, efficient, and real-time threat detector in a large-scale operational enterprise network requires high accuracy, high fidelity, and a high throughput model to detect malicious activities. Traditional anomaly-based detection models, however, suffer from high computational overhead and low detection accuracy, making them unsuitable for real-time threat detection. In this work, we propose LogSHIELD, a highly effective graph-based anomaly detection model in host data. We present a real-time threat detection approach using frequency-domain analysis of provenance graphs. To demonstrate the significance of graph-based frequency analysis we proposed two approaches. Approach-I uses a Graph Neural Network (GNN) LogGNN and approach-II performs frequency domain analysis on graph node samples for graph embedding. Both approaches use a statistical clustering algorithm for anomaly detection. The proposed models are evaluated using a large host log dataset consisting of 774M benign logs and 375K malware logs. LogSHIELD explores the provenance graph to extract contextual and causal relationships among logs, exposing abnormal activities. It can detect stealthy and sophisticated attacks with over 98% average AUC and F1 scores. It significantly improves throughput, achieves an average detection latency of 0.13 seconds, and outperforms state-of-the-art models in detection time.
+
+</details>
+
+<details>
+
+<summary>2024-10-29 17:43:06 - Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats</summary>
+
+- *Mohammad Setak, Pooria Madani*
+
+- `2410.22293v1` - [abs](http://arxiv.org/abs/2410.22293v1) - [pdf](http://arxiv.org/pdf/2410.22293v1)
+
+> Recent advancements in Large Language Models (LLMs) have significantly improved their capabilities in natural language processing and code synthesis, enabling more complex applications across different fields. This paper explores the application of LLMs in the context of code mutation, a process where the structure of program code is altered without changing its functionality. Traditionally, code mutation has been employed to increase software robustness in mission-critical applications. Additionally, mutation engines have been exploited by malware developers to evade the signature-based detection methods employed by malware detection systems. Existing code mutation engines, often used by such threat actors, typically result in only limited variations in the malware, which can still be identified through static code analysis. However, the agility demonstrated by an LLM-based code synthesizer could significantly change this threat landscape by allowing for more complex code mutations that are not easily detected using static analysis. One can increase variations of codes synthesized by a pre-trained LLM through fine-tuning and retraining. This process is what we refer to as code mutation training. In this paper, we propose a novel definition of code mutation training tailored for pre-trained LLM-based code synthesizers and demonstrate this training on a lightweight pre-trained model. Our approach involves restructuring (i.e., mutating) code at the subroutine level, which allows for more manageable mutations while maintaining the semantic integrity verified through unit testing. Our experimental results illustrate the effectiveness of our approach in improving code mutation capabilities of LLM-based program synthesizers in producing varied and functionally correct code solutions, showcasing their potential to transform the landscape of code mutation and the threats associated with it.
+
+</details>
+