* update 2024-12-11 06:22:26

yuriufo · Dec 10, 2024 · a04a4c6
1 parent 8802d4b
commit a04a4c6
Showing 2 changed files with 25 additions and 1 deletion.
diff --git a/arXiv_db/Malware/2024.md b/arXiv_db/Malware/2024.md
@@ -3770,3 +3770,27 @@
 
 </details>
 
+<details>
+
+<summary>2024-12-09 04:55:10 - Applications of Positive Unlabeled (PU) and Negative Unlabeled (NU) Learning in Cybersecurity</summary>
+
+- *Robert Dilworth, Charan Gudla*
+
+- `2412.06203v1` - [abs](http://arxiv.org/abs/2412.06203v1) - [pdf](http://arxiv.org/pdf/2412.06203v1)
+
+> This paper explores the relatively underexplored application of Positive Unlabeled (PU) Learning and Negative Unlabeled (NU) Learning in the cybersecurity domain. While these semi-supervised learning methods have been applied successfully in fields like medicine and marketing, their potential in cybersecurity remains largely untapped. The paper identifies key areas of cybersecurity--such as intrusion detection, vulnerability management, malware detection, and threat intelligence--where PU/NU learning can offer significant improvements, particularly in scenarios with imbalanced or limited labeled data. We provide a detailed problem formulation for each subfield, supported by mathematical reasoning, and highlight the specific challenges and research gaps in scaling these methods to real-time systems, addressing class imbalance, and adapting to evolving threats. Finally, we propose future directions to advance the integration of PU/NU learning in cybersecurity, offering solutions that can better detect, manage, and mitigate emerging cyber threats.
+
+</details>
+
+<details>
+
+<summary>2024-12-09 06:45:17 - Symbol Preference Aware Generative Models for Recovering Variable Names from Stripped Binary</summary>
+
+- *Xiangzhe Xu, Zhuo Zhang, Zian Su, Ziyang Huang, Shiwei Feng, Yapeng Ye, Nan Jiang, Danning Xie, Siyuan Cheng, Lin Tan, Xiangyu Zhang*
+
+- `2306.02546v4` - [abs](http://arxiv.org/abs/2306.02546v4) - [pdf](http://arxiv.org/pdf/2306.02546v4)
+
+> Decompilation aims to recover the source code form of a binary executable. It has many security applications, such as malware analysis, vulnerability detection, and code hardening. A prominent challenge in decompilation is to recover variable names. We propose a novel technique that leverages the strengths of generative models while mitigating model biases. We build a prototype, GenNm, from pre-trained generative models CodeGemma-2B, CodeLlama-7B, and CodeLlama-34B. We finetune GenNm on decompiled functions and teach models to leverage contextual information. GenNm includes names from callers and callees while querying a function, providing rich contextual information within the model's input token limitation. We mitigate model biases by aligning the output distribution of models with symbol preferences of developers. Our results show that GenNm improves the state-of-the-art name recovery precision by 5.6-11.4 percentage points on two commonly used datasets and improves the state-of-the-art by 32% (from 17.3% to 22.8%) in the most challenging setup where ground-truth variable names are not seen in the training dataset.
+
+</details>
+