* update 2024-12-07 06:22:03
actions-user committed Dec 6, 2024
1 parent b95de2f commit a909c26
Showing 2 changed files with 37 additions and 1 deletion.
36 changes: 36 additions & 0 deletions arXiv_db/Malware/2024.md
@@ -3722,3 +3722,39 @@

</details>

<details>

<summary>2024-12-04 18:59:45 - Explainable Malware Detection through Integrated Graph Reduction and Learning Techniques</summary>

- *Hesamodin Mohammadian, Griffin Higgins, Samuel Ansong, Roozbeh Razavi-Far, Ali A. Ghorbani*

- `2412.03634v1` - [abs](http://arxiv.org/abs/2412.03634v1) - [pdf](http://arxiv.org/pdf/2412.03634v1)

> Control Flow Graphs and Function Call Graphs have become pivotal in providing a detailed understanding of program execution and effectively characterizing the behavior of malware. These graph-based representations, when combined with Graph Neural Networks (GNNs), have shown promise in developing high-performance malware detectors. However, challenges remain due to the large size of these graphs and the inherent opacity in the decision-making process of GNNs. This paper addresses these issues by developing several graph reduction techniques to reduce graph size and applying the state-of-the-art GNNExplainer to enhance the interpretability of GNN outputs. The analysis demonstrates that integrating our proposed graph reduction techniques with GNNExplainer in the malware detection framework significantly reduces graph size while preserving high performance, providing an effective balance between efficiency and transparency in malware detection.

</details>
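
A minimal sketch of the kind of control-flow-graph reduction the entry above describes, assuming a plain `networkx` digraph of basic blocks. It only collapses straight-line chains and is not the authors' algorithm, which combines several reduction techniques with GNNExplainer; it is shown here just to make the "reduce graph size before the GNN" step concrete.

```python
# Hypothetical sketch: shrink a CFG by merging straight-line chains of basic
# blocks so a downstream GNN sees a smaller graph. Not the paper's method.
import networkx as nx


def reduce_linear_chains(cfg: nx.DiGraph) -> nx.DiGraph:
    """Remove nodes that have exactly one predecessor and one successor."""
    g = cfg.copy()
    changed = True
    while changed:
        changed = False
        for node in list(g.nodes):
            if g.in_degree(node) == 1 and g.out_degree(node) == 1:
                (pred,) = g.predecessors(node)
                (succ,) = g.successors(node)
                if node not in (pred, succ) and pred != succ:
                    g.remove_node(node)      # drop the chain node
                    g.add_edge(pred, succ)   # reconnect its neighbours
                    changed = True
    return g


if __name__ == "__main__":
    cfg = nx.DiGraph([("entry", "b1"), ("b1", "b2"), ("b2", "b3"),
                      ("b3", "exit"), ("entry", "b4"), ("b4", "exit")])
    small = reduce_linear_chains(cfg)
    print(small.number_of_nodes(), "nodes,", small.number_of_edges(), "edges")
```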

<details>

<summary>2024-12-05 13:54:53 - On the Lack of Robustness of Binary Function Similarity Systems</summary>

- *Gianluca Capozzi, Tong Tang, Jie Wan, Ziqi Yang, Daniele Cono D'Elia, Giuseppe Antonio Di Luna, Lorenzo Cavallaro, Leonardo Querzoni*

- `2412.04163v1` - [abs](http://arxiv.org/abs/2412.04163v1) - [pdf](http://arxiv.org/pdf/2412.04163v1)

> Binary function similarity, which often relies on learning-based algorithms to identify which functions in a pool are most similar to a given query function, is a sought-after topic in different communities, including machine learning, software engineering, and security. Its importance stems from its impact in facilitating several crucial tasks, from reverse engineering and malware analysis to automated vulnerability detection. Whereas recent work has cast light on performance for this long-studied problem, the research landscape remains largely lacking in its understanding of the resiliency of state-of-the-art machine learning models against adversarial attacks. As security requires reasoning about adversaries, in this work we assess the robustness of such models through a simple yet effective black-box greedy attack, which modifies the topology and the content of the control flow of the attacked functions. We demonstrate that this attack successfully compromises all the models, achieving average attack success rates of 57.06% and 95.81% depending on the problem setting (targeted and untargeted attacks). Our findings are insightful: top performance on clean data does not necessarily translate into top robustness, which highlights the performance-robustness trade-offs one should consider when deploying such models and calls for further research.

</details>
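
A hedged sketch of a generic black-box greedy attack loop in the spirit of the entry above: repeatedly apply whichever semantics-preserving edit most lowers the model's similarity score, until the score drops below a threshold or the budget is spent. Every name here (`similarity`, `edits`, the toy Jaccard demo) is a hypothetical placeholder, not the authors' code, transformations, or models.

```python
# Hedged sketch of an untargeted black-box greedy attack; placeholders only.
from typing import Callable, List


def greedy_attack(func, target,
                  similarity: Callable,      # black-box similarity score in [0, 1]
                  edits: List[Callable],     # semantics-preserving transformations
                  budget: int = 50,
                  threshold: float = 0.5):
    """Drive similarity(func, target) below threshold with at most `budget` edits."""
    current = func
    for _ in range(budget):
        score = similarity(current, target)
        if score < threshold:
            break                            # attack succeeded
        # Apply each candidate edit once and keep the most damaging result.
        candidates = [edit(current) for edit in edits]
        best_score, best_candidate = min(
            ((similarity(c, target), c) for c in candidates),
            key=lambda pair: pair[0])
        if best_score >= score:
            break                            # no edit lowers the score further
        current = best_candidate
    return current


if __name__ == "__main__":
    # Toy stand-ins: a "function" is a set of mnemonics, similarity is Jaccard,
    # and the single edit pads the function with a fresh dead instruction.
    jaccard = lambda a, b: len(a & b) / max(len(a | b), 1)
    pad = lambda f: f | {f"nop_{len(f)}"}
    query = {"push", "mov", "call", "ret"}
    adversarial = greedy_attack(set(query), query, jaccard, [pad], budget=20)
    print(jaccard(query, query), "->", round(jaccard(adversarial, query), 2))
```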

<details>

<summary>2024-12-05 15:03:26 - PBP: Post-training Backdoor Purification for Malware Classifiers</summary>

- *Dung Thuy Nguyen, Ngoc N. Tran, Taylor T. Johnson, Kevin Leach*

- `2412.03441v2` - [abs](http://arxiv.org/abs/2412.03441v2) - [pdf](http://arxiv.org/pdf/2412.03441v2)

> In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks on ML malware classifiers. For instance, adversaries could inject malicious samples into public malware repositories, contaminating the training data and potentially causing the ML model to misclassify malware. Current countermeasures predominantly focus on detecting poisoned samples by leveraging disagreements within the outputs of a diverse set of ensemble models on training data points. However, these methods are not suitable for scenarios where Machine Learning-as-a-Service (MLaaS) is used or when users aim to remove backdoors from a model after it has been trained. Addressing this scenario, we introduce PBP, a post-training defense for malware classifiers that mitigates various types of backdoor embeddings without assuming any specific backdoor embedding mechanism. Our method exploits the influence of backdoor attacks on the activation distribution of neural networks, independent of the trigger-embedding method. In the presence of a backdoor attack, the activation distribution of each layer is distorted into a mixture of distributions. By regulating the statistics of the batch normalization layers, we can guide a backdoored model to perform similarly to a clean one. Our method demonstrates substantial advantages over several state-of-the-art methods, as evidenced by experiments on two datasets, two types of backdoor methods, and various attack configurations. Notably, our approach requires only a small portion of the training data -- only 1% -- to purify the backdoor and reduce the attack success rate from 100% to almost 0%, a 100-fold improvement over the baseline methods. Our code is available at https://github.com/judydnguyen/pbp-backdoor-purification-official.

</details>
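
One simple way to read the entry above's idea of "regulating the statistics of the batch normalization layers" is to re-estimate the BN running statistics on a small clean subset (the abstract mentions using about 1% of the training data). The sketch below does exactly that with PyTorch; it is an assumption-laden illustration, not the authors' PBP implementation, which is available at the repository linked in the abstract.

```python
# Hedged sketch: refresh batch-norm running statistics with clean data only.
import torch
import torch.nn as nn


@torch.no_grad()
def recalibrate_batchnorm(model: nn.Module, clean_loader) -> nn.Module:
    """Re-estimate BN running mean/var using forward passes over clean batches."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.reset_running_stats()   # forget potentially poisoned statistics
            m.momentum = None         # use a cumulative moving average instead
    model.train()                     # BN updates its buffers only in train mode
    for x, _ in clean_loader:         # labels unused; a small clean subset suffices
        model(x)
    model.eval()
    return model


if __name__ == "__main__":
    # Toy malware classifier over 128-dim feature vectors (hypothetical).
    net = nn.Sequential(nn.Linear(128, 64), nn.BatchNorm1d(64),
                        nn.ReLU(), nn.Linear(64, 2))
    loader = [(torch.randn(32, 128), torch.zeros(32, dtype=torch.long))
              for _ in range(5)]
    recalibrate_batchnorm(net, loader)
```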
