* update 2024-08-07 06:19:16

yuriufo · Aug 6, 2024 · 3d370ba · 3d370ba
1 parent b82777f
commit 3d370ba
Show file tree

Hide file tree

Showing 2 changed files with 73 additions and 1 deletion.
diff --git a/arXiv_db/Malware/2024.md b/arXiv_db/Malware/2024.md
@@ -2254,3 +2254,75 @@
 
 </details>
 
+<details>
+
+<summary>2024-08-03 04:21:24 - Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector</summary>
+
+- *Xingyuan Wei, Ce Li, Qiujian Lv, Ning Li, Degang Sun, Yan Wang*
+
+- `2408.01661v1` - [abs](http://arxiv.org/abs/2408.01661v1) - [pdf](http://arxiv.org/pdf/2408.01661v1)
+
+> In dynamic Windows malware detection, deep learning models are extensively deployed to analyze API sequences. Methods based on API sequences play a crucial role in malware prevention. However, due to the continuous updates of APIs and the changes in API sequence calls leading to the constant evolution of malware variants, the detection capability of API sequence-based malware detection models significantly diminishes over time. We observe that the API sequences of malware samples before and after evolution usually have similar malicious semantics. Specifically, compared to the original samples, evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. For instance, they access similar sensitive system resources and extend new malicious functions based on the original functionalities. In this paper, we propose a frame(MME), a framework that can enhance existing API sequence-based malware detectors and mitigate the adverse effects of malware evolution. To help detection models capture the similar semantics of these post-evolution API sequences, our framework represents API sequences using API knowledge graphs and system resource encodings and applies contrastive learning to enhance the model's encoder. Results indicate that, compared to Regular Text-CNN, our framework can significantly reduce the false positive rate by 13.10% and improve the F1-Score by 8.47% on five years of data, achieving the best experimental results. Additionally, evaluations show that our framework can save on the human costs required for model maintenance. We only need 1% of the budget per month to reduce the false positive rate by 11.16% and improve the F1-Score by 6.44%.
+
+</details>
+
+<details>
+
+<summary>2024-08-04 11:55:24 - Reinforcement Learning for an Efficient and Effective Malware Investigation during Cyber Incident Response</summary>
+
+- *Dipo Dunsin, Mohamed Chahine Ghanem, Karim Ouazzane, Vassil Vassilev*
+
+- `2408.01999v1` - [abs](http://arxiv.org/abs/2408.01999v1) - [pdf](http://arxiv.org/pdf/2408.01999v1)
+
+> This research focused on enhancing post-incident malware forensic investigation using reinforcement learning RL. We proposed an advanced MDP post incident malware forensics investigation model and framework to expedite post incident forensics. We then implement our RL Malware Investigation Model based on structured MDP within the proposed framework. To identify malware artefacts, the RL agent acquires and examines forensics evidence files, iteratively improving its capabilities using Q Table and temporal difference learning. The Q learning algorithm significantly improved the agent ability to identify malware. An epsilon greedy exploration strategy and Q learning updates enabled efficient learning and decision making. Our experimental testing revealed that optimal learning rates depend on the MDP environment complexity, with simpler environments benefiting from higher rates for quicker convergence and complex ones requiring lower rates for stability. Our model performance in identifying and classifying malware reduced malware analysis time compared to human experts, demonstrating robustness and adaptability. The study highlighted the significance of hyper parameter tuning and suggested adaptive strategies for complex environments. Our RL based approach produced promising results and is validated as an alternative to traditional methods notably by offering continuous learning and adaptation to new and evolving malware threats which ultimately enhance the post incident forensics investigations.
+
+</details>
+
+<details>
+
+<summary>2024-08-04 15:42:34 - PromptSAM+: Malware Detection based on Prompt Segment Anything Model</summary>
+
+- *Xingyuan Wei, Yichen Liu, Ce Li, Ning Li, Degang Sun, Yan Wang*
+
+- `2408.02066v1` - [abs](http://arxiv.org/abs/2408.02066v1) - [pdf](http://arxiv.org/pdf/2408.02066v1)
+
+> Machine learning and deep learning (ML/DL) have been extensively applied in malware detection, and some existing methods demonstrate robust performance. However, several issues persist in the field of malware detection: (1) Existing work often overemphasizes accuracy at the expense of practicality, rarely considering false positive and false negative rates as important metrics. (2) Considering the evolution of malware, the performance of classifiers significantly declines over time, greatly reducing the practicality of malware detectors. (3) Prior ML/DL-based efforts heavily rely on ample labeled data for model training, largely dependent on feature engineering or domain knowledge to build feature databases, making them vulnerable if correct labels are scarce. With the development of computer vision, vision-based malware detection technology has also rapidly evolved. In this paper, we propose a visual malware general enhancement classification framework, `PromptSAM+', based on a large visual network segmentation model, the Prompt Segment Anything Model(named PromptSAM+). Our experimental results indicate that 'PromptSAM+' is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives. The proposed method outperforms the most advanced image-based malware detection technologies on several datasets. 'PromptSAM+' can mitigate aging in existing image-based malware classifiers, reducing the considerable manpower needed for labeling new malware samples through active learning. We conducted experiments on datasets for both Windows and Android platforms, achieving favorable outcomes. Additionally, our ablation experiments on several datasets demonstrate that our model identifies effective modules within the large visual network.
+
+</details>
+
+<details>
+
+<summary>2024-08-05 08:41:07 - On the Robustness of Malware Detectors to Adversarial Samples</summary>
+
+- *Muhammad Salman, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Muhammad Ikram, Sidharth Kaushik, Mohamed Ali Kaafar*
+
+- `2408.02310v1` - [abs](http://arxiv.org/abs/2408.02310v1) - [pdf](http://arxiv.org/pdf/2408.02310v1)
+
+> Adversarial examples add imperceptible alterations to inputs with the objective to induce misclassification in machine learning models. They have been demonstrated to pose significant challenges in domains like image classification, with results showing that an adversarially perturbed image to evade detection against one classifier is most likely transferable to other classifiers. Adversarial examples have also been studied in malware analysis. Unlike images, program binaries cannot be arbitrarily perturbed without rendering them non-functional. Due to the difficulty of crafting adversarial program binaries, there is no consensus on the transferability of adversarially perturbed programs to different detectors. In this work, we explore the robustness of malware detectors against adversarially perturbed malware. We investigate the transferability of adversarial attacks developed against one detector, against other machine learning-based malware detectors, and code similarity techniques, specifically, locality sensitive hashing-based detectors. Our analysis reveals that adversarial program binaries crafted for one detector are generally less effective against others. We also evaluate an ensemble of detectors and show that they can potentially mitigate the impact of adversarial program binaries. Finally, we demonstrate that substantial program changes made to evade detection may result in the transformation technique being identified, implying that the adversary must make minimal changes to the program binary.
+
+</details>
+
+<details>
+
+<summary>2024-08-05 08:46:46 - A Lean Transformer Model for Dynamic Malware Analysis and Detection</summary>
+
+- *Tony Quertier, Benjamin Marais, Grégoire Barrué, Stéphane Morucci, Sévan Azé, Sébastien Salladin*
+
+- `2408.02313v1` - [abs](http://arxiv.org/abs/2408.02313v1) - [pdf](http://arxiv.org/pdf/2408.02313v1)
+
+> Malware is a fast-growing threat to the modern computing world and existing lines of defense are not efficient enough to address this issue. This is mainly due to the fact that many prevention solutions rely on signature-based detection methods that can easily be circumvented by hackers. Therefore, there is a recurrent need for behavior-based analysis where a suspicious file is ran in a secured environment and its traces are collected to reports for analysis. Previous works have shown some success leveraging Neural Networks and API calls sequences extracted from these execution reports.   Recently, Large Language Models and Generative AI have demonstrated impressive capabilities mainly in Natural Language Processing tasks and promising applications in the cybersecurity field for both attackers and defenders.   In this paper, we design an Encoder-Only model, based on the Transformers architecture, to detect malicious files, digesting their API call sequences collected by an execution emulation solution. We are also limiting the size of the model architecture and the number of its parameters since it is often considered that Large Language Models may be overkill for specific tasks such as the one we are dealing with hereafter. In addition to achieving decent detection results, this approach has the advantage of reducing our carbon footprint by limiting training and inference times and facilitating technical operations with less hardware requirements.   We also carry out some analysis of our results and highlight the limits and possible improvements when using Transformers to analyze malicious files.
+
+</details>
+
+<details>
+
+<summary>2024-08-05 17:01:33 - Command-line Obfuscation Detection using Small Language Models</summary>
+
+- *Vojtech Outrata, Michael Adam Polak, Martin Kopp*
+
+- `2408.02637v1` - [abs](http://arxiv.org/abs/2408.02637v1) - [pdf](http://arxiv.org/pdf/2408.02637v1)
+
+> To avoid detection, adversaries often use command-line obfuscation. There are numerous techniques of the command-line obfuscation, all designed to alter the command-line syntax without affecting its original functionality. This variability forces most security solutions to create an exhaustive enumeration of signatures for even a single pattern. In contrast to using signatures, we have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model that can be applied to any source of execution logs. The evaluation on top of real-world telemetry demonstrates that our approach yields high-precision detections even on high-volume telemetry from a diverse set of environments spanning from universities and businesses to healthcare or finance. The practical value is demonstrated in a case study of real-world samples detected by our model. We show the model's superiority to signatures on established malware known to employ obfuscation and showcase previously unseen obfuscated samples detected by our model.
+
+</details>
+