* update 2025-01-08 06:20:29

yuriufo · Jan 7, 2025 · e8c80a0 · e8c80a0
1 parent 818d9d7
commit e8c80a0
Show file tree

Hide file tree

Showing 2 changed files with 25 additions and 1 deletion.
diff --git a/arXiv_db/Malware/2025.md b/arXiv_db/Malware/2025.md
@@ -42,3 +42,27 @@
 
 </details>
 
+<details>
+
+<summary>2025-01-05 10:04:58 - Predicting Vulnerability to Malware Using Machine Learning Models: A Study on Microsoft Windows Machines</summary>
+
+- *Marzieh Esnaashari, Nima Moradi*
+
+- `2501.02493v1` - [abs](http://arxiv.org/abs/2501.02493v1) - [pdf](http://arxiv.org/pdf/2501.02493v1)
+
+> In an era of escalating cyber threats, malware poses significant risks to individuals and organizations, potentially leading to data breaches, system failures, and substantial financial losses. This study addresses the urgent need for effective malware detection strategies by leveraging Machine Learning (ML) techniques on extensive datasets collected from Microsoft Windows Defender. Our research aims to develop an advanced ML model that accurately predicts malware vulnerabilities based on the specific conditions of individual machines. Moving beyond traditional signature-based detection methods, we incorporate historical data and innovative feature engineering to enhance detection capabilities. This study makes several contributions: first, it advances existing malware detection techniques by employing sophisticated ML algorithms; second, it utilizes a large-scale, real-world dataset to ensure the applicability of findings; third, it highlights the importance of feature analysis in identifying key indicators of malware infections; and fourth, it proposes models that can be adapted for enterprise environments, offering a proactive approach to safeguarding extensive networks against emerging threats. We aim to improve cybersecurity resilience, providing critical insights for practitioners in the field and addressing the evolving challenges posed by malware in a digital landscape. Finally, discussions on results, insights, and conclusions are presented.
+
+</details>
+
+<details>
+
+<summary>2025-01-06 16:29:32 - Leveraging Large Language Models to Detect npm Malicious Packages</summary>
+
+- *Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams*
+
+- `2403.12196v4` - [abs](http://arxiv.org/abs/2403.12196v4) - [pdf](http://arxiv.org/pdf/2403.12196v4)
+
+> Existing malicious code detection techniques demand the integration of multiple tools to detect different malware patterns, often suffering from high misclassification rates. Therefore, malicious code detection techniques could be enhanced by adopting advanced, more automated approaches to achieve high accuracy and a low misclassification rate. The goal of this study is to aid security analysts in detecting malicious packages by empirically studying the effectiveness of Large Language Models (LLMs) in detecting malicious code. We present SocketAI, a malicious code review workflow to detect malicious code. To evaluate the effectiveness of SocketAI, we leverage a benchmark dataset of 5,115 npm packages, of which 2,180 packages have malicious code. We conducted a baseline comparison of GPT-3 and GPT-4 models with the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious Javascript code. We also compare the effectiveness of static analysis as a pre-screener with SocketAI workflow, measuring the number of files that need to be analyzed. and the associated costs. Additionally, we performed a qualitative study to understand the types of malicious activities detected or missed by our workflow. Our baseline comparison demonstrates a 16% and 9% improvement over static analysis in precision and F1 scores, respectively. GPT-4 achieves higher accuracy with 99% precision and 97% F1 scores, while GPT-3 offers a more cost-effective balance at 91% precision and 94% F1 scores. Pre-screening files with a static analyzer reduces the number of files requiring LLM analysis by 77.9% and decreases costs by 60.9% for GPT-3 and 76.1% for GPT-4. Our qualitative analysis identified data theft, execution of arbitrary code, and suspicious domain categories as the top detected malicious packages.
+
+</details>
+