* update 2024-05-09 06:16:57

yuriufo · May 8, 2024 · 21ab3b7 · 21ab3b7
1 parent 105a480
commit 21ab3b7
Show file tree

Hide file tree

Showing 2 changed files with 73 additions and 1 deletion.
diff --git a/arXiv_db/Malware/2024.md b/arXiv_db/Malware/2024.md
@@ -1342,3 +1342,75 @@
 
 </details>
 
+<details>
+
+<summary>2024-05-07 04:10:01 - Assemblage: Automatic Binary Dataset Construction for Machine Learning</summary>
+
+- *Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt, Kristopher Micinski*
+
+- `2405.03991v1` - [abs](http://arxiv.org/abs/2405.03991v1) - [pdf](http://arxiv.org/pdf/2405.03991v1)
+
+> Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while there exist large corpuses of malicious binaries, obtaining high-quality corpuses of benign binaries for modern systems has proven challenging (e.g., due to licensing issues). Consequently, machine learning based pipelines for binary analysis utilize either costly commercial corpuses (e.g., VirusTotal) or open-source binaries (e.g., coreutils) available in limited quantities. To address these issues, we present Assemblage: an extensible cloud-based distributed system that crawls, configures, and builds Windows PE binaries to obtain high-quality binary corpuses suitable for training state-of-the-art models in binary analysis. We have run Assemblage on AWS over the past year, producing 890k Windows PE and 428k Linux ELF binaries across 29 configurations. Assemblage is designed to be both reproducible and extensible, enabling users to publish "recipes" for their datasets, and facilitating the extraction of a wide array of features. We evaluated Assemblage by using its data to train modern learning-based pipelines for compiler provenance and binary function similarity. Our results illustrate the practical need for robust corpuses of high-quality Windows PE binaries in training modern learning-based binary analyses. Assemblage can be downloaded from https://assemblage-dataset.net
+
+</details>
+
+<details>
+
+<summary>2024-05-07 04:59:19 - Explainability-Informed Targeted Malware Misclassification</summary>
+
+- *Quincy Card, Kshitiz Aryal, Maanak Gupta*
+
+- `2405.04010v1` - [abs](http://arxiv.org/abs/2405.04010v1) - [pdf](http://arxiv.org/pdf/2405.04010v1)
+
+> In recent years, there has been a surge in malware attacks across critical infrastructures, requiring further research and development of appropriate response and remediation strategies in malware detection and classification. Several works have used machine learning models for malware classification into categories, and deep neural networks have shown promising results. However, these models have shown its vulnerabilities against intentionally crafted adversarial attacks, which yields misclassification of a malicious file. Our paper explores such adversarial vulnerabilities of neural network based malware classification system in the dynamic and online analysis environments. To evaluate our approach, we trained Feed Forward Neural Networks (FFNN) to classify malware categories based on features obtained from dynamic and online analysis environments. We use the state-of-the-art method, SHapley Additive exPlanations (SHAP), for the feature attribution for malware classification, to inform the adversarial attackers about the features with significant importance on classification decision. Using the explainability-informed features, we perform targeted misclassification adversarial white-box evasion attacks using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks against the trained classifier. Our results demonstrated high evasion rate for some instances of attacks, showing a clear vulnerability of a malware classifier for such attacks. We offer recommendations for a balanced approach and a benchmark for much-needed future research into evasion attacks against malware classifiers, and develop more robust and trustworthy solutions.
+
+</details>
+
+<details>
+
+<summary>2024-05-07 07:55:45 - Going Proactive and Explanatory Against Malware Concept Drift</summary>
+
+- *Yiling He, Junchi Lei, Zhan Qin, Kui Ren*
+
+- `2405.04095v1` - [abs](http://arxiv.org/abs/2405.04095v1) - [pdf](http://arxiv.org/pdf/2405.04095v1)
+
+> Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially with new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive understanding of malware concepts and provide limited guidance for effective drift adaptation, leading to unstable detection performance and high human labeling costs.   To address these limitations, we introduce DREAM, a novel system designed to surpass the capabilities of existing drift detectors and to establish an explanatory drift adaptation process. DREAM enhances drift detection through model sensitivity and data autonomy. The detector, trained in a semi-supervised approach, proactively captures malware behavior concepts through classifier feedback. During testing, it utilizes samples generated by the detector itself, eliminating reliance on extensive training data. For drift adaptation, DREAM enlarges human intervention, enabling revisions of malware labels and concept explanations embedded within the detector's latent space. To ensure a comprehensive response to concept drift, it facilitates a coordinated update process for both the classifier and the detector. Our evaluation shows that DREAM can effectively improve the drift detection accuracy and reduce the expert analysis effort in adaptation across different malware datasets and classifiers.
+
+</details>
+
+<details>
+
+<summary>2024-05-07 08:25:12 - The Malware as a Service ecosystem</summary>
+
+- *Constantinos Patsakis, David Arroyo, Fran Casino*
+
+- `2405.04109v1` - [abs](http://arxiv.org/abs/2405.04109v1) - [pdf](http://arxiv.org/pdf/2405.04109v1)
+
+> The goal of this chapter is to illuminate the operational frameworks, key actors, and significant cybersecurity implications of the Malware as a Service (MaaS) ecosystem. Highlighting the transformation of malware proliferation into a service-oriented model, the chapter discusses how MaaS democratises access to sophisticated cyberattack capabilities, enabling even those with minimal technical knowledge to execute catastrophic cyberattacks. The discussion extends to the roles within the MaaS ecosystem, including malware developers, affiliates, initial access brokers, and the essential infrastructure providers that support these nefarious activities. The study emphasises the profound challenges MaaS poses to traditional cybersecurity defences, rendered ineffective against the constantly evolving and highly adaptable threats generated by MaaS platforms. With the increase in malware sophistication, there is a parallel call for a paradigm shift in defensive strategies, advocating for dynamic analysis, behavioural detection, and the integration of AI and machine learning techniques. By exploring the intricacies of the MaaS ecosystem, including the economic motivations driving its growth and the blurred lines between legitimate service models and cyber crime, the chapter presents a comprehensive overview intended to foster a deeper understanding among researchers and cybersecurity professionals. The ultimate goal is to aid in developing more effective strategies for combating the spread of commoditised malware threats and safeguarding against the increasing accessibility and scalability of cyberattacks facilitated by the MaaS model.
+
+</details>
+
+<details>
+
+<summary>2024-05-07 14:33:05 - SmmPack: Obfuscation for SMM Modules with TPM Sealed Key</summary>
+
+- *Kazuki Matsuo, Satoshi Tanda, Kuniyasu Suzaki, Yuhei Kawakoya, Tatsuya Mori*
+
+- `2405.04355v1` - [abs](http://arxiv.org/abs/2405.04355v1) - [pdf](http://arxiv.org/pdf/2405.04355v1)
+
+> System Management Mode (SMM) is the highest-privileged operating mode of x86 and x86-64 processors. Through SMM exploitation, attackers can tamper with the Unified Extensible Firmware Interface (UEFI) firmware, disabling the security mechanisms implemented by the operating system and hypervisor. Vulnerabilities enabling SMM code execution are often reported as Common Vulnerabilities and Exposures (CVEs); however, no security mechanisms currently exist to prevent attackers from analyzing those vulnerabilities. To increase the cost of vulnerability analysis of SMM modules, we introduced SmmPack. The core concept of SmmPack involves encrypting an SMM module with the key securely stored in a Trusted Platform Module (TPM). We assessed the effectiveness of SmmPack in preventing attackers from obtaining and analyzing SMM modules using various acquisition methods. Our results show that SmmPack significantly increases the cost by narrowing down the means of module acquisition. Furthermore, we demonstrated that SmmPack operates without compromising the performance of the original SMM modules. We also clarified the management and adoption methods of SmmPack, as well as the procedure for applying BIOS updates, and demonstrated that the implementation of SmmPack is realistic.
+
+</details>
+
+<details>
+
+<summary>2024-05-07 14:57:24 - Leveraging LSTM and GAN for Modern Malware Detection</summary>
+
+- *Ishita Gupta, Sneha Kumari, Priya Jha, Mohona Ghosh*
+
+- `2405.04373v1` - [abs](http://arxiv.org/abs/2405.04373v1) - [pdf](http://arxiv.org/pdf/2405.04373v1)
+
+> The malware booming is a cyberspace equal to the effect of climate change to ecosystems in terms of danger. In the case of significant investments in cybersecurity technologies and staff training, the global community has become locked up in the eternal war with cyber security threats. The multi-form and changing faces of malware are continuously pushing the boundaries of the cybersecurity practitioners employ various approaches like detection and mitigate in coping with this issue. Some old mannerisms like signature-based detection and behavioral analysis are slow to adapt to the speedy evolution of malware types. Consequently, this paper proposes the utilization of the Deep Learning Model, LSTM networks, and GANs to amplify malware detection accuracy and speed. A fast-growing, state-of-the-art technology that leverages raw bytestream-based data and deep learning architectures, the AI technology provides better accuracy and performance than the traditional methods. Integration of LSTM and GAN model is the technique that is used for the synthetic generation of data, leading to the expansion of the training datasets, and as a result, the detection accuracy is improved. The paper uses the VirusShare dataset which has more than one million unique samples of the malware as the training and evaluation set for the presented models. Through thorough data preparation including tokenization, augmentation, as well as model training, the LSTM and GAN models convey the better performance in the tasks compared to straight classifiers. The research outcomes come out with 98% accuracy that shows the efficiency of deep learning plays a decisive role in proactive cybersecurity defense. Aside from that, the paper studies the output of ensemble learning and model fusion methods as a way to reduce biases and lift model complexity.
+
+</details>
+