<details>

<summary>2024-09-09 08:19:33 - Explainable Malware Analysis: Concepts, Approaches and Challenges</summary>

- *Harikha Manthena, Shaghayegh Shajarian, Jeffrey Kimmell, Mahmoud Abdelsalam, Sajad Khorsandroo, Maanak Gupta*

- `2409.13723v1` - [abs](http://arxiv.org/abs/2409.13723v1) - [pdf](http://arxiv.org/pdf/2409.13723v1)

> Machine learning (ML) has seen exponential growth in recent years, finding applications in various domains such as finance, medicine, and cybersecurity. Malware remains a significant threat to modern computing, frequently used by attackers to compromise systems. While numerous machine learning-based approaches for malware detection achieve high performance, they often lack transparency and fail to explain their predictions. This is a critical drawback in malware analysis, where understanding the rationale behind detections is essential for security analysts to verify and disseminate information. Explainable AI (XAI) addresses this issue by maintaining high accuracy while producing models that provide clear, understandable explanations for their decisions. In this survey, we comprehensively review the current state-of-the-art ML-based malware detection techniques and popular XAI approaches. Additionally, we discuss research implementations and the challenges of explainable malware analysis. This theoretical survey serves as an entry point for researchers interested in XAI applications in malware detection. By analyzing recent advancements in explainable malware analysis, we offer a broad overview of the progress in this field, positioning our work as the first to extensively cover XAI methods for malware classification and detection.

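As a flavor of the XAI methods the survey covers, here is a minimal, hypothetical sketch (not from the paper) applying SHAP to a toy malware classifier; the features and labels are synthetic stand-ins for real static attributes.

```python
# Hypothetical sketch: SHAP, one XAI method the survey covers, explaining
# a toy malware classifier. Features/labels are synthetic stand-ins for
# static attributes such as entropy, file size, or import counts.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 4))                         # 4 fake static features
y = (X[:, 0] + 0.5 * X[:, 2] > 0.9).astype(int)  # synthetic malware label

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)            # per-feature attributions
shap_values = explainer.shap_values(X[:5])       # explain 5 samples
print(np.shape(shap_values))                     # layout varies by shap version
```
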
</details>

<details>

<summary>2024-09-11 12:48:42 - The Philosopher's Stone: Trojaning Plugins of Large Language Models</summary>

- *Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Yan Meng, Shaofeng Li, Zhen Liu, Haojin Zhu*

</details>

<details>

<summary>2024-09-18 17:24:39 - Magika: AI-Powered Content-Type Detection</summary>

- *Yanick Fratantonio, Luca Invernizzi, Loua Farah, Kurt Thomas, Marina Zhang, Ange Albertini, Francois Galilee, Giancarlo Metitieri, Julien Cretin, Alex Petit-Bianco, David Tao, Elie Bursztein*

- `2409.13768v1` - [abs](http://arxiv.org/abs/2409.13768v1) - [pdf](http://arxiv.org/pdf/2409.13768v1)

> The task of content-type detection -- which entails identifying the data encoded in an arbitrary byte sequence -- is critical for operating systems, development, reverse engineering environments, and a variety of security applications. In this paper, we introduce Magika, a novel AI-powered content-type detection tool. Under the hood, Magika employs a deep learning model that can execute on a single CPU with just 1MB of memory to store the model's weights. We show that Magika achieves an average F1 score of 99% across over a hundred content types and a test set of more than 1M files, outperforming all existing content-type detection tools today. In order to foster adoption and improvements, we open source Magika under an Apache 2 license on GitHub and make our model and training pipeline publicly available. Our tool has already seen adoption by the Gmail email provider for attachment scanning, and it has been integrated with VirusTotal to aid with malware analysis. We note that this paper discusses the first iteration of Magika, and a more recent version already supports more than 200 content types. The interested reader can see the latest development on the Magika GitHub repository, available at https://github.com/google/magika.

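Since the tool is open source, a short usage sketch is possible; the snippet below assumes the `magika` Python package (`pip install magika`), and the result attribute names have shifted between releases, so treat them as approximate rather than canonical.

```python
# Sketch of using the open-source Magika Python package for
# content-type detection (pip install magika). Result field names
# have varied across releases, so check your installed version.
from magika import Magika

m = Magika()                                  # loads the ~1MB model
res = m.identify_bytes(b"#!/bin/bash\necho hello\n")
print(res.output.ct_label, res.output.score)  # e.g. a shell-script label with a confidence
```
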
</details>

<details>

<summary>2024-09-20 01:27:34 - A Survey on the Application of Generative Adversarial Networks in Cybersecurity: Prospective, Direction and Open Research Scopes</summary>

- *Md Mashrur Arifin, Md Shoaib Ahmed, Tanmai Kumar Ghosh, Ikteder Akhand Udoy, Jun Zhuang, Jyh-haw Yeh*

</details>

<details>

<summary>2024-09-20 04:50:49 - MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning</summary>

- *Eric Li, Yifan Zhang, Yu Huang, Kevin Leach*

- `2409.13213v1` - [abs](http://arxiv.org/abs/2409.13213v1) - [pdf](http://arxiv.org/pdf/2409.13213v1)

> The recent growth and proliferation of malware have tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are painstakingly manually analyzed before training. Furthermore, as novel malware samples arise that are beyond the scope of the training set, additional reverse engineering effort must be employed to update the training set. The sheer volume of new samples found in the wild creates substantial pressure on practitioners' ability to reverse engineer enough malware to adequately train modern classifiers. In this paper, we present MalMixer, a malware family classifier using semi-supervised learning that achieves high accuracy with sparse training data. We present a novel domain-knowledge-aware technique for augmenting malware feature representations, enhancing few-shot performance of semi-supervised malware family classification. We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings. Our research confirms the feasibility and effectiveness of lightweight, domain-knowledge-aware feature augmentation methods and highlights the capabilities of similar semi-supervised classifiers in addressing malware classification issues.

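The feature-augmentation idea lends itself to a small illustration; the sketch below shows generic mixup-style interpolation of malware feature vectors, a simplification of (not) the authors' domain-knowledge-aware method.

```python
# Sketch of mixup-style feature augmentation for few-shot malware
# classification, in the spirit of MalMixer but NOT the authors' exact
# domain-knowledge-aware technique. Feature vectors are synthetic.
import numpy as np

def mixup_features(x1, x2, alpha=0.4, rng=None):
    """Interpolate two same-family feature vectors into a synthetic sample."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)         # mixing coefficient in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam

rng = np.random.default_rng(0)
x_a, x_b = rng.random(16), rng.random(16)   # two labeled feature vectors
x_new, lam = mixup_features(x_a, x_b, rng=rng)
print(lam, x_new[:4])
```
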
</details>

<details>

<summary>2024-09-20 22:43:47 - Lightweight and Resilient Signatures for Cloud-Assisted Embedded IoT Systems</summary>

- *Saif E. Nouma, Attila A. Yavuz*

- `2409.13937v1` - [abs](http://arxiv.org/abs/2409.13937v1) - [pdf](http://arxiv.org/pdf/2409.13937v1)

> Digital signatures provide scalable authentication with non-repudiation and are vital tools for the Internet of Things (IoT). Many IoT applications harbor vast quantities of resource-limited devices often used with cloud computing. However, key compromises (e.g., physical, malware) pose a significant threat to IoTs due to increased attack vectors and open operational environments. Forward security and distributed key management are critical breach-resilient countermeasures to mitigate such threats. Yet forward-secure signatures are exorbitantly costly for low-end IoTs, while cloud-assisted approaches suffer from centrality or non-colluding semi-honest servers. In this work, we create two novel digital signatures called Lightweight and Resilient Signatures with Hardware Assistance (LRSHA) and its Forward-secure version (FLRSHA). They offer near-optimal signing efficiency with small key and signature sizes. We synergize various design strategies, such as commitment separation to eliminate costly signing operations and hardware-assisted distributed servers to enable breach-resilient verification. Our schemes achieve orders of magnitude faster forward-secure signing and compact key/signature sizes without suffering from strong security assumptions (non-colluding, central servers) or a heavy burden on the verifier (extreme storage, computation). We formally prove the security of our schemes and validate their performance with full-fledged open-source implementations on both commodity hardware and 8-bit AVR microcontrollers.

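Forward security is the key property here; the toy sketch below shows the generic hash-chain key-evolution idea (with an HMAC standing in for a real signature), not the LRSHA/FLRSHA constructions themselves.

```python
# Toy illustration of forward security, NOT the LRSHA/FLRSHA schemes:
# each period's signing key is a one-way hash of the previous one and
# the old key is erased, so compromising sk_t reveals nothing about
# keys from earlier periods. An HMAC stands in for a real signature.
import hashlib
import hmac

def evolve(sk: bytes) -> bytes:
    return hashlib.sha256(sk).digest()   # one-way key update

def mac_sign(sk: bytes, msg: bytes) -> bytes:
    return hmac.new(sk, msg, hashlib.sha256).digest()

sk = b"\x00" * 32                        # initial secret key (demo only)
for period in range(3):
    tag = mac_sign(sk, b"message in period %d" % period)
    print(period, tag.hex()[:16])
    sk = evolve(sk)                      # old key is discarded after use
```
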
</details>

<details>

<summary>2024-09-22 13:29:10 - A Visualized Malware Detection Framework with CNN and Conditional GAN</summary>

- *Fang Wang, Hussam Al Hamadi, Ernesto Damiani*

- `2409.14439v1` - [abs](http://arxiv.org/abs/2409.14439v1) - [pdf](http://arxiv.org/pdf/2409.14439v1)

> Malware visualization analysis combined with Machine Learning (ML) has proven to be a promising solution for improving security defenses on different platforms. In this work, we propose an integrated framework that addresses common problems experienced by ML practitioners in developing malware detection systems. Namely, a pictorial presentation system with extensions is designed to preserve the identities of benign/malign samples by encoding each variable into binary digits and mapping them into black and white pixels. A conditional Generative Adversarial Network based model is adopted to produce synthetic images and mitigate class-imbalance issues. Detection models built on Convolutional Neural Networks are used to validate performance when training on datasets with and without the synthetic samples. Results demonstrate accuracy rates of 98.51% and 97.26% for these two training scenarios.
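
The described pixel encoding is easy to sketch: each byte expands to 8 bits rendered as black/white pixels. The image width and sample data below are illustrative assumptions, not the authors' exact layout.

```python
# Sketch of a binary black/white pixel encoding in the spirit of the
# paper; image width and input data are illustrative assumptions.
import numpy as np
from PIL import Image

def bytes_to_bw_image(data: bytes, width: int = 64) -> Image.Image:
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))
    rows = len(bits) // width                        # drop any ragged tail
    grid = bits[: rows * width].reshape(rows, width) * 255  # 1 -> white
    return Image.fromarray(grid.astype(np.uint8), mode="L")

img = bytes_to_bw_image(bytes(range(256)) * 16)      # stand-in for a sample file
img.save("sample.png")
```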

</details>

<details>

<summary>2024-09-22 21:19:37 - DarkGram: Exploring and Mitigating Cybercriminal content shared in Telegram channels</summary>

- *Sayak Saha Roy, Elham Pourabbas Vafa, Kobra Khanmohammadi, Shirin Nilizadeh*

- `2409.14596v1` - [abs](http://arxiv.org/abs/2409.14596v1) - [pdf](http://arxiv.org/pdf/2409.14596v1)

> We present the first large-scale analysis of 339 cybercriminal activity channels (CACs) on Telegram from February to May 2024. Collectively followed by over 23.8 million users, these channels shared a wide array of illicit content, including compromised credentials, pirated software and media, and blackhat hacking resources such as malware, social engineering scams, and exploit kits. We developed DarkGram, a BERT-based framework that identifies malicious posts from the CACs with an accuracy of 96%, using which we conducted a quantitative analysis of 53,605 posts from these channels, revealing key characteristics of shared content. While much of this content is distributed for free, channel administrators frequently employ promotions and giveaways to engage users and boost the sales of premium cybercriminal content. These channels also pose significant risks to their own subscribers. Notably, 28.1% of shared links contained phishing attacks, and 38% of executable files were bundled with malware. Moreover, our qualitative analysis of replies in CACs shows how subscribers cultivate a dangerous sense of community through requests for illegal content, illicit knowledge sharing, and collaborative hacking efforts, while their reactions to posts, including emoji responses, further underscore their appreciation for such content. We also find that the CACs can evade scrutiny by quickly migrating to new channels with minimal subscriber loss, highlighting the resilience of this ecosystem. To counteract this, we further utilized DarkGram to detect new channels, reporting malicious content to Telegram and the affected organizations, which resulted in the takedown of 196 such channels over three months. To aid further collaborative efforts in taking down these channels, we open-source our dataset and the DarkGram framework.

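DarkGram itself is open-sourced by the authors; to keep things self-contained, the sketch below shows only the generic BERT text-classification pattern such a detector builds on. Here `bert-base-uncased` is a placeholder that would need fine-tuning on labeled channel posts, and the example post is invented.

```python
# Generic BERT classification pattern of the kind DarkGram builds on.
# bert-base-uncased is an untuned placeholder; a real detector would be
# fine-tuned on labeled CAC posts first. The example post is invented.
from transformers import pipeline

clf = pipeline("text-classification", model="bert-base-uncased")
post = "FREE combo list + premium stealer build, DM to buy"
print(clf(post))   # emits generic LABEL_0/LABEL_1 scores until fine-tuned
```
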
</details>

<details>

<summary>2024-09-23 15:32:46 - UTrace: Poisoning Forensics for Private Collaborative Learning</summary>

- *Evan Rose, Hidde Lycklama, Harsh Chaudhari, Anwar Hithnawi, Alina Oprea*

- `2409.15126v1` - [abs](http://arxiv.org/abs/2409.15126v1) - [pdf](http://arxiv.org/pdf/2409.15126v1)

> Privacy-preserving machine learning (PPML) enables multiple data owners to contribute their data privately to a set of servers that run a secure multi-party computation (MPC) protocol to train a joint ML model. In these protocols, the input data remains private throughout the training process, and only the resulting model is made available. While this approach benefits privacy, it also exacerbates the risks of data poisoning, where compromised data owners induce undesirable model behavior by contributing malicious datasets. Existing MPC mechanisms can mitigate certain poisoning attacks, but these measures are not exhaustive. To complement existing poisoning defenses, we introduce UTrace: a framework for User-level Traceback of poisoning attacks in PPML. UTrace computes user responsibility scores using gradient similarity metrics aggregated across the most relevant samples in an owner's dataset. UTrace is effective at low poisoning rates and is resilient to poisoning attacks distributed across multiple data owners, unlike existing unlearning-based methods. We introduce methods for checkpointing gradients with low storage overhead, enabling traceback in the absence of data owners at deployment time. We also design several optimizations that reduce traceback time and communication in MPC. We provide a comprehensive evaluation of UTrace across four datasets from three data modalities (vision, text, and malware) and show its effectiveness against 10 poisoning attacks.

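The core scoring idea is straightforward to caricature; the sketch below ranks data owners by cosine similarity between their averaged gradients and a gradient direction of the observed misbehavior. Shapes and the scoring rule are illustrative assumptions, not the paper's MPC protocol.

```python
# Sketch of the gradient-similarity idea behind UTrace-style traceback:
# score each data owner by cosine similarity between their averaged
# per-sample gradients and the gradient of the observed misbehavior.
# Dimensions and scoring are illustrative, not the paper's protocol.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
attack_grad = rng.normal(size=128)                 # gradient of the bad behavior
owners = {f"owner_{i}": rng.normal(size=(50, 128)) for i in range(4)}
owners["owner_3"] += 0.5 * attack_grad             # one owner poisons

scores = {u: cosine(g.mean(axis=0), attack_grad) for u, g in owners.items()}
print(max(scores, key=scores.get), scores)         # highest score is most suspect
```
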
</details>
