Commit
* update 2024-10-11 06:20:36
actions-user committed Oct 10, 2024
1 parent 75310b0 commit a03477d
Showing 2 changed files with 49 additions and 1 deletion.
48 changes: 48 additions & 0 deletions arXiv_db/Malware/2024.md
@@ -2982,3 +2982,51 @@

</details>

<details>

<summary>2024-10-07 21:15:40 - Cybersecurity Threat Hunting and Vulnerability Analysis Using a Neo4j Graph Database of Open Source Intelligence</summary>

- *Elijah Pelofske, Lorie M. Liebrock, Vincent Urias*

- `2301.12013v2` - [abs](http://arxiv.org/abs/2301.12013v2) - [pdf](http://arxiv.org/pdf/2301.12013v2)

> Open source intelligence is a powerful tool for cybersecurity analysts to gather information both for analysis of discovered vulnerabilities and for detecting novel cybersecurity threats and exploits. However, the scale of information that is relevant for information security on the internet is constantly increasing and is intractable for analysts to parse comprehensively. Therefore, methods of condensing the available open source intelligence and automatically developing connections between disparate sources of information are incredibly valuable. In this research, we present a system which constructs a Neo4j graph database formed by shared connections between open source intelligence texts, including blogs, cybersecurity bulletins, news sites, antivirus scans, social media posts (e.g., Reddit and Twitter), and threat reports. These connections comprise possible indicators of compromise (e.g., IP addresses, domains, hashes, email addresses, phone numbers), information on known exploits and techniques (e.g., CVEs and MITRE ATT&CK Technique IDs), and potential sources of information on cybersecurity exploits such as Twitter usernames. The construction of the database of potential IoCs is detailed, including the addition of machine learning and metadata that can be used to filter the data for a specific domain (for example, a specific natural language) when needed. Examples of utilizing the graph database for querying connections between known malicious IoCs and open source intelligence documents, including threat reports, are shown. We show three specific examples of interesting connections found in the graph database: the connections to a known exploited CVE, a known malicious IP address, and a malware hash signature.
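
A minimal sketch of the core idea above, assuming the official `neo4j` Python driver; the node labels, relationship type, and extraction regexes are illustrative assumptions, not the paper's actual schema or pipeline. Documents and the IoCs they mention are MERGEd into the graph, so two reports citing the same indicator become connected through a shared node.

```python
# Sketch: link a document to the IoCs it mentions in Neo4j.
# Labels, relationship types, and regexes are illustrative assumptions,
# not the schema used in the paper.
import re
from neo4j import GraphDatabase

IOC_PATTERNS = {
    "IPv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SHA256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,7}\b"),
}

def ingest_document(driver, doc_id, text):
    """Create a Document node and MERGE each extracted IoC, so documents
    that mention the same indicator become connected through shared nodes."""
    with driver.session() as session:
        session.run("MERGE (d:Document {id: $id})", id=doc_id)
        for label, pattern in IOC_PATTERNS.items():
            for value in set(pattern.findall(text)):
                session.run(
                    f"MERGE (i:{label} {{value: $value}}) "
                    "WITH i MATCH (d:Document {id: $id}) "
                    "MERGE (d)-[:MENTIONS]->(i)",
                    value=value, id=doc_id,
                )

if __name__ == "__main__":
    # Connection details are placeholders for a local Neo4j instance.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    ingest_document(driver, "report-001", "Callback to 203.0.113.7 exploiting CVE-2021-44228.")
    driver.close()
```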

</details>

<details>

<summary>2024-10-08 16:00:27 - Detecting Android Malware by Visualizing App Behaviors from Multiple Complementary Views</summary>

- *Zhaoyi Meng, Jiale Zhang, Jiaqi Guo, Wansen Wang, Wenchao Huang, Jie Cui, Hong Zhong, Yan Xiong*

- `2410.06157v1` - [abs](http://arxiv.org/abs/2410.06157v1) - [pdf](http://arxiv.org/pdf/2410.06157v1)

> Deep learning has emerged as a promising technology for Android malware detection. To further unleash its detection potential, software visualization can be integrated to analyze the details of app behaviors clearly. However, facing increasingly sophisticated malware, existing visualization-based methods, which analyze apps from only one view or a few randomly selected views, can detect only limited attack types. We propose and implement LensDroid, a novel technique that detects Android malware by visualizing app behaviors from multiple complementary views. Our goal is to harness the power of combining deep learning and software visualization to automatically capture and aggregate high-level features that are not inherently linked, thereby revealing the hidden maliciousness of Android app behaviors. To thoroughly comprehend the details of apps, we visualize app behaviors from three related but distinct views: behavioral sensitivities, operational contexts, and supported environments. We then extract high-order semantics from the views accordingly. To exploit the semantic complementarity of the views, we design a deep neural network based model that fuses the visualized features from local to global based on their contributions to downstream tasks. A comprehensive comparison with five baseline techniques is performed on datasets of more than 51K apps in three typical real-world scenarios, including overall threats, app evolution, and zero-day malware. The experimental results show that the overall performance of LensDroid is better than that of the baseline techniques. We also validate the complementarity of the views and demonstrate that the multi-view fusion in LensDroid enhances Android malware detection.
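
A rough PyTorch sketch of the multi-view fusion idea, not LensDroid's actual architecture: the encoders, dimensions, and gating scheme below are assumptions. Each visualized view gets its own encoder, and a learned gate weights each view's contribution before a shared classification head.

```python
# Illustrative multi-view fusion classifier (hypothetical dimensions; not
# LensDroid's actual architecture): one encoder per view, a learned gate
# weights the views, and a shared head predicts malicious vs. benign.
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, num_views=3, in_channels=3, embed_dim=128):
        super().__init__()
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, embed_dim),
            )
            for _ in range(num_views)
        ])
        self.gate = nn.Linear(embed_dim * num_views, num_views)  # per-view weights
        self.classifier = nn.Linear(embed_dim, 2)                # malicious vs. benign

    def forward(self, views):  # views: list of [B, C, H, W] tensors, one per view
        feats = [enc(v) for enc, v in zip(self.encoders, views)]            # num_views x [B, D]
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)  # [B, num_views]
        stacked = torch.stack(feats, dim=1)                                 # [B, num_views, D]
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)                # [B, D]
        return self.classifier(fused)

model = MultiViewFusion()
dummy_views = [torch.randn(4, 3, 64, 64) for _ in range(3)]  # toy inputs
logits = model(dummy_views)  # shape [4, 2]
```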

</details>

<details>

<summary>2024-10-09 01:09:24 - Multi-label Classification for Android Malware Based on Active Learning</summary>

- *Qijing Qiao, Ruitao Feng, Sen Chen, Fei Zhang, Xiaohong Li*

- `2410.06444v1` - [abs](http://arxiv.org/abs/2410.06444v1) - [pdf](http://arxiv.org/pdf/2410.06444v1)

> Existing malware classification approaches (i.e., binary and family classification) can barely benefit subsequent analysis with their outputs. Even family classification approaches suffer from the lack of a formal naming standard and an incomplete definition of malicious behaviors. More importantly, existing approaches are powerless when one malware sample exhibits multiple malicious behaviors, which is a very common phenomenon for Android malware in the wild. Thus, none of them can provide researchers with a direct and sufficiently comprehensive understanding of malware. In this paper, we propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors. With an in-depth analysis, we summarize six basic malicious behaviors from real-world malware with security reports and construct a labeled dataset. We compare the results of 70 algorithm combinations to evaluate the effectiveness (best at 73.3%). Faced with the challenge of the expensive cost of data annotation, we further propose an active learning approach based on data augmentation, which improves the overall accuracy to 86.7% by augmenting the data with 5,000+ high-quality samples from an unlabeled malware dataset. This is the first multi-label Android malware classification approach intended to provide more information on fine-grained malicious behaviors.
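
A generic scikit-learn sketch of multi-label classification with an active-learning query loop on synthetic data. Note the paper's strategy is built on data augmentation; the simpler uncertainty-sampling loop below only illustrates how unlabeled samples might be selected for annotation, and the feature dimensions and label count are assumptions.

```python
# Generic sketch of multi-label classification plus uncertainty-based active
# learning (the paper's approach relies on data augmentation; this loop only
# illustrates the label-querying idea). Features and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(200, 32))
y_labeled = rng.integers(0, 2, size=(200, 6))   # six behavior labels per sample
X_pool = rng.normal(size=(1000, 32))            # unlabeled malware pool

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
for round_idx in range(3):
    clf.fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_pool)            # [pool_size, num_labels]
    uncertainty = np.abs(proba - 0.5).mean(axis=1)
    query = np.argsort(uncertainty)[:50]         # most uncertain samples first
    # In practice an analyst would label these; here we simulate with random labels.
    new_labels = rng.integers(0, 2, size=(len(query), 6))
    X_labeled = np.vstack([X_labeled, X_pool[query]])
    y_labeled = np.vstack([y_labeled, new_labels])
    X_pool = np.delete(X_pool, query, axis=0)
```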

</details>

<details>

<summary>2024-10-09 01:36:25 - Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders</summary>

- *David Noever, Forrest McKee*

- `2410.06462v1` - [abs](http://arxiv.org/abs/2410.06462v1) - [pdf](http://arxiv.org/pdf/2410.06462v1)

> This research builds and evaluates the adversarial potential to introduce copied code or hallucinated AI recommendations for malicious code in popular code repositories. While foundational large language models (LLMs) from OpenAI, Google, and Anthropic guard against both harmful behaviors and toxic strings, previous work on math solutions that embed harmful prompts demonstrates that the guardrails may differ between expert contexts. These loopholes would appear in mixture-of-experts models when the context of the question changes and may offer fewer malicious training examples to filter toxic comments or recommended offensive actions. The present work demonstrates that foundational models may correctly refuse to propose destructive actions when prompted overtly but may unfortunately drop their guard when presented with a sudden change of context, such as solving a computer programming challenge. We show empirical examples with trojan-hosting repositories like GitHub, NPM, NuGet, and popular content delivery networks (CDNs) like jsDelivr, which amplify the attack surface. Under the LLM's directives to be helpful, example recommendations propose application programming interface (API) endpoints which a determined domain-squatter could acquire and use to set up mobile attack infrastructure that triggers from the naively copied code. We compare this attack to previous work on context-shifting and contrast the attack surface as a novel version of "living off the land" attacks in the malware literature. In the latter case, foundational language models can hijack otherwise innocent user prompts to recommend actions that violate their owners' safety policies when posed directly without the accompanying coding support request.
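
One defensive takeaway can be sketched in a few lines; this check is not from the paper, and the package name and registry lookup below are illustrative assumptions. Before trusting AI-recommended dependencies, a reviewer can verify the names actually exist on the public registry, since unclaimed names are exactly what a squatter could later publish.

```python
# Illustrative defensive check (not from the paper): before trusting
# AI-recommended dependencies, verify they exist on the public npm registry.
# Names that return 404 are unclaimed and could be registered by a squatter.
import re
import requests

def extract_npm_requires(code: str) -> set[str]:
    """Collect package names from require() calls in a JavaScript snippet."""
    return set(re.findall(r"""require\(['"]([^'"]+)['"]\)""", code))

def npm_package_exists(name: str) -> bool:
    """Query the public npm registry; a 404 means the name is unclaimed."""
    resp = requests.get(f"https://registry.npmjs.org/{name}", timeout=10)
    return resp.status_code == 200

# 'acme-payments-helper' is a hypothetical AI-suggested dependency.
ai_generated = "const sdk = require('acme-payments-helper');"
for pkg in extract_npm_requires(ai_generated):
    if not npm_package_exists(pkg):
        print(f"WARNING: '{pkg}' is not on npm; a squatter could publish it.")
```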

</details>
