* update 2024-10-16 06:20:24

yuriufo · Oct 15, 2024 · 51248e0 · 51248e0
1 parent dae492d
commit 51248e0
Show file tree

Hide file tree

Showing 2 changed files with 25 additions and 1 deletion.
diff --git a/arXiv_db/Malware/2024.md b/arXiv_db/Malware/2024.md
@@ -3042,3 +3042,27 @@
 
 </details>
 
+<details>
+
+<summary>2024-10-12 07:10:44 - A Novel Approach to Malicious Code Detection Using CNN-BiLSTM and Feature Fusion</summary>
+
+- *Lixia Zhang, Tianxu Liu, Kaihui Shen, Cheng Chen*
+
+- `2410.09401v1` - [abs](http://arxiv.org/abs/2410.09401v1) - [pdf](http://arxiv.org/pdf/2410.09401v1)
+
+> With the rapid advancement of Internet technology, the threat of malware to computer systems and network security has intensified. Malware affects individual privacy and security and poses risks to critical infrastructures of enterprises and nations. The increasing quantity and complexity of malware, along with its concealment and diversity, challenge traditional detection techniques. Static detection methods struggle against variants and packed malware, while dynamic methods face high costs and risks that limit their application. Consequently, there is an urgent need for novel and efficient malware detection techniques to improve accuracy and robustness.   This study first employs the minhash algorithm to convert binary files of malware into grayscale images, followed by the extraction of global and local texture features using GIST and LBP algorithms. Additionally, the study utilizes IDA Pro to decompile and extract opcode sequences, applying N-gram and tf-idf algorithms for feature vectorization. The fusion of these features enables the model to comprehensively capture the behavioral characteristics of malware.   In terms of model construction, a CNN-BiLSTM fusion model is designed to simultaneously process image features and opcode sequences, enhancing classification performance. Experimental validation on multiple public datasets demonstrates that the proposed method significantly outperforms traditional detection techniques in terms of accuracy, recall, and F1 score, particularly in detecting variants and obfuscated malware with greater stability.   The research presented in this paper offers new insights into the development of malware detection technologies, validating the effectiveness of feature and model fusion, and holds promising application prospects.
+
+</details>
+
+<details>
+
+<summary>2024-10-14 05:13:48 - BinSimDB: Benchmark Dataset Construction for Fine-Grained Binary Code Similarity Analysis</summary>
+
+- *Fei Zuo, Cody Tompkins, Qiang Zeng, Lannan Luo, Yung Ryn Choe, Junghwan Rhee*
+
+- `2410.10163v1` - [abs](http://arxiv.org/abs/2410.10163v1) - [pdf](http://arxiv.org/pdf/2410.10163v1)
+
+> Binary Code Similarity Analysis (BCSA) has a wide spectrum of applications, including plagiarism detection, vulnerability discovery, and malware analysis, thus drawing significant attention from the security community. However, conventional techniques often face challenges in balancing both accuracy and scalability simultaneously. To overcome these existing problems, a surge of deep learning-based work has been recently proposed. Unfortunately, many researchers still find it extremely difficult to conduct relevant studies or extend existing approaches. First, prior work typically relies on proprietary benchmark without making the entire dataset publicly accessible. Consequently, a large-scale, well-labeled dataset for binary code similarity analysis remains precious and scarce. Moreover, previous work has primarily focused on comparing at the function level, rather than exploring other finer granularities. Therefore, we argue that the lack of a fine-grained dataset for BCSA leaves a critical gap in current research. To address these challenges, we construct a benchmark dataset for fine-grained binary code similarity analysis called BinSimDB, which contains equivalent pairs of smaller binary code snippets, such as basic blocks. Specifically, we propose BMerge and BPair algorithms to bridge the discrepancies between two binary code snippets caused by different optimization levels or platforms. Furthermore, we empirically study the properties of our dataset and evaluate its effectiveness for the BCSA research. The experimental results demonstrate that BinSimDB significantly improves the performance of binary code similarity comparison.
+
+</details>
+