Tuesday 25 February 2020

Representative Sample Selection via Frequent Subgraph Analysis

First, accurately separating the malicious components from the legitimate part of Android malware, most of which are repackaged popular apps, is nontrivial [9]–[12]. Zhou and Jiang [7] found that 86% of Android malware samples are repackaged apps produced by injecting malicious components into legitimate apps. The injected malicious components are hidden within the functionalities of popular apps and usually constitute only a small portion of the repackaged apps. Differentiating between the legitimate part and the malicious components of malware is difficult for existing features, such as system calls [13] and sensitive paths [14].

Second, polymorphic variants of Android malware that belong to the same family perform the same malicious activities with different implementations. Therefore, such malware can easily evade existing classification solutions [15], [16] that seek an exact match of a given specification. For example, Listing 1 illustrates different implementations of the same functionality (i.e., obtaining the device ID, phone number, and voice mail number) in two malware samples. The two malware samples belong to the same family, geinimi. These bot-like malware samples steal personal information and send it to a remote server. Three major differences (highlighted in red) are observed in the two implementations. First, the structures of the class names are different. Second, the arguments of the two functions are different. One takes a service (Lcom/geinimi/Adservice), one of the four basic components of Android apps, as an argument. By contrast, the other uses an object of the class rally/e as an argument. Third, the former function contains two more statements (including one invocation) than the latter.
To address the above challenges, we propose a novel approach that exploits the following two observations:

Observation 1. Android malware usually invokes sensitive application program interface (API) calls that operate on sensitive data to perform malicious activities. For example, the malware samples presented in Listing 1 invoke getLine1Number() to obtain the phone number of users.

Observation 2. Malware and its variants within the same family invoke sensitive API calls by following similar patterns even if their code is obfuscated. As illustrated in Listing 1, three commonly invoked sensitive API calls (i.e., getDeviceId(), getLine1Number(), and getVoiceMailNumber()), which are highlighted in blue, exist in the two methods of different malware samples. The three sensitive API calls are sequentially invoked in the two methods, thus illustrating a similar pattern of sensitive API calls in different samples within the same family.

By exploiting the above two observations, we first distill program semantics into a function call graph (FCG) representation and assign different weights to different sensitive API calls with a term frequency-inverse document frequency (TF-IDF)-like approach (see Section II-A for details). TF-IDF is a numerical statistic that evaluates the importance of a word to a document in a collection or corpus.

Then, we propose two key techniques to solve the challenges (see Section II-B for details), as follows: 1) We propose a clustering-based approach to extract common malicious behavior in each family and to address the inaccurate separation of malicious components and the legitimate part of repackaged apps. Thus, we can reduce the side effects of the legitimate part in the malware. 2) For the different implementations of the same functionality, we propose a weighted-sensitive-API-call-based graph matching approach to calculate the similarity between graphs generated by community detection algorithms.
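The TF-IDF idea mentioned above can be sketched as follows. This is a minimal illustration of the classic TF-IDF formula, treating each malware sample as a "document" and each sensitive API call as a "word"; it is not the exact weighting defined in Section II-A, which is not reproduced here. The function name `tf_idf` and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(samples):
    """Classic TF-IDF over samples, each given as a list of API call names.

    Assumption: this is the textbook formula, used as a stand-in for the
    TF-IDF-like weighting the text refers to.
    """
    n = len(samples)
    # Document frequency: in how many samples does each API call appear?
    df = Counter()
    for calls in samples:
        df.update(set(calls))
    weights = []
    for calls in samples:
        counts = Counter(calls)
        total = len(calls)
        weights.append({
            api: (count / total) * math.log(n / df[api])
            for api, count in counts.items()
        })
    return weights

# Hypothetical samples; only the API names come from real Android classes.
samples = [
    ["getDeviceId", "getLine1Number", "getVoiceMailNumber"],
    ["getDeviceId", "getLine1Number"],
    ["getDeviceId", "sendTextMessage"],
]
weights = tf_idf(samples)
```

In this toy data, getDeviceId() appears in every sample and therefore receives a weight of zero, whereas getVoiceMailNumber(), which appears in only one sample, receives the largest weight in the first sample. Rarer calls are thus treated as more distinctive.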
Community detection algorithms determine whether a graph has community structure, that is, whether its nodes can be easily grouped into sets such that each set is internally densely connected. Our approach can detect homogeneous malicious behavior while tolerating minor differences in implementation, such as function renaming and junk-code insertion. Sensitive API calls constitute only a small portion of all Android API calls, and they cannot be easily obfuscated by typical existing obfuscation techniques, whereas the names of user-defined functions are usually obfuscated as a, b, or c.

To represent common malicious behaviors shared by malware samples within the same family, we construct frequent subgraphs (fregraphs), which are novel graph-based features extracted from the generated FCGs, on the basis of the two key techniques. Moreover, we propose and develop FalDroid, an automatic system for classifying Android malware and selecting representative samples of each family in accordance with fregraphs, in 8,100 lines of Java code and 900 lines of Python code. We apply FalDroid to 8,407 malware samples in 36 different families and find that it exhibits impressive familial classification performance. Moreover, it can effectively reduce workload and accelerate malware analysis.

In summary, our major contributions include the following:

(i) We propose fregraph, a novel graph-based feature, to represent the common behavior of malware within the same family. We then employ fregraph to conduct malware familial classification and representative malware selection.
(ii) We propose a novel weighted-sensitive-API-call-based graph matching approach that can detect the homogeneous malicious behavior of malware within the same family while tolerating minor differences in implementation.
(iii) We design and implement FalDroid, a novel system that can handle the familial classification of large-scale Android malware with high accuracy and effectively decrease the number of malware samples to be analyzed.
(iv) We conduct extensive experiments to evaluate FalDroid. Our results show that FalDroid can achieve 94.2% accuracy and requires only approximately 4.6 sec to process an app. Moreover, it can also dramatically decrease the cost of malware investigation by selecting only 8.5% to 22% of samples as representatives that exhibit the most common malicious behavior among all samples.

The remainder of this paper is organized as follows. The methodology of FalDroid is detailed in Section II, and its two usages are presented in Section III. The experimental results are reported in Section IV. After providing a discussion of the limitations of FalDroid in Section V, we introduce related work in Section VI. We conclude the paper with a discussion of future work in Section VII.
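To make the idea of weighted matching concrete, the sketch below compares two communities by the sensitive API calls they contain, using a weighted Jaccard ratio. This measure is an assumption chosen for illustration, not the matching algorithm of Section II-B; the function name `weighted_similarity`, the weights, and the two communities are all hypothetical. Names of user-defined functions are ignored entirely, which is why renaming them does not change the score.

```python
def weighted_similarity(apis_a, apis_b, weights):
    """Weighted Jaccard similarity between two sets of sensitive API calls.

    Assumption: a simple stand-in for weighted-sensitive-API-call-based
    graph matching. Calls missing from `weights` count as weight 0.
    """
    shared = apis_a & apis_b
    union = apis_a | apis_b
    total = sum(weights.get(api, 0.0) for api in union)
    if total == 0:
        return 0.0  # no weighted calls at all, so nothing to compare
    return sum(weights.get(api, 0.0) for api in shared) / total

# Hypothetical weights, e.g. produced by a TF-IDF-like step.
weights = {
    "getDeviceId": 0.2,
    "getLine1Number": 0.5,
    "getVoiceMailNumber": 0.8,
    "sendTextMessage": 0.5,
}
# Hypothetical communities from two different samples.
community_a = {"getDeviceId", "getLine1Number", "getVoiceMailNumber"}
community_b = {"getLine1Number", "getVoiceMailNumber", "sendTextMessage"}

similarity = weighted_similarity(community_a, community_b, weights)
```

Here the two communities share getLine1Number() and getVoiceMailNumber(), whose weights sum to 1.3 out of a total of 2.0, so the similarity is about 0.65. A higher-weight shared call raises the score more than a lower-weight one.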

Android Malware Familial Classification

The rapid increase in the number of Android malware samples poses great challenges to anti-malware systems because the sheer number of samples overwhelms malware analysis systems. The classification of malware samples into families, such that the common features shared by malware samples in the same family can be exploited in malware detection and inspection, is a promising approach for accelerating malware analysis. Furthermore, the selection of representative malware samples in each family can drastically decrease the number of malware samples to be analyzed. However, existing classification solutions are limited for the following reasons. First, the legitimate part of the malware may misguide the classification algorithms because the majority of Android malware is constructed by inserting malicious components into popular apps. Second, the polymorphic variants of Android malware can evade detection by employing transformation attacks. In this work, we propose a novel approach that constructs frequent subgraphs (fregraphs) to represent the common behaviors of malware samples that belong to the same family. Moreover, we propose and develop FalDroid, a novel system that automatically classifies Android malware and selects representative malware samples in accordance with fregraphs. We apply it to 8,407 malware samples from 36 families. Experimental results show that FalDroid can correctly classify 94.2% of malware samples into their families using approximately 4.6 sec per app. FalDroid can also dramatically reduce the cost of malware investigation by selecting only 8.5% to 22% of samples as representatives that exhibit the most common malicious behavior among all samples.
In the third quarter of 2016, Android, the most popular mobile operating system, accounted for 86.8% of the market share of smartphones [1]. Meanwhile, it has become the target of 97% of mobile malware [2]. A recent security report shows that, on average, 38,000 new mobile malware samples were captured per day during the third quarter of 2016 [3]. The analysis of each malware sample requires ample time [4]–[6]. Hence, the sheer number of malware samples overwhelms malware analysis systems.

The majority of new malware samples are polymorphic variants of known malware [7], [8]. Thus, to accelerate malware analysis, we can classify malware samples into various families and then select representative samples from each family. However, the familial classification of Android malware is challenging for two reasons.

First, accurately separating the malicious components from the legitimate part of Android malware, most of which are repackaged popular apps, is nontrivial [9]–[12]. Zhou and Jiang [7] found that 86% of Android malware samples are repackaged apps produced by injecting malicious components into legitimate apps. The injected malicious components are hidden within the functionalities of popular apps and usually constitute only a small portion of the repackaged apps. Differentiating between the legitimate part and the malicious components of malware is difficult for existing features, such as system calls [13] and sensitive paths [14].

Second, polymorphic variants of Android malware that belong to the same family perform the same malicious activities with different implementations. Therefore, such malware can easily evade existing classification solutions [15], [16] that seek an exact match of a given specification. For example, Listing 1 illustrates different implementations of the same functionality (i.e., obtaining the device ID, phone number, and voice mail number) in two malware samples. The two malware samples belong to the same family, geinimi. These bot-like malware samples steal personal information and send it to a remote server. Three major differences (highlighted in red) are observed in the two implementations. First, the structures of the class names are different. Second, the arguments of the two functions are different. One takes a service (Lcom/geinimi/Adservice), one of the four basic components of Android apps, as an argument. By contrast, the other uses an object of the class rally/e as an argument. Third, the former function contains two more statements (including one invocation) than the latter.
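The evasion described above can be illustrated with a small sketch. The two call sequences below are hypothetical stand-ins for the two methods in Listing 1 (which is not reproduced here): they differ in their user-defined function names and in the number of statements, so an exact comparison fails, yet the order of their sensitive API calls is identical. The helper name `sensitive_calls` and the sequences are assumptions made for illustration.

```python
# The three API names are the ones mentioned in the text; everything else
# (helper name, call sequences) is hypothetical.
SENSITIVE_APIS = {"getDeviceId", "getLine1Number", "getVoiceMailNumber"}

def sensitive_calls(call_sequence):
    """Keep only the sensitive API calls, preserving their order."""
    return [name for name in call_sequence if name in SENSITIVE_APIS]

# Variant A has extra statements and differently named user functions.
variant_a = ["a", "getDeviceId", "log", "getLine1Number", "getVoiceMailNumber", "b"]
variant_b = ["c", "getDeviceId", "getLine1Number", "getVoiceMailNumber"]

# An exact match of the full sequences fails ...
exact_match = variant_a == variant_b
# ... while the pattern of sensitive API calls is the same.
pattern_match = sensitive_calls(variant_a) == sensitive_calls(variant_b)
```

With these sequences, `exact_match` is false and `pattern_match` is true, which is the gap that exact-match classifiers leave open.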