Direct PhD
IIT Madras
Jan 2015 – Present
Malware Behavior-as-a-Service
An in-depth analysis of the impact of malware across multiple layers of cyber-connected systems is crucial for confronting evolving cyber-attacks. Gleaning such insights requires executing malware samples in analysis frameworks and observing their run-time characteristics. However, the evasive nature of malware, its dependence on real-world conditions, Internet connectivity, and short-lived remote servers to reveal its behavior, and the catastrophic consequences of its execution, pose significant challenges in collecting its real-world run-time behavior in analysis environments.
In this context, we propose JUGAAD, a malware behavior-as-a-service to meet the demands for the safe execution of malware. Such a service enables the users to submit malware hashes or programs and retrieve their precise and comprehensive real-world run-time characteristics. Unlike prior services that analyze malware and present verdicts on maliciousness and analysis reports, JUGAAD provides raw run-time characteristics to foster unbounded research while alleviating the unpredictable risks involved in executing them. JUGAAD facilitates such a service with a back-end that executes a regular supply of malware samples on a real-world testbed to feed a growing data-corpus that is used to serve the users. With heterogeneous compute and Internet connectivity, the testbed ensures real-world conditions for malware to operate while containing its ramifications. The simultaneous capture of multiple execution artifacts across the system stack, including network, operating system, and hardware, presents a comprehensive view of malware activity to foster multi-dimensional research. Finally, the automated mechanisms in JUGAAD ensure that the data-corpus is continually growing and is up to date with the changing malware landscape.
Real-World Datasets for AI Powered Run-time Detection of Cyber-Attacks
Artificial Intelligence-based techniques on malware run-time behavior have emerged as a promising tool in the arms race against sophisticated and stealthy cyber-attacks. While data of malware run-time features are critical for research and benchmark comparisons, unfortunately, there is a dearth of real-world datasets due to multiple challenges to their collection. The evasive nature of malware, its dependence on connected real-world conditions and active remote servers to execute, and its potential repercussions pose significant challenges for executing malware in laboratory settings. Consequently, prior open datasets rely on isolated virtual sandboxes to run malware, resulting in data that is not representative of malware behavior in the wild.
This paper presents RaDaR, an open real-world dataset for run-time behavioral analysis of Windows malware. RaDaR is collected by executing malware on a real-world testbed with Internet connectivity and in a timely manner, thus providing a close-to-real-world representation of malware behavior. To enable an unbiased comparison of different solutions and foster multiple verticals in malware research, RaDaR provides a multi-perspective data collection and labeling of malware activity. The multi-perspective collection provides a comprehensive view of malware activity across the network, operating system (OS), and hardware. On the other hand, the multi-perspective labeling provides four independent perspectives to analyze the same malware, including its methodology, objective, capabilities, and the information it exfiltrates. To date, RaDaR includes 7 million network packets, 11.3 million OS system call traces, and 3.3 million hardware events of 10,434 malware samples having different methodologies (3 classes) and objectives (9 classes), spread across 30 well-known malware families.
Early Malware Detection
A malware goes through multiple stages in its life-cycle at the target machine before mounting its expected attack. The entire life-cycle can span anywhere from a few weeks to several months. The network communications during the initial phase could be the earliest indicators of a malware infection. While prior works have leveraged network traffic, none have focused on the temporal analysis of how early can the malware be detected. The main challenges here are the difficulty in differentiating benign-looking malware communications in the early stages of the malware life-cycle. In our quest to build an early warning system, we analyze malware communications to identify such early indicators.
Case-sensitive Malware Detection
Malware programs differ in their objectives and threat levels that range from mere pop-ups to financial losses and fatal sabotages. Consequently, malware differ in their functionality leaving varying footprints in different system components, namely Network, Operating System (OS), and Hardware. To detect malware, most contemporary run-time solutions make use of generic models that utilize the program behavior at one of these components, missing critical information that can be collected from the other system components.
In this paper, we propose a malware detection framework called SUNDEW, that takes a cross-dimensional view of malware behavior considering run-time traces in the network, OS, and hardware components. SUNDEW makes use of a bouquet of predictors, each tuned on data from a specific malware class in a particular system component. SUNDEW then uses a hierarchical strategy to aggregate the predictions taking into consideration the threat-level of a malware class and the dynamic noise behavior in the system components.
We evaluate SUNDEW on a cross-dimensional behavioral dataset of more than 10,000 malware samples, from 8 malware classes, collected on a real-world testbed. SUNDEW achieves an F1-Score of 1 for most malware classes, an average F1-Score of 0.93 in precisely detecting any malware class, and 0.82 under highly noisy conditions, with an average overhead as low as 1.5% at the end-host machines.
Early mitigation of DDoS Attacks
Volumetric Distributed Denial of Service (DDoS) attacks are a significant concern for information technology-based organizations. These attacks result in significant revenue losses in terms of wastage of resources and unavailability of services at the victim (e.g., business websites, DNS servers, etc.) as well as the Internet Service Providers (ISPs) along the path of the attack. The state-of-the-art DDoS mitigation mechanisms attempt to alleviate the losses at either the victim or the ISPs, but not both. In this paper, we present Net-Police, which is a traffic patrolling system for DDoS mitigation. Net-Police identifies the sources of attack so that filters can be employed at these sources in order to quickly mitigate the attack. Such a solution effectively prevents the flow of malicious traffic across the ISP networks, thereby benefiting the ISPs also. Net-Police patrols the network by designating a small number of routers as dynamic packet taggers, to prune benign regions in the network, and localize the search to the Autonomous Systems (AS) from which the attack originates. We evaluate the proposed solution on 257 real-world topologies from the Internet Topology Zoo library and the Internet AS level topology. The paper also presents details of our hardware test-bed platform consisting of 30 routers on which network services such as Net-Police can be implemented and studied for on-field feasibility. Our experiments reveal that Net-Police performs better than the state-of-the-art cloud-based and traceback-based solutions in terms of ISP bandwidth savings and availability of the victim to legitimate clients.