The emergence of Infrastructure as a Service (IaaS) frameworks brings new opportunities, but also new challenges in auto scaling, resource allocation, and security. A fundamental challenge underpinning these problems is the continuous tracking and monitoring of resource usage in the system. In this paper, we present ATOM, an efficient and effective framework to automatically track, monitor, and orchestrate resource usage in an IaaS system, which is widely used in cloud infrastructure. We use a novel tracking method to continuously track important system usage metrics with low overhead, and develop a Principal Component Analysis (PCA) based approach to continuously monitor and automatically find anomalies based on the approximated tracking results. We show how to dynamically set the tracking threshold based on the detection results and, further, how to adjust the tracking algorithm to ensure its optimality under dynamic workloads. Lastly, when potential anomalies are identified, we use introspection tools to perform memory forensics on virtual machines (VMs), guided by the analyzed results from tracking and monitoring, to identify malicious behavior inside a VM. We demonstrate the extensibility of ATOM through VM clustering. The performance of our framework is evaluated in an open source IaaS system.
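To make the idea of threshold-based tracking concrete, the following is a minimal sketch and not ATOM's actual tracking method, which the abstract does not specify. It assumes a simple rule: an observer reports a metric only when it drifts more than a fixed threshold from the last reported value, so the receiver's view stays within that threshold of the true value while far fewer messages are sent. The class name `ThresholdTracker` and the parameter `delta` are hypothetical.

```python
from typing import List, Optional


class ThresholdTracker:
    """Report a metric only when it drifts more than `delta` from the last report.

    Illustrative assumption only: a fixed-threshold rule, not ATOM's algorithm.
    """

    def __init__(self, delta: float) -> None:
        self.delta = delta                       # tracking threshold (hypothetical name)
        self.last_reported: Optional[float] = None
        self.messages = 0                        # number of reports actually sent

    def observe(self, value: float) -> Optional[float]:
        """Return the value if it must be reported, otherwise None."""
        if self.last_reported is None or abs(value - self.last_reported) > self.delta:
            self.last_reported = value
            self.messages += 1
            return value
        return None


# Usage: six CPU readings, of which only three cross the threshold.
tracker = ThresholdTracker(delta=5.0)
readings: List[float] = [50.0, 51.0, 53.0, 58.0, 57.0, 70.0]
sent = [v for v in readings if tracker.observe(v) is not None]
```

A smaller `delta` gives a more accurate view at the cost of more messages; setting it dynamically, as the abstract describes, amounts to tuning this trade-off at run time.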
As modern computer systems grow more complex, system event logs carry critical information for diagnosis and monitoring. They are frequently used as a valuable resource in data-driven approaches to enhance system health and stability. A typical procedure in system log analytics is to first parse unstructured logs into structured data, and then apply data mining and machine learning techniques to, and/or build workflow models from, the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files, but applications increasingly demand online monitoring and processing. As a result, a streaming method to parse unstructured logs is needed. We propose Spell, an online streaming method that uses a longest common subsequence (LCS) based approach to parse system event logs. We show how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in a streaming fashion. We also propose an enhancement that finds more accurate message types. We compare Spell against two popular offline batch methods for extracting patterns from system event logs on large real data. The results demonstrate that, even compared with the offline alternatives, Spell is superior in both efficiency and effectiveness.
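The streaming, LCS-based idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the Spell implementation: tokens are split on whitespace, a new line joins the template with which it shares the longest common subsequence if that subsequence covers at least a fraction `threshold` of the line's tokens, and positions that differ are replaced by a wildcard. The names `StreamingParser`, `merge`, `WILDCARD`, and the default threshold of 0.5 are hypothetical choices made for this sketch.

```python
from typing import List

WILDCARD = "*"  # marks a variable (parameter) position; an assumption of this sketch


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists (classic DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    out: List[str] = []
    i, j = m, n
    while i > 0 and j > 0:  # backtrack through the table to recover the tokens
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def merge(template: List[str], tokens: List[str]) -> List[str]:
    """Keep the common tokens and put a single wildcard wherever either side differs."""
    merged: List[str] = []
    i = j = 0
    for tok in lcs(template, tokens):
        gap = False
        while template[i] != tok:  # skipped template tokens (including old wildcards)
            i, gap = i + 1, True
        while tokens[j] != tok:    # skipped tokens of the new line
            j, gap = j + 1, True
        if gap:
            merged.append(WILDCARD)
        merged.append(tok)
        i, j = i + 1, j + 1
    if i < len(template) or j < len(tokens):
        merged.append(WILDCARD)    # trailing tokens that did not match
    return merged


class StreamingParser:
    """Maintain a growing list of message templates, one log line at a time."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.templates: List[List[str]] = []

    def add(self, line: str) -> int:
        """Assign the line to a template (creating one if needed); return its index."""
        tokens = line.split()
        best, best_len = -1, 0
        for idx, tpl in enumerate(self.templates):
            length = len(lcs(tpl, tokens))
            if length > best_len:
                best, best_len = idx, length
        if best >= 0 and best_len >= self.threshold * len(tokens):
            self.templates[best] = merge(self.templates[best], tokens)
            return best
        self.templates.append(tokens)
        return len(self.templates) - 1


parser = StreamingParser()
ids = [parser.add(line) for line in [
    "Temperature 41 exceeds warning threshold",
    "Temperature 43 exceeds warning threshold",
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
]]
```

After these four lines the parser holds two templates, `Temperature * exceeds warning threshold` and `Connection from * closed`. Comparing each line against every stored template is the simplest possible search and scales linearly with the number of templates; a production parser would need an index to avoid that scan.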
We present ATOM, an efficient and effective framework to enable automated tracking, monitoring, and orchestration of resource usage in an Infrastructure as a Service (IaaS) system. We design a novel tracking method to continuously track important performance metrics with low overhead, and develop a Principal Component Analysis (PCA) approach with quality guarantees to continuously monitor and automatically find anomalies based on the approximate tracking results. Lastly, when potential anomalies are identified, we use introspection tools to perform memory forensics on virtual machines (VMs) to identify malicious behavior inside a VM. We deploy ATOM in an IaaS system to monitor VM resource usage and to detect anomalies. Various attacks are used as examples to demonstrate that ATOM is both effective and efficient in tracking and monitoring resource usage, detecting anomalies, and orchestrating system resource usage.
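As a rough illustration of PCA-based anomaly detection on resource metrics, the sketch below fits the dominant principal component of normal samples and scores a new sample by its squared distance from that component (the residual). It is a generic textbook construction, not ATOM's method, and it does not reproduce the quality guarantees mentioned above. Using a single component, power iteration, and the sample data are all assumptions made for brevity; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fit_top_component(rows: List[Vector], iters: int = 200) -> Tuple[Vector, Vector]:
    """Return (mean, first principal component) of the rows via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # deterministic start vector (an assumption)
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: start vector lies in the null space
        v = [x / norm for x in w]
    return mean, v


def residual(sample: Vector, mean: Vector, component: Vector) -> float:
    """Squared distance between the centered sample and its projection on the component."""
    x = [s - m for s, m in zip(sample, mean)]
    proj = sum(a * b for a, b in zip(x, component))
    return sum((a - proj * b) ** 2 for a, b in zip(x, component))


# Hypothetical (CPU %, memory %) samples in which memory grows with CPU.
normal = [[10.0, 20.0], [20.0, 40.0], [30.0, 60.0], [40.0, 80.0]]
mean, pc = fit_top_component(normal)

score_ok = residual([35.0, 70.0], mean, pc)   # follows the usual CPU/memory pattern
score_bad = residual([25.0, 10.0], mean, pc)  # breaks the pattern
```

A sample is flagged when its residual exceeds a chosen cutoff. Here `score_ok` is close to zero while `score_bad` is large, so any cutoff between the two separates them; how to pick that cutoff in practice is left open in this sketch.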