TWC: Medium: TCloud: A Self-Defending, Self-Evolving and Self-Accounting Trustworthy Cloud Platform


The use of cloud computing has revolutionized the way in which cyber infrastructure is used and managed. The on-demand access to seemingly infinite resources provided by this paradigm has enabled technical innovation and indeed innovative business models and practices. This rosy picture is threatened, however, by increasing nefarious interest in cloud platforms. Specifically, the shared tenant, shared resource nature of cloud platforms, as well as the natural accrual of valuable information in cloud platforms, provide both the incentive and the possible means of exploitation.To address these concerns we are developing a self-defending, self-evolving, and self-accounting trustworthy cloud platform, the TCloud. Our approach in realizing TCloud holds to the following five tenets. First, defense-in-depth through innate containment, separation and diversification at the architectural level. Second, least authority by clear separation of functionality and associated privilege within the architecture. Third, explicit orchestration of security functions based on cloud-derived and external intelligence. Fourth, moving-target-defense through deception and dynamic evolution of the platform. Fifth, verifiable accountability through light weight validation and auditable monitoring, record keeping and analysis.Our approach to fundamentally refactor the cloud architecture to explicitly enable security related functionality lays the foundation for truly trustworthy cloud computing. Given the unrelenting push towards the use of cloud technologies our work has broad applicability across industry, healthcare, government and academia. All software we develop will be released to the community in open source form.


Funding


  • co-PI, NSF SaTC Program, 09/01/13-08/31/17, $999,991.
  • http://www.nsf.gov/awardsearch/showAward?AWD_ID=1314945

  • People


    Feifei Li
    Associate Professor


    Min Du
    PhD student. Research Interest: machine learning techniques for system mining.



    Publications


  • ATOM: Automated Tracking, Orchestration, and Monitoring of Resource Usage in Infrastructure as a Service Systems, (Project Website), Talk
    By Min Du,    Feifei Li
    In Proceedings of IEEE (IEEE BIGDATA 2015),  pages 271-278,  Santa Clara CA,  November,  2015.
    Abstract

    We present ATOM, an efficient and effective framework to enable automated tracking, monitoring, and orchestration of resource usage in an Infrastructure as a Service (IaaS) system. We design a novel tracking method to continuously track important performance metrics with low overhead, and develop a Principal Component Analysis (PCA) approach with quality guarantees to continuously monitor and automatically find anomalies based on the approximate tracking results. Lastly, when potential anomalies are identified, we use introspection tools to perform memory forensics on virtual machines (VMs) to identify malicious behavior inside a VM. We deploy ATOM in an IaaS system to monitor VM resource usage, and to detect anomalies. Various attacks are used as an example to demonstrate how ATOM is both effective and efficient to track and monitor resource usage, detect anomalies, and orchestrate system resource usage.

  • Spell: Streaming Parsing of System Event Logs, Talk
    By Min Du,    Feifei Li
    In Proceedings of In Proceedings of 16th IEEE International Conference on Data Mining (ICDM 2016),  pages 859-864,  Barcelona, Spain,  December,  2016.
    Abstract

    System event logs contain critical information for diagnosis and monitoring purposes with the growing complexity of modern computer systems. They have been frequently used as a valuable resource in data-driven approaches to enhance system health and stability. A typical procedure in system log analytics is to first parse unstructured logs to structured data, and then apply data mining and machine learning techniques and/or build workflow models from the resulting structured data. Previous work on parsing system event logs focused on offline, batch processing of raw log files. But increasingly, applications demand online monitoring and processing. As a result, a streaming method to parse unstructured logs is needed. We propose an online streaming method Spell, which utilizes a longest common subsequence based approach, to parse system event logs. We show how to dynamically extract log patterns from incoming logs and how to maintain a set of discovered message types in streaming fashion. Enhancement to find more accurate message types is also proposed. We compare Spell against two popular offline batched methods to extract patterns from system event logs on large real data. The results demonstrate that, even compared with the offline alternatives, Spell shows its superiority in terms of both efficiency and effectiveness.

  • ATOM: Efficient Tracking, Monitoring, and Orchestration of Cloud Resources (Project Website)
    By Min Du,    Feifei Li
    Vol.0, To Appear IEEE Transactions on Parallel and Distributed Systems (TPDS),  2017.
    Abstract

    The emergence of Infrastructure as a Service framework brings new opportunities, which also accompanies with new challenges in auto scaling, resource allocation, and security. A fundamental challenge underpinning these problems is the continuous tracking and monitoring of resource usage in the system. In this paper, we present ATOM, an efficient and effective framework to automatically track, monitor, and orchestrate resource usage in an Infrastructure as a Service (IaaS) system that is widely used in cloud infrastructure. We use novel tracking method to continuously track important system usage metrics with low overhead, and develop a Principal Component Analysis (PCA) based approach to continuously monitor and automatically find anomalies based on the approximated tracking results. We show how to dynamically set the tracking threshold based on the detection results, and further, how to adjust tracking algorithm to ensure its optimality under dynamic workloads. Lastly, when potential anomalies are identified, we use introspection tools to perform memory forensics on VMs guided by analyzed results from tracking and monitoring to identify malicious behavior inside a VM. We demonstrate the extensibility of ATOM through virtual machine (VM) clustering. The performance of our framework is evaluated in an open source IaaS system