Robert Christensen
PhD student. Research Interest: interactive data analytics and systems.


Book Chapter  Journal  Conference  Workshop  Tech Report]

Conference

2016

  • Spatial Online Sampling and Aggregation, Talk
    By Lu Wang,    Robert Christensen,    Feifei Li,    Ke Yi
    In Proceedings of International Conference on Very Large Data Bases (VLDB 2016),  pages 84-95,  New Delhi India,  August,  2016.
    Abstract

    The massive adoption of smart phones and other mobile devices has generated humongous amount of spatial and spatio-temporal data. The importance of spatial analytics and aggregation is ever-increasing. An important challenge is to support interactive exploration over such data. However, spatial analytics and aggregation using all data points that satisfy a query condition is expensive, especially over large data sets, and could not meet the needs of interactive exploration. To that end, we present novel indexing structures that support spatial online sampling and aggregation on large spatial and spatio-temporal data sets. In spatial online sampling, random samples from the set of spatial (or spatio-temporal) points that satisfy a query condition are generated incrementally in an online fashion. With more and more samples, various spatial analytics and aggregations can be performed in an online, interactive fashion, with estimators that have better accuracy over time. Our design works well for both memory-based and disk-resident data sets, and scales well towards different query and sample sizes. More importantly, our structures are dynamic, hence, they are able to deal with insertions and deletions efficiently. Extensive experiments on large real data sets demonstrate the improvements achieved by our indexing structures compared to other baseline methods.

  • 2015

  • STORM: Spatio-Temporal Online Reasoning and Management of Large Spatio-Temporal Data (Project Website), Talk
    By Robert Christensen ,    Lu Wang,    Feifei Li,    Ke Yi,    Jun Tang,    Natalee Villa
    In Proceedings of 34th ACM SIGMOD International Conference on Management of Data (SIGMOD 2015),  pages 1111-1116,  Melbourne, Australia,  2015.
    Abstract

    We present the STORM system to enable spatio-temporal online reasoning and management of large spatio-temporal data. STORM supports interactive spatio-temporal analytics through novel spatial online sampling techniques. Online spatio-temporal aggregation and analytics are then derived based on the online samples, where approximate answers with approximation quality guarantees can be provided immediately from the start of query execution. The quality of these online approximations improve over time. This demonstration proposal describes key ideas in the design of the STORM system, and presents the demonstration plan.

  • 2013

  • Adaptive Log Compression for Massive Log Data (Project Website), Talk
    By Robert Christensen,    Feifei Li
    In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD 2013, Undergraduate Research Poster) (SIGMOD 2013),  pages 1283-1284,  June,  2013.
    Abstract

    We present novel adaptive log compression schemes. Results show 30% improvement on compression ratios over existing approaches.

  • Optimal Splitters for Temporal and Multi-version Databases, Talk
    By Wangchao Le,    Feifei Li,    Yufei Tao,    Robert Christensen
    In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD 2013),  pages 109-120,  June,  2013.
    Abstract

    Temporal and multi-version databases often generate massive amounts of data, due to the increasing availability of large storage space and the increasing importance of mining and auditing opera- tions from historical data. For example, Google now allows users to limit and rank search results by setting a time range. These data- bases are ideal candidates for a distributed store, which offers large storage space, and parallel and distributed processing power from a cluster of (commodity) machines. A key challenge is to achieve a good load balancing algorithm for storage and processing of these data, which is done by partitioning the database. In this paper, we introduce the concept of optimal splitters for temporal and multi- version databases, which induce a partition of the input data set, and guarantee that the size of the maximum bucket be minimized among all possible configurations, given a budget for the desired number of buckets. We design efficient methods for memory- and disk-resident data respectively, and show that they significantly outperform com- peting baseline methods both theoretically and empirically on large real data sets.