Dong Xie
PhD student. Research Interest: big spatial data systems. Distributed systems.


Book Chapter  Journal  Conference  Workshop  Tech Report]

Journal

2016

  • Exact and Approximate Flexible Aggregate Similarity Search
    By Feifei Li,    Ke Yi,    Yufei Tao,    Bin Yao,    Yang Li,    Dong Xie,    Min Wang
    Vol.25(4), Pages 317-338, The International Journal on Very Large Data Bases (VLDBJ),  2016.
    Abstract

    Aggregate similarity search, also known as aggregate nearest neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the simi- larity is an aggregation (e.g., sum, max) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggre- gation over the distances between p and any subset of phi objects in Q for some support 0 < phi le 1. We call this new definition flexible aggregate similarity search, and accordingly refer to a query as a flexible aggre- gate nearest neighbor (Fann) query. We present algo- rithms for answering Fann queries exactly and approx- imately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experi- ments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality.

  • Conference

    2016

  • Simba: Spatial In-Memory Big Data Analytics (Project Website)
    By Dong Xie,    Feifei Li,    Bin Yao,    Gefei Li,    Liang Zhou,    Zhongpu Chen,    Minyi Guo
    In Proceedings of In Proceedings of 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2016),  pages TBA,  San Francisco, USA,  November,  2016.
    Abstract

    We present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba natively extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and DataFrame API. It enables the construction of indexes over RDDs inside the engine in order to work with big spatial data and complex spatial operations. Simba also comes with an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput in big spatial data analysis. This demonstration proposal describes key ideas in the design of Simba, and presents a demonstration plan.

  • Oblivious RAM: A Dissection and Experimental Evaluation (Project Website), Talk
    By Zhao Chang,    Dong Xie,    Feifei Li
    In Proceedings of Very Large Data Bases (VLDB 2016),  pages 1113-1124,  New Delhi, India,  September,  2016.
    Abstract

    Many companies choose the cloud as their data and IT infrastructure platform. The remote access of the data brings the issue of trust, and the potential risk of compromising sensitive information should not be underestimated. Despite the use of strong encryption schemes, adversaries can still learn valuable information regarding encrypted data by observing the data access patterns. To that end, one can hide the access patterns, which may leak sensitive information, using Oblivious RAMs (ORAMs). Numerous works have proposed different ORAM constructions. Nevertheless, many such ORAM constructions are of only theoretical interest, hence, are notuseful in practice. Several more practical ORAM constructions do exist, but they have never been thoroughly compared against and tested on large databases. There are no open source implementation of these schemes, making such a study challenging to carry out (since most ORAMs are quite contrived in terms of both theoretical analysis and practical implementations).These limitations make it difficult for researchers and practitioners to choose and adopt a suitable ORAM for their applications. To address this issue, we provide a thorough study over several practical ORAM constructions, and implement them under the same library. We perform extensive experiments to provide insights into their performance characteristics with respect to efficiency, scalability, and communication cost. Lastly, we plan to release our ORAM implementations through GitHub so that the communities at large may benefit from and contribute to an open source ORAM library under one unified framework.

  • Simba: Efficient In-Memory Spatial Analytics (Project Website), Talk
    By Dong Xie,    Feifei Li,    Bin Yao,    Gefei Li,    Liang Zhou,    Minyi Guo
    In Proceedings of 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016),  pages 1071-1085,  June,  2016.
    Abstract

    Large spatial data becomes ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces the concept and construction of indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba%u2019s superior performance compared against other spatial analytics system.