Zhuoyue Zhao
PhD Student, Research Interests: distributed systems, big data analytics.



Conference

2016

  • Wander Join: Online Aggregation via Random Walks, Talk
    By Feifei Li, Bin Wu, Ke Yi, Zhuoyue Zhao
    In Proceedings of the 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), pages 615-629, June 2016.
    Abstract

    Joins are expensive, and online aggregation over joins was proposed to mitigate the cost, which offers users a nice and flexible tradeoff between query efficiency and accuracy in a continuous, online fashion. However, the state-of-the-art approach, in both internal and external memory, is based on ripple join, which is still very expensive and even needs unrealistic assumptions (e.g., tuples in a table are stored in random order). This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph. Wander join produces independent but non-uniform samples, which gives huge performance gains over the uniform but non-independent samples returned by ripple join. We also design an optimizer that chooses the optimal plan for conducting the random walks without having to collect any statistics a priori. Wander join works for different types of joins, including chain, acyclic, and cyclic joins, with selection predicates and group-by clauses. It easily extends to $\theta$-joins as well. Extensive experiments using the TPC-H benchmark have demonstrated the superior performance of wander join over ripple join in both internal and external memory. We have also implemented and tested wander join in the latest version of PostgreSQL; the results show its excellent efficiency and effectiveness in a full-fledged, commercial-level DBMS.

  • Wander Join: Online Aggregation For Joins, Talk
    By Feifei Li, Bin Wu, Ke Yi, Zhuoyue Zhao
    In Proceedings of the 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016) (Demo Paper), pages 2121-2124, June 2016.
    Abstract

    Joins are expensive, and online aggregation over joins was proposed to mitigate the cost, which offers a nice and flexible tradeoff between query efficiency and accuracy in a continuous, online fashion. However, the state-of-the-art approach, in both internal and external memory, is based on ripple join, which is still very expensive and may also need very restrictive assumptions (e.g., tuples in a table are stored in random order). We have introduced a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph. We have also implemented and tested wander join in the latest version of PostgreSQL; this is the first time online aggregation has been integrated into a full-fledged, commercial-level DBMS.