Yan Zheng
PhD student



Journal

2013

  • Geometric Inference on Kernel Density Estimates. (Project Website)
    By Jeff M. Phillips, Bei Wang, Yan Zheng
    Vol. abs/1307.7760, CoRR (CORR 2013), 2013.

Conference

2015

  • Geometric Inference on Kernel Density Estimates. (Project Website)
    By Jeff M. Phillips, Bei Wang, Yan Zheng
    In Proceedings of Symposium on Computational Geometry (COMPGEOM 2015), pages 857-871, 2015.
  • Subsampling in Smoothed Range Spaces. (Project Website)
    By Jeff M. Phillips, Yan Zheng
    In Proceedings of ALT (ALT 2015), pages 224-238, 2015.
  • L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data. (Project Website)
    By Yan Zheng, Jeff M. Phillips
    In Proceedings of KDD (KDD 2015), pages 1533-1542, 2015.

2013

  • Quality and Efficiency for Kernel Density Estimates in Large Data. (Talk)
    By Yan Zheng, Jeffrey Jestes, Jeff M. Phillips, Feifei Li
    In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD 2013), pages 433-444, June 2013.
    Abstract

    Kernel density estimates are important for a broad variety of applications including media databases, pattern recognition, computer vision, data mining, and the sciences. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. Our algorithms do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. We demonstrate how to implement our ideas in a centralized setting and in MapReduce, although our algorithms are applicable to any large-scale data processing framework. Extensive experiments on large real datasets demonstrate the quality, efficiency, and scalability of our techniques.