Big data today is stored in a distributed fashion across many different machines or data sources. This poses new algorithmic and system challenges to performing efficient analysis on the full data set. To address these difficulties, the PIs are building the MIDDLE (Mergeable and Interactive Distributed Data LayEr) Summarization System and deploying it on large real-world datasets. The MIDDLE system builds and maintains a special class of summaries that can be efficiently constructed and updated while still allowing fine-grained analysis on the heavy tail. Mergeable summaries can represent any data set with a guaranteed tradeoff between size and accuracy, and any two such summaries can be merged to create a new summary with the same size-accuracy tradeoff.
![]() Guineng Zheng PhD Student. Research Interest: knowledge base construction and application. | ||