Efficient Ranking and Aggregate Query Processing for Probabilistic Data

When dealing with massive quantities of data, ranking and aggregate queries are powerful techniques for focusing attention on the most important answers. Many applications that produce such massive quantities of data inherently introduce uncertainty at the same time: for example, probabilistic matches in data integration, imprecise measurements from sensors, fuzzy duplicates in data cleaning, and inconsistencies in scientific data. Hence, these queries are even more important for probabilistic data, where a relation can encode exponentially many possible worlds. Uncertainty also opens the gate to many possible definitions of ranking and aggregate queries. Given the wide presence of probabilistic data, processing ranking and aggregate queries efficiently and with the right semantics is key to the successful deployment of probabilistic databases.


  • Funded by the NSF IIS Program under the project "III:SMALL:Efficient Ranking and Aggregate Query Processing for Probabilistic Data", sole PI, Feifei Li, 09/01/09-08/31/12, $328,831