Towards Trustworthy Database Systems

Answers to database queries often form the basis for critical decision-making. To improve efficiency and reliability, answers to these queries can be provided by distributed servers close to the querying clients. However, because of the servers' ubiquity, the logistics associated with fully securing them may be prohibitive; moreover, when the servers are run by third parties, the clients may not trust them as much as they trust the original data owners. Thus, the authenticity of the answers provided by servers in response to clients' queries must be verifiable by the clients. More generally, database responses are more useful if they contain the evidence of their own correctness. For example, this enables a consumer to provide her own credit report to a creditor without having the creditor request it from the reporting agency to establish the validity of the report. This project is developing methods for authenticating the validity and authenticity of a variety of database queries, including general relational, data cube, and spatio-temporal queries. Furthermore, techniques that use powerful cryptographic primitives are being developed for providing authentication and confidentiality. This research will enable utilization of this infrastructure in applications where users must rely on the authenticity of the answer, such as in financial systems, network monitoring, traffic control, or applications yet to be imagined. The results of this project will be disseminated through publications in journals and conferences. Furthermore, source code of these methods, in the form of libraries, will be made available over the web. This is a collaborative project with the Datbase Lab at Boston University, with Prof. George Kollios and Prof. Leonid Reyzin.


  • Funded by NSF Trustworthy Computing Program udner the project ''CT-ISG: Collaborative Research: Towards Trustworthy Database Systems'', sole PI, Feifei Li, 10/01/08-09/30/11, $162,620

  • Publications

  • Randomized Synopses for Query Assurance on Data Streams (Project Website), Talk
    By Ke Yi,    Feifei Li,    Marios Hadjieleftheriou,    George Kollios,    Divesh Srivastava
    In Proceedings of 24th IEEE International Conference on Data Engineering (ICDE 2008),  pages 416-425,  Cancun, Mexico,  April,  2008.

    The overwhelming flow of information in manydata stream applications forces many companies to outsourceto a third-party the deployment of a Data Stream Management System (DSMS) for performing desired computations. Remotecomputations intrinsically raise issues of trust, making query execution assurance on data streams a problem with practicalimplications. Consider a client observing the same data stream asa remote server (e.g., network traffic), that registers a continuousquery on the server’s DSMS, and receives answers upon request.The client needs to verify the integrity of the results usingsignificantly fewer resources than evaluating the query locally.Towards that goal, we propose a probabilistic algorithm forselection and aggregate/group-by queries, that uses constantspace irrespective of the result-set size, has low update cost,and arbitrarily small probability of failure. We generalize thisalgorithm to allow some tolerance on the number of errors permitted (irrespective of error magnitude), and also discuss thehardness of permitting arbitrary errors of small magnitude. Wealso perform an empirical evaluation using live network traffic.

  • Proof-Infused Streams: Enabling Authentication of Sliding Window Queries On Streams, Talk
    By Feifei Li,    Ke Yi,    Marios Hadjieleftheriou,    George Kollios
    In Proceedings of 33rd International Conference on Very Large Databases (VLDB 2007),  pages 147-158,  Vienna, Austria,  September,  2007.

    As computer systems are essential components of many critical commercial services, the need for secure online transactions is now becoming evident. The demand for such applications, as the market grows, exceeds the capacity of individual businesses to provide fast and reliable services, making outsourcing technologies a key player in alleviating issues of scale. Consider a stock broker that needs to provide a real-time stock trading monitoring service to clients. Since the cost of multicasting this information to a large audience might become prohibitive, the broker could outsource the stock feed to third-party providers, who are in turn responsible for forwarding the appropriate sub-feed to clients. Evidently, in critical applications the integrity of the third-party should not be taken for granted. In this work we study a variety of authentication algorithms for selection and aggregation queries over sliding windows. Our algorithms enable the end-users to prove that the results provided by the third-party are correct, i.e., equal to the results that would have been computed by the original provider. Our solutions are based on Merkle hash trees over a forest of space partitioning data structures, and try to leverage key features,like update, query, signing, and authentication costs. We present detailed theoretical analysis for our solutions and empirically evaluate the proposed techniques.

  • Query Access Assurance in Outsourced Databases
    By Wangchao Le,    Feifei Li
    Vol.5, No. 2, Pages 178-191, IEEE Transactions on Services Computing (IEEE TSC),  2012.

    Query execution assurance is an important concept in defeating lazy servers in the database as a service model. We show that extending query execution assurance to outsourced databases with multiple data owners is highly inefficient. To cope with lazy servers in the distributed setting, we propose query access assurance (QAA) that focuses on IO-bound queries. The goal in QAA is to enable clients to verify that the server has honestly accessed all records that are necessary to compute the correct query answer, thus eliminating the incentives for the server to be lazy if the query cost is dominated by the IO cost in accessing these records. We formalize this concept for distributed databases, and present two efficient schemes that achieve QAA with high success probabilities. The first scheme is simple to implement and deploy, but may incur excessive server to client communication cost and verification cost at the client side, when the query selectivity or the database size increases. The second scheme is more involved, but successfully addresses the limitation of the first scheme. Our design employs a few number theory techniques. Extensive experiments demonstrate the efficiency, effectiveness and usefulness of our schemes.

  • Authenticated Index Structures for Aggregation Queries (Project Website)
    By Feifei Li,    Marios Hadjieleftheriou,    George Kollios,    Leonid Reyzin
    Vol.13, Pages 32:1-32:35, ACM Transactions on Information and System Security (ACM TISSEC),  2010.

    Query authentication is an essential component in outsourced database (ODB) systems. This arti-cle introduces efficient index structures for authenticating aggregation queries over large data sets. First, we design an index that features good performance characteristics for static environments.Then, we propose more involved structures for the dynamic case. Our structures feature excellentperformance for authenticating queries with multiple aggregate attributes and multiple selectionpredicates. Furthermore, our techniques cover a large number of aggregate types, including distributive aggregates (such as SUM, COUNT, MIN and MAX), algebraic aggregates (such as the AVG), and holistic aggregates (such as MEDIAN and QUANTILE). We have also addressed the issue of authenticating aggregation queries efficiently when the database is encrypted to protect data confidentiality. Finally, we implemented a working prototype of the proposed techniques andexperimentally validated the effectiveness and efficiency of our methods.

  • Dynamic Authenticated Index Structures for Outsourced Databases (Project Website), Talk
    By Feifei Li,    Marios Hadjieleftheriou,    George Kollios,    Leonid Reyzin
    In Proceedings of 25th ACM SIGMOD International Conference on Management of Data (SIGMOD 2006),  pages 121-132,  Chicago, USA,  June,  2006.

    In outsourced database (ODB) systems the database owner publishes its data through a number of remote servers, with the goal of enabling clients at the edge of the network to access and query the data more efficiently. As servers might be untrusted or can be compromised, query authentication becomes an essential component of ODB systems. Exist-ing solutions for this problem concentrate mostly on static scenarios and are based on idealistic properties for certain cryptographic primitives. In this work, first we define a variety of essential and practical cost metrics associated with ODB systems. Then, we analytically evaluate a number of different approaches, in search for a solution that best lever-ages all metrics. Most importantly, we look at solutions that can handle dynamic scenarios, where owners periodi-cally update the data residing at the servers. Finally, we discuss query freshness, a new dimension in data authentication that has not been explored before. A comprehensive experimental evaluation of the proposed and existing approaches is used to validate the analytical models and verify our claims. Our findings exhibit that the proposed solutions improve performance substantially over existing approaches, both for static and dynamic environments.

  • Randomized Synopses for Query Assurance on Data Streams (Project Website)
    By Ke Yi,    Feifei Li,    Marios Hadjieleftheriou,    Divesh Srivastava,    George Kollios
    AT&T Technical Report,  June,  2007.