Distributional Private Information Retrieval
Ryan Lehmkuhl
Abstract:
A private-information-retrieval (PIR) scheme lets a client fetch a record from a remote database without revealing which record it fetched. Classic PIR schemes treat all database records the same but, in practice, some database records are much more popular (i.e., commonly fetched) than others. We introduce distributional PIR, a new type of PIR that can run faster than classic PIR, both asymptotically and concretely, when the popularity distribution is skewed. Distributional PIR provides exactly the same cryptographic privacy as classic PIR. The speedup comes from a relaxed form of correctness: distributional PIR guarantees that in-distribution queries succeed with good probability, while out-of-distribution queries succeed with lower probability. Because of its relaxed correctness, distributional PIR is best suited for applications where "best-effort" retrieval is acceptable. Moreover, for security, a client's decision to query the server must be independent of whether its past queries were successful.
We construct a distributional-PIR scheme that makes black-box use of classic PIR protocols, and we prove a lower bound on the server runtime of a natural class of distributional-PIR schemes. On two real-world popularity distributions, our construction reduces compute costs by 5-77x compared to existing techniques. Finally, we build CrowdSurf, an end-to-end system for privately fetching tweets, and show that distributional PIR reduces the end-to-end server cost by 8x.
Bio:
Ryan is a third-year PhD student in the PDOS and CSS groups at MIT, working under the excellent guidance of Henry Corrigan-Gibbs in the areas of cryptography, computer security, and systems. He is supported by an MIT Sunlin and Priscilla Chou Fellowship and an NSF Graduate Research Fellowship. Before beginning his PhD, he worked at Opaque, a startup that provides a platform for collaborative data analytics on private data.