Cloud Data Management


A new approach to causal consistency.

Geo-replicated systems are used by applications such as web portals to improve response time. This is possible because content replicas are stored on local servers and then synced by using heavily compressed data updates. Although geo-replication provides fault tolerance in reliable distributed systems and allows for low latency, it implies a tradeoff between ease of programmability and performance. The objective of geo-replication, therefore, is to achieve the strongest possible consistency for an always-available system by hitting a “sweet spot” in the programmability vs. performance tradeoff.

But this is difficult to achieve. The design efficiency of a geo-replicated data platform is measured in terms of consistency guarantees. On the one hand, a strong consistency incurs high latency and does not tolerate network partitions; on the other, eventual consistency results in good performance and tolerates partitions, but it is hard to program. Balanced between the two extremes is a more attractive consistency model called causal consistency. However, the existing implementations of causal consistency suffer from several limitations. They result in computational overheads and delayed visibility of new data items.

Kristina Spirovska, Diego Didona, and Willy Zwaenepoel, researchers at EPFL’s Operating Systems Laboratory, have proposed a new approach to causal consistency, which they call Optimistic Causal Consistency (OCC). The main advantage of OCC over existing systems is that it maximizes the freshness of data returned to clients and augments resource efficiency. The tradeoff is a negligible impact on performance.

The researchers have evaluated the effectiveness of OCC vis-à-vis conventional approaches toward causal consistency. The tests were conducted on the grounds of scalability, response time, sensitivity to write intensity, and freshness of data presented to clients. The results showed that OCC enabled higher resource efficiency, which accounted for a much better performance than that of conventional approaches.

The research team has implemented OCC in a new system called POCC. Taking their work forward, they plan to engage in quantitative analysis of the performance and behavior of POCC in the presence of network partitions and full data center failures.

Suggested readings