Enabling analytics on big graphs using small clusters of machines
Prof. Willy Zwaenepoel ~ Project Website
Analytics over large graphs is an important and challenging part of the big-data problem. Consider, for example, the recent observation that any two users of a very popular social networking site are separated by 4.75 friends on average. X-Stream is a system being built at LABOS that enables analytics on big graphs using small clusters of machines.
X-Stream is based on the philosophy that sequential access to data works best for all storage media: main memory, SSD and magnetic disk. It is therefore built around a storage-centric, streaming approach: graph algorithms are rearranged to stream data from storage. X-Stream is a large and complex system, currently around 10K lines of code and growing fast. Its foundations range across I/O complexity theory, operating system storage stacks, storage device and processor architecture, and graph algorithms. We also expect to touch on areas such as programming languages in the near future, as we explore layering domain-specific languages on top of X-Stream.
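To make the streaming idea concrete, here is a minimal sketch in Python. It is not X-Stream's code or API: the on-disk record format, the function names and the choice of algorithm (connected components by label propagation) are all assumptions made for illustration. The point it shows is that the edge list is only ever read front to back in large sequential chunks; the algorithm never seeks to a particular vertex's edges.

```python
import struct

# Hypothetical on-disk record: one edge as two little-endian 32-bit vertex ids.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    """Write (src, dst) pairs to a flat binary edge list (illustrative format)."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def stream_edges(path, chunk_edges=1 << 16):
    """Yield edges by reading the file sequentially, one large chunk at a time."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(EDGE.size * chunk_edges)
            if not buf:
                break
            yield from EDGE.iter_unpack(buf)


def connected_components(path, num_vertices):
    """Label each vertex with the smallest vertex id in its component.

    Each pass streams the whole edge list once and pushes the smaller label
    across every edge. Passes repeat until no label changes. Only the label
    array is accessed randomly; the (much larger) edge list is not.
    """
    labels = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for src, dst in stream_edges(path):
            low = min(labels[src], labels[dst])
            if labels[src] != low:
                labels[src] = low
                changed = True
            if labels[dst] != low:
                labels[dst] = low
                changed = True
    return labels
```

For a small test, write a few edges with `write_edges("edges.bin", [(0, 1), (1, 2), (3, 4)])` and call `connected_components("edges.bin", 6)`. The trade-off in this sketch is that it may need several full passes over the edges, in exchange for never issuing a random read against them.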
At its limit, X-Stream is currently capable of analytics on graphs with upwards of 32 billion edges on a *single machine using only a 3TB magnetic disk*, a capacity we expect to grow significantly. This already puts X-Stream in select company, as only large and expensive clusters or supercomputers have so far been able to handle graphs that big.
X-Stream is supported by a Hasler Foundation grant for “Cache conscious graph processing”.