Generating DBMS engines for high-performance data processing
Prof. Christoph Koch ~ Project Website
DBToaster aggressively compiles aggregate queries to incremental (or delta-) form, enabling stream data to be processed highly efficiently, in contrast to operator-centric query plan interpreters.
DBToaster creates query engines for embedding into applications that require real-time, low-latency data processing and monitoring capabilities. DBToaster-generated engines are optimized for long-lived queries, where query results must be kept up-to-date with rapidly changing input data. In database terminology, DBToaster engines maintain in-memory materialized views. Our performance claims refer to the speed at which DBToaster engines refresh views as the input data changes.
- Algorithmic Trading
- Real-time Data Warehousing
- Network/Cluster Monitoring
- Clickstream Analysis
The DBToaster compiler accepts queries written in SQL, and generates query engine code that can be incorporated directly into any C++ or Scala project (with support for more languages on the way). DBToaster-generated engines use each platform’s native collection types, easily integrating with existing projects.
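As a rough illustration of what embedding such an engine in a C++ project can look like, here is a hand-written sketch. It is not actual DBToaster output: the struct `BidTotalsView`, the `bids` table, its columns, and the trigger names are all hypothetical. The sketch keeps the result of `SELECT broker_id, SUM(price * volume) FROM bids GROUP BY broker_id` in a standard-library map that the host application can read directly.

```cpp
#include <unordered_map>

// Hypothetical stand-in for a generated engine (NOT real DBToaster output).
// Maintains: SELECT broker_id, SUM(price * volume) FROM bids GROUP BY broker_id
struct BidTotalsView {
    struct Group {
        long long total_cents = 0;  // SUM(price * volume) for this broker
        long long rows = 0;         // number of bids currently in the group
    };

    // The view lives in a native collection, so the host application
    // can iterate over it or look up a broker without any adapter layer.
    std::unordered_map<int, Group> groups;

    // Trigger: a row was inserted into bids.
    void on_insert_bid(int broker_id, long long price_cents, long long volume) {
        Group& g = groups[broker_id];  // creates an empty group if needed
        g.total_cents += price_cents * volume;
        ++g.rows;
    }

    // Trigger: a previously inserted row was deleted from bids.
    void on_delete_bid(int broker_id, long long price_cents, long long volume) {
        auto it = groups.find(broker_id);
        if (it == groups.end()) return;  // unknown broker: nothing to retract
        it->second.total_cents -= price_cents * volume;
        // A group with no remaining rows disappears from the result.
        if (--it->second.rows == 0) groups.erase(it);
    }
};
```

The application calls one trigger per change to the input and reads `groups` whenever it needs the current result. The row counter is one simple way to drop empty groups on deletion; other designs are possible.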
We have found DBToaster to be a good solution if you:
- need to maintain materialized views of complex SQL queries,
- care about very high refresh rates / low refresh latencies,
- have standing / continuous (as opposed to ad-hoc) queries, i.e. you want to monitor the changing result of a given query over time as the data changes, and
- do not need to work with extremely large datasets (please note that this is a temporary restriction until we release our parallel/secondary storage runtimes).
DBToaster turns a set of queries into efficient specialized code for processing just these queries. It also generates code that can be linked into applications without additional software (such as a separate database server or CEP engine). Consequently, DBToaster is a very lightweight way of including fixed (parameterized) SQL queries in applications.
Traditional relational databases are slow because they are designed to support arbitrary hand-written queries. Nowadays though, few people execute queries directly. Most queries are generated automatically based on templates. DBToaster custom-tailors each engine it creates to the needs of a specific application. This engine supports only query processing functionality that the application requires, avoiding the overhead of supporting unnecessary features.
DBToaster also employs an innovative technique that exploits incrementality to efficiently maintain query results in real time as data changes. As a consequence, DBToaster-generated engines provide extremely low-latency access to query results and efficiently support result value monitoring.
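To make the idea of incrementality concrete, the following minimal C++ sketch maintains `SELECT SUM(r.a * s.c) FROM R r, S s WHERE r.b = s.b` under single-tuple updates. The tables, column names, and the `JoinSumView` struct are assumptions chosen for illustration, and the code is hand-written rather than generated by DBToaster. Instead of re-running the join, each update adjusts the result using a small auxiliary map kept for the other relation.

```cpp
#include <unordered_map>

// Hypothetical illustration of delta processing (NOT real DBToaster output).
// Maintains: SELECT SUM(r.a * s.c) FROM R r, S s WHERE r.b = s.b
struct JoinSumView {
    long long result = 0;  // current answer to the query

    // Auxiliary views, one per relation, keyed by the join column b.
    std::unordered_map<int, long long> sum_a_by_b;  // SUM(a) over R, per b
    std::unordered_map<int, long long> sum_c_by_b;  // SUM(c) over S, per b

    // Apply a change to R. mult is +1 for an insert and -1 for a delete.
    void update_R(long long a, int b, int mult = 1) {
        // Delta of the query: the changed tuple joins with every S row
        // that shares b, and those rows contribute SUM(c) in total.
        auto it = sum_c_by_b.find(b);
        if (it != sum_c_by_b.end()) result += mult * a * it->second;
        sum_a_by_b[b] += mult * a;  // keep R's auxiliary view current
    }

    // Apply a change to S; symmetric to update_R.
    void update_S(int b, long long c, int mult = 1) {
        auto it = sum_a_by_b.find(b);
        if (it != sum_a_by_b.end()) result += mult * c * it->second;
        sum_c_by_b[b] += mult * c;  // keep S's auxiliary view current
    }
};
```

Each update costs a couple of hash lookups regardless of how many tuples `R` and `S` hold, whereas recomputing the join from scratch grows with the size of both relations. Entries whose sum drops back to zero are left in the maps for simplicity; they do not affect `result`.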
DBToaster-generated code is typically 3-4 orders of magnitude faster than existing state-of-the-art data management systems, measured as the time it takes to refresh a view given an update to the base data.