Streaming Market Data with TimeBase

June 19, 2023

This is the first post in a series on using TimeBase to stream real-time market data. TimeBase is a high-performance, event-based time-series database and message broker. I used it on a proprietary trading desk that made markets in futures, and I currently use it to build and test equity trading strategies. It was released as open source in February 2021.

Preface

I am not affiliated with the company that created and maintains TimeBase (Deltix, now EPAM), and I’m not currently compensated by them in any way for promoting their product. I’m merely a happy user who is excited that TimeBase is open source and that I can show you how to do some cool stuff with it.

What’s Your Problem?

TimeBase addresses several key needs in automated trading. You need to process large amounts of real-time market data: trades, best bid/offer, and/or the entire limit order book. You use this data to calculate indicators/features, determine when and where to place orders, monitor unrealized P&L, and monitor/manage the risk of your positions.

All data in the trading system needs to be processed quickly and absolutely must be processed in strict time order. This is not trivial when you need to interleave data from multiple sources (e.g. exchanges and the trading system itself). The System Architecture section of “The Algorithmic Trading Platform” by Prerak Sanghvi describes the benefits of using a strictly time-sequenced stream of events. To summarize:

Synchronized: every system component always receives the same data in the same order.
Observable: the system is deterministic and can be debugged offline by replaying the data.
Auditable: you can re-create the state of the system at any point in time.
Streamlined: Tasks like logging and persisting to disk can be delegated to components that are off the critical path.
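
To make the "observable" and "auditable" points concrete, here is a minimal sketch in plain Python. It does not use the TimeBase API; the `Event`, `PositionTracker`, and `replay` names are hypothetical and exist only for illustration. It assumes a component's state depends only on the ordered events it has consumed, so replaying the same events reproduces the same state, and stopping the replay early re-creates the state at that moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp_ns: int
    kind: str          # "fill" (our own trade) or "mark" (a market price update)
    price: float
    quantity: int = 0  # signed: positive for buys, negative for sells


class PositionTracker:
    """State is derived only from the events consumed, in the order consumed."""

    def __init__(self):
        self.position = 0
        self.cash = 0.0
        self.last_price = 0.0

    def on_event(self, event):
        if event.kind == "fill":
            self.position += event.quantity
            self.cash -= event.quantity * event.price
        self.last_price = event.price

    def pnl(self):
        return self.cash + self.position * self.last_price


def replay(events, until_ns=None):
    """Rebuild the tracker from scratch, optionally stopping at a point in time."""
    tracker = PositionTracker()
    for event in events:
        if until_ns is not None and event.timestamp_ns > until_ns:
            break
        tracker.on_event(event)
    return tracker


events = [
    Event(1, "fill", 100.0, 10),
    Event(2, "mark", 101.0),
    Event(3, "fill", 102.0, -4),
    Event(4, "mark", 103.0),
]

live = replay(events)                  # what the running system computed
offline = replay(events)               # the same stream replayed later
as_of_2 = replay(events, until_ns=2)   # state re-created as of timestamp 2
print(live.pnl(), offline.pnl(), as_of_2.pnl())
```

The replay only works because the events are stored in strict time order; if two components saw the events in different orders, their states could diverge.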

The data also needs to be stored for analysis and debugging. Analysis includes things like running backtests, post-trade evaluation, and investigating market behavior. “Selecting a Database for an Algorithmic Trading System” by Prerak Sanghvi discusses the necessary components of a time-series database for algorithmic trading. To summarize:

Fast data ingest: millions of records per second (quote data can be 100+ million records per day)
Ability to process large amounts of historical data for patterns and trends
Time series operations and real-time analytics (e.g. window functions, aggregations, as-of joins)
Expressive query language
Optimized on-disk layout
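
As an illustration of one of the time-series operations mentioned above, here is a small sketch of an as-of join in plain Python. It is not TimeBase or QQL code, and the sample data and the `as_of_quote` helper are made up for the example. For each trade it finds the most recent quote at or before the trade's timestamp, assuming the quotes are already sorted by time.

```python
from bisect import bisect_right

# (timestamp_ns, bid, ask), sorted by timestamp
quotes = [
    (100, 99.0, 101.0),
    (200, 99.5, 100.5),
    (400, 100.0, 101.0),
]

# (timestamp_ns, price)
trades = [(50, 99.0), (150, 100.5), (200, 100.0), (450, 100.8)]

quote_times = [quote[0] for quote in quotes]


def as_of_quote(trade_time):
    """Return the latest quote at or before trade_time, or None if there is none."""
    index = bisect_right(quote_times, trade_time) - 1
    return quotes[index] if index >= 0 else None


joined = [(time, price, as_of_quote(time)) for time, price in trades]
for row in joined:
    print(row)
```

The binary search keeps each lookup logarithmic in the number of quotes, which matters when a single day of quote data runs to 100+ million records.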

TimeBase vs Alternatives

Why use TimeBase instead of other open-source projects like RabbitMQ, Kafka, InfluxDB, TimescaleDB, or ClickHouse? The main reason is that TimeBase is both a message broker and a time-series database. The TimeBase website has its own “Why TimeBase” page, as well as pages that compare it with popular time-series databases and message brokers. Here’s a summary of the benefits of TimeBase from those pages:

Based on configuration, it supports microsecond latencies or the ability to handle millions of messages per second on commodity hardware.
Enforces stream schemas with heterogeneous and potentially complex message structures.
The same APIs can be used to stream real-time data and replay historical data.
Able to replicate data to other TimeBase instances or applications.
The open-source community edition has multiple crypto exchange data connectors. The enterprise edition has 50+ built-in data connectors.

TimeBase Structure

This is a high-level summary of the TimeBase architecture page.

Data connectors handle connecting to external data sources and translating their data into the TimeBase format. There are many open source crypto exchange data connectors. The enterprise edition has another 50+ data connectors to all major exchanges and many data vendors.

The message broker provides a publish/subscribe pattern to write/read streaming data. The data is processed via readers and writers to streams.

Writers can only write to one stream. Readers can consume multiple streams simultaneously, and the messages from every stream are interleaved so that every message consumed is guaranteed to be in time order, regardless of which stream it comes from. It is extremely important that every consumer receives data strictly sequenced by time!
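
The interleaving idea can be sketched as a k-way merge. The example below is plain Python using the standard library's `heapq.merge`, not the TimeBase reader API, and the stream names and payloads are invented. It assumes each stream is finite and already sorted by timestamp; a live reader has the harder job of waiting on streams that have not yet produced their next message.

```python
import heapq
from operator import itemgetter

# Each message is (timestamp_ns, stream_name, payload).
# Every individual stream is already in time order.
trades = [
    (1_000, "trades", "buy 5 @ 100.0"),
    (4_000, "trades", "sell 2 @ 100.5"),
]
quotes = [
    (2_000, "quotes", "99.5 x 100.5"),
    (3_000, "quotes", "100.0 x 100.5"),
    (4_000, "quotes", "100.0 x 101.0"),
]
orders = [
    (2_500, "orders", "new order #1"),
]

# Merge on the timestamp only. heapq.merge is stable, so messages with equal
# timestamps come out in the order the streams were passed in.
merged = list(heapq.merge(trades, quotes, orders, key=itemgetter(0)))

for timestamp_ns, stream_name, payload in merged:
    print(timestamp_ns, stream_name, payload)
```

Because the tie-break is deterministic, every consumer that merges the same streams in the same way sees exactly the same sequence.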

There are two types of streams, durable and transient. Durable streams are persisted to disk. Transient streams are only in memory and can be lossy or lossless.

Writers to lossy streams are not blocked by slow readers, so slow readers may not receive every message but always receive the next available message once they finish processing a message.
Writers to lossless streams are blocked by slow readers, so every reader always receives every message and every reader can only process data as fast as the slowest reader.
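
The difference between the two behaviors can be sketched with two bounded buffers from the Python standard library. This is an analogy rather than TimeBase code, and it makes one assumption the text above does not state: that the lossy buffer discards the oldest unread message when it is full. The reader here is simulated as being too slow to consume anything while five messages are written.

```python
import queue
from collections import deque

BUFFER_SIZE = 2

# Lossy: the writer never waits; when the buffer is full the oldest unread
# message is discarded (an assumption made for this sketch).
lossy = deque(maxlen=BUFFER_SIZE)

# Lossless: the writer cannot add to a full buffer until the reader catches up.
lossless = queue.Queue(maxsize=BUFFER_SIZE)

blocked_messages = []
for message in range(5):
    lossy.append(message)
    try:
        # A real writer would block here; put_nowait lets the sketch record
        # which writes would have had to wait for the slow reader.
        lossless.put_nowait(message)
    except queue.Full:
        blocked_messages.append(message)

lossy_received = list(lossy)
lossless_received = [lossless.get_nowait() for _ in range(lossless.qsize())]

print("lossy reader sees:", lossy_received)
print("lossless reader sees:", lossless_received)
print("writes that had to wait:", blocked_messages)
```

The lossy reader skips ahead to recent data, which suits displays and monitoring. The lossless reader misses nothing, at the cost of holding the writer back.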

The database handles reading/writing data from/to disk, importing and exporting data, replicating data to other applications, and aggregating data into regular bars. It has a query language (QQL) you can use to extract, filter, aggregate, and transform data in streams.
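
For readers unfamiliar with bar aggregation, here is a rough sketch of what aggregating trades into regular bars means. It is plain Python, not QQL or any TimeBase feature, and the `to_bars` function and the sample trades are hypothetical. It assumes trades arrive in time order and groups them into fixed-width time buckets.

```python
def to_bars(trades, bar_seconds):
    """Group (timestamp_s, price, size) trades into fixed-width OHLCV bars.

    Returns a dict keyed by bar start time, in time order, assuming the
    input trades are already sorted by timestamp.
    """
    bars = {}
    for timestamp_s, price, size in trades:
        bar_start = timestamp_s - timestamp_s % bar_seconds
        bar = bars.get(bar_start)
        if bar is None:
            # First trade in this bucket sets every price field.
            bars[bar_start] = {
                "open": price, "high": price, "low": price,
                "close": price, "volume": size,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars


trades = [
    (0, 100.0, 5),
    (20, 101.0, 3),
    (59, 99.5, 2),
    (60, 100.5, 4),
    (125, 102.0, 1),
]

one_minute_bars = to_bars(trades, bar_seconds=60)
for bar_start, bar in one_minute_bars.items():
    print(bar_start, bar)
```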

There’s also an open-source Web Administrator you can use to manage streams (create, delete, edit, import/export). It also lets you view data, including monitoring live data as it streams into the database.

What’s Next?

Later posts in this series will cover at least the topics below. Please leave a comment or contact me with any other things you would like to see!

Building and running TimeBase from source/Docker
Building and running the Web Administrator from source/Docker
Setting up a data connector
Introduction to the Web Administrator (viewing/monitoring data, import/export)
Introduction to QQL, the quant query language

Thanks to TheRobotJames for helpful feedback, and to Adam Butler for encouraging me to write more!
