
Our Kafka Setup for Sub-200ms Market Data Ingestion

A walkthrough of our streaming architecture: Kafka topics per exchange, Avro schemas, exactly-once semantics, and the backfill system that saved us during the March outage.

trade ingests live market data from multiple exchanges (ASX, NYSE, Binance, Coinbase) and needs to process it into model-ready features in under 200ms. Our streaming backbone is Apache Kafka with a specific topology designed for financial data.

Each exchange gets its own Kafka topic with Avro-encoded messages. We chose Avro over JSON for two reasons: schema evolution (we can add fields without breaking consumers) and wire-format efficiency (roughly 60% smaller payloads than JSON for our tick data schema).
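To make the schema-evolution point concrete, here is a minimal sketch of what an Avro record for tick data can look like. The field names and types are illustrative, not our production schema; the key detail is that a later-added field carries a default, so consumers compiled against the older schema keep working.

```python
import json

# Hypothetical Avro schema for one market tick (illustrative, not the
# production schema). Avro schemas are just JSON documents.
TICK_SCHEMA = {
    "type": "record",
    "name": "Tick",
    "namespace": "trade.marketdata",
    "fields": [
        {"name": "symbol", "type": "string"},
        {"name": "exchange", "type": "string"},
        {"name": "price", "type": "double"},
        {"name": "size", "type": "double"},
        {"name": "event_time_ms", "type": "long"},
        # Added in a later version: the null default keeps old
        # consumers compatible (Avro schema resolution fills it in).
        {"name": "venue_seq", "type": ["null", "long"], "default": None},
    ],
}

# The schema is registered/shipped as plain JSON.
schema_json = json.dumps(TICK_SCHEMA)
```

Because Avro encodes only field values (the schema travels separately, e.g. via a schema registry), the wire payload omits the per-message key names that make JSON tick data bulky.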

We use exactly-once semantics (EOS) for our feature computation pipeline. This matters because duplicate processing of tick data would corrupt our rolling statistics. The tradeoff is slightly higher latency (~15ms overhead), but correctness is worth more than speed in financial data.
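The settings that turn on EOS are standard Kafka configs; the sketch below shows them as plain dicts. The config key names are real Kafka settings, but the values (the transactional id in particular) are illustrative.

```python
# Producer side: idempotence dedupes broker-side via sequence numbers,
# and a transactional.id lets writes be committed or aborted atomically.
producer_config = {
    "enable.idempotence": True,
    "transactional.id": "feature-pipeline-1",  # hypothetical id
    "acks": "all",  # wait for all in-sync replicas before acking
}

# Consumer side: only read messages from committed transactions,
# so an aborted write never reaches the rolling-statistics state.
consumer_config = {
    "isolation.level": "read_committed",
}

# Kafka Streams wraps the consume-process-produce cycle in a single
# transaction when this guarantee is enabled.
streams_config = {
    "processing.guarantee": "exactly_once_v2",
}
```

The ~15ms overhead mentioned above comes largely from the transaction commit interval: output is only visible to `read_committed` consumers once the transaction commits.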

The architecture that saved us: our backfill system. During the March outage, we lost connectivity to ASX for 47 minutes. When connectivity was restored, the backfill system detected the gap, pulled historical data from the exchange API, replayed it through the same Kafka pipeline, and recomputed all affected features. Models were back to full accuracy within 12 minutes of reconnection.
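The gap-detection step can be sketched as a simple comparison of event timestamps on either side of the outage. The function name, threshold, and values here are hypothetical; the real system works off per-exchange sequence tracking, but the shape of the logic is the same.

```python
# Hypothetical sketch of gap detection: if the spacing between consecutive
# events exceeds a threshold, return the window to backfill from the
# exchange's historical API. MAX_GAP_MS is an illustrative value.
MAX_GAP_MS = 5_000  # well above normal inter-tick spacing


def detect_gap(last_seen_ms: int, next_event_ms: int,
               max_gap_ms: int = MAX_GAP_MS):
    """Return the (start, end) window to backfill, or None if no gap."""
    if next_event_ms - last_seen_ms > max_gap_ms:
        return (last_seen_ms, next_event_ms)
    return None


# A 47-minute outage produces a gap far past the threshold,
# triggering a backfill for that window.
outage_gap = detect_gap(1_000_000, 1_000_000 + 47 * 60 * 1000)
```

Replaying the backfilled window through the same Kafka pipeline (rather than a separate recovery path) is what keeps the recomputed features consistent with live ones.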

Infrastructure details: 5-node Kafka cluster, 3x replication for exchange topics, 7-day retention. Feature computation runs on Kafka Streams with RocksDB state stores. Total end-to-end latency from market event to dashboard: p50 = 85ms, p99 = 180ms.
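The topic-level settings matching those numbers look roughly like this. `replication.factor` and `retention.ms` are standard Kafka topic configs; the `min.insync.replicas` pairing is an assumption (a common companion to `acks=all`), not a stated detail.

```python
# Per-exchange topic configuration matching the setup described above.
# 7-day retention expressed in milliseconds, as Kafka expects.
RETENTION_MS = 7 * 24 * 60 * 60 * 1000

topic_config = {
    "replication.factor": 3,
    "retention.ms": RETENTION_MS,
    # Assumption: with acks=all, requiring 2 in-sync replicas tolerates
    # one broker failure without losing acknowledged writes.
    "min.insync.replicas": 2,
}
```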