From "Designing Data-Intensive Applications"
🎧 Listen to Summary
Free 10-min PreviewChange Data Capture and Event Sourcing
Key Insight
The fundamental connection between databases and streams arises from viewing database writes as events. A replication log, for instance, is essentially a stream of database write events that leaders produce and followers apply to maintain consistent state. This insight leads to the concept of 'Change Data Capture' (CDC), which involves observing all data changes written to a database and extracting them as an immediate stream, enabling real-time replication to other systems. CDC addresses the challenge of synchronizing data across heterogeneous systems, such as OLTP databases, caches, search indexes, and data warehouses, by making one database the 'leader' and turning others into 'followers,' ensuring consistency if changes are applied in the correct order.
The alternative of 'dual writes,' where application code writes to multiple systems simultaneously, is problematic due to race conditions leading to permanent inconsistencies and fault-tolerance issues if one write fails. CDC, especially when integrated with log-based message brokers that preserve message order, provides a robust solution. While database triggers can implement CDC, parsing the internal replication log (e.g., MySQL binlog, MongoDB oplog) is often more robust. CDC typically operates asynchronously, meaning the source database commits changes without waiting for consumers, which can introduce replication lag.
For new derived data systems, an 'initial snapshot' of the database is required, corresponding to a known offset in the change log from which subsequent changes are applied. 'Log compaction' offers an elegant alternative to repeated snapshots; if changes are key-value updates, compaction retains only the most recent value for each key, effectively providing a full, up-to-date copy of the database within the log itself. Beyond CDC, 'event sourcing' applies similar principles at a higher abstraction level: application logic is built on immutable, application-level events written to an append-only log, explicitly representing user actions rather than low-level state changes. This approach enhances evolvability, debuggability, and auditing, treating mutable application state as a deterministic derivation from this immutable event log.
📚 Continue Your Learning Journey — No Payment Required
Access the complete Designing Data-Intensive Applications summary with audio narration, key takeaways, and actionable insights from Martin Kleppmann.