From "Designing Data-Intensive Applications"
Data Encoding and Compatibility Principles
Key Insight
As application features and user requirements evolve, the data an application stores has to change too. Evolvability, the ease of adapting a system to change, is therefore an important design goal. Data models handle such change differently: relational databases generally assume that all data conforms to a single schema, which can be changed through schema migrations but with only one schema in force at any time, whereas schema-on-read ("schemaless") databases do not enforce a schema, so records written in older and newer formats can coexist.
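As a rough illustration of schema-on-read, the sketch below interprets each stored document at read time instead of migrating everything up front. The field names (`name`, `first_name`) and the helper `read_user` are hypothetical, chosen only for this example.

```python
def read_user(doc: dict) -> dict:
    """Interpret a stored document at read time, whichever format it was written in."""
    user = dict(doc)  # copy so the stored document is left untouched
    if "first_name" not in user and "name" in user:
        # Older documents only have a full name; derive the newer field on read.
        user["first_name"] = user["name"].split(" ")[0]
    return user


# Documents written by older and newer code coexist in the same collection.
old_doc = {"name": "Ada Lovelace"}
new_doc = {"name": "Grace Hopper", "first_name": "Grace"}

users = [read_user(d) for d in (old_doc, new_doc)]
```

With a schema-on-write database, the equivalent change would usually be a migration that adds the column and backfills it for existing rows.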
A change to a data format or schema usually requires a matching change to application code. In a large application, however, the code change cannot take effect everywhere at once: server-side applications are typically deployed with rolling upgrades, a few nodes at a time, and client-side applications depend on users installing updates. As a result, old and new versions of the code, and old and new data formats, may all be present in the system at the same time. For the system to keep running smoothly, compatibility must hold in both directions.
Two kinds of compatibility are needed. Backward compatibility means newer code can read data written by older code; it is generally the easier of the two, because the author of the new code knows the old format and can handle it explicitly. Forward compatibility means older code can read data written by newer code; it is trickier, because old code must ignore additions made by newer versions. Both depend on how data is translated between its in-memory representation (data structures optimized for CPU access) and the byte sequences used for storage or network transmission. Translating from the in-memory form to bytes is called encoding (serialization), and the reverse is called decoding (deserialization).
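The two directions of compatibility can be sketched with a small example. It assumes JSON as the encoding and a record that gains an optional `email` field in version 2; the function names and fields are hypothetical and are not taken from the book.

```python
import json


def encode_v1(name: str) -> bytes:
    """Old writer: encodes a record with only a name."""
    return json.dumps({"name": name}).encode("utf-8")


def encode_v2(name: str, email: str) -> bytes:
    """New writer: adds an email field to the record."""
    return json.dumps({"name": name, "email": email}).encode("utf-8")


def decode_v1(data: bytes) -> dict:
    """Old reader: picks out the fields it knows and ignores anything else."""
    record = json.loads(data.decode("utf-8"))
    return {"name": record["name"]}


def decode_v2(data: bytes) -> dict:
    """New reader: treats the added field as optional so old data still decodes."""
    record = json.loads(data.decode("utf-8"))
    return {"name": record["name"], "email": record.get("email")}


# Backward compatibility: new code reads data written by old code.
from_old = decode_v2(encode_v1("Ada"))

# Forward compatibility: old code reads data written by new code.
from_new = decode_v1(encode_v2("Ada", "ada@example.com"))
```

Note that the old reader in this sketch drops the unknown field. If old code decodes a record and then writes it back, data added by newer code is lost unless unknown fields are carried through explicitly.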
Access the complete Designing Data-Intensive Applications summary with audio narration, key takeaways, and actionable insights from Martin Kleppmann.