From "Designing Data-Intensive Applications"
Schema-Driven Binary Encoding Formats: Thrift and Protocol Buffers
Key Insight
Apache Thrift and Protocol Buffers (protobuf) are binary encoding libraries developed at Facebook and Google, respectively, and both rest on the same principle: all encoded data must conform to an explicit schema. Schemas are written in an interface definition language (IDL). Thrift has its own IDL, and Protocol Buffers has a schema language in which records are declared as `message` types. Each framework ships a code generation tool that turns a schema into classes for various programming languages, and application code then uses those classes to encode and decode records.
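Real generators such as Thrift's compiler or `protoc` emit source files. The sketch below only imitates the idea at runtime: it turns a schema into a class using Python's `dataclasses.make_dataclass`. The schema list `PERSON_SCHEMA` and the function `generate_class` are hypothetical names chosen for illustration. The fields mirror the book's example record: tag 1 for the user name, tag 2 for the favorite number, and tag 3 for the interests.

```python
from dataclasses import field, fields, make_dataclass
from typing import List, Optional

# Hypothetical in-memory schema mirroring the book's Person example.
# Each entry: (tag number, field name, Python type, required?)
PERSON_SCHEMA = [
    (1, "user_name", str, True),
    (2, "favorite_number", Optional[int], False),
    (3, "interests", Optional[List[str]], False),
]


def generate_class(class_name, schema):
    """Build a class from a schema, loosely imitating an IDL code generator.

    Each field's tag is stored in the dataclass field metadata, because an
    encoder writes tags, not field names, into the binary data.
    """
    specs = []
    # Dataclasses reject a non-default field after a defaulted one, so list
    # required fields first; sorted() is stable, so tag order is otherwise kept.
    for tag, name, ftype, required in sorted(schema, key=lambda f: not f[3]):
        if required:
            specs.append((name, ftype, field(metadata={"tag": tag})))
        else:
            # Optional fields default to None when a value is not supplied.
            specs.append((name, ftype, field(default=None, metadata={"tag": tag})))
    return make_dataclass(class_name, specs)


Person = generate_class("Person", PERSON_SCHEMA)

p = Person(user_name="Martin", favorite_number=1337)
print(p)  # Person(user_name='Martin', favorite_number=1337, interests=None)
print({f.name: f.metadata["tag"] for f in fields(Person)})
# {'user_name': 1, 'favorite_number': 2, 'interests': 3}
```

The field names exist only for the programmer's convenience. An encoder built on such a class would consult the `tag` metadata, which is why renaming a field is harmless but changing its tag is not.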
These formats achieve compactness by leaving field names out of the encoded data and using the numeric field tags (e.g., 1, 2, 3) assigned in the schema instead. Each encoded field carries its tag, a type annotation (e.g., string, integer, list), and, where needed, a length indicator. Thrift's CompactProtocol, for example, packs the field type and tag number into a single byte. It also uses variable-length integers, so the number 1337 takes two bytes instead of a full eight. Together these techniques shrink the book's example record (a user name, a favorite number, and a list of interests) to 34 bytes, compared with 59 bytes under Thrift's BinaryProtocol. Protocol Buffers packs the bits slightly differently but is otherwise very similar, and it fits the same record into 33 bytes.
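The arithmetic behind these byte counts can be checked with a minimal sketch of the Protocol Buffers wire encoding. The sketch assumes the book's example schema: tag 1 is the user name string, tag 2 is the favorite number as a 64-bit integer, and tag 3 is a repeated string of interests. The function names are hypothetical, and the sketch is not the official protobuf library. It handles only non-negative varints and length-delimited strings. Each field starts with a key that packs the tag and a 3-bit wire type into one varint. The `zigzag` helper shows how Thrift's CompactProtocol maps a signed integer to an unsigned one before varint-encoding it.

```python
VARINT, LENGTH_DELIMITED = 0, 2  # protobuf wire types used in this sketch


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low bits first, MSB set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to unsigned (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) ^ (n >> 63)


def encode_key(tag: int, wire_type: int) -> bytes:
    """A field key packs the tag number and wire type into one varint."""
    return encode_varint((tag << 3) | wire_type)


def encode_person(user_name, favorite_number=None, interests=()):
    """Encode the example record; field names never appear in the output."""
    out = bytearray()
    data = user_name.encode("utf-8")
    out += encode_key(1, LENGTH_DELIMITED) + encode_varint(len(data)) + data
    if favorite_number is not None:
        out += encode_key(2, VARINT) + encode_varint(favorite_number)
    # A repeated field is just the same tag written once per element.
    for item in interests:
        data = item.encode("utf-8")
        out += encode_key(3, LENGTH_DELIMITED) + encode_varint(len(data)) + data
    return bytes(out)


record = encode_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(record))                        # 33
print(encode_varint(1337).hex())          # b90a (protobuf int64 varint)
print(encode_varint(zigzag(1337)).hex())  # f214 (zigzag varint, as in Thrift CompactProtocol)
```

The 33 bytes break down as 8 for the name (key, length, 6 characters), 3 for 1337 (key plus a two-byte varint), and 13 + 9 for the two interests. The 34-byte Thrift figure uses a different field-header layout, which this sketch does not reproduce.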
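Schema evolution in these formats hinges on field tags. A field's name can be changed freely, but its tag never can, because encoded data identifies fields only by tag. Adding a field with a new tag number preserves forward compatibility. When old code encounters a tag it doesn't recognize, the type annotation tells it how many bytes to skip, so it can simply ignore the field. For backward compatibility, every added field must be `optional` or have a default value; otherwise new code would fail when reading old data that lacks the field. Likewise, only an `optional` field can be removed (a `required` field never can), and its tag must never be reused, since old data may still contain it. Changing a field's datatype is possible but risky. Suppose a 32-bit integer field is widened to 64 bits. New code can read old data, but old code reading new data still holds the value in a 32-bit variable and truncates anything that doesn't fit. Finally, Protocol Buffers has no dedicated list type: a `repeated` field is simply the same tag appearing multiple times. This makes it possible to turn an `optional` single-valued field into a `repeated` multi-valued one. New code reading old data sees a list of zero or one element, while old code reading new data sees only the last element.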
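These tag-based rules can be made concrete with a simplified protobuf-style decoder sketch. It handles only wire types 0 (varint) and 2 (length-delimited), and the names `decode`, `NEW_DATA`, and `OLD_DATA` are hypothetical. The byte strings are hand-encoded. Tag 4 is a hypothetical field that is added in a newer schema and unknown to the old reader. The sketch shows three behaviors. Unknown tags are skipped using the wire type. A single-valued reader keeps the last occurrence of a repeated tag. A new reader must fall back to a default when old data lacks a field.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # MSB clear: last byte of this varint
            return result, pos
        shift += 7


def decode(buf: bytes, known_tags):
    """Return {tag: [values]} for tags in the reader's schema; skip the rest.

    Varint values come back as ints, length-delimited values as raw bytes.
    """
    seen = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        tag, wire_type = key >> 3, key & 0x07
        # The wire type alone tells the parser how many bytes this field spans,
        # so even an unrecognized tag can be stepped over safely.
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if tag in known_tags:
            seen.setdefault(tag, []).append(value)
    return seen


# Written by new code: tag 1 = "Martin", tag 3 repeated twice, and the
# hypothetical new tag 4 (a varint, 42) that old code has never heard of.
NEW_DATA = (bytes([0x0A, 6]) + b"Martin"
            + bytes([0x1A, 11]) + b"daydreaming"
            + bytes([0x1A, 7]) + b"hacking"
            + bytes([0x20, 0x2A]))

old_view = decode(NEW_DATA, known_tags={1, 3})
print(4 in old_view)    # False: forward compatibility, tag 4 was skipped
print(old_view[3][-1])  # b'hacking': a single-valued reader keeps the last element

new_view = decode(NEW_DATA, known_tags={1, 3, 4})
print(new_view[3])      # [b'daydreaming', b'hacking']

# Written by old code: no tag 4, so a new reader must supply a default.
OLD_DATA = bytes([0x0A, 6]) + b"Martin"
print(decode(OLD_DATA, known_tags={1, 3, 4}).get(4, [0])[-1])  # 0
```

Real protobuf parsers apply the same last-occurrence-wins rule to a non-repeated scalar field that appears more than once. That rule is what makes the `optional`-to-`repeated` change safe in both directions.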
Source: Designing Data-Intensive Applications by Martin Kleppmann.