From "Designing Data-Intensive Applications" by Martin Kleppmann
Representing Many-to-Many Relationships
Key Insight
How a data model represents many-to-many relationships is a key differentiator among data models, and handling them well has historically been a strength of the relational model. Document databases handle one-to-many relationships by nesting data (e.g., a user's multiple job positions stored within a single profile document). Limitations appear when entities are shared. A person lives in a region and works for an organization, and many people refer to the same region or organization, so regions and organizations are separate, referenceable entities. This creates many-to-one relationships (many people live in one region) and, as an application adds features, many-to-many ones. Embedding a copy of such a shared entity in every document duplicates data, which creates a risk of inconsistency and adds write overhead whenever the shared information changes.
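As a minimal sketch of the duplication problem, assuming plain Python dicts stand in for JSON documents (the people, organizations, and field names are hypothetical), the nested `positions` list below is an unproblematic one-to-many relationship, while the shared region is copied as text into every profile. Renaming the region then means rewriting each document that embeds it:

```python
# Hypothetical profile documents. The one-to-many "positions" list nests
# naturally, but the shared region (and organization names) are copied as text.
profiles = [
    {
        "user_id": 251,
        "name": "Ada",
        "region": "Greater Seattle Area",  # shared entity duplicated as text
        "positions": [  # one-to-many: nested inside the document
            {"job_title": "Engineer", "organization": "Acme"},
            {"job_title": "Advisor", "organization": "Globex"},
        ],
    },
    {
        "user_id": 252,
        "name": "Ben",
        "region": "Greater Seattle Area",  # the same text, copied again
        "positions": [{"job_title": "Designer", "organization": "Acme"}],
    },
]


def rename_region(docs, old_name, new_name):
    """Rewrite every document that embeds old_name; return how many writes that took."""
    writes = 0
    for doc in docs:
        if doc["region"] == old_name:
            doc["region"] = new_name
            writes += 1
    return writes
```

Any document the update misses keeps the old name, which is exactly the inconsistency risk described above, and the number of writes grows with the number of documents rather than staying at one.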
To avoid this duplication, data is normalized: human-meaningful information is stored in exactly one place, and other records refer to it by an ID that has no meaning to humans (e.g., a 'region_id' instead of the text string 'Greater Seattle Area'). Because the text lives in a single place, this approach gives consistent style and spelling, avoids ambiguity (e.g., several places with the same name), makes updates simple, supports localization, and improves search. However, normalization produces many-to-one and many-to-many relationships. Relational databases resolve these naturally with joins, but document databases often have weak join support. In that case, application code must emulate the join by issuing multiple queries, which moves work from the database into the application, adds complexity, and often costs extra round trips to the database.
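The normalized alternative, and a join emulated in application code, can be sketched as follows. The collections, the `region_id` value `"us:91"`, and the `query` helper are hypothetical stand-ins for a document store's lookup-by-ID call, not any particular database's API; a counter records how many simulated round trips the emulated join costs.

```python
# Hypothetical normalized collections: each region's name is stored exactly once.
regions = {"us:91": {"name": "Greater Seattle Area"}}
users = {
    251: {"name": "Ada", "region_id": "us:91"},  # reference by ID, not by text
    252: {"name": "Ben", "region_id": "us:91"},
}

stats = {"queries": 0}  # counts simulated round trips to the database


def query(collection, ids):
    """Simulated single query: fetch the records whose IDs are in `ids`."""
    stats["queries"] += 1
    return {i: collection[i] for i in ids if i in collection}


def users_with_region_names(user_ids):
    """Emulate a join in application code using two separate queries."""
    found_users = query(users, user_ids)  # query 1: the users
    region_ids = {u["region_id"] for u in found_users.values()}
    found_regions = query(regions, region_ids)  # query 2: the regions they reference
    # Stitch the two result sets together, as a database join would.
    return {
        uid: {"name": u["name"], "region": found_regions[u["region_id"]]["name"]}
        for uid, u in found_users.items()
    }


def set_region_name(region_id, new_name):
    """With normalization, renaming a region is a single write."""
    regions[region_id]["name"] = new_name
```

In a relational database the same result would come from a single query with a join (roughly `SELECT ... FROM users JOIN regions ON users.region_id = regions.id`), and the database would plan and execute it. Here the application does that work across two queries. Renaming the region, by contrast, is now one write that every user record picks up automatically.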
These difficulties echo those of early hierarchical databases such as IBM's Information Management System (IMS) from the late 1960s. IMS represented data as a tree of records nested within records, much like JSON documents. It worked well for one-to-many relationships, but it made many-to-many relationships difficult and did not support joins. The parallel suggests that the document model suits self-contained, tree-structured data, whereas applications whose data becomes more interconnected over time (for example, when users are linked to organizations, schools, and recommendations as first-class entities) are better served by a model built for many-to-many relationships. Graph databases represent entities as vertices and relationships as edges, which makes those connections explicit and lets queries follow them directly, so they are a natural fit for highly interconnected data.
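To illustrate the vertex-and-edge idea, here is a small in-memory sketch. It assumes edges are stored as hypothetical `(tail, label, head)` triples and indexed in both directions; it is not the API of any real graph database. Finding a person's colleagues becomes a two-hop traversal across the many-to-many `works_at` relationship:

```python
from collections import defaultdict

# Hypothetical graph data: every entity is a vertex,
# every relationship is a labeled edge (tail) -[label]-> (head).
vertices = {
    "p1": {"type": "person", "name": "Ada"},
    "p2": {"type": "person", "name": "Ben"},
    "o1": {"type": "organization", "name": "Acme"},
    "o2": {"type": "organization", "name": "Globex"},
    "r1": {"type": "region", "name": "Greater Seattle Area"},
}
edges = [
    ("p1", "works_at", "o1"),
    ("p1", "works_at", "o2"),  # one person, many organizations...
    ("p2", "works_at", "o1"),  # ...and one organization, many people
    ("p1", "lives_in", "r1"),
    ("p2", "lives_in", "r1"),
]

# Index edges in both directions so a traversal can follow them either way.
out_edges = defaultdict(list)  # (tail, label) -> [head, ...]
in_edges = defaultdict(list)   # (head, label) -> [tail, ...]
for tail, label, head in edges:
    out_edges[(tail, label)].append(head)
    in_edges[(head, label)].append(tail)


def colleagues(person_id):
    """Names of people sharing at least one organization with person_id (two hops)."""
    result = set()
    for org in out_edges[(person_id, "works_at")]:  # hop 1: person -> organizations
        for other in in_edges[(org, "works_at")]:   # hop 2: organization -> people
            if other != person_id:
                result.add(other)
    return {vertices[v]["name"] for v in result}
```

Graph databases expose this kind of traversal through their query languages (for example, Cypher in Neo4j). In this sketch, a new relationship type such as a hypothetical `studied_at` edge can be added to the same edge list without changing any schema.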