From "Designing Data-Intensive Applications"
Graph-Like Data Models: Structure and Querying
Key Insight
Graph data models are well suited to applications whose data is highly interconnected and dominated by many-to-many relationships. A graph consists of vertices (also called nodes or entities) and edges (also called relationships or arcs). This structure can model many kinds of data: in a social network, people are vertices and friendships are edges; in a web graph, pages are vertices and hyperlinks are edges. Graphs are not limited to such homogeneous data. They also provide a consistent way to store and relate completely different types of objects in one place, as in Facebook's graph, which encompasses people, locations, events, and their diverse interactions.
In the property graph model, exemplified by Neo4j and Titan, each vertex has a unique identifier, a set of outgoing edges, a set of incoming edges, and a collection of key-value properties. Each edge has a unique identifier, a tail vertex (where the edge starts), a head vertex (where it ends), a label describing the type of relationship, and its own collection of key-value properties. This structure is flexible in three ways: any vertex can be connected to any other vertex, with no schema restricting which kinds of things may be associated; given a vertex, both its incoming and its outgoing edges can be found efficiently, so the graph can be traversed in either direction; and distinct relationship labels allow several kinds of information to be stored in a single graph while keeping the data model clear.
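The vertex and edge fields described above can be sketched as an in-memory structure. This is a minimal illustration only, not how Neo4j or Titan store data; the class names, method names, and the sample vertices (Lucy, Idaho, United States) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # compare by identity, since vertices and edges reference each other
class Vertex:
    vertex_id: int
    properties: dict[str, Any] = field(default_factory=dict)
    outgoing: list["Edge"] = field(default_factory=list, repr=False)
    incoming: list["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    edge_id: int
    tail: Vertex  # vertex where the edge starts
    head: Vertex  # vertex where the edge ends
    label: str    # type of relationship
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self) -> None:
        self.vertices: dict[int, Vertex] = {}
        self.edges: dict[int, Edge] = {}

    def add_vertex(self, vertex_id: int, **properties: Any) -> Vertex:
        vertex = Vertex(vertex_id, dict(properties))
        self.vertices[vertex_id] = vertex
        return vertex

    def add_edge(self, edge_id: int, tail_id: int, head_id: int,
                 label: str, **properties: Any) -> Edge:
        tail, head = self.vertices[tail_id], self.vertices[head_id]
        edge = Edge(edge_id, tail, head, label, dict(properties))
        # Register the edge on both endpoints so traversal works in both directions.
        tail.outgoing.append(edge)
        head.incoming.append(edge)
        self.edges[edge_id] = edge
        return edge

    def out_neighbors(self, vertex_id: int, label: str) -> list[int]:
        """IDs of vertices reached by following outgoing edges with this label."""
        return [e.head.vertex_id
                for e in self.vertices[vertex_id].outgoing if e.label == label]

    def in_neighbors(self, vertex_id: int, label: str) -> list[int]:
        """IDs of vertices that point at this vertex via edges with this label."""
        return [e.tail.vertex_id
                for e in self.vertices[vertex_id].incoming if e.label == label]


# Different kinds of objects (a person, a state, a country) live in one graph.
g = PropertyGraph()
g.add_vertex(1, type="person", name="Lucy")
g.add_vertex(2, type="state", name="Idaho")
g.add_vertex(3, type="country", name="United States")
g.add_edge(10, 1, 2, "BORN_IN")
g.add_edge(11, 2, 3, "WITHIN")

print(g.out_neighbors(1, "BORN_IN"))  # [2]
print(g.in_neighbors(3, "WITHIN"))    # [2]
```

Keeping both an incoming and an outgoing list on each vertex is what makes backward traversal as cheap as forward traversal in this sketch; a real database would typically achieve the same effect with indexes.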
Triple-stores, such as Datomic and AllegroGraph, offer a mostly equivalent model in which all information is stored as simple three-part statements: (subject, predicate, object). The subject corresponds to a vertex in a graph. The object is one of two things. It can be a value of a primitive datatype, such as a string or a number, in which case the predicate and object are the key and value of a property on the subject vertex. Or it can be another vertex, in which case the predicate is an edge whose tail is the subject and whose head is the object.
Both models have declarative query languages: Cypher for property graphs and SPARQL for triple-stores that use the RDF data model. These languages express complex traversals concisely, including variable-length paths. In Cypher, for example, the pattern '-[:WITHIN*0..]->' means "follow a WITHIN edge zero or more times". Because the query states what to match rather than how to find it, the database's query optimizer is free to choose the most efficient execution strategy.
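To make the triple representation and the "zero or more hops" traversal concrete, here is a small sketch in plain Python. It is not how any triple-store or query engine is implemented; the `Ref` wrapper, the function names, and the sample data are assumptions introduced for illustration. `Ref` marks an object that points to another vertex, so it can be told apart from a primitive value.

```python
from typing import NamedTuple, Union


class Ref(NamedTuple):
    """Hypothetical marker: the object is another vertex, not a literal value."""
    vertex: str


Triple = tuple[str, str, Union[str, int, Ref]]

triples: list[Triple] = [
    ("lucy", "name", "Lucy"),              # literal object: a property of lucy
    ("lucy", "bornIn", Ref("idaho")),      # vertex object: an edge lucy -> idaho
    ("idaho", "name", "Idaho"),
    ("idaho", "within", Ref("usa")),
    ("usa", "name", "United States"),
    ("usa", "within", Ref("namerica")),
    ("namerica", "name", "North America"),
]


def reachable(triples: list[Triple], start: str, predicate: str) -> set[str]:
    """Vertices reached from `start` by following `predicate` edges zero or
    more times (the same idea as a variable-length path such as WITHIN*0..)."""
    seen = {start}          # zero hops: the start vertex itself counts
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for subject, pred, obj in triples:
            if (subject == current and pred == predicate
                    and isinstance(obj, Ref) and obj.vertex not in seen):
                seen.add(obj.vertex)
                frontier.append(obj.vertex)
    return seen


def born_within(triples: list[Triple], place_name: str) -> list[str]:
    """Subjects whose bornIn edge leads, via any number of within edges,
    to a vertex whose name property equals `place_name`."""
    targets = {s for s, p, o in triples if p == "name" and o == place_name}
    return [s for s, p, o in triples
            if p == "bornIn" and isinstance(o, Ref)
            and reachable(triples, o.vertex, "within") & targets]


print(reachable(triples, "idaho", "within"))
print(born_within(triples, "United States"))  # ['lucy']
```

The loop in `reachable` spells out by hand the traversal that a declarative path pattern leaves to the database. This version rescans every triple at each step, which is fine for a sketch but is exactly the kind of execution detail a query optimizer would handle with indexes.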
Summary of Designing Data-Intensive Applications by Martin Kleppmann.