
From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 2: Data Models and Query Languages
Key Insight 4 from this chapter

Graph-Like Data Models: Structure and Querying

Key Insight

Graph data models are a natural fit for applications in which many-to-many relationships are very common and the data is highly interconnected. A graph consists of two kinds of objects: vertices (also called nodes or entities) and edges (also called relationships or arcs). Many kinds of data can be modeled this way. In a social graph, vertices are people and edges indicate which people know each other; in the web graph, vertices are web pages and edges are hyperlinks to other pages. Graphs are not limited to such homogeneous data, however. They also provide a consistent way to store completely different types of objects in a single datastore. Facebook, for example, maintains a single graph whose vertices include people, locations, and events, and whose edges capture the many different ways these interact.
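A minimal sketch of the vertex-and-edge idea, assuming a tiny hypothetical social graph held in Python adjacency sets. This in-memory representation is chosen purely for illustration and is not how any particular graph database stores data; the names and the helper `friends_of_friends` are hypothetical. It shows how a many-to-many question is answered by hopping along edges.

```python
from collections import defaultdict

# Hypothetical social graph: people are vertices, friendships are undirected edges.
friendships = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("dave", "erin"),
]

# Adjacency sets: each vertex maps to the set of its neighbours.
adjacency: dict[str, set[str]] = defaultdict(set)
for a, b in friendships:
    adjacency[a].add(b)
    adjacency[b].add(a)  # friendship is symmetric, so store both directions


def friends_of_friends(person: str) -> set[str]:
    """Return people two hops away who are not already direct friends."""
    direct = adjacency[person]
    result: set[str] = set()
    for friend in direct:
        result |= adjacency[friend]  # everyone one hop beyond each friend
    return result - direct - {person}


print(sorted(friends_of_friends("alice")))  # ['dave']
```

Because every person can be linked to any number of other people, the same structure handles the many-to-many case without a separate join table per relationship type.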

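In the property graph model, implemented by databases such as Neo4j and Titan, each vertex consists of a unique identifier, a set of outgoing edges, a set of incoming edges, and a collection of properties (key-value pairs). Each edge consists of a unique identifier, the tail vertex (where the edge starts), the head vertex (where the edge ends), a label describing the kind of relationship between the two vertices, and its own collection of properties. This structure is very flexible. Any vertex can have an edge connecting it to any other vertex, because no schema restricts which kinds of things may be associated. Given any vertex, both its incoming and its outgoing edges can be found efficiently, so the graph can be traversed forward and backward. And by using different labels for different kinds of relationships, a single graph can store several kinds of information while still maintaining a clean data model.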
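One way to see the property graph model concretely is to express it as two relational tables, one for vertices and one for edges, with indexes on both the tail and head columns so that incoming and outgoing edges can each be looked up efficiently. The sketch below uses Python's built-in `sqlite3` module with properties stored as JSON text. The table layout, the helper functions (`add_vertex`, `add_edge`, `outgoing`, `incoming`), and the sample data are illustrative assumptions, not the storage format of Neo4j or Titan.

```python
import json
import sqlite3

# Illustrative schema (an assumption), not Neo4j's or Titan's storage format.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertices (
    vertex_id  INTEGER PRIMARY KEY,
    properties TEXT  -- key-value properties stored as JSON
);
CREATE TABLE edges (
    edge_id     INTEGER PRIMARY KEY,
    tail_vertex INTEGER REFERENCES vertices (vertex_id),  -- where the edge starts
    head_vertex INTEGER REFERENCES vertices (vertex_id),  -- where the edge ends
    label       TEXT,                                     -- kind of relationship
    properties  TEXT
);
-- Indexes on both ends make outgoing and incoming lookups efficient.
CREATE INDEX edges_tails ON edges (tail_vertex);
CREATE INDEX edges_heads ON edges (head_vertex);
""")


def add_vertex(vertex_id: int, **props) -> None:
    conn.execute("INSERT INTO vertices VALUES (?, ?)", (vertex_id, json.dumps(props)))


def add_edge(tail: int, head: int, label: str, **props) -> None:
    conn.execute(
        "INSERT INTO edges (tail_vertex, head_vertex, label, properties) VALUES (?, ?, ?, ?)",
        (tail, head, label, json.dumps(props)),
    )


def outgoing(vertex_id: int) -> list[tuple[str, int]]:
    """Edges that start at vertex_id, as (label, head_vertex) pairs."""
    return conn.execute(
        "SELECT label, head_vertex FROM edges WHERE tail_vertex = ? ORDER BY edge_id",
        (vertex_id,),
    ).fetchall()


def incoming(vertex_id: int) -> list[tuple[str, int]]:
    """Edges that end at vertex_id, as (label, tail_vertex) pairs."""
    return conn.execute(
        "SELECT label, tail_vertex FROM edges WHERE head_vertex = ? ORDER BY edge_id",
        (vertex_id,),
    ).fetchall()


# Hypothetical data: a person and two locations stored in the same graph.
add_vertex(1, name="Lucy", type="person")
add_vertex(2, name="Idaho", type="location")
add_vertex(3, name="United States", type="location")
add_edge(1, 2, "BORN_IN")
add_edge(2, 3, "WITHIN")

print(outgoing(2))  # [('WITHIN', 3)]
print(incoming(2))  # [('BORN_IN', 1)]
```

Since nothing constrains the `label` column, people and locations share the same tables and can be linked by any kind of relationship, while the two indexes support traversal in either direction.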

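The triple-store model, used by databases such as Datomic and AllegroGraph, is mostly equivalent to the property graph model, though it uses different words for the same ideas. All information is stored as very simple three-part statements of the form (subject, predicate, object). The subject of a triple corresponds to a vertex in a graph. The object is one of two things. It can be a value of a primitive datatype, such as a string or a number, in which case the predicate and object are the key and value of a property on the subject vertex, as in (lucy, age, 33). Or it can be another vertex in the graph, in which case the predicate is an edge, the subject is its tail vertex, and the object is its head vertex, as in (lucy, marriedTo, alain).

Graphs can be queried with declarative query languages such as Cypher, created for the Neo4j property graph database, and SPARQL, a query language for triple-stores that use the RDF data model. These languages express complex traversals concisely, including variable-length paths: in Cypher, ':WITHIN*0..' means "follow a WITHIN edge zero or more times," much like the '*' operator in a regular expression. Because the query states only what to find, the database's query optimizer is free to choose the most efficient way to execute it.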
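The triple-store model and variable-length paths can be sketched together in plain Python. In this assumed representation each triple is a tuple, and a hypothetical `V` wrapper marks objects that refer to another vertex, so `properties` returns a subject's key-value pairs and `out_edges` returns its edges. The `reachable` function follows one predicate zero or more times with a breadth-first search, roughly what a Cypher pattern such as ':WITHIN*0..' asks for. The vertex names and data are made up, and a real triple-store would index its triples rather than scan a list.

```python
from collections import deque
from typing import NamedTuple, Union


class V(NamedTuple):
    """Hypothetical marker: a reference to another vertex, not a primitive value."""
    id: str


Triple = tuple[str, str, Union[str, int, V]]

# Illustrative (subject, predicate, object) statements.
triples: list[Triple] = [
    ("lucy", "name", "Lucy"),         # primitive object: a property
    ("lucy", "age", 33),
    ("lucy", "bornIn", V("idaho")),   # vertex object: an edge
    ("idaho", "name", "Idaho"),
    ("idaho", "within", V("usa")),
    ("usa", "name", "United States"),
    ("usa", "within", V("namerica")),
    ("namerica", "name", "North America"),
]


def properties(subject: str) -> dict:
    """Triples with primitive objects act as the subject's key-value properties."""
    return {p: o for s, p, o in triples if s == subject and not isinstance(o, V)}


def out_edges(subject: str, predicate: str) -> list[str]:
    """Triples with vertex objects act as edges from subject (tail) to object (head)."""
    return [o.id for s, p, o in triples
            if s == subject and p == predicate and isinstance(o, V)]


def reachable(start: str, predicate: str) -> set[str]:
    """Follow `predicate` edges zero or more times (breadth-first)."""
    seen = {start}  # zero hops: the start vertex itself is included
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for nxt in out_edges(vertex, predicate):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Where was Lucy born, including every enclosing region?
birthplace = out_edges("lucy", "bornIn")[0]
regions = reachable(birthplace, "within")
print(properties("lucy"))  # {'name': 'Lucy', 'age': 33}
print(sorted(properties(r)["name"] for r in regions))  # ['Idaho', 'North America', 'United States']
```

A declarative query would state only the pattern to match; here the traversal strategy (breadth-first search over a linear scan) is fixed in the code, which is exactly the kind of decision a query optimizer makes on the user's behalf.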
