Data modeling 101

What is data modeling?

Data modeling is the process of creating a visual representation of the structure and relationship between different types of data. It involves analyzing and organizing the data and then creating one or more diagrams that show how different pieces of data relate to each other.

By organizing and categorizing unstructured information, data modeling allows you to make sense of the data, much like building a Lego castle from a giant collection of Lego bricks. The first step is to group the bricks by their features and properties, such as minifigures, building blocks, and decorations. These groups can then be divided into sub-groups based on additional features like color or size.

By categorizing the bricks in this way, we gain a better understanding of what we have to work with and how we can best use them to achieve our goal (building a cool castle 🏰). Additionally, we can identify relationships between different groups of bricks, such as how a minifigure is made up of various smaller parts like a head, torso, arms, and legs (see figure 1).

This process is similar to data modeling, where we identify entities and their attributes, and define the relationships between them. This helps us to understand the data we have and how to use it effectively to achieve our goals.

Why does it matter?

A strong data model is essential for the success of any project and enables you to adapt to changes in ever-evolving application requirements. In addition, a well-designed data model is crucial for optimal performance. An inefficient model can hinder your ability to utilize your data at its full potential.

Data modeling is a critical process that enables you to make sense of large amounts of information. Without a good data model, it can be challenging to interpret what the data means or how to use it effectively.

To illustrate, imagine a complex Lego build with thousands of pieces. Without instructions it would be nearly impossible to assemble it as it was designed. Similarly, a data model serves as a guide for the data, clarifying how different pieces of information fit together and relate to each other.

A robust data model allows us to:

  1. Organize data logically: By grouping similar pieces of information together and labeling them appropriately, we can easily locate and utilize the data we need.

  2. Analyze data effectively: A well-designed data model simplifies the process of querying data to uncover insights and patterns that might be difficult to detect otherwise.

  3. Make informed decisions: With a clear understanding of the data, we can make better decisions based on the insights we gain.

In short, data modeling is crucial because it transforms raw data into actionable and valuable information.

Types of data models

There are different types of data models, such as conceptual, logical, and physical models, which can be used at different stages of the data modeling process.

Conceptual 🎨

In the first stage of data modeling, the conceptual data model is developed to identify the core processes that the application needs to support. This is often done in collaboration with business experts and stakeholders. The conceptual data model is an abstract representation of the relevant entities and their relationships, and it's used to communicate with people at any technical level.

Logical 🧠

The logical data model complements the conceptual data model and describes how the data will be structured. Based on the input and feedback from business experts and stakeholders, the data modeler extends the conceptual model with entity attributes and necessary relationships not defined in the conceptual model.

Physical 🗄️

The physical data model represents how the database stores the data model in a database management system. It specifies data types for the attributes as well as primary and foreign keys, constraints, and indexes, and is modeled by a developer.

Appfarm Create uses a hybrid of the conceptual and logical data model in the Global Data Model. The physical data model is handled automatically by the platform.

Relational databases vs. document databases

There are two common approaches to storing data in a database, relational databases and document databases.

A relational database and a document database differ in their ability to store structured and unstructured data, respectively. In a relational database, data is organized in tables with pre-defined columns and rows that enforce relationships between entities. On the other hand, a document database stores unstructured data as JSON-like documents within collections, making it self-describing and free from a pre-defined structure. Therefore, there is no need for a defined table reference, as the data itself provides context and can vary in structure.

Relational table structure

A relational database design for a data model for books and their authors could look like the tables below.

Author

IDFirst NameLast Name

1001

J.R.R.

Tolkien

1002

Stephen

King

1003

Mark

Twain

1004

Virginia

Woolf

Book

IDTitleGenreAuthor

2001

The Hobbit

Fantasy

1001

2002

The Adventures of Huckleberry Finn

Fiction

1003

2003

To the Lighthouse

Modernism

1004

2004

The Lord of the Rings

Fantasy

1001

Document structure

In contrast, document databases store data as documents, where each document represents a specific entity and contains all the information related to that entity in a nested data structure. This type of database is best suited for unstructured or semi-structured data, where the data does not have a clear and defined structure.

[
  {
    "id": "1001",
    "firstName": "J.R.R.",
    "lastName": "Tolkien",
    "books": [
      { "id": "2001", "title": "The Hobbit", "genre": "Fantasy" },
      { "id": "2004", "title": "The Lord of the Rings", "genre": "Fantasy" }
    ]
  },
  {
    "id": "1002",
    "firstName": "Stephen",
    "lastName": "King",
    "books": []
  },
  {
    "id": "1003",
    "firstName": "Mark",
    "lastName": "Twain",
    "books": [
      {
        "id": "2002",
        "title": "The Adventures of Huckleberry Finn",
        "genre": "Fiction"
      }
    ]
  },
  {
    "id": "1004",
    "firstName": "Virginia",
    "lastName": "Woolf",
    "books": [
      {
        "id": "2003",
        "title": "To the Lighthouse",
        "genre": "Modernism"
      }
    ]
  }
]

Appfarm Create uses a document database, but adds a layer on top through the Global data model to provide relational capabilities (referential integrity, foreign keys), delivering the best of both worlds.

Key concepts in relational data modeling

Relational data modeling is a crucial process that involves a number of essential concepts for creating an efficient database design.

Entity

An entity is a “thing” or object that we want to keep track of in our database. It can be a person, a place, a thing, or an event.

In Appfarm Create, an entity is described with an object class.

Attribute

An attribute is a characteristic of an entity. It describes some aspect of the entity, such as its name, age, or address.

In Appfarm Create, an attribute is an object class property.

Data type

A data type defines the type of data that can be stored in a property. Common data types include text, number, date, and boolean.

In Appfarm Create, the data type is a part of the definition of an object class property. Simple data types such as boolean, string, integer, and datetime are supported, as well as enums and references to other object classes.

Relationships

A relationship describes how entities are related to each other. For example, a customer can have many orders, and each order belongs to only one customer. Closely connected to this concept is cardinality, which describes the number of instances of one entity that can be related to another entity. This is usually expressed with the terms one or many.

One-to-one

A one-to-one relationship is probably the least common relationship. It defines that there can only be one record in table A associated with a specific record in table B. This can be interpreted as a bijective function from a mathematics perspective.

A real-world example for this scenario is the connection between a human and its brain. A human can only have one brain and a brain can only be connected to one human, and this relationship is constant. There’s (at least for now) no possibility for someone to have their brain replaced or duplicated.

Another, more relevant, example is the connection between a user and their user settings. One user has their set of settings, and a set of settings belongs to exactly one user, as seen in figure 2.

In data modeling this relationship is not that commonly used, and is often modeled as an ordinary one-to-many relationship instead. This provides greater flexibility should the requirements for either brains or user settings change in the future.

One-to-many

A one-to-many relationship is very common. It defines that a record in table A can only be associated with a specific record in table B, while a record in table B can have multiple records in table A associated with it.

As an example, when a customer places an order in an online store, they might want to purchase different items at the same time. To avoid requiring one order for each item, you can create an order object holding the details of that particular order (including the customer, the state of the order, the order date, etc.), while each item is represented as a single order line with a reference to the order object.

A one-to-many relationship is illustrated in figure 3 with a data model consisting of Order and Order Line. An order line can be connected to only one order through its property Order. An order can have any number of order lines referencing it.

Many-to-many

The last type of relationship is the many-to-many-relationship. It defines that multiple records in table A can be connected to multiple records in table B.

To illustrate this visually, imagine drawing lines between books and authors. Each book may have lines connecting it to multiple authors, and each author may have lines connecting them to multiple books.

This is illustrated in figure 4 with a data model consisting of Book, Author and Written By. A Book can be written by multiple authors and an Author can write multiple books. This is modeled as Written By which combines these two object classes and represents the relationship between books and authors. This object class can also be extended to contain more metadata about the relationship (e.g. number of pages that a particular author has written of a particular book).

You may find this guide useful if you want practical examples of how to use many-to-many relationships in Appfarm Create.

Primary key

A primary key is a unique identifier for an entity in a table. It is used to ensure that each record in the table can be uniquely identified.

In Appfarm Create, the primary key is stored in the object class property ID.

Foreign key

A foreign key is a field in a table that refers to the primary key of another table. It is used to establish a relationship between the two tables.

In Appfarm Create, the foreign key is represented as an object class property referencing another object class. In the diagram view of the data model designer it is visualized with an arrow pointing from the property to the referenced object class (Book and Author in the Written By object class are foreign keys in figure 4). The object that has the property of a referenced object class will store the ID of the referenced object as seen in the Written By table below, which references both the Book object class and the Author object class.

Book

ID [primary key]TitleNumber of Pages

6376344c01ed7ce7fb4f1967

Data Modeling 101

1337

Author

ID [primary key]First NameLast Name

6376347f4a2bd917bc5128f2

App

Farmer

Written By

ID [primary key]Book [foreign key]Author [foreign key]

637f9382462b2a1d0e6b823a

6376344c01ed7ce7fb4f1967

6376347f4a2bd917bc5128f2

Normalization

Normalization is the process of organizing data in a database to minimize redundancy and ensure data integrity. It involves breaking down tables into smaller, more specific tables and establishing relationships between them.

Please read this article if you want to learn more about the concepts of database normalization.

By understanding these key concepts, you can create a well-structured and organized relational database that accurately represents the data you want to store, allowing you to build elegant and efficient apps.

Summary

In summary, data modeling is the process of organizing and creating a visual representation of data to help make sense of it. A good data model is essential to the success of a project as it helps understand what the data means and how it can be used effectively.

There are different types of data models, such as conceptual, logical, and physical models, used at different stages of the data modeling process. Relational databases store structured data while document databases store unstructured data. Appfarm Create provides a hybrid of the conceptual and logical data model and adds a layer on top of the non-relational document database, providing it with relational capabilities.

Last updated