Background

If you are a new developer, or looking to refresh your knowledge, the following articles provide general background information on important development concepts.

Databases

Database normalization

Database normalization is a process used to design tables in a relational database in order to minimize the duplication of information and to ensure good data integrity. This means that data is kept complete, accurate, consistent, and in context. Data integrity is key to ensuring the data is actually useful to its owner.

If the same information is stored in different places in multiple tables, the risk of making the database inconsistent increases significantly. If a person's address is stored in multiple tables, and the address is changed in just one table, it is no longer possible to know which address is correct just by looking at the data. The database has lost data integrity.

It is common to refer to four levels (forms) of normalization in database modeling, levels 1 to 4. Each level has its own requirements that must be met, and all requirements of the lower levels must be met as well.

The following example presents a simple dataset and the steps to achieve each of the normalization levels in detail, including how it can be represented in Appfarm Create.

It is worth noting that having a data model of a higher normalization level often leads to more tables (object classes in Appfarm Create). This may have an impact on performance due to the need to join different tables to find complete information. With Appfarm Create, this means more data sources with deep data bindings and complex filters. Therefore, in systems with very complex data structures, there may be an advantage in adopting a lower degree of normalization.

Example: Sales records

This example features a dataset containing customers who have bought different items from different suppliers. The customers have also subscribed to different newsletters from the different suppliers.

In order to create a data model that supports this dataset in the best possible way, we will start by looking at the first level of normalization, the first normal form.

First normal form (1NF)

To obtain a data model that is in the first normal form the following requirements must be met:

  1. Each cell must be single-valued.

  2. Entries in a column must be of the same type.

  3. Rows must be uniquely identified.

The example dataset violates the first requirement of the 1NF by having several values in a single cell in the columns Item and Newsletter. The dataset also violates the second requirement by having the values Unknown and No number in the columns Supplier and Supplier Phone. The third requirement is also not met since there is no unique identifier for each row.

The dataset below has been altered in order to satisfy the requirements for the 1NF. There are no longer cells containing more than one value, each column has data that is of the same type (Unknown and No number are replaced with proper values), and each row is now uniquely identified with the addition of the Cust_id column.

The following images show what the data model for this dataset would look like in Appfarm Create. It is a very simple data model, consisting of only one object class: Sales Records. However, this data model has several issues which can be addressed through further normalization.
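As an illustration, the 1NF version of the dataset can be sketched as a single flat SQL table. Since the example dataset itself is only shown as images, the column names below are assumptions based on the columns mentioned in this article; treat it as a sketch rather than the exact dataset.

```python
# Minimal sketch of the flat Sales Records table after 1NF (SQLite).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_records (
        cust_id          INTEGER,  -- added so rows can be identified
        name             TEXT,
        shipping_address TEXT,
        item             TEXT,     -- one item per row: no multi-valued cells
        price            REAL,
        supplier         TEXT,     -- 'Unknown' replaced with a proper value
        supplier_phone   TEXT,     -- 'No number' replaced with a proper value
        newsletter       TEXT      -- one newsletter per row
    )
""")
```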

Second normal form (2NF)

To obtain a data model that is in the second normal form the following requirement must be met:

  1. All attributes (non-ID columns) must be dependent on the key.

If you look at the non-ID columns in the dataset, are there any columns that are not dependent on the Cust_id column? For example, does the customer ID determine the price of the product? In other words, does the price depend on who buys the product? No. It does not matter who buys the product, the price will remain the same, so the dataset does not meet the requirements of the 2NF.

In order to fulfil the requirements of the 2NF, the columns that are not dependent on the key column must be separated. The Price, Item, Supplier, and Supplier Phone columns can be moved into a new table. Additionally, another new table, known as a junction table, will hold the transactions.

The data model in Appfarm Create would have three object classes, with the Sales Records object class referencing both the Customer and Product object classes.
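A rough SQL sketch of the same split, using the assumed column names from the 1NF example above (the generic _id columns stand in for the object IDs):

```python
# Sketch of the 2NF split: customer data, product data, and a junction
# table (Sales Records) holding the individual transactions (SQLite).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Columns that depend on the customer (newsletters still live here
    -- for now; that duplication is addressed at 4NF)
    CREATE TABLE customer (
        _id              INTEGER PRIMARY KEY,
        name             TEXT,
        shipping_address TEXT,
        newsletter       TEXT
    );

    -- Columns that depend on the product: the price does not change
    -- depending on who buys it
    CREATE TABLE product (
        _id            INTEGER PRIMARY KEY,
        item           TEXT,
        price          REAL,
        supplier       TEXT,
        supplier_phone TEXT
    );

    -- Junction table referencing both Customer and Product
    CREATE TABLE sales_record (
        _id      INTEGER PRIMARY KEY,
        customer INTEGER REFERENCES customer(_id),
        product  INTEGER REFERENCES product(_id)
    );
""")
```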

Third normal form (3NF)

To obtain a data model that is in the third normal form the following requirement must be met:

  1. All fields (columns) must be determined only by the key (_id) of the table and by no other column.

Looking at the dataset, you can see that Supplier and Supplier Phone always go together. If you know the supplier, you know the supplier's phone number. If you added more items from Microsoft and Sony, the phone numbers would keep repeating, meaning that if you wanted to change a supplier's phone number you would have to change it in many places instead of just one.

To address this issue, you can create a new table to hold the supplier information. The Items table can then be updated to store a reference to a supplier instead of storing supplier information.

The data model in Appfarm Create now features four object classes, with the Product object class referencing the new Supplier object class.
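Sketched in the same way, the supplier information is extracted into its own table, and Product keeps only a reference to it:

```python
# Sketch of the 3NF split: supplier details stored once, referenced from Product.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        _id   INTEGER PRIMARY KEY,
        name  TEXT,
        phone TEXT
    );

    -- Product no longer repeats the supplier name and phone number on
    -- every row; it stores a reference to the supplier instead
    CREATE TABLE product (
        _id      INTEGER PRIMARY KEY,
        item     TEXT,
        price    REAL,
        supplier INTEGER REFERENCES supplier(_id)
    );
""")
```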

Fourth normal form (4NF)

To obtain a data model that is in the fourth normal form the following requirement must be met:

  1. No multi-valued dependencies.

Looking at the Customer table in the dataset, you can see that all columns are dependent on the key (_id) column. However, there is still a problem. The customer Evan Wilson is in the Customer table twice with the same address since he has subscribed to two different newsletters. If he changes his address we would have to change it in two different places, and if he unsubscribes from the Dell news newsletter, should we just replace the value with null? What if Alan Smith in Hanover Park also subscribes to Dell news? Then his name and address would occur twice in the Customer table as well.

To address this issue the Customer table must be normalized further. You can alter the Customer table to just hold the _id, Name, and Shipping Address so that names and addresses aren't stored multiple times per user. Then, you can create a Newsletter table that holds the name of the newsletter and a reference to its supplier. This means a supplier can have multiple newsletters. Finally, you can create a new junction table that holds all the customer subscriptions to newsletters by storing references to both the customers and the newsletters.

The final data model in Appfarm Create consists of six object classes with the addition of Newsletter and Newsletter subscription object classes.

You can see that some properties are marked as Required. This means that an object of this class cannot exist without a required value set. For example, each supplier must have a name. In addition, the Phone property of the supplier class is marked as Unique, which means that a phone number can only exist once. This ensures that no two suppliers will have the same phone number.
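The newsletter part of the final model, together with the Required and Unique settings described above, could be sketched in SQL along these lines; NOT NULL and UNIQUE stand in for Appfarm's Required and Unique settings, and the names are illustrative:

```python
# Sketch of the 4NF split plus the Required/Unique constraints (SQLite).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customers hold only their own data; no newsletter columns
    CREATE TABLE customer (
        _id              INTEGER PRIMARY KEY,
        name             TEXT,
        shipping_address TEXT
    );

    CREATE TABLE supplier (
        _id   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,   -- "Required": every supplier must have a name
        phone TEXT UNIQUE      -- "Unique": no two suppliers share a phone number
    );

    -- A newsletter belongs to one supplier; a supplier can have many newsletters
    CREATE TABLE newsletter (
        _id      INTEGER PRIMARY KEY,
        name     TEXT,
        supplier INTEGER REFERENCES supplier(_id)
    );

    -- Junction table: which customers subscribe to which newsletters
    CREATE TABLE newsletter_subscription (
        _id        INTEGER PRIMARY KEY,
        customer   INTEGER REFERENCES customer(_id),
        newsletter INTEGER REFERENCES newsletter(_id)
    );
""")
```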

Summary

By normalizing the data model, each customer appears only once while still being able to buy several products and subscribe to different newsletters. The same is true for products and suppliers. Each product appears only once, but can be bought multiple times. Each supplier appears only once, but can have both multiple products and newsletters.


Data modeling 101

What is data modeling?

Data modeling is the process of creating a visual representation of the structure and relationship between different types of data. It involves analyzing and organizing the data and then creating one or more diagrams that show how different pieces of data relate to each other.

By organizing and categorizing unstructured information, data modeling allows you to make sense of the data, much like building a Lego castle from a giant collection of Lego bricks. The first step is to group the bricks by their features and properties, such as minifigures, building blocks, and decorations. These groups can then be divided into sub-groups based on additional features like color or size.

By categorizing the bricks in this way, we gain a better understanding of what we have to work with and how we can best use them to achieve our goal (building a cool castle 🏰). Additionally, we can identify relationships between different groups of bricks, such as how a minifigure is made up of various smaller parts like a head, torso, arms, and legs (see figure 1).
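Figure 1: Data model of a Lego minifigure’s composition.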

This process is similar to data modeling, where we identify entities and their attributes, and define the relationships between them. This helps us to understand the data we have and how to use it effectively to achieve our goals.

Why does it matter?

A strong data model is essential for the success of any project and enables you to adapt to changes in ever-evolving application requirements. In addition, a well-designed data model is crucial for optimal performance. An inefficient model can hinder your ability to utilize your data at its full potential.

Data modeling is a critical process that enables you to make sense of large amounts of information. Without a good data model, it can be challenging to interpret what the data means or how to use it effectively.

To illustrate, imagine a complex Lego build with thousands of pieces. Without instructions it would be nearly impossible to assemble it as it was designed. Similarly, a data model serves as a guide for the data, clarifying how different pieces of information fit together and relate to each other.

A robust data model allows us to:

  1. Organize data logically: By grouping similar pieces of information together and labeling them appropriately, we can easily locate and utilize the data we need.

  2. Analyze data effectively: A well-designed data model simplifies the process of querying data to uncover insights and patterns that might be difficult to detect otherwise.

  3. Make informed decisions: With a clear understanding of the data, we can make better decisions based on the insights we gain.

In short, data modeling is crucial because it transforms raw data into actionable and valuable information.

Types of data models

There are different types of data models, such as conceptual, logical, and physical models, which can be used at different stages of the data modeling process.

Conceptual 🎨

In the first stage of data modeling, the conceptual data model is developed to identify the core processes that the application needs to support. This is often done in collaboration with business experts and stakeholders. The conceptual data model is an abstract representation of the relevant entities and their relationships, and it's used to communicate with people at any technical level.

Logical 🧠

The logical data model complements the conceptual data model and describes how the data will be structured. Based on the input and feedback from business experts and stakeholders, the data modeler extends the conceptual model with entity attributes and necessary relationships not defined in the conceptual model.

Physical 🗄️

The physical data model represents how the data model is stored in a database management system. It specifies data types for the attributes as well as primary and foreign keys, constraints, and indexes, and is modeled by a developer.
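As an illustration, a physical model for a simple book/author pair (the same example used later in this article) might be expressed as DDL along these lines; the table names, column types, and index are assumptions made for the sketch:

```python
# Sketch of a physical data model: concrete data types, primary and foreign
# keys, a NOT NULL constraint, and an index (SQLite).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        id         INTEGER PRIMARY KEY,       -- primary key
        first_name TEXT NOT NULL,             -- constraint
        last_name  TEXT NOT NULL
    );

    CREATE TABLE book (
        id     INTEGER PRIMARY KEY,
        title  TEXT NOT NULL,
        genre  TEXT,
        author INTEGER REFERENCES author(id)  -- foreign key
    );

    CREATE INDEX idx_book_author ON book(author);  -- index for faster lookups
""")
```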

Appfarm Create uses a hybrid of the conceptual and logical data model in the Global Data Model. The physical data model is handled automatically by the platform.

Relational databases vs. document databases

There are two common approaches to storing data in a database: relational databases and document databases.

A relational database stores structured data, while a document database stores unstructured data. In a relational database, data is organized in tables with pre-defined columns and rows that enforce relationships between entities. A document database, on the other hand, stores unstructured data as JSON-like documents within collections, making it self-describing and free from a pre-defined structure. There is therefore no need for a defined table reference, as the data itself provides context and can vary in structure.

Relational table structure

A relational database design for a data model for books and their authors could look like the tables below.

Author

| ID   | First Name | Last Name |
| ---- | ---------- | --------- |
| 1001 | J.R.R.     | Tolkien   |
| 1002 | Stephen    | King      |
| 1003 | Mark       | Twain     |
| 1004 | Virginia   | Woolf     |

Book

| ID   | Title                              | Genre     | Author |
| ---- | ---------------------------------- | --------- | ------ |
| 2001 | The Hobbit                         | Fantasy   | 1001   |
| 2002 | The Adventures of Huckleberry Finn | Fiction   | 1003   |
| 2003 | To the Lighthouse                  | Modernism | 1004   |
| 2004 | The Lord of the Rings              | Fantasy   | 1001   |

Document structure

In contrast, document databases store data as documents, where each document represents a specific entity and contains all the information related to that entity in a nested data structure. This type of database is best suited for unstructured or semi-structured data, where the data does not have a clear and defined structure.
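For example, the authors and books from the relational tables above could be stored as nested documents like this:

```json
[
  {
    "id": "1001",
    "firstName": "J.R.R.",
    "lastName": "Tolkien",
    "books": [
      { "id": "2001", "title": "The Hobbit", "genre": "Fantasy" },
      { "id": "2004", "title": "The Lord of the Rings", "genre": "Fantasy" }
    ]
  },
  {
    "id": "1002",
    "firstName": "Stephen",
    "lastName": "King",
    "books": []
  },
  {
    "id": "1003",
    "firstName": "Mark",
    "lastName": "Twain",
    "books": [
      {
        "id": "2002",
        "title": "The Adventures of Huckleberry Finn",
        "genre": "Fiction"
      }
    ]
  },
  {
    "id": "1004",
    "firstName": "Virginia",
    "lastName": "Woolf",
    "books": [
      {
        "id": "2003",
        "title": "To the Lighthouse",
        "genre": "Modernism"
      }
    ]
  }
]
```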

Appfarm Create uses a document database, but adds a layer on top through the Global Data Model to provide relational capabilities (referential integrity, foreign keys), delivering the best of both worlds.

Key concepts in relational data modeling

Relational data modeling is a crucial process that involves a number of essential concepts for creating an efficient database design.

Entity

An entity is a “thing” or object that we want to keep track of in our database. It can be a person, a place, a thing, or an event.

In Appfarm Create, an entity is described with an object class.

Attribute

An attribute is a characteristic of an entity. It describes some aspect of the entity, such as its name, age, or address.

In Appfarm Create, an attribute is an object class property.

Data type

A data type defines the type of data that can be stored in a property. Common data types include text, number, date, and boolean.

In Appfarm Create, the data type is a part of the definition of an object class property. Simple data types such as boolean, string, integer, and datetime are supported, as well as enums and references to other object classes.

Relationships

A relationship describes how entities are related to each other. For example, a customer can have many orders, and each order belongs to only one customer. Closely connected to this concept is cardinality, which describes the number of instances of one entity that can be related to another entity. This is usually expressed with the terms one or many.

One-to-one

A one-to-one relationship is probably the least common relationship. It defines that a record in table A can be associated with only one record in table B, and vice versa. This can be interpreted as a bijective function from a mathematics perspective.

A real-world example for this scenario is the connection between a human and its brain. A human can only have one brain and a brain can only be connected to one human, and this relationship is constant. There’s (at least for now) no possibility for someone to have their brain replaced or duplicated.

Another, more relevant, example is the connection between a user and their user settings. One user has their set of settings, and a set of settings belongs to exactly one user, as seen in figure 2.
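Figure 2: One-to-one relationship as a data model.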

In data modeling this relationship is not that commonly used, and is often modeled as an ordinary one-to-many relationship instead. This provides greater flexibility should the requirements for either brains or user settings change in the future.
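A sketch of the user/user settings example in SQL; putting a UNIQUE constraint on the foreign key is one common way to enforce the one-to-one cardinality (the table and column names are illustrative):

```python
# Sketch of a one-to-one relationship: each settings row points to exactly
# one user, and the UNIQUE constraint allows at most one settings row per user.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        _id  INTEGER PRIMARY KEY,
        name TEXT
    );

    CREATE TABLE user_settings (
        _id      INTEGER PRIMARY KEY,
        user     INTEGER UNIQUE REFERENCES user(_id),
        theme    TEXT,
        language TEXT
    );
""")
```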

One-to-many

A one-to-many relationship is very common. It defines that a record in table A can only be associated with a specific record in table B, while a record in table B can have multiple records in table A associated with it.

As an example, when a customer places an order in an online store, they might want to purchase different items at the same time. To avoid requiring one order for each item, you can create an order object holding the details of that particular order (including the customer, the state of the order, the order date, etc.), while each item is represented as a single order line with a reference to the order object.

A one-to-many relationship is illustrated in figure 3 with a data model consisting of Order and Order Line. An order line can be connected to only one order through its property Order. An order can have any number of order lines referencing it.
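Figure 3: One-to-many relationship as a data model.

A sketch of the Order/Order Line example in SQL; each order line references exactly one order, while an order can be referenced by any number of order lines (the column names are illustrative):

```python
# Sketch of a one-to-many relationship between Order and Order Line (SQLite).
# "order" is quoted because ORDER is a reserved word in SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "order" (
        _id        INTEGER PRIMARY KEY,
        customer   TEXT,
        state      TEXT,
        order_date TEXT
    );

    -- Each order line points to exactly one order; an order can have
    -- any number of order lines referencing it
    CREATE TABLE order_line (
        _id      INTEGER PRIMARY KEY,
        "order"  INTEGER REFERENCES "order"(_id),
        item     TEXT,
        quantity INTEGER
    );
""")
```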

Many-to-many

The last type of relationship is the many-to-many relationship. It defines that multiple records in table A can be connected to multiple records in table B.

To illustrate this visually, imagine drawing lines between books and authors. Each book may have lines connecting it to multiple authors, and each author may have lines connecting them to multiple books.

This is illustrated in figure 4 with a data model consisting of Book, Author, and Written By. A Book can be written by multiple authors and an Author can write multiple books. This is modeled as Written By, which combines these two object classes and represents the relationship between books and authors. This object class can also be extended to contain more metadata about the relationship (e.g. number of pages that a particular author has written of a particular book).
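Figure 4: Many-to-many relationship as a data model.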

You may find this guide useful if you want practical examples of how to use many-to-many relationships in Appfarm Create.

Primary key

A primary key is a unique identifier for an entity in a table. It is used to ensure that each record in the table can be uniquely identified.

In Appfarm Create, the primary key is stored in the ID object class property.

Foreign key

A foreign key is a field in a table that refers to the primary key of another table. It is used to establish a relationship between the two tables.

In Appfarm Create, the foreign key is represented as an object class property referencing another object class. In the diagram view of the data model designer, it is visualized with an arrow pointing from the property to the referenced object class (Book and Author in the Written By object class are foreign keys in figure 4). The object that has the property of a referenced object class will store the ID of the referenced object, as seen in the Written By table below, which references both the Book object class and the Author object class.

Book

| ID [primary key]         | Title             | Number of Pages |
| ------------------------ | ----------------- | --------------- |
| 6376344c01ed7ce7fb4f1967 | Data Modeling 101 | 1337            |

Author

| ID [primary key]         | First Name | Last Name |
| ------------------------ | ---------- | --------- |
| 6376347f4a2bd917bc5128f2 | App        | Farmer    |

Written By

| ID [primary key]         | Book [foreign key]       | Author [foreign key]     |
| ------------------------ | ------------------------ | ------------------------ |
| 637f9382462b2a1d0e6b823a | 6376344c01ed7ce7fb4f1967 | 6376347f4a2bd917bc5128f2 |
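As a sketch, this is how the many-to-many relationship can be resolved in plain SQL by joining through the junction table, using the example rows above. The SQL here only illustrates the mechanics; it is not how Appfarm Create itself is used.

```python
# Sketch: resolving a many-to-many relationship by joining through the
# Written By junction table (SQLite), using the example rows above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book       (_id TEXT PRIMARY KEY, title TEXT, number_of_pages INTEGER);
    CREATE TABLE author     (_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE written_by (
        _id    TEXT PRIMARY KEY,
        book   TEXT REFERENCES book(_id),     -- foreign key to Book
        author TEXT REFERENCES author(_id)    -- foreign key to Author
    );

    INSERT INTO book   VALUES ('6376344c01ed7ce7fb4f1967', 'Data Modeling 101', 1337);
    INSERT INTO author VALUES ('6376347f4a2bd917bc5128f2', 'App', 'Farmer');
    INSERT INTO written_by VALUES (
        '637f9382462b2a1d0e6b823a',
        '6376344c01ed7ce7fb4f1967',
        '6376347f4a2bd917bc5128f2'
    );
""")

# Find every author of a given book by joining through the junction table
rows = conn.execute("""
    SELECT a.first_name, a.last_name
    FROM written_by wb
    JOIN author a ON a._id = wb.author
    JOIN book   b ON b._id = wb.book
    WHERE b.title = 'Data Modeling 101'
""").fetchall()
print(rows)  # [('App', 'Farmer')]
```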

Normalization

Normalization is the process of organizing data in a database to minimize redundancy and ensure data integrity. It involves breaking down tables into smaller, more specific tables and establishing relationships between them.

Please read the Database normalization article if you want to learn more about the concepts of database normalization.

By understanding these key concepts, you can create a well-structured and organized relational database that accurately represents the data you want to store, allowing you to build elegant and efficient apps.

Summary

In summary, data modeling is the process of organizing and creating a visual representation of data to help make sense of it. A good data model is essential to the success of a project as it helps you understand what the data means and how it can be used effectively.

There are different types of data models, such as conceptual, logical, and physical models, used at different stages of the data modeling process. Relational databases store structured data while document databases store unstructured data. Appfarm Create provides a hybrid of the conceptual and logical data model and adds a layer on top of the non-relational document database, providing it with relational capabilities.
