Data Modeling
Last updated Dec 31, 2023
Table of Contents
- Different Levels
- (Design) Patterns
- Data Modeling is changing
- Tools
- Frameworks
- Difference to Dimensional Modeling
- Data Modeling part of Data Engineering?
Data modeling has changed over time; when I started (~20 years ago), choosing between Inmon and Kimball was common.
Today, in the context of data engineering, data modeling creates a structured representation of your organization’s data. Often illustrated visually, this representation helps you understand the relationships, constraints, and patterns within the data and serves as a blueprint for designing data systems that deliver business value, such as data warehouses, data lakes, or any analytics solution.
In its most straightforward form, data modeling is how we design the flow of our data so that it moves efficiently and in a structured way, with good data quality and as little redundancy as possible.
Data modeling is as much about Data Engineering Architecture as it is about modeling the data itself. Besides the links below, you can therefore find many of the approaches and common architectures in Data Engineering Architecture.
It’s becoming more about language than actual modeling, as Shane Gibson says on Making Data Modeling Accessible. For example, a data scientist speaks in wide tables, while a data engineer talks about facts and dimensions. That’s what I call the different levels of data modeling in Data Modeling – The Unsung Hero of Data Engineering- An Introduction to Data Modeling (Part 1).
# Different Levels
How do you think about the different levels of modeling? When I started (20 years ago), it was common to choose between Inmon and Kimball. But today, there are so many layers, levels, and approaches. Have you found a good way of separating or naming the different “levels” (I’m still not sure “levels” is the right word) to make clear what is meant? Below is a list of what I have collected so far (I have also written about this extensively, in case you are interested).
- Levels of Modeling
  - Generation or source database design
  - Data integration
  - ETL processes
  - Data warehouse schema creation
  - Data lake structuring
  - BI tool presentation layer design
  - Machine learning or AI feature engineering
- Data Modeling Approaches
  - Conceptual, Logical to physical Data Models
  - Other lesser known: Hierarchical Data Modeling, Network Data Modeling and Object-Role Modeling
- Data Modeling Techniques
- Data Architecture Pattern
  - General Purpose Data Architecture Pattern
    - Staging, Cleansing, Core, Data Mart (Classical Architecture of Data Warehouse) or Medallion Architecture
  - Specialized
    - Batch vs. Streaming (Streaming vs Batch in Orchestration)
    - Data Lake/Lakehouse vs. Data Warehouse Pattern
    - Semantic Layer (In-memory vs. Persistence or Semantic vs. Transformation Layer)
    - Modern Data Stack / Open Data Stack Pattern
  - many more: Data Modeling- The Unsung Hero of Data Engineering- Data Architecture Pattern, Tools and The Future (Part 3)
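To make the general-purpose pattern from the list above (Staging, Cleansing, Core, Data Mart) a bit more tangible, here is a minimal sketch in plain Python. The order records, the function names, and the choice to drop unparseable rows are all assumptions made for illustration; a real warehouse would implement each layer as tables or views, not functions.

```python
# Hypothetical raw source rows, as they might arrive from an operational system.
raw_orders = [
    {"order_id": "1", "country": " ch ", "amount": "10.50"},
    {"order_id": "2", "country": "DE", "amount": "n/a"},  # unparseable amount
    {"order_id": "3", "country": "ch", "amount": "4.50"},
]


def staging(rows):
    """Staging: land the source data unchanged (here: a plain copy)."""
    return [dict(r) for r in rows]


def cleansing(rows):
    """Cleansing: fix types and formats; this sketch drops rows it cannot parse."""
    cleaned = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # assumption: bad rows are skipped rather than quarantined
        cleaned.append(
            {
                "order_id": int(r["order_id"]),
                "country": r["country"].strip().upper(),
                "amount": amount,
            }
        )
    return cleaned


def core(rows):
    """Core: the integrated model; here simply keyed by order_id."""
    return {r["order_id"]: r for r in rows}


def data_mart(core_rows):
    """Data mart: a consumer-facing aggregate, revenue per country."""
    totals = {}
    for r in core_rows.values():
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals


mart = data_mart(core(cleansing(staging(raw_orders))))
print(mart)  # {'CH': 15.0}
```

Each layer only depends on the one before it, which is the main point of the pattern: you can change the cleansing rules without touching how the mart is defined.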
LinkedIn Post and Discussion and dbt Slack. Links (from post): Data Model Matrix.
# Different Data Modeling Techniques
Nice illustration of how different modeling techniques work | Source: GitHub - Data-Engineer-Camp/dbt-dimensional-modelling: Step-by-step tutorial on building a Kimball dimensional model with dbt
Or other data modeling techniques (my Tweet):
- Enterprise Data Warehouse (Inmon)
- Star Schema (Kimball)
- Data Vault
- One Big Table (OBT)
Source: Data Modeling in the Modern Data Stack | Towards Dev
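To show how two of the techniques listed above relate, here is a small sketch that builds a star schema and then derives a One Big Table (OBT) from it, using Python's built-in `sqlite3`. All table and column names (`fact_sales`, `dim_product`, `dim_customer`, `obt_sales`) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star schema: one fact table referencing two dimension tables.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY, customer_name TEXT, country TEXT);
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        sale_date TEXT,
        amount REAL);

    INSERT INTO dim_product VALUES (1, 'Espresso', 'Coffee'), (2, 'Green Tea', 'Tea');
    INSERT INTO dim_customer VALUES (10, 'Alice', 'CH'), (11, 'Bob', 'DE');
    INSERT INTO fact_sales VALUES
        (100, 1, 10, '2023-12-01', 4.50),
        (101, 2, 10, '2023-12-01', 3.00),
        (102, 1, 11, '2023-12-02', 4.50);

    -- One Big Table: the same facts with dimension attributes denormalized in.
    CREATE TABLE obt_sales AS
    SELECT f.sale_id, f.sale_date, f.amount,
           p.product_name, p.category,
           c.customer_name, c.country
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_customer c ON c.customer_id = f.customer_id;
    """
)

# Revenue per category on the star schema needs a join ...
star_result = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

# ... while the OBT answers the same question from a single table.
obt_result = conn.execute(
    "SELECT category, SUM(amount) FROM obt_sales GROUP BY category ORDER BY category"
).fetchall()

print(star_result)  # [('Coffee', 9.0), ('Tea', 3.0)]
print(obt_result == star_result)  # True
```

The trade-off in this toy setup: the OBT is simpler to query, but a change to a product's category has to be rewritten on every affected row, whereas the star schema changes one row in `dim_product`.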
# (Design) Patterns
Common approaches are well explained here:
- Dimensional Modeling
- Relational Model
- Graph Data Modeling ?
Others:
- Streaming vs. batch processing
- RW Data Pipeline Design Patterns - 1. Data Flow Patterns · Start Data Engineering (Nice Visual)
# Data Modeling is changing
See Data Modeling is changing.
# Tools
See Data Modeling Tools or Data Modeling- The Unsung Hero of Data Engineering- Data Architecture Pattern, Tools and The Future (Part 3).
# Frameworks
- BEAM from Agile Data Warehouse Design (Lawrence Corr, Jim Stagnitto)
- ADAPT for OLAP cubes
- …
# Difference to Dimensional Modeling
There is more to data modeling than dimensional modeling:
- hierarchies, semistructured sources, conformed dimensions, historical updates, and the logic used to keep them up to date
- Source: Serge Gershkovich on LinkedIn
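As one example of "historical updates, and the logic used to keep them up to date", here is a sketch of a Slowly Changing Dimension Type 2 style update in plain Python. This is my own illustration, not taken from the linked post; the `CustomerVersion` class, the `city` attribute, and both helper functions are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerVersion], customer_id: int,
               new_city: str, change_date: date) -> List[CustomerVersion]:
    """Close the current version and append a new one if the attribute changed."""
    current = next(
        (v for v in history if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return history  # nothing changed, keep the history as it is
        current.valid_to = change_date  # close the old version
    history.append(CustomerVersion(customer_id, new_city, change_date))
    return history


def city_as_of(history: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Look up the attribute value that was valid on a given date."""
    for v in history:
        if (v.customer_id == customer_id
                and v.valid_from <= as_of
                and (v.valid_to is None or as_of < v.valid_to)):
            return v.city
    return None


history: List[CustomerVersion] = []
apply_scd2(history, 1, "Zurich", date(2022, 1, 1))
apply_scd2(history, 1, "Bern", date(2023, 6, 1))
apply_scd2(history, 1, "Bern", date(2023, 9, 1))  # no change, no new version

print(len(history))                                # 2
print(city_as_of(history, 1, date(2022, 12, 31)))  # Zurich
print(city_as_of(history, 1, date(2023, 6, 1)))    # Bern
```

The point is that keeping history is a modeling decision with its own logic (validity ranges, change detection), independent of whether the table is called a dimension.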
# Data Modeling part of Data Engineering?
Data modeling, especially Dimensional Modeling with its definition of facts and dimensions, is a big thing for a data engineer, IMO. You need to ask vital questions to optimize for data consumers: Do you want to drill down into the different products? Is daily or monthly enough? The keywords here are granularity and rollup.
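A tiny sketch of what granularity and rollup mean in practice, assuming made-up daily sales rows and a hypothetical `rollup` helper:

```python
from collections import defaultdict

# Hypothetical facts stored at daily, per-product granularity.
daily_sales = [
    ("2023-11-30", "Espresso", 4.5),
    ("2023-12-01", "Espresso", 4.5),
    ("2023-12-01", "Green Tea", 3.0),
    ("2023-12-02", "Espresso", 4.5),
]


def rollup(rows, key_fn):
    """Aggregate amounts to a coarser grain defined by key_fn."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        totals[key_fn(day, product)] += amount
    return dict(totals)


# Roll up from day to month (the first 7 characters of an ISO date, e.g. '2023-12').
monthly = rollup(daily_sales, lambda day, product: day[:7])

# Keep the product dimension so consumers can still drill down by product.
monthly_by_product = rollup(daily_sales, lambda day, product: (day[:7], product))

print(monthly)  # {'2023-11': 4.5, '2023-12': 12.0}
print(monthly_by_product)
```

Rolling up from a fine grain is always possible; going the other way is not. If only `monthly` were stored, neither the daily nor the per-product view could be reconstructed, which is why the grain question belongs at the start.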
It also lets you think about Big-O implications regarding how often you touch and transfer data. I’d recommend the old Data Warehouse Toolkit by Ralph Kimball, which introduced many of these concepts and is still applicable today. Data modeling is mostly not done at the beginning, but as soon as you get bigger, you wish you had done more :)
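One way to picture the "how often you touch and transfer data" point is to compare a full refresh with an incremental load. The sketch below is an assumption-laden toy: the function names, the id-based watermark, and the 1,000 new rows per day are all invented, and the in-memory filter stands in for what would be an indexed query on a real source.

```python
def full_refresh(source_rows):
    """Copy everything on every run: rows transferred grow with total table size."""
    return list(source_rows), len(source_rows)


def incremental_load(source_rows, target_rows, watermark):
    """Copy only rows with an id above the watermark (highest id already loaded)."""
    new_rows = [r for r in source_rows if r["id"] > watermark]
    target_rows.extend(new_rows)
    new_watermark = max((r["id"] for r in new_rows), default=watermark)
    return new_watermark, len(new_rows)


source, target = [], []
watermark = 0
full_total = 0
incremental_total = 0

for day in range(3):
    # Assumption: the source grows by 1,000 rows per day.
    start = len(source) + 1
    source.extend({"id": i} for i in range(start, start + 1000))

    _, moved_full = full_refresh(source)
    full_total += moved_full

    watermark, moved_inc = incremental_load(source, target, watermark)
    incremental_total += moved_inc

print(full_total)         # 6000 rows transferred with daily full refreshes
print(incremental_total)  # 3000 rows transferred with incremental loads
```

Over n daily runs the full refresh transfers on the order of n² rows in total, while the incremental load stays linear. Whether a model supports incremental loading (for example, via a reliable key or timestamp) is decided when you design it.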
Links: