⛵Lineage modeling
Lineage modeling for complex SQL structures.
Identifying SQL dependencies from an abundance of nested views and tables.
How it works
Data lineage is the process of tracking the flow of data from its origin to its destination. It provides a comprehensive view of your data's lifecycle, helping you understand:
Where your data comes from (sources)
How it's transformed and processed
Where it's stored (destination tables)
How it's used (metrics, BI dashboards, app, etc.)
1. SQL analysis
The first step in mapping data lineage is to analyze SQL code and identify:
Input tables (dependencies)
Output tables and columns
Transformations applied (column-level lineage)
Here's a simple example
In this query, we can identify:
Source tables:
customers
,orders
Target view:
customer_summary
Transformations: COUNT, SUM, GROUP BY (optional)
2. Dependency mapping
Table-level lineage
Table-level may seems basic, but it can give clear pictures when it comes to complex data models (1,000s lines of codes and 100s of SQL scripts). It also provides a simplified version of the DAG without complicating the diagram.
Column-level lineage
Column-level lineage is more advanced; it helps track how individual columns are used and transformed.
Source | Target |
---|---|
| -> |
| -> |
COUNT( | -> |
SUM( | -> |
3. Lineage modeling
The above example showed how the dependencies are extracted from a single SQL file.
Connecting the dots
In actual scenarios, we process all of your SQL scripts all at once.
This approach is designed to work out of the box, meaning you don't have to change your existing pipeline or modify your code to reference tables like {{ ref(..) }}.
The system collects all SQL scripts from your project,
parses each one to identify tables, views, and their relationships,
and then combines this information to create a lineage.
Our goal is to help you save time, reduce errors, and allow you to focus on writing SQL logic while we help maintain the visual representation and documentation.
It offers best value in large projects where manually tracking all dependencies would be impractical.
Fun fact: learn how we display a lineage Auto rearrange
Examples
These are examples from our lineage playground. In real application, there's a data catalog system where you can navigate the tables and view lineage easily.
Example 1 - Financial
Example 2 - Streaming service
Example 3 - Supply chain
Last updated