Last updated
Last updated
Lineage modeling for complex SQL structures.
Identifying SQL dependencies from an abundance of nested views and tables.
Data lineage is the process of tracking the flow of data from its origin to its destination. It provides a comprehensive view of your data's lifecycle, helping you understand:
Where your data comes from (sources)
How it's transformed and processed
Where it's stored (destination tables)
How it's used (metrics, BI dashboards, app, etc.)
The first step in mapping data lineage is to analyze SQL code and identify:
Input tables (dependencies)
Output tables and columns
Transformations applied (column-level lineage)
Here's a simple example
In this query, we can identify:
Source tables: customers
, orders
Target view: customer_summary
Transformations: COUNT, SUM, GROUP BY (optional)
Table-level may seems basic, but it can give clear pictures when it comes to complex data models (1,000s lines of codes and 100s of SQL scripts). It also provides a simplified version of the DAG without complicating the diagram.
Column-level lineage is more advanced; it helps track how individual columns are used and transformed.
The above example showed how the dependencies are extracted from a single SQL file.
Connecting the dots
In actual scenarios, we process all of your SQL scripts all at once.
This approach is designed to work out of the box, meaning you don't have to change your existing pipeline or modify your code to reference tables like {{ ref(..) }}.
The system collects all SQL scripts from your project,
parses each one to identify tables, views, and their relationships,
and then combines this information to create a lineage.
Our goal is to help you save time, reduce errors, and allow you to focus on writing SQL logic while we help maintain the visual representation and documentation.
It offers best value in large projects where manually tracking all dependencies would be impractical.
These are examples from our lineage playground. In real application, there's a data catalog system where you can navigate the tables and view lineage easily.
Fun fact: learn how we display a lineage
customers.customer_id
-> customer_summary.customer_id
customers.name
-> customer_summary.name
COUNT(orders.order_id
)
-> customer_summary.total_orders
SUM(orders.order_total
)
-> customer_summary.lifetime_value