Identifying SQL dependencies from an abundance of nested views and tables.
CREATE TABLE first_table AS ( SELECT a FROM my_table WHERE 1=1);CREATE TABLE second_table AS ( SELECT a as b FROM first_table);
How it works
Data lineage is the process of tracking the flow of data from its origin to its destination. It provides a comprehensive view of your data's lifecycle, helping you understand:
Where your data comes from (sources)
How it's transformed and processed
Where it's stored (destination tables)
How it's used (metrics, BI dashboards, app, etc.)
1. SQL analysis
The first step in mapping data lineage is to analyze SQL code and identify:
Input tables (dependencies)
Output tables and columns
Transformations applied (column-level lineage)
Here's a simple example
CREATEVIEWcustomer_summaryASSELECT c.customer_id, c.name, COUNT(o.order_id) as total_orders, SUM(o.order_total) as lifetime_valueFROM customers cLEFT JOIN orders o ON c.customer_id = o.customer_idGROUP BY c.customer_id, c.name;
Table-level may seems basic, but it can give clear pictures when it comes to complex data models (1,000s lines of codes and 100s of SQL scripts). It also provides a simplified version of the DAG without complicating the diagram.
Column-level lineage
Column-level lineage is more advanced; it helps track how individual columns are used and transformed.
Source
Target
customers.customer_id
-> customer_summary.customer_id
customers.name
-> customer_summary.name
COUNT(orders.order_id)
-> customer_summary.total_orders
SUM(orders.order_total)
-> customer_summary.lifetime_value
3. Lineage modeling
The above example showed how the dependencies are extracted from a single SQL file.
Connecting the dots
In actual scenarios, we process all of your SQL scripts all at once.
This approach is designed to work out of the box, meaning you don't have to change your existing pipeline or modify your code to reference tables like {{ ref(..) }}.
The system collects all SQL scripts from your project,
parses each one to identify tables, views, and their relationships,
and then combines this information to create a lineage.
Our goal is to help you save time, reduce errors, and allow you to focus on writing SQL logic while we help maintain the visual representation and documentation.
It offers best value in large projects where manually tracking all dependencies would be impractical.
Examples
These are examples from our lineage playground. In real application, there's a data catalog system where you can navigate the tables and view lineage easily.
Example 1 - Financial
Code
-- Raw transaction data from various systemsCREATE TABLE raw_transactions ( transaction_id BIGINT PRIMARY KEY, transaction_date DATE, posting_date DATE, account_id INT, counterparty_id INT, amount DECIMAL(15,2), currency VARCHAR(3), transaction_type VARCHAR(50), department_id INT, project_id INT, raw_transaction_data JSON);-- First level: Cleaned and standardized transactionsCREATE TABLE clean_transactions ASSELECT transaction_id, transaction_date, posting_date, account_id, counterparty_id, amount, currency, transaction_type, department_id, project_id, raw_transaction_data->>'description'AS transaction_description, raw_transaction_data->>'reference_number'AS reference_numberFROM raw_transactions;-- Second level: Currency conversion to company's base currency (assuming USD)CREATE TABLE normalized_transactions ASSELECT t.*, CASE WHEN t.currency ='USD' THEN t.amount ELSE t.amount * e.exchange_rate END AS amount_usdFROM clean_transactions tLEFT JOIN exchange_rates e ON t.currency = e.currency AND t.transaction_date = e.date;-- Third level: Daily account balancesCREATE TABLE daily_account_balances ASSELECT posting_date, account_id, SUM(amount_usd) AS daily_net_change, SUM(SUM(amount_usd)) OVER (PARTITION BY account_id ORDER BY posting_date) AS running_balanceFROM normalized_transactionsGROUP BY posting_date, account_id;-- Fourth level: Monthly financial statementsCREATE TABLE monthly_financial_statements ASSELECT DATE_TRUNC('month', posting_date) ASmonth, a.account_type, SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN daily_net_change ELSE -daily_net_change END) AS net_change, SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN running_balance ELSE -running_balance END) AS end_balanceFROM daily_account_balances bJOIN accounts a ON b.account_id = a.account_idGROUP BY DATE_TRUNC('month', posting_date), a.account_type;-- Fifth level: Quarterly financial ratiosCREATE TABLE quarterly_financial_ratios ASWITH quarterly_financials AS ( SELECT DATE_TRUNC('quarter', month) ASquarter, account_type, SUM(net_change) AS quarterly_change, SUM(end_balance) AS quarter_end_balance FROM monthly_financial_statements GROUP BY DATE_TRUNC('quarter', month), account_type)SELECTquarter, SUM(CASE WHEN account_type ='Asset' THEN quarter_end_balance ELSE 0 END) AS total_assets, SUM(CASE WHEN account_type ='Liability' THEN quarter_end_balance ELSE 0 END) AS total_liabilities, SUM(CASE WHEN account_type ='Equity' THEN quarter_end_balance ELSE 0 END) AS total_equity, SUM(CASE WHEN account_type ='Revenue' THEN quarterly_change ELSE 0 END) AS total_revenue, SUM(CASE WHEN account_type ='Expense' THEN quarterly_change ELSE 0 END) AS total_expenses, SUM(CASE WHEN account_type ='Revenue' THEN quarterly_change ELSE 0 END) - SUM(CASE WHEN account_type ='Expense' THEN quarterly_change ELSE 0 END) AS net_income, (SUM(CASE WHEN account_type ='Revenue' THEN quarterly_change ELSE 0 END) - SUM(CASE WHEN account_type ='Expense' THEN quarterly_change ELSE 0 END)) / NULLIF(SUM(CASE WHEN account_type ='Revenue' THEN quarterly_change ELSE 0 END), 0) AS profit_margin, SUM(CASE WHEN account_type ='Asset' THEN quarter_end_balance ELSE 0 END) / NULLIF(SUM(CASE WHEN account_type ='Liability' THEN quarter_end_balance ELSE 0 END), 0) AS current_ratioFROM quarterly_financialsGROUP BY quarter;
Example 2 - Streaming service
Code
-- User informationCREATE TABLE users ( user_id INT PRIMARY KEY, username VARCHAR(50), email VARCHAR(100), created_at TIMESTAMP, last_login TIMESTAMP, is_active BOOLEAN);-- User profile detailsCREATE TABLE user_profiles ( profile_id INT PRIMARY KEY, user_id INT REFERENCES users(user_id), first_name VARCHAR(50), last_name VARCHAR(50), date_of_birth DATE, country VARCHAR(50), preferred_language VARCHAR(20));-- Content catalogCREATE TABLE content ( content_id INT PRIMARY KEY, title VARCHAR(100),descriptionTEXT, genre VARCHAR(50), release_date DATE, duration_minutes INT, rating DECIMAL(3,1));-- User viewing historyCREATE TABLE viewing_history ( view_id INT PRIMARY KEY, user_id INT REFERENCES users(user_id), content_id INT REFERENCES content(content_id), view_date TIMESTAMP, duration_watched INT, completed BOOLEAN);-- User subscriptionsCREATE TABLE subscriptions ( subscription_id INT PRIMARY KEY, user_id INT REFERENCES users(user_id), plan_name VARCHAR(50),start_date DATE, end_date DATE, is_active BOOLEAN);-- User reviews and ratingsCREATE TABLE reviews ( review_id INT PRIMARY KEY, user_id INT REFERENCES users(user_id), content_id INT REFERENCES content(content_id), rating INT CHECK (rating BETWEEN 1 AND 5), review_text TEXT, created_at TIMESTAMP);-- View of active usersCREATEVIEWactive_usersASSELECT user_id, username, email, last_loginFROM usersWHERE is_active = TRUE;-- Materialized view of user engagement metricsCREATE MATERIALIZED VIEW user_engagement_metrics ASSELECT u.user_id, u.username, COUNT(DISTINCT vh.content_id) AS unique_content_watched, SUM(vh.duration_watched) AS total_watch_time, AVG(vh.duration_watched) AS avg_watch_time, COUNT(CASE WHEN vh.completed THEN 1 END) AS completed_viewsFROM users uLEFT JOIN viewing_history vh ON u.user_id = vh.user_idGROUP BY u.user_id, u.username;-- Staging table for content performanceCREATE TABLE stg_content_performance ASSELECT c.content_id, c.title, c.genre, COUNT(DISTINCT vh.user_id) AS unique_viewers, SUM(vh.duration_watched) AS total_watch_time, AVG(r.rating) AS avg_ratingFROM content cLEFT JOIN viewing_history vh ON c.content_id = vh.content_idLEFT JOIN reviews r ON c.content_id = r.content_idGROUP BY c.content_id, c.title, c.genre;-- Mart table for genre performanceCREATE TABLE mart_genre_performance ASSELECT genre, COUNT(DISTINCT content_id) AS content_count, SUM(unique_viewers) AS total_unique_viewers, SUM(total_watch_time) AS total_watch_time, AVG(avg_rating) AS avg_genre_ratingFROM stg_content_performanceGROUP BY genre;-- Mart table for user segmentsCREATE TABLE mart_user_segments ASSELECT u.user_id, u.username, uem.unique_content_watched, uem.total_watch_time, s.plan_name, up.country, CASE WHEN uem.total_watch_time >1000 AND s.plan_name ='Premium' THEN 'High Value' WHEN uem.unique_content_watched >50 THEN 'Engaged' WHEN s.is_active = FALSE THEN 'Churned' ELSE 'Standard' END AS user_segmentFROM users uJOIN user_engagement_metrics uem ON u.user_id = uem.user_idLEFT JOIN subscriptions s ON u.user_id = s.user_id AND s.is_active = TRUELEFT JOIN user_profiles up ON u.user_id = up.user_id;-- Final reporting table combining user and content insightsCREATE TABLE report_content_user_insights ASSELECT mus.user_segment, cp.genre, COUNT(DISTINCT vh.user_id) AS unique_viewers, SUM(vh.duration_watched) AS total_watch_time, AVG(r.rating) AS avg_ratingFROM mart_user_segments musJOIN viewing_history vh ON mus.user_id = vh.user_idJOIN stg_content_performance cp ON vh.content_id = cp.content_idLEFT JOIN reviews r ON vh.user_id = r.user_id AND vh.content_id = r.content_idGROUP BY mus.user_segment, cp.genre;
Example 3 - Supply chain
Code
-- Raw material inventoryCREATE TABLE raw_material_inventory ( material_id INT PRIMARY KEY, material_name VARCHAR(100), quantity_on_hand DECIMAL(10,2), unit_of_measure VARCHAR(20), last_updated TIMESTAMP);-- Supplier informationCREATE TABLE suppliers ( supplier_id INT PRIMARY KEY, supplier_name VARCHAR(100), contact_person VARCHAR(100), email VARCHAR(100), phone VARCHAR(20));-- Purchase ordersCREATE TABLE purchase_orders ( po_id INT PRIMARY KEY, supplier_id INT REFERENCES suppliers(supplier_id), order_date DATE, expected_delivery_date DATE, total_amount DECIMAL(12,2));-- Purchase order detailsCREATE TABLE po_details ( po_detail_id INT PRIMARY KEY, po_id INT REFERENCES purchase_orders(po_id), material_id INT REFERENCES raw_material_inventory(material_id), quantity DECIMAL(10,2), unit_price DECIMAL(10,2));-- First level: Aggregate purchase order dataCREATE TABLE aggregate_po_data ASSELECT p.po_id, p.supplier_id, s.supplier_name, p.order_date, p.expected_delivery_date, COUNT(pd.material_id) AS num_materials, SUM(pd.quantity) AS total_quantity, SUM(pd.quantity * pd.unit_price) AS total_amountFROM purchase_orders pJOIN po_details pd ON p.po_id = pd.po_idJOIN suppliers s ON p.supplier_id = s.supplier_idGROUP BY p.po_id, p.supplier_id, s.supplier_name, p.order_date, p.expected_delivery_date;-- Second level: Supplier performance metricsCREATE TABLE supplier_performance ASSELECT supplier_id, supplier_name, COUNT(po_id) AS total_orders, AVG(total_amount) AS avg_order_value, AVG(EXTRACT(DAY FROM (expected_delivery_date - order_date))) AS avg_lead_time, SUM(total_amount) AS total_spendFROM aggregate_po_dataGROUP BY supplier_id, supplier_name;-- Third level: Material procurement analysisCREATE TABLE material_procurement_analysis ASSELECT rmi.material_id, rmi.material_name, COUNT(DISTINCT pd.po_id) AS number_of_orders, SUM(pd.quantity) AS total_ordered_quantity, AVG(pd.unit_price) AS average_unit_price, SUM(pd.quantity * pd.unit_price) AS total_spendFROM raw_material_inventory rmiJOIN po_details pd ON rmi.material_id = pd.material_idGROUP BY rmi.material_id, rmi.material_name;-- Fourth level: Inventory turnover analysisCREATE TABLE inventory_turnover_analysis ASSELECT mpa.material_id, mpa.material_name, mpa.total_ordered_quantity, rmi.quantity_on_hand, mpa.total_ordered_quantity / NULLIF(rmi.quantity_on_hand, 0) AS turnover_ratio, CASE WHEN mpa.total_ordered_quantity / NULLIF(rmi.quantity_on_hand, 0) >3 THEN 'High' WHEN mpa.total_ordered_quantity / NULLIF(rmi.quantity_on_hand, 0) >1 THEN 'Medium' ELSE 'Low' END AS turnover_categoryFROM material_procurement_analysis mpaJOIN raw_material_inventory rmi ON mpa.material_id = rmi.material_id;-- Fifth level: Supplier-Material relationship analysisCREATE TABLE supplier_material_relationship ASSELECT s.supplier_id, s.supplier_name, rmi.material_id, rmi.material_name, COUNT(DISTINCT po.po_id) AS order_count, SUM(pd.quantity) AS total_quantity_ordered, AVG(pd.unit_price) AS avg_unit_price, SUM(pd.quantity * pd.unit_price) AS total_spendFROM suppliers sJOIN purchase_orders po ON s.supplier_id = po.supplier_idJOIN po_details pd ON po.po_id = pd.po_idJOIN raw_material_inventory rmi ON pd.material_id = rmi.material_idGROUP BY s.supplier_id, s.supplier_name, rmi.material_id, rmi.material_name;-- Sixth level: Supply chain risk assessmentCREATE TABLE supply_chain_risk_assessment ASWITH supplier_diversity AS ( SELECT material_id, COUNT(DISTINCT supplier_id) AS supplier_count FROM supplier_material_relationship GROUP BY material_id)SELECT smr.material_id, smr.material_name, smr.supplier_id, smr.supplier_name, smr.total_quantity_ordered, ita.turnover_category, sd.supplier_count, CASE WHEN sd.supplier_count =1 THEN 'High Risk' WHEN sd.supplier_count =2 THEN 'Medium Risk' ELSE 'Low Risk' END AS supplier_diversity_risk, CASE WHEN ita.turnover_category ='High' AND sd.supplier_count =1 THEN 'Critical' WHEN ita.turnover_category ='High'OR sd.supplier_count =1 THEN 'High' WHEN ita.turnover_category ='Medium'OR sd.supplier_count =2 THEN 'Medium' ELSE 'Low' END AS overall_risk_assessmentFROM supplier_material_relationship smrJOIN inventory_turnover_analysis ita ON smr.material_id = ita.material_idJOIN supplier_diversity sd ON smr.material_id = sd.material_id;-- Final level: Strategic procurement insightsCREATE TABLE strategic_procurement_insights ASSELECT scra.material_id, scra.material_name, scra.overall_risk_assessment, COUNT(DISTINCT scra.supplier_id) AS supplier_count, AVG(sp.avg_lead_time) AS avg_lead_time, SUM(smr.total_spend) AS total_spend, MAX(ita.turnover_ratio) AS max_turnover_ratio, STRING_AGG(DISTINCT scra.supplier_name, ', ') AS suppliersFROM supply_chain_risk_assessment scraJOIN supplier_performance sp ON scra.supplier_id = sp.supplier_idJOIN supplier_material_relationship smr ON scra.material_id = smr.material_id AND scra.supplier_id = smr.supplier_idJOIN inventory_turnover_analysis ita ON scra.material_id = ita.material_idGROUP BY scra.material_id, scra.material_name, scra.overall_risk_assessmentORDER BY total_spend DESC, overall_risk_assessment DESC;