Identifying SQL dependencies from an abundance of nested views and tables.
CREATETABLEfirst_tableAS (SELECT a FROM my_tableWHERE1=1);CREATETABLEsecond_tableAS (SELECT a as b FROM first_table);
How it works
Data lineage is the process of tracking the flow of data from its origin to its destination. It provides a comprehensive view of your data's lifecycle, helping you understand:
Where your data comes from (sources)
How it's transformed and processed
Where it's stored (destination tables)
How it's used (metrics, BI dashboards, app, etc.)
1. SQL analysis
The first step in mapping data lineage is to analyze SQL code and identify:
Input tables (dependencies)
Output tables and columns
Transformations applied (column-level lineage)
Here's a simple example
CREATEVIEWcustomer_summaryASSELECT c.customer_id, c.name,COUNT(o.order_id) as total_orders,SUM(o.order_total) as lifetime_valueFROM customers cLEFT JOIN orders o ON c.customer_id = o.customer_idGROUP BY c.customer_id, c.name;
Table-level may seems basic, but it can give clear pictures when it comes to complex data models (1,000s lines of codes and 100s of SQL scripts). It also provides a simplified version of the DAG without complicating the diagram.
Column-level lineage
Column-level lineage is more advanced; it helps track how individual columns are used and transformed.
3. Lineage modeling
The above example showed how the dependencies are extracted from a single SQL file.
Connecting the dots
In actual scenarios, we process all of your SQL scripts all at once.
This approach is designed to work out of the box, meaning you don't have to change your existing pipeline or modify your code to reference tables like {{ ref(..) }}.
The system collects all SQL scripts from your project,
parses each one to identify tables, views, and their relationships,
and then combines this information to create a lineage.
Our goal is to help you save time, reduce errors, and allow you to focus on writing SQL logic while we help maintain the visual representation and documentation.
It offers best value in large projects where manually tracking all dependencies would be impractical.
Examples
These are examples from our lineage playground. In real application, there's a data catalog system where you can navigate the tables and view lineage easily.
Example 1 - Financial
Code
-- Raw transaction data from various systemsCREATETABLEraw_transactions ( transaction_id BIGINTPRIMARY KEY, transaction_date DATE, posting_date DATE, account_id INT, counterparty_id INT, amount DECIMAL(15,2), currency VARCHAR(3), transaction_type VARCHAR(50), department_id INT, project_id INT, raw_transaction_data JSON);-- First level: Cleaned and standardized transactionsCREATETABLEclean_transactionsASSELECT transaction_id, transaction_date, posting_date, account_id, counterparty_id, amount, currency, transaction_type, department_id, project_id, raw_transaction_data->>'description'AS transaction_description, raw_transaction_data->>'reference_number'AS reference_numberFROM raw_transactions;-- Second level: Currency conversion to company's base currency (assuming USD)CREATETABLEnormalized_transactionsASSELECT t.*,CASEWHEN t.currency ='USD'THEN t.amountELSE t.amount * e.exchange_rateENDAS amount_usdFROM clean_transactions tLEFT JOIN exchange_rates e ON t.currency = e.currency AND t.transaction_date = e.date;-- Third level: Daily account balancesCREATETABLEdaily_account_balancesASSELECT posting_date, account_id,SUM(amount_usd) AS daily_net_change,SUM(SUM(amount_usd)) OVER (PARTITIONBY account_id ORDER BY posting_date) AS running_balanceFROM normalized_transactionsGROUP BY posting_date, account_id;-- Fourth level: Monthly financial statementsCREATETABLEmonthly_financial_statementsASSELECT DATE_TRUNC('month', posting_date) ASmonth, a.account_type, SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN daily_net_change ELSE -daily_net_change END) AS net_change,
SUM(CASEWHEN a.account_type IN ('Asset', 'Expense') THEN running_balance ELSE-running_balance END) AS end_balanceFROM daily_account_balances bJOIN accounts a ON b.account_id = a.account_idGROUP BY DATE_TRUNC('month', posting_date), a.account_type;-- Fifth level: Quarterly financial ratiosCREATETABLEquarterly_financial_ratiosASWITH quarterly_financials AS (SELECT DATE_TRUNC('quarter', month) ASquarter, account_type,SUM(net_change) AS quarterly_change,SUM(end_balance) AS quarter_end_balanceFROM monthly_financial_statementsGROUP BY DATE_TRUNC('quarter', month), account_type)SELECTquarter,SUM(CASEWHEN account_type ='Asset'THEN quarter_end_balance ELSE0END) AS total_assets,SUM(CASEWHEN account_type ='Liability'THEN quarter_end_balance ELSE0END) AS total_liabilities,SUM(CASEWHEN account_type ='Equity'THEN quarter_end_balance ELSE0END) AS total_equity,SUM(CASEWHEN account_type ='Revenue'THEN quarterly_change ELSE0END) AS total_revenue,SUM(CASEWHEN account_type ='Expense'THEN quarterly_change ELSE0END) AS total_expenses,SUM(CASEWHEN account_type ='Revenue'THEN quarterly_change ELSE0END) -SUM(CASEWHEN account_type ='Expense'THEN quarterly_change ELSE0END) AS net_income, (SUM(CASEWHEN account_type ='Revenue'THEN quarterly_change ELSE0END) -SUM(CASEWHEN account_type ='Expense'THEN quarterly_change ELSE0END)) /NULLIF(SUM(CASEWHEN account_type ='Revenue'THEN quarterly_change ELSE0END), 0) AS profit_margin,SUM(CASEWHEN account_type ='Asset'THEN quarter_end_balance ELSE0END) /NULLIF(SUM(CASEWHEN account_type ='Liability'THEN quarter_end_balance ELSE0END), 0) AS current_ratioFROM quarterly_financialsGROUP BYquarter;
Example 2 - Streaming service
Code
-- User informationCREATETABLEusers ( user_id INTPRIMARY KEY, username VARCHAR(50), email VARCHAR(100), created_at TIMESTAMP, last_login TIMESTAMP, is_active BOOLEAN);-- User profile detailsCREATETABLEuser_profiles ( profile_id INTPRIMARY KEY, user_id INTREFERENCES users(user_id), first_name VARCHAR(50), last_name VARCHAR(50), date_of_birth DATE, country VARCHAR(50), preferred_language VARCHAR(20));-- Content catalogCREATETABLEcontent ( content_id INTPRIMARY KEY, title VARCHAR(100),descriptionTEXT, genre VARCHAR(50), release_date DATE, duration_minutes INT, rating DECIMAL(3,1));-- User viewing historyCREATETABLEviewing_history ( view_id INTPRIMARY KEY, user_id INTREFERENCES users(user_id), content_id INTREFERENCES content(content_id), view_date TIMESTAMP, duration_watched INT, completed BOOLEAN);-- User subscriptionsCREATETABLEsubscriptions ( subscription_id INTPRIMARY KEY, user_id INTREFERENCES users(user_id), plan_name VARCHAR(50),start_dateDATE, end_date DATE, is_active BOOLEAN);-- User reviews and ratingsCREATETABLEreviews ( review_id INTPRIMARY KEY, user_id INTREFERENCES users(user_id), content_id INTREFERENCES content(content_id), rating INTCHECK (rating BETWEEN1AND5), review_text TEXT, created_at TIMESTAMP);-- View of active usersCREATEVIEWactive_usersASSELECT user_id, username, email, last_loginFROM usersWHERE is_active = TRUE;-- Materialized view of user engagement metricsCREATE MATERIALIZED VIEW user_engagement_metrics ASSELECT u.user_id, u.username,COUNT(DISTINCT vh.content_id) AS unique_content_watched,SUM(vh.duration_watched) AS total_watch_time,AVG(vh.duration_watched) AS avg_watch_time,COUNT(CASEWHEN vh.completed THEN1END) AS completed_viewsFROM users uLEFT JOIN viewing_history vh ON u.user_id = vh.user_idGROUP BY u.user_id, u.username;-- Staging table for content performanceCREATETABLEstg_content_performanceASSELECT c.content_id, c.title, c.genre,COUNT(DISTINCT vh.user_id) AS unique_viewers,SUM(vh.duration_watched) AS total_watch_time,AVG(r.rating) AS avg_ratingFROM content cLEFT JOIN viewing_history vh ON c.content_id = vh.content_idLEFT JOIN reviews r ON c.content_id = r.content_idGROUP BY c.content_id, c.title, c.genre;-- Mart table for genre performanceCREATETABLEmart_genre_performanceASSELECT genre,COUNT(DISTINCT content_id) AS content_count,SUM(unique_viewers) AS total_unique_viewers,SUM(total_watch_time) AS total_watch_time,AVG(avg_rating) AS avg_genre_ratingFROM stg_content_performanceGROUP BY genre;-- Mart table for user segmentsCREATETABLEmart_user_segmentsASSELECT u.user_id, u.username, uem.unique_content_watched, uem.total_watch_time, s.plan_name, up.country,CASEWHEN uem.total_watch_time >1000AND s.plan_name ='Premium'THEN'High Value'WHEN uem.unique_content_watched >50THEN'Engaged'WHEN s.is_active = FALSE THEN'Churned'ELSE'Standard'ENDAS user_segmentFROM users uJOIN user_engagement_metrics uem ON u.user_id = uem.user_idLEFT JOIN subscriptions s ON u.user_id = s.user_id AND s.is_active = TRUELEFT JOIN user_profiles up ON u.user_id = up.user_id;-- Final reporting table combining user and content insightsCREATETABLEreport_content_user_insightsASSELECT mus.user_segment, cp.genre,COUNT(DISTINCT vh.user_id) AS unique_viewers,SUM(vh.duration_watched) AS total_watch_time,AVG(r.rating) AS avg_ratingFROM mart_user_segments musJOIN viewing_history vh ON mus.user_id = vh.user_idJOIN stg_content_performance cp ON vh.content_id = cp.content_idLEFT JOIN reviews r ON vh.user_id = r.user_id AND vh.content_id = r.content_idGROUP BY mus.user_segment, cp.genre;
Example 3 - Supply chain
Code
-- Raw material inventoryCREATETABLEraw_material_inventory ( material_id INTPRIMARY KEY, material_name VARCHAR(100), quantity_on_hand DECIMAL(10,2), unit_of_measure VARCHAR(20), last_updated TIMESTAMP);-- Supplier informationCREATETABLEsuppliers ( supplier_id INTPRIMARY KEY, supplier_name VARCHAR(100), contact_person VARCHAR(100), email VARCHAR(100), phone VARCHAR(20));-- Purchase ordersCREATETABLEpurchase_orders ( po_id INTPRIMARY KEY, supplier_id INTREFERENCES suppliers(supplier_id), order_date DATE, expected_delivery_date DATE, total_amount DECIMAL(12,2));-- Purchase order detailsCREATETABLEpo_details ( po_detail_id INTPRIMARY KEY, po_id INTREFERENCES purchase_orders(po_id), material_id INTREFERENCES raw_material_inventory(material_id), quantity DECIMAL(10,2), unit_price DECIMAL(10,2));-- First level: Aggregate purchase order dataCREATETABLEaggregate_po_dataASSELECT p.po_id, p.supplier_id, s.supplier_name, p.order_date, p.expected_delivery_date,COUNT(pd.material_id) AS num_materials,SUM(pd.quantity) AS total_quantity,SUM(pd.quantity * pd.unit_price) AS total_amountFROM purchase_orders pJOIN po_details pd ON p.po_id = pd.po_idJOIN suppliers s ON p.supplier_id = s.supplier_idGROUP BY p.po_id, p.supplier_id, s.supplier_name, p.order_date, p.expected_delivery_date;-- Second level: Supplier performance metricsCREATETABLEsupplier_performanceASSELECT supplier_id, supplier_name,COUNT(po_id) AS total_orders,AVG(total_amount) AS avg_order_value,AVG(EXTRACT(DAYFROM (expected_delivery_date - order_date))) AS avg_lead_time,SUM(total_amount) AS total_spendFROM aggregate_po_dataGROUP BY supplier_id, supplier_name;-- Third level: Material procurement analysisCREATETABLEmaterial_procurement_analysisASSELECT rmi.material_id, rmi.material_name,COUNT(DISTINCT pd.po_id) AS number_of_orders,SUM(pd.quantity) AS total_ordered_quantity,AVG(pd.unit_price) AS average_unit_price,SUM(pd.quantity * pd.unit_price) AS total_spendFROM raw_material_inventory rmiJOIN po_details pd ON rmi.material_id = pd.material_idGROUP BY rmi.material_id, rmi.material_name;-- Fourth level: Inventory turnover analysisCREATETABLEinventory_turnover_analysisASSELECT mpa.material_id, mpa.material_name, mpa.total_ordered_quantity, rmi.quantity_on_hand, mpa.total_ordered_quantity /NULLIF(rmi.quantity_on_hand, 0) AS turnover_ratio,CASEWHEN mpa.total_ordered_quantity /NULLIF(rmi.quantity_on_hand, 0) >3THEN'High'WHEN mpa.total_ordered_quantity /NULLIF(rmi.quantity_on_hand, 0) >1THEN'Medium'ELSE'Low'ENDAS turnover_categoryFROM material_procurement_analysis mpaJOIN raw_material_inventory rmi ON mpa.material_id = rmi.material_id;-- Fifth level: Supplier-Material relationship analysisCREATETABLEsupplier_material_relationshipASSELECT s.supplier_id, s.supplier_name, rmi.material_id, rmi.material_name,COUNT(DISTINCT po.po_id) AS order_count,SUM(pd.quantity) AS total_quantity_ordered,AVG(pd.unit_price) AS avg_unit_price,SUM(pd.quantity * pd.unit_price) AS total_spendFROM suppliers sJOIN purchase_orders po ON s.supplier_id = po.supplier_idJOIN po_details pd ON po.po_id = pd.po_idJOIN raw_material_inventory rmi ON pd.material_id = rmi.material_idGROUP BY s.supplier_id, s.supplier_name, rmi.material_id, rmi.material_name;-- Sixth level: Supply chain risk assessmentCREATETABLEsupply_chain_risk_assessmentASWITH supplier_diversity AS (SELECT material_id,COUNT(DISTINCT supplier_id) AS supplier_countFROM supplier_material_relationshipGROUP BY material_id)SELECT smr.material_id, smr.material_name, smr.supplier_id, smr.supplier_name, smr.total_quantity_ordered, ita.turnover_category, sd.supplier_count,CASEWHEN sd.supplier_count =1THEN'High Risk'WHEN sd.supplier_count =2THEN'Medium Risk'ELSE'Low Risk'ENDAS supplier_diversity_risk,CASEWHEN ita.turnover_category ='High'AND sd.supplier_count =1THEN'Critical'WHEN ita.turnover_category ='High'OR sd.supplier_count =1THEN'High'WHEN ita.turnover_category ='Medium'OR sd.supplier_count =2THEN'Medium'ELSE'Low'ENDAS overall_risk_assessmentFROM supplier_material_relationship smrJOIN inventory_turnover_analysis ita ON smr.material_id = ita.material_idJOIN supplier_diversity sd ON smr.material_id = sd.material_id;-- Final level: Strategic procurement insightsCREATETABLEstrategic_procurement_insightsASSELECT scra.material_id, scra.material_name, scra.overall_risk_assessment,COUNT(DISTINCT scra.supplier_id) AS supplier_count,AVG(sp.avg_lead_time) AS avg_lead_time,SUM(smr.total_spend) AS total_spend,MAX(ita.turnover_ratio) AS max_turnover_ratio,STRING_AGG(DISTINCT scra.supplier_name, ', ') AS suppliersFROM supply_chain_risk_assessment scraJOIN supplier_performance sp ON scra.supplier_id = sp.supplier_idJOIN supplier_material_relationship smr ON scra.material_id = smr.material_id AND scra.supplier_id = smr.supplier_idJOIN inventory_turnover_analysis ita ON scra.material_id = ita.material_idGROUP BY scra.material_id, scra.material_name, scra.overall_risk_assessmentORDER BY total_spend DESC, overall_risk_assessment DESC;