🏖️Relational modeling

Relational modeling from multiple SQL queries into ER diagram.

Read our blog: https://blog.getdatascale.com/automated-relational-modeling-from-multiple-sql-queries-a250e9ac7c2a

How it works

Related: https://getdatascale.com/datamodeling

Typically, data models rely on DDLs to generate ER diagrams. But there's a catch: without primary keys and foreign keys, you’re left with an incomplete view.

That's where we come in.

"You might have to deal with many-to-many relationships or have a complex transactional data model with non-standard naming conventions forcing you to join fields like employee_id to person_id (which you wouldn’t know if the Sr. Engineer hadn’t told you).", SeattleDataGuy

Here's how we take it further:

Instead of just using DDLs, we also analyze SQL joins.

By examining how tables are joined in SQL queries, we can infer relationships and automatically populate the foreign keys and primary keys in your data model.


Examples

Example 1 - SaaS App

While existing data modeling tools typically rely solely on DDLs to create ER diagram, we automatically map out PK & FK keys from SQL JOINs.

Code
WITH tag_engagements AS (
  SELECT 
    t.tag_name,
    pt.photo_id,
    COUNT(l.user_id) AS likes,
    COUNT(c.id) AS comments
  FROM tags t
  JOIN photo_tags pt ON t.id = pt.tag_id
  LEFT JOIN likes l ON pt.photo_id = l.photo_id
  LEFT JOIN comments c ON pt.photo_id = c.photo_id
  GROUP BY t.tag_name, pt.photo_id
)
SELECT
    tag_name,
    AVG(likes + comments) AS avg_engagement,
    COUNT(*) AS hashtag_usage
FROM tag_engagements
GROUP BY tag_name
ORDER BY avg_engagement DESC;

Example 2 - Financial

It works with any non-id fields (e.g., ON t.currency = e.currency AND t.transaction_date = e.date;), as long as the query contains JOIN statements.

Code
CREATE TABLE normalized_transactions AS
SELECT
    t.*,
    CASE 
        WHEN t.currency = 'USD' THEN t.amount
        ELSE t.amount * e.exchange_rate
    END AS amount_usd
FROM clean_transactions t
LEFT JOIN exchange_rates e 
ON t.currency = e.currency AND t.transaction_date = e.date;


CREATE TABLE monthly_financial_statements AS
SELECT
    DATE_TRUNC('month', posting_date) AS month,
    a.account_type,
    SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN daily_net_change ELSE -daily_net_change END) AS net_change,
    SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN running_balance ELSE -running_balance END) AS end_balance
FROM daily_account_balances b
JOIN accounts a ON b.account_id = a.account_id
GROUP BY DATE_TRUNC('month', posting_date), a.account_type;

Example 3 - SaaS App v2

Code
-- Total posts by users
SELECT SUM(user_posts.total_posts_per_user)
FROM (SELECT users.username,COUNT(photos.image_url) AS total_posts_per_user
		FROM users
		JOIN photos ON users.id = photos.user_id
		GROUP BY users.id) AS user_posts;
		
-- User ranking by postings
SELECT users.username, COUNT(photos.image_url)
FROM users
JOIN photos ON users.id = photos.user_id
GROUP BY users.id
ORDER BY 2 DESC;

-- User segments based on their followings and the tags
WITH user_segments AS (
    SELECT 
        u.id AS user_id,
        COUNT(DISTINCT f.followee_id) AS following_count,
        COUNT(DISTINCT pt.tag_id) AS tag_count
    FROM 
        users u
    LEFT JOIN follows f ON u.id = f.follower_id
    LEFT JOIN photos p ON u.id = p.user_id
    LEFT JOIN photo_tags pt ON p.id = pt.photo_id
    GROUP BY u.id
)
SELECT 
    user_id,
    CASE 
        WHEN following_count >= 1000 AND tag_count >= 100 THEN 'Influencers'
        WHEN following_count >= 500 AND tag_count >= 50 THEN 'Active Users'
        WHEN following_count >= 100 AND tag_count >= 30 THEN 'Engaged Users'
        ELSE 'Casual Users'
    END AS user_segment
FROM 
    user_segments;

-- Which hashtags are most effective in driving engagements?
WITH tag_engagements AS (
  SELECT 
    t.tag_name,
    pt.photo_id,
    COUNT(l.user_id) AS likes,
    COUNT(c.id) AS comments
  FROM tags t
  JOIN photo_tags pt ON t.id = pt.tag_id
  LEFT JOIN likes l ON pt.photo_id = l.photo_id
  LEFT JOIN comments c ON pt.photo_id = c.photo_id
  GROUP BY t.tag_name, pt.photo_id
)
SELECT
    tag_name,
    AVG(likes + comments) AS avg_engagement,
    COUNT(*) AS hashtag_usage
FROM tag_engagements
GROUP BY tag_name
ORDER BY avg_engagement DESC;

Last updated