Typically, data models rely on DDLs to generate ER diagrams. But there's a catch: without primary keys and foreign keys, you’re left with an incomplete view.
That's where we come in.
"You might have to deal with many-to-many relationships or have a complex transactional data model with non-standard naming conventions forcing you to join fields like employee_id to person_id (which you wouldn’t know if the Sr. Engineer hadn’t told you).", SeattleDataGuy
Here's how we take it further:
Instead of just using DDLs, we also analyze SQL joins.
By examining how tables are joined in SQL queries, we can infer relationships and automatically populate the foreign keys and primary keys in your data model.
Examples
Example 1 - SaaS App
While existing data modeling tools typically rely solely on DDLs to create ER diagram, we automatically map out PK & FK keys from SQL JOINs.
Code
WITH tag_engagements AS (
SELECT
t.tag_name,
pt.photo_id,
COUNT(l.user_id) AS likes,
COUNT(c.id) AS comments
FROM tags t
JOIN photo_tags pt ON t.id = pt.tag_id
LEFT JOIN likes l ON pt.photo_id = l.photo_id
LEFT JOIN comments c ON pt.photo_id = c.photo_id
GROUP BY t.tag_name, pt.photo_id
)
SELECT
tag_name,
AVG(likes + comments) AS avg_engagement,
COUNT(*) AS hashtag_usage
FROM tag_engagements
GROUP BY tag_name
ORDER BY avg_engagement DESC;
Example 2 - Financial
It works with any non-id fields (e.g., ON t.currency = e.currency AND t.transaction_date = e.date;), as long as the query contains JOIN statements.
Code
CREATE TABLE normalized_transactions AS
SELECT
t.*,
CASE
WHEN t.currency = 'USD' THEN t.amount
ELSE t.amount * e.exchange_rate
END AS amount_usd
FROM clean_transactions t
LEFT JOIN exchange_rates e
ON t.currency = e.currency AND t.transaction_date = e.date;
CREATE TABLE monthly_financial_statements AS
SELECT
DATE_TRUNC('month', posting_date) AS month,
a.account_type,
SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN daily_net_change ELSE -daily_net_change END) AS net_change,
SUM(CASE WHEN a.account_type IN ('Asset', 'Expense') THEN running_balance ELSE -running_balance END) AS end_balance
FROM daily_account_balances b
JOIN accounts a ON b.account_id = a.account_id
GROUP BY DATE_TRUNC('month', posting_date), a.account_type;
Example 3 - SaaS App v2
Code
-- Total posts by users
SELECT SUM(user_posts.total_posts_per_user)
FROM (SELECT users.username,COUNT(photos.image_url) AS total_posts_per_user
FROM users
JOIN photos ON users.id = photos.user_id
GROUP BY users.id) AS user_posts;
-- User ranking by postings
SELECT users.username, COUNT(photos.image_url)
FROM users
JOIN photos ON users.id = photos.user_id
GROUP BY users.id
ORDER BY 2 DESC;
-- User segments based on their followings and the tags
WITH user_segments AS (
SELECT
u.id AS user_id,
COUNT(DISTINCT f.followee_id) AS following_count,
COUNT(DISTINCT pt.tag_id) AS tag_count
FROM
users u
LEFT JOIN follows f ON u.id = f.follower_id
LEFT JOIN photos p ON u.id = p.user_id
LEFT JOIN photo_tags pt ON p.id = pt.photo_id
GROUP BY u.id
)
SELECT
user_id,
CASE
WHEN following_count >= 1000 AND tag_count >= 100 THEN 'Influencers'
WHEN following_count >= 500 AND tag_count >= 50 THEN 'Active Users'
WHEN following_count >= 100 AND tag_count >= 30 THEN 'Engaged Users'
ELSE 'Casual Users'
END AS user_segment
FROM
user_segments;
-- Which hashtags are most effective in driving engagements?
WITH tag_engagements AS (
SELECT
t.tag_name,
pt.photo_id,
COUNT(l.user_id) AS likes,
COUNT(c.id) AS comments
FROM tags t
JOIN photo_tags pt ON t.id = pt.tag_id
LEFT JOIN likes l ON pt.photo_id = l.photo_id
LEFT JOIN comments c ON pt.photo_id = c.photo_id
GROUP BY t.tag_name, pt.photo_id
)
SELECT
tag_name,
AVG(likes + comments) AS avg_engagement,
COUNT(*) AS hashtag_usage
FROM tag_engagements
GROUP BY tag_name
ORDER BY avg_engagement DESC;