PostgreSQL IN clause vs. nested SELECT vs. JOIN performance
Question:
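I currently have a query that works fine but will have scaling problems. The alternative I found is very slow, and I'm looking to speed it up (the second query).
The old query, which won't scale well: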
SELECT users.score
FROM users
WHERE
users.id IN (
SELECT user_id
FROM companies_users
WHERE companies_users.company_id = X
)
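I then loop over the results to group them by score. Scores range from -10 to 10. The problem comes from the IN (SELECT ...) and the iteration: more than a million user_ids may be returned.
The alternative I came up with should be better, but it is insanely slow: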
SELECT
COUNT(*) as total_scores,
(SELECT COUNT(*) FROM users
JOIN companies_users as cu ON cu.user_id = users.id
WHERE users.score = 10 AND cu.company_id = X) as "10",
(SELECT COUNT(*) FROM users
JOIN companies_users as cu ON cu.user_id = users.id
WHERE users.score = 9 AND cu.company_id = X) as "9",
...
(SELECT COUNT(*) FROM users
JOIN companies_users as cu ON cu.user_id = users.id
WHERE users.score = -9 AND cu.company_id = X) as "-9",
(SELECT COUNT(*) FROM users
JOIN companies_users as cu ON cu.user_id = users.id
WHERE users.score = -10 AND cu.company_id = X) as "-10"
FROM users
JOIN companies_users as cu ON cu.user_id = users.id
WHERE cu.company_id = X
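The first query requires iterating over the returned data repeatedly; the second returns the counts ready to use.
Is there a way to pull the JOIN out of the nested SELECTs? It seems to cause most of the slowdown in the second query. Also, am I right that the first query won't scale well when handling millions of IDs?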
Answer:
What would be wrong with:
SELECT u.score
FROM companies_users cu
JOIN users u ON cu.user_id = u.id
WHERE cu.company_id=?
GROUP BY u.score
ORDER BY u.score
?
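If you want the one-row, column-per-score shape of the second query, the nested SELECTs can be replaced by conditional aggregation over a single scan of the join. The sketch below assumes PostgreSQL 9.4 or later (for the aggregate FILTER clause), a hypothetical minimal schema users(id, score) and companies_users(company_id, user_id), made-up sample rows, and company id 1 standing in for X. Only three score columns are shown; the others follow the same pattern.

```sql
-- Hypothetical minimal schema and made-up sample rows, for illustration only.
CREATE TEMP TABLE users (id integer PRIMARY KEY, score integer NOT NULL);
CREATE TEMP TABLE companies_users (company_id integer NOT NULL, user_id integer NOT NULL);
INSERT INTO users VALUES (1, 10), (2, 10), (3, 9), (4, -10), (5, 9);
INSERT INTO companies_users VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 5);

-- The join is scanned once; each FILTER clause counts only the rows it matches.
SELECT
  COUNT(*)                              AS total_scores,
  COUNT(*) FILTER (WHERE u.score = 10)  AS "10",
  COUNT(*) FILTER (WHERE u.score = 9)   AS "9",
  COUNT(*) FILTER (WHERE u.score = -10) AS "-10"
FROM companies_users cu
JOIN users u ON u.id = cu.user_id
WHERE cu.company_id = 1;  -- 1 stands in for X
```

With these sample rows, company 1 has four users, so total_scores is 4 and the "10" column is 2. On versions without FILTER, SUM(CASE WHEN u.score = 10 THEN 1 ELSE 0 END) gives the same counts.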
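Also, do you have the proper indexes? You need an index on companies_users(company_id) and one on users(id). You might also try adding one on companies_users(user_id), in case the planner decides to execute the query the other way round. EXPLAIN and EXPLAIN ANALYZE are your friends.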
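A minimal sketch of the suggested indexes, assuming the same hypothetical schema (users.id is a primary key, so it already has an index) and illustrative index names. EXPLAIN ANALYZE then executes the query and reports the plan the planner actually chose, with timings, so you can check whether the company_id index is used.

```sql
-- Hypothetical minimal schema; the PRIMARY KEY creates the index on users(id).
CREATE TEMP TABLE users (id integer PRIMARY KEY, score integer NOT NULL);
CREATE TEMP TABLE companies_users (company_id integer NOT NULL, user_id integer NOT NULL);

-- Index names are illustrative, not required by PostgreSQL.
CREATE INDEX companies_users_company_id_idx ON companies_users (company_id);
CREATE INDEX companies_users_user_id_idx ON companies_users (user_id);

-- Refresh planner statistics, then run the query and show the actual plan.
ANALYZE users;
ANALYZE companies_users;
EXPLAIN ANALYZE
SELECT u.score, COUNT(*)
FROM companies_users cu
JOIN users u ON u.id = cu.user_id
WHERE cu.company_id = 1  -- 1 stands in for the company id
GROUP BY u.score;
```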
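Thanks for the reply! This is very close to perfect. I was actually looking for the count for each score, so I used your solution but changed the SELECT list to u.score, count(u.score), and got all the data! Thanks again. – amiksch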
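For reference, here is a sketch of the answer's query with the change described in the comment, run against a hypothetical minimal schema with made-up sample rows (company id 1 stands in for the placeholder). It returns one row per score together with the number of that company's users who have it.

```sql
-- Hypothetical minimal schema and made-up sample rows, for illustration only.
CREATE TEMP TABLE users (id integer PRIMARY KEY, score integer NOT NULL);
CREATE TEMP TABLE companies_users (company_id integer NOT NULL, user_id integer NOT NULL);
INSERT INTO users VALUES (1, 10), (2, 10), (3, 9), (4, -10), (5, 9);
INSERT INTO companies_users VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 5);

-- One row per distinct score, with how many of the company's users have it.
SELECT u.score, COUNT(u.score) AS total
FROM companies_users cu
JOIN users u ON cu.user_id = u.id
WHERE cu.company_id = 1
GROUP BY u.score
ORDER BY u.score;
```

With these rows the result is (-10, 1), (9, 1), (10, 2). Scores that no user has are simply absent rather than reported as 0; if every score from -10 to 10 must appear, one option is to LEFT JOIN the counts onto generate_series(-10, 10).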