20201012——3道sql题
问题1:SQL
有两张数据表A,B,通过SQL语句输出表C
输入表A:
name score_1
张三 90
李四 85
王五 60
孙六 55
输入表B:
name score_2
张三 90
王五 65
孙六 58
朱七 70
输出表C:
name score_1 score_2 maxExam totalScore
张三 90 90 90 180
李四 85 null 85 85
王五 60 65 65 125
孙六 55 58 58 113
朱七 null 70 70 70
*/
我的答案:
select
X.name,A.score_1,B.score_2,
case when A.score_1>B.score_2 or B.score_2 is null then A.score_1
else B.score_2
end as maxExam,
A.score_1+B.score_2 as totalScore
from
(select name from A
union
select name from B
) X
left join
A on X.name=A.name
left join
B on X.name=B.name
;
在hive中执行了一下,结果不对,如下图,原因是+运算缺少nvl函数。
/*
问题2:SQL
有一个订单表A,分别有order_id(订单id)、user_id(用户id)、amt(金额)三个字段,
用sql实现以下功能:
i. 求订单总量为top3的用户及交易笔数,同时求出其交易笔数占全量订单笔数的占比。
ii. 求每个用户top3交易金额的订单。
iii. 求订单总量从大到小排名在50%之前的用户。
*/
准备数据:
select
user_id
,count(1) over(partition by user_id) as order_num
,count(1) over(partition by user_id)/count(1) as order_num_percent
from
A order by order_num desc limit 3;
hive中执行报错:
第三行就报错了,尝试加一个group by user_id(结果错误,count1不应该被分组)
select
user_id
,count(1) over(partition by user_id) as order_num
,count(1) over(partition by user_id)/count(1) as order_num_percent
from
A
group by user_id
order by order_num desc limit 3;
结果错误。
——但是为什么会报错?注释掉order_num_percent可以执行。再看一下报错信息:
Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
百度翻译:无法将窗口调用拆分为组。至少有1个组只能依赖于输入列。还要检查循环依赖项。
回忆下开窗函数的用法,可以用于分组求topN,我这样用为什么不可以?有什么区别呢?假如注释掉order_num_percent:
select
user_id
,count(1) over(partition by user_id) as order_num
-- ,count(1) over(partition by user_id)/count(1) as order_num_percent
from
A order by order_num desc limit 3;
user1有4个订单,这个没问题。
select
user_id
,count(1) over(partition by user_id) as order_num
,count(1) as count1
from
A;
这样依然报相同错。——count over的原理是什么?我有点明白报错原因了:
user_id
,count(1) over(partition by user_id) as order_num
这两行,其实与group by user_id是一样的,可以在每个user_id的组中进行操作,比如count、rank,也可以拿到其他的字段,问题就在这里!——没有group by的count只有一个reduce!这是互相矛盾的!
¥¥¥
第二问:ii. 求每个用户top3交易金额的订单。——用rank
select user_id,order_id,amt
from
(select user_id,order_id,amt
,rank() over(partition by user_id order by amt desc) as rank from A) x
where rank<=3;
第三问:iii. 求订单总量从大到小排名在50%之前的用户。
select user_id,rank() over(order by order_num desc) as rank from
(select user_id,count(1) as order_num from A group by user_id
order by order_num desc) x
where rank<(select count(distinct(user_id)) from A)/2
order by rank;
又报错了:不支持的子查询表达user_id:只有*的“连接副词”被允许。这个没有搞懂,可以替换成join。
¥¥¥
/*
问题3:SQL
给定一个二叉树的表格,包含两列:Node 和 Parent,Node表示二叉树的节点值,
Parent是Node的父节点。
表名为BST:
Column Type
Node Integer
Parent Integer
我们希望能找到二叉树每个节点类型,并按照大小排序。按如下方式输出结果:
当节点是二叉树的根节点,输出Root
当节点是叶子节点时,输出Leaf
当节点两者都不是时,输出Inner
输入样例
Node Parent
1 2
3 2
6 8
9 8
2 5
8 5
5 null
输出样例
1 Leaf
2 Inner
3 Leaf
5 Root
6 Leaf
8 Inner
9 Leaf
二叉树样例:
5
2 8
1 3 6 9
*/
select distinct a.Node,
case when a.Parent is null then 'Root'
when b.Parent is not null then 'Inner'
else 'Leaf'
end as Type
from BST a left join BST b on a.Node=b.Parent
order by a.Node
;
结果符合预期。