BigQuery: querying with standard SQL

Problem description:

I have this table:

client_id  session_id  time   action    transaction_id
1          1           15:01  view      NULL
1          1           15:02  basket    NULL
1          1           15:03  basket    NULL
1          1           15:04  purchase  1
1          2           15:05  basket    NULL
1          2           15:06  purchase  2
1          2           15:07  view      NULL

Within a session, I want all previous actions to register the transaction_id on their first occurrence (hence transaction_id = NULL at 15:03, the second basket):

session_id  time   transaction_id
1           15:01  1
1           15:02  1
1           15:03  NULL
1           15:04  1
2           15:05  2
2           15:06  2
2           15:07  NULL

Below is for BigQuery Standard SQL:

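#standardSQL
SELECT
  client_id, session_id, time, action,
  -- Only the first occurrence of each action within a group gets the group's transaction_id
  CASE
    WHEN ROW_NUMBER() OVER (PARTITION BY client_id, session_id, grp, action ORDER BY time) = 1
    THEN MAX(transaction_id) OVER (PARTITION BY client_id, session_id, grp)
  END AS transaction_id
FROM (
  SELECT *,
    -- grp = number of transactions seen before this row in the session,
    -- so a purchase row stays in the same group as the actions leading up to it
    COUNTIF(transaction_id IS NOT NULL) OVER (
      PARTITION BY client_id, session_id
      ORDER BY time
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS grp
  FROM YourTable
)
-- ORDER BY client_id, session_id, time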

You can test or play with it using dummy data, as below:

#standardSQL
WITH YourTable AS (
  SELECT 1 AS client_id, 1 AS session_id, '15:01' AS time, 'view' AS action, NULL AS transaction_id UNION ALL
  SELECT 1, 1, '15:02', 'basket', NULL UNION ALL
  SELECT 1, 1, '15:03', 'basket', NULL UNION ALL
  SELECT 1, 1, '15:04', 'purchase', 1 UNION ALL
  SELECT 1, 1, '15:05', 'basket', NULL UNION ALL
  SELECT 1, 1, '15:06', 'basket', NULL UNION ALL
  SELECT 1, 1, '15:07', 'purchase', 3 UNION ALL
  SELECT 1, 2, '15:08', 'basket', NULL UNION ALL
  SELECT 1, 2, '15:09', 'purchase', 2 UNION ALL
  SELECT 1, 2, '15:10', 'view', NULL
)
SELECT
  client_id, session_id, time, action,
  CASE
    WHEN ROW_NUMBER() OVER (PARTITION BY client_id, session_id, grp, action ORDER BY time) = 1
    THEN MAX(transaction_id) OVER (PARTITION BY client_id, session_id, grp)
  END AS transaction_id
FROM (
  SELECT *,
    COUNTIF(transaction_id IS NOT NULL) OVER (
      PARTITION BY client_id, session_id
      ORDER BY time
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS grp
  FROM YourTable
)
-- ORDER BY client_id, session_id, time

The output is as expected:

client_id  session_id  time   action    transaction_id
1          1           15:01  view      1
1          1           15:02  basket    1
1          1           15:03  basket    null
1          1           15:04  purchase  1
1          1           15:05  basket    3
1          1           15:06  basket    null
1          1           15:07  purchase  3
1          2           15:08  basket    2
1          2           15:09  purchase  2
1          2           15:10  view      null
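
For anyone who wants to trace the logic outside BigQuery, here is a rough pure-Python sketch of what the query does to the dummy data above. It is not BigQuery code and not part of the original answer; the function name `fill_transaction_ids` and the tuple layout are made up for illustration. It mirrors the three steps: count the transactions seen before each row (`grp`), take the maximum transaction_id per group, and hand it only to the first occurrence of each action in that group.

```python
from collections import defaultdict


def fill_transaction_ids(rows):
    """rows: tuples of (client_id, session_id, time, action, transaction_id).

    Hypothetical helper that imitates the window-function query in plain Python.
    """
    # Same ordering the window functions rely on ('HH:MM' strings sort correctly).
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

    # Step 1: grp = number of non-NULL transaction_ids on PRECEDING rows of the session.
    seen_tx = defaultdict(int)
    grouped = []
    for client_id, session_id, time, action, tx in ordered:
        key = (client_id, session_id)
        grouped.append((client_id, session_id, time, action, tx, seen_tx[key]))
        if tx is not None:
            seen_tx[key] += 1  # counted only for the rows that follow

    # Step 2: MAX(transaction_id) per (client_id, session_id, grp); None if the group has none.
    group_max = {}
    for client_id, session_id, _time, _action, tx, grp in grouped:
        gkey = (client_id, session_id, grp)
        best = group_max.get(gkey)
        if tx is not None and (best is None or tx > best):
            best = tx
        group_max[gkey] = best

    # Step 3: only the first row of each action within a group receives the value.
    seen_actions = set()
    result = []
    for client_id, session_id, time, action, _tx, grp in grouped:
        akey = (client_id, session_id, grp, action)
        new_tx = None if akey in seen_actions else group_max[(client_id, session_id, grp)]
        seen_actions.add(akey)
        result.append((client_id, session_id, time, action, new_tx))
    return result


# The dummy data from the answer above.
rows = [
    (1, 1, "15:01", "view", None),
    (1, 1, "15:02", "basket", None),
    (1, 1, "15:03", "basket", None),
    (1, 1, "15:04", "purchase", 1),
    (1, 1, "15:05", "basket", None),
    (1, 1, "15:06", "basket", None),
    (1, 1, "15:07", "purchase", 3),
    (1, 2, "15:08", "basket", None),
    (1, 2, "15:09", "purchase", 2),
    (1, 2, "15:10", "view", None),
]

for row in fill_transaction_ids(rows):
    print(row)
```

The last column printed should line up with the transaction_id column in the output table above, with `None` standing in for null.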

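Thank you very much for your answer! How would the code change if session_id = 1 contains no transaction, but the first 'view' (or another action) is in that first session, and that row should show transaction_id = 2? – Zzema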

@Zzema, I don't see why the code would need to change. It still produces the result you expect (per your question). Have you actually tried it?

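Yes, I tried it, thanks :) My comment was about a changed condition that was not written in the question... But after reading up on window functions, I figured out how to rewrite your answer. Thanks again – Zzema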

Hmmm . . . Assuming there can be only one transaction id per session, you can use window functions:

select t.*,
       (case when row_number() over (partition by client_id, session_id, action
                                     order by time) = 1
             then max(transaction_id) over (partition by client_id, session_id)
        end) as new_transaction_id
from t

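Thank you very much for your answer! How would the code change if session_id = 1 contains no transaction, but the first 'view' (or another action) is in that first session, and that row should show transaction_id = 2? – Zzema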

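@Zzema . . . If there is no transaction in a session, then the value is NULL, as your question specifies: "within a session, I want all previous actions to register the transaction_id on their first occurrence".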
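
The point in the last comment can be checked with a small Python sketch of the per-session logic from the second answer. This is only an illustration, not BigQuery code: the function name `fill_per_session` and the sample rows are hypothetical, and it assumes at most one transaction per session, as that answer does. A session with no transaction has no maximum to take, so every row in it ends up with `None` (NULL).

```python
def fill_per_session(rows):
    """rows: tuples of (client_id, session_id, time, action, transaction_id).

    Hypothetical helper imitating the per-session window query in plain Python.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

    # MAX(transaction_id) per (client_id, session_id); stays None without a transaction.
    session_max = {}
    for client_id, session_id, _time, _action, tx in ordered:
        key = (client_id, session_id)
        best = session_max.get(key)
        if tx is not None and (best is None or tx > best):
            best = tx
        session_max[key] = best

    # Only the first row of each action within a session receives the value.
    seen = set()
    result = []
    for client_id, session_id, time, action, _tx in ordered:
        akey = (client_id, session_id, action)
        new_tx = None if akey in seen else session_max[(client_id, session_id)]
        seen.add(akey)
        result.append((session_id, time, new_tx))
    return result


# Hypothetical data: session 1 has no purchase, session 2 has one.
sample = [
    (1, 1, "15:01", "view", None),
    (1, 1, "15:02", "basket", None),
    (1, 2, "15:05", "basket", None),
    (1, 2, "15:06", "purchase", 2),
]

for row in fill_per_session(sample):
    print(row)
```

Because the maximum is taken strictly within one session, nothing from session 2 leaks into session 1. Showing transaction_id = 2 on session 1 rows would need a different partitioning, which is the changed condition the comments refer to.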