Split a delimited column into unique rows in Hive
Question:
I have a dataset. See the sample row below:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507; 1460777656:440515; 1460778054:440488; 1460778157:440481,440600;
Each column is separated by a space (3 columns in total). The column names are id (int), unid (string) and time_stamp (string).
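The third field itself contains spaces, so "separated by a space" only holds for the first two separators. Here is a minimal plain-Python sketch of parsing the sample line into the three columns. It is not Hive, and `RAW` and `parse_line` are hypothetical names used only for illustration:

```python
from typing import Tuple

# The sample row from the question, reassembled as one string.
RAW = ("94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 "
       "1460695483:440507; 1460777656:440515; 1460778054:440488; "
       "1460778157:440481,440600;")


def parse_line(line: str) -> Tuple[int, str, str]:
    """Split a raw line into (id, unid, time_stamp).

    maxsplit=2 stops after the first two spaces, so the spaces inside
    time_stamp stay part of the third field.
    """
    id_str, unid, time_stamp = line.split(" ", 2)
    return int(id_str), unid, time_stamp


if __name__ == "__main__":
    print(len(RAW.split(" ")))  # splitting on every space gives 6 fields, not 3
    row_id, unid, time_stamp = parse_line(RAW)
    print(row_id, unid)
    print(time_stamp)
```

Splitting on every space yields six fields instead of three, which is why the sketch allows at most two splits.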
I want to split the dataset so that each unique element goes into its own row, like this:
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481
- 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440600
Each bullet point should be a separate row. I used the query below, but it does not work:
SELECT id, unid, time_date FROM table LATERAL VIEW explode(split(time_date, '\;')) AS time_date time_date;
Output (this row is repeated 5 times):
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507; 1460777656:440515; 1460778054:440488; 1460778157:440481,440600;
Any help would be appreciated! Thanks in advance :)
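A rough plain-Python model of the symptom above. It is not Hive, and the helper `lateral_view_explode` and the sample data are hypothetical. The sketch assumes, consistent with the 5 repeats reported, that Hive's `split()` keeps the empty string after the trailing `;`, as Python's `str.split` does. A lateral view produces one output row per array element, so any output column that still shows the original value repeats it once per element:

```python
# Plain-Python model of LATERAL VIEW explode(split(...)); not Hive itself.
# Assumption: the trailing ';' produces a final empty element, as in str.split.
TIME_STAMP = ("1460695483:440507; 1460777656:440515; 1460778054:440488; "
              "1460778157:440481,440600;")


def lateral_view_explode(rows, column, sep):
    """Yield (original_row, exploded_value), one pair per element of the split."""
    for row in rows:
        for value in row[column].split(sep):
            yield row, value


rows = [{"id": 94654,
         "unid": "6802D326-9F9B-4FC8-B2DD-F878EADE31F2",
         "time_stamp": TIME_STAMP}]

pairs = list(lateral_view_explode(rows, "time_stamp", ";"))

# Reading the original column: every output row shows the unsplit string.
original_column = [row["time_stamp"] for row, _ in pairs]
# Reading the exploded value: each output row shows one element.
exploded_alias = [value.strip() for _, value in pairs]

if __name__ == "__main__":
    print(len(pairs))         # 4 entries plus the empty string after the last ';'
    print(exploded_alias)
```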
Answer
First, I had to replace the semicolons with pipes. So:
CREATE temporary TABLE tbl
(id int,
unid string,
time_stamp string);
INSERT INTO tbl
VALUES (
94654, '6802D326-9F9B-4FC8-B2DD-F878EADE31F2' , '1460695483:440507|1460777656:440515|1460778054:440488|1460778157:440481,440600');
SELECT
id,
unid,
time_stamp
FROM
(
SELECT
id,
unid,
split(time_stamp,'\\|') ts
FROM
tbl
) t
lateral VIEW explode(t.ts) bar AS time_stamp;
This gives us:
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481,440600
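You have to do the split and the explode in separate steps. So we do the split in a derived table, and the explode/lateral view in the outer query.
Thank you so much, Andrew! :) – zerxes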
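The two steps of the answer can be mirrored in plain Python to check which rows to expect. This is a sketch, not Hive, and `TBL`, `split_then_explode` and `explode_values` are hypothetical names. `split_then_explode` corresponds to the derived table plus the outer lateral view and reproduces the four rows above. The last of those rows still holds `1460778157:440481,440600`, while the question wanted it as two rows. `explode_values` is an assumed extra step, beyond the answer, that splits on `,` and re-attaches the timestamp prefix:

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, str]  # (id, unid, time_stamp)

# Same data as the answer's INSERT (pipe-delimited time_stamp).
TBL: List[Row] = [(94654, "6802D326-9F9B-4FC8-B2DD-F878EADE31F2",
                   "1460695483:440507|1460777656:440515|"
                   "1460778054:440488|1460778157:440481,440600")]


def split_then_explode(rows: Iterable[Row]) -> Iterator[Row]:
    """Step 1: split time_stamp into a list (the derived table).
    Step 2: emit one row per list element (the outer lateral view)."""
    derived = [(id_, unid, ts.split("|")) for id_, unid, ts in rows]
    for id_, unid, ts_list in derived:
        for element in ts_list:
            yield id_, unid, element


def explode_values(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical extra step: turn 'epoch:v1,v2' into 'epoch:v1' and 'epoch:v2'."""
    for id_, unid, element in rows:
        epoch, values = element.split(":", 1)
        for value in values.split(","):
            yield id_, unid, f"{epoch}:{value}"


if __name__ == "__main__":
    for row in explode_values(split_then_explode(TBL)):
        print(*row)
```

In HiveQL the corresponding extra step would presumably be a second `LATERAL VIEW explode(split(...))` on `,` over the part after the `:`, with the prefix re-attached using `concat`.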