A Real-Time Processing Architecture Based on Flink SQL

 

This post introduces a Flink-based real-time processing architecture: multiple data sources -> data governance -> Kafka -> job governance -> DB. Data governance reduces database reads to two-thirds of the original and converts the multiple data sources into a unified schema. On top of this unified-schema data stream, job governance simplifies what used to be a hand-coded JStorm business-logic project into a configuration of a few SQL statements.

(Figure: overall real-time processing architecture based on Flink SQL)

 

A sar Case Study

You may first refer to the Storm-to-Flink migration case study: System Activity Report (sar).

After further decomposition and abstraction, the Flink-sar-etl pipeline becomes:

(Figure: the decomposed Flink-sar-etl pipeline)

 

Job Governance

For the use of Flink SQL windows in job governance, see "Flink 1.3 Table and SQL Beta Java API Summary".
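The tumbling windows used below group events into fixed-size, non-overlapping buckets: `TUMBLE_START` is the event time rounded down to the window boundary, and the aggregates are computed per (key, window) bucket. A minimal Python sketch of that semantics (an illustration only, not Flink's implementation):

```python
from collections import defaultdict

def tumble_start(event_time_ms, size_ms):
    # TUMBLE_START: align the event time down to its window boundary
    return event_time_ms - event_time_ms % size_ms

def tumble_aggregate(rows, size_ms):
    # rows: (event_time_ms, key, value); emulates
    # GROUP BY key, TUMBLE(rowtime, INTERVAL 'size' SECOND)
    # with sum(...), max(...), count(...) per window
    windows = defaultdict(list)
    for t, key, value in rows:
        windows[(tumble_start(t, size_ms), key)].append(value)
    return {w: (sum(vs), max(vs), len(vs)) for w, vs in windows.items()}

rows = [(1000, "cpu", 2.0), (2500, "cpu", 4.0), (5000, "cpu", 1.0)]
# 4-second tumbling windows: [0, 4000) holds the first two rows,
# [4000, 8000) holds the third
print(tumble_aggregate(rows, 4000))
# {(0, 'cpu'): (6.0, 4.0, 2), (4000, 'cpu'): (1.0, 1.0, 1)}
```

The same bucketing applied again with an 8-second size is what lets the configuration below cascade a 4-second result table into an 8-second one.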

An example of the user's JSON configuration:


```json
"user.sqls": [
    {
       "sql": "select key,TUMBLE_START(rowtime, INTERVAL '4' SECOND),target,sum(sum_sar),max(max_sar),count(cnt_sar) from result1 GROUP BY key,target,TUMBLE(rowtime, INTERVAL '4' SECOND)",
       "attribute.name": "key,start_time,target,sum_sar,max_sar,cnt_sar",
       "attribute.type": "STRING,SQL_TIMESTAMP,STRING,DOUBLE,DOUBLE,LONG",
       "result.table.name": "result2",
       "result.eventtime.colnum": "2",
       "result.eventtime.coltype": "TIMESTAMP",
       "result.attribute.name": "key,start_time,target,sum_sar,max_sar,cnt_sar",
       "result.attribute.type": "STRING,SQL_TIMESTAMP,STRING,DOUBLE,DOUBLE,LONG"
    },
    {
       "sql": "select key,TUMBLE_START(rowtime, INTERVAL '8' SECOND),target,sum(sum_sar),max(max_sar),count(cnt_sar) from result2 GROUP BY key,target,TUMBLE(rowtime, INTERVAL '8' SECOND)",
       "attribute.name": "key,start_time,target,sum_sar,max_sar,cnt_sar",
       "result.table.name": "result3",
       "result.eventtime.colnum": "2",
       "result.eventtime.coltype": "TIMESTAMP",
       "result.attribute.name": "key,start_time,target,sum_sar,max_sar,cnt_sar",
       "result.attribute.type": "STRING,SQL_TIMESTAMP,STRING,DOUBLE,DOUBLE,LONG"
    }
]
```
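To make the configuration concrete, here is a minimal sketch of how a job-governance layer could parse `user.sqls` entries into per-job metadata before registering each result table for the next statement to query. The function name and the returned structure are assumptions for illustration, not the project's actual code:

```python
import json

def parse_user_sqls(config_text):
    # Each entry pairs a SQL statement with the schema of its result
    # table; entries are processed in order, so a table such as
    # result2 produced by one statement can be read by the next.
    entries = json.loads(config_text)["user.sqls"]
    jobs = []
    for e in entries:
        names = e["result.attribute.name"].split(",")
        types = e["result.attribute.type"].split(",")
        assert len(names) == len(types), "schema name/type count mismatch"
        jobs.append({
            "sql": e["sql"],
            "table": e["result.table.name"],
            "schema": list(zip(names, types)),
        })
    return jobs

# A trimmed-down configuration in the same shape as the example above
config_text = """{"user.sqls": [{
  "sql": "select key from result1",
  "result.table.name": "result2",
  "result.attribute.name": "key,start_time",
  "result.attribute.type": "STRING,SQL_TIMESTAMP"
}]}"""
jobs = parse_user_sqls(config_text)
print(jobs[0]["table"])  # result2
```

Validating that the attribute-name and attribute-type lists have the same length catches the most common configuration mistake before any Flink job is submitted.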