Flink Official Documentation Notes 16: Concepts Overview

Concepts

The Hands-on Training explains the basic concepts of stateful and timely stream processing that underlie Flink’s APIs, and provides examples of how these mechanisms are used in applications.

Stateful stream processing is introduced in the context of Data Pipelines & ETL and is developed further in the section on Fault Tolerance.

Timely stream processing is introduced in the section on Streaming Analytics.

This Concepts in Depth section provides a deeper understanding of how Flink’s architecture and runtime implement these concepts.

Flink’s APIs

Flink offers different levels of abstraction for developing streaming and batch applications.


  • 1 Stateful stream processing (lowest level)

The lowest-level abstraction simply offers stateful and timely stream processing. It is embedded into the DataStream API via the Process Function.

It allows users to freely process events from one or more streams, and provides consistent, fault-tolerant state.

In addition, users can register event-time and processing-time callbacks, allowing programs to realize sophisticated computations.
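The combination of keyed state and event-time callbacks can be sketched with a toy, pure-Python model. It is not the PyFlink API, and the class and method names (`ToyCountWithTimeout`, `process_element`, `advance_watermark`, `on_timer`) are hypothetical. It counts events per key and emits `(key, count)` once a key has seen no new event for `timeout` units of event time. In Flink itself, this pattern would typically be written as a `KeyedProcessFunction` that keeps the count in keyed state and registers timers through its `TimerService`.

```python
import heapq


class ToyCountWithTimeout:
    """Toy model of keyed state plus event-time timers (not the PyFlink API).

    Counts events per key and emits (key, count) once a key has received no
    new event for `timeout` units of event time.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.state = {}          # per-key state: key -> (count, last_event_time)
        self.timers = []         # min-heap of pending timers: (fire_time, key)
        self.registered = set()  # at most one timer per (fire_time, key)
        self.output = []         # records emitted downstream

    def process_element(self, key, event_time):
        # Update this key's state and register a timer `timeout` after the event.
        count, _ = self.state.get(key, (0, None))
        self.state[key] = (count + 1, event_time)
        timer = (event_time + self.timeout, key)
        if timer not in self.registered:
            self.registered.add(timer)
            heapq.heappush(self.timers, timer)

    def advance_watermark(self, watermark):
        # Event time progresses: fire every timer at or before the watermark.
        while self.timers and self.timers[0][0] <= watermark:
            timer = heapq.heappop(self.timers)
            self.registered.discard(timer)
            self.on_timer(*timer)

    def on_timer(self, fire_time, key):
        # Callback: emit only if no newer event re-armed a later timer for the key.
        count, last_event_time = self.state[key]
        if fire_time == last_event_time + self.timeout:
            self.output.append((key, count))


fn = ToyCountWithTimeout(timeout=10)
fn.process_element("a", 1)
fn.process_element("a", 5)
fn.process_element("b", 3)
fn.advance_watermark(12)  # the timer for "a" at 11 fires, but "a" was updated at 5
fn.advance_watermark(20)
print(fn.output)  # [('b', 1), ('a', 2)]
```

A timer that fires for a key that has since been updated is simply ignored, which is why the first watermark advance emits nothing.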

  • 2 Core APIs: DataStream / DataSet API

In practice, many applications do not need the low-level abstractions described above and can instead program against the Core APIs: the DataStream API (bounded/unbounded streams) and the DataSet API (bounded data sets).

These fluent APIs offer the common building blocks for data processing, such as various forms of user-specified transformations, joins, aggregations, windows, and state.

Data types processed in these APIs are represented as classes in the respective programming languages.
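As a rough illustration of these building blocks, the plain-Python sketch below mimics the shape of a filter, key-by, tumbling-window, and aggregate pipeline over events represented as a class. `SensorReading` and `windowed_sum` are hypothetical names, and the model ignores watermarks, late data, and parallelism, so treat it as an analogy rather than DataStream API code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical event type: in the Core APIs, records are plain classes."""
    sensor_id: str
    timestamp: int  # event time, e.g. in milliseconds
    value: float


def windowed_sum(readings, window_size):
    """Toy model of filter -> keyBy(sensor_id) -> tumbling window -> sum."""
    sums = defaultdict(float)
    for r in readings:
        if r.value < 0:  # transformation: drop invalid (negative) readings
            continue
        window_start = r.timestamp - r.timestamp % window_size  # tumbling window
        sums[(r.sensor_id, window_start)] += r.value            # keyed aggregation
    return dict(sums)


readings = [
    SensorReading("s1", 1, 2.0),
    SensorReading("s1", 4, 3.0),
    SensorReading("s1", 11, 5.0),
    SensorReading("s2", 2, 1.5),
    SensorReading("s2", 3, -1.0),
]
print(windowed_sum(readings, window_size=10))
# {('s1', 0): 5.0, ('s1', 10): 5.0, ('s2', 0): 1.5}
```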

The low-level Process Function integrates with the DataStream API, making it possible to drop down to the lower-level abstraction only where it is needed.

The DataSet API offers additional primitives on bounded data sets, such as loops/iterations.
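A loop over a bounded data set can be pictured with this toy bulk iteration: min-label propagation for connected components, repeated until no label changes. It is plain Python, not the DataSet API; the function name and the `max_iterations` cap are assumptions for illustration. In the DataSet API, such a loop would be expressed with its iteration operators rather than a Python `for` loop.

```python
def connected_components(vertices, edges, max_iterations=100):
    """Toy bulk iteration: min-label propagation until a fixed point."""
    labels = {v: v for v in vertices}  # every vertex starts in its own component
    for _ in range(max_iterations):
        new_labels = dict(labels)
        for a, b in edges:  # edges are treated as undirected
            new_labels[a] = min(new_labels[a], labels[b])
            new_labels[b] = min(new_labels[b], labels[a])
        if new_labels == labels:  # no label changed: the iteration has converged
            break
        labels = new_labels
    return labels


print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Each round reads only the previous round's labels, so every pass is a whole-data-set step, which is what distinguishes a bulk iteration from record-at-a-time processing.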

  • 3 Table API (declarative DSL)

The Table API is a declarative DSL centered around tables, which may be dynamically changing tables (when representing streams).

The Table API follows the (extended) relational model: tables have a schema attached (similar to tables in relational databases), and the API offers comparable operations, such as select, project, join, group-by, and aggregate.

Table API programs declaratively define what logical operation should be done, rather than specifying exactly what the code for the operation looks like.

Although the Table API is extensible with various types of user-defined functions, it is less expressive than the Core APIs, but more concise to use (less code to write).

In addition, Table API programs go through an optimizer that applies optimization rules before execution.
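The split between declaring what to compute and letting an optimizer decide how can be sketched with a toy fluent table in plain Python. `ToyTable` and its `where`, `select`, `optimize`, and `execute` methods are hypothetical and only loosely echo Table API operations: calls merely build a logical plan, and one rewrite rule reorders that plan before anything runs. Flink's actual optimizer is far more elaborate; this only shows the idea.

```python
class ToyTable:
    """Toy declarative table (not Flink's Table API): where() and select()
    only record a logical plan; execute() optimizes the plan, then runs it."""

    def __init__(self, rows, plan=()):
        self.rows = rows         # list of dicts sharing the same keys (the "schema")
        self.plan = tuple(plan)  # logical plan: ordered (operation, argument) steps

    def where(self, predicate):
        return ToyTable(self.rows, self.plan + (("filter", predicate),))

    def select(self, *columns):
        return ToyTable(self.rows, self.plan + (("project", columns),))

    def optimize(self):
        # One toy rewrite rule: push filters ahead of projections. A projection
        # here only drops columns, so any column a later filter reads already
        # exists in the input; filtering first shrinks the data earlier.
        filters = [step for step in self.plan if step[0] == "filter"]
        others = [step for step in self.plan if step[0] != "filter"]
        return filters + others

    def execute(self):
        result = list(self.rows)
        for op, arg in self.optimize():
            if op == "filter":
                result = [row for row in result if arg(row)]
            else:  # "project"
                result = [{col: row[col] for col in arg} for row in result]
        return result


orders = [
    {"customer": "alice", "amount": 30, "country": "DE"},
    {"customer": "bob", "amount": 5, "country": "FR"},
    {"customer": "carol", "amount": 12, "country": "DE"},
]
query = ToyTable(orders).select("customer", "amount").where(lambda r: r["amount"] > 10)
print([op for op, _ in query.optimize()])  # ['filter', 'project']
print(query.execute())
# [{'customer': 'alice', 'amount': 30}, {'customer': 'carol', 'amount': 12}]
```

The program states only the desired result; the order in which the steps actually run is the engine's decision.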

Tables can be converted seamlessly to and from DataStreams/DataSets, allowing programs to mix the Table API with the DataStream and DataSet APIs.

  • 4 SQL (highest level)

The highest-level abstraction offered by Flink is SQL.

This abstraction is similar to the Table API in both semantics and expressiveness, but represents programs as SQL query expressions.

The SQL abstraction interacts closely with the Table API, and SQL queries can be executed over tables defined in the Table API.
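To make "programs as SQL query expressions" concrete, the snippet below expresses a filter plus group-by aggregation as one SQL query. It uses Python's built-in `sqlite3` module purely as a stand-in relational engine, not Flink SQL, and the `orders` table with its columns is made up for illustration. In Flink, a comparable query would be submitted through a `TableEnvironment` against tables registered there.

```python
import sqlite3

# In-memory SQLite database as a stand-in relational engine (not Flink SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", 30, "DE"), ("bob", 5, "FR"), ("carol", 12, "DE")],
)

# The whole program is one query expression: it states the result, not the loop.
query = """
    SELECT country, SUM(amount) AS total
    FROM orders
    WHERE amount > 10
    GROUP BY country
    ORDER BY country
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('DE', 42)]
```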