使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

One shared characteristic among popular data visualization tools such as Power BI is the ease at which data can be extracted from a variety of disparate data sources, usually at a click of a button, as shown in Figure 1. Such convenience, though, tends to come at a cost as you often have little control over how background scripts used to extract data are generated. Yet, this should be of utmost concern for data architects and BI developers alike as rarely do you find auto-generated scripts that are efficient and optimal. In this article, join me as I put on my DBA hat and trace, monitor and review SQL batch statements that are auto-generated by the Get Data feature in Power BI using SQL Server Extended Events.

在流行的数据可视化工具(如Power BI)中,一个共同的特征是可以轻松地从各种不同的数据源中提取数据,通常只需单击一个按钮即可,如图1所示。由于您经常无法控制用于提取数据的后台脚本的生成方式,因此需要付出一定的代价。 然而,对于数据架构师和BI开发人员而言,这应该是最重要的问题,因为您很少会找到高效且最佳的自动生成脚本。 在本文中,请加入我的DBA帽子,并跟踪,监视和查看由SQL Server Extended Events中的Power BI中的“获取数据”功能自动生成SQL批处理语句。

使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

Figure 1: Import data from a SQL Server database in Power BI

图1:从Power BI中SQL Server数据库导入数据

使用导航器窗口将数据导入Power BI (Import Data into Power BI Using the Navigator Window)

The Get Data feature in Power BI allows for a connection into SQL Server to be established using different connectivity modes. However, for the purposes of this discussion, we will concentrate only on the Import mode of data connectivity in Power BI. The Import connectivity mode works by extracting a copy of the dataset from a data source and save it within a Power BI (.pbix) document. This mode of connectivity is suitable in environments wherein your connection to the data source is unstable or when working with a data dump. One obvious downside to storing a copy of your data within a Power BI document is that it could potentially increase the overall size of your Power BI document.

Power BI中的“获取数据”功能允许使用不同的连接模式建立与SQL Server的连接。 但是,出于讨论目的,我们将仅关注Power BI中的数据连接的导入模式。 导入连接模式通过从数据源提取数据集的副本并将其保存在Power BI(.pbix)文档中而起作用。 此连接模式适用于您与数据源的连接不稳定或使用数据转储的环境。 在Power BI文档中存储数据副本的明显缺点是,它可能会增加Power BI文档的整体大小。

Whenever Power BI connects to a data source such as SQL Server using an Import connectivity mode, it tends to leave an imprint of itself within that data source environment. Fortunately, by running either a SQL Server Profiler or SQL Server Extended Events, we can get to see just what kind of activities occur when Power BI extracts data out of a SQL Server environment. SQL Server Profiler is now a deprecated feature in SQL Server, hence, we will only make use of SQL Server Extended Events in our demo, specifically the SQL:BatchCompleted event as we are more interested in monitoring back-end queries generated by clicking on the Power BI interface.

每当Power BI使用导入连接模式连接到数据源(例如SQL Server)时,它往往会在该数据源环境中留下自己的烙印。 幸运的是,通过运行SQL Server Profiler或SQL Server扩展事件,我们可以了解到Power BI从SQL Server环境中提取数据时,发生了什么样的活动。 SQL Server Profiler现在是SQL Server中不推荐使用的功能,因此,我们将仅在演示中使用SQL Server扩展事件,特别是SQL:BatchCompleted事件,因为我们对监视通过单击生成的后端查询更感兴趣。 Power BI接口。

Assuming that you have already setup the SQL Server Extended Events session for the SQL:BatchCompleted event, we begin our data import in Power BI by configuring a connection to our SQL Server instance, as shown in Figure 2.

假设您已经为SQL:BatchCompleted事件设置了SQL Server扩展事件会话,我们将通过配置与SQL Server实例的连接来开始在Power BI中导入数据,如图2所示。

使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

Figure 2: Configuring SQL Server database connection in Power BI

图2:在Power BI中配置SQL Server数据库连接

  1. SQL Batch Statements #1: Get Database Names

    The first batch of T-SQL queries is generated as soon as you click the OK button shown at the bottom of Figure 2. This batch generates two SQL:BatchCompleted events as shown in Figure 3. The batch text from the first event runs a SELECT @@VERSION command whilst the second SQL Server extended events is used to retrieve a list of user-defined database names available to the SQL Server instance that you are connected to.

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    Figure 3: Displaying SQL Server extended events

    Power BI then takes this list of user-defined databases and displays it within the Navigator window, as shown in Figure 4

    Figure 4: Power BI Navigator window

    SQL批处理语句#1:获取数据库名称

    单击图2底部所示的OK按钮,将立即生成第一批T-SQL查询。该批处理将生成两个SQL:BatchCompleted事件,如图3所示。第一个事件的批处理文本将运行SELECT。在第二个SQL Server扩展事件中使用@@ VERSION命令来检索用户定义的数据库名称列表,该列表可用于您连接到SQL Server实例。

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    图3:显示SQL Server扩展事件

    然后,Power BI将获取此用户定义的数据库列表,并将其显示在Navigator窗口中,如图4所示。

    图4:Power BI Navigator窗口

  2. SQL Batch Statements #2: Get Tables and Views

    As soon as you expand any of the database nodes shown in Figure 4, Power BI will submit another batch of SQL statements to the SQL Server engine. This time, to retrieve a list of database tables and views associated with a given user.

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    Figure 5: Displaying SQL Server extended events

    Once this latest batch of statements is done running, the Navigator wizard in Power BI will be refreshed to show an extended sub-list of tables and views that the logged in user has access to. Clicking the checkbox next to a table/view in the sub-list will not generate a call to the SQL Server engine but you will get a preview of the table structure as well as a data sample, as shown in Figure 6

    Figure 6: Preview of data sample

    SQL批处理语句#2:获取表和视图

    展开图4中所示的任何数据库节点后,Power BI都会向SQL Server引擎提交另一批SQL语句。 这次,检索与给定用户关联的数据库表和视图的列表。

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    图5:显示SQL Server扩展事件

    一旦这最后一批语句运行完毕 ,Power BI中的Navigator向导将刷新,以显示登录用户有权访问的表和视图的扩展子列表。 单击子列表中表/视图旁边的复选框将不会生成对SQL Server引擎的调用,但是您将获得表结构的预览以及数据示例,如图6所示。

    图6:数据样本预览

  3. SQL Batch Statements #3: Get Tables and Views

    Finally, when you click the Load button in Figure 6, Power BI will submit several T-SQL statements to the SQL Server engine. The majority of T-SQL statements in this batch will perform the same exercise as we have seen in prior batches with the exception of statements querying sys.foreign_key_columns and sys.index_columns system objects which is Power BI’s way of checking for any entity-relationships linked to the table/view we are attempting to load. You can always skip this check by unticking the Include relationship columns option shown in Figure 1. Finally, the last statement in the batch performs the actual data import from our data source into Power BI using the sp_executesql stored procedure.

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    Figure 7: Executing sp_executesql Stored procedure

    So just to wrap up, using the default settings to load a single table into Power BI from SQL Server via the Import connectivity mode, the following batches are generated:

    • Batch 1 – with 2 T-SQL statements
    • Batch 2 – with 3 T-SQL statements
    • Batch 3 – with 7 T-SQL statements

    Although we have demonstrated that relying on Power BI to generates back-end queries to import data creates a total of 12 T-SQL statements for a single table, the end-to-end extraction of data from SQL Server to Power BI is still relatively quick. However, there are still some caveats, for instance, the auto-generated data extraction script uses the EXECUTE sp_executesql command which has been widely advocated for preventing SQL injection attempts yet several authors has also demonstrated in the past that using it can lead to a performance overhead of your SQL Server instance.

    There is also the fact that, just like in SSMS, the Power BI Navigator will retrieve all user-defined databases within a connected SQL Server instance regardless of whether you have access to them or not. This is more annoying than a real issue but as a DBA you might have to attend to support tickets wherein users are trying to access databases that they are actually prevented from accessing. What makes it worse is that the error message displayed in Power BI does not out rightly inform you that a given database is inaccessible – as is the case in SSMS (see Figure 8), instead it returns “We couldn’t authenticate with the credentials provided. Please try again.” error message shown in Figure 9 making it seem like it’s a credentials issue. Which could lead to some users locking out their password in an attempt to gain access into a database that they shouldn’t be accessing in the first place

    Figure 8: Error in SSMS

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    Figure 9: Error in Power BI Desktop

    SQL批处理语句#3:获取表和视图

    最后,当您单击图6中的Load按钮时,Power BI将向SQL Server引擎提交多个T-SQL语句。 除了查询sys.foreign_key_columns和sys.index_columns系统对象的语句(这是Power BI检查链接的任何实体关系的方式)之外,该批处理中的大多数T-SQL语句将执行与先前批处理中相同的操作。到我们试图加载的表/视图。 您始终可以通过取消选中图1中所示的“包括关系列”选项来跳过此检查。最后,该批处理中的最后一条语句使用sp_executesql存储过程执行了从我们的数据源到Power BI的实际数据导入。

    图7:执行sp_executesql存储过程

    因此,总结一下,使用默认设置通过导入连接模式将单个表从SQL Server加载到Power BI中,将生成以下批处理:

    • 批次1 –带有2条T-SQL语句
    • 批次2 –具有3条T-SQL语句
    • 批次3 –具有7条T-SQL语句

    尽管我们已经证明依靠Power BI生成后端查询以导入数据可以为单个表创建总共12条T-SQL语句,但是从SQL Server到Power BI的端到端数据提取仍然相对快。 但是,仍然存在一些警告,例如,自动生成的数据提取脚本使用EXECUTE sp_executesql命令,该命令被广泛提倡防止SQL注入尝试,但过去也有几位作者证明了使用它可以提高性能。 SQL Server实例的开销

    还有一个事实是,就像在SSMS中一样,Power BI Navigator将在连接SQL Server实例中检索所有用户定义的数据库,而不管您是否有权访问它们。 这比实际问题更令人讨厌,但作为DBA,您可能必须参加支持票证,在这些票证中,用户试图访问实际上被阻止访问的数据库。 更糟糕的是,Power BI中显示的错误消息没有正确地通知您给定的数据库不可访问-就像SSMS中那样(请参见图8),而是返回“我们无法使用凭据进行身份验证”提供。 请再试一遍。” 错误消息,如图9所示,似乎是一个凭据问题。 这可能导致某些用户锁定密码,从而试图访问他们本来不应该访问的数据库

    使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

    图8:SSMS中的错误

    图9:Power BI Desktop中的错误

使用自定义SQL语句将数据导入Power BI (Import Data into Power BI by Using Custom SQL Statement)

Another approach to using the Get Data feature in Power BI involves specifying a user-defined SQL statement instead of having Power BI generates it for you. This way, you have more control of the filters you want to apply and the fields you want to retrieve. To demonstrate this approach, we begin by configuring a SQL Server connection, as shown in Figure 10

在Power BI中使用“获取数据”功能的另一种方法涉及指定用户定义SQL语句,而不是让Power BI为您生成它。 这样,您可以更好地控制要应用的过滤器和要检索的字段。 为了演示这种方法,我们首先配置一个SQL Server连接,如图10所示。

使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

Figure 10: Configuring SQL Server Connection

图10:配置SQL Server连接

This time clicking OK at the bottom of Figure 10 does not result in Power BI submitting SQL batches into SQL Server, instead, you are redirected to the data preview window shown in Figure 11.

这次单击图10底部的OK不会导致Power BI将SQL批处理提交到SQL Server中,而是将您重定向到图11所示的数据预览窗口。

使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

Figure 11: Data preview window in Power BI

图11:Power BI中的“数据预览”窗口

Only after you have clicked the Load button in Figure 11, does Power BI generate SQL batch statements. The auto-generated batch contains three SQL:BatchCompleted events as shown in Figure 12. The first event runs a SELECT @@VERSION command whilst the second and the third are a repetition of the custom SQL statement we provided in Figure 10.

仅在单击图11中的Load按钮之后,Power BI才会生成SQL批处理语句。 自动生成的批处理包含三个SQL:BatchCompleted事件,如图12所示。第一个事件运行SELECT @@ VERSION命令,而第二个和第三个事件是我们在图10中提供的自定义SQL语句的重复。

使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

Figure 12: SQL:BatchCompleted SQL Server Extended events

图12:SQL:BatchCompleted SQL Server扩展事件

It’s that simple! You no longer have to worry about Power BI querying system objects to get database and table/view metadata or users trying to access databases they shouldn’t be accessing. However, there is a catch in that unlike in the Navigator window approach wherein your Power BI model entities were created using the names of source tables/views, entities created out of importing data using a user-defined SQL command are prefixed with Query which means you have to edit your model and rename the object accordingly, as shown in igure 13.

就这么简单! 您不再需要担心Power BI查询系统对象来获取数据库和表/视图元数据,也不必担心用户尝试访问不应访问的数据库。 但是,有一个问题是,与Navigator窗口方法不同,在Navigator窗口方法中,Power BI模型实体是使用源表/视图的名称创建的,而使用用户定义SQL命令导入数据而创建的实体则以Query为前缀。您必须编辑模型并相应地重命名对象,如图13所示。

使用SQL Server扩展事件监视Power BI中“获取数据”功能的批处理语句

Figure 13: Data imported using Custom SQL Statement

图13:使用自定义SQL语句导入的数据

Another downside to using the custom SQL statement option is that you can only import one dataset at a time, whereas in the Navigator approach, you could select as many tables/views as you like. The rest of the comparisons are between the two approaches is given in Table 1.

使用自定义SQL语句选项的另一个缺点是一次只能导入一个数据集,而在Navigator方法中,可以选择任意多个表/视图。 表1给出了其余两种方法之间的比较。

Get Data Using Navigator Window

Get Data by Using Custom SQL Statement

By defaults, a total of 12 T-SQL statements are generated

By defaults, a total of 3 T-SQL statements are generated

The database name is optional

The database name is mandatory

Can import multiple tables at a time

Can only import one table per T-SQL statement

Less control of the generated T-SQL statement

More Control of generated T-SQL statement

T-SQL is written for you by Power BI

T-SQL is user-defined

The entity name is derived from the source table name

The entity name is a generic name prefixed with Query

Relationships can be detected by Power BI

Joins and relationships are defined by the user

使用导航器窗口获取数据

使用自定义SQL语句获取数据

默认情况下,总共生成12条T-SQL语句

默认情况下,总共生成3条T-SQL语句

数据库名称是可选的

数据库名称是必需的

可以一次导入多个表

每个T-SQL语句只能导入一个表

对生成的T-SQL语句的控制较少

对生成的T-SQL语句的更多控制

T-SQL是由Power BI为您编写的

T-SQL是用户定义的

实体名称是从源表名称派生的

实体名称是前缀为Query的通用名称

Power BI可以检测到关系

联接和关系由用户定义

Table 1

表格1

摘要 (Summary)

Power BI is an exciting and powerful data visualization tool but like all such tools, it has its pros and cons, particularly when importing data. We hope this article provides a comprehensible approach on monitoring batch statements with Power BI using SQL Server extended events. Also, you get an idea of whether to write your own T-SQL queries to extract data or rely on Power BI to auto-generate scripts that will facilitate data import from SQL Server.

Power BI是令人兴奋且功能强大的数据可视化工具,但与所有此类工具一样,它也有其优缺点,尤其是在导入数据时。 我们希望本文提供一种可理解的方法,用于使用SQL Server扩展事件通过Power BI监视批处理语句。 另外,您将了解是编写自己的T-SQL查询以提取数据还是依靠Power BI自动生成脚本,以方便从SQL Server导入数据。

翻译自: https://www.sqlshack.com/monitor-batch-statements-of-the-get-data-feature-in-power-bi-using-sql-server-extended-events/