Power BI Desktop is a reporting and analytics tool for presenting data in a variety of visual formats. These visualizations help us quickly understand information and share it with stakeholders.

We can connect Power BI Desktop to many kinds of data sources, including relational and non-relational databases, cloud services, SharePoint, and Microsoft Exchange. In the real world, we also keep our documents in a variety of file formats.

In Power BI, we can get data from the following file-based sources:

  • Excel

  • Text\CSV

  • XML

  • JSON

  • SharePoint Folder


PDF is a popular export format, but a PDF document is not easy to modify, and a single file can mix text, images, tables, and charts.


By default, Power BI Desktop cannot get data from PDF documents; PDF is not offered as a data source. Suppose, however, that we want to extract a table from a PDF file and build visualizations on it. Without a dedicated connector, we could import the data in one of the following ways:

  • Convert the PDF file to a Microsoft Word or HTML document and copy the table into a suitable data source such as Microsoft Excel. This is a lengthy process, and it quickly becomes cumbersome if we have to repeat it

  • Use a programming language such as R to extract the required data from the PDF file. This requires solid programming knowledge of a language such as R. If the layout of the PDF file changes, or if we need to import a table from another PDF file, the code has to change as well. This again creates organizational friction, because report authors have to contact developers and ask for code changes

Both approaches add overhead when preparing Power BI reports from PDF data. In this article, we show how to import data from a PDF file easily and without writing any code.

Power BI PDF Connector

Power BI Desktop includes several preview features, which are not enabled by default. One of them, the PDF connector, lets us use a PDF file as a data source.

Launch Power BI Desktop and open the File menu. Go to Options and settings and click Options.

This opens the Power BI configuration options, where we can configure settings such as Data Load, Power Query Editor, R scripting, Security, and Report settings. To use the PDF connector as a data source in Power BI Desktop, click Preview features.

This page lists the preview features available in Power BI Desktop.

Note: This list of preview features changes regularly. Some preview features eventually become generally available, and new releases of Power BI Desktop add new ones.

In the preview feature list, locate Get data from PDF files. We can also read its documentation by clicking Learn more, which opens a Microsoft Docs web page; you need an active internet connection to read it.

Select the checkbox next to Get data from PDF files and click OK. The preview feature takes effect only after a restart, so close Power BI Desktop and reopen it.

After relaunching Power BI Desktop, go to Get Data > More from the menu bar. As the following screenshot shows, a new option, PDF (Beta), now appears under File, and we can use it as a data source.

Select the PDF (Beta) connector and click Connect. A warning message tells us that this connector is still under development. Because it is a preview feature, it might contain bugs that are expected to be fixed before it becomes generally available.

Click Continue. To stop seeing this message, select Don’t warn me again for this connector.

In this article, we import a sample PDF file that contains a monthly sales analysis. The file contains text and images along with a data table. We want to import that table from the following PDF file so that we can build Power BI reports on the data.

Click Connect and provide the path of this PDF file in a local directory.

Power BI connects to the PDF file and opens the Navigator, which lists the tables detected in the PDF along with all of its pages. Selecting a page shows its complete content in the Navigator preview window.

Since we need data from this table, select the checkbox next to it; the table then appears in the preview pane on the right-hand side.


The preview shows the table from our PDF file. It needs a few changes before its contents appear in a proper format, and we can make those changes to the imported data before loading it. Click Edit.

This opens the Power Query Editor, where we can reshape the table: add or remove columns and rows, split columns, and so on.

In the Power Query Editor, you can see that the column names are in the fourth row. We do not want the three rows above it in our table, so we need to remove them.

Open the Remove Rows drop-down list and click Remove Top Rows. A dialog box asks for the number of rows to remove from the top. We want to remove three rows, so enter 3 and click OK.
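
To make the effect of Remove Top Rows concrete, here is a minimal Python sketch that treats the table as a list of rows and drops the first n of them. It is only an illustration: Power Query records this step in its own M language (in our understanding as a call to Table.Skip), and the helper name remove_top_rows and the sample rows below are hypothetical placeholders, not the contents of the article's PDF.

```python
from typing import Any, List

Row = List[Any]


def remove_top_rows(rows: List[Row], count: int) -> List[Row]:
    """Return the table without its first `count` rows (hypothetical helper)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Slicing builds a new list, so the original rows are left untouched.
    return rows[count:]


# Hypothetical rows as they might arrive from a PDF table: three title or
# blank rows, then the row holding the column names, then the data.
raw = [
    ["Monthly Sales Analysis", None, None],
    [None, None, None],
    ["Prepared for management", None, None],
    ["Month", "Location", "Sales"],
    ["January", "North", "100"],
]

trimmed = remove_top_rows(raw, 3)
print(trimmed[0])  # ['Month', 'Location', 'Sales']
```

Slicing rather than deleting in place mirrors how each Power Query step produces a new table from the previous one.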

Power Query removes the specified number of rows from the top (three, in our case), as shown in the image below.


The top row of the table now contains the column names, but the table itself has no column headers defined yet. To use the top row as column names, click Use First Row as Headers, as shown in the image above. Power Query promotes the top row to the table header, and the column names now appear in our data table.
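
The Use First Row as Headers step can be pictured with the following Python sketch, which turns the first row into column names and every later row into a record keyed by those names. This is an analogy, not Power BI's implementation (Power Query itself uses an M function, in our understanding Table.PromoteHeaders); the helper promote_headers, the ColumnN fallback for empty header cells, and the sample data are hypothetical.

```python
from typing import Any, Dict, List

Row = List[Any]


def promote_headers(rows: List[Row]) -> List[Dict[str, Any]]:
    """Use the first row as column names and turn each later row into a record."""
    if not rows:
        return []
    # Header cells become text; an empty header cell gets a positional
    # placeholder name (our own choice for this sketch).
    headers = [
        str(h) if h is not None else f"Column{i + 1}"
        for i, h in enumerate(rows[0])
    ]
    # zip pairs each header with the value in the same position.
    return [dict(zip(headers, row)) for row in rows[1:]]


# Hypothetical table after the top rows have been removed.
table = [
    ["Month", "Location", "Sales"],
    ["January", "North", "100"],
    ["February", None, "80"],
]

records = promote_headers(table)
print(records[0])  # {'Month': 'January', 'Location': 'North', 'Sales': '100'}
```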

We can filter the data further. We do not want any null values in the location column, so open the filter drop-down on the location column, clear the checkbox for null, and click OK.
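
Clearing the null checkbox in the column filter amounts to keeping only the rows whose value in that column is present. The sketch below expresses that rule in Python, using None as a stand-in for null. Power Query records the filter as its own M step (in our understanding a Table.SelectRows call); the helper remove_nulls and the sample records are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def remove_nulls(records: List[Record], column: str) -> List[Record]:
    """Keep only records whose value in `column` is not None (Python's null)."""
    return [r for r in records if r.get(column) is not None]


# Hypothetical records; the second one has no location.
records = [
    {"Month": "January", "Location": "North", "Sales": "100"},
    {"Month": "January", "Location": None, "Sales": "250"},
    {"Month": "February", "Location": "South", "Sales": "80"},
]

clean = remove_nulls(records, "Location")
print(len(clean))  # 2
```

Using `r.get(column)` also treats a record that lacks the column entirely as null, which is a design choice of this sketch.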

This removes the rows with null values, and the data now appears in a clean tabular format. You can compare it with the PDF file to validate the import.

Click Close & Apply on the menu bar to save the changes. Power BI applies the transformations, and the table's fields appear in the FIELDS pane. In our example, the data types were identified correctly, so we can use this data for reporting right away.
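
Data extracted from a PDF arrives as text, so some form of type detection has to decide which columns are numbers. The Python sketch below shows one simple way to guess a column type: try whole numbers, then decimals, and fall back to text. This is a deliberately simplified stand-in and not Power BI's actual detection logic; the function detect_type and its type labels are hypothetical.

```python
from typing import Callable, List, Optional


def detect_type(values: List[Optional[str]]) -> str:
    """Guess a column type from its text values: whole number, decimal, or text."""
    # Ignore missing and blank cells when guessing.
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "text"

    def all_parse(cast: Callable[[str], object]) -> bool:
        # True only if every non-empty value converts without a ValueError.
        try:
            for v in present:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "whole number"
    if all_parse(float):
        return "decimal number"
    return "text"


print(detect_type(["100", "80", None]))  # whole number
print(detect_type(["10.5", "3"]))        # decimal number
print(detect_type(["North", "South"]))   # text
```

Checking the stricter type first matters: every whole number would also parse as a decimal, so testing `float` first would never report a whole-number column.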

We can now choose the required fields from this dataset and create a Power BI visualization.

We can also view the data from the Data tab on the left-hand side.


Conclusion

In this article, we explored a quick and easy way to get data from a PDF file by using the PDF connector preview feature in Power BI Desktop. It is a welcome enhancement to Power BI Desktop.

