Enhanced PolyBase SQL 2019 - Installation and basic overview

Microsoft recently launched the SQL Server 2019 preview at the Ignite 2018 event. For an overview of the SQL Server 2019 preview version and instructions for installing it in a Windows environment, see the article SQL Server 2019 overview and installation.

We will explore the enhanced PolyBase feature of SQL Server 2019 in a series of articles. This first part covers the following topics:

  • Overview of ETL and PolyBase
  • Install PolyBase into SQL Server 2019
  • Overview and installation of Azure Data Studio
  • SQL Server 2019 preview extension in Azure Data Studio

Overview of ETL and PolyBase

In today’s industry, data lives in various databases such as Oracle, MongoDB, Teradata, and PostgreSQL. Applications often need to access data from these sources and combine it into a single source, which is a challenging task for database developers and data scientists. We normally use ETL (Extract-Transform-Load) to move data between the different sources.

The ETL process involves the following steps:

  • Extract: read data from the data source of your choice and extract the specific data required
  • Transform: process this data according to the business logic and rules, and convert it into the required format
  • Load: write the data to the destination database
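
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text, the `sales` table, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the destination database.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read a file or another database.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,n/a
3,carol,7
"""


def extract(text):
    """Extract: read the rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, drop rows with a non-numeric amount."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # rule: skip rows whose amount cannot be parsed
        cleaned.append((int(row["id"]), row["name"].strip().title(), amount))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the three steps in order and return what landed in the destination."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` copies the cleaned rows into the destination, which is exactly the data movement that the ETL approach relies on.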


ETL is valuable because it lets us apply business logic to the data, transform data from various sources, and move it into a single destination or into multiple formats. However, the ETL process has some challenges:

  • Moving data from the source requires extra resources in terms of disk space
  • Data security is another concern: the copy of the data must be secured against unauthorized access
  • An ETL process is slow to run and takes effort to maintain because of its complex logic

SQL Server 2016 introduced a new feature, ‘PolyBase’, that allows querying relational and non-relational data sources. This data virtualization integrates data from multiple sources without moving the data. In effect, it creates a virtual data layer, sometimes called a data lake or data hub. We can access all the data from a single place, which also lets us control security from a single point. In SQL Server 2016, PolyBase can query Hadoop and Azure Blob Storage.

In the article SQL Server 2016 – PolyBase tutorial, we explored how to query a CSV file stored in Azure Blob Storage from SQL Server 2016 using PolyBase.

SQL Server 2019 enhances PolyBase to access data from various data sources such as Oracle, Teradata, MongoDB, and PostgreSQL. We can also access data from any data source that has an ODBC driver. We can create external tables that link to these data sources (SQL Server, Oracle, Teradata, MongoDB, or any data source with an ODBC driver). Users can query these external tables just like regular relational database tables. Because the external tables are linked to the data sources, when we execute a query the data is retrieved from the external source and returned to the user.
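
PolyBase itself needs a SQL Server instance, so the following is only a loose analogy of querying data in place rather than copying it. It uses SQLite's ATTACH DATABASE from Python: a second database file plays the role of the external source, and the local connection joins against it without importing its rows. The file name, tables, and sample rows are hypothetical, and SQLite's ATTACH is not how PolyBase works internally; in SQL Server the link is defined with CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL TABLE.

```python
import os
import sqlite3
import tempfile


def build_remote_db(path):
    """Create a stand-in 'external' database that owns its own data."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
    )
    conn.commit()
    conn.close()


def query_in_place(remote_path):
    """Join local rows with rows that stay in the remote file."""
    local = sqlite3.connect(":memory:")
    local.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    local.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 75.5)]
    )
    local.commit()  # finish the open transaction before attaching
    # ATTACH exposes the remote tables under a schema name; no rows are copied.
    local.execute("ATTACH DATABASE ? AS remote", (remote_path,))
    rows = local.execute(
        "SELECT c.name, SUM(o.total) FROM orders AS o "
        "JOIN remote.customers AS c ON c.id = o.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    local.close()
    return rows


def demo():
    """Build the remote file in a temporary directory and query it in place."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "remote.db")
        build_remote_db(path)
        return query_in_place(path)
```

The point of the sketch is the shape of the query: the `customers` rows are read from where they live at query time, which is the idea behind an external table.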

[Image: PolyBase in SQL Server 2019]

Install PolyBase into SQL Server 2019

Let us first install PolyBase into SQL Server 2019. In an earlier article, SQL Server 2019 installation on Windows, we installed the SQL Server 2019 preview version, so I will not cover the complete installation here.

On the feature selection page, select the ‘PolyBase Query Service for External Data’ checkbox.

PolyBase requires Oracle JRE 7 update 51 or higher. If it is not installed, you will get an error message while the installer checks the installation rules.
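
Before running setup, the installed JRE can be compared against this minimum. The sketch below is a hypothetical helper, not part of the SQL Server installer: it parses the banner printed by `java -version` and assumes the legacy `1.<major>.0_<update>` version format used by Java 7 and 8. Any other format is treated as unrecognized.

```python
import re

MINIMUM = (7, 51)  # JRE 7 update 51, the minimum stated above


def parse_legacy_java_version(banner):
    """Return (major, update) from a quoted '1.<major>.0_<update>' string, or None."""
    match = re.search(r'"1\.(\d+)\.\d+(?:_(\d+))?', banner)
    if match is None:
        return None
    # A missing update suffix (for example "1.8.0") counts as update 0.
    return int(match.group(1)), int(match.group(2) or 0)


def meets_polybase_jre_minimum(banner):
    """True when the banner reports JRE 7 update 51 or a later legacy version."""
    version = parse_legacy_java_version(banner)
    return version is not None and version >= MINIMUM
```

Note that `java -version` writes its banner to standard error, so a caller that runs the command would need to capture that stream and pass its text to `meets_polybase_jre_minimum`.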


To fix this error, go to ‘Java SE Runtime Environment 8 Downloads’ and download Java SE Runtime Environment 8u191. Double-click the setup file to install it.

On the next page, we configure PolyBase. If we are installing PolyBase on a standalone instance, select the option ‘Use this SQL Server as standalone PolyBase enabled instance’.

We can also set up PolyBase in a scale-out configuration, in which we define a head node and compute nodes. This improves performance for large data sets. For more information about this option, see PolyBase scale-out groups.

In this article, we will use PolyBase on a standalone SQL Server instance. Therefore, select the first option, ‘Use this SQL Server as standalone PolyBase enabled instance’, and click Next.

On the next page, we specify the service accounts for the following two PolyBase services. The service account should be the same for both services.

  • SQL Server PolyBase Engine
  • SQL Server PolyBase Data Movement

Review the configuration and click Install.

A confirmation page appears once the ‘PolyBase Query Service for External Data’ feature has been installed successfully.

Check the PolyBase services in SQL Server Configuration Manager. They should be in the running state.

Overview and Installation of Azure Data Studio

In the previous article, Azure Data Studio, we learned that Azure Data Studio is a new GUI-based tool that works on Windows, macOS, and Linux. It connects to SQL Server, Azure SQL Database, and Azure SQL Data Warehouse.

Azure Data Studio is the new name for SQL Operations Studio. The October release of Azure Data Studio supports the new SQL Server 2019 features, such as big data clusters, enhanced PolyBase, notebooks, and Azure Resource Explorer.

We can install Azure Data Studio on Windows, Linux, and macOS. In this article, we will install it on Windows.

Follow these steps:

  • Download the latest (October) release of Azure Data Studio using the download link

Once the setup file has downloaded, double-click it to launch the setup wizard.

Accept the license agreement and click Next.

Specify the destination directory. The default location is ‘C:\Program Files\Azure Data Studio’. The drive must have at least 365.2 MB of free disk space.

Setup creates a Start menu folder, and we can choose which folder to use. If we do not want a Start menu folder, select the ‘Don’t create a Start Menu folder’ checkbox.

We can also choose to create a desktop icon. This page also adds Azure Data Studio to the PATH environment variable.

We can also register Azure Data Studio as the editor for the supported file types. To do so, select the corresponding checkbox on the same page.

The configuration is now complete. Click Install to finish installing Azure Data Studio.

A completion screen appears once the Azure Data Studio setup has finished. We can launch Azure Data Studio directly from this screen or from the Start menu.

When Azure Data Studio opens for the first time, it asks whether we want to enable preview features, because SQL Server 2019 is still in preview. Click Yes to enable the preview features.

Enter the connection details, such as the instance name, authentication type, and server group (we can select an existing server group or create a new one).

The recent release of Azure Data Studio also allows us to specify a friendly name for the connection.

We are now connected to the SQL Server 2019 preview instance, which appears in Azure Data Studio under its friendly name.

To use all the features of the SQL Server 2019 preview version, we need to install the ‘SQL Server 2019 (Preview)’ extension from the Marketplace.

Click the ‘SQL Server 2019 (Preview)’ extension in the Marketplace to see an overview of the preview extension. You can read through it to learn more about the extension.

Clicking Install opens a web page where we can download the .vsix file for the SQL Server 2019 (Preview) extension.

Now go to File -> “Install Extensions from VSIX Package” and provide the path of the downloaded .vsix file.

Click Yes to install the extension. Installing the SQL Server 2019 preview extension takes some time.

A message confirms that the extension was installed successfully. Click Reload Now to install its dependencies and activate the extension.

Conclusion

In this article, we looked at the SQL Server 2019 PolyBase enhancements, the Azure Data Studio installation, and the extension that supports the SQL Server 2019 preview features. In the next article, we will create sample database objects in Oracle and create PolyBase external tables in SQL Server 2019 to access those objects.

Table of contents

Enhanced PolyBase SQL 2019 – Installation and basic overview
Enhanced PolyBase SQL 2019 – External tables for Oracle DB
Enhanced PolyBase SQL 2019 – External tables using t-SQL
Enhanced PolyBase SQL 2019 – External tables SQL Server, Catalog view and PushDown
Enhanced PolyBase SQL 2019 – MongoDB and external table

Translated from: https://www.sqlshack.com/enhanced-polybase-sql-2019-installation-and-basic-overview/