Implementing Artificial Neural Networks (ANN) in SQL Server

In this article, we discuss the Microsoft Neural Network algorithm in SQL Server. This is the seventh article in our SQL Server data mining techniques series. The earlier articles covered Naïve Bayes, Decision Trees, Time Series, Association Rules, Clustering, and Linear Regression.

What is an Artificial Neural Network?

An Artificial Neural Network (ANN) can be used both as a classification technique and as a forecasting technique. The Microsoft Neural Network in SQL Server is typically more sophisticated than Decision Trees and Naïve Bayes. It tries to simulate how the human brain works. The network has three layers, Input, Hidden, and Output, as shown in the screenshot below.

The input layer is mapped to the input attributes. In the AdventureWorks example, attributes such as Age, Gender, and Number of Children are the inputs to the input layer.

The hidden layer is an intermediate layer in which every node receives each input multiplied by a weight.

The output layer is mapped to the predicted attributes. In our AdventureWorks example, Bike Buyer is mapped to the output layer.

A neuron is the basic unit of the network: it combines multiple inputs into a single output. Inputs can be combined in different ways, and the Microsoft Neural Network uses a weighted sum. Other implementations use techniques such as maximum, average, logical AND, or logical OR.

After the inputs are combined, an activation function is applied. In theory, a small input can sometimes produce a large output, while a large input may be insignificant to the output. Therefore, non-linear functions are typically used for activation. The Microsoft Neural Network uses tanh as the activation function for the hidden layer and the sigmoid function for the output layer.
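
The weighted sum and the two activation functions described above can be illustrated with a minimal Python sketch. It is not Microsoft's implementation: the function names, the bias term, the input values, and the weights are all hypothetical and chosen only to show the flow from inputs through a tanh hidden layer to a sigmoid output.

```python
import math

def weighted_sum(inputs, weights, bias=0.0):
    """Combine inputs as a weighted sum (the bias term is an assumption)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: tanh in the hidden layer, sigmoid at the output."""
    hidden = [math.tanh(weighted_sum(inputs, w)) for w in hidden_weights]
    return sigmoid(weighted_sum(hidden, output_weights))

# Hypothetical, already-scaled inputs (for example age, gender, children).
inputs = [0.5, 1.0, -0.3]
hidden_weights = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.6]]  # two hidden nodes
output_weights = [0.5, -0.8]
score = forward(inputs, hidden_weights, output_weights)
```

Because the output passes through a sigmoid, `score` always lies strictly between 0 and 1, which is why it can be read as a likelihood for a class such as Bike Buyer.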

Backpropagation

Backpropagation is the core part of the Artificial Neural Network. Unlike other techniques, this technique has a learning capability, which is achieved via backpropagation. In backpropagation, the error is calculated and the weights are modified to reduce it.

Let us see how the Microsoft Neural Network in SQL Server works.

1. At the initial stage, random values between -1 and 1 are assigned as weights.
2. For the training set, the algorithm calculates the output and the output error.
3. The backpropagation process calculates the error for each output and hidden neuron in the network.
4. The weights are modified.
5. Steps 2 to 4 are repeated until the stopping condition is satisfied with minimum error.
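
The steps above can be sketched in plain Python. This is only an illustration of the general backpropagation loop under stated assumptions, not the code SQL Server runs: the `train` function, the learning rate, the squared-error measure, the bias inputs, the stopping tolerance, and the toy OR-gate data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(cases, n_hidden=3, rate=0.5, max_epochs=2000, tolerance=0.01, seed=42):
    """Train a one-hidden-layer network; returns the total error per epoch."""
    rng = random.Random(seed)
    n_in = len(cases[0][0])
    # Step 1: random initial weights between -1 and 1 (last entry is a bias).
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in cases:
            # Step 2: forward pass and output error.
            x = list(inputs) + [1.0]
            hidden = [math.tanh(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden]
            h = hidden + [1.0]
            output = sigmoid(sum(w * v for w, v in zip(w_out, h)))
            error = target - output
            total_error += 0.5 * error ** 2
            # Step 3: propagate the error back to the output and hidden neurons.
            delta_out = error * output * (1.0 - output)
            delta_hidden = [delta_out * w_out[j] * (1.0 - hidden[j] ** 2)
                            for j in range(n_hidden)]
            # Step 4: modify the weights.
            for j in range(n_hidden + 1):
                w_out[j] += rate * delta_out * h[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] += rate * delta_hidden[j] * x[i]
        history.append(total_error)
        # Step 5: repeat until the error is small enough.
        if total_error < tolerance:
            break
    return history

# Toy data (a logical OR gate), used here only to exercise the loop.
cases = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
history = train(cases)
```

Comparing `history[0]` with `history[-1]` shows how much the total error dropped between the first and the last pass over the training cases.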

Implementing an Artificial Neural Network in SQL Server

Let us use the same Bike Buyer example that we used for Naïve Bayes and Decision Trees. As in all the other examples, let us create a Data Source pointing to the AdventureWorksDW database and a Data Source View with vTargetMail.

Then let us select the Microsoft Neural Network, as shown in the following screenshot.

Then let us select the Input and Predict attributes, as shown in the screenshot below.

We have chosen input attributes that make sense for predicting the bike buyer. For example, we do not think that attributes such as address or email address are important variables.

The next step is to change the content type. Though the Neural Network supports continuous types, in this example only Yearly Income should be continuous. The other content types should be changed to Discrete due to the nature of the data set. After those changes, the screen should be as follows.

After this modification, the rest of the wizard can be configured with default values, as we discussed in our first article of the series.

The above model performs classification: whether or not the customer is a bike buyer. Now, let us add another neural network model to forecast yearly income. Like the decision tree algorithm, the Microsoft Neural Network in SQL Server can be used as both a classification and a forecasting technique.

You can add another mining model to the same structure without creating another structure, as shown in the screenshot below.

After the model is created, you need to change Yearly Income to Predict, as shown in the screenshot below.

Now you have two mining models in the same data mining structure.

Then let us process both models together and view the results. If you prefer, you can also process the models one by one.

Model Viewer

Let us view the results for the Bike Buyer prediction model built using the Microsoft Artificial Neural Network algorithm, as shown in the below screenshot.

The above screenshot indicates that customers whose age is 93 are more likely to buy a bike. Further, we can filter the results. The following screenshot shows the results for single female customers who have two cars.

By analyzing these views, the user can understand which attributes contribute to the classification of a Bike Buyer.

Let us look at the Yearly Income prediction model viewer, as shown in the screenshot below.

Since Yearly Income is a continuous attribute, you can choose its values in ranges, as shown above. In this model, we can filter on several attributes. Similarly, you can see which factors are the most significant for each range of the Yearly Income attribute.

Prediction

Let us see how we can use these models to predict. As we discussed in the previous article, predictions with the Microsoft Neural Network in SQL Server can be made through DMX queries or through the provided user interface.

The following is the user interface for making predictions:

The same results can be achieved by using a DMX query from the SQL Server Management Studio (SSMS), as shown in the below screenshot.

Let us forecast the Yearly Income using the ANN_YearlyIncome model with a DMX query, as shown below.

If you are accessing data mining models from an application, DMX queries can be used.
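
As a rough illustration of the application scenario, the following Python sketch builds the text of a DMX singleton prediction query for the ANN_YearlyIncome model. The helper name `singleton_dmx`, the column names `Gender` and `Number Cars Owned`, and the predicted column name `Yearly Income` are assumptions for illustration and may differ in your model. The sketch only produces the query string; it does not connect to Analysis Services and does not escape values, so it should not be used with untrusted input.

```python
def singleton_dmx(model, target, case):
    """Build a DMX singleton prediction query for one hypothetical case.

    String values are wrapped in single quotes and other values are
    emitted as-is. No escaping is done, so this is a sketch only.
    """
    columns = ", ".join(
        f"'{value}' AS [{name}]" if isinstance(value, str) else f"{value} AS [{name}]"
        for name, value in case.items()
    )
    return (
        f"SELECT Predict([{target}]) FROM [{model}] "
        f"NATURAL PREDICTION JOIN (SELECT {columns}) AS t"
    )

# Hypothetical input case; the column names are assumptions.
query = singleton_dmx(
    "ANN_YearlyIncome",
    "Yearly Income",
    {"Gender": "M", "Number Cars Owned": 2},
)
```

The resulting string can then be sent to the Analysis Services instance with whichever client library the application already uses.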

Model Parameters

The Microsoft Artificial Neural Network has model parameters that can be tuned. As we discussed in the previous articles, by changing the parameter values, you may be able to achieve better results. All of these parameters are available only in the Enterprise edition.

HIDDEN_NODE_RATIO

This parameter specifies a number used to determine the number of nodes in the hidden layer. The algorithm calculates the number of nodes in the hidden layer as HIDDEN_NODE_RATIO * sqrt({the number of input nodes} * {the number of output nodes}).
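
The formula above can be worked through with a short Python sketch. The function name and the sample node counts are hypothetical, and the sketch returns the raw value because the text does not say how the result is rounded.

```python
import math

def hidden_node_count(hidden_node_ratio, n_inputs, n_outputs):
    """HIDDEN_NODE_RATIO * sqrt(input nodes * output nodes), left unrounded."""
    return hidden_node_ratio * math.sqrt(n_inputs * n_outputs)

# Hypothetical network: ratio 4.0, 25 input nodes, 4 output nodes.
nodes = hidden_node_count(4.0, 25, 4)  # 4.0 * sqrt(100) = 40.0
```

Doubling the ratio doubles the size of the hidden layer, so this parameter is a direct lever on model capacity and training time.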

HOLDOUT_PERCENTAGE

This parameter specifies the percentage of cases within the training data used to calculate the holdout error for this algorithm. HOLDOUT_PERCENTAGE is used as part of the stopping criteria while training the mining model. The default value for this parameter is 30.

HOLDOUT_SEED

This parameter specifies a number used to seed the pseudo-random generator when randomly determining the holdout data for this algorithm. This value is unique to this algorithm and is unrelated to any holdout parameters set in the mining structure. The default value for this parameter is 0.

SAMPLE_SIZE

This parameter defines the number of cases that are used to train the model. The algorithm uses either the number specified by SAMPLE_SIZE or total_cases * (1 - HOLDOUT_PERCENTAGE/100), whichever is smaller.
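
The rule above amounts to taking the smaller of two numbers, as this small Python sketch shows. The function name and the sample case counts are hypothetical. The percentage is applied with integer arithmetic first, which is algebraically the same as the formula in the text and avoids floating-point noise.

```python
def training_case_count(total_cases, sample_size, holdout_percentage=30):
    """Return the smaller of SAMPLE_SIZE and the cases left after holdout."""
    after_holdout = total_cases * (100 - holdout_percentage) / 100
    return min(sample_size, after_holdout)

# 18,000 cases leave 12,600 after a 30% holdout, so SAMPLE_SIZE wins.
large = training_case_count(18000, 10000)
# 5,000 cases leave only 3,500 after holdout, so that limit wins.
small = training_case_count(5000, 10000)
```

In other words, SAMPLE_SIZE only acts as a cap: on small data sets, the holdout percentage is what determines how many cases are used for training.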

Conclusion

The Microsoft Artificial Neural Network in SQL Server is one of the most sophisticated algorithms in the SQL Server data mining family. This technique tries to simulate how the brain works with inputs and outputs. Like the Decision Tree algorithm, it can be used to solve both classification and regression problems. Both discrete and continuous input variables can be used with this technique.