Analysis Services (SSAS) Multidimensional Design Tips: Data Source View and Cubes


In this article, we’ll discuss some tips and best practices for designing OLAP cubes in Analysis Services Multidimensional (SSAS). Most tips, if not all, apply to SSAS 2008 through 2016, and most likely to later versions as well. Since Analysis Services Tabular, the in-memory columnstore OLAP database from Microsoft, is a completely different design experience, it is not covered in this article.


The list provided is not exhaustive. Whole books can be written about the subject, so a selection must be made. The author considers the tips in this article basic requirements for any cube design. However, as usual, best practices are only a rule of thumb; sometimes it is necessary to deviate from the general rule.


Examples and screenshots in this article are created using the AdventureWorks 2014 Enterprise sample OLAP cube, which can be downloaded from Codeplex.


Data Source and Data Source View

If possible, use a domain service account to access the data source as it is the most secure option. The service account option can be used as well, if for example the SSAS service is already configured using a domain account. However, it’s a good idea to separate the two, as the data source only needs read access to a specific source and nothing more.


Note that you can also change the maximum number of connections to the data source. If your source allows high concurrency and you have many processor cores available, consider raising this number to get more parallel processing.


In the data source view, you can already create the relationships between the fact table and the dimensions. This will help you later when building the cube: the dimension usage tab will already be prepopulated. Also create a diagram for each star schema (one fact table + related dimensions). This will declutter the overall view and it will be easier to make changes to the model at a later point in time. For example: you have added a new dimension table to the data source view and you want to create a new relationship. If you have a large model with hundreds of tables, this can be a challenging task. You can create a new diagram by right-clicking the All Tables node in the Diagram Organizer.


Finally, assign user-friendly names to the tables. This might seem unnecessary since users don’t see the DSV directly. However, any dimension or measure group is created from a table in the DSV. If the tables already have decent names, you won’t need to rename the objects during the creation process. If you forget to change the name of such an object, you can still do it after creation, but then the ID of the object (which doesn’t change) and its actual name will differ. This can be annoying when scripting out certain aspects of the cube using XMLA. You can find the FriendlyName property in the properties window of a table in the DSV.


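To recap: use a domain account with read-only access for the data source, consider raising the maximum number of connections if the source can handle it, create relationships and one diagram per star schema in the DSV, and give the tables friendly names.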

The most important rule of the DSV: do not create calculations! Keep calculations in your source, typically a data warehouse, and use the DSV as a layer to define relationships and friendly names. There are a couple of reasons for this:


  • It’s inherently messy to write code in the DSV: there is no Intellisense or parsing of code.

  • The business logic is hidden. It’s difficult to see if a table is a reference to an existing table/view or a named query. This makes debugging harder. If a measure in the cube doesn’t have the expected value, it takes a while before you end up in the DSV.

  • The generated SQL doesn’t always guarantee optimal results. When you write your SQL statement inside a view, you have more control over the final SQL statement.


For these reasons, it is recommended to create any calculations or data type conversions either directly in the ETL that populates the data warehouse or in the form of views on top of the DWH.
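As a minimal sketch of this recommendation, the snippet below puts a calculation and a data type conversion into a view on top of the warehouse, so the DSV only has to reference the view. It uses Python's built-in sqlite3 module as a stand-in for the data warehouse; the table `FactSales`, the view `vw_FactSales`, and all column names are hypothetical and not taken from AdventureWorks.

```python
import sqlite3

# In-memory database standing in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactSales (
        OrderDateKey TEXT,      -- stored as text, e.g. '20140115'
        Quantity     INTEGER,
        UnitPrice    REAL
    );
    INSERT INTO FactSales VALUES ('20140115', 3, 10.0), ('20140220', 2, 4.5);

    -- The calculation and the type conversion live in the view,
    -- not in a named query or named calculation inside the DSV.
    CREATE VIEW vw_FactSales AS
    SELECT CAST(OrderDateKey AS INTEGER) AS OrderDateKey,
           Quantity,
           UnitPrice,
           Quantity * UnitPrice AS SalesAmount
    FROM FactSales;
""")

rows = conn.execute(
    "SELECT OrderDateKey, SalesAmount FROM vw_FactSales ORDER BY OrderDateKey"
).fetchall()
print(rows)
```

The DSV would then simply point at `vw_FactSales`, and the business logic stays visible and testable in the source.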


Cubes

One or more cubes

Are you going to build one cube containing all your star schemas, or rather multiple cubes where each cube contains exactly one data mart? There is no right or wrong answer here; it depends. The advantage of multiple cubes is easier maintenance and development. The advantage of one large cube is that you can drill across fact tables (in other words, combine measures from multiple fact tables in one visualization). All calculations are defined in the same place, but development of a larger cube might be more difficult, especially when it comes to security. You can also logically split up a large cube using perspectives, which makes things less confusing and overwhelming for end users. Keep in mind that perspectives are an Enterprise edition feature.


The general rule is as follows: if you want to combine measures from multiple fact tables, you have almost no choice but to build one large cube. However, if your data marts are truly independent of each other, you can build individual cubes. The consequence of the latter option is that a Power BI Desktop report can only report on a single cube (although you can combine multiple visualizations from multiple cubes in a Power BI dashboard).


Measures

Two main guidelines:


  • Less is more. If you don’t need a measure, don’t include it. Having too many measures is not only confusing for end users, but can also take up more caching space.

  • If you can calculate it in advance, please do so. For example, you can calculate currency conversion in the cube itself, but performance will be better if the calculations are already done in advance. The same is true for easy measures like A + B. You can do these in the ETL or in a SQL view. Keep the cube for calculations which are hard to do in SQL (because they depend on filter context), such as year-to-date, ratios and moving averages.
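The second guideline can be sketched as an ETL step that materializes the converted amounts and a simple "A + B" measure before the data reaches the cube. This is only an illustration: the `SalesRow` fields, the `transform` function, and the exchange rates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SalesRow:
    amount_local: float
    freight_local: float
    rate_to_usd: float  # hypothetical exchange rate captured at load time


def transform(row: SalesRow) -> dict:
    """ETL step: precompute values the cube would otherwise calculate per query."""
    amount_usd = round(row.amount_local * row.rate_to_usd, 2)
    freight_usd = round(row.freight_local * row.rate_to_usd, 2)
    return {
        "AmountUSD": amount_usd,      # currency conversion done in advance
        "FreightUSD": freight_usd,
        "TotalUSD": round(amount_usd + freight_usd, 2),  # the simple "A + B" measure
    }


source_rows = [SalesRow(100.0, 10.0, 1.25), SalesRow(80.0, 0.0, 0.5)]
loaded = [transform(r) for r in source_rows]
print(loaded)
```

The cube then only needs plain additive measures over `AmountUSD`, `FreightUSD` and `TotalUSD`, leaving context-dependent calculations such as year-to-date for the cube itself.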


Some other tips:


  • Measure expressions tend to be faster than regular calculated measures.

  • Try to replace LastNonEmpty measures (which are semi-additive) with LastChild measures (also semi-additive). These are typically faster because they only need to scan the latest partition if your measure group is partitioned by time.

  • My personal preference is to have distinct dimensions instead of role-playing dimensions. With a role-playing dimension, you have one date dimension but add it multiple times as a cube dimension, for example once for ship dates and once for delivery dates.


The problem is that all attributes of both dimensions have the same names. If you drag Calendar onto a pivot table, it’s hard to see which dimension it came from (how easy it is to track down depends on the reporting tool). You can easily solve this by creating multiple views in the source.
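One way to read this advice is to create one view per role over the same date table, each with role-specific column names, and to build a separate dimension on each view. The sketch below assumes exactly that and uses sqlite3 as a stand-in for the source database; `DimDate`, `vw_DimShipDate`, `vw_DimDeliveryDate` and the column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimDate (
        DateKey      INTEGER PRIMARY KEY,
        CalendarYear INTEGER,
        MonthName    TEXT
    );
    INSERT INTO DimDate VALUES (20140115, 2014, 'January');

    -- One view per role, each with its own column names.
    CREATE VIEW vw_DimShipDate AS
    SELECT DateKey      AS ShipDateKey,
           CalendarYear AS ShipCalendarYear,
           MonthName    AS ShipMonthName
    FROM DimDate;

    CREATE VIEW vw_DimDeliveryDate AS
    SELECT DateKey      AS DeliveryDateKey,
           CalendarYear AS DeliveryCalendarYear,
           MonthName    AS DeliveryMonthName
    FROM DimDate;
""")


def column_names(view: str) -> list[str]:
    """Return the column names exposed by a view."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [d[0] for d in cursor.description]


ship_cols = column_names("vw_DimShipDate")
delivery_cols = column_names("vw_DimDeliveryDate")
print(ship_cols, delivery_cols)
```

Because no attribute name is shared between the two views, a field dragged onto a pivot table identifies its dimension by name alone.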


Partitioning

The concept of partition has been discussed in the following articles:


I’ll just summarize the main benefits:


  • Faster processing due to parallelism

  • Faster scans of the data due to parallelism

  • Optional partition elimination if partitions are set up correctly and if the query supports it. For example, if you partition your data by month and you only need the data for the last month, only one partition is read, which can lead to large performance gains.
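The idea behind partition elimination can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not how the SSAS storage engine is implemented; the fact rows and the `total_for_months` helper are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (month key as YYYYMM, sales amount)
facts = [(201401, 10.0), (201401, 5.0), (201402, 7.0), (201403, 2.5), (201403, 1.5)]

# One partition per month, mirroring a measure group partitioned by month.
partitions = defaultdict(list)
for month, amount in facts:
    partitions[month].append(amount)


def total_for_months(months):
    """Sum sales for the requested months; returns (total, partitions_read)."""
    touched = [m for m in months if m in partitions]
    total = sum(sum(partitions[m]) for m in touched)
    return total, len(touched)


# A query for the last month only reads one of the three partitions.
total, partitions_read = total_for_months([201403])
print(total, partitions_read)
```

A query that does not filter on the partitioning key would have to touch every partition, which is why the partitioning scheme should follow the most common query filters.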


Aggregations

Another potential query performance improvement is building aggregates. Suppose your data is kept at the daily level. If you build aggregates at the monthly level, this can lead to performance improvements when you report at the monthly or yearly level. For example, if you require data for one month, you can just read the aggregate. If you require data at the year level, you just need to add up 12 months. Adding 12 numbers together is much faster than aggregating all the source data.
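The arithmetic behind this can be shown with a small Python sketch: a year of hypothetical daily data (one unit per day) is pre-aggregated to months, and the yearly total is then computed from 12 values instead of 365. The data and variable names are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily fact data for 2014: one unit sold per day.
daily = {}
day = date(2014, 1, 1)
while day.year == 2014:
    daily[day] = 1
    day += timedelta(days=1)

# Pre-built aggregate at the month level.
monthly = defaultdict(int)
for d, qty in daily.items():
    monthly[(d.year, d.month)] += qty

# The yearly total computed from the detail rows versus from the aggregate.
year_from_daily = sum(daily.values())                 # reads 365 values
year_from_monthly = sum(
    qty for (year, _month), qty in monthly.items() if year == 2014
)                                                     # reads 12 values
print(year_from_daily, year_from_monthly, len(monthly))
```

Both paths return the same total; the aggregate simply gets there with far fewer reads, and the gap grows with the volume of detail data.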


You can build aggregations using a wizard.


SSAS will use a combination of heuristics and dimension properties to decide for which dimension attributes aggregations should be built. For more info, check out Aggregations and Aggregation Designs on docs.microsoft.com.


Another option is to use the Usage Based Aggregation Wizard, which uses information from sampled queries to build aggregations tailored to the needs of those queries. Since these aggregations are built using actual query data, they will likely be more effective than the aggregations built by the previous wizard. Keep in mind that user query patterns might change over time, so it’s a good idea to rerun this wizard periodically.


Conclusion

This article discussed some best practices around the data source view and the building of a cube in Analysis Services Multidimensional. Remember best practices are not carved in stone, but are rather general guidelines. To recap:


  • Keep your cube user friendly and don’t bloat it with too many dimension attributes and measures. Only build what you need.

  • If you can build functionality (a calculated attribute or a measure) in a previous layer – such as the data warehouse – please do. Try to avoid development in the data source view.

  • Partition your cube and build aggregations on those partitions. Revisit those aggregations from time to time.



Reference Links

Translated from: https://www.sqlshack.com/analysis-services-ssas-multidimensional-design-tips-data-source-view-and-cubes/
