Practical Statistics for Data Scientists: Data Scientists Must Know Business x Statistics

By definition, statistics is the science of collecting, classifying, analyzing, and interpreting data. The field draws on probability theory and is used to assess specific hypotheses.

The definition could hardly sound more technical, and it seems to have nothing to do with business. So why do data scientists need to know both?

Well, statistics is more than an advanced math class. It is a way for any business to gain an edge over its competition. I would argue that many great business leaders have made decisions based not only on gut feeling but also on statistics.

Any data science project is a data project meant to solve a problem the company has. It doesn’t matter if your advanced deep learning model has 99% precision and can identify every person who passes through the room; if it does not solve the business problem, it is useless.

The question is, how exactly is statistics essential to business, and why do we, as data scientists, need to understand both statistics and business? You might be wondering about this, so let me explain a little more about business and statistics for data scientists.

Business x Statistics

Business and statistics seem like two different worlds that would never merge, but they are tightly integrated. How, you ask? Let me explain in the passages below.

Data-Driven Business and Statistics

You have probably come across the term “data-driven” in articles or other data science study material. It appears in statements such as “This business is data-driven” or “The decision is based on data, so it is data-driven.”

You might think that using data for a business decision makes you data-driven. Is that true? If using data means just looking at a number and acting on a snapshot of the data, then the answer is no.

A data-driven business is more than that. A company might hold plenty of data, but if that data does not relate to the current business problem, it is useless.

For example, company A, established ten years ago, wants to sell a new product in a new market. Management asks the data team to profile a new market segment for the product using the company’s data, which they claim is plentiful. The team looks at the data and finds that “a lot of data” means many spreadsheets containing only attributes that are useless for segmentation, such as ID, name, email, and phone number.

That is an example of data that cannot solve your business problem. But what if we have a plausible dataset for segmenting the new customers, with, say, their salary, occupation, preferences, and age? This is where we need statistics: to evaluate the quality of the data and help the company decide which business strategy to pursue.

Considering Statistics in Business

I have briefly explained why statistics is vital to business, but which statistics specifically matter? Here we need to go back to the core of the business and ask: “What matters most in your business question?” Is it the sales number, the profit, or something else? The answer is what we call a key metric.

For example, company A’s key metric is its monthly sales number. Company A then needs to decide what kind of analysis it wants from that metric. The most obvious one is how monthly sales have moved over the years. Let me give some example data below.

Figure: Data example. Graph created by the author.

Now, with simple statistics and analysis, we can see that sales increase until February, when they drop sharply every year. Statistics can help here by giving the exact percentage drop in each year, and from a business perspective it is worth investigating what causes the drop.
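As a rough sketch of what “the exact percentage drop” could look like in practice, the snippet below computes the January-to-February change for each year. The sales figures, the `sales` dictionary, and the `pct_change` helper are all invented for illustration; they are not the data behind the graph, and reading the drop as a January-to-February comparison is my own assumption.

```python
# Hypothetical monthly sales (units) for January and February of each year.
# These numbers are invented for illustration; they are NOT the article's data.
sales = {
    2018: {"Jan": 200, "Feb": 150},
    2019: {"Jan": 240, "Feb": 180},
    2020: {"Jan": 300, "Feb": 210},
}


def pct_change(previous: float, current: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Percentage change from January to February, per year (negative means a drop).
feb_change = {year: pct_change(m["Jan"], m["Feb"]) for year, m in sales.items()}

for year, change in feb_change.items():
    print(f"{year}: January -> February change = {change:.1f}%")
```

A table like this gives the business a concrete number to discuss for each year rather than a visual impression of a dip.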

This is how statistics can help the business: it not only pinpoints the problems the company has and supports business decisions, but also helps us understand the company’s sales profile.

Business and Statistics for Data Scientists

So what about data scientists? What I explained above seems to apply only to the business and not to data scientists. To answer that, we need to define what a data scientist does most of the time.

Figure: The “academic” data scientist. Graph created by the author.

The graph above shows what a data scientist does every day in theory. While it is not wrong, the reality in the working environment is quite different.

Figure: Realistic data scientist activity. Graph created by the author.

The graph above shows that the job is not just about cleaning and preparing data; we also need to comply with data regulations and ethics, and to tie every data project to a business problem.


Every data scientist needs to understand what kind of business the company is in and what business problem they are trying to solve. When you work at a company, interacting with other departments is unavoidable.

For example, the sales department wants to increase sales. To do this, the sales team asks the data science team to create a new customer prediction model. You might think you can just pull the data and train a machine learning model on it, right?

No, that is often not the case. What you need to do first is show the sales department whether the project is viable and set a reasonable target. This is why data scientists need to understand both the business side and the statistics side.

To determine whether you can execute the data project, you need useful data. This is where you need statistics to evaluate the quality of your data.
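One simple way to start evaluating data quality is to measure how much of each attribute is missing. The sketch below does that for a handful of made-up customer records; the `records` list, the column names, and the `missing_rate` helper are assumptions for illustration, and a real check would look at far more than missing values.

```python
# Hypothetical customer records; None marks a missing value.
# The rows are invented for illustration only.
records = [
    {"age": 34, "salary": 5200, "occupation": "engineer"},
    {"age": None, "salary": 4100, "occupation": None},
    {"age": 45, "salary": None, "occupation": None},
    {"age": 29, "salary": 3900, "occupation": "engineer"},
]


def missing_rate(rows, column):
    """Return the share of rows in which `column` is missing (None)."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


# Missing-value share per attribute.
quality = {col: missing_rate(records, col) for col in ("age", "salary", "occupation")}

for col, rate in quality.items():
    print(f"{col}: {rate:.0%} missing")
```

If half of the `occupation` values are missing, as in this toy data, that is worth raising with the sales department before anyone promises a model built on it.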

Also, people who do not work with data often set unreasonable targets. For example, the sales department wants to increase sales by 100% next month. To show whether this target is reasonable, you need to evaluate it against your current data. Simple trend analysis and estimation will do the trick, but only if you understand both the statistics and the business.
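To make “simple trend analysis” concrete, here is a minimal sketch that fits a straight line to recent monthly sales by ordinary least squares and compares the trend estimate for next month with a 100% increase target. The six sales figures and the `fit_line` helper are invented for illustration, and a straight line is only one of many possible trend models.

```python
from statistics import mean

# Hypothetical sales for the last six months (invented for illustration).
monthly_sales = [100, 110, 120, 130, 140, 150]


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(monthly_sales)

# Trend estimate for next month (the month after the last observation).
forecast = intercept + slope * len(monthly_sales)

# The sales department's target: increase sales by 100% next month.
target = 2 * monthly_sales[-1]

# Growth implied by the trend versus growth required by the target, in percent.
trend_growth = (forecast - monthly_sales[-1]) / monthly_sales[-1] * 100
target_growth = (target - monthly_sales[-1]) / monthly_sales[-1] * 100

print(f"Trend estimate for next month: {forecast:.0f} ({trend_growth:.1f}% growth)")
print(f"Requested target: {target} ({target_growth:.0f}% growth)")
```

With these toy numbers the trend suggests growth of under 7%, which makes it easy to show why a 100% target needs either a very strong justification or a revision.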

You might say, “Isn’t the machine learning model created to improve sales? If so, the estimate would be useless.” The point of a machine learning model is, of course, to solve the business problem, such as increasing sales. True as that is, you still need to keep your target within a reasonable range.

There are bound to be problems your machine learning model cannot foresee, for example a salesperson resigning, a department restructuring, or a distribution accident. Ambition is great, but promise a target you can reasonably deliver.

Selecting an appropriate key metric can also fall to the data scientist, especially when the company is new to being data-driven. In that case, business and statistics become your best friends.

Conclusion

Data scientists cannot work without knowing both business and statistics. Both are part of a data scientist’s working gear, and knowing them gives you leverage to become a better data scientist.

As a data scientist, you need to understand both business and statistics because:

  1. A data project is not just cleaning data and building a model; it involves solving a business problem,
  2. Evaluating whether a data project is a viable way to solve the business problem requires statistics,
  3. Statistics is also required to set a reasonable target for your data project, because your machine learning model cannot foresee everything, and
  4. Business and statistics are knowledge you can use to improve your position in the company or when applying for a data scientist position.

I hope it helps!


Translated from: https://towardsdatascience.com/data-scientist-must-know-business-x-statistics-7bb8575a9525
