Coursera | Andrew Ng (03-week1-1.4): Satisficing and Optimizing Metrics

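This series only adds personal study notes, or supplementary derivations, to selected points of the original course. If you find any errors, corrections are welcome. Having worked through Andrew Ng's course, I have organized it into text to make it easier to look up and review. Because I am continually studying English, this series is mainly in English, and I suggest that readers also rely mainly on the English, with Chinese as a supplement, as groundwork for reading academic papers in the field later on. - ZJ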

Coursera course | deeplearning.ai | NetEase Cloud Classroom

Please credit the author and source when reposting: ZJ, WeChat official account "SelfImprovementLab"

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79146948


1.4 Satisficing and optimizing metrics

(Subtitle source: NetEase Cloud Classroom)


It's not always easy to combine all the things you care about into a single real-number evaluation metric. In those cases, I've found it sometimes useful to set up satisficing as well as optimizing metrics. Let me show you what I mean. Let's say that you've decided you care about the classification accuracy of your cat classifier. This could have been F1 score or some other measure of accuracy. But let's say that in addition to accuracy you also care about the running time, that is, how long it takes to classify an image. Classifier A takes 80 milliseconds, B takes 95 milliseconds, and C takes 1,500 milliseconds, that's 1.5 seconds, to classify an image.

One thing you could do is combine accuracy and running time into an overall evaluation metric, so maybe the overall cost is accuracy minus 0.5 times running time. But it seems a bit artificial to combine accuracy and running time using a formula like this, a linear weighted sum of these two things. So here's something else you could do instead: you might choose a classifier that maximizes accuracy, subject to the running time, that is, the time it takes to classify an image, being less than or equal to 100 milliseconds.
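The two ways of picking a classifier described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the running times are the ones quoted in the lecture, but the accuracy values are placeholders assumed here (the transcript does not state them), and the names `Classifier`, `pick_best`, and `pick_by_weighted_sum` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Classifier(NamedTuple):
    name: str
    accuracy: float         # ASSUMED illustrative value, not stated in the transcript
    running_time_ms: float  # running time per image, as quoted in the lecture


CANDIDATES = [
    Classifier("A", 0.90, 80),
    Classifier("B", 0.92, 95),
    Classifier("C", 0.95, 1500),
]


def pick_best(candidates: Sequence[Classifier],
              max_running_time_ms: float = 100.0) -> Optional[Classifier]:
    """Maximize accuracy (optimizing metric) among the candidates whose
    running time meets the threshold (satisficing metric)."""
    feasible = [c for c in candidates if c.running_time_ms <= max_running_time_ms]
    if not feasible:
        return None  # nothing satisfies the constraint
    return max(feasible, key=lambda c: c.accuracy)


def pick_by_weighted_sum(candidates: Sequence[Classifier],
                         weight: float, ms_per_unit: float) -> Classifier:
    """Alternative: maximize accuracy - weight * running_time, with the
    running time expressed in units of `ms_per_unit` milliseconds."""
    return max(candidates,
               key=lambda c: c.accuracy - weight * c.running_time_ms / ms_per_unit)


best = pick_best(CANDIDATES)
by_sum_ms = pick_by_weighted_sum(CANDIDATES, 0.5, 1.0)     # time in milliseconds
by_sum_s = pick_by_weighted_sum(CANDIDATES, 0.5, 1000.0)   # time in seconds
print(best.name, by_sum_ms.name, by_sum_s.name)
```

With these assumed accuracies, the constrained rule picks B, while the weighted sum picks A when time is measured in milliseconds and B when it is measured in seconds. The dependence on an arbitrary weight and unit is one way to see why the linear combination can feel artificial.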

So in this case we would say that accuracy is an optimizing metric, because you want to maximize accuracy. You want to do as well as possible on accuracy. But running time is what we call a satisficing metric, meaning that it just has to be good enough: it just needs to be less than 100 milliseconds, and beyond that you don't really care, or at least you don't care that much. So this is a pretty reasonable way to trade off, or to put together, accuracy and running time. And it may be the case that so long as the running time is less than 100 milliseconds, your users won't care that much whether it's 100 milliseconds or 50 milliseconds or even faster.

And by defining optimizing as well as satisficing metrics, this gives you a clear way to pick the, quote, best classifier, which in this case would be classifier B, because of all the ones with a running time better than 100 milliseconds it has the best accuracy. So more generally, if you have N metrics that you care about, it's sometimes reasonable to pick one of them to be optimizing, so you want to do as well as possible on that one, and then the other N minus 1 to be satisficing. That means that so long as they reach some threshold, such as a running time faster than 100 milliseconds, you don't care how much better they are beyond that threshold, but they do have to reach it.
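The general rule, one optimizing metric plus N minus 1 satisficing metrics, could be sketched as below. Everything here is a hypothetical illustration: the function name `select`, the third metric `model_size_mb`, and all the numbers except the three running times are assumptions introduced for this example.

```python
import operator


def select(candidates, optimize, satisfice):
    """Pick the candidate that is best on the single optimizing metric among
    those meeting every satisficing threshold.

    candidates: dict mapping name -> dict of metric values
    optimize:   (metric_name, "max" or "min")
    satisfice:  dict mapping metric_name -> (comparison, threshold)
    """
    metric, direction = optimize
    feasible = {
        name: metrics
        for name, metrics in candidates.items()
        if all(compare(metrics[key], threshold)
               for key, (compare, threshold) in satisfice.items())
    }
    if not feasible:
        return None  # no candidate reaches all thresholds
    chooser = max if direction == "max" else min
    return chooser(feasible, key=lambda name: feasible[name][metric])


# ASSUMED example values; only the running times come from the lecture.
models = {
    "A": {"accuracy": 0.90, "running_time_ms": 80, "model_size_mb": 40},
    "B": {"accuracy": 0.92, "running_time_ms": 95, "model_size_mb": 120},
    "C": {"accuracy": 0.95, "running_time_ms": 1500, "model_size_mb": 30},
}

one_constraint = select(
    models,
    optimize=("accuracy", "max"),
    satisfice={"running_time_ms": (operator.le, 100)},
)
two_constraints = select(
    models,
    optimize=("accuracy", "max"),
    satisfice={
        "running_time_ms": (operator.le, 100),
        "model_size_mb": (operator.le, 100),
    },
)
print(one_constraint, two_constraints)
```

Adding a second satisficing metric only shrinks the feasible set; the choice within that set is still made on the single optimizing metric.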

Here's another example. Let's say you're building a system to detect wake words, also called trigger words. This refers to voice-controlled devices like the Amazon Echo, which you wake up by saying Alexa, or some Google devices, which you wake up by saying Okay Google, or some Apple devices, which you wake up by saying Hey Siri, or some Baidu devices, which you wake up by saying ni hao Baidu (你好百度). So these are the wake words you use to tell one of these voice-controlled devices to wake up and listen to something you want to say.

So you might care about the accuracy of your trigger word detection system: when someone says one of these trigger words, how likely are you to actually wake up your device? And you might also care about the number of false positives: when no one actually said the trigger word, how often does it randomly wake up? In this case, one reasonable way of combining these two evaluation metrics might be to maximize accuracy, so that when someone says one of the trigger words you maximize the chance that your device wakes up, subject to having at most one false positive every 24 hours of operation. That way your device randomly wakes up only once per day on average when no one is actually talking to it. So in this case accuracy is the optimizing metric, and the number of false positives every 24 hours is the satisficing metric, where you'd be satisfied so long as there is at most one false positive every 24 hours.
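For the wake-word case, the satisficing metric is a rate, so it has to be normalized to a 24-hour window before it is compared with the threshold. The sketch below assumes hypothetical systems `v1` to `v3` with made-up accuracies, false-wake counts, and observation hours; none of these numbers come from the lecture.

```python
def false_positives_per_24h(false_wakes: int, hours_observed: float) -> float:
    """Normalize a count of spurious wake-ups to a per-24-hour rate."""
    return false_wakes * 24.0 / hours_observed


# ASSUMED measurements for three hypothetical detector versions.
systems = {
    "v1": {"accuracy": 0.93, "false_wakes": 3, "hours": 240},
    "v2": {"accuracy": 0.97, "false_wakes": 25, "hours": 240},
    "v3": {"accuracy": 0.95, "false_wakes": 10, "hours": 240},
}

MAX_FP_PER_24H = 1.0  # satisficing threshold: at most one false positive per day

# Keep only the systems that meet the false-positive constraint.
feasible_systems = {
    name: s
    for name, s in systems.items()
    if false_positives_per_24h(s["false_wakes"], s["hours"]) <= MAX_FP_PER_24H
}

# Among those, maximize the optimizing metric (accuracy).
best_system = max(feasible_systems, key=lambda name: feasible_systems[name]["accuracy"])
print(best_system)
```

In this made-up data, `v2` has the highest accuracy but wakes up spuriously 2.5 times per day, so it is excluded, and `v3` is chosen over `v1`.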

To summarize, if there are multiple things you care about, you can set one as the optimizing metric, which you want to do as well as possible on, and one or more as satisficing metrics, where you'll be satisfied so long as it does better than some threshold. You then have an almost automatic way of quickly looking at multiple classifiers and picking the, quote, best one. Now, these evaluation metrics must be evaluated or calculated on a training set, or a development set, or maybe on the test set. So one of the things you also need to do is set up training, dev (or development), and test sets. In the next video, I want to share with you some guidelines for how to set up training, dev, and test sets. So let's go on to the next video.
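Since both metrics have to be measured on some dataset, a rough sketch of measuring accuracy and mean running time on a dev set might look like this. The function `evaluate_on_dev`, the toy predictor, and the four-example dev set are all hypothetical, and wall-clock timing with `time.perf_counter` is just one simple way to estimate latency.

```python
import time


def evaluate_on_dev(predict, dev_set):
    """Return (accuracy, mean running time in ms per example) measured on
    a dev set given as a list of (input, label) pairs."""
    correct = 0
    start = time.perf_counter()
    for x, y in dev_set:
        if predict(x) == y:
            correct += 1
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    n = len(dev_set)
    return correct / n, elapsed_ms / n


def toy_predict(x):
    """Hypothetical stand-in for a real classifier: threshold at 0.5."""
    return int(x > 0.5)


# ASSUMED toy dev set of (feature, label) pairs.
toy_dev_set = [(0.2, 0), (0.7, 1), (0.9, 1), (0.4, 1)]

dev_accuracy, dev_mean_ms = evaluate_on_dev(toy_predict, toy_dev_set)
print(dev_accuracy, dev_mean_ms)
```

The accuracy would feed the optimizing metric and the mean running time the satisficing check; the timing value varies from run to run and machine to machine.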


Key takeaways:


Satisficing and optimizing metrics

Suppose there are three classifiers, A, B, and C, with running times of 80 ms, 95 ms, and 1,500 ms per image respectively, and with differing accuracies.

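Sometimes a problem places requirements on the model, for example that accuracy be as high as possible while running time stays within 100 ms. Here Accuracy is the optimizing metric and Running time is the satisficing metric, and on that basis B is the best classifier that satisfies the constraint.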

In general, if there are N metrics to consider, choose one as the optimizing metric and treat the other N-1 as satisficing metrics.

References:

[1]. 大树先生.吴恩达Coursera深度学习课程 DeepLearning.ai 提炼笔记(3-1)– 机器学习策略(1)

