Andrew Ng Machine Learning: Week 11

Preface

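NetEase Cloud Classroom (bilingual subtitles, no buffering): https://study.163.com/course/courseMain.htm?courseId=1004570029
Coursera: https://www.coursera.org/learn/machine-learning
I'm a beginner: I watch the lectures on NetEase Cloud Classroom first, then do the assignments on Coursera, and keep this blog as a record. All images referenced in the posts are screenshots from the course.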

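 For artificial intelligence!
 For artificial intelligence! For artificial intelligence!
 For artificial intelligence! For artificial intelligence! For artificial intelligence!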

Application Example: Photo OCR

1.Problem description and the OCR pipeline


2.Sliding windows

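Tips: Pedestrian detection is used as the first example because, compared with text detection, pedestrians have a fairly consistent height-to-width ratio.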
Tips: Manually labeled image patches serve as the training set.
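Tips: Keep changing the size of the pixel window and slide it across the image row by row, checking each patch with the classifier trained on that set to see whether it contains a pedestrian.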
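The window scan described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `sliding_windows`, `detect` and `all_bright` are made up for this example, the image is a plain nested list, and the "classifier" is a stand-in that fires when every pixel in the patch is bright, not a trained model.

```python
def sliding_windows(image_h, image_w, window_h, window_w, step):
    """Yield (top, left, bottom, right) for every window that fits in the image."""
    for top in range(0, image_h - window_h + 1, step):
        for left in range(0, image_w - window_w + 1, step):
            yield top, left, top + window_h, left + window_w


def detect(image, classifier, window_sizes, step):
    """Run a patch classifier over all windows at all sizes; return the positive boxes."""
    image_h, image_w = len(image), len(image[0])
    hits = []
    for window_h, window_w in window_sizes:
        for top, left, bottom, right in sliding_windows(
            image_h, image_w, window_h, window_w, step
        ):
            patch = [row[left:right] for row in image[top:bottom]]
            if classifier(patch):
                hits.append((top, left, bottom, right))
    return hits


# Toy 8x8 "image" with one bright blob that is 4 pixels tall and 2 pixels wide.
image = [[0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(3, 5):
        image[r][c] = 1


def all_bright(patch):
    # Hypothetical stand-in for a trained pedestrian classifier.
    return all(all(v == 1 for v in row) for row in patch)


boxes = detect(image, all_bright, window_sizes=[(4, 2), (2, 2)], step=1)
```

In a real detector the larger patches would typically be resized to the classifier's input size before being scored, and a larger `step` trades accuracy of localization for fewer classifier calls.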
Tips: The positions of the pedestrians then end up marked by boxes of different sizes.
Tips: Likewise, for text detection, manually label the training patches that contain text as 1 (i.e., text).
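Tips: Next, highlight in black and white the places that are likely to be text, then apply an expansion operator to grow them into rectangles. Roughly, a pixel is turned white if there is a white pixel near it, so neighboring pixels merge into one white rectangle. Then check whether each rectangle has the shape expected of text (text should be a narrow strip, much wider than it is tall) and discard the implausible ones.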
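A rough sketch of the expansion and shape filter, under several assumptions: the course does not spell out the operator, so a plain binary dilation is used here; the helper names, the `radius` of 1 and the `min_aspect` threshold of 2.0 are all hypothetical choices made for the illustration.

```python
def expand(mask, radius):
    """Binary dilation: a pixel becomes 1 if any pixel within `radius` of it is 1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def bounding_boxes(mask):
    """Inclusive (top, left, bottom, right) box of each 4-connected white region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                stack = [(r, c)]
                seen[r][c] = True
                top, left, bottom, right = r, c, r, c
                while stack:
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def looks_like_text(box, min_aspect=2.0):
    """Keep boxes that are clearly wider than tall (assumed threshold)."""
    top, left, bottom, right = box
    return (right - left + 1) / (bottom - top + 1) >= min_aspect


# Toy classifier output: scattered hits along one row (a word) and a tall stray blob.
mask = [[0] * 14 for _ in range(9)]
for c in (1, 3, 5, 7):
    mask[2][c] = 1
mask[5][12] = 1
mask[7][12] = 1

kept = [box for box in bounding_boxes(expand(mask, radius=1)) if looks_like_text(box)]
```

Here the four separate hits on row 2 merge into one wide strip that passes the filter, while the tall blob on the right is discarded.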
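Tips: Inside each rectangle that may contain text, again slide a window step by step to the right, this time to find the gaps between characters (the goal is not to find text but to split it into characters).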
Tips: The last stage is the character recognizer.

3.Getting lots of data and artificial data

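Tips: One way to create artificial text data is to take the different fonts available in word-processing software and combine them with different backgrounds to form a new training set.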
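Tips: Alternatively, divide a character image into a grid of cells and warp the cells in different ways; the distorted images are new training examples too.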
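Tips: For audio training sets, the equivalent trick is to use audio synthesis software to mix the original clip with various noisy backgrounds.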
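A minimal sketch of the audio case, assuming the clip and the background are already available as equal-length lists of samples. The function name `mix_with_background`, the signal-to-noise levels and the synthetic sine and Gaussian signals are placeholders for this example, not anything from the course.

```python
import math
import random


def mix_with_background(clean, background, snr_db):
    """Add `background` to `clean`, scaled so the mix has the requested SNR in dB."""
    if len(clean) != len(background):
        raise ValueError("signals must have the same length")
    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in background) / len(background)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(clean, background)]


rng = random.Random(0)
# Stand-ins for a recorded clip and a recorded noisy background.
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
crowd = [rng.gauss(0.0, 1.0) for _ in range(200)]

# One labeled clip becomes three new examples, each keeping the original label.
augmented = [mix_with_background(clean, crowd, snr_db) for snr_db in (20, 10, 0)]
```

The distortion should resemble what the test set actually contains; adding meaningless random noise to every sample usually helps far less than realistic backgrounds.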
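Tips: This approach only pays off when the classifier has low bias (i.e., high variance); only then does adding more examples improve it. See Week 6.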

4.Ceiling analysis: what part of the pipeline to work on next

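Tips: In short, first record the accuracy of the whole system. Then manually make the first stage's output 100% correct, run the later stages as usual, and measure the accuracy of the whole system again. Then do the same for the second stage, and so on. The stage whose perfect output raises the overall accuracy the most is the one to optimize.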
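The bookkeeping can be sketched as follows. The accuracy figures are illustrative placeholders rather than measurements from this post, and `ceiling_analysis` is a name invented for the example.

```python
def ceiling_analysis(baseline, accuracy_with_perfect_stages):
    """Return [(stage, gain)]: the lift in overall accuracy from also perfecting that stage."""
    gains = []
    previous = baseline
    for stage, accuracy in accuracy_with_perfect_stages:
        gains.append((stage, round(accuracy - previous, 4)))
        previous = accuracy
    return gains


# Illustrative numbers only: overall accuracy after each stage, in pipeline order,
# has been replaced by hand-made ground truth (cumulatively).
baseline = 0.72
measurements = [
    ("text detection", 0.89),
    ("character segmentation", 0.90),
    ("character recognition", 1.00),
]

gains = ceiling_analysis(baseline, measurements)
best_stage = max(gains, key=lambda g: g[1])[0]
```

With these numbers a perfect text detector would add 17 points, a perfect segmenter only 1, and a perfect recognizer 10, so text detection would be the stage to work on first.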

Quiz

1.Question 1

Suppose you are running a sliding window detector to find text in images. Your input images are 1000x1000 pixels. You will run your sliding windows detector at two scales, 10x10 and 20x20 (i.e., you will run your classifier on lots of 10x10 patches to decide if they contain text or not; and also on lots of 20x20 patches), and you will “step” your detector by 2 pixels each time. About how many times will you end up running your classifier on a single 1000x1000 test set image?
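Answer: D
(With a step of 2 pixels, each scale gives about (1000x1000)/(2x2) = 250,000 window positions; with two scales that is 250,000 x 2 = 500,000.)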
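As a quick check of that estimate, the exact number of window positions can be counted, assuming windows must fit entirely inside the image. `window_count` is a helper name made up for this check.

```python
def window_count(image_size, window_size, step):
    """Number of positions of a square window inside a square image."""
    positions_per_axis = (image_size - window_size) // step + 1
    return positions_per_axis ** 2


exact = sum(window_count(1000, size, step=2) for size in (10, 20))
approx = 2 * (1000 * 1000) // (2 * 2)
print(exact, approx)  # 487097 500000
```

The exact count is slightly below the estimate because windows near the right and bottom edges do not fit, which is why the question asks "about how many".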

2.Question 2

Suppose that you just joined a product team that has been developing a machine learning application, using m = 1,000 training examples. You discover that you have the option of hiring additional personnel to help collect and label data. You estimate that you would have to pay each of the labellers $10 per hour, and that each labeller can label 4 examples per minute. About how much will it cost to hire labellers to label 10,000 new training examples?
Answer: A
(4 labels per minute is 4x60 = 240 labels per hour, so 10,000 labels take 10000/240 ≈ 41.67 hours. At $10 per hour that is about $417, or $420 if rounded up to 42 hours.)
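The same arithmetic as a small script, with `labelling_cost` as a helper name invented for the example:

```python
def labelling_cost(num_examples, examples_per_minute, dollars_per_hour):
    """Cost in dollars of labelling `num_examples` at the given rate and wage."""
    hours = num_examples / (examples_per_minute * 60)
    return hours * dollars_per_hour


cost = labelling_cost(10_000, examples_per_minute=4, dollars_per_hour=10)
print(round(cost, 2))  # 416.67
```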

3.Question 3

What are the benefits of performing a ceiling analysis? Check all that apply.
Answer: AB

4.Question 4

Suppose you are building an object classifier, that takes as input an image, and recognizes that image as either containing a car (y=1) or not (y=0). For example, here are a positive example and a negative example:
After carefully analyzing the performance of your algorithm, you conclude that you need more positive (y=1) training examples. Which of the following might be a good way to get additional positive examples?
Answer: A

5.Question 5

Suppose you have a PhotoOCR system, where you have the following pipeline:
You have decided to perform a ceiling analysis on this system, and find the following:
Which of the following statements are true?
Answer: AB