1. box-header structure

An MPEG-4 file is a sequence of boxes. Locating each subsequent box depends on the information in the header of the box before it.

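box-header structure

{
box_size, uint32, 4 bytes
boxtype, uint32, 4 bytes
largesize, optional (uint64, present only when box_size is 1); ignored here
usertype, optional (16 bytes, present only when boxtype is 'uuid'); ignored here
}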
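As an illustration of the header layout above, here is a minimal Python sketch that reads one box header from a byte buffer. The function name `parse_box_header` and the hand-built sample bytes are assumptions for this example, not part of the original notes; `usertype` is not handled.

```python
import struct

def parse_box_header(buf, offset=0):
    """Return (box_size, box_type, header_len) for the box starting at offset."""
    # box_size and boxtype are both 4 bytes, big-endian.
    box_size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_len = 8
    if box_size == 1:
        # box_size == 1 signals that a 64-bit largesize follows the type.
        (box_size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    return box_size, box_type.decode("ascii"), header_len

# Hand-built sample: a 16-byte 'ftyp' box (8-byte header + 8 payload bytes).
sample = struct.pack(">I4s", 16, b"ftyp") + b"isom\x00\x00\x02\x00"
print(parse_box_header(sample))  # (16, 'ftyp', 8)
```

The returned `box_size` is what gets added to the current offset to reach the next box.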

2. Getting the moov box

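· For a complete video file, read the first 4 bytes as the box size and the next 4 bytes as the box type. Check that the box type is ftyp; if it is not, the file is not in MPEG-4 format.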

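· Using the box_size of the ftyp box, send a range request for the first 8 bytes of the next box header and check whether boxtype is moov. If it is not, repeat this step with that box's box_size until the moov box is found or the end of the file is reached.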

· Using the box size of the moov box, request the complete moov box.
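The scan for moov described in the steps above can be sketched as follows. This is only an illustration: `fetch(offset, length)` is a hypothetical callback standing in for an HTTP range request, and here it is backed by an in-memory byte string built by the helper `box`, which is also made up for the example.

```python
import struct

def find_top_level_box(fetch, file_size, wanted="moov"):
    """Walk top-level boxes with fetch(offset, length); return (offset, size) or None."""
    offset = 0
    first = True
    while offset + 8 <= file_size:
        size, box_type = struct.unpack(">I4s", fetch(offset, 8))
        if first and box_type != b"ftyp":
            raise ValueError("first box is not ftyp; not treated as MPEG-4")
        first = False
        if size == 1:
            # A 64-bit largesize follows the 8-byte header.
            (size,) = struct.unpack(">Q", fetch(offset + 8, 8))
        elif size == 0:
            size = file_size - offset  # box runs to the end of the file
        if box_type == wanted.encode("ascii"):
            return offset, size
        if size < 8:
            raise ValueError("corrupt box size")
        offset += size
    return None

def box(box_type, payload=b""):
    """Build a box with a plain 8-byte header (test data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

data = box(b"ftyp", b"isom" + bytes(4)) + box(b"free", bytes(5)) + box(b"moov", bytes(20))
fetch = lambda off, n: data[off:off + n]
print(find_top_level_box(fetch, len(data)))  # (29, 28)
```

Each loop iteration costs one 8-byte request, so a file with many boxes before moov needs correspondingly more round trips.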

· Get the mvhd box (the movie header inside the moov box). Its structure is:

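Field	Bytes	Meaning
box size	4	Size of the box
box type	4	Type of the box
version	1	Box version, 0 or 1, usually 0. (The byte counts below assume version = 0.)
flags	3
creation time	4	Creation time (seconds since 1904-01-01 00:00 UTC)
modification time	4	Modification time
time scale	4	Number of time units in one second of media time
duration	4	Duration in time scale units; in mvhd this is the duration of the whole movie (its longest track). Duration divided by time scale gives the length in seconds. For example, an audio track with time scale = 8000 and duration = 560128 lasts 70.016 s, and a video track with time scale = 600 and duration = 42000 lasts 70 s
rate	4	Preferred playback rate in [16.16] fixed-point format: the high 16 bits are the integer part and the low 16 bits the fractional part. 1.0 (0x00010000) means normal forward playback
volume	2	Like rate, but in [8.8] format; 1.0 (0x0100) means full volume
reserved	10	Reserved
matrix	36	Video transformation matrix
pre-defined	24
next track id	4	ID to use for the next track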

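· First determine the field lengths from version (with version = 1, creation time, modification time, and duration are 8 bytes each instead of 4). Then read the time_scale field: second N lies at position N * time_scale on the media timeline. Note that the sample tables of each trak use the time scale from that track's mdhd box, which can differ from the mvhd value.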
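To make the version handling concrete, the following sketch reads time_scale and duration from an mvhd box and converts a time in seconds to a media-timeline position. The name `parse_mvhd` and the hand-built 108-byte box are assumptions for illustration.

```python
import struct

def parse_mvhd(box):
    """Parse a complete mvhd box (header included); return (time_scale, duration)."""
    _size, box_type, version = struct.unpack_from(">I4sB", box, 0)
    if box_type != b"mvhd":
        raise ValueError("not an mvhd box")
    # Offset 12 is just past size(4) + type(4) + version(1) + flags(3).
    if version == 1:
        # creation time, modification time and duration are 8 bytes each.
        _ct, _mt, time_scale, duration = struct.unpack_from(">QQIQ", box, 12)
    else:
        _ct, _mt, time_scale, duration = struct.unpack_from(">IIII", box, 12)
    return time_scale, duration

# Hand-built version-0 mvhd: time scale 600, duration 42000.
body = (struct.pack(">B3xIIII", 0, 0, 0, 600, 42000)   # version, flags, times
        + struct.pack(">IH", 0x00010000, 0x0100)       # rate 1.0, volume 1.0
        + bytes(10 + 36 + 24)                          # reserved, matrix, pre-defined
        + struct.pack(">I", 2))                        # next track id
mvhd = struct.pack(">I4s", 8 + len(body), b"mvhd") + body

time_scale, duration = parse_mvhd(mvhd)
print(duration / time_scale)  # 70.0 seconds
print(5 * time_scale)         # 3000: position of second 5 on the timeline
```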

· Inside the moov box, iterate over all trak boxes (it is not yet clear which trak boxes are needed) and in each one follow the path mdia box -> minf box -> stbl box.
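One way to sketch the trak -> mdia -> minf -> stbl descent is shown below. The helper names (`child_boxes`, `find_child`, `find_stbl_boxes`, `box`) are invented for this example, the input is the moov payload without its own 8-byte header, and largesize and size-0 boxes are deliberately not handled.

```python
import struct

def child_boxes(buf, start, end):
    """Yield (type, payload_start, box_end) for each box inside buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            break  # largesize / size 0 / corrupt boxes are skipped in this sketch
        yield box_type, pos + 8, pos + size
        pos += size

def find_child(buf, start, end, wanted):
    """Return the payload range of the first child of the given type, or None."""
    for box_type, s, e in child_boxes(buf, start, end):
        if box_type == wanted:
            return s, e
    return None

def find_stbl_boxes(moov_payload):
    """Return the stbl payload range (start, end) for every trak that has one."""
    found = []
    for box_type, start, end in child_boxes(moov_payload, 0, len(moov_payload)):
        if box_type != b"trak":
            continue
        span = (start, end)
        for name in (b"mdia", b"minf", b"stbl"):
            span = find_child(moov_payload, *span, name)
            if span is None:
                break  # this trak lacks part of the path
        if span is not None:
            found.append(span)
    return found

def box(box_type, payload=b""):
    """Build a box with a plain 8-byte header (test data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

stbl = box(b"stbl", box(b"stts", bytes(8)))
mdia = box(b"mdia", box(b"mdhd", bytes(4)) + box(b"minf", stbl))
moov_payload = box(b"mvhd", bytes(100)) + box(b"trak", box(b"tkhd", bytes(4)) + mdia)
ranges = find_stbl_boxes(moov_payload)
print(ranges)
```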

· Find the stts box in the stbl box. Its structure is:

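Field	Length (bytes)	Description
size	4	Size of this atom in bytes
type	4	stts
version	1	Version of this atom
flags	3	0 here
count	4	Number of time-to-sample entries
time-to-sample		Duration of each sample in the media. Each entry has the following structure
Sample count	4	Number of consecutive samples with the same duration
Sample duration	4	Duration of each of those samples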

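· The display time dt(n) of the n-th sample satisfies the relation below, where sample_delta(j) is the duration of sample j. Use it to find the sample number that corresponds to the target time.

dt(n) = SUM(for j = 0 to n-1 of sample_delta(j))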
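The relation above can be turned into a lookup by walking the run-length entries of stts instead of summing sample by sample. The sketch below assumes 0-based sample numbers (so dt(0) = 0) and a payload that starts right after the 8-byte box header; the function names and the sample entries are made up for illustration.

```python
import struct

def parse_stts(payload):
    """Return [(sample_count, sample_delta), ...] from an stts payload."""
    (count,) = struct.unpack_from(">I", payload, 4)  # skip version(1) + flags(3)
    return [struct.unpack_from(">II", payload, 8 + 8 * i) for i in range(count)]

def sample_at_time(entries, media_time):
    """Return the 0-based number of the sample shown at media_time, or None."""
    index = 0    # number of samples before the current entry
    elapsed = 0  # dt of the first sample of the current entry
    for sample_count, sample_delta in entries:
        span = sample_count * sample_delta
        if media_time < elapsed + span:
            return index + (media_time - elapsed) // sample_delta
        index += sample_count
        elapsed += span
    return None  # media_time is past the last sample

# 3 samples of duration 100, then 2 samples of duration 50:
# dt = 0, 100, 200, 300, 350 and the track ends at 400.
payload = struct.pack(">IIIIII", 0, 2, 3, 100, 2, 50)
entries = parse_stts(payload)
print(sample_at_time(entries, 250))  # 2
print(sample_at_time(entries, 350))  # 4
```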

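· Find the stsc box in the stbl box. Its structure is:

Field	Length (bytes)	Description
size	4	Size of this atom in bytes
type	4	stsc
version	1	Version of this atom
flags	3	0 here
entry count	4	Number of sample-to-chunk entries
sample-to-chunk		Structure of each sample-to-chunk table entry
First chunk	4	Number of the first chunk that uses this table entry
Samples per chunk	4	Number of samples in each of these chunks
Sample description ID	4	Index of the sample description associated with these samples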

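· Let sp(n) be the number of the first sample covered by the n-th sample-to-chunk entry. It satisfies:

sp(n+1) = sp(n) + [first_chunk(n+1) - first_chunk(n)] * samples_per_chunk(n)

· From this, determine which chunk the sample number obtained above belongs to, and its position within that chunk.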
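A possible way to apply the sp(n) recurrence is sketched below. It assumes the stsc entries are already parsed into tuples, that first_chunk is 1-based as stored in the file, and that sample numbers are 0-based; `locate_in_chunk` and the sample table are hypothetical.

```python
def locate_in_chunk(stsc_entries, sample_index):
    """Map a 0-based sample number to (chunk number, 1-based; position in chunk, 0-based).

    stsc_entries is a list of (first_chunk, samples_per_chunk, description_id).
    """
    first_sample = 0  # sp(n): first sample covered by the current entry
    for i, (first_chunk, per_chunk, _desc) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            next_first_chunk = stsc_entries[i + 1][0]
            run_samples = (next_first_chunk - first_chunk) * per_chunk
            if sample_index >= first_sample + run_samples:
                first_sample += run_samples  # sp(n+1) = sp(n) + run_samples
                continue
        # Either the sample falls in this entry, or this is the last entry,
        # which applies to all remaining chunks.
        rel = sample_index - first_sample
        return first_chunk + rel // per_chunk, rel % per_chunk
    return None  # empty table

# Chunks 1-2 hold 3 samples each (samples 0-5); chunks 3 and later hold 2 each.
table = [(1, 3, 1), (3, 2, 1)]
print(locate_in_chunk(table, 4))  # (2, 1)
print(locate_in_chunk(table, 7))  # (3, 1)
```

The last entry has no upper bound here, so a sample number beyond the end of the track is not detected; that check would need the sample count from stsz.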

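· Find the stco box in the stbl box. After version and flags it holds a 4-byte entry count followed by one 4-byte offset per chunk.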

· Using the chunk number obtained above, look up the corresponding chunk offset, i.e. the offset of that chunk from the start of the file.

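· Find the stsz box in the stbl box. After version and flags it holds a 4-byte default sample size and a 4-byte sample count; when the default size is 0, one 4-byte size per sample follows.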

· Using the sample's position within the chunk obtained above, sum the sizes of all preceding samples in that chunk. The sum is the sample's offset within the chunk (sample offset).

· Finally, the sample's offset from the start of the file is chunk offset + sample offset, and its size comes from stsz. Request the sample data with this offset and size.
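The last three steps (stco lookup, stsz summation, final byte range) might be combined as in the sketch below. Payloads are assumed to start right after each box's 8-byte header, sample numbers are 0-based and chunk numbers 1-based; all function names and sample values are assumptions for illustration.

```python
import struct

def parse_stco(payload):
    """Chunk offsets from an stco payload."""
    (count,) = struct.unpack_from(">I", payload, 4)  # skip version + flags
    return list(struct.unpack_from(f">{count}I", payload, 8))

def parse_stsz(payload):
    """Per-sample sizes from an stsz payload."""
    default_size, count = struct.unpack_from(">II", payload, 4)
    if default_size:
        return [default_size] * count  # every sample has the same size
    return list(struct.unpack_from(f">{count}I", payload, 12))

def sample_byte_range(chunk_offsets, sample_sizes, sample_index,
                      chunk_number, index_in_chunk):
    """Return (file_offset, size) of one sample."""
    first_in_chunk = sample_index - index_in_chunk
    # Sizes of the samples that precede this one inside the same chunk.
    sample_offset = sum(sample_sizes[first_in_chunk:sample_index])
    chunk_offset = chunk_offsets[chunk_number - 1]
    return chunk_offset + sample_offset, sample_sizes[sample_index]

offsets = parse_stco(struct.pack(">II3I", 0, 3, 1000, 5000, 9000))
sizes = parse_stsz(struct.pack(">III8I", 0, 0, 8, 10, 20, 30, 40, 50, 60, 70, 80))
# Sample 4 is the second sample (position 1) of chunk 2.
print(sample_byte_range(offsets, sizes, 4, 2, 1))  # (5040, 50)
```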

· Open question: how to convert a sample's data into an image, and whether a key frame has to be used.

· The whole procedure skips some strict format checks.

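· Capturing n images takes at least n + 2 HTTP requests: the first fetches the file header, the second fetches moov (at least one request is needed), and the remaining n each fetch the data of one sample.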
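For the best case of n + 2 requests, the byte ranges could be planned as below. This is a sketch under the assumption that the moov location is known after a single 8-byte probe; `range_header` and `plan_requests` are hypothetical helpers that only build HTTP Range header values (whose end position is inclusive) and send nothing.

```python
def range_header(offset, size):
    """HTTP Range header for `size` bytes starting at `offset` (end is inclusive)."""
    return {"Range": f"bytes={offset}-{offset + size - 1}"}

def plan_requests(moov_range, sample_ranges):
    """Return the Range headers for the header probe, moov, and each sample."""
    plan = [range_header(0, 8)]              # first box header
    plan.append(range_header(*moov_range))   # complete moov box
    plan.extend(range_header(off, size) for off, size in sample_ranges)
    return plan

plan = plan_requests((16, 2000), [(5040, 50), (9000, 70)])
print(len(plan))  # 4 requests for n = 2 samples
print(plan[2])    # {'Range': 'bytes=5040-5089'}
```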