Audio Playback via FFmpeg
By playback we mean reading data from an audio file, decoding it, and sending the result to the speakers.
On Windows, there are several common ways to implement this:
- Waveform API
- FFmpeg
- DirectShow
- Media Foundation
Introduction to FFmpeg
FFmpeg is an open-source software suite for recording, converting, and streaming digital audio and video, licensed under the LGPL or GPL. It provides a complete solution for recording, converting, and streaming media, and it includes the very advanced audio/video codec library libavcodec; to guarantee portability and codec quality, much of the code in libavcodec was developed from scratch.
FFmpeg was developed on Linux, but it can be compiled and run on other operating systems as well, including Windows and Mac OS X. The project was started by Fabrice Bellard and was maintained primarily by Michael Niedermayer from 2004 to 2015. Many FFmpeg developers came from the MPlayer project, and FFmpeg is hosted on the MPlayer project group's server. The project's name comes from the MPEG video coding standard, with the leading "FF" standing for "Fast Forward".
Playing Audio from the FFmpeg Command Line
FFmpeg ships with a ready-made program, ffplay, that plays audio from the command line.
>ffplay.exe "D:\You're Beautiful.mp3"
ffplay version 3.3.3 Copyright (c) 2003-2017 the FFmpeg developers
Input #0, mp3, from 'D:\You're Beautiful.mp3':
  Metadata:
    … …
    artist          : James Blunt
    title           : You're Beautiful
    date            : 2006
  Duration: 00:03:24.30, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
    Metadata:
      encoder         : LAME3.96r
10.58 M-A: 0.000 fd=0 aq=27KB vq=0KB sq=0B f=0/0 ← live status of the audio being played; updates in real time
Playing Audio with FFmpeg + SDL
SDL (Simple DirectMedia Layer) is an open-source, cross-platform multimedia development library written in C. It provides functions for controlling graphics, sound, and input, letting developers build applications that run on multiple platforms (Linux, Windows, Mac OS X, and so on) from the same or similar code. Today SDL is mostly used for games, emulators, media players, and other multimedia applications.
We will use FFmpeg to decode the audio and SDL to play it.
Before playing, you can run ffmpeg -formats in a console to see the supported container formats (muxers/demuxers):
> ffmpeg -formats
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8.2.1 (GCC) 20181017
libavformat 58. 20.100 / 58. 20.100
... ...
File formats:
D. = Demuxing supported
.E = Muxing supported
--
D 3dostr 3DO STR
E 3g2 3GP2 (3GPP2 file format)
E 3gp 3GP (3GPP file format)
D 4xm 4X Technologies
E a64 a64 - video for Commodore 64
D aa Audible AA format files
D aac raw ADTS AAC (Advanced Audio Coding)
DE ac3 raw AC-3
D acm Interplay ACM
D act ACT Voice file format
D adf Artworx Data Format
D adp ADP
D ads Sony PS2 ADS
E adts ADTS AAC (Advanced Audio Coding)
DE adx CRI ADX
... ...
Run ffmpeg -codecs to see the supported codecs:
> ffmpeg.exe -codecs
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8.2.1 (GCC) 20181017
libavcodec 58. 35.100 / 58. 35.100
... ...
Codecs:
D..... = Decoding supported
.E.... = Encoding supported
..V... = Video codec
..A... = Audio codec
..S... = Subtitle codec
...I.. = Intra frame-only codec
....L. = Lossy compression
.....S = Lossless compression
-------
D.VI.S 012v Uncompressed 4:2:2 10-bit
D.V.L. 4xm 4X Movie
D.VI.S 8bps QuickTime 8BPS video
.EVIL. a64_multi Multicolor charset for Commodore 64 (encoders: a64multi )
.EVIL. a64_multi5 Multicolor charset for Commodore 64, extended with 5th color (colram) (encoders: a64multi5 )
D.V..S aasc Autodesk RLE
D.VIL. aic Apple Intermediate Codec
DEVI.S alias_pix Alias/Wavefront PIX image
DEVIL. amv AMV Video
D.V.L. anm Deluxe Paint Animation
D.V.L. ansi ASCII/ANSI art
DEV..S apng APNG (Animated Portable Network Graphics) image
DEVIL. asv1 ASUS V1
DEVIL. asv2 ASUS V2
... ...
Playback Flow
Playback Code
Audio playback has to match the decoded audio format to the renderer's audio format; this conversion is called resampling, which was briefly introduced in the Audio Capture via FFmpeg post. FFmpeg offers two ways to resample:
- 使用 resampler (SwrContext)
- 使用 filter (AVFilterGraph)
*Yes, the term "Filter Graph" is lifted straight from DShow* ╭∩╮(︶︿︶)╭∩╮
The code in this article is based on FFmpeg 4.1.
- Below is an outline of playback using the resampler; the individual function bodies and resource cleanup are omitted
audio_stream_idx = open_input_file(audio_file, AVMEDIA_TYPE_AUDIO, &fmt_ctx, &dec_ctx);
hr = init_resampler(dec_ctx, OUT_CHANNELS, OUT_SAMPLE_FMT, OUT_SAMPLE_RATE, &resample_ctx);
hr = init_fifo(&g_fifo, OUT_SAMPLE_FMT, OUT_CHANNELS);
hr = sdl_helper.init(sdl_fill_audio);
hr = sdl_helper.run();
g_out_audio_info.channels = OUT_CHANNELS;
g_out_audio_info.channel_layout = av_get_default_channel_layout(OUT_CHANNELS);
g_out_audio_info.sample_format = OUT_SAMPLE_FMT;
g_out_audio_info.sample_rate = OUT_SAMPLE_RATE;
g_out_audio_info.frame_size = OUT_FRAME_SIZE;
while (_kbhit() == 0) {
    hr = audio_process(fmt_ctx, dec_ctx, &g_out_audio_info, g_fifo,
                       resample_ctx, audio_stream_idx, &g_finished, NULL, on_audio_proc);
    if (g_finished)
        break;
}
- Below is an outline of playback using a filter; the individual function bodies and resource cleanup are omitted
audio_stream_idx = open_input_file(audio_file, AVMEDIA_TYPE_AUDIO, &fmt_ctx, &dec_ctx);
sprintf_s(filter_descr, "aresample=%d,aformat=sample_fmts=s16:channel_layouts=%s",
    OUT_SAMPLE_RATE, OUT_CHANNELS == 1 ? "mono" : "stereo");
hr = aud_filter.init(dec_ctx, fmt_ctx->streams[audio_stream_idx]->time_base, filter_descr);
hr = init_fifo(&g_fifo, OUT_SAMPLE_FMT, OUT_CHANNELS);
hr = sdl_helper.init(sdl_fill_audio);
hr = sdl_helper.run();
g_out_audio_info.channels = OUT_CHANNELS;
g_out_audio_info.channel_layout = av_get_default_channel_layout(OUT_CHANNELS);
g_out_audio_info.sample_format = OUT_SAMPLE_FMT;
g_out_audio_info.sample_rate = OUT_SAMPLE_RATE;
g_out_audio_info.frame_size = OUT_FRAME_SIZE;
while (_kbhit() == 0) {
    std::vector<AVFrame*> frames;
    hr = decode_av_frame(fmt_ctx, dec_ctx, audio_stream_idx, frames, &g_finished);
    if (SUCCEEDED(hr)) {
        for (size_t i = 0; i < frames.size(); ++i)
            hr = aud_filter.do_filter(frames[i], on_audio_filtered);
    }
    if (g_finished)
        break;
}
The open_input_file Function
Opens the input file and initializes the decoder.
int open_input_file(
    const char *file_name,
    AVMediaType stream_type,
    AVFormatContext **in_fmt_ctx,
    AVCodecContext **dec_ctx)
{
    AVCodecContext *avctx = NULL;
    AVCodec *decoder = NULL;
    int hr = avformat_open_input(in_fmt_ctx, file_name, NULL, NULL);
    hr = avformat_find_stream_info(*in_fmt_ctx, NULL);
    // find the first stream of the requested type
    int stream_index = -1;
    for (unsigned int i = 0; i < (*in_fmt_ctx)->nb_streams; ++i) {
        if ((*in_fmt_ctx)->streams[i]->codecpar->codec_type == stream_type) {
            stream_index = i;
            break;
        }
    }
    AVStream* stream = (*in_fmt_ctx)->streams[stream_index];
    decoder = avcodec_find_decoder(stream->codecpar->codec_id);
    avctx = avcodec_alloc_context3(decoder);
    hr = avcodec_parameters_to_context(avctx, stream->codecpar);
    if (stream_type == AVMEDIA_TYPE_VIDEO)
        avctx->framerate = av_guess_frame_rate(*in_fmt_ctx, stream, NULL);
    hr = avcodec_open2(avctx, decoder, NULL);
    *dec_ctx = avctx;
    return stream_index;
}
The Audio_filter::init Function
Initializes and wires up the filter graph.
int Audio_filter::init(AVCodecContext *dec_ctx, AVRational stream_time_base, const char *filters_descr)
{
    int hr = -1;
    const AVFilter *abuffersrc = avfilter_get_by_name("abuffer");
    const AVFilter *abuffersink = avfilter_get_by_name("abuffersink");
    const enum AVSampleFormat out_sample_fmts[] = { AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_NONE };
    const int64_t out_channel_layouts[] = { AV_CH_LAYOUT_MONO, AV_CH_LAYOUT_STEREO, -1 };
    const int out_sample_rates[] = { 16000, 22050, 44100, 48000, -1 };
    AVFilterInOut *outputs = avfilter_inout_alloc();
    AVFilterInOut *inputs = avfilter_inout_alloc();
    m_filter_graph = avfilter_graph_alloc();
    if (!dec_ctx->channel_layout)
        dec_ctx->channel_layout = av_get_default_channel_layout(dec_ctx->channels);
    char args[512];
    _snprintf_s(args, sizeof(args), "time_base=%d/%d:sample_rate=%d:sample_fmt=%s:channel_layout=0x%" PRIx64,
        stream_time_base.num, stream_time_base.den, dec_ctx->sample_rate,
        av_get_sample_fmt_name(dec_ctx->sample_fmt), dec_ctx->channel_layout);
    hr = avfilter_graph_create_filter(&m_buffersrc_ctx, abuffersrc, "in", args, NULL, m_filter_graph);
    hr = avfilter_graph_create_filter(&m_buffersink_ctx, abuffersink, "out", NULL, NULL, m_filter_graph);
    hr = av_opt_set_int_list(m_buffersink_ctx, "sample_fmts", out_sample_fmts, -1, AV_OPT_SEARCH_CHILDREN);
    hr = av_opt_set_int_list(m_buffersink_ctx, "channel_layouts", out_channel_layouts, -1, AV_OPT_SEARCH_CHILDREN);
    hr = av_opt_set_int_list(m_buffersink_ctx, "sample_rates", out_sample_rates, -1, AV_OPT_SEARCH_CHILDREN);
    // the graph's "in" is fed by the buffer source, its "out" drains into the sink
    outputs->name = av_strdup("in");
    outputs->filter_ctx = m_buffersrc_ctx;
    outputs->pad_idx = 0;
    outputs->next = NULL;
    inputs->name = av_strdup("out");
    inputs->filter_ctx = m_buffersink_ctx;
    inputs->pad_idx = 0;
    inputs->next = NULL;
    hr = avfilter_graph_parse_ptr(m_filter_graph, filters_descr, &inputs, &outputs, NULL);
    hr = avfilter_graph_config(m_filter_graph, NULL);
    m_filt_frame = av_frame_alloc();
    hr = 0;
RESOURCE_FREE:
    avfilter_inout_free(&inputs);
    avfilter_inout_free(&outputs);
    return hr;
}
The SDL_audio_helper::init and run Functions
These mainly set up the basic audio parameters: sample rate, format, channels, frame size, and so on.
int SDL_audio_helper::init(SDL_AudioCallback get_aud_frame_callback)
{
    int hr = -1;
    hr = SDL_Init(SDL_INIT_AUDIO | SDL_INIT_TIMER);
    RETURN_IF_FAILED(hr);
    SDL_AudioSpec wanted_spec = {0};
    wanted_spec.freq = OUT_SAMPLE_RATE;
    wanted_spec.format = AUDIO_S16SYS;
    wanted_spec.channels = OUT_CHANNELS;
    wanted_spec.samples = OUT_FRAME_SIZE;
    wanted_spec.callback = get_aud_frame_callback;
    wanted_spec.userdata = NULL;
    hr = SDL_OpenAudio(&wanted_spec, NULL);
    RETURN_IF_FAILED(hr);
    return 0;
}

int SDL_audio_helper::run()
{
    SDL_PauseAudio(0); // start audio playback
    return 0;
}
The audio_process Function
Dense parameter list ahead; proceed with care.
typedef int (*pf_proc_audio_callback)(void* context, AVFrame *frame);

static int audio_process(
    AVFormatContext* in_fmt_ctx,
    AVCodecContext* dec_ctx,
    audio_base_info* out_audio_info,
    AVAudioFifo* fifo,
    SwrContext* resample_ctx,
    int audio_stream_index,
    int* finished,
    void* callback_ctx,
    pf_proc_audio_callback callback_proc)
{
    int hr = -1;
    hr = decode_a_frame(in_fmt_ctx, dec_ctx, out_audio_info, fifo, resample_ctx, audio_stream_index, finished);
    RETURN_IF_FAILED(hr);
    // drain the FIFO: whole frames while enough data is buffered, plus any
    // remainder once decoding has finished
    while ((av_audio_fifo_size(fifo) >= out_audio_info->frame_size)
        || (*finished && av_audio_fifo_size(fifo) > 0))
    {
        AVFrame *output_frame = NULL;
        hr = read_samples_from_fifo(fifo, out_audio_info, &output_frame);
        RETURN_IF_FAILED(hr);
        hr = callback_proc(callback_ctx, output_frame);
        av_frame_free(&output_frame);
        RETURN_IF_FAILED(hr);
    }
    return 0;
}
The decode_a_frame Function
See here — it is exactly the same.
The Audio_filter::do_filter Function
The filter in this article really only does resampling, but the function below works with any filter.
typedef int (*pf_filter_callback)(AVFrame *frame);

int Audio_filter::do_filter(AVFrame* frame, pf_filter_callback callback_proc)
{
    RETURN_IF_NULL(frame);
    int hr = -1;
    /* push the audio data from the decoded frame into the filter graph */
    hr = av_buffersrc_add_frame_flags(m_buffersrc_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
    RETURN_IF_FAILED(hr);
    /* pull filtered audio from the filter graph */
    while (true) {
        hr = av_buffersink_get_frame(m_buffersink_ctx, m_filt_frame);
        if (hr == AVERROR(EAGAIN) || hr == AVERROR_EOF)
            break; // the graph is drained for now; not an error
        RETURN_IF_FAILED(hr);
        if (NULL != callback_proc)
            callback_proc(m_filt_frame);
        av_frame_unref(m_filt_frame);
    }
    return 0;
}
The on_audio_filtered Callback
Invoked for each frame coming out of the filter graph. The FIFO wins again; nothing gets around it ( ●-● )
int on_audio_filtered(AVFrame* frame)
{
    int hr = -1;
    hr = add_samples_to_fifo(g_fifo, frame->data, frame->nb_samples);
    RETURN_IF_FAILED(hr);
    while (av_audio_fifo_size(g_fifo) >= OUT_FRAME_SIZE ||
        (g_finished && av_audio_fifo_size(g_fifo) > 0))
    {
        AVFrame *output_frame = NULL;
        hr = read_samples_from_fifo(g_fifo, &g_out_audio_info, &output_frame);
        RETURN_IF_FAILED(hr);
        hr = on_audio_proc(NULL, output_frame);
        av_frame_free(&output_frame);
        RETURN_IF_FAILED(hr);
    }
    return 0;
}
The on_audio_proc and sdl_fill_audio Callbacks
Audio data: where do I belong? (?_?)
Uint8 *g_audio_chunk = NULL;
Uint8 *g_audio_pos = NULL;
Uint32 g_audio_len = 0;

// callback for decoded and swr-converted samples
int on_audio_proc(void* context, AVFrame* frame)
{
    g_audio_chunk = *(frame->data);
    g_audio_pos = g_audio_chunk;
    g_audio_len = frame->nb_samples * OUT_CHANNELS * sizeof(short);
    while (g_audio_len > 0) // busy-wait until the SDL callback has consumed all the data
        SDL_Delay(1);
    return 0;
}

// callback for the SDL audio playback engine
void sdl_fill_audio(void *user_data, Uint8 *stream, int len)
{
    if (g_audio_len == 0) // only play if we have data left
        return;
    SDL_memset(stream, 0, len); // noise will occur without this step
    Uint32 out_len = ((Uint32)len > g_audio_len ? g_audio_len : (Uint32)len); // mix as much data as possible
    SDL_MixAudio(stream, g_audio_pos, out_len, SDL_MIX_MAXVOLUME);
    g_audio_pos += out_len;
    g_audio_len -= out_len;
}
– EOF –