Audio Playback via FFmpeg

On Windows, audio playback is commonly implemented in the following ways:

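Introduction to FFmpeg

FFmpeg is an open-source suite of programs for recording and converting digital audio and video and for turning them into streams. It is licensed under the LGPL or the GPL and offers a complete solution for recording, converting, and streaming audio and video. It includes libavcodec, a very advanced audio/video codec library; to ensure high portability and codec quality, much of the code in libavcodec was written from scratch.

FFmpeg is developed on Linux, but it can also be compiled and run on other operating systems, including Windows and Mac OS X. The project was started by Fabrice Bellard, and from 2004 to 2015 it was maintained mainly by Michael Niedermayer. Many FFmpeg developers come from the MPlayer project, and FFmpeg is also hosted on the MPlayer project's servers. The project's name comes from the MPEG video coding standard; the leading "FF" stands for "Fast Forward".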

Playing Audio from the FFmpeg Command Line

FFmpeg ships with a ready-made program, ffplay, that plays audio from the command line.

>ffplay.exe "D:\You're Beautiful.mp3"
ffplay version 3.3.3 Copyright (c) 2003-2017 the FFmpeg developers
Input #0, mp3, from 'D:\You're Beautiful.mp3':
    … …
    artist          : James Blunt
    title           : You're Beautiful
    date            : 2006
  Duration: 00:03:24.30, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
      encoder       : LAME3.96r
  10.58 M-A:  0.000 fd=0 aq=27KB vq=0KB sq=0B f=0/0 ← status of the audio being played, updated in real time

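Playing Audio with FFmpeg + SDL

SDL (Simple DirectMedia Layer) is an open-source, cross-platform multimedia development library written in C. SDL provides functions for controlling graphics, sound, and input, so developers can build applications for several platforms (Linux, Windows, Mac OS X, and others) from the same or nearly the same code. Today SDL is mostly used for multimedia applications such as games, emulators, and media players.

We will use FFmpeg to decode the audio and SDL to play it.

Before playing, you can run ffmpeg -formats in a console to list the supported audio/video container formats (muxers / demuxers):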

> ffmpeg -formats
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20181017
  libavformat    58. 20.100 / 58. 20.100
  ... ...
File formats:
 D. = Demuxing supported
 .E = Muxing supported
 D  3dostr          3DO STR
  E 3g2             3GP2 (3GPP2 file format)
  E 3gp             3GP (3GPP file format)
 D  4xm             4X Technologies
  E a64             a64 - video for Commodore 64
 D  aa              Audible AA format files
 D  aac             raw ADTS AAC (Advanced Audio Coding)
 DE ac3             raw AC-3
 D  acm             Interplay ACM
 D  act             ACT Voice file format
 D  adf             Artworx Data Format
 D  adp             ADP
 D  ads             Sony PS2 ADS
  E adts            ADTS AAC (Advanced Audio Coding)
 DE adx             CRI ADX
 ... ...

Run ffmpeg -codecs to list the supported codecs:

> ffmpeg.exe -codecs
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20181017
  libavcodec     58. 35.100 / 58. 35.100
  ... ...
 D..... = Decoding supported
 .E.... = Encoding supported
 ..V... = Video codec
 ..A... = Audio codec
 ..S... = Subtitle codec
 ...I.. = Intra frame-only codec
 ....L. = Lossy compression
 .....S = Lossless compression
 D.VI.S 012v                 Uncompressed 4:2:2 10-bit
 D.V.L. 4xm                  4X Movie
 D.VI.S 8bps                 QuickTime 8BPS video
 .EVIL. a64_multi            Multicolor charset for Commodore 64 (encoders: a64multi )
 .EVIL. a64_multi5           Multicolor charset for Commodore 64, extended with 5th color (colram) (encoders: a64multi5 )
 D.V..S aasc                 Autodesk RLE
 D.VIL. aic                  Apple Intermediate Codec
 DEVI.S alias_pix            Alias/Wavefront PIX image
 DEVIL. amv                  AMV Video
 D.V.L. anm                  Deluxe Paint Animation
 D.V.L. ansi                 ASCII/ANSI art
 DEV..S apng                 APNG (Animated Portable Network Graphics) image
 DEVIL. asv1                 ASUS V1
 DEVIL. asv2                 ASUS V2
... ...


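Audio playback requires the decoded audio format to match the format expected by the renderer. The conversion between the two is resampling, which was briefly introduced in the post "Audio Capture via FFmpeg". FFmpeg offers two ways to resample: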

  1. Use a resampler (SwrContext)
  2. Use a filter (AVFilterGraph)
    *Shamelessly borrowing the name "Filter Graph" from DShow* ╭∩╮(︶︿︶)╭∩╮
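
To make "matching the formats" a bit more concrete: the ffplay output earlier reports the MP3 as s16p (planar 16-bit), while a sound device typically wants interleaved 16-bit samples. The sketch below shows only that layout change in plain C++. The helper name interleave_s16 is hypothetical and is not an FFmpeg API; a real resampler also handles sample-rate and channel-count changes, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an FFmpeg function): converts planar samples,
// where planes[c][i] is sample i of channel c, into the interleaved order
// L0 R0 L1 R1 ... that a packed format such as s16 uses.
std::vector<std::int16_t> interleave_s16(
    const std::vector<std::vector<std::int16_t>>& planes)
{
    if (planes.empty())
        return {};
    const std::size_t channels = planes.size();
    const std::size_t nb_samples = planes[0].size();
    std::vector<std::int16_t> packed(nb_samples * channels);
    for (std::size_t c = 0; c < channels; ++c) {
        assert(planes[c].size() == nb_samples); // every plane has the same length
        for (std::size_t i = 0; i < nb_samples; ++i)
            packed[i * channels + c] = planes[c][i];
    }
    return packed;
}
```

With two planes {1, 2, 3} and {10, 20, 30}, the result is 1, 10, 2, 20, 3, 30. Both the SwrContext and the filter approach perform this kind of conversion internally when the input is planar and the requested output is packed.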

The code in this post is based on FFmpeg 4.1.

  1. Below is the outline code for playback with the resampler. The implementation of each function and the resource cleanup are omitted.
    audio_stream_idx = open_input_file(audio_file, AVMEDIA_TYPE_AUDIO, &fmt_ctx, &dec_ctx);
    hr = init_resampler(dec_ctx, OUT_CHANNELS, OUT_SAMPLE_FMT, OUT_SAMPLE_RATE, &resample_ctx);
    hr = init_fifo(&g_fifo, OUT_SAMPLE_FMT, OUT_CHANNELS);
    hr = sdl_helper.init(sdl_fill_audio);
    hr = sdl_helper.run();
    // Describe the output format handed to the FIFO reader.
    g_out_audio_info.channels = OUT_CHANNELS;
    g_out_audio_info.channel_layout = av_get_default_channel_layout(OUT_CHANNELS);
    g_out_audio_info.sample_format = OUT_SAMPLE_FMT;
    g_out_audio_info.sample_rate = OUT_SAMPLE_RATE;
    g_out_audio_info.frame_size = OUT_FRAME_SIZE;
    while (_kbhit() == 0) { // run until a key is pressed
        hr = audio_process(fmt_ctx, dec_ctx, &g_out_audio_info, g_fifo,
                resample_ctx, audio_stream_idx, &g_finished, NULL, on_audio_proc);
        if (g_finished)
            break; // end of input reached
    }
  2. Below is the outline code for playback with a filter. The implementation of each function and the resource cleanup are omitted.
    audio_stream_idx = open_input_file(audio_file, AVMEDIA_TYPE_AUDIO, &fmt_ctx, &dec_ctx);
    // Filter chain: resample, then force the sample format and channel layout.
    sprintf_s(filter_descr, "aresample=%d,aformat=sample_fmts=s16:channel_layouts=%s",
        OUT_SAMPLE_RATE, OUT_CHANNELS == 1 ? "mono" : "stereo");
    hr = aud_filter.init(dec_ctx, fmt_ctx->streams[audio_stream_idx]->time_base, filter_descr);
    hr = init_fifo(&g_fifo, OUT_SAMPLE_FMT, OUT_CHANNELS);
    hr = sdl_helper.init(sdl_fill_audio);
    hr = sdl_helper.run();
    g_out_audio_info.channels = OUT_CHANNELS;
    g_out_audio_info.channel_layout = av_get_default_channel_layout(OUT_CHANNELS);
    g_out_audio_info.sample_format = OUT_SAMPLE_FMT;
    g_out_audio_info.sample_rate = OUT_SAMPLE_RATE;
    g_out_audio_info.frame_size = OUT_FRAME_SIZE;
    while (_kbhit() == 0) { // run until a key is pressed
        std::vector<AVFrame*> frames;
        hr = decode_av_frame(fmt_ctx, dec_ctx, audio_stream_idx, frames, &g_finished);
        if (SUCCEEDED(hr)) {
            for (size_t i = 0; i < frames.size(); ++i)
                hr = aud_filter.do_filter(frames[i], on_audio_filtered);
        }
        if (g_finished)
            break; // end of input reached
    }

The open_input_file function


int open_input_file(
    const char *file_name,
    AVMediaType stream_type,
    AVFormatContext **in_fmt_ctx,
    AVCodecContext **dec_ctx)
{
    AVCodecContext *avctx = NULL;
    AVCodec *decoder = NULL;
    // Note: this listing does not check hr after each call.
    int hr = avformat_open_input(in_fmt_ctx, file_name, NULL, NULL);
    hr = avformat_find_stream_info(*in_fmt_ctx, NULL);
    // Pick the first stream of the requested media type.
    int stream_index = -1;
    for (unsigned int i = 0; i < (*in_fmt_ctx)->nb_streams; ++i) {
        if ((*in_fmt_ctx)->streams[i]->codecpar->codec_type == stream_type) {
            stream_index = i;
            break;
        }
    }
    if (stream_index < 0)
        return AVERROR_STREAM_NOT_FOUND; // avoid indexing streams[-1]
    // Create a decoder context from the stream parameters and open it.
    AVStream* stream = (*in_fmt_ctx)->streams[stream_index];
    decoder = avcodec_find_decoder(stream->codecpar->codec_id);
    avctx = avcodec_alloc_context3(decoder);
    hr = avcodec_parameters_to_context(avctx, stream->codecpar);
    if (stream_type == AVMEDIA_TYPE_VIDEO)
        avctx->framerate = av_guess_frame_rate(*in_fmt_ctx, stream, NULL);
    hr = avcodec_open2(avctx, decoder, NULL);
    *dec_ctx = avctx;
    return stream_index;
}

The Audio_filter::init function

Initializes and connects the filter graph.

int Audio_filter::init(AVCodecContext *dec_ctx, AVRational stream_time_base, const char *filters_descr)
{
    int hr = -1;
    const AVFilter *abuffersrc  = avfilter_get_by_name("abuffer");
    const AVFilter *abuffersink = avfilter_get_by_name("abuffersink");
    // Formats the sink accepts; each list ends with its terminator value.
    const enum AVSampleFormat out_sample_fmts[] = { AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_NONE };
    const int64_t out_channel_layouts[] = { AV_CH_LAYOUT_MONO, AV_CH_LAYOUT_STEREO, -1 };
    const int out_sample_rates[] = { 16000, 22050, 44100, 48000, -1 };
    AVFilterInOut *outputs = avfilter_inout_alloc();
    AVFilterInOut *inputs  = avfilter_inout_alloc();
    m_filter_graph = avfilter_graph_alloc();
    if (!dec_ctx->channel_layout)
        dec_ctx->channel_layout = av_get_default_channel_layout(dec_ctx->channels);
    // Describe the decoded audio to the buffer source.
    // The space before PRIx64 is required in C++11 and later.
    char args[512];
    _snprintf_s(args, sizeof(args), "time_base=%d/%d:sample_rate=%d:sample_fmt=%s:channel_layout=0x%" PRIx64,
        stream_time_base.num, stream_time_base.den, dec_ctx->sample_rate,
        av_get_sample_fmt_name(dec_ctx->sample_fmt), dec_ctx->channel_layout);
    hr = avfilter_graph_create_filter(&m_buffersrc_ctx, abuffersrc, "in", args, NULL, m_filter_graph);
    hr = avfilter_graph_create_filter(&m_buffersink_ctx, abuffersink, "out", NULL, NULL, m_filter_graph);
    hr = av_opt_set_int_list(m_buffersink_ctx, "sample_fmts", out_sample_fmts, -1, AV_OPT_SEARCH_CHILDREN);
    hr = av_opt_set_int_list(m_buffersink_ctx, "channel_layouts", out_channel_layouts, -1, AV_OPT_SEARCH_CHILDREN);
    hr = av_opt_set_int_list(m_buffersink_ctx, "sample_rates", out_sample_rates, -1, AV_OPT_SEARCH_CHILDREN);

    // The output of the buffer source feeds the first filter in filters_descr,
    // whose unlabeled input pad is named "in" by default.
    outputs->name       = av_strdup("in");
    outputs->filter_ctx = m_buffersrc_ctx;
    outputs->pad_idx    = 0;
    outputs->next       = NULL;

    // The input of the buffer sink is fed by the last filter in filters_descr,
    // whose unlabeled output pad is named "out" by default.
    inputs->name       = av_strdup("out");
    inputs->filter_ctx = m_buffersink_ctx;
    inputs->pad_idx    = 0;
    inputs->next       = NULL;

    hr = avfilter_graph_parse_ptr(m_filter_graph, filters_descr, &inputs, &outputs, NULL);
    hr = avfilter_graph_config(m_filter_graph, NULL);

    avfilter_inout_free(&inputs);
    avfilter_inout_free(&outputs);

    m_filt_frame = av_frame_alloc();

    return hr; // result of avfilter_graph_config, >= 0 on success
}
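
The two strings that drive this graph are easy to get wrong, so here is a small standalone sketch that only builds them, with no FFmpeg calls, so you can print and inspect them. The helper names make_filter_descr and make_abuffer_args are hypothetical; the layouts of the strings follow the format strings used in this post.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the filter chain description passed to init().
std::string make_filter_descr(int out_sample_rate, int out_channels)
{
    return "aresample=" + std::to_string(out_sample_rate) +
           ",aformat=sample_fmts=s16:channel_layouts=" +
           (out_channels == 1 ? "mono" : "stereo");
}

// Hypothetical helper: builds the option string for the "abuffer" source.
std::string make_abuffer_args(int tb_num, int tb_den, int sample_rate,
                              const char* sample_fmt_name,
                              std::uint64_t channel_layout)
{
    assert(sample_fmt_name != nullptr);
    char args[512];
    // Note the space between the literal and PRIx64.
    std::snprintf(args, sizeof(args),
                  "time_base=%d/%d:sample_rate=%d:sample_fmt=%s:channel_layout=0x%" PRIx64,
                  tb_num, tb_den, sample_rate, sample_fmt_name, channel_layout);
    return args;
}
```

For a 44100 Hz stereo output, make_filter_descr(44100, 2) yields aresample=44100,aformat=sample_fmts=s16:channel_layouts=stereo. For a planar stereo input, make_abuffer_args(1, 14112000, 44100, "s16p", 0x3) yields time_base=1/14112000:sample_rate=44100:sample_fmt=s16p:channel_layout=0x3, where 0x3 is the stereo layout mask and the time base is only an example value.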

The SDL_audio_helper::init and run functions


    int SDL_audio_helper::init(SDL_AudioCallback get_aud_frame_callback)
    {
        int hr = -1;

        SDL_AudioSpec wanted_spec = {0};
        wanted_spec.freq = OUT_SAMPLE_RATE;      // samples per second
        wanted_spec.format = AUDIO_S16SYS;       // signed 16-bit, native byte order
        wanted_spec.channels = OUT_CHANNELS;
        wanted_spec.samples = OUT_FRAME_SIZE;    // buffer size in sample frames
        wanted_spec.callback = get_aud_frame_callback; // SDL calls this when it needs data
        wanted_spec.userdata = NULL;

        hr = SDL_OpenAudio(&wanted_spec, NULL);  // 0 on success, negative on failure
        return hr;
    }

    int SDL_audio_helper::run()
    {
        SDL_PauseAudio(0); // start audio playback
        return 0;
    }
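
The samples field decides how often SDL asks for data and how many bytes it asks for. The sketch below works that out in plain C++. The helper name sdl_buffer_info is hypothetical, and the numbers in the note (44100 Hz, stereo, 1024 sample frames) are assumed values, since this post does not state what OUT_SAMPLE_RATE, OUT_CHANNELS, and OUT_FRAME_SIZE are set to.

```cpp
#include <cassert>

// Size and duration of one SDL audio callback for 16-bit samples.
struct Sdl_buffer_info {
    int bytes_per_callback;   // the len argument passed to the callback
    double ms_per_callback;   // playback time covered by one buffer
};

// Hypothetical helper: freq in Hz, samples in sample frames per buffer.
Sdl_buffer_info sdl_buffer_info(int freq, int channels, int samples)
{
    assert(freq > 0 && channels > 0 && samples > 0);
    Sdl_buffer_info info;
    info.bytes_per_callback = samples * channels * static_cast<int>(sizeof(short));
    info.ms_per_callback = 1000.0 * samples / freq;
    return info;
}
```

Under those assumed values, each callback requests 4096 bytes and covers roughly 23 ms of audio, so the decoding side has to deliver one output frame about every 23 ms to avoid gaps.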

The audio_process function


typedef int (*pf_proc_audio_callback)(void* context, AVFrame *frame);

static int audio_process(
    AVFormatContext* in_fmt_ctx,
    AVCodecContext* dec_ctx,
    audio_base_info* out_audio_info,
    AVAudioFifo* fifo,
    SwrContext* resample_ctx,
    int audio_stream_index,
    int* finished,
    void* callback_ctx,
    pf_proc_audio_callback callback_proc)
{
    int hr = -1;
    // Decode one frame, resample it, and append the samples to the FIFO.
    hr = decode_a_frame(in_fmt_ctx, dec_ctx, out_audio_info, fifo, resample_ctx, audio_stream_index, finished);

    // Hand out full output frames; once the input is finished, also flush the remainder.
    while ((av_audio_fifo_size(fifo) >= out_audio_info->frame_size)
          || (*finished && av_audio_fifo_size(fifo) > 0)) {
        AVFrame *output_frame = NULL;
        hr = read_samples_from_fifo(fifo, out_audio_info, &output_frame);

        hr = callback_proc(callback_ctx, output_frame);
    }
    return 0;
}
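
The loop condition above is the heart of the FIFO logic: emit frames of exactly frame_size samples, and emit a final short frame only when the input has finished. Here is a minimal sketch of the same rule in plain C++, with a std::deque standing in for AVAudioFifo. The function name drain_fifo is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the FIFO drain loop: removes samples from the
// front of fifo and returns them grouped into frames of frame_size samples.
// A shorter last frame is produced only when finished is true.
std::vector<std::vector<short>> drain_fifo(std::deque<short>& fifo,
                                           std::size_t frame_size,
                                           bool finished)
{
    assert(frame_size > 0);
    std::vector<std::vector<short>> frames;
    while (fifo.size() >= frame_size || (finished && !fifo.empty())) {
        const std::ptrdiff_t n =
            static_cast<std::ptrdiff_t>(std::min(frame_size, fifo.size()));
        frames.emplace_back(fifo.begin(), fifo.begin() + n); // copy one frame
        fifo.erase(fifo.begin(), fifo.begin() + n);          // and consume it
    }
    return frames;
}
```

With 10 samples queued and a frame size of 4, a call with finished set to false returns two frames and leaves 2 samples queued; a later call with finished set to true returns those 2 samples as one short frame. This is why the decoder can produce frames of any size while the player always receives a fixed size.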

The decode_a_frame function

See here; it is exactly the same.

The Audio_filter::do_filter function

The filter in this post only does the resampling work, but the function below works for any filter.

typedef int (*pf_filter_callback)(AVFrame *frame);

int Audio_filter::do_filter(AVFrame* frame, pf_filter_callback callback_proc)
{
    int hr = -1;

    /* push the audio data from decoded frame into the filtergraph */
    hr = av_buffersrc_add_frame_flags(m_buffersrc_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);

    /* pull filtered audio from the filtergraph */
    while (true) {
        hr = av_buffersink_get_frame(m_buffersink_ctx, m_filt_frame);
        if (hr == AVERROR(EAGAIN) || hr == AVERROR_EOF)
            break; // no more filtered frames available for now
        if (hr < 0)
            break; // real error

        if (NULL != callback_proc)
            callback_proc(m_filt_frame);
        av_frame_unref(m_filt_frame); // release the data before reusing the frame
    }
    return hr;
}

The on_audio_filtered callback

This is the function called back after the filter graph has processed a frame. The FIFO wins again; nobody gets to bypass it ( ●-● )

int on_audio_filtered(AVFrame* frame)
{
    int hr = -1;

    // Queue the filtered samples.
    hr = add_samples_to_fifo(g_fifo, frame->data, frame->nb_samples);

    // Hand out full output frames; once the input is finished, also flush the remainder.
    while (av_audio_fifo_size(g_fifo) >= OUT_FRAME_SIZE ||
        (g_finished && av_audio_fifo_size(g_fifo) > 0)) {
        AVFrame *output_frame = NULL;
        hr = read_samples_from_fifo(g_fifo, &g_out_audio_info, &output_frame);

        hr = on_audio_proc(NULL, output_frame);
    }

    return 0;
}

The on_audio_proc and sdl_fill_audio callbacks

Audio data: where do I finally belong? (?_?)

Uint8 *g_audio_chunk = NULL; // start of the chunk currently being played
Uint8 *g_audio_pos = NULL;   // next byte to hand to SDL
Uint32 g_audio_len = 0;      // bytes of the chunk not yet played

// callback for decoded and swr converted samples
int on_audio_proc(void* context, AVFrame* frame)
{
    g_audio_chunk = *(frame->data);
    g_audio_pos = g_audio_chunk;
    g_audio_len = frame->nb_samples * OUT_CHANNELS * sizeof(short);
    while (g_audio_len > 0) // wait until all the data is consumed
        SDL_Delay(1);
    return 0;
}

// callback for SDL audio playback engine
void sdl_fill_audio(void *user_data, Uint8 *stream, int len)
{
    SDL_memset(stream, 0, len); // silence first; noise will occur without this step
    if (g_audio_len == 0) // only play if we have data left
        return;
    Uint32 out_len = ((Uint32)len > g_audio_len ? g_audio_len : (Uint32)len); // mix as much data as possible
    SDL_MixAudio(stream, g_audio_pos, out_len, SDL_MIX_MAXVOLUME);
    g_audio_pos += out_len;
    g_audio_len -= out_len;
}
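
The handoff between the two callbacks can be tried out without SDL. The sketch below mirrors the callback's logic in plain C++: silence the device buffer, then copy as much of the pending chunk as fits and advance the read position. The names Pending_chunk and fill_stream are hypothetical, and a plain memcpy stands in for SDL_MixAudio, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the g_audio_pos / g_audio_len globals.
struct Pending_chunk {
    const std::uint8_t* pos; // next byte to play
    std::uint32_t len;       // bytes not yet played
};

// Fills stream with up to len bytes from chunk and returns the bytes copied.
// Whatever part of stream is not covered by data stays silent (zero).
std::uint32_t fill_stream(Pending_chunk& chunk, std::uint8_t* stream, int len)
{
    assert(len >= 0);
    std::memset(stream, 0, static_cast<std::size_t>(len)); // silence first
    if (chunk.len == 0)
        return 0; // nothing pending, output stays silent
    const std::uint32_t out_len =
        std::min(static_cast<std::uint32_t>(len), chunk.len);
    std::memcpy(stream, chunk.pos, out_len); // stand-in for SDL_MixAudio
    chunk.pos += out_len;
    chunk.len -= out_len;
    return out_len;
}
```

With a 6-byte chunk and a 4-byte device buffer, the first call copies 4 bytes, the second copies the remaining 2 and leaves the rest of the buffer silent, and the third copies nothing. In the real program the producer side waits until the remaining length drops to zero before publishing the next frame.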

