Audio Playback via FFmpeg

Playback, as used here, means reading data from an audio file, decoding it, and sending the decoded output to the speakers. There are several common ways to implement this on Windows; this post covers the FFmpeg-based approach.

FFmpeg Overview

FFmpeg is an open-source suite of programs for recording, converting, and streaming digital audio and video, released under the LGPL or GPL. It provides a complete solution for recording, converting, and streaming audio and video, and it includes the highly capable audio/video codec library libavcodec; to ensure portability and codec quality, much of the code in libavcodec was developed from scratch.

FFmpeg was developed on Linux, but it can be compiled and run on other operating systems as well, including Windows and Mac OS X. The project was originally started by Fabrice Bellard, and from 2004 to 2015 it was maintained mainly by Michael Niedermayer. Many FFmpeg developers came from the MPlayer project, and FFmpeg is currently hosted on the MPlayer project's servers. The project name comes from the MPEG video standard, with the leading "FF" standing for "Fast Forward".

Playing Audio from the FFmpeg Command Line

FFmpeg ships with a ready-made program, ffplay, that plays audio directly from the command line.

> ffplay.exe "D:\You're Beautiful.mp3"
ffplay version 3.3.3 Copyright (c) 2003-2017 the FFmpeg developers
Input #0, mp3, from 'D:\You're Beautiful.mp3':
  Metadata:
    … …
    artist          : James Blunt
    title           : You're Beautiful
    date            : 2006
  Duration: 00:03:24.30, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
    Metadata:
      encoder       : LAME3.96r
  10.58 M-A:  0.000 fd=0 aq=27KB vq=0KB sq=0B f=0/0 ← status of the audio being played; it updates in real time

Playing Audio with FFmpeg + SDL

SDL (Simple DirectMedia Layer) is an open-source, cross-platform multimedia development library written in C. It provides functions for controlling graphics, audio, and input, so developers can build applications that run on multiple platforms (Linux, Windows, Mac OS X, and so on) from the same or very similar code. Today SDL is mostly used for games, emulators, media players, and other multimedia applications.

We will use FFmpeg to decode the audio and SDL to play it.

Before playing, you can run ffmpeg -formats in a console to see the supported container formats (muxers/demuxers):

> ffmpeg -formats
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20181017
  libavformat    58. 20.100 / 58. 20.100
  ... ...
File formats:
 D. = Demuxing supported
 .E = Muxing supported
 --
 D  3dostr          3DO STR
  E 3g2             3GP2 (3GPP2 file format)
  E 3gp             3GP (3GPP file format)
 D  4xm             4X Technologies
  E a64             a64 - video for Commodore 64
 D  aa              Audible AA format files
 D  aac             raw ADTS AAC (Advanced Audio Coding)
 DE ac3             raw AC-3
 D  acm             Interplay ACM
 D  act             ACT Voice file format
 D  adf             Artworx Data Format
 D  adp             ADP
 D  ads             Sony PS2 ADS
  E adts            ADTS AAC (Advanced Audio Coding)
 DE adx             CRI ADX
 ... ...

Run ffmpeg -codecs to see the supported codecs:

> ffmpeg.exe -codecs
ffmpeg version 4.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20181017
  libavcodec     58. 35.100 / 58. 35.100
  ... ...
Codecs:
 D..... = Decoding supported
 .E.... = Encoding supported
 ..V... = Video codec
 ..A... = Audio codec
 ..S... = Subtitle codec
 ...I.. = Intra frame-only codec
 ....L. = Lossy compression
 .....S = Lossless compression
 -------
 D.VI.S 012v                 Uncompressed 4:2:2 10-bit
 D.V.L. 4xm                  4X Movie
 D.VI.S 8bps                 QuickTime 8BPS video
 .EVIL. a64_multi            Multicolor charset for Commodore 64 (encoders: a64multi )
 .EVIL. a64_multi5           Multicolor charset for Commodore 64, extended with 5th color (colram) (encoders: a64multi5 )
 D.V..S aasc                 Autodesk RLE
 D.VIL. aic                  Apple Intermediate Codec
 DEVI.S alias_pix            Alias/Wavefront PIX image
 DEVIL. amv                  AMV Video
 D.V.L. anm                  Deluxe Paint Animation
 D.V.L. ansi                 ASCII/ANSI art
 DEV..S apng                 APNG (Animated Portable Network Graphics) image
 DEVIL. asv1                 ASUS V1
 DEVIL. asv2                 ASUS V2
... ...

Playback Flow

[Figure: audio playback flow via FFmpeg]

Playback Code

To play audio, the decoded audio format has to be matched to the format the renderer expects; that conversion is resampling, which was briefly covered in the earlier post 采集音频 via FFmpeg (Audio Capture via FFmpeg). In FFmpeg there are two ways to resample:

  1. with the resampler (SwrContext)
  2. with a filter (AVFilterGraph)
    *borrowing the Filter Graph idea from DShow* ╭∩╮(︶︿︶)╭∩╮

The code in this post is based on FFmpeg 4.1.

  1. Below is the outline of the resampler-based playback path, omitting the bodies of the individual functions and the resource cleanup (sketches of init_resampler and init_fifo follow after this list):
    audio_stream_idx = open_input_file(audio_file, AVMEDIA_TYPE_AUDIO, &fmt_ctx, &dec_ctx);
    hr = init_resampler(dec_ctx, OUT_CHANNELS, OUT_SAMPLE_FMT, OUT_SAMPLE_RATE, &resample_ctx);
    hr = init_fifo(&g_fifo, OUT_SAMPLE_FMT, OUT_CHANNELS);
    
    hr = sdl_helper.init(sdl_fill_audio);
    hr = sdl_helper.run();    
    
    g_out_audio_info.channels = OUT_CHANNELS;
    g_out_audio_info.channel_layout = av_get_default_channel_layout(OUT_CHANNELS);
    g_out_audio_info.sample_format = OUT_SAMPLE_FMT;
    g_out_audio_info.sample_rate = OUT_SAMPLE_RATE;
    g_out_audio_info.frame_size = OUT_FRAME_SIZE;
    
    while (_kbhit() == 0) {
        hr = audio_process(fmt_ctx, dec_ctx, &g_out_audio_info, g_fifo, 
                resample_ctx, audio_stream_idx, &g_finished, NULL, on_audio_proc);
        if (g_finished)
            break;
    }
  2. Below is the outline of the filter-based playback path, again omitting function bodies and cleanup (decode_av_frame is sketched further below, after the decode_a_frame section):
    audio_stream_idx = open_input_file(audio_file, AVMEDIA_TYPE_AUDIO, &fmt_ctx, &dec_ctx);
    
    sprintf_s(filter_descr, "aresample=%d,aformat=sample_fmts=s16:channel_layouts=%s", 
        OUT_SAMPLE_RATE, OUT_CHANNELS == 1 ? "mono" : "stereo");
    hr = aud_filter.init(dec_ctx, fmt_ctx->streams[audio_stream_idx]->time_base, filter_descr);
    
    hr = init_fifo(&g_fifo, OUT_SAMPLE_FMT, OUT_CHANNELS);
    
    hr = sdl_helper.init(sdl_fill_audio);
    hr = sdl_helper.run();    
    
    g_out_audio_info.channels = OUT_CHANNELS;
    g_out_audio_info.channel_layout = av_get_default_channel_layout(OUT_CHANNELS);
    g_out_audio_info.sample_format = OUT_SAMPLE_FMT;
    g_out_audio_info.sample_rate = OUT_SAMPLE_RATE;
    g_out_audio_info.frame_size = OUT_FRAME_SIZE;
    
    while (_kbhit() == 0) {
        std::vector<AVFrame*> frames;
        hr = decode_av_frame(fmt_ctx, dec_ctx, audio_stream_idx, frames, &g_finished);
        if (SUCCEEDED(hr)) {
            for (size_t i = 0; i < frames.size(); ++i)
                hr = aud_filter.do_filter(frames[i], on_audio_filtered);
        }
        
        if (g_finished)
            break;
    }
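
init_resampler and init_fifo are called in both outlines above but their bodies are left out. As a reference only, here is a minimal sketch of what they might look like, built on libswresample's SwrContext and libavutil's AVAudioFifo; the signatures follow the calls above, and the real implementation may differ:

// Sketch: build an SwrContext converting from the decoder's input format
// to the desired output channel count / sample format / sample rate.
static int init_resampler(AVCodecContext *dec_ctx, int out_channels,
                          enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                          SwrContext **resample_ctx)
{
    *resample_ctx = swr_alloc_set_opts(NULL,
        av_get_default_channel_layout(out_channels), out_sample_fmt, out_sample_rate,
        av_get_default_channel_layout(dec_ctx->channels), dec_ctx->sample_fmt,
        dec_ctx->sample_rate, 0, NULL);
    if (!*resample_ctx)
        return AVERROR(ENOMEM);
    return swr_init(*resample_ctx);
}

// Sketch: an AVAudioFifo that buffers converted samples; it grows on demand,
// so an initial capacity of 1 sample is enough.
static int init_fifo(AVAudioFifo **fifo, enum AVSampleFormat sample_fmt, int channels)
{
    *fifo = av_audio_fifo_alloc(sample_fmt, channels, 1);
    return *fifo ? 0 : AVERROR(ENOMEM);
}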

The open_input_file function

Opens the input file and initializes the decoder.

int open_input_file(
    const char *file_name,
    AVMediaType stream_type,
    AVFormatContext **in_fmt_ctx,
    AVCodecContext **dec_ctx)
{
    AVCodecContext *avctx = NULL;
    AVCodec *decoder = NULL;
    
    int hr = avformat_open_input(in_fmt_ctx, file_name, NULL, NULL);
    hr = avformat_find_stream_info(*in_fmt_ctx, NULL);    
    
    // find the first stream of the requested media type
    int stream_index = -1;
    for (unsigned int i = 0; i < (*in_fmt_ctx)->nb_streams; ++i) {
        if ((*in_fmt_ctx)->streams[i]->codecpar->codec_type == stream_type) {
            stream_index = i;
            break;
        }
    }
    if (stream_index < 0)   // no stream of that type in the file
        return -1;
    
    AVStream* stream = (*in_fmt_ctx)->streams[stream_index];
    decoder = avcodec_find_decoder(stream->codecpar->codec_id);
    avctx = avcodec_alloc_context3(decoder);
    
    hr = avcodec_parameters_to_context(avctx, stream->codecpar);
    if (stream_type == AVMEDIA_TYPE_VIDEO)
        avctx->framerate = av_guess_frame_rate(*in_fmt_ctx, stream, NULL);
        
    hr = avcodec_open2(avctx, decoder, NULL); 
    *dec_ctx = avctx;
    return stream_index;
}
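
Resource release is omitted throughout the post. For completeness, the usual counterpart to what open_input_file allocates would be something like the following (not taken from the original code):

// Free the decoder context and close the input opened by open_input_file.
avcodec_free_context(&dec_ctx);
avformat_close_input(&fmt_ctx);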

The Audio_filter::init function

Initializes and wires up the filter graph.

int Audio_filter::init(AVCodecContext *dec_ctx, AVRational stream_time_base, const char *filters_descr)
{
    const AVFilter *abuffersrc  = avfilter_get_by_name("abuffer");
    const AVFilter *abuffersink = avfilter_get_by_name("abuffersink");
    const enum AVSampleFormat out_sample_fmts[] = { AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_NONE };
    const int64_t out_channel_layouts[] = { AV_CH_LAYOUT_MONO, AV_CH_LAYOUT_STEREO, -1 };
    const int out_sample_rates[] = { 16000, 22050, 44100, 48000, -1 };
    
    AVFilterInOut *outputs = avfilter_inout_alloc();
    AVFilterInOut *inputs  = avfilter_inout_alloc();
    m_filter_graph = avfilter_graph_alloc();
    
    if (!dec_ctx->channel_layout)
        dec_ctx->channel_layout = av_get_default_channel_layout(dec_ctx->channels);
        
    int hr = -1;
    char args[512];
    _snprintf_s(args, sizeof(args), "time_base=%d/%d:sample_rate=%d:sample_fmt=%s:channel_layout=0x%" PRIx64,
        stream_time_base.num, stream_time_base.den, dec_ctx->sample_rate,
        av_get_sample_fmt_name(dec_ctx->sample_fmt), dec_ctx->channel_layout);
        
    hr = avfilter_graph_create_filter(&m_buffersrc_ctx, abuffersrc, "in", args, NULL, m_filter_graph);
    hr = avfilter_graph_create_filter(&m_buffersink_ctx, abuffersink, "out", NULL, NULL, m_filter_graph);
    hr = av_opt_set_int_list(m_buffersink_ctx, "sample_fmts", out_sample_fmts, -1, AV_OPT_SEARCH_CHILDREN);
    hr = av_opt_set_int_list(m_buffersink_ctx, "channel_layouts", out_channel_layouts, -1, AV_OPT_SEARCH_CHILDREN);
    hr = av_opt_set_int_list(m_buffersink_ctx, "sample_rates", out_sample_rates, -1, AV_OPT_SEARCH_CHILDREN);
   
    outputs->name       = av_strdup("in");
    outputs->filter_ctx = m_buffersrc_ctx;
    outputs->pad_idx    = 0;
    outputs->next       = NULL;

    inputs->name       = av_strdup("out");
    inputs->filter_ctx = m_buffersink_ctx;
    inputs->pad_idx    = 0;
    inputs->next       = NULL;

    hr = avfilter_graph_parse_ptr(m_filter_graph, filters_descr, &inputs, &outputs, NULL);
    hr = avfilter_graph_config(m_filter_graph, NULL);

    m_filt_frame = av_frame_alloc();

    hr = 0;
RESOURCE_FREE:
    avfilter_inout_free(&inputs);
    avfilter_inout_free(&outputs);

    return hr;
}
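
The matching teardown is also omitted; assuming the class frees its resources in the destructor, it could look like this sketch:

// Sketch: release what Audio_filter::init allocated.
Audio_filter::~Audio_filter()
{
    avfilter_graph_free(&m_filter_graph); // also frees the buffersrc/buffersink contexts
    av_frame_free(&m_filt_frame);
}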

The SDL_audio_helper::init & run functions

These mainly set up the basic audio parameters: sample rate, sample format, channel count, frame size, and so on.

int SDL_audio_helper::init(SDL_AudioCallback get_aud_frame_callback)
{
    int hr = -1;
    hr = SDL_Init(SDL_INIT_AUDIO | SDL_INIT_TIMER);
    RETURN_IF_FAILED(hr);

    SDL_AudioSpec wanted_spec = {0};
    wanted_spec.freq = OUT_SAMPLE_RATE;
    wanted_spec.format = AUDIO_S16SYS;
    wanted_spec.channels = OUT_CHANNELS;
    wanted_spec.samples = OUT_FRAME_SIZE;
    wanted_spec.callback = get_aud_frame_callback;
    wanted_spec.userdata = NULL;

    hr = SDL_OpenAudio(&wanted_spec, NULL);
    RETURN_IF_FAILED(hr);
    return 0;
}

int SDL_audio_helper::run()
{
    SDL_PauseAudio(0); // start audio playback
    return 0;
}
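
Stopping playback and shutting SDL down is not shown in the post either; a minimal counterpart (the shutdown helper is a hypothetical name) might be:

// Sketch: pause the device, close it, and shut SDL down when playback is done.
int SDL_audio_helper::shutdown()
{
    SDL_PauseAudio(1); // stop invoking the audio callback
    SDL_CloseAudio();  // close the device opened by SDL_OpenAudio
    SDL_Quit();        // shut down all initialized SDL subsystems
    return 0;
}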

The audio_process function

Dense parameter list ahead; proceed with care.

typedef int (*pf_proc_audio_callback)(void* context, AVFrame *frame);

static int audio_process( 
    AVFormatContext* in_fmt_ctx,
    AVCodecContext* dec_ctx, 
    audio_base_info* out_audio_info,
    AVAudioFifo* fifo,
    SwrContext* resample_ctx,
    int audio_stream_index,
    int* finished,
    void* callback_ctx,
    pf_proc_audio_callback callback_proc)
{
    int hr = -1;
    hr = decode_a_frame(in_fmt_ctx, dec_ctx, out_audio_info, fifo, resample_ctx, audio_stream_index, finished);
    RETURN_IF_FAILED(hr);

    while ((av_audio_fifo_size(fifo) >= out_audio_info->frame_size)
          || (*finished && av_audio_fifo_size(fifo) > 0))
    {
        AVFrame *output_frame = NULL;
        hr = read_samples_from_fifo(fifo, out_audio_info, &output_frame);
        RETURN_IF_FAILED(hr);

        hr = callback_proc(callback_ctx, output_frame);
        av_frame_free(&output_frame);
        RETURN_IF_FAILED(hr);
    }    
    return 0;
}
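
read_samples_from_fifo is not shown in the post; a minimal sketch, assuming out_audio_info carries the fields set in the outline above, could be:

// Sketch: pull at most one output frame's worth of samples out of the FIFO.
static int read_samples_from_fifo(AVAudioFifo *fifo, audio_base_info *info, AVFrame **out)
{
    const int frame_size = FFMIN(av_audio_fifo_size(fifo), info->frame_size);

    AVFrame *frame = av_frame_alloc();
    if (!frame)
        return AVERROR(ENOMEM);

    frame->nb_samples     = frame_size;
    frame->channel_layout = info->channel_layout;
    frame->format         = info->sample_format;
    frame->sample_rate    = info->sample_rate;

    int hr = av_frame_get_buffer(frame, 0);
    if (hr < 0) {
        av_frame_free(&frame);
        return hr;
    }

    if (av_audio_fifo_read(fifo, (void **)frame->data, frame_size) < frame_size) {
        av_frame_free(&frame);
        return AVERROR(EIO);
    }
    *out = frame;
    return 0;
}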

The decode_a_frame function

See here; it is exactly the same.
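
decode_av_frame, used in the filter path above, is not shown in this post or the linked one. A minimal send/receive decoding sketch with the same signature, under the assumption that it reads one packet and returns every frame the decoder produces, might look like this:

// Sketch: read one packet from the input, feed it to the decoder, and collect
// all frames it produces. *finished is set once the demuxer reports EOF.
static int decode_av_frame(AVFormatContext *fmt_ctx, AVCodecContext *dec_ctx,
                           int audio_stream_idx, std::vector<AVFrame*> &frames, int *finished)
{
    AVPacket pkt;
    av_init_packet(&pkt);
    pkt.data = NULL;
    pkt.size = 0;

    int hr = av_read_frame(fmt_ctx, &pkt);
    if (hr == AVERROR_EOF) {
        *finished = 1;
        avcodec_send_packet(dec_ctx, NULL);          // flush the decoder
    } else if (hr >= 0) {
        if (pkt.stream_index != audio_stream_idx) {  // not the audio stream, skip it
            av_packet_unref(&pkt);
            return 0;
        }
        hr = avcodec_send_packet(dec_ctx, &pkt);
        av_packet_unref(&pkt);
        if (hr < 0)
            return hr;
    } else {
        return hr;
    }

    while (true) {
        AVFrame *frame = av_frame_alloc();
        hr = avcodec_receive_frame(dec_ctx, frame);
        if (hr < 0) {                                // EAGAIN/EOF: nothing more right now
            av_frame_free(&frame);
            break;
        }
        frames.push_back(frame);                     // ownership passes to the caller
    }
    return 0;
}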

The Audio_filter::do_filter function

The filter used in this post really just does resampling, but the function below works with any filter.

typedef int (*pf_filter_callback)(AVFrame *frame);

int Audio_filter::do_filter(AVFrame* frame, pf_filter_callback callback_proc)
{
    RETURN_IF_NULL(frame);
    int hr = -1;

    /* push the audio data from decoded frame into the filtergraph */
    hr = av_buffersrc_add_frame_flags(m_buffersrc_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
    RETURN_IF_FAILED(hr);

    /* pull filtered audio from the filtergraph */
    while (true) {
        hr = av_buffersink_get_frame(m_buffersink_ctx, m_filt_frame);
        if (hr == AVERROR(EAGAIN) || hr == AVERROR_EOF)
            break;
        RETURN_IF_FAILED(hr);

        if (NULL != callback_proc)
            callback_proc(m_filt_frame);
        av_frame_unref(m_filt_frame);
    }
    return 0; // AVERROR(EAGAIN)/AVERROR_EOF from the sink just means no more frames for now
}

The on_audio_filtered callback

This callback is invoked after the filter graph has processed a frame. The FIFO wins again; nothing gets around it ( ●-● )

int on_audio_filtered(AVFrame* frame)
{
    int hr = -1;

    hr = add_samples_to_fifo(g_fifo, frame->data, frame->nb_samples);
    RETURN_IF_FAILED(hr);

    while (av_audio_fifo_size(g_fifo) >= OUT_FRAME_SIZE ||
        (g_finished && av_audio_fifo_size(g_fifo) > 0)) 
    {
        AVFrame *output_frame = NULL;
        hr = read_samples_from_fifo(g_fifo, &g_out_audio_info, &output_frame);
        RETURN_IF_FAILED(hr);

        hr = on_audio_proc(NULL, output_frame);
        av_frame_free(&output_frame);
        RETURN_IF_FAILED(hr);
    }

    return 0;
}
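
add_samples_to_fifo is another helper whose body the post omits; a sketch in the spirit of FFmpeg's transcode_aac example:

// Sketch: grow the FIFO to hold the new samples, then append them.
static int add_samples_to_fifo(AVAudioFifo *fifo, uint8_t **data, int nb_samples)
{
    int hr = av_audio_fifo_realloc(fifo, av_audio_fifo_size(fifo) + nb_samples);
    if (hr < 0)
        return hr;
    if (av_audio_fifo_write(fifo, (void **)data, nb_samples) < nb_samples)
        return AVERROR(EIO);
    return 0;
}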

The on_audio_proc & sdl_fill_audio callbacks

Audio data: where do I finally belong? (?_?)

Uint8 *g_audio_chunk = NULL; 
Uint8 *g_audio_pos = NULL;
Uint32 g_audio_len = 0; 

// callback for decoded and swr converted samples
int on_audio_proc(void* context, AVFrame* frame)
{
    g_audio_chunk = *(frame->data); 
    g_audio_pos = g_audio_chunk;
    g_audio_len = frame->nb_samples * OUT_CHANNELS * sizeof(short);
    
    while (g_audio_len > 0) // wait until all the data has been consumed
        SDL_Delay(1); 
    return 0;
} 

// callback for SDL audio playback engine
void sdl_fill_audio(void *user_data, Uint8 *stream, int len)
{ 
    if (g_audio_len == 0)// Only play if we have data left 
        return; 
        
    SDL_memset(stream, 0, len); // Noise will occur without this step
    Uint32 out_len = (len > g_audio_len ? g_audio_len : len);// Mix as much data as possible
    SDL_MixAudio(stream, g_audio_pos, out_len, SDL_MIX_MAXVOLUME);
    
    g_audio_pos += out_len; 
    g_audio_len -= out_len; 
}
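
One caveat the post does not mention: g_audio_pos and g_audio_len are touched by both the main thread (in on_audio_proc) and SDL's audio thread (in sdl_fill_audio). A more cautious variant would serialize access with SDL's audio lock; this is a hypothetical rewrite, not code from the original:

// Hypothetical variant: SDL_LockAudio() blocks the audio callback while the
// shared globals are being updated; SDL_UnlockAudio() releases it again.
int on_audio_proc(void* context, AVFrame* frame)
{
    SDL_LockAudio();
    g_audio_chunk = *(frame->data);
    g_audio_pos   = g_audio_chunk;
    g_audio_len   = frame->nb_samples * OUT_CHANNELS * sizeof(short);
    SDL_UnlockAudio();

    while (g_audio_len > 0)   // wait until the callback has consumed everything
        SDL_Delay(1);
    return 0;
}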
