Redis Source Code Read Log (14: AOF in Detail) (note: this article is long)

Part 1: Related Configuration

1. The master switch

appendonly  no

no: AOF is disabled

yes: AOF is enabled

2. The output file

appendfilename  "appendonly.aof"

The name of the file that AOF output is written to.

3. The fsync mode

appendfsync  everysec

# appendfsync  always

# appendfsync  no

no: don't fsync, just let the OS flush the data when it wants. Faster.

Fastest; the OS schedules the disk I/O itself, so performance is good.

always: fsync after every write to the append only log. Slow, Safest.

Every write is immediately followed by a direct fsync call flushing the data to disk; safest.

everysec: fsync only one time every second. Compromise.

4. Other options

no-appendfsync-on-rewrite no

      When the AOF fsync policy is set to always or everysec and a background process (a background save, or an AOF-log background rewrite) is performing large amounts of disk I/O, on some Linux configurations Redis may block on the fsync() call for a very long time. Note that there is currently no fix for this.

Currently, even performing the fsync in a different thread will block our synchronous write(2) call.

To mitigate the problem, the following option can be used to prevent fsync() from being called in the main process while a BGSAVE or BGREWRITEAOF is in progress.

This means that while a child process is saving, the durability of Redis is the same as with "appendfsync none". In practical terms, this means it is possible to lose up to 30 seconds of log in the worst case (with the default Linux settings).

If you have latency problems, turn this to "yes"; otherwise leave it as "no", which is the safest choice from the point of view of durability.

auto-aof-rewrite-percentage 100

auto-aof-rewrite-min-size 64mb

      Redis is able to automatically rewrite the AOF file, by implicitly calling BGREWRITEAOF, when the log size grows by the specified percentage.

Part 2: How the Mechanism Works, in Brief

Redis remembers the size of the AOF file after the latest rewrite (if no rewrite has happened since restart, the size of the AOF at startup is used).

This base size is compared to the current size; if the current size is bigger by the specified percentage, the rewrite is triggered. You also need to specify a minimal size for the AOF file to be rewritten: this is useful to avoid rewriting the AOF file even when the percentage increase is reached but the file is still very small (early in a database's life, it grows quickly).

Specify a percentage of zero to disable the automatic AOF rewrite feature.

aof-load-truncated yes

      Used for data recovery when Redis exits abnormally and the AOF file has been truncated and left incomplete.

aof-use-rdb-preamble yes

      The AOF file starts with an RDB section; that is, the file is split into two parts: [RDB file][AOF tail].


Part 3: Core API: rewriteAppendOnlyFileBackground


1. A client explicitly sends BGREWRITEAOF.

2. Redis forks a child process:

      a. the child rewrites the AOF into a temp file;

      b. the parent keeps appending the newly arriving writes to server.aof_rewrite_buf.

3. The child finishes the work of step a.

4. The parent collects the child's exit status; if it exited cleanly, the parent appends the contents accumulated in aof_rewrite_buf to the temp AOF file, and finally renames the temp file, which then becomes the new AOF file.

Part 4: Details 1: the AOF diff buffer


Part 5: How does AOF work? The mechanism in detail

Case 1: AOF disabled (appendonly set to no).

A BGREWRITEAOF command is received.

At this point:

Scenario 1: an AOF child process already exists, so this command fails.

Scenario 2: an RDB child process exists, so the command is scheduled and handled later, when the server is "idle"; the logic is then identical to handling it directly.

Scenario 3: the command is handled directly: an AOF child process is forked to do the work.

The handling logic is implemented mainly in the rewriteAppendOnlyFileBackground function.

Note: AOF does not always fork a child process; the user explicitly sending the command and Redis handling it by forking a background child is just one typical way AOF work gets done.

We will set this aside for now.

Case 2: AOF enabled (appendonly set to yes).

Redis:

1. During the config-loading phase, aof_state is set to AOF_ON.

    The other related parameters are loaded at the same time:

        appendfsync    everysec

        auto-aof-rewrite-percentage     100

        auto-aof-rewrite-min-size     64mb

        aof-load-truncated    yes

        aof-use-rdb-preamble    yes

2. During server init, aof_fd is opened in append mode; in addition, aof_buf is enabled.

3. The server enters "listening" mode, accepting user commands and waiting to be "fed".

For the "feeding" process, see the figure AOF_cmd_log.


After processing a user command, Redis calls the feedAppendOnlyFile interface provided by the AOF module to carry out the AOF "feeding".

AOF formats the command and stores the result into the global Redis buffer aof_buf.

(As a side effect of the "feeding" process, shown in the figure, the data is also stored into aof_rewrite_buf_blocks.)


Consuming aof_buf

Each time epoll_wait "wakes up", once the commands and the "feeding" above have been handled, the aeEventLoop framework invokes the beforeSleep callback before entering the next epoll_wait.

Inside that callback, flushAppendOnlyFile is called to write() the data in aof_buf into aof_fd.

Post-processing of the write, i.e. handling its return value:

If the write fails and AOF is running in always mode, the whole Redis server simply exits (which is probably not what you want to see).

Recording the error code in aof_last_write_errno:

If the write fails (returns -1), the errno of the write is recorded.

If the write partially fails (only part of the content was written), Redis internally tries to "recover" by truncating the AOF log file back to its size before the write. If this truncate itself fails, the error (e.g. ENOSPC) is recorded and the error status aof_last_write_status is set to C_ERR; in that case, the bytes already written are removed from aof_buf.

If the write succeeds, i.e. everything in aof_buf was written into aof_fd, aof_last_write_status is set to C_OK, and Redis then tries to release aof_buf (if aof_buf is smaller than 4000 bytes it is kept for reuse; otherwise it is freed).

The sync policy once the aof_buf write has finished:

appendfsync  everysec

# appendfsync  always

# appendfsync  no

no: don't fsync, just let the OS flush the data when it wants. Faster.

Fastest; the OS schedules the disk I/O itself, so performance is good. In this mode Redis does nothing extra, i.e. it does not explicitly call the fsync/fdatasync interfaces, and the function simply returns at this point.

always: fsync after every write to the append only log. Slow, Safest.

Every write is immediately followed by an fsync that flushes the data to disk; safest. In this mode, as described above, a single failed write (whether it returned -1 or was only partially written) makes the whole Redis process exit.

everysec: fsync only one time every second. Compromise.

This is the default configuration and the most common mode.

everysec splits into two situations:

First: epoll_wait woke up from its timeout, or the command handled did not need any AOF processing, so aof_buf is empty. In the beforeSleep callback before the next epoll_wait, the function is entered again; after the necessary checks, a try_sync is performed if needed.

Second: epoll_wait woke up to serve a client command, and that command also needs AOF processing, so aof_buf is non-empty.

If at this moment the background is handling a sync task (note: not the AOF child process, but a background job of the background-I/O (BIO) module):

then a flag is consulted:

If the flag is in its initial state, it is set and the function returns (aof_flush_postponed_start); the flag takes effect later in the serverCron periodic task. The write of this round is skipped.

If the flag is not in its initial state and less than 2 seconds have elapsed, the function returns directly; this round's write is skipped.

If the flag is not in its initial state and more than 2 seconds have already elapsed, execution falls through.

If there is no background sync task being handled, or we fell through from the checks above, the logic is as follows:

1. write aof_buf to aof_fd

2. reset the flag aof_flush_postponed_start to 0 (so that branch in serverCron can no longer execute)

3. from here on, the write is handled normally (as described earlier)

Open question 1: try_sync

Open question 2: the handling of the aof_flush_postponed_start flag in serverCron

Open question 3: write errors, i.e. aof_last_write_status == C_ERR

Open question 1: try_sync

The try_sync branch (a goto label) is only ever reached in everysec mode; in always mode this branch cannot be reached. This is inconsistent with the description in the code comments, and the branch contains a stretch of "dead" code. In other words, the sync for always mode is not handled here; reading the code, always mode does not behave the way the Redis documentation describes, where every write command is immediately followed by a sync.

1. The no-appendfsync-on-rewrite setting is checked first: if an RDB or AOF child process exists and is still handling an earlier saving operation, and the setting says to skip fsync during rewrites, the try_sync fails.

2. If at this point there is no background sync job in flight, one is added.

3. The BIO module runs in asynchronous-thread mode: the main thread hands aof_fd over, and the background performs fdatasync(aof_fd).

Open question 2:

The handling of aof_flush_postponed_start in serverCron; the actual scenario at that point is:

A bio background thread is handling a sync task, aof_buf is non-empty, and new commands needing AOF processing keep arriving. Because beforeSleep skipped the write, serverCron now has to poll continuously, checking the write conditions and performing the write.

Open question 3: write errors, i.e. aof_last_write_status == C_ERR

serverCron keeps retrying, as shown in the figure.

The "dead" code mentioned earlier is the part marked by the red box in the figure.


If the description in the Redis documentation were accurate, the if condition above should cover one more case:

aof_fsync == AOF_FSYNC_ALWAYS

As shown in the figure.

What about the child process? The AOF rewrite mechanism

A lot has been described above, but not when the AOF child process gets created. Its creation is triggered in two main ways:

First: the user explicitly sends the BGREWRITEAOF command.

Second: it is triggered automatically based on the config.

At the core, both end up calling the same function: rewriteAppendOnlyFileBackground.

The automatic trigger conditions are checked, and acted on, in the serverCron task, as shown in the figure.

What does the child process do?

1. It dumps a full snapshot of the current database (similar to the RDB path here).

There are two full-dump formats, AOF format and RDB format, selected by the config item:

aof-use-rdb-preamble    yes

2. During the dump, new client commands are buffered temporarily; parent and child then synchronize this part of the data between them.

3. The temp file replaces the AOF file.

Why do things this way, periodically forking a background child process to dump the full data set?

Why, when an AOF file already exists, perform such a dump and then replace the file?

Consider a scenario:

SET key value

DEL key

If the child process's dump-then-replace rule did not exist, then:

what would the AOF file contain?

The text of those two commands after AOF formatting.

But what about the actual database? It is empty.

In other words, the AOF is really a command "trace"; without the replacement, the operations keep piling up on record and the AOF file can grow "infinitely" large.

With this full dump plus replacement (the AOF rewrite mechanism), that "infinite" growth can be kept under control to a reasonable degree.

So:

An AOF file is, in effect, "full data + the command trace of a recent period".

This approach ensures that if Redis hits an unexpected crash (core dump), the data can be restored to the greatest possible extent.

That is why the AOF scheme is much stronger than RDB in terms of safety.

Part 6: AOF Summary

Commands are recorded one by one as a "trace" (that is: feedAppendOnlyFile first, then flushAppendOnlyFile). When a threshold is reached (the incremental growth hits the configured percentage), the AOF child-process background-rewrite mechanism is triggered: a full dump of the data is taken (by calling rewriteAppendOnlyFileBackground), and then the file is replaced (the rewrite).

Because a child process does this work, the main process can still respond to client requests and operate normally, and it also records the diff produced in the meantime. Once the child has finished the full dump and the incremental data has been synchronized, the AOF file is replaced, and everything returns to the state of recording command "traces" one by one, looping indefinitely.

So AOF in fact subsumes the functionality of RDB.

Part 7: Appendix: the official Redis discussion of the trade-offs between the two persistence schemes

The text below comes from redis.io.

RDB advantages

  • RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
  • RDB is very good for disaster recovery, being a single compact file that can be transferred to far data centers, or onto Amazon S3 (possibly encrypted).
  • RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent instance will never perform disk I/O or alike.
  • RDB allows faster restarts with big datasets compared to AOF.

RDB disadvantages

  • RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you'll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data.
  • RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great. AOF also needs to fork() but you can tune how often you want to rewrite your logs without any trade-off on durability.

AOF advantages

  • Using AOF Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, fsync at every query. With the default policy of fsync every second write performances are still great (fsync is performed using a background thread and the main thread will try hard to perform writes when no fsync is in progress.) but you can only lose one second worth of writes.
  • The AOF log is an append only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with an half-written command for some reason (disk full or other reasons) the redis-check-aof tool is able to fix it easily.
  • Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
  • AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. For instance even if you flushed everything for an error using a FLUSHALL command, if no rewrite of the log was performed in the meantime you can still save your data set just stopping the server, removing the latest command, and restarting Redis again.

AOF disadvantages

  • AOF files are usually bigger than the equivalent RDB files for the same dataset.
  • AOF can be slower than RDB depending on the exact fsync policy. In general with fsync set to every second performance is still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of an huge write load.
  • In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to not reproduce exactly the same dataset on reloading. These bugs are rare and we have tests in the test suite creating random complex datasets automatically and reloading them to check everything is fine. However, these kind of bugs are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works by incrementally updating an existing state, like MySQL or MongoDB does, while the RDB snapshotting creates everything from scratch again and again, that is conceptually more robust. However - 1) It should be noted that every time the AOF is rewritten by Redis it is recreated from scratch starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always appending AOF file (or one rewritten reading the old AOF instead of reading the data in memory). 2) We have never had a single report from users about an AOF corruption that was detected in the real world.