Paper Notes — Asynchronous Stochastic Gradient Descent with Delay Compensation

The paper improves the ASGD algorithm by designing a new update rule for gradients that arrive at the parameter server with a delay.

ASGD

The figure below illustrates the ASGD training process. The gradient g(w_t) is computed against model w_t, but because of the delay, by the time the parameter server receives it the model has already been updated to w_{t+τ}. Plain ASGD applies the delayed gradient without any correction.

(Figure: ASGD training process — a delayed gradient g(w_t) is applied to the newer model w_{t+τ}.)

The paper Taylor-expands the delayed gradient around the stale model w_t in order to compensate for the delay:

$$g(w_{t+\tau}) = g(w_t) + \nabla g(w_t)\,(w_{t+\tau} - w_t) + \mathcal{O}\!\left(\|w_{t+\tau} - w_t\|^2\right)$$

The term ∇g(w_t) in the expansion is the Hessian matrix. Because the number of model parameters is very large, computing the exact Hessian would severely slow down training, so the paper proposes a cheap Hessian approximator.
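As I understand the paper, the approximator replaces the full Hessian with the diagonal of the outer product of the gradient with itself (the Fisher-information approximation), scaled by a variance-control coefficient λ:

$$\nabla g(w_t) \;\approx\; \lambda \operatorname{diag}\!\left(g(w_t)\,g(w_t)^{\top}\right) = \lambda\, g(w_t) \odot g(w_t)$$

Here ⊙ denotes element-wise multiplication, so the correction costs no more than the gradient itself.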

The DC-ASGD algorithm

Substituting the approximator into the Taylor expansion gives the DC-ASGD update rule:

$$w_{t+\tau+1} = w_{t+\tau} - \eta\left(g(w_t) + \lambda\, g(w_t) \odot g(w_t) \odot (w_{t+\tau} - w_t)\right)$$
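A minimal NumPy sketch of this update as I read it (function and variable names are mine, not from the paper; the default values of `lr` and `lam` are illustrative, not the paper's tuned settings):

```python
import numpy as np

def dc_asgd_update(w_current, w_backup, grad, lr=0.1, lam=0.04):
    """One delay-compensated server update (sketch).

    w_current: current server model w_{t+tau}
    w_backup : snapshot w_t the worker computed grad against
    grad     : delayed gradient g(w_t)
    lam      : variance-control coefficient lambda (assumed value)
    """
    # Compensation term: lambda * g ⊙ g ⊙ (w_{t+tau} - w_t)
    # approximates the Hessian-vector product in the Taylor expansion.
    compensated = grad + lam * grad * grad * (w_current - w_backup)
    return w_current - lr * compensated
```

Note that when there is no delay (w_current equals w_backup), the compensation term vanishes and the rule reduces to plain SGD.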

Worker side

(Algorithm: DC-ASGD worker-side pseudocode.)

Server side

(Algorithm: DC-ASGD server-side pseudocode.)
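To show how the two sides fit together, here is a toy single-process simulation of the protocol (all names and the random-delay model are my own assumptions for illustration; the paper's server keeps one backup copy of the model per worker):

```python
import numpy as np

def simulate_dc_asgd(loss_grad, w0, num_workers=4, steps=100,
                     lr=0.05, lam=0.04, seed=0):
    """Simulate DC-ASGD with random worker delays in one process."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float).copy()
    # Server-side backup of the model each worker last pulled.
    backups = [w.copy() for _ in range(num_workers)]
    for _ in range(steps):
        m = int(rng.integers(num_workers))   # a random worker finishes
        g = loss_grad(backups[m])            # gradient of its stale model
        # Delay-compensated server update.
        w = w - lr * (g + lam * g * g * (w - backups[m]))
        backups[m] = w.copy()                # worker pulls the fresh model
    return w

# Example: minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
w_final = simulate_dc_asgd(lambda w: w, np.array([3.0, -2.0]))
```

Because each worker's backup is refreshed only when that worker pushes, the gradients are genuinely stale, which is exactly the situation the compensation term targets.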

Experiments

Experiments were run on two datasets: CIFAR-10 (Hinton, 2007) and ImageNet ILSVRC 2013 (Russakovsky et al., 2015).

(Figures: experimental results on CIFAR-10 and ImageNet.)

The original paper can be found under the title above.