Online Convex Programming: First Week Reading Material
The first assigned reading is "Online Convex Programming and Generalized Infinitesimal Gradient Ascent," written by Martin Zinkevich in 2003 and published in the Proceedings of the Twentieth International Conference on Machine Learning (ICML). To begin with, online convex programming, in contrast to offline convex programming, still has a convex feasible set, but deals with a sequence of convex cost functions, and each cost function is unavailable until the decision variable has been selected. This models a number of optimization problems, including industrial production and network routing.
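Concretely, the setting can be stated as a simple protocol (my paraphrase of the paper's definition): given a convex feasible set $F \subseteq \mathbb{R}^n$,
$$\text{for } t = 1, 2, \dots:\quad \text{choose } x^t \in F,\ \text{then observe a convex } c^t : F \to \mathbb{R},\ \text{and pay } c^t(x^t).$$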
The main idea in the paper is plain and simple: an algorithm based on gradient descent, called greedy projection. The author makes no distributional assumptions about the convex cost functions or about relationships between successive cost functions. The algorithm has three advantages. First, gradient descent is simple and natural. Second, no prior assumptions are needed, which covers a more general scenario and can handle arbitrary sequences. Finally, for online linear programs, the algorithm can even perform better than the traditional experts model. The idea is motivated by infinitesimal gradient ascent, which is used in repeated games, and the author also proves that the algorithm is universally consistent.
In Section 2 the author introduces the algorithm and gives an analysis of its regret.
Greedy Projection
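The update rule is simple: take a gradient step on the most recent cost and project back onto the feasible set,
$$x^{t+1} = P\!\left(x^t - \eta_t\, \nabla c^t(x^t)\right),$$
where $P(y) = \arg\min_{x \in F} d(x, y)$ is the projection onto $F$ and $\eta_t$ is the learning rate (the paper uses $\eta_t = t^{-1/2}$ for the static result).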
To analyse the performance of the algorithm, and in particular to prove that it works for any sequence of convex functions, which is its biggest strength compared with the experts setting, the author analyses the regret. The idea is to compare the online algorithm against an offline one.
Online cost (what greedy projection pays): $\sum_{t=1}^{T} c^t(x^t)$.
Offline cost (the best fixed decision in hindsight): $\min_{x \in F} \sum_{t=1}^{T} c^t(x)$.
The difference between the two is the regret:
$$R_G(T) = \sum_{t=1}^{T} c^t(x^t) \;-\; \min_{x \in F} \sum_{t=1}^{T} c^t(x).$$
The longer and the more drastically the sequence of cost functions changes, the less of an advantage the static offline comparator retains. The goal is to bound the regret, whose average value can then be shown to approach zero. One important result for the bound is derived (Theorem 1 in the paper, with $\eta_t = t^{-1/2}$):
$$R_G(T) \;\le\; \frac{\|F\|^2 \sqrt{T}}{2} + \left(\sqrt{T} - \frac{1}{2}\right) \|\nabla c\|^2,$$
where $\|F\|$ is the diameter of the feasible set and $\|\nabla c\|$ is the largest gradient norm.
The two terms can be interpreted as the cost of possibly beginning on the wrong side of the feasible set, and the cost of always responding only after seeing the cost function. The proof is shown in the paper; it uses the convexity of each cost function,
$$c^t(x^t) - c^t(x^*) \;\le\; \nabla c^t(x^t) \cdot \left(x^t - x^*\right),$$
to compute the regret as if the costs were linear; this substitution was a little bit confusing to me at first. Set $x^*$ to be the statically optimal vector, write $g^t = \nabla c^t(x^t)$, and let $y^{t+1} = x^t - \eta_t g^t$ be the update before projection, so that $x^{t+1} = P(y^{t+1})$. The author first bounds the progress as if the projection had not happened, by expanding the distance of the unprojected point to $x^*$:
$$\left(y^{t+1} - x^*\right)^2 = \left(x^t - x^*\right)^2 - 2\eta_t\, g^t \cdot \left(x^t - x^*\right) + \eta_t^2 \left\|g^t\right\|^2.$$
Then, using the definition of the projection (which cannot increase the distance to any point of $F$) and the assumed bound on $g^t$, the per-step regret can be bounded as
$$g^t \cdot \left(x^t - x^*\right) \;\le\; \frac{1}{2\eta_t}\left(\left(x^t - x^*\right)^2 - \left(x^{t+1} - x^*\right)^2\right) + \frac{\eta_t}{2}\left\|\nabla c\right\|^2.$$
Summing this inequality over $t$, with $\eta_t = t^{-1/2}$, yields the first result.
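To make the update concrete, here is a minimal Python sketch of greedy projection (my own illustration, not code from the paper), assuming a Euclidean-ball feasible set so the projection has a closed form, and a made-up sequence of linear costs:

```python
import numpy as np

def project_to_ball(y, radius=1.0):
    """Euclidean projection onto F = {x : ||x|| <= radius}."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

rng = np.random.default_rng(0)
T, dim, radius = 1000, 3, 1.0
# A made-up arbitrary sequence of linear costs c^t(x) = g_t . x,
# each revealed only after x^t has been chosen.
gs = rng.normal(size=(T, dim))

x = np.zeros(dim)
online_cost = 0.0
for t in range(1, T + 1):
    g = gs[t - 1]                              # gradient of c^t, seen after playing x^t
    online_cost += g @ x
    eta = 1.0 / np.sqrt(t)                     # eta_t = t^(-1/2), as in Theorem 1
    x = project_to_ball(x - eta * g, radius)   # greedy projection update

# Static offline optimum of sum_t g_t . x over the ball: x* = -radius * G / ||G||
G = gs.sum(axis=0)
offline_cost = -radius * np.linalg.norm(G)
print(f"regret = {online_cost - offline_cost:.1f}  (O(sqrt(T)) ~ {np.sqrt(T):.0f})")
```

With linear costs the statically optimal vector has a closed form on the ball, so the empirical regret can be compared directly against the $O(\sqrt{T})$ rate.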
As for the regret against a dynamic strategy: in this scenario the offline comparator is allowed to change slowly over time, which is realistic. The author defines the path length of a comparator sequence $(z^1, \dots, z^T)$ as
$$\sum_{t=1}^{T-1} d\!\left(z^t, z^{t+1}\right).$$
Defining a regret function similar to the static problem and using the same proof idea, it can be proved that this regret is also bounded (this time with a fixed learning rate); the rough shape of the bound is sketched below.
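If I state the second theorem roughly (my paraphrase, with the constants suppressed): with a fixed learning rate $\eta$, the regret against any comparator sequence of path length at most $L$ satisfies a bound of the shape
$$R \;\le\; O\!\left(\frac{\|F\|^2 + L\,\|F\|}{\eta} + T\,\eta\,\|\nabla c\|^2\right),$$
so for a fixed horizon the guarantee degrades gracefully, linearly in the path length.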
To sum up, in this section the author uses the convexity of the cost functions to bound the regret by that of their linearizations, the projection property to bound the distance between successive choices, the largest gradient norm to bound all the gradients, and some easily handled algebra on the learning-rate sums to bound the summation, yielding results in both the static and the dynamic scenario. The dynamic result says that the regret is bounded in terms of the length of the comparator's path.
In the next section, the author shows how to set up an online linear programming problem from a repeated game and apply the algorithm. The problem is constructed as follows: there are two players with action sets A and B, a joint action is a pair in A × B, and a utility function u assigns a value to certain joint actions; all other values of u are zero. At each step the player selects its action at random from some distribution. A sequence of joint actions, of length t, is called a history. The utility of a history $h$ is
$$u_{\mathrm{total}}(h) = \sum_{t'=1}^{t} u\!\left(a_{t'}, b_{t'}\right).$$
The regret of not playing an action $a \in A$ can then be defined as
$$R_a(h) = \sum_{t'=1}^{t} \left( u\!\left(a, b_{t'}\right) - u\!\left(a_{t'}, b_{t'}\right) \right),$$
which may not be positive. The max regret, or just regret, is
$$R(h) = \max_{a \in A} R_a(h).$$
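As a toy check of these definitions, here is a small Python computation on a made-up two-action game (my own example, not from the paper; the utility table and history are hypothetical):

```python
# Toy regret computation for a made-up 2-action game (illustration only).
u = {("L", "l"): 1.0, ("R", "r"): 1.0}          # utility; all other values are zero
history = [("L", "r"), ("L", "l"), ("R", "r")]  # joint actions (a_t, b_t)

def utility(a, b):
    return u.get((a, b), 0.0)

total = sum(utility(a, b) for a, b in history)
# Regret of having always played the fixed action a instead:
regret = {a: sum(utility(a, b) - utility(a_t, b) for a_t, b in history)
          for a in ("L", "R")}
max_regret = max(regret.values())
print(total, regret, max_regret)   # 2.0 {'L': -1.0, 'R': 0.0} 0.0
```

Note that the regret of a single action can indeed be negative, as the output shows for action L.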
An important definition is that of a behaviour: a behaviour is a function from histories of past actions to distributions over the next action of the player. It is followed by the definition of an environment: a function from histories of past actions to distributions over the next action of the environment. The author then defines when a behaviour is universally consistent.
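Stated roughly (my paraphrase; the paper's exact quantifiers may differ slightly): a behaviour is universally consistent if for every $\varepsilon > 0$ there is a time $T$ such that, against every environment, the average regret stays below $\varepsilon$ after time $T$ with high probability:
$$\Pr\!\left[\,\exists\, t > T : \tfrac{1}{t}\, R(h_t) > \varepsilon \right] < \varepsilon.$$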
Then, to formulate the repeated game as an online linear program, the author introduces Generalized Infinitesimal Gradient Ascent (GIGA): the feasible set is the simplex of mixed strategies over A, and the function at step t is the linear utility determined by the environment's action, which the algorithm ascends rather than descends. Under the same assumptions as before, namely that the feasible set and the gradients (here, the utility vectors) are bounded, the algorithm is universally consistent. Using the result from the previous section, it is easy to see that the average regret goes to zero.
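As a final illustration, here is a minimal Python sketch of GIGA on a made-up matrix game (rock-paper-scissors against a randomizing environment); the sort-based simplex projection and all names are my own choices, not spelled out this way in the paper:

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(y)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(y + theta, 0.0)

rng = np.random.default_rng(1)
# Rock-paper-scissors payoff for the row player (made-up running example).
U = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

x = np.ones(3) / 3            # mixed strategy over A
regret_vs_action = np.zeros(3)
T = 5000
for t in range(1, T + 1):
    b = rng.integers(3)                 # environment's (random) action
    payoff_vec = U[:, b]                # utility of each of our actions against b
    a = rng.choice(3, p=x)              # play an action drawn from x
    regret_vs_action += payoff_vec - payoff_vec[a]
    eta = 1.0 / np.sqrt(t)
    # Gradient *ascent* on the linear utility x . payoff_vec, then project.
    x = project_to_simplex(x + eta * payoff_vec)

print("average max regret:", regret_vs_action.max() / T)  # should be near 0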
In Section 4 the author shows how to translate algorithms for mixing experts into algorithms for online linear programs, and online linear programming algorithms into algorithms for online convex programs. I skipped this part because it is too theoretical for me.
To sum up, this paper is a good way for us to get a better understanding of some of the basic ideas about online programming that we learned in class, and it also shows us their potential strength.