Variance Reduction Methods: a Quick Introduction to Importance Sampling

https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/variance-reduction-methods

The Art of Variance Reduction: Better for Less
Both variance reduction techniques and quasi Monte Carlo methods (which we will talk about in the next chapter) are complex topics; people have written entire books on these topics alone. Why should you learn about them now, and why are they so important? Monte Carlo, as you may have realized, is actually quite a simple method. Its main problem, though, is variance (in rendering, this turns into noise). To get rid of it, all we can do is increase the number of samples N, and to make things worse, each time you want to reduce this noise by half, you need not twice but four times as many samples (the error only decreases with the square root of the number of samples, so for the error to decrease linearly, the number of samples must increase quadratically).

Thus a great deal of research was invested into finding whether this noise could be reduced by any means other than the brute-force approach of increasing N. It happens that mathematics can again help answer this question, and this led to an entire branch of Monte Carlo research focused on what are known as variance reduction methods: how can we reduce variance by any means other than just increasing N? Importance sampling and quasi Monte Carlo are two such solutions. Of course, you can imagine why people are so excited about them: they are the Holy Grail of Monte Carlo rendering, the promise of better for less. Importance sampling and quasi Monte Carlo deserve a lesson of their own (which you will find later in this section). Teaching these techniques will also become easier once the topics of light transport and BRDFs have been reviewed (the next three lessons). But let us briefly talk about them now, just to give you a feel for what they are and how and why they work (if you are not interested in the mathematical details and just want an intuitive understanding of these methods, this introduction is probably what you are looking for).
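To make that convergence rate concrete: the error of a Monte Carlo estimator (its standard deviation) decreases as the inverse square root of the number of samples, so quadrupling the sample count only halves the error:

$$\sigma_N \propto \frac{1}{\sqrt{N}} \quad \Rightarrow \quad \frac{\sigma_{4N}}{\sigma_N} = \sqrt{\frac{N}{4N}} = \frac{1}{2}$$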

A Quick Introduction to Importance Sampling

In the lesson Introduction to Shading, we already explained that to shade a point P, we need to gather the light arriving at P. In 3D, light can come from any direction in the hemisphere oriented about the normal at P, which mathematically we can define as an integral over the hemisphere (the hemisphere is the domain of integration):

$$\int_\Omega L(\omega) \, d\omega$$
The symbol Ω is usually used in the literature to denote the hemisphere of directions (dω is a differential solid angle, and ω is a direction), and L(ω) is a function of direction (it returns the amount of light coming from ω). Technically, we should add a cos(θ) term to this expression, but it is not important for this demonstration. What is important is to realize that finding how much light arrives at P requires an integral over the entire hemisphere of directions above P. Unfortunately, there is no closed-form solution to this integral for arbitrary geometry. However, we can approximate its result with Monte Carlo integration. The idea is to “sample” the hemisphere, that is, to create a set of N random directions over the hemisphere about P. The approximation of the integral is just an average of the amount of light coming from these N randomly chosen directions:
$$\int_\Omega L(\omega) \, d\omega \approx \frac{1}{N} \sum_{i=1}^{N} L(\omega_i)$$
where ω_i is a random direction contained in the hemisphere above P (in the lesson Sampling from the advanced section, you can learn how to create samples over the hemisphere). An illustration of this technique can be seen in figure 1 (top).
Figure 1: samples can miss parts of the function which contribute a lot to the result of the integral. It would be great to distribute these samples using a PDF that has the same shape as the function (the PDF is high where the function is high).
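To make this more concrete, here is a minimal C++ sketch of the idea (our illustration, not code from this lesson). The function L() is a hypothetical stand-in for the unknown incoming light, and the hemisphere sampling formula is the one derived in the lesson Sampling. Strictly speaking, the estimator divides each sample by the uniform pdf 1/(2π), which is why the average is scaled by 2π:

    #include <cmath>
    #include <cstdio>
    #include <random>

    const float kPi = 3.14159265358979f;

    struct Vec3 { float x, y, z; };

    // Hypothetical stand-in for the unknown function L(w): in a renderer this
    // would trace a ray from P along w and return the radiance found there.
    // Here we fake a small bright light near the zenith.
    float L(const Vec3& w) { return w.z > 0.9f ? 1.0f : 0.0f; }

    // Random direction uniformly distributed over the hemisphere oriented
    // around the z-axis (r1 and r2 are uniform random numbers in [0,1)).
    Vec3 uniformSampleHemisphere(float r1, float r2)
    {
        float sinTheta = std::sqrt(1 - r1 * r1); // r1 plays the role of cos(theta)
        float phi = 2 * kPi * r2;
        return { sinTheta * std::cos(phi), sinTheta * std::sin(phi), r1 };
    }

    int main()
    {
        const int N = 1024;
        std::mt19937 rng;
        std::uniform_real_distribution<float> dist(0, 1);
        float sum = 0;
        for (int i = 0; i < N; ++i) {
            Vec3 w = uniformSampleHemisphere(dist(rng), dist(rng));
            sum += L(w); // amount of light coming from this random direction
        }
        // Average of the samples, divided by the uniform pdf 1/(2*pi).
        std::printf("estimate: %f\n", (sum / N) * 2 * kPi);
        return 0;
    }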

In two dimensions though, the hemisphere becomes a half disk. Now, as with most of the Monte Carlo integration cases we have studied so far, the samples are uniformly distributed. In other words, each direction in the hemisphere has an equal probability of being chosen. If we consider the 2D case, a direction reduces to an angle θ going from 0 to π. The domain of integration being the interval [0,π], the PDF (which we know is 1/(b-a)) is therefore 1/π, and it is uniform across the whole domain, i.e. the distribution function is constant (figure 1, bottom; the red line represents this PDF).
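In code, this 2D setup could look like the following sketch, where lightFromAngle() is a hypothetical stand-in for the amount of light arriving from each angle (faked here with two narrow peaks playing the role of two lights):

    #include <cmath>
    #include <cstdio>
    #include <random>

    const float kPi = 3.14159265358979f;

    // Hypothetical stand-in for the light reaching P from the angle theta,
    // faked here as two narrow peaks.
    float lightFromAngle(float theta)
    {
        return std::exp(-50 * (theta - 1.0f) * (theta - 1.0f)) +
               std::exp(-50 * (theta - 2.2f) * (theta - 2.2f));
    }

    int main()
    {
        const int N = 64;
        std::mt19937 rng;
        std::uniform_real_distribution<float> dist(0, kPi); // uniform over [0,pi]
        float sum = 0;
        for (int i = 0; i < N; ++i)
            sum += lightFromAngle(dist(rng)); // every angle is equally likely
        // Dividing each sample by the constant pdf 1/pi scales the average by pi.
        std::printf("estimate: %f\n", (sum / N) * kPi);
        return 0;
    }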

Let’s now look at what the problem is. As you can see at the top of figure 1, we are trying to find out how much light arrives at P, a point located on the floor of a box. To make the demonstration easier, we will imagine that the walls of this box are very dark, and therefore contribute very little to the illumination of P. However, we have two lights on the ceiling of this room. Naturally, most of the light illuminating the floor will come from these two light sources. But as you can see in the figure, of the six samples used in our Monte Carlo integration to approximate the light arriving at P, all but one miss these lights. Since the contribution of one of the lights is missing entirely, we can assume that the approximation of the amount of light arriving at P using this set of samples will be underestimated. In other words, we can expect the difference between the approximation (the Monte Carlo integration) and the expected value of f(x) to be quite high (variance will be high).

When the function is pretty regular, the position of the samples has little impact on the result. When the function is very irregular, a small change in the sample position may cause a large-scale change in the result of the Monte Carlo integration. In general, irregular functions are difficult to sample and cause a high variance in the estimator (figure 2).
Figure 2: When the function is pretty regular (top left), the position of the samples has little impact on the result. When the function is very irregular, a small change in the sample position may cause a large-scale change in the result of the Monte Carlo integration…

Technically, you can see the amount of light arriving at P as a function defined over the interval [0,π]. This is like rotating the hand of a watch from one side of the disk to the other, and measuring how much light strikes P from the direction pointed to by the hand. At the bottom of figure 1, you can see what this function looks like (the yellow curve). It’s a nice, smooth, continuous curve, with two distinct peaks, one for each of the two light sources. If we were to integrate this function over the entire domain on which it is defined, it is pretty clear that these two peaks would contribute more to the result of the integral than any other part of the curve. We can say that these peaks are where the function is the most important. As we said, it would make sense to create samples where the function is important, but as we have just explained, there is no guarantee that this will actually be the case, particularly when the samples are uniformly distributed.

Now you may think that this is not a problem: why not just create samples where the function is important (figure 3)? There are at least two problems with this approach. First, keep in mind that Monte Carlo integration is a stochastic process. In other words, it only works because samples are chosen randomly. If you start deciding where to put the samples (rather than randomly distributing them), the theory of Monte Carlo integration falls apart (the result of your pseudo-Monte Carlo integration will have no mathematical meaning). Concentrating samples where the function is important while ignoring the other parts of the curve would probably overestimate the result anyway, which is not good either. But the second problem is even more critical: knowing where the function is important presupposes that we know what that function is in the first place, which, in most situations, is information we don’t have. In the case of finding how much light strikes P, for example, we just don’t know anything about where the light is coming from or in what amount. If we had this information, we would probably not use Monte Carlo integration in the first place. We only use this method precisely because we don’t know beforehand what that function is; we just sample it, randomly, hoping that the average of these samples will give us a valid approximation of the integral.

This is a very important point in Monte Carlo integration. In most situations, we have no prior knowledge of the function (the integrand) being integrated, and this is particularly true in rendering.
Figure 3: uniform distribution is okay but we may miss parts of the function which are important. In this example (on the left), one of the peaks is not sampled. Forcing the samples to be in the regions of the function which are the most important is not a good solution either. Monte Carlo integration relies on the samples to be “stochastically” distributed. Forcing them to be within specific regions of the function will introduce bias (in this example, on the right, you can see that the chosen samples will lead to overestimating the result of the integral).

Unfortunately, variance reduction techniques require some prior knowledge of the function being integrated. In some important cases (such as the two we described in this chapter: integrating the incoming radiance over the area of a pixel, and the incoming light at P), we do not know what that function is. So what is the deal? Alas, there is no magical solution to this problem: if you do not have any prior knowledge of the function being integrated, just forget about variance reduction. But let us see what we can do when we do know something about f(x) prior to the integration (we will show an example related to rendering later in this chapter).

Let’s imagine that the integrand is a constant function (figure 4). You go with a uniform distribution, you sample your constant function, you get a result, and everything is great. Obviously, since the function is constant, every time you run this approximation you get the same result. In other words, variance is simply zero. The ultimate goal of Monte Carlo integration is to get the smallest possible variance with the smallest number of samples, so when the variance is 0, you can’t get anything better than that. You might then think, “wouldn’t it be great if all functions that needed integration were constant?” In reality, though, we know that this is never the case: functions that need integration are never constant (it would be too easy). Some of them can even have very high, narrow peaks. But can we turn a non-constant function into a constant one?
Figure 4: the position of the samples doesn’t change the result of the Monte Carlo integration.
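Here is, first, a tiny sketch of that zero-variance claim (our own toy example): wherever the samples fall (whatever the seed), the estimate never changes.

    #include <cstdio>
    #include <random>

    // Monte Carlo estimate of the integral of f(x) = c over [a, b]. The sample
    // positions are drawn at random but ignored by the integrand, so every run
    // returns exactly c * (b - a): the variance of the estimator is zero.
    float estimateConstant(float c, float a, float b, int N, unsigned seed)
    {
        std::mt19937 rng(seed);
        std::uniform_real_distribution<float> dist(a, b);
        float sum = 0;
        for (int i = 0; i < N; ++i) {
            (void)dist(rng); // draw a sample position... which f does not depend on
            sum += c;        // f(x) = c for every x
        }
        return (sum / N) * (b - a);
    }

    int main()
    {
        for (unsigned seed = 0; seed < 4; ++seed) // same result for every seed
            std::printf("%f\n", estimateConstant(2, 0, 1, 16, seed));
        return 0;
    }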

While being a rather strange thought, the answer to this question is obviously yes: you just need to divide the function by itself. If the function f(x) equals 2 when x = 0, then f(0)/2 = 1. If f(x) equals 0.5 when x = 2, then f(2)/0.5 = 1. If you do this for every value of x, you end up with a constant function f(x)/f(x) = 1.

As shown in figure 5 (right), dividing f(x) by another function exactly proportional to f(x) gives a constant function as well. If f′(x) is a copy of f(x) scaled up or down by a constant factor c, we have:
$$f'(x) = c \cdot f(x)$$
and thus:
$$\frac{f(x)}{f'(x)} = \frac{f(x)}{c \cdot f(x)} = \frac{1}{c}$$
Figure 5: if the function f(x) is divided by itself (in red, left) or by another function f′(x) which is proportional to f(x) (in red, right), we get a constant function (in blue).

Now you may think: that’s great and simple, but what’s the point, since the function we are interested in is f(x) and not a constant function? That’s true, but have another look at the formula for the general Monte Carlo estimator:

$$\langle F^N \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)}{pdf(X_i)}$$

Notice the parallel with the ratio f(x)/f′(x) above, so you can more easily see where we are heading. In the Monte Carlo estimator, the integrand f(x) is divided by another function, pdf(x). Taking advantage of what we just learned, if pdf(x) is a function exactly proportional to f(x), then the variance of the Monte Carlo integral will be exactly 0!

Since Monte Carlo integration applied to a constant function has no variance (regardless of the way samples are distributed), we can take advantage of the general Monte Carlo estimator (in which the integrand f(x) is divided by pdf(x)) to create the same effect. Indeed, if pdf(x) (in the denominator) is a function whose shape is exactly proportional to the integrand (the ideal case), or as similar to it as possible, then variance will either be 0 (the ideal case) or potentially greatly reduced:

$$pdf(x) = c \cdot f(x) \quad \Rightarrow \quad \frac{f(X_i)}{pdf(X_i)} = \frac{1}{c}$$
What this means in practice is that, to reduce variance, our task is to find a pdf that matches the shape of the integrand as closely as possible. When the PDF matches the shape of the function exactly, you get a perfect estimator.
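To close this chapter, here is a small self-contained experiment (our own toy example, not taken from the lesson) that demonstrates the point. We integrate f(x) = sin(x) over [0,π], whose exact value is 2, once with a uniform pdf and once with the pdf sin(x)/2, which is exactly proportional to the integrand (samples are drawn from it by inverting its CDF (1 − cos(x))/2, which gives x = acos(1 − 2u)). With the proportional pdf, every sample of f(x)/pdf(x) equals 2, so even a handful of samples returns the exact result:

    #include <cmath>
    #include <cstdio>
    #include <random>

    const double kPi = 3.14159265358979;

    int main()
    {
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> dist(0, 1);
        const int N = 16;
        double sumUniform = 0, sumImportance = 0;
        for (int i = 0; i < N; ++i) {
            // Uniform sampling: x in [0,pi], pdf(x) = 1/pi.
            double x = dist(rng) * kPi;
            sumUniform += std::sin(x) * kPi; // sin(x) / (1/pi)
            // Importance sampling: pdf(x) = sin(x)/2, sampled by inverting
            // its CDF (1 - cos(x))/2, i.e. x = acos(1 - 2u).
            double xi = std::acos(1 - 2 * dist(rng));
            sumImportance += std::sin(xi) / (std::sin(xi) / 2); // always exactly 2
        }
        std::printf("uniform:    %f\n", sumUniform / N);    // noisy, around 2
        std::printf("importance: %f\n", sumImportance / N); // exactly 2
        return 0;
    }

Of course, in a real application we could never sample from a pdf exactly proportional to the integrand: normalizing that pdf would require knowing the very integral we are trying to estimate. This is why, in practice, we settle for pdfs that only approximate the shape of the integrand.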