
Importance of scaling

Another simple toy example shows the importance of scaling. We use the same example as before, except that the diagonal penalty function now varies slowly with location.

 d(m)     F(m,n)                                            iter   Sum(|grad|)

-100.     62.  16.   2.  53.  59.  22.  37. -12.  -3.   8.     1  42484.1016
 -83.     31.  72.  74. -47.  43.  40. -16.  26.  -3.  -4.     2   8388.0635
  20.      3. -19.  46.  27.   5.   9. -32.   7.  -3.   2.     3   4379.3032
   0.    100.   0.   0.   0.   0.   0.   0.   0.   0.   0.     4   1764.9844
   0.      0.  90.   0.   0.   0.   0.   0.   0.   0.   0.     5    868.9418
   0.      0.   0.  80.   0.   0.   0.   0.   0.   0.   0.     6    502.5179
   0.      0.   0.   0.  70.   0.   0.   0.   0.   0.   0.     7    450.0512
   0.      0.   0.   0.   0.  60.   0.   0.   0.   0.   0.     8    185.2923
   0.      0.   0.   0.   0.   0.  50.   0.   0.   0.   0.     9    247.1021
   0.      0.   0.   0.   0.   0.   0.  40.   0.   0.   0.    10    338.7060
   0.      0.   0.   0.   0.   0.   0.   0.  30.   0.   0.    11    119.5686
   0.      0.   0.   0.   0.   0.   0.   0.   0.  20.   0.    12     34.3372
   0.      0.   0.   0.   0.   0.   0.   0.   0.   0.  10.    13      0.0000
We observe that solving the same problem in the scaled variables requires many more iterations to reach the solution. We have lost the benefit of the second CG miracle. Even the rapid convergence predicted for the 11th iteration is delayed until the 13th.

Another curious fact may be noted here. The gradient does not decrease monotonically. It is known theoretically that the residual does decrease monotonically, but the gradient need not. I did not show the norm of the residual, because I wanted to display a function that vanishes at convergence, and the residual does not.
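
For readers who want to experiment, the behavior above can be reproduced in outline with a few lines of conjugate-gradient code. The sketch below is a plain NumPy stand-in, not the solver used to produce the table: it assembles a hypothetical version of the toy system (three random data-fitting rows on top of the slowly varying diagonal penalty 100, 90, ..., 10) and prints the sum of absolute gradient values, together with the residual norm, at each iteration. The random rows differ from those in the table, so the printed numbers differ too; only the qualitative behavior carries over.

import numpy as np

def cgls(A, d, niter):
    # Minimal conjugate-gradient least-squares sketch: minimize |A m - d|^2.
    # Each iteration prints sum(|grad|), the diagnostic tabulated above,
    # along with the residual norm |A m - d|.
    m = np.zeros(A.shape[1])
    r = d - A @ m                    # residual
    g = A.T @ r                      # gradient
    s = g.copy()                     # search direction
    for it in range(1, niter + 1):
        if np.abs(g).sum() < 1e-12:  # already converged; avoid dividing by zero
            break
        As = A @ s
        alpha = (g @ g) / (As @ As)
        m = m + alpha * s
        r = r - alpha * As
        gnew = A.T @ r
        beta = (gnew @ gnew) / (g @ g)
        s = gnew + beta * s
        g = gnew
        print("%4d  sum|grad| = %12.4f   |resid| = %10.4f"
              % (it, np.abs(g).sum(), np.linalg.norm(r)))
    return m

# Hypothetical stand-in for the toy system: three random data-fitting rows
# (the particular values in the table are not reproduced), followed by a
# diagonal penalty whose weight varies slowly with location (100, 90, ..., 10)
# instead of the identity used in the previous example.
rng = np.random.default_rng(0)
n = 10
F_data = rng.uniform(-50.0, 100.0, size=(3, n))
d_data = np.array([-100.0, -83.0, 20.0])
F_reg = np.diag(np.arange(100.0, 0.0, -10.0))
A = np.vstack([F_data, F_reg])
d = np.concatenate([d_data, np.zeros(n)])

m = cgls(A, d, niter=13)

Printing the residual norm alongside the gradient makes the remark in the previous paragraph easy to check: the residual shrinks monotonically from one iteration to the next, while the gradient sum bounces around before finally collapsing.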



