Lagrangian relaxation in the context of ridge regression


In "The Elements of Statistical Learning" (2nd ed.), p. 63, the authors give the following two formulations of the ridge regression problem:

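$$\hat\beta^{\text{ridge}} = \arg\min_\beta \left\{ \sum_{i=1}^N \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^p \beta_j^2 \right\}$$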

and

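$$\hat\beta^{\text{ridge}} = \arg\min_\beta \sum_{i=1}^N \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2, \quad \text{subject to } \sum_{j=1}^p \beta_j^2 \le t.$$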

It is claimed that the two are equivalent, and that there is a one-to-one correspondence between the parameters λ and t.
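One way to see the equivalence concretely is numerically. The following is a minimal sketch, assuming NumPy and synthetic data (the design matrix, true coefficients and noise are made up); the helper names `ridge_penalized` and `ridge_constrained` are hypothetical. It solves the first formulation in closed form for a chosen λ, reads off the corresponding budget t as the squared norm of that solution, and then solves the second formulation with an independent method (projected gradient descent, swapped in here purely for illustration). The data are centered so that the unpenalized intercept drops out, with its estimate equal to the mean of y.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 5
X = rng.normal(size=(N, p))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(size=N)

# Center X and y so the unpenalized intercept beta_0 drops out (beta_0_hat = mean(y)).
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_penalized(lam):
    """Closed-form minimizer of RSS + lam * ||beta||^2 (first formulation)."""
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def ridge_constrained(t, iters=5000):
    """Projected gradient descent on RSS subject to ||beta||^2 <= t (second formulation)."""
    # Step size 1/L, where L = 2 * sigma_max(Xc)^2 is the Lipschitz constant of grad RSS.
    step = 1.0 / (2 * np.linalg.norm(Xc, 2) ** 2)
    radius = np.sqrt(t)
    beta = np.zeros(p)
    for _ in range(iters):
        beta = beta - step * 2 * Xc.T @ (Xc @ beta - yc)  # gradient step on RSS
        norm = np.linalg.norm(beta)
        if norm > radius:
            beta *= radius / norm  # project back onto the ball ||beta||^2 <= t
    return beta

lam = 10.0
b_pen = ridge_penalized(lam)
t = np.sum(b_pen ** 2)      # the budget t that corresponds to this lambda
b_con = ridge_constrained(t)
print(np.max(np.abs(b_pen - b_con)))
```

The printed difference should be at the level of solver tolerance: the penalized solution for λ = 10 and the constrained solution for the matching t are the same vector. Repeating this for several λ traces out the λ ↔ t correspondence.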

It would appear that the first formulation is a Lagrangian relaxation of the second. However, I never had an intuitive understanding of how or why Lagrangian relaxations work.

Is there a simple way to demonstrate that the two formulations are indeed equivalent? If I have to choose, I'd prefer intuition over rigour.

Thanks.

NPE
If you merely want an intuitive explanation, go to 1:03:26 of this video (through to the end); it gives an intuitive explanation of how constraints relate to the objective function.
user603

Answers:


The correspondence can most easily be shown using the Envelope Theorem.

First, the standard Lagrangian has an additional −λt term. This term does not affect the minimization over β if we are just treating λ as given, so Hastie et al. drop it.

Now, if you differentiate the full Lagrangian with respect to t, the Envelope Theorem says you can ignore the indirect effects of t through β, because you are at an optimum. What you are left with is the derivative of the −λt term, i.e. the Lagrange multiplier (with a minus sign).
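Written out, a minimal derivation under the usual assumption that the constraint binds (t is smaller than the squared norm of the least-squares solution); the shorthand RSS and V is introduced here and is not notation from the book:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^N \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2, \qquad \mathcal{L}(\beta, \lambda; t) = \mathrm{RSS}(\beta) + \lambda\Big(\sum_{j=1}^p \beta_j^2 - t\Big).$$

For fixed λ the term −λt does not depend on β, so minimizing $\mathcal{L}$ over β is exactly the penalized problem. Writing $V(t) = \min\{\mathrm{RSS}(\beta) : \sum_j \beta_j^2 \le t\}$ and $\lambda(t)$ for the multiplier at the optimum $\hat\beta(t)$, the Envelope Theorem gives

$$V'(t) = \frac{\partial \mathcal{L}}{\partial t}\bigg|_{\hat\beta(t),\,\lambda(t)} = -\lambda(t),$$

so $\lambda(t) = -V'(t) \ge 0$ is the rate at which the minimized RSS falls as the budget t grows.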

But what does this mean intuitively? Since the constraint binds at the minimum, the derivative of the Lagrangian, evaluated at the minimum, is the same as the derivative of the original objective. Therefore the Lagrange multiplier gives the shadow price of the constraint: the rate at which the minimized residual sum of squares falls when the constraint is relaxed by increasing t.
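The shadow-price interpretation can be checked numerically. This is a minimal sketch, assuming NumPy and synthetic data (all values made up); the helper `fit` is hypothetical. For λ near a chosen value it computes the minimized RSS and the budget t = ‖β̂‖² that the penalized solution attains, then estimates −dRSS/dt by a central difference in λ.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 50, 5
X = rng.normal(size=(N, p))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(size=N)
Xc, yc = X - X.mean(axis=0), y - y.mean()  # centering removes the intercept beta_0

def fit(lam):
    """Return (minimized RSS, budget t) for the ridge solution at penalty lam."""
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    rss = np.sum((yc - Xc @ beta) ** 2)
    return rss, np.sum(beta ** 2)

lam, h = 10.0, 1e-4
rss_small_t, t_small = fit(lam + h)  # larger lambda -> tighter budget t
rss_large_t, t_large = fit(lam - h)  # smaller lambda -> looser budget t
# Shadow price: -dV/dt, estimated by the central difference along the lambda path.
shadow_price = -(rss_large_t - rss_small_t) / (t_large - t_small)
print(shadow_price, lam)
```

The two printed numbers should agree closely: loosening the budget t by a small amount lowers the minimized RSS by about λ times that amount, which is exactly the correspondence between λ and t.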

I assume this is the correspondence Hastie et al. are referring to.

Tristan