I recently found it necessary to derive a pdf for the square of a normal random variable with mean 0. For whatever reason, I chose not to normalize the variance beforehand. If I did this correctly, that pdf is the following:

f(y) = exp(−y/(2σ²)) / (σ√(2πy)),  y > 0
I realized this was just a parameterization of a gamma distribution: Gamma(1/2, 2σ²), with shape 1/2 and scale 2σ².
Then, from the fact that the sum of two gammas (with the same scale parameter) is another gamma, it follows that the gamma is equivalent to a sum of squared normal random variables.
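This building block is easy to sanity-check numerically. The sketch below (not from the original post; it assumes NumPy and SciPy are available) compares the density of X² for X ~ N(0, σ²) against SciPy's gamma pdf with shape 1/2 and scale 2σ²:

```python
import numpy as np
from scipy.stats import gamma

# pdf of Y = X^2 where X ~ N(0, sigma^2), from the change of variables y = x^2:
# f_Y(y) = exp(-y / (2 sigma^2)) / (sigma * sqrt(2 pi y)),  y > 0
def squared_normal_pdf(y, sigma):
    return np.exp(-y / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi * y))

sigma = 1.7          # arbitrary unnormalized standard deviation
y = np.linspace(0.1, 10, 200)

# the claimed reparameterization: Gamma with shape 1/2 and scale 2 sigma^2
g = gamma.pdf(y, a=0.5, scale=2 * sigma**2)
assert np.allclose(squared_normal_pdf(y, sigma), g)
```

The two curves agree to floating-point precision, confirming the Gamma(1/2, 2σ²) identification.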
This was somewhat surprising to me. Although I knew the χ² distribution (the distribution of a sum of squared standard normal RVs) was a special case of the gamma, I didn't realize the gamma was essentially a generalization allowing the sum of squared normal random variables of any variance. This also leads to other characterizations I had not seen before, such as the exponential distribution being equivalent to the sum of two squared normal distributions.
This is all somewhat mysterious to me. Is the normal distribution fundamental to the derivation of the gamma distribution, in the way I described above? Most resources I checked make no mention that the two distributions are intrinsically related in this way, or even, for that matter, describe how the gamma is derived. This makes me think some lower-level truth is at play that I have simply highlighted in a convoluted way.
Answers:
As Professor Sarwate's comment noted, the relations between the squared normal and the chi-squared are a very widely disseminated fact, as it should also be widely known that a chi-squared is just a special case of the Gamma distribution:

X ∼ N(0, σ²)  ⟹  (X/σ)² ∼ χ²(1) = Gamma(1/2, 2)  ⟹  X² ∼ Gamma(1/2, 2σ²),
the last equality following from the scaling property of the Gamma.
As for the relation with the exponential, to be exact it is the sum of two squared zero-mean normals, each scaled by the variance of the other, that leads to the Exponential distribution:

X²/σ²_X + Y²/σ²_Y ∼ χ²(2), and hence σ²_Y X² + σ²_X Y² ∼ Exp(mean 2σ²_X σ²_Y).
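A quick Monte Carlo check of this connection in the equal-variance case (a sketch assuming NumPy; the sample size and tolerances are arbitrary choices): for independent X, Y ~ N(0, σ²), the sum X² + Y² should be Exponential with mean 2σ².

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.3
n = 200_000
x = rng.normal(0, sigma, n)
y = rng.normal(0, sigma, n)
s = x**2 + y**2            # claimed Exponential with mean 2 sigma^2

# Exponential(mean m) has CDF 1 - exp(-t/m); compare at a few points
m = 2 * sigma**2
for t in [0.5, 1.0, 2.0, 5.0]:
    empirical = np.mean(s <= t)
    theoretical = 1 - np.exp(-t / m)
    assert abs(empirical - theoretical) < 0.01
```

With 200,000 draws the empirical CDF matches the exponential CDF to well within the 0.01 tolerance.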
But the suspicion that there is "something special" or "deeper" in the sum of two squared zero-mean normals that "makes them a good model for waiting time" is unfounded: First of all, what is special about the Exponential distribution that makes it a good model for "waiting time"? Memorylessness, of course, but is there something "deeper" here, or just the simple functional form of the Exponential distribution function and the properties of e? Unique properties are scattered all over Mathematics, and most of the time they don't reflect some "deeper intuition" or "structure"; they just exist (thankfully).
Second, the square of a variable has very little relation to its level. Just consider f(x) = x² in, say, [−2, 2]:
...or graph the standard normal density against the chi-square density: they reflect and represent totally different stochastic behaviors, even though they are so intimately related, since the second is the density of a variable that is the square of the first. The normal may be a very important pillar of the mathematical system we have developed to model stochastic behavior, but once you square it, it becomes something else entirely.
Let us address the question posed: "This is all somewhat mysterious to me. Is the normal distribution fundamental to the derivation of the gamma distribution...?" No mystery really; it is simply that the normal distribution and the gamma distribution are members, among others, of the exponential family of distributions, a family defined by the ability to convert between equational forms by substitution of parameters and/or variables. As a consequence, there are many conversions by substitution between distributions, a few of which are summarized in the figure below.
Leemis, Lawrence M.; McQueston, Jacquelyn T. (February 2008). "Univariate Distribution Relationships" (PDF). The American Statistician. 62 (1): 45–53. doi:10.1198/000313008x270448
Here are two normal and gamma distribution relationships in greater detail (among an unknown number of others, for example via the chi-squared and the beta).
First, a more direct relationship between the gamma distribution (GD) and the normal distribution (ND) with mean zero follows. Simply put, the GD becomes normal in shape as its shape parameter is allowed to increase. Proving that this is the case is more difficult. For the GD,

GD(z; a, b) = z^(a−1) exp(−z/b) / (b^a Γ(a)),  z > 0.
As the GD shape parameter a → ∞, the GD shape becomes more symmetric and normal. However, as the mean increases with increasing a, we have to left-shift the GD by (a − 1)k/√a to hold it stationary, and finally, if we wish to maintain the same standard deviation for our shifted GD, we have to decrease the scale parameter b in proportion to 1/√a.
To wit, to transform a GD into a limiting-case ND we set the standard deviation to be a constant k by letting b = k/√a, and shift the GD to the left to have a mode of zero by substituting z = (a − 1)k/√a + x. Then
Note that in the limit as a → ∞ the most negative value of x for which this GD is nonzero tends to −∞. That is, the semi-infinite GD support becomes infinite. Taking the limit as a → ∞ of the reparameterized GD, we find
Graphically, for k = 2 and a = 1, 2, 4, 8, 16, 32, 64, the GD is plotted in blue and the limiting ND(x; 0, 2²) in orange below.
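The limiting behavior can also be checked numerically. The sketch below (assuming SciPy; a = 10,000 is an arbitrarily large shape parameter) applies the shift and scale described above and compares the reparameterized GD to ND(x; 0, k²):

```python
import numpy as np
from scipy.stats import gamma, norm

k = 2.0
a = 10_000                          # large shape parameter
b = k / np.sqrt(a)                  # scale chosen so the std dev stays k
shift = (a - 1) * k / np.sqrt(a)    # left shift putting the mode at 0

x = np.linspace(-6, 6, 500)
gd = gamma.pdf(x + shift, a, scale=b)   # shifted, rescaled gamma density
nd = norm.pdf(x, 0, k)                  # limiting normal density
assert np.max(np.abs(gd - nd)) < 0.01
```

At this shape the two densities agree pointwise to within 0.01; increasing a shrinks the gap further, consistent with the limit.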
Second Let us make the point that due to the similarity of form between these distributions, one can pretty much develop relationships between the gamma and normal distributions by pulling them out of thin air. To wit, we next develop an "unfolded" gamma distribution generalization of a normal distribution.
Note first that it is the semi-infinite support of the gamma distribution that impedes a more direct relationship with the normal distribution. However, that impediment can be removed when considering the half-normal distribution, which also has a semi-infinite support. Thus, one can generalize the normal distribution (ND) by first folding it to be half-normal (HND), relating that to the generalized gamma distribution (GD), then for our tour de force, we "unfold" both (HND and GD) to make a generalized ND (a GND), thusly.
The generalized gamma distribution
can be reparameterized to be the half-normal distribution,
Note that θ = √π/(σ√2). Thus,
which implies that
is a generalization of the normal distribution, where μ is the location, α > 0 is the scale, and β > 0 is the shape, and where β = 2 yields a normal distribution. It includes the Laplace distribution when β = 1. As β → ∞, the density converges pointwise to a uniform density on (μ − α, μ + α). Below, the generalized normal distribution is plotted for α = √π/2 and β = 1/2, 1, 4 in blue, with the normal case α = √π/2, β = 2 in orange.
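The special cases can be checked concretely. The sketch below (assuming SciPy) uses the standard generalized-normal density with location μ, scale α, and shape β, and verifies that β = 2 recovers a normal with σ = α/√2 while β = 1 recovers a Laplace with scale α; the choice α = √2 here is an arbitrary test value, not taken from the plot above:

```python
import numpy as np
from math import gamma as G
from scipy.stats import norm, laplace

# Generalized normal density:
# f(x; mu, alpha, beta) = beta / (2 alpha Gamma(1/beta)) * exp(-(|x - mu| / alpha)^beta)
def gnd_pdf(x, mu, alpha, beta):
    return beta / (2 * alpha * G(1 / beta)) * np.exp(-(np.abs(x - mu) / alpha) ** beta)

x = np.linspace(-5, 5, 200)
alpha = np.sqrt(2.0)   # arbitrary scale for the check

# beta = 2 recovers a normal with sigma = alpha / sqrt(2)
assert np.allclose(gnd_pdf(x, 0, alpha, 2), norm.pdf(x, 0, alpha / np.sqrt(2)))
# beta = 1 recovers a Laplace with scale alpha
assert np.allclose(gnd_pdf(x, 0, alpha, 1), laplace.pdf(x, 0, alpha))
```

SciPy also ships this family as `scipy.stats.gennorm`, in a standardized parameterization.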
The above can be seen as the generalized normal distribution Version 1; in different parameterizations it is known as the exponential power distribution or the generalized error distribution, which are in turn among several other generalized normal distributions.
The derivation of the chi-squared distribution from the normal distribution is closely analogous to the derivation of the gamma distribution from the exponential distribution.
We should be able to generalize this:
The analogy is as follows:
Normal and Chi-squared distributions relate to the sum of squares
The joint density distribution of multiple independent standard-normal-distributed variables depends on ∑ x_i²:
f(x_1, x_2, ..., x_n) = exp(−0.5 ∑_{i=1}^n x_i²) / (2π)^(n/2)
If X_i ∼ N(0, 1), then ∑_{i=1}^n X_i² ∼ χ²(n).
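This fact can be verified by simulation (a sketch assuming NumPy/SciPy; the draw count and tolerance are arbitrary choices):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, draws = 5, 200_000
# sum of n squared standard normals, per draw
s = (rng.normal(0, 1, (draws, n)) ** 2).sum(axis=1)

# compare the empirical CDF to the chi-squared(n) CDF at a few points
for t in [2.0, 5.0, 10.0]:
    assert abs(np.mean(s <= t) - chi2.cdf(t, df=n)) < 0.01
```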
Exponential and gamma distributions relate to the regular sum
The joint density distribution of multiple independent exponential-distributed variables depends on ∑ x_i:

f(x_1, x_2, ..., x_n) = λ^n exp(−λ ∑_{i=1}^n x_i)
If X_i ∼ Exp(λ), then ∑_{i=1}^n X_i ∼ Gamma(n, λ).
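The parallel fact for exponentials can be checked the same way (again a sketch with arbitrary λ, n, and tolerances; note that SciPy's gamma uses a scale parameter, so rate λ corresponds to scale 1/λ):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(2)
lam, n, draws = 1.5, 4, 200_000
# sum of n exponentials with rate lam, per draw
s = rng.exponential(scale=1 / lam, size=(draws, n)).sum(axis=1)

# compare the empirical CDF to the Gamma(n, rate lam) CDF at a few points
for t in [1.0, 3.0, 6.0]:
    assert abs(np.mean(s <= t) - gamma.cdf(t, a=n, scale=1 / lam)) < 0.01
```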
The derivation can be done by a change of variables, integrating not over all of x_1, x_2, ..., x_n but instead only over the summed term (this is what Pearson did in 1900). It unfolds very similarly in both cases.
For the χ² distribution:

where V(s) = π^(n/2)/Γ(n/2 + 1) · s^(n/2) is the n-dimensional volume of an n-ball with squared radius s.
For the gamma distribution:
where V(s) = s^n/n! is the n-dimensional volume of the n-polytope with ∑ x_i < s.
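Both volume formulas can be verified against the resulting densities: on the shell where the sum equals s, the joint density is constant, so the density of the sum is that constant times dV/ds. A sketch assuming SciPy, with n = 5 chosen arbitrarily:

```python
import numpy as np
from math import gamma as G, factorial, pi
from scipy.stats import chi2, gamma

n = 5
s = np.linspace(0.1, 10, 200)

# Chi-squared case: V(s) = pi^(n/2)/Gamma(n/2+1) * s^(n/2),
# so V'(s) = pi^(n/2)/Gamma(n/2) * s^(n/2 - 1); the joint normal
# density on the shell sum(x_i^2) = s is exp(-s/2)/(2 pi)^(n/2).
Vp_ball = pi ** (n / 2) / G(n / 2) * s ** (n / 2 - 1)
assert np.allclose(np.exp(-s / 2) / (2 * pi) ** (n / 2) * Vp_ball,
                   chi2.pdf(s, df=n))

# Gamma case: V(s) = s^n/n!, so V'(s) = s^(n-1)/(n-1)!; the joint
# exponential density on the shell sum(x_i) = s is lam^n exp(-lam s).
lam = 1.5
Vp_simplex = s ** (n - 1) / factorial(n - 1)
assert np.allclose(lam**n * np.exp(-lam * s) * Vp_simplex,
                   gamma.pdf(s, a=n, scale=1 / lam))
```

In both cases, (constant shell density) × dV/ds reproduces the target pdf exactly, which is the essence of the change-of-variables derivation described above.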
The gamma distribution can be seen as the waiting time Y for the n-th event in a Poisson process, which is distributed as the sum of n exponentially distributed variables.
As Alecos Papadopoulos already noted, there is no deeper connection that makes sums of squared normal variables "a good model for waiting time". The gamma distribution is the distribution of a sum of generalized-normal-distributed variables. That is how the two come together.
But the type of sum and the type of variables may be different. While the gamma distribution, when derived from the exponential distribution (p = 1), gets the interpretation of the exponential distribution (waiting time), you cannot go in reverse back to a sum of squared Gaussian variables and use that same interpretation.
The density distribution for waiting time falls off exponentially, and the density distribution for a Gaussian error falls off exponentially (with a square). That is another way to see the two as connected.