Construction of the Dirichlet distribution from the Gamma distribution.

Jacobians (the absolute determinants of the change of variable function) appear formidable and can be complicated. Nevertheless, they are an essential and unavoidable part of the calculation of a multivariate change of variable. It would seem there's nothing for it but to write down a $k+1$ by $k+1$ matrix of derivatives and do the calculation.

There's a better way. It's shown at the end in the "Solution" section. Because the purpose of this post is to introduce statisticians to what may be a new method for many, much of it is devoted to explaining the machinery behind the solution. This is the algebra of differential forms. (Differential forms are the things that one integrates in multiple dimensions.) A detailed, worked example is included to help make this become more familiar.

Background

Over a century ago, mathematicians developed the theory of differential algebra to work with the "higher order derivatives" that occur in multi-dimensional geometry. The determinant is a special case of the basic objects manipulated by such algebras, which typically are alternating multilinear forms. The beauty in this lies in how simple the calculations can become.

Here's all you need to know.

(1) A differential is an expression of the form "$dx_i$". It is the concatenation of "$d$" with any variable name.
(2) A one-form is a linear combination of differentials, such as $dx_1+dx_2$ or even $x_2 dx_1 - \exp(x_2) dx_2$. That is, the coefficients are functions of the variables.
(3) Forms can be "multiplied" using a wedge product, written $\wedge$. This product is anti-commutative (also called alternating): for any two one-forms $\omega$ and $\eta$,

$\omega \wedge \eta = -\eta \wedge \omega.$

This multiplication is linear and associative: in other words, it works in the familiar fashion. An immediate consequence is that $\omega \wedge \omega = -\omega \wedge \omega$, implying the square of any one-form is always zero. That makes multiplication extremely easy!
(4) For the purposes of manipulating the integrands that appear in probability calculations, an expression like $dx_1 dx_2 \cdots dx_{k+1}$ can be understood as $|dx_1\wedge dx_2 \wedge \cdots \wedge dx_{k+1}|$.
(5) When $y = g(x_1, \ldots, x_n)$ is a function, then its differential is given by differentiation:

$dy = dg(x_1, \ldots, x_n) = \frac{\partial g}{\partial x_1}(x_1, \ldots, x_n) dx_1 + \cdots + \frac{\partial g}{\partial x_n}(x_1, \ldots, x_n) dx_n.$
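The wedge rules can be tried out mechanically. The following is only a minimal sketch, not part of the original argument: it assumes the coefficients have already been evaluated at a point, so they are plain numbers rather than functions, and the representation (a dict from increasing index tuples to coefficients) and the name `wedge` are choices made here for illustration.

```python
from itertools import product


def wedge(a, b):
    """Wedge product of two forms with numeric coefficients.

    A form is a dict mapping a strictly increasing tuple of variable
    indices to a coefficient, e.g. {(1,): 2, (2,): -1} stands for
    2 dx_1 - dx_2 (hypothetical representation, for illustration only).
    """
    out = {}
    for (ia, ca), (ib, cb) in product(a.items(), b.items()):
        idx = ia + ib
        if len(set(idx)) < len(idx):
            continue  # a repeated differential: dx_i ^ dx_i = 0
        # Sorting the indices takes one swap of neighbours per inversion,
        # and each swap contributes a factor of -1 (anti-commutativity).
        inversions = sum(
            1
            for p in range(len(idx))
            for q in range(p + 1, len(idx))
            if idx[p] > idx[q]
        )
        key = tuple(sorted(idx))
        out[key] = out.get(key, 0) + (-1) ** inversions * ca * cb
    # Drop terms whose coefficients cancelled.
    return {k: v for k, v in out.items() if v != 0}


omega = {(1,): 2, (2,): 3}   # 2 dx_1 + 3 dx_2
eta = {(1,): 5, (2,): -1}    # 5 dx_1 - dx_2

print(wedge(omega, eta))     # coefficient of dx_1 ^ dx_2
print(wedge(eta, omega))     # same magnitude, opposite sign
print(wedge(omega, omega))   # empty: the square of a one-form is zero
```

Linearity is handled by looping over all pairs of terms, and the sign bookkeeping is the only place where the alternating rule enters.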

The connection with Jacobians is this: the Jacobian of a transformation $(y_1, \ldots, y_n) = F(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_n(x_1, \ldots, x_n))$ is, up to sign, simply the coefficient of $dx_1\wedge \dots \wedge dx_n$ that appears in computing

$dy_1 \wedge \cdots \wedge dy_n = df_1(x_1,\ldots, x_n)\wedge \cdots \wedge df_n(x_1, \ldots, x_n)$

after expanding each of the $df_i$ as a linear combination of the $dx_j$ in rule (5).
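As a small illustration of this claim (the constants $a_{ij}$ are hypothetical and are not part of the problem at hand), take the linear map $y_1 = a_{11}x_1 + a_{12}x_2$, $y_2 = a_{21}x_1 + a_{22}x_2$. Expanding and discarding the terms in $dx_1\wedge dx_1$ and $dx_2\wedge dx_2$ gives

$\eqalign{ dy_1\wedge dy_2 &= (a_{11}dx_1 + a_{12}dx_2)\wedge(a_{21}dx_1 + a_{22}dx_2) \\ &= a_{11}a_{22}\, dx_1\wedge dx_2 + a_{12}a_{21}\, dx_2\wedge dx_1 \\ &= (a_{11}a_{22} - a_{12}a_{21})\, dx_1\wedge dx_2. }$

The coefficient is the familiar $2$ by $2$ determinant, obtained without ever writing down a matrix.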

Example

The simplicity of this definition of a Jacobian is appealing. Not yet convinced it's worthwhile? Consider the well-known problem of converting two-dimensional integrals from Cartesian coordinates $(x, y)$ to polar coordinates $(r,\theta)$, where $(x,y) = (r\cos(\theta), r\sin(\theta))$. The following is an utterly mechanical application of the preceding rules, where "$(*)$" is used to abbreviate expressions that will obviously disappear by virtue of rule (3), which implies $dr\wedge dr = d\theta\wedge d\theta = 0$.

$\eqalign{ dx\, dy &= |dx\wedge dy| = |d(r\cos(\theta)) \wedge d(r\sin(\theta))| \\ &= |(\cos(\theta)dr - r\sin(\theta)d\theta) \wedge (\sin(\theta)dr + r\cos(\theta)d\theta)| \\ &= |(*)dr\wedge dr + (*) d\theta\wedge d\theta - r\sin(\theta)d\theta\wedge \sin(\theta)dr + \cos(\theta)dr \wedge r\cos(\theta) d\theta| \\ &= |0 + 0 + r\sin^2(\theta) dr\wedge d\theta + r\cos^2(\theta) dr\wedge d\theta| \\ &= |r(\sin^2(\theta) + \cos^2(\theta)) dr\wedge d\theta| \\ &= r\ dr\, d\theta. }$

The point of this is the ease with which such calculations can be performed, without messing about with matrices, determinants, or other such multi-indicial objects. You just multiply things out, remembering that wedges are anti-commutative. It's easier than what is taught in high school algebra.
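The polar-coordinate result is easy to spot-check numerically. This is a sketch under the assumption that evaluating at a few arbitrary points is a fair sanity check; the function name `polar_wedge_coefficient` and the test points are made up for illustration.

```python
import math


def polar_wedge_coefficient(r, theta):
    """Coefficient of dr ^ dtheta in dx ^ dy for x = r cos(theta), y = r sin(theta)."""
    # dx = x_r dr + x_t dtheta and dy = y_r dr + y_t dtheta
    x_r, x_t = math.cos(theta), -r * math.sin(theta)
    y_r, y_t = math.sin(theta), r * math.cos(theta)
    # In (x_r dr + x_t dtheta) ^ (y_r dr + y_t dtheta) the dr ^ dr and
    # dtheta ^ dtheta terms vanish, and dtheta ^ dr = -dr ^ dtheta
    # flips the sign of the second cross term.
    return x_r * y_t - x_t * y_r


for r, theta in [(0.5, 0.3), (2.0, 1.7), (3.5, -2.2)]:
    # The two printed numbers should agree up to rounding error.
    print(r, polar_wedge_coefficient(r, theta))
```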

Preliminaries

Let's see this differential algebra in action. In this problem, the PDF of the joint distribution of $(X_1, X_2, \ldots, X_{k+1})$ is the product of the individual PDFs (because the $X_i$ are assumed to be independent). In order to handle the change to the variables $Y_i$ we must be explicit about the differential elements that will be integrated. These form the term $dx_1 dx_2 \cdots dx_{k+1}$. Including the PDF gives the probability element

$\eqalign{ f_\mathbf{X}(\mathbf{x},\mathbf{\alpha})dx_1 \cdots dx_{k+1} &\propto \left(x_1^{\alpha_1-1}\exp\left(-x_1\right)\right)\cdots \left(x_{k+1}^{\alpha_{k+1}-1}\exp\left(-x_{k+1}\right) \right)dx_1 \cdots dx_{k+1} \\ &= x_1^{\alpha_1-1}\cdots x_{k+1}^{\alpha_{k+1}-1}\exp\left(-\left(x_1+\cdots+x_{k+1}\right)\right)dx_1 \cdots dx_{k+1}. }$

(The normalizing constant has been ignored; it will be recovered at the end.)

Staring at the definitions of the $Y_i$ a few seconds ought to reveal the utility of introducing the new variable

$Z = X_1 + X_2 + \cdots + X_{k+1},$

giving the relationships

$X_i = Y_i Z.$

This suggests making the change of variables $x_i \to y_i z$ in the probability element. The intention is to retain the first $k$ variables $y_1, \ldots, y_k$ along with $z$ and then integrate out $z$. To do so, we have to re-express all the $dx_i$ in terms of the new variables. This is the heart of the problem. It's where the differential algebra takes place. To begin with,

$dx_i = d(y_i z) = y_i dz + z dy_i.$

Note that since $Y_1+Y_2+\cdots+Y_{k+1}=1$, then

$0 = d(1) = d(y_1 + y_2 + \cdots + y_{k+1}) = dy_1 + dy_2 + \cdots + dy_{k+1}.$

Consider the one-form

$\omega = dx_1 + \cdots + dx_k = z(dy_1 + \cdots + dy_k) + (y_1+\cdots + y_k) dz.$

It appears in the differential of the last variable:

$\eqalign{ dx_{k+1} &= z dy_{k+1} + y_{k+1}dz \\ &= -z(dy_1 + \cdots + dy_k) + (1-y_1-\cdots-y_k)dz \\ &= dz - \omega. }$
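For instance, in the smallest case $k=1$ (a worked special case added here for illustration), the constraint gives $y_2 = 1-y_1$ and $dy_2 = -dy_1$, so that $\omega = dx_1 = z\,dy_1 + y_1\,dz$ and

$\eqalign{ dx_2 &= z\,dy_2 + y_2\,dz \\ &= -z\,dy_1 + (1-y_1)\,dz \\ &= dz - (z\,dy_1 + y_1\,dz) = dz - \omega, }$

which is the same identity with a single term in each sum.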

The value of this lies in the observation that

$dx_1 \wedge \cdots \wedge dx_k \wedge \omega = 0$

because, when you expand this product, there is one term containing $dx_1 \wedge dx_1 = 0$ as a factor, another containing $dx_2 \wedge dx_2 = 0$, and so on: they all disappear. Consequently,

$\eqalign{ dx_1 \wedge \cdots \wedge dx_k \wedge dx_{k+1} &= dx_1 \wedge \cdots \wedge dx_k \wedge dz - dx_1 \wedge \cdots \wedge dx_k \wedge \omega \\ &= dx_1 \wedge \cdots \wedge dx_k \wedge dz. }$

Whence (because all products $dz\wedge dz$ disappear),

$\eqalign{ dx_1 \wedge \cdots \wedge dx_{k+1} &= (z dy_1 + y_1 dz) \wedge \cdots \wedge (z dy_k + y_k dz) \wedge dz \\ &= z^k dy_1 \wedge \cdots \wedge dy_k \wedge dz. }$

The Jacobian is simply $|z^k| = z^k$, the coefficient of the differential product on the right-hand side.
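As an independent check, the value $z^k$ can be compared with the brute-force route the wedge calculation avoids: building the $k+1$ by $k+1$ matrix of derivatives and taking its determinant. The sketch below assumes the inverse map $x_i = y_i z$ for $i \le k$ and $x_{k+1} = z(1-y_1-\cdots-y_k)$; the helper names and the test point are hypothetical, chosen only for illustration.

```python
def jacobian_matrix(y, z):
    """Partial derivatives of (x_1, ..., x_{k+1}) with respect to (y_1, ..., y_k, z)."""
    k = len(y)
    rows = []
    for i in range(k):
        # x_i = y_i z: derivative z with respect to y_i, y_i with respect to z
        row = [0.0] * (k + 1)
        row[i] = z
        row[k] = y[i]
        rows.append(row)
    # x_{k+1} = z (1 - y_1 - ... - y_k)
    rows.append([-z] * k + [1.0 - sum(y)])
    return rows


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap changes the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n):
                m[r][j] -= factor * m[c][j]
    return det


y = [0.1, 0.25, 0.4]  # k = 3, arbitrary point with y_1 + y_2 + y_3 < 1
z = 1.7
# The two printed numbers should agree up to rounding error.
print(determinant(jacobian_matrix(y, z)), z ** len(y))
```

The amount of bookkeeping needed here, compared with the few lines of wedge algebra above, is the point of the differential-forms approach.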

Solution

The transformation $(x_1, \ldots, x_k, x_{k+1})\to (y_1, \ldots, y_k, z)$ is one-to-one: its inverse is given by $x_i = y_i z$ for $1\le i\le k$ and $x_{k+1} = z(1-y_1-\cdots-y_k)$. Therefore we don't have to fuss any more about the new probability element; it simply is

$\eqalign{ &(z y_1)^{\alpha_1-1}\cdots (z y_k)^{\alpha_k-1}\left(z(1-y_1-\cdots-y_k)\right)^{\alpha_{k+1}-1}\exp\left(-z\right)|z^k dy_1 \wedge \cdots \wedge dy_k \wedge dz| \\ &= \left(z^{\alpha_1+\cdots+\alpha_{k+1}-1}\exp\left(-z\right) dz\right)\left( y_1^{\alpha_1-1} \cdots y_k^{\alpha_k-1}\left(1-y_1-\cdots-y_k\right)^{\alpha_{k+1}-1}dy_1 \cdots dy_k\right). }$

That is manifestly a product of a Gamma$(\alpha_1+\cdots+\alpha_{k+1})$ distribution (for $Z$) and a Dirichlet$(\mathbf\alpha)$ distribution (for $(Y_1,\ldots, Y_k)$). In fact, since the original normalizing constant must have been the reciprocal of the product of the $\Gamma(\alpha_i)$, and the Gamma density for $Z$ accounts for a factor of $1/\Gamma(\alpha_1+\cdots+\alpha_{k+1})$, we deduce immediately that the constant left over for the $y_i$ is $\Gamma(\alpha_1+\cdots+\alpha_{k+1})$ divided by that product, enabling the PDF to be written

$f_\mathbf{Y}(\mathbf{y},\mathbf{\alpha}) = \frac{\Gamma(\alpha_1+\cdots+\alpha_{k+1})}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_{k+1})}\left( y_1^{\alpha_1-1} \cdots y_k^{\alpha_k-1}\left(1-y_1-\cdots-y_k\right)^{\alpha_{k+1}-1}\right).$
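The construction can also be checked by simulation. The following is a rough sketch, not a proof: it draws independent Gamma$(\alpha_i)$ variables with unit scale, normalizes them by their sum, and compares the sample means with the known Dirichlet means $\alpha_i/(\alpha_1+\cdots+\alpha_{k+1})$ (a standard fact that is not derived in this post). The parameter values, seed, and sample size are arbitrary assumptions.

```python
import random


def dirichlet_via_gammas(alpha, rng):
    """One draw of (Y_1, ..., Y_{k+1}) obtained by normalizing independent Gamma(alpha_i, 1) draws."""
    x = [rng.gammavariate(a, 1.0) for a in alpha]
    z = sum(x)  # Z = X_1 + ... + X_{k+1}
    return [xi / z for xi in x]


rng = random.Random(12345)   # fixed seed so the run is repeatable
alpha = [2.0, 3.0, 5.0]      # hypothetical parameters, k = 2
n = 20000
draws = [dirichlet_via_gammas(alpha, rng) for _ in range(n)]

sample_means = [sum(d[i] for d in draws) / n for i in range(len(alpha))]
dirichlet_means = [a / sum(alpha) for a in alpha]
print(sample_means)      # should be close to the line below
print(dirichlet_means)
```

Agreement of the means is only a necessary condition, so this is a sanity check rather than a verification of the full density.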

whuber