Intuitive explanation of the density of the transformed variable?


Suppose $X$ is a random variable with pdf $f_X(x)$. Then the random variable $Y = X^2$ has the pdf

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left(f_X(\sqrt{y}) + f_X(-\sqrt{y})\right) & y \ge 0 \\ 0 & y < 0 \end{cases}$$
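As a numerical sanity check on this formula, here is a minimal Python sketch (standard library only). It assumes $X$ is standard normal, for which $Y = X^2$ is known to be chi-square with one degree of freedom, with density $e^{-y/2}/\sqrt{2\pi y}$. The helper names (`density_of_square`, `std_normal_pdf`, `chi2_1_pdf`) are illustrative, not from any library.

```python
import math


def density_of_square(f_x, y):
    """Density of Y = X**2 at y, given the density f_x of X (formula above)."""
    if y <= 0:
        # 0 for y < 0 as in the formula; y == 0 is a single point of probability
        # zero, so this sketch also returns 0.0 there by convention.
        return 0.0
    r = math.sqrt(y)
    return (f_x(r) + f_x(-r)) / (2.0 * r)


def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def chi2_1_pdf(y):
    # Closed-form density of the square of a standard normal (chi-square, 1 df).
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)


for y in (0.1, 0.5, 1.0, 4.0, 9.0):
    print(f"y={y}: formula={density_of_square(std_normal_pdf, y):.6f}, "
          f"chi-square(1)={chi2_1_pdf(y):.6f}")
```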

I understand the calculus behind this, but I'm trying to think of a way to explain it to someone who doesn't know calculus. In particular, I'm trying to explain why the factor $\frac{1}{\sqrt{y}}$ appears in front. Here's my stab at it:

Suppose $X$ has a Gaussian distribution. Almost all the weight of its pdf lies between, say, $-3$ and $3$. But that maps to $0$ to $9$ for $Y$. So the heavy weight in the pdf for $X$ has been spread across a wider range of values in the transformation to $Y$. Therefore, for $f_Y(y)$ to be a true pdf, the extra heavy weight must be downweighted by the multiplicative factor $\frac{1}{\sqrt{y}}$.
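The numbers in this argument can be checked with a short Python sketch, assuming $X$ is standard normal (helper names are illustrative). It shows that the same mass, about 0.9973, sits on $[-3, 3]$ for $X$ and on $[0, 9]$ for $Y$, and that equal-width bins in $x$ do not map to equal-width bins in $y$: the bin $[0, 0.5]$ is squeezed to width 0.25, while bins further out are stretched, the bin starting at $k$ having width $k + 0.25$.

```python
import math


def std_normal_cdf(x):
    # Phi(x), written with the standard-library error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# The same probability mass, described in X and in Y = X**2.
p_x = std_normal_cdf(3.0) - std_normal_cdf(-3.0)   # P(-3 <= X <= 3)
p_y = math.erf(math.sqrt(9.0 / 2.0))                 # P(0 <= Y <= 9) = P(|X| <= 3)
print(f"P(-3 <= X <= 3) = {p_x:.4f}, P(0 <= Y <= 9) = {p_y:.4f}")
print(f"average height: {p_x / 6:.4f} over width 6 vs {p_y / 9:.4f} over width 9")

# The stretching is not uniform: equal-width x bins map to y bins of varying width.
for k in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    lo, hi = k, k + 0.5
    print(f"x in [{lo}, {hi}] -> y in [{lo**2}, {hi**2}], width {hi**2 - lo**2}")
```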

How does that sound?

If anyone can provide a better explanation, or a link to one in a paper or textbook, I'd greatly appreciate it. I find this variable-transformation example in several introductory mathematical probability/statistics books, but I never find an intuitive explanation with it :(

Lowndrul
I think your explanation is correct.
highBandWidth
The explanation is right, but it's purely qualitative: the precise form of the multiplicative factor is still a mystery. The -1/2 power simply appears magically. Thus, at some level, you have to do the same thing that Calculus does: find the rate of change of the square root function.
whuber

Answers:


PDFs are heights but they are used to represent probability by means of area. It therefore helps to express a PDF in a way that reminds us that area equals height times base.

Initially the height at any value $x$ is given by the PDF $f_X(x)$. The base is the infinitesimal segment $dx$, whence the distribution (that is, the probability measure as opposed to the distribution function) is really the differential form, or "probability element,"

$$PE_X(x) = f_X(x)\,dx.$$

This, rather than the PDF, is the object you want to work with both conceptually and practically, because it explicitly includes all the elements needed to express a probability.

When we re-express $x$ in terms of $y = x^2$, the base segments $dx$ get stretched (or squeezed): by squaring both ends of the interval from $x$ to $x + dx$ we see that the base of the $y$ area must be an interval of length

$$dy = (x + dx)^2 - x^2 = 2x\,dx + (dx)^2.$$

Because the product of two infinitesimals is negligible compared to the infinitesimals themselves, we conclude

$$dy = 2x\,dx, \quad \text{whence (for } x > 0\text{)} \quad dx = \frac{dy}{2x} = \frac{dy}{2\sqrt{y}}.$$
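The limiting argument can be seen numerically. This minimal Python sketch (the name `stretch_ratio` is hypothetical) computes the exact ratio $dy/dx$ over a finite interval starting at $x = 0.4$. Algebraically it equals $2x + dx$, so it approaches $2x = 2\sqrt{y}$ as $dx$ shrinks, which is the sense in which $(dx)^2$ is negligible.

```python
import math


def stretch_ratio(x, dx):
    """Exact dy/dx for y = x**2 over the finite interval [x, x + dx]."""
    dy = (x + dx) ** 2 - x ** 2      # equals 2*x*dx + dx**2
    return dy / dx                   # equals 2*x + dx


x = 0.4
y = x ** 2
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"dx={dx:g}: dy/dx={stretch_ratio(x, dx):.6f}, "
          f"2x={2 * x}, 2*sqrt(y)={2 * math.sqrt(y):.6f}")
```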

Having established this, the calculation is trivial because we just plug in the new height and the new width:

$$PE_X(x) = f_X(x)\,dx = f_X(\sqrt{y})\,\frac{dy}{2\sqrt{y}} = PE_Y(y).$$

Because the base, in terms of y, is dy, whatever multiplies it must be the height, which we can read directly off the middle term as

$$\frac{1}{2\sqrt{y}}\,f_X(\sqrt{y}) = f_Y(y).$$

This equation $PE_X(x) = PE_Y(y)$ is effectively a conservation of area (= probability) law.

[Figure: Two pdfs]

This graphic accurately shows narrow (almost infinitesimal) pieces of two PDFs related by y=x2. Probabilities are represented by the shaded areas. Due to the squeezing of the interval [0.32,0.45] via squaring, the height of the red region (y, at the left) has to be proportionally expanded to match the area of the blue region (x, at the right).
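The conservation of area in the figure can be reproduced with a short Python sketch. The figure does not say which distribution it uses, so this assumes $X$ is standard normal; the identity itself holds for any density. It integrates $f_X$ over $[0.32, 0.45]$ and the one-branch height $f_X(\sqrt{y})/(2\sqrt{y})$ over $[0.32^2, 0.45^2]$ with a plain midpoint rule (helper names are illustrative), and compares both with the exact probability computed from `math.erf`.

```python
import math


def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def height_in_y(y):
    # One-branch height from the answer: f_X(sqrt(y)) / (2 sqrt(y)).
    r = math.sqrt(y)
    return std_normal_pdf(r) / (2.0 * r)


def midpoint_area(f, a, b, n=10_000):
    """Midpoint-rule approximation of the area under f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


a, b = 0.32, 0.45
blue = midpoint_area(std_normal_pdf, a, b)        # area over [a, b] in x
red = midpoint_area(height_in_y, a * a, b * b)    # area over [a**2, b**2] in y
exact = std_normal_cdf(b) - std_normal_cdf(a)     # P(a <= X <= b)
print(f"blue (x) = {blue:.10f}, red (y) = {red:.10f}, exact = {exact:.10f}")
print(f"x width = {b - a:.4f}, y width = {b * b - a * a:.4f}")
```

The $y$ interval is narrower (about 0.100 versus 0.13), so its heights must be larger to hold the same area, as the caption describes.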

whuber
I love infinitesimals. This is a wonderful explanation. Thinking in terms of the $2x$, which can be clearly seen to emerge from the derivative of the transform, is much more intuitive than thinking in terms of the $\sqrt{y}$. I think that's where my sticking point was.
lowndrul
@whuber, I believe your first line should be $P(X \in (x, x + dx)) = f_X(x)\,dx$? Is that what you mean by $\text{pdf}_X(x)$? PS: I'm also curious about your thoughts on my answer (below).
Carlos Cinelli
@Carlos It's a little more rigorous to express the idea in the way I did at the outset: the PDF is what you multiply the Lebesgue measure dx by in order to get the given probability measure.
whuber
@whuber But if the pdf is what you multiply, then it is the term $f_X(x)$, not the product $f_X(x)\,dx$ as you wrote, right? It is not clear why you call the product $f_X(x)\,dx$ a pdf.
Carlos Cinelli
@Carlos: thank you; now I see your point. I made some edits to address it.
whuber

How about, if I manufacture objects that are always square and I know the distribution of the side lengths of the squares; what can I say about the distribution of the areas of the squares?

In particular, if I know the distribution of a random variable $X$, what can I say about $Y = X^2$? One thing that you can say is

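$$F_Y(c) = P(Y \le c) = P(X^2 \le c) = P(-\sqrt{c} \le X \le \sqrt{c}) = F_X(\sqrt{c}) - F_X(-\sqrt{c})$$

for $c \ge 0$ and continuous $X$.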
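The CDF identity can be checked by simulation. This is a sketch under the assumption $X \sim N(0, 1)$, with a fixed seed so runs are reproducible; it compares the share of simulated values with $X^2 \le c$ against $F_X(\sqrt{c}) - F_X(-\sqrt{c})$. With 200,000 draws the two should agree to within a few thousandths.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = random.Random(0)                               # fixed seed for reproducibility
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]   # X ~ N(0, 1), an assumption

results = {}
for c in (0.25, 1.0, 4.0):
    empirical = sum(x * x <= c for x in xs) / len(xs)   # share with X**2 <= c
    r = math.sqrt(c)
    formula = std_normal_cdf(r) - std_normal_cdf(-r)    # F_X(sqrt c) - F_X(-sqrt c)
    results[c] = (empirical, formula)
    print(f"c={c}: empirical={empirical:.4f}, formula={formula:.4f}")
```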

So a relationship is established between the CDF of Y and CDF of X; what is the relationship between their PDFs? We need calculus for that. Taking the derivatives of both sides gives you the results you wanted.
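The derivative step can be mimicked without symbolic calculus: a central finite difference of $F_Y$ should reproduce the density from the question. A minimal Python sketch, again assuming $X \sim N(0, 1)$ and using illustrative helper names `F_Y` and `f_Y`:

```python
import math


def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def F_Y(c):
    # CDF of Y = X**2 via the identity above, assuming X ~ N(0, 1).
    if c <= 0:
        return 0.0
    r = math.sqrt(c)
    return std_normal_cdf(r) - std_normal_cdf(-r)


def f_Y(y):
    # Density from the question: (f_X(sqrt y) + f_X(-sqrt y)) / (2 sqrt y), y > 0.
    r = math.sqrt(y)
    return (std_normal_pdf(r) + std_normal_pdf(-r)) / (2.0 * r)


h = 1e-5
slopes = {}
for y in (0.2, 1.0, 3.0):
    slopes[y] = (F_Y(y + h) - F_Y(y - h)) / (2 * h)   # central difference for F_Y'(y)
    print(f"y={y}: dF_Y/dy ~ {slopes[y]:.8f}, f_Y = {f_Y(y):.8f}")
```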

schenectady
(+1) Although this is not a full answer, it presents a good way to go about finding $f_Y$ and clearly shows why it is a sum of two pieces, one for each square root.
whuber
I don't get why pdf(x) = f(x)dx. What about pdf(x) dx = f(x), density = prob mass/interval... What am I getting wrong?
Fernando

Imagine we have a population and $Y$ is a summary of that population. Then $P(Y \in (y, y + \Delta y))$ is counting the proportion of individuals that have variable $Y$ in the range $(y, y + \Delta y)$. You can consider this as a "bin" of size $\Delta y$, and we are counting how many individuals are inside that bin.

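Now let us re-express those individuals in terms of another variable, $X$. Given that we know that $Y$ and $X$ are related as $Y = X^2$, the event $Y \in (y, y + \Delta y)$ is the same as the event $X^2 \in (x^2, (|x| + \Delta x)^2)$, where $y = x^2$ and $\Delta x$ is defined by $y + \Delta y = (|x| + \Delta x)^2$. That, in turn, is the same as the event $X \in (|x|, |x| + \Delta x)$ or $X \in (-|x| - \Delta x, -|x|)$. Thus, the individuals that are in the bin $(y, y + \Delta y)$ must be in one of the bins $(|x|, |x| + \Delta x)$ or $(-|x| - \Delta x, -|x|)$. In other words, those two bins together must contain the same proportion of individuals,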

$$P(Y \in (y, y + \Delta y)) = P(X \in (|x|, |x| + \Delta x)) + P(X \in (-|x| - \Delta x, -|x|))$$
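The bin-counting argument can be made concrete with a simulated population. In this Python sketch the population is a hypothetical set of 100,000 draws from $N(0, 1)$, and $\Delta x$ is chosen so that $(|x| + \Delta x)^2 = y + \Delta y$; every individual counted in the $Y$ bin is then counted in exactly one of the two $X$ bins.

```python
import math
import random

rng = random.Random(1)                               # fixed seed
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]   # hypothetical population, X ~ N(0, 1)

y, dy = 1.0, 0.2
x = math.sqrt(y)                 # plays the role of |x|
dx = math.sqrt(y + dy) - x       # chosen so that (|x| + dx)**2 = y + dy

in_y_bin = sum(y < v * v < y + dy for v in xs)       # Y in (y, y + dy)
in_right = sum(x < v < x + dx for v in xs)           # X in (|x|, |x| + dx)
in_left = sum(-x - dx < v < -x for v in xs)          # X in (-|x| - dx, -|x|)
print(f"dx = {dx:.4f} vs dy = {dy}")
print(f"individuals in the Y bin: {in_y_bin}; in the two X bins: {in_right + in_left}")
```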

Ok, now let's get to the density. First, we need to define what a probability density is. As the name suggests, it is the proportion of individuals per unit of bin size: we count the share of individuals in the bin and divide by the size of the bin. Since we have established that the proportions of individuals are the same here, but the sizes of the bins have changed, we conclude that the density will be different. But different by how much?

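As we said, the probability density is the proportion of people in the bin divided by the size of the bin, so the density of $Y$ is given by $f_Y(y) := \frac{P(Y \in (y, y + \Delta y))}{\Delta y}$. Analogously, the probability density of $X$ is given by $f_X(x) := \frac{P(X \in (x, x + \Delta x))}{\Delta x}$. (Strictly, the density is the limit of these ratios as the bin size shrinks to zero; for small bins they are good approximations.)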
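The "share divided by bin size" definition can be tried directly on simulated data. This sketch (hypothetical helper `density_estimate`, $X \sim N(0, 1)$ assumed) estimates $f_X(0.5)$ with shrinking bins $(0.5, 0.5 + \Delta x)$. A wide bin averages the density over the whole bin, so the estimate approaches the true value only as $\Delta x$ gets small.

```python
import math
import random

rng = random.Random(2)                               # fixed seed
n = 200_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]         # X ~ N(0, 1), an assumption


def density_estimate(values, lo, width):
    """Share of values in (lo, lo + width), divided by the bin width."""
    share = sum(lo < v < lo + width for v in values) / len(values)
    return share / width


x0 = 0.5
true_pdf = math.exp(-x0 * x0 / 2) / math.sqrt(2 * math.pi)
estimates = {}
for width in (0.5, 0.1, 0.02):
    estimates[width] = density_estimate(xs, x0, width)
    print(f"bin width {width}: estimate {estimates[width]:.4f}, true f_X(0.5) {true_pdf:.4f}")
```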

From our previous result that the population in each bin is the same we then have that,

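$$\begin{aligned}
f_Y(y) &:= \frac{P(Y \in (y, y + \Delta y))}{\Delta y} \\
&= \frac{P(X \in (|x|, |x| + \Delta x)) + P(X \in (-|x| - \Delta x, -|x|))}{\Delta y} \\
&= \frac{f_X(|x|)\,\Delta x + f_X(-|x|)\,\Delta x}{\Delta y} \\
&= \frac{\Delta x}{\Delta y}\left(f_X(|x|) + f_X(-|x|)\right) \\
&= \frac{\Delta x}{\Delta y}\left(f_X(\sqrt{y}) + f_X(-\sqrt{y})\right)
\end{aligned}$$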

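That is, the density $f_X(\sqrt{y}) + f_X(-\sqrt{y})$ changes by the factor $\frac{\Delta x}{\Delta y}$, which measures how much the bin is stretched or squeezed. In our case, since $y = x^2$ (taking $x = |x| > 0$), we have $y + \Delta y = (x + \Delta x)^2 = x^2 + 2x\Delta x + \Delta x^2$. If $\Delta x$ is tiny enough we can ignore $\Delta x^2$, which implies $\Delta y = 2x\Delta x$ and $\frac{\Delta x}{\Delta y} = \frac{1}{2x} = \frac{1}{2\sqrt{y}}$, and that is why the factor $\frac{1}{2\sqrt{y}}$ shows up in the transformation.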

Carlos Cinelli