Can someone clarify the concept of a "sum of random variables"?

21

In my probability class, the term "sums of random variables" is used constantly. However, I'm stuck on what exactly that means.

Are we talking about the sum of a bunch of realizations of a random variable? If so, doesn't that add up to a single number? How does a sum of realizations of random variables lead us to a distribution, or to a cdf, pdf, or function of any kind? And if it isn't about realizations of random variables, what exactly is being added?

Gosset
1
By 'realizations of a random variable' I assume you mean the actual observed values. What is being summed in a 'sum of random variables' are the random variables before they are observed. Imagine figuring out the weight of the next 5 people to get on the elevator. You don't know their weights yet, so each one is a random variable. But you would probably like to know something about the distribution of the sum of their weights.
PeterR
@PeterR This is what I don't understand. How does it make sense to talk about adding something that doesn't have a value yet? Is it some kind of metaphorical sum?
Gosset
1
I think your problem is that you don't understand what a random variable is. If you get this concept then the sum will come easily too.
Aksakal
@Aksakal Isn't the fact that I posted this question evidence of that already? Perhaps if you do know it, you could clarify the concept?
Gosset
Take X+Y, where X, Y ~ Unif(1,6) are independent. It turns out that X+Y has a triangular distribution.
bdeonovic

Answers:

39

A physical, intuitive model of a random variable is to write down the name of every member of a population on one or more slips of paper--"tickets"--and put those tickets into a box. The process of thoroughly mixing the contents of the box, followed by blindly pulling out one ticket--exactly as in a lottery--models randomness. Non-uniform probabilities are modeled by introducing variable numbers of tickets in the box: more tickets for the more probable members, fewer for the less probable.

A random variable is a number associated with each member of the population. (Therefore, for consistency, every ticket for a given member has to have the same number written on it.) Multiple random variables are modeled by reserving spaces on the tickets for more than one number. We usually give those spaces names like X, Y, and Z. The sum of those random variables is the usual sum: reserve a new space on every ticket for the sum, read off the values of X, Y, etc. on each ticket, and write their sum in that new space. This is a consistent way of writing numbers on the tickets, so it's another random variable.

Figure

This figure portrays a box representing a population Ω={α,β,γ} and three random variables X, Y, and X+Y. It contains six tickets: the three for α (blue) give it a probability of 3/6, the two for β (yellow) give it a probability of 2/6, and the one for γ (green) gives it a probability of 1/6. In order to display what is written on the tickets, they are shown before being mixed.

The beauty of this approach is that all the paradoxical parts of the question turn out to be correct:

  • the sum of random variables is indeed a single, definite number (for each member of the population),

  • yet it also leads to a distribution (given by the frequencies with which the sum appears in the box), and

  • it still effectively models a random process (because the tickets are still blindly drawn from the box).

In this fashion the sum can simultaneously have a definite value (given by the rules of addition as applied to numbers on each of the tickets) while the realization--which will be a ticket drawn from the box--does not have a value until it is carried out.
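
The tickets-in-a-box picture can be sketched in a few lines of Python. The ticket counts (three for α, two for β, one for γ) follow the description of the figure; the numbers written on the tickets for X and Y are hypothetical placeholders chosen only for illustration, since the figure itself is not reproduced here.

```python
from collections import Counter
from fractions import Fraction

# Ticket counts 3:2:1 follow the figure description.
TICKET_COUNTS = {"alpha": 3, "beta": 2, "gamma": 1}
# The (X, Y) values per member are hypothetical, for illustration only.
HYPOTHETICAL_VALUES = {"alpha": (1, 2), "beta": (2, 2), "gamma": (3, 0)}

# Each ticket is (member, X, Y); every ticket for a member carries the same numbers.
box = [
    (member, *HYPOTHETICAL_VALUES[member])
    for member, count in TICKET_COUNTS.items()
    for _ in range(count)
]

# Reserve a new space on every ticket and write X + Y in it.
box_with_sum = [(member, x, y, x + y) for member, x, y in box]


def distribution(tickets, position):
    """Relative frequency of each number written at `position` on the tickets."""
    counts = Counter(ticket[position] for ticket in tickets)
    return {value: Fraction(n, len(tickets)) for value, n in sorted(counts.items())}


dist_x = distribution(box_with_sum, 1)
dist_y = distribution(box_with_sum, 2)
dist_sum = distribution(box_with_sum, 3)
print(dist_sum)
```

Every ticket carries one definite value of X+Y, while `dist_sum` records how often each value appears in the box, which is the distribution of the sum.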

This physical model of drawing tickets from a box is adopted in the theoretical literature and made rigorous with the definitions of sample space (the population), sigma algebras (with their associated probability measures), and random variables as measurable functions defined on the sample space.

This account of random variables is elaborated, with realistic examples, at "What is meant by a random variable?".

whuber
3
+1 exemplary post. I hope you don't mind the impertinent question, but what was the illustration done in?
Glen_b -Reinstate Monica
4
@Glen_b PowerPoint :-). The image of a box is from mymiddlec.files.wordpress.com/2013/09/empty-box.jpg. The tickets are PowerPoint graphics. (There's nothing impertinent about such questions!) I grouped the whole bunch, pasted it into Paint, and used that to save it as a .png file.
whuber
I'm missing something, but it seems like you are just writing multiple numerical labels on each member of the population. All alphas have X=1, Y=2 and hence X+Y=3. X, Y and X+Y have exactly the same distribution, shifted by a value here and a value there, because of the different labels.
MiloMinderbinder
1
@whuber - I should have written frequencies. I'm not well versed enough in mathematical jargon to say 'underlying probability measure'. Anyhow, you get my drift. I am beginning to see how I can play around with the numbers on the tickets to give it the desired probability distribution. At a cursory level this approach just seemed like wordplay with different 'labels', and hence I was not seeing it clearly. This would be like the 50th time you have helped me on this site. Thank you.
MiloMinderbinder
1
@Milo You're welcome. I see now that you were reacting to the example in this answer rather than the example I gave in the preceding comments. The answer's example indeed does have three different tickets with relative frequencies 1:2:3, and that is all that "probability measure" means in this case. This isn't just jargon, though: there's a profound need for the underlying concepts. See, inter alia, stats.stackexchange.com/questions/199280 for some nice accounts.
whuber
4

There is no secret behind this phrase; it is as simple as you might think: if X and Y are two random variables, their sum is X + Y, and this sum is a random variable as well. If X_1, X_2, X_3, ..., X_n are n random variables, their sum is X_1 + X_2 + X_3 + ... + X_n, and this sum is also a random variable (and a realization of this sum is a single number, namely the sum of n realizations).

Why do you talk so much about sums of random variables in class? One reason is the (amazing) central limit theorem: if we sum many independent random variables, then we can "predict" the distribution of this sum (almost) independently of the distribution of the individual variables in the sum! Under mild conditions (such as finite variances), the sum, once centered and rescaled, tends toward a normal distribution, and this is the likely reason why we observe the normal distribution so often in the real world.
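
A small exact computation can illustrate this. The sketch below assumes fair six-sided dice (an example chosen here, not part of the answer), builds the distribution of the sum of n dice by repeated convolution, and compares it with the normal density that has the same mean and variance. Comparing a PMF with a density point by point is a simplification that works here because the sum takes integer values.

```python
import math


def convolve(p, q):
    """PMF of the sum of two independent variables with PMFs p and q (dicts)."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def pmf_of_sum(single, n):
    """PMF of the sum of n independent copies of `single`."""
    total = dict(single)
    for _ in range(n - 1):
        total = convolve(total, single)
    return total


def normal_density(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


die = {face: 1 / 6 for face in range(1, 7)}  # assumption: a fair die
die_mean = sum(k * p for k, p in die.items())
die_var = sum((k - die_mean) ** 2 * p for k, p in die.items())


def max_gap(n):
    """Largest gap between the exact PMF of the sum and the matching normal density."""
    pmf = pmf_of_sum(die, n)
    mean, var = n * die_mean, n * die_var
    return max(abs(p - normal_density(k, mean, var)) for k, p in pmf.items())


for n in (1, 2, 5, 30):
    print(n, max_gap(n))
```

The printed gap shrinks as n grows, even though a single die is uniform and looks nothing like a bell curve.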

jolvi
3

A random variable (r.v.) is a relation between the occurrence of an event and a real number. Say, if it's raining the value of X is 1, and if it's not then it is 0. You can have another r.v. Y equal to 10 when it's cold and 100 when it's hot. So, if it's raining and cold, then X=1, Y=10, and X+Y=11.

The values of X+Y are 10 (not raining, cold), 11 (raining, cold), 100 (not raining, hot), and 101 (raining, hot). If you figure out the probabilities of these events, then you'll get the PMF of this new r.v. X+Y.
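
As a concrete sketch, suppose (purely as an assumption for illustration) that the four weather combinations have the joint probabilities below. The PMF of X+Y then follows by adding up the probabilities of the events that produce each value.

```python
from collections import defaultdict

# Hypothetical joint probabilities of the four events (chosen to sum to 1).
joint = {
    ("raining", "cold"): 0.3,
    ("raining", "hot"): 0.1,
    ("not raining", "cold"): 0.2,
    ("not raining", "hot"): 0.4,
}

X = {"raining": 1, "not raining": 0}  # values of X from the answer
Y = {"cold": 10, "hot": 100}          # values of Y from the answer

# Accumulate the probability of every event that yields a given value of X + Y.
pmf_sum = defaultdict(float)
for (rain, temp), prob in joint.items():
    pmf_sum[X[rain] + Y[temp]] += prob

for value in sorted(pmf_sum):
    print(value, pmf_sum[value])
```

Note that no independence between rain and temperature is needed: the joint probabilities of the events are all that matter.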

Aksakal
1

None of these answers gives a mathematically rigorous way to think about a sum of random variables. Note that X and Y need not be defined on the same outcome domain, and even if they are, X+Y cannot be understood as summing up two functions. Rather, they should first be extended to the domain Ω1×Ω2. For example, let X and Y be identical functions on Ω={Head,Tail}, where X(Head)=Y(Head)=1 and X(Tail)=Y(Tail)=0. The domain of X+Y should be {(Head,Tail),(Tail,Head),(Head,Head),(Tail,Tail)}. Now X and Y are functions on this product space whose values are determined solely by the first and second coordinates, respectively. The sum can now be understood as a summation of functions in the usual sense. Note also that the σ-field and probability measure must be defined anew. Saying that X and Y are independent is one way to specify the product measure.
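
This construction can be sketched as follows, assuming two fair coins and the product measure (that is, independent tosses). The fairness of the coins and the helper name `indicator` are assumptions added here for illustration.

```python
from fractions import Fraction
from itertools import product

omega = ["Head", "Tail"]
coin_prob = {"Head": Fraction(1, 2), "Tail": Fraction(1, 2)}  # assumption: fair coin


def indicator(outcome):
    """The original function on a single coin: 1 for Head, 0 for Tail."""
    return 1 if outcome == "Head" else 0


# Product space: ordered pairs (first toss, second toss).
product_space = list(product(omega, omega))

# Product measure, which encodes independence of the two tosses.
measure = {(a, b): coin_prob[a] * coin_prob[b] for a, b in product_space}


# X and Y extended to the product space: each reads one coordinate only.
def X(pair):
    return indicator(pair[0])


def Y(pair):
    return indicator(pair[1])


# X + Y is now an ordinary pointwise sum of two functions on the same domain.
pmf_sum = {}
for pair in product_space:
    s = X(pair) + Y(pair)
    pmf_sum[s] = pmf_sum.get(s, Fraction(0)) + measure[pair]

print(pmf_sum)
```

Replacing `measure` with a different probability assignment on the same four pairs would model dependent tosses without changing X, Y, or their sum as functions.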

Daniel Li