What is the relationship between sample size and the influence of the prior on the posterior?

17

If we have a small sample size, will the prior distribution have a large influence on the posterior distribution?

toby j
5
The intuition is clear: the more data you have, the less you have to rely on your priors. Not just a statistics lesson, but a life lesson! ;)
Lucas Reis

Answers:

27

Yes. The posterior distribution for a parameter θ, given a data set X, can be written as

$$p(\theta \mid X) \propto \underbrace{p(X \mid \theta)}_{\text{likelihood}} \cdot \underbrace{p(\theta)}_{\text{prior}}$$

or, as is more commonly displayed on the log scale,

$$\log(p(\theta \mid X)) = c + L(\theta; X) + \log(p(\theta))$$

The log-likelihood, L(θ;X)=log(p(X|θ)), scales with the sample size, since it is a function of the data (for independent observations it is a sum with one term per observation), while the prior density does not. Therefore, as the sample size increases, the absolute value of L(θ;X) grows while log(p(θ)) stays fixed (for a fixed value of θ), so the sum L(θ;X)+log(p(θ)) is increasingly dominated by L(θ;X).

So, to answer your question directly: the prior distribution becomes less and less relevant as it is outweighed by the likelihood, which means that for a small sample size the prior plays a much larger role. This agrees with intuition: you'd expect prior specifications to matter more when there isn't much data available to contradict them, whereas if the sample size is very large, the signal present in the data will outweigh whatever a priori beliefs were put into the model.
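To make the scaling argument concrete, here is a minimal sketch rather than anything definitive. It assumes a Bernoulli model with a Beta(2,2) prior, evaluates both terms at an arbitrarily chosen θ = 0.3, and supposes that half of the n trials are successes. The function names are made up for this illustration.

```python
import math

def log_beta_pdf(theta, a, b):
    """Log density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def bernoulli_log_likelihood(theta, successes, n):
    """Log-likelihood of n independent Bernoulli trials with the given success count."""
    return successes * math.log(theta) + (n - successes) * math.log(1 - theta)

theta = 0.3                             # assumed evaluation point, chosen arbitrarily
log_prior = log_beta_pdf(theta, 2, 2)   # does not depend on n

for n in (2, 10, 50, 500):
    # Assumed data: exactly half of the trials are successes.
    log_lik = bernoulli_log_likelihood(theta, n // 2, n)
    print(f"n={n:4d}  log-likelihood={log_lik:9.3f}  log-prior={log_prior:.3f}")
```

The log-prior column is the same on every row, while the magnitude of the log-likelihood grows in proportion to n, which is the effect described above.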

Macro
6
+1 Note that c also depends on n.
20

Here is an attempt to illustrate the last paragraph in Macro's excellent (+1) answer. It shows two priors for the parameter p in the Binomial(n,p) distribution. For a few different n, the posterior distributions are shown when x=n/2 has been observed. As n grows, both posteriors become more and more concentrated around 1/2.

For n=2 the difference is quite big, but for n=50 there is virtually no difference.

The two priors below are Beta(1/2,1/2) (black) and Beta(2,2) (red). The posteriors have the same colours as the priors that they are derived from.

[Figure: Posterior distributions]

(Note that for many other models and other priors, n=50 won't be enough for the prior not to matter!)
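For readers who want to check this numerically, here is a rough sketch. It relies on the standard conjugacy result that a Beta(a,b) prior with x successes in n Binomial trials gives a Beta(a+x, b+n-x) posterior, and it measures the gap between the two posteriors by total variation distance, approximated with a midpoint rule. The distance measure and the helper names are my own choices for illustration, not something taken from the plots.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def posterior_params(a, b, n, x):
    """Beta posterior parameters after observing x successes in n Binomial trials."""
    return a + x, b + n - x

def total_variation(params1, params2, grid_size=2000):
    """Approximate total variation distance between two Beta densities (midpoint rule)."""
    h = 1.0 / grid_size
    total = 0.0
    for i in range(grid_size):
        theta = (i + 0.5) * h  # midpoints avoid the endpoints 0 and 1
        total += abs(beta_pdf(theta, *params1) - beta_pdf(theta, *params2))
    return 0.5 * total * h

def prior_gap(n):
    """Distance between the posteriors from Beta(1/2,1/2) and Beta(2,2) when x = n/2."""
    x = n / 2
    post_black = posterior_params(0.5, 0.5, n, x)
    post_red = posterior_params(2, 2, n, x)
    return total_variation(post_black, post_red)

for n in (2, 10, 50):
    print(f"n={n:3d}  TV distance between posteriors: {prior_gap(n):.3f}")
```

The printed distance shrinks as n grows, in line with the plots: the two posteriors differ visibly at n=2 and are nearly indistinguishable at n=50.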

MånsT
4
Very cool illustrations, @MånsT. I de-italicized the words 'Beta' and 'Binomial' in your answer - I hope you don't mind.
Macro
Of course not, @Macro! I agree that it looks better this way.
MånsT