From a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability?

14

From the Wikipedia page on confidence intervals:

... if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will match the confidence level ...

And from the same page:

A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained.

If I understood it right, this last statement is made with the frequentist interpretation of probability in mind. However, from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability? And if it doesn't, what is wrong with the following reasoning?

If I have a process that I know produces a correct answer 95% of the time then the probability of the next answer being correct is 0.95 (given that I don't have any extra information regarding the process). Similarly, if someone shows me a confidence interval that is created by a process that will contain the true parameter 95% of the time, should I not be right in saying that it contains the true parameter with 0.95 probability, given what I know?

This question is similar to, but not the same as, "Why does a 95% CI not imply a 95% chance of containing the mean?" The answers to that question have focused on why a 95% CI does not imply a 95% chance of containing the mean from a frequentist perspective. My question is the same, but from a Bayesian probability perspective.

Rasmus Bååth
One way to think of this is that the 95% CI is a "long-run average". There are many ways to split up your "short-run" cases so as to get fairly arbitrary coverage, but when averaged they give 95% overall. Another, more abstract way is to generate $x_i \sim \text{Bernoulli}(p_i)$ for $i=1,2,\dots$ such that the long-run average of the $p_i$ is 0.95, i.e. $\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} p_i = 0.95$. There are infinitely many ways to do this. Here $x_i$ indicates whether the CI created with the $i$-th data set contained the parameter, and $p_i$ is the coverage probability for that case.
probabilityislogic

Answers:

11

Update: with the benefit of a few years' hindsight, I've written a more concise treatment of essentially the same material in response to a similar question.


How to Construct a Confidence Region

Let us begin with a general method for constructing confidence regions. It can be applied to a single parameter, to yield a confidence interval or set of intervals; and it can be applied to two or more parameters, to yield higher-dimensional confidence regions.

We assert that the observed statistics $D$ originate from a distribution with parameters $\theta$, namely the sampling distribution $s(d|\theta)$ over possible statistics $d$, and seek a confidence region for $\theta$ in the set of possible values $\Theta$. Define a Highest Density Region (HDR): the $h$-HDR of a PDF is the smallest subset of its domain that holds probability mass $h$. Denote the $h$-HDR of $s(d|\psi)$ as $H_\psi$, for any $\psi \in \Theta$. Then, the $h$ confidence region for $\theta$, given data $D$, is the set $C_D = \{\phi : D \in H_\phi\}$. A typical value of $h$ would be 0.95.

A Frequentist Interpretation

From the preceding definition of a confidence region it follows that

$$d \in H_\psi \longleftrightarrow \psi \in C_d$$
with $C_d = \{\phi : d \in H_\phi\}$. Now imagine a large set of (imaginary) observations $\{D_i\}$, taken under similar circumstances to $D$; i.e. they are samples from $s(d|\theta)$. Since $H_\theta$ holds probability mass $h$ of the PDF $s(d|\theta)$, $P(D_i \in H_\theta) = h$ for all $i$. Therefore, the fraction of $\{D_i\}$ for which $D_i \in H_\theta$ is $h$. And so, using the equivalence above, the fraction of $\{D_i\}$ for which $\theta \in C_{D_i}$ is also $h$.

This, then, is what the frequentist claim for the $h$ confidence region for $\theta$ amounts to:

Take a large number of imaginary observations $\{D_i\}$ from the sampling distribution $s(d|\theta)$ that gave rise to the observed statistics $D$. Then, $\theta$ lies within a fraction $h$ of the analogous but imaginary confidence regions $\{C_{D_i}\}$.
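
As a rough numerical illustration of this repeated-sampling claim, the sketch below simulates many imaginary sample means for a normal model with known standard deviation and counts how often the usual interval (sample mean plus or minus 1.96 standard errors) contains the fixed parameter. The function name `coverage_fraction`, the parameter values and the seed are assumptions chosen for illustration; none of them come from the answer itself.

```python
import math
import random

def coverage_fraction(theta, sigma, n, trials, z=1.959963984540054, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain theta."""
    rng = random.Random(seed)
    std_err = sigma / math.sqrt(n)
    half_width = z * std_err
    hits = 0
    for _ in range(trials):
        # The mean of n draws from N(theta, sigma^2) is distributed N(theta, sigma^2/n),
        # so each imaginary statistic D_i is drawn directly from that distribution.
        x_bar = rng.gauss(theta, std_err)
        if x_bar - half_width <= theta <= x_bar + half_width:
            hits += 1
    return hits / trials

# Hypothetical values: theta is fixed and unknown to the analyst, known sigma.
frac = coverage_fraction(theta=3.0, sigma=2.0, n=10, trials=100_000)
print(f"fraction of intervals containing theta: {frac:.3f}")
```

The fraction settles near $h = 0.95$ whatever fixed value of `theta` is used, which is a statement about the collection of intervals rather than about any single one.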

The confidence region $C_D$ therefore does not make any claim about the probability that $\theta$ lies somewhere. The reason is simply that there is nothing in the formulation that allows us to speak of a probability distribution over $\theta$. The interpretation is just elaborate superstructure, which does not improve the base. The base is only $s(d|\theta)$ and $D$, where $\theta$ does not appear as a distributed quantity, and there is no information we can use to address that. There are basically two ways to get a distribution over $\theta$:

  1. Assign a distribution directly from the information at hand: $p(\theta|I)$.
  2. Relate $\theta$ to another distributed quantity: $p(\theta|I) = \int p(\theta x|I)\,dx = \int p(\theta|xI)\,p(x|I)\,dx$.

In both cases, $\theta$ must appear on the left somewhere. Frequentists cannot use either method, because both require a heretical prior.

A Bayesian View

The most a Bayesian can make of the $h$ confidence region $C_D$, given without qualification, is simply the direct interpretation: that it is the set of $\phi$ for which $D$ falls in the $h$-HDR $H_\phi$ of the sampling distribution $s(d|\phi)$. It does not necessarily tell us much about $\theta$, and here's why.

The probability that $\theta \in C_D$, given $D$ and the background information $I$, is:

$$P(\theta \in C_D|DI) = \int_{C_D} p(\theta|DI)\,d\theta = \int_{C_D} \frac{p(D|\theta I)\,p(\theta|I)}{p(D|I)}\,d\theta$$
Notice that, unlike the frequentist interpretation, we have immediately demanded a distribution over $\theta$. The background information $I$ tells us, as before, that the sampling distribution is $s(d|\theta)$:
$$P(\theta \in C_D|DI) = \int_{C_D} \frac{s(D|\theta)\,p(\theta|I)}{p(D|I)}\,d\theta = \frac{\int_{C_D} s(D|\theta)\,p(\theta|I)\,d\theta}{p(D|I)}$$
i.e.
$$P(\theta \in C_D|DI) = \frac{\int_{C_D} s(D|\theta)\,p(\theta|I)\,d\theta}{\int s(D|\theta)\,p(\theta|I)\,d\theta}$$
Now this expression does not in general evaluate to $h$, which is to say, the $h$ confidence region $C_D$ does not always contain $\theta$ with probability $h$. In fact it can be starkly different from $h$. There are, however, many common situations in which it does evaluate to $h$, which is why confidence regions are often consistent with our probabilistic intuitions.
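
To see how far this probability can drift from $h$, here is a minimal sketch that evaluates it in closed form for one convenient case: a normal sampling distribution with known standard error and a conjugate normal prior on $\theta$. The conjugate prior, the helper names (`normal_cdf`, `posterior_prob_in_ci`) and all numerical values are assumptions introduced for illustration and are not part of the original answer.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile giving a central 95% region

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_prob_in_ci(D, se, prior_mean, prior_sd, z=Z95):
    """P(theta in C_D | D, I) for s(d|theta) = N(theta, se^2) and prior N(prior_mean, prior_sd^2)."""
    lo, hi = D - z * se, D + z * se  # the 95% confidence interval C_D
    # Standard conjugate update: precisions add, means are precision-weighted.
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + D / se**2)
    post_sd = math.sqrt(post_var)
    return normal_cdf((hi - post_mean) / post_sd) - normal_cdf((lo - post_mean) / post_sd)

# Informative prior centred at 0, data far from the prior mean: well below 0.95.
p_conflict = posterior_prob_in_ci(D=3.0, se=1.0, prior_mean=0.0, prior_sd=1.0)
# Same prior, data agreeing with it: above 0.95.
p_agree = posterior_prob_in_ci(D=0.0, se=1.0, prior_mean=0.0, prior_sd=1.0)
# Very diffuse prior: close to 0.95.
p_diffuse = posterior_prob_in_ci(D=3.0, se=1.0, prior_mean=0.0, prior_sd=1000.0)
print(p_conflict, p_agree, p_diffuse)
```

Under these assumptions the same 95% interval carries quite different posterior probability depending on how the data sit relative to the prior, and it approaches 0.95 only as the prior becomes nearly flat.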

For example, suppose that the prior joint PDF of $d$ and $\theta$ is symmetric in that $p_{d,\theta}(d,\theta|I) = p_{d,\theta}(\theta,d|I)$. (Clearly this involves an assumption that the PDF ranges over the same domain in $d$ and $\theta$.) Then, if the prior is $p(\theta|I) = f(\theta)$, we have $s(D|\theta)\,p(\theta|I) = s(D|\theta)\,f(\theta) = s(\theta|D)\,f(D)$. Hence

$$P(\theta \in C_D|DI) = \frac{\int_{C_D} s(\theta|D)\,d\theta}{\int s(\theta|D)\,d\theta}$$
i.e.
$$P(\theta \in C_D|DI) = \int_{C_D} s(\theta|D)\,d\theta$$
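From the definition of an HDR we know that for any $\psi \in \Theta$
$$\int_{H_\psi} s(d|\psi)\,dd = h$$
and therefore that
$$\int_{H_D} s(d|D)\,dd = h$$
or equivalently
$$\int_{H_D} s(\theta|D)\,d\theta = h$$
Therefore, given that $s(d|\theta)f(\theta) = s(\theta|d)f(d)$, $C_D = H_D$ implies $P(\theta \in C_D|DI) = h$. The antecedent satisfies
$$C_D = H_D \longleftrightarrow \forall\psi\,[\psi \in C_D \leftrightarrow \psi \in H_D]$$
Applying the equivalence near the top:
$$C_D = H_D \longleftrightarrow \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$$
Thus, the confidence region $C_D$ contains $\theta$ with probability $h$ if, for all possible values $\psi$ of $\theta$, the $h$-HDR of $s(d|\psi)$ contains $D$ if and only if the $h$-HDR of $s(d|D)$ contains $\psi$.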

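Now the symmetric relation $D \in H_\psi \leftrightarrow \psi \in H_D$ is satisfied for all $\psi$ when $s(\psi+\delta|\psi) = s(D-\delta|D)$ for all $\delta$ that span the support of $s(d|D)$ and $s(d|\psi)$. We can therefore form the following argument:

  1. $s(d|\theta)f(\theta) = s(\theta|d)f(d)$ (premise)
  2. $\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)]$ (premise)
  3. $\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)] \longrightarrow \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$
  4. $\therefore \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$
  5. $\forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D] \longrightarrow C_D = H_D$
  6. $\therefore C_D = H_D$
  7. $[s(d|\theta)f(\theta) = s(\theta|d)f(d) \wedge C_D = H_D] \longrightarrow P(\theta \in C_D|DI) = h$
  8. $\therefore P(\theta \in C_D|DI) = h$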

Let's apply the argument to a confidence interval on the mean of a 1-D normal distribution $(\mu,\sigma)$, given a sample mean $\bar{x}$ from $n$ measurements. We have $\theta = \mu$ and $d = \bar{x}$, so that the sampling distribution is

$$s(d|\theta) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-\frac{n}{2\sigma^2}(d-\theta)^2}$$
Suppose also that we know nothing about $\theta$ before taking the data (except that it's a location parameter) and therefore assign a uniform prior: $f(\theta) = k$. Clearly we now have $s(d|\theta)f(\theta) = s(\theta|d)f(d)$, so the first premise is satisfied. Let $s(d|\theta) = g\left((d-\theta)^2\right)$ (i.e. it can be written in that form). Then
$$s(\psi+\delta|\psi) = g\left((\psi+\delta-\psi)^2\right) = g(\delta^2)$$
and
$$s(D-\delta|D) = g\left((D-\delta-D)^2\right) = g(\delta^2)$$
so that
$$\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)]$$
whereupon the second premise is satisfied. Both premises being true, the eight-point argument leads us to conclude that the probability that $\theta$ lies in the confidence interval $C_D$ is $h$!
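
As a numerical cross-check of this conclusion, the following sketch evaluates the ratio of integrals for $P(\theta \in C_D|DI)$ directly, using the normal sampling distribution above and a flat prior (whose constant $k$ cancels). The midpoint-rule integrator, the truncation of the real line to plus or minus 12 standard errors, and the example values of `D`, `sigma` and `n` are assumptions made only for illustration.

```python
import math

def sampling_pdf(d, theta, sigma, n):
    """s(d|theta): density of the mean of n draws from N(theta, sigma^2)."""
    se = sigma / math.sqrt(n)
    return math.exp(-0.5 * ((d - theta) / se) ** 2) / (se * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

def prob_theta_in_ci_flat_prior(D, sigma, n, z=1.959963984540054):
    """Ratio of integrals of s(D|theta) over C_D and over (effectively) all theta."""
    se = sigma / math.sqrt(n)

    def integrand(theta):
        # Flat prior f(theta) = k: the constant cancels in the ratio, so it is omitted.
        return sampling_pdf(D, theta, sigma, n)

    numerator = integrate(integrand, D - z * se, D + z * se)
    # +/- 12 standard errors stands in for the whole real line.
    denominator = integrate(integrand, D - 12.0 * se, D + 12.0 * se)
    return numerator / denominator

p = prob_theta_in_ci_flat_prior(D=5.2, sigma=2.0, n=25)
print(f"P(theta in C_D | D, I) ~= {p:.4f}")
```

Changing `D` leaves the result unchanged in this setting, because with a flat prior the posterior for $\theta$ is the sampling density re-read as a function of $\theta$ and centred on `D`.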

We therefore have an amusing irony:

  1. The frequentist who assigns the $h$ confidence interval cannot say that $P(\theta \in C_D) = h$, no matter how innocently uniform $\theta$ looks before incorporating the data.
  2. The Bayesian who would not assign an $h$ confidence interval in that way knows anyhow that $P(\theta \in C_D|DI) = h$.

Final Remarks

We have identified conditions (i.e. the two premises) under which the $h$ confidence region does indeed yield probability $h$ that $\theta \in C_D$. A frequentist will baulk at the first premise, because it involves a prior on $\theta$, and this sort of deal-breaker is inescapable on the route to a probability. But for a Bayesian, it is acceptable; nay, essential. These conditions are sufficient but not necessary, so there are many other circumstances under which the Bayesian $P(\theta \in C_D|DI)$ equals $h$. Equally though, there are many circumstances in which $P(\theta \in C_D|DI) \neq h$, especially when the prior information is significant.

We have applied a Bayesian analysis just as a consistent Bayesian would, given the information at hand, including statistics $D$. But a Bayesian, if he possibly can, will apply his methods to the raw measurements instead: to the $\{x_i\}$, rather than $\bar{x}$. Oftentimes, collapsing the raw data into summary statistics $D$ destroys information in the data; and then the summary statistics are incapable of speaking as eloquently as the original data about the parameters $\theta$.

CarbonFlambe--Reinstate Monica
Would it be correct to say that a Bayesian is committed to take all the available information into account, while interpretation given in the question ignored D in some sense?
qbolec
Is this a good mental picture of the situation? Imagine a grayscale image, where the intensity of pixel $(x,y)$ is the joint probability of the real parameter being $y$ and the observed statistic being $x$. In each row $y$, we mark pixels which hold 95% of the mass of the row. For each observed statistic $x$, we define $CI(x)$ to be the set of rows which have marked pixels in column $x$. Now, if we choose $(x,y)$ randomly then $CI(x)$ will contain $y$ iff $(x,y)$ was marked, and the mass of marked pixels is 95% for each $y$. So, frequentists say that keeping $y$ fixed, the chance is 95%; the OP says that not fixing $y$ also gives 95%; and Bayesians fix y and don't know
qbolec
@qbolec It is correct to say that in the Bayesian method one cannot arbitrarily ignore some information while taking account of the rest. Frequentists say that for all $y$ the expectation of $y \in CI(x)$ (as a Boolean integer) under the sampling distribution $\text{prob}(x|y,I)$ is 0.95. The frequentist 0.95 is not a probability but an expectation.
CarbonFlambe--Reinstate Monica
6

from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability?

Two answers to this, the first being less helpful than the second

  1. There are no confidence intervals in Bayesian statistics, so the question doesn't pertain.

  2. In Bayesian statistics, there are however credible intervals, which play a similar role to confidence intervals. If you view priors and posteriors in Bayesian statistics as quantifying the reasonable belief that a parameter takes on certain values, then the answer to your question is yes, a 95% credible interval represents an interval within which a parameter is believed to lie with 95% probability.

If I have a process that I know produces a correct answer 95% of the time then the probability of the next answer being correct is 0.95 (given that I don't have any extra information regarding the process).

Yes, the process guesses a right answer with 95% probability.

Similarly if someone shows me a confidence interval that is created by a process that will contain the true parameter 95% of the time, should I not be right in saying that it contains the true parameter with 0.95 probability, given what I know?

Just the same as your process, the confidence interval guesses the correct answer with 95% probability. We're back in the world of classical statistics here: before you gather the data you can say there's a 95% probability of randomly gathered data determining the bounds of the confidence interval such that the mean is within the bounds.

With your process, after you've gotten your answer, you can't say based on whatever your guess was, that the true answer is the same as your guess with 95% probability. The guess is either right or wrong.

And just the same as your process, in the confidence interval case, after you've gotten the data and have an actual lower and upper bound, the mean is either within those bounds or it isn't, i.e. the chance of the mean being within those particular bounds is either 1 or 0. (Having skimmed the question you refer to it seems this is covered in much more detail there.)

How to interpret a confidence interval given to you if you subscribe to a Bayesian view of probability.

There are a couple of ways of looking at this

  1. Technically, the confidence interval hasn't been produced using a prior and Bayes theorem, so if you had a prior belief about the parameter concerned, there would be no way you could interpret the confidence interval in the Bayesian framework.

  2. Another widely used and respected interpretation of confidence intervals is that they provide a "plausible range" of values for the parameter (see, e.g., here). This de-emphasises the "repeated experiments" interpretation.

Moreover, under certain circumstances, notably when the prior is uninformative (doesn't tell you anything, e.g. flat), confidence intervals can coincide exactly with credible intervals. In these circumstances, as a Bayesian you could argue that had you taken the Bayesian route you would have gotten exactly the same results, and you could interpret the confidence interval in the same way as a credible interval.

TooTone
But surely confidence intervals exist even if I subscribe to a Bayesian view of probability; they just won't disappear, right? :) The situation I was asking about was how to interpret a confidence interval given to you if you subscribe to a Bayesian view of probability.
Rasmus Bååth
The problem is that confidence intervals aren't produced using a Bayesian methodology. You don't start with a prior. I'll edit the post to add something which might help.
TooTone
2

I'll give you an extreme example where they are different.

Suppose I create my 95% confidence interval for a parameter $\theta$ as follows. Start by sampling the data. Then generate a random number between 0 and 1. Call this number $u$. If $u$ is less than 0.95 then return the interval $(-\infty, \infty)$. Otherwise return the "null" (empty) interval.

Now over continued repetitions, 95% of the CIs will be "all numbers" and hence contain the true value. The other 5% contain no values, hence have zero coverage. Overall, this is a useless, but technically correct, 95% CI.

The Bayesian probability that any one of these intervals contains $\theta$ will be either 100% or 0%. Not 95%.
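
A minimal sketch of this procedure, assuming the "null" interval is represented by `None` and using a made-up parameter value and seed, shows both facts at once: long-run coverage near 95%, and every individual interval either certainly containing the parameter or certainly not.

```python
import math
import random

def useless_ci(data, rng):
    """Return (-inf, inf) with probability 0.95, otherwise None (the empty interval).

    The data are deliberately ignored, as in the procedure described above.
    """
    u = rng.random()
    if u < 0.95:
        return (-math.inf, math.inf)
    return None

def contains(interval, theta):
    """True if theta lies in the interval; the empty interval contains nothing."""
    return interval is not None and interval[0] < theta < interval[1]

rng = random.Random(1)  # arbitrary seed, for repeatability only
theta = 0.7             # hypothetical true parameter
trials = 100_000
intervals = [useless_ci(data=None, rng=rng) for _ in range(trials)]

coverage = sum(contains(iv, theta) for iv in intervals) / trials
print(f"long-run coverage: {coverage:.3f}")

# Once an interval is in hand, whether it contains theta is known with certainty
# from the interval alone, whatever theta actually is.
n_whole_line = sum(iv is not None for iv in intervals)
print(f"{n_whole_line} intervals are the whole real line, {trials - n_whole_line} are empty")
```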

probabilityislogic
So is it correct to say that before seeing a confidence interval there is a 95% probability that it will contain the true parameter, but for any given confidence interval the probability that it covers the true parameter depends on the data (and our prior)? To be honest, what I'm really struggling with is how useless confidence intervals sound (credible intervals I like, on the other hand) and the fact that I nevertheless will have to teach them to our students next week... :/
Rasmus Bååth
This question has some more examples, plus a very good paper comparing the two approaches
probabilityislogic
1

"from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability? "

In Bayesian statistics the parameter is not treated as an unknown fixed value; it is described by a distribution. There is no interval containing the "true value"; from a Bayesian point of view that does not even make sense. The parameter is a random variable, and you can know exactly the probability of its value lying between x_inf and x_max if you know the distribution. It's just a different mindset about parameters; Bayesians usually use the median or mean of the parameter's distribution as an "estimate". There is no confidence interval in Bayesian statistics; the analogous object is called a credible interval.

Now, from a frequentist point of view, the parameter is a "fixed value", not a random variable, so can you really obtain a probability interval (a 95% one)? Remember that it's a fixed value, not a random variable with a known distribution. That's why you pasted the text: "A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained."

The idea of repeating the experiment over and over is not Bayesian reasoning; it is frequentist reasoning. Imagine a real-life experiment that you can only do once in your lifetime: can you, or should you, build that confidence interval (from the classical point of view)?

But in real life the results can be pretty close (Bayesian vs. frequentist), which may be why it can be confusing.

blew