
Benjamini and Hochberg developed the first (and still most widely used, I think) method to control the false discovery rate (FDR).

I want to start with a bunch of P values, each for a different comparison, and decide which are low enough to be called a "discovery", controlling the FDR to a specified value (say 10%). One assumption of the usual method is that the set of comparisons are either independent or have "positive dependency", but I can't figure out exactly what that phrase means in the context of analyzing a set of P values.

Harvey Motulsky
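For readers who want the mechanics behind the question, here is a minimal Python sketch of the usual Benjamini-Hochberg step-up rule. The function name `benjamini_hochberg` and the p-values are hypothetical, made up for illustration. The rule is: rank the $m$ p-values, find the largest rank $k$ with $p_{(k)} \le k q / m$, and call the $k$ smallest p-values discoveries. The rule itself says nothing about dependence. Whether it really keeps the FDR at $q$ is exactly what the independence or "positive dependency" assumption is about.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Return the (sorted) indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at target FDR level q."""
    m = len(p_values)
    # Rank the hypotheses by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step up: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Every hypothesis ranked at or below k_max is called a discovery.
    return sorted(order[:k_max])


# Made-up p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.055, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, q=0.10))  # → [0, 1, 2, 3, 4, 5]
```

Note the step-up behaviour. The third p-value (0.039) exceeds its own bound $3 \times 0.10 / 10 = 0.03$, yet it is still rejected, because a larger rank passes (the sixth: $0.055 \le 0.06$).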


Thanks for awarding the bounty to my answer, Harvey! Would you say that it resolves this issue for you, or are you looking for a more detailed exposition? I noticed that you have not accepted any answer yet, which is why I would like to clarify. Thanks. Perhaps you can comment or edit your Q to clarify what you would still like to have clarified.

amoeba says Reinstate Monica


@amoeba. The deadline for the bounty was upon me, and your answer was by far the best. Frankly, it never occurred to me at the time that awarding a bounty was not the same as accepting the answer. But I know they are distinct (I'll blame jet lag). But a complete answer really needs to include realistic examples where the set of P values does and does not have positive dependency. I'll wait a week before accepting an answer in the hope that someone can give both kinds of examples, so that the meaning is clear.

Harvey Motulsky

Probably this isn't really a satisfying example, but it's very easy to come up with p-values with and without positive dependency if we think about performing one-tailed tests on correlated variables. Imagine I'm testing whether A = 0 and also whether B = 0 against one-tailed alternatives (A > 0 and B > 0). Further imagine that B depends on A. For example, imagine I want to know whether a population contains more women than men, and also whether the population contains more ovaries than testes. Clearly, knowing the p-value of the first question changes our expectation of the p-value for the second.

Jacob Socolar

Thanks, Harvey. I hope it's clear that I wasn't trying to pressure you into accepting my answer (!!) but rather to clarify what kind of answer you are looking for in this thread, and what you would still like to have clarified. I'm not really an expert on this topic, just trying to make sense of it.

amoeba says Reinstate Monica

Both p-values change in the same direction, and this is PRD. But if I instead test the second hypothesis that the population has more testes than ovaries, our expectation for the second p-value decreases as the first p-value increases. This is not PRD.

Jacob Socolar


From your question, and in particular from your comments on other answers, it seems to me that you are mainly confused about the "big picture" here. Namely: what "positive dependency" refers to in this context at all, as opposed to the technical meaning of the PRDS condition. So I will talk about the big picture.

The big picture

Imagine that you are testing $N$ null hypotheses, and imagine that all of them are true. Each of the $N$ $p$-values is a random variable: repeating the experiment over and over would yield a different $p$-value each time, so one can talk about a distribution of $p$-values (under the null). It is well known that for any test with a continuous test statistic, the distribution of $p$-values under the null must be uniform. So, in the case of multiple testing, all $N$ marginal distributions of the $p$-values will be uniform.
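A quick simulation can make the uniformity claim concrete. This sketch assumes a two-sided one-sample z-test with known $\sigma = 1$, chosen only so that the standard library's `statistics.NormalDist` suffices. The helper `null_p_value` is hypothetical. If we draw data under the null many times, the fraction of $p$-values at or below any cutoff $t$ should be close to $t$.

```python
import random
from statistics import NormalDist

random.seed(1)
STD_NORMAL = NormalDist()


def null_p_value(n=10):
    """Two-sided one-sample z-test of H0: mean = 0 (sigma = 1 assumed known),
    applied to data that really are drawn from N(0, 1), so H0 is true."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = sum(xs) / n ** 0.5  # = sample mean / (sigma / sqrt(n))
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


p_values = [null_p_value() for _ in range(20_000)]

# If p is Uniform(0, 1), the fraction of p-values at or below t should be close to t.
for t in (0.01, 0.05, 0.10, 0.50):
    frac = sum(p <= t for p in p_values) / len(p_values)
    print(f"fraction of p <= {t:.2f}: {frac:.3f}")
```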

If all the data and all the tests are independent of each other, then the joint $N$-dimensional distribution of the $N$ $p$-values will also be uniform. This will be true, e.g., in a classic "jelly bean" situation when a bunch of independent things are being tested.
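For independent tests, this joint uniformity gives a closed form. With $N$ true nulls, each tested at level $\alpha$, the chance of at least one false positive is $1-(1-\alpha)^N$. A minimal sketch, assuming $N = 20$ and $\alpha = 0.05$ as illustrative values, compares that formula with a simulation of i.i.d. uniform $p$-values:

```python
import random

random.seed(2)
alpha, n_tests, n_sims = 0.05, 20, 20_000

# Independent tests, all nulls true: P(at least one p <= alpha) = 1 - (1 - alpha)^N.
exact = 1.0 - (1.0 - alpha) ** n_tests

# Monte Carlo check: independent null p-values are i.i.d. Uniform(0, 1),
# so their joint distribution is uniform on the N-dimensional unit cube.
hits = sum(
    any(random.random() <= alpha for _ in range(n_tests))
    for _ in range(n_sims)
)
print(f"exact: {exact:.3f}, simulated: {hits / n_sims:.3f}")
```

With these values the formula gives about 0.64. So a jelly-bean style screen of 20 independent null comparisons produces at least one "discovery" most of the time.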

However, it does not have to be like that. In principle, any pair of $p$-values can be correlated, either positively or negatively, or be dependent in some more complicated way. Consider testing all pairwise differences in means between four groups; this is $N=4\cdot 3/2=6$ tests. Each of the six $p$-values on its own is uniformly distributed. But they are all positively correlated. If (on a given attempt) group A by chance has a particularly low mean, then the A-vs-B comparison might yield a low $p$-value (this would be a false positive). But in this situation it is likely that A-vs-C, as well as A-vs-D, will also yield low $p$-values. So the $p$-values are obviously non-independent, and moreover they are positively correlated with each other.

This is, informally, what "positive dependency" refers to.
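The four-group example can be simulated as well. This is a sketch under simplifying assumptions of my own: normal data with known $\sigma = 1$, $n = 10$ per group, and pairwise two-sided z-tests on the group means instead of t-tests. Comparisons that share a group (A-vs-B and A-vs-C) should show a clearly positive correlation between their $p$-values.

```python
import itertools
import random
from statistics import NormalDist

random.seed(3)
STD_NORMAL = NormalDist()
n_per_group, n_sims = 10, 20_000
groups = "ABCD"
pairs = list(itertools.combinations(groups, 2))  # the N = 4*3/2 = 6 comparisons


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


p = {pair: [] for pair in pairs}
se_diff = (2.0 / n_per_group) ** 0.5  # SE of a difference of two group means (sigma = 1)
for _ in range(n_sims):
    # Every group comes from the same N(0, 1) population, so all 6 nulls are true.
    # The mean of n_per_group N(0, 1) draws is itself N(0, 1 / n_per_group).
    means = {g: random.gauss(0.0, (1.0 / n_per_group) ** 0.5) for g in groups}
    for g1, g2 in pairs:
        z = (means[g1] - means[g2]) / se_diff
        p[(g1, g2)].append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))

r_shared = pearson(p[("A", "B")], p[("A", "C")])    # comparisons share group A
r_disjoint = pearson(p[("A", "B")], p[("C", "D")])  # no group in common
print(f"corr(p_AB, p_AC) = {r_shared:.2f}, corr(p_AB, p_CD) = {r_disjoint:.2f}")
```

In this known-variance version, comparisons with no group in common (A-vs-B and C-vs-D) are independent, so their correlation comes out near zero. The positive dependence is carried by the shared groups.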

This seems to be a common situation in multiple testing. Another example would be testing for differences in several variables that are correlated with each other. Obtaining a significant difference in one of them increases the chances of obtaining a significant difference in another.

It is tricky to come up with a natural example where $p$-values would be "negatively dependent". @user43849 remarked in the comments above that for one-sided tests it is easy:

Imagine I'm testing whether A = 0 and also whether B = 0 against one-tailed alternatives (A > 0 and B > 0). Further imagine that B depends on A. For example, imagine I want to know whether a population contains more women than men, and also whether the population contains more ovaries than testes. Clearly, knowing the p-value of the first question changes our expectation of the p-value for the second. Both p-values change in the same direction, and this is PRD. But if I instead test the second hypothesis that the population has more testes than ovaries, our expectation for the second p-value decreases as the first p-value increases. This is not PRD.

But so far I have not been able to come up with a natural example with point nulls.
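The one-sided example in the quote can be mimicked with two correlated z-statistics, one for "women vs men" and one for "ovaries vs testes". The correlation $\rho = 0.8$ and the variable names are assumptions for illustration, and both point nulls are true. Testing B in the same direction as A gives positively correlated $p$-values. Testing B in the opposite direction flips the sign of the correlation, because the opposite-direction $p$-value is exactly one minus the same-direction one.

```python
import random
from statistics import NormalDist

random.seed(4)
STD_NORMAL = NormalDist()
rho, n_sims = 0.8, 20_000  # rho: assumed correlation between the two test statistics


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


p_a, p_b_same, p_b_opposite = [], [], []
for _ in range(n_sims):
    # Correlated z-statistics for "women vs men" (z_a) and "ovaries vs testes" (z_b);
    # both have mean 0, i.e. both point nulls are true.
    z_a = random.gauss(0.0, 1.0)
    z_b = rho * z_a + (1.0 - rho ** 2) ** 0.5 * random.gauss(0.0, 1.0)
    p_a.append(1.0 - STD_NORMAL.cdf(z_a))       # H1: A > 0
    p_b_same.append(1.0 - STD_NORMAL.cdf(z_b))  # H1: B > 0 (same direction)
    p_b_opposite.append(STD_NORMAL.cdf(z_b))    # H1: B < 0 (opposite direction)

r_same = pearson(p_a, p_b_same)
r_opposite = pearson(p_a, p_b_opposite)
print(f"same direction:     corr = {r_same:+.2f}")
print(f"opposite direction: corr = {r_opposite:+.2f}")
```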

Now, the exact mathematical formulation of "positive dependency" that guarantees the validity of the Benjamini-Hochberg procedure is rather tricky. As mentioned in other answers, the main reference is Benjamini & Yekutieli 2001. They show that the PRDS property ("positive regression dependency on each one from a subset") guarantees the validity of the Benjamini-Hochberg procedure. PRDS is a relaxed form of the PRD property ("positive regression dependency"). This means that PRD implies PRDS, and hence PRD also guarantees the validity of the Benjamini-Hochberg procedure.
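A small simulation can illustrate what the PRDS guarantee buys. Benjamini & Yekutieli show that under PRDS the BH procedure controls the FDR at $q\,m_0/m$, where $m_0$ is the number of true nulls; under independence this holds with equality. The sketch below assumes one-sided z-tests whose statistics share a common Gaussian factor, which gives them equal nonnegative correlation $\rho$. This is a multivariate normal case with nonnegative correlations, which B&Y treat as PRDS. The values $m = 20$, $m_0 = 15$, an effect size of 3 and $q = 0.10$ are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(5)
STD_NORMAL = NormalDist()


def benjamini_hochberg(p_values, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    return set(order[:k_max])


def simulated_fdr(rho, m=20, m0=15, shift=3.0, q=0.10, n_sims=4000):
    """Average false discovery proportion of BH over n_sims runs of m one-sided
    z-tests whose statistics are equicorrelated with correlation rho >= 0.
    The first m0 hypotheses are true nulls; the rest have mean `shift`."""
    total_fdp = 0.0
    for _ in range(n_sims):
        shared = random.gauss(0.0, 1.0)  # common factor that induces correlation rho
        p = []
        for i in range(m):
            mean = 0.0 if i < m0 else shift
            z = mean + rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * random.gauss(0.0, 1.0)
            p.append(1.0 - STD_NORMAL.cdf(z))  # one-sided p-value for H1: mean > 0
        rejected = benjamini_hochberg(p, q)
        false_discoveries = sum(1 for i in rejected if i < m0)
        total_fdp += false_discoveries / max(len(rejected), 1)
    return total_fdp / n_sims


fdr = {rho: simulated_fdr(rho) for rho in (0.0, 0.5, 0.9)}
for rho, value in fdr.items():
    print(f"rho = {rho}: estimated FDR = {value:.3f}  (PRDS bound q*m0/m = 0.075)")
```

All three estimates should stay at or below roughly 0.075, and the independent case should sit close to that bound.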

For the definitions of PRD/PRDS, see @user43849's answer (+1) and the Benjamini & Yekutieli paper. The definitions are rather technical, and I do not have a good intuitive understanding of them. In fact, B&Y also mention several other related concepts: multivariate total positivity of order two (MTP2) and positive association. According to B&Y, they are related as follows (the diagram is mine):


MTP2 implies PRD, which implies PRDS, which guarantees the correctness of the BH procedure. PRD also implies PA, but PA does not imply PRDS.

amoeba says Reinstate Monica

Would an example of negative dependence be post hoc pairwise tests following, say, a one-way three-group ANOVA? Suppose $\mu_{A} < \mu_{B} < \mu_{C}$, but $\bar{x}_{B} < \mu_{B}$, while $\bar{x}_{A}\approx \mu_{A}$ and $\bar{x}_{C}\approx \mu_{C}$. Then $p_{A\text{ vs. }B}$ is less likely to reject (because under $H_{0}$, $|\bar{x}_{A}-\bar{x}_{B}| < |\bar{x}_{B}-\bar{x}_{C}|$), but because of the dependence, $p_{B\text{ vs. }C}$ is more likely to reject.

Alexis


@Alexis I was thinking along these lines, but I don't think this works, because we have to consider what happens under the null. In this case, the null is that $\mu_A=\mu_B=\mu_C$, so your reasoning breaks down.

amoeba says Reinstate Monica

So, if it is hard to think of negative dependence situations, is the Benjamini-Hochberg procedure valid for situations like post hoc pairwise tests following rejection of an omnibus null hypothesis about independent groups (e.g., unblocked ANOVA, Cochran's Q, Kruskal-Wallis, etc.)?

Alexis

@Alexis I think this is correct, yes. I am still trying to come up with a natural example with negative dependence...

amoeba says Reinstate Monica

ROCK! You go, girl! :) (For the ungendered meanings of the word "girl" ;).

Alexis


Great question! Let's step back and understand what Bonferroni did, and why it was necessary for Benjamini and Hochberg to develop an alternative.

In recent years it has become necessary, even mandatory, to perform a procedure called multiple testing correction. This is due to the growing number of tests performed simultaneously in high-throughput science, especially in genetics since the advent of genome-wide association studies (GWAS). Excuse my reference to genetics, as it is my area of work. If we perform 1,000,000 tests simultaneously at $P = 0.05$, we would expect $50,000$ false positives (if all the null hypotheses were true). This is ridiculously large, so we must control the level at which significance is assessed. The Bonferroni correction divides the acceptance threshold (0.05) by the number of independent tests $M$, giving $(0.05/M)$. This corrects the family-wise error rate ($FWER$).

This is true because, for independent tests, the FWER is related to the test-wise error rate ($TWER$) by the equation $FWER = 1 - (1 - TWER)^M$. That is, one minus the quantity (one minus the test-wise error rate) raised to the power of the number of independent tests performed. Setting $FWER = 0.05$ and solving gives $TWER = 1 - (1 - 0.05)^{1/M} \approx \frac{0.05}{M}$, which is the adjusted acceptance P value for $M$ completely independent tests.
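The numbers in the two paragraphs above can be checked in a few lines. This sketch (plain Python, standard library only) computes three things: the expected number of false positives, the Bonferroni threshold $0.05/M$, and the exact root of $1-(1-TWER)^M = 0.05$. That exact root is the Sidak threshold. The functions `expm1` and `log1p` are used only to avoid round-off for numbers this close to zero.

```python
import math

alpha = 0.05
M = 1_000_000

# Expected false positives if all M nulls are true and each test uses P = 0.05.
expected_false_positives = alpha * M

# Bonferroni per-test threshold.
bonferroni = alpha / M

# Exact solution of 1 - (1 - TWER)^M = alpha (Sidak); expm1/log1p avoid
# round-off when computing a number this close to zero.
sidak = -math.expm1(math.log1p(-alpha) / M)

# Family-wise error rate each threshold gives for M independent tests.
fwer_sidak = -math.expm1(M * math.log1p(-sidak))
fwer_bonferroni = -math.expm1(M * math.log1p(-bonferroni))

print(f"expected false positives: {expected_false_positives:,.0f}")
print(f"Bonferroni: {bonferroni:.4e}  (FWER {fwer_bonferroni:.5f})")
print(f"Sidak:      {sidak:.4e}  (FWER {fwer_sidak:.5f})")
```

For $M = 10^6$ the Bonferroni threshold is $5 \times 10^{-8}$, the familiar genome-wide significance level. The Sidak threshold is only about 2.6% larger, so under independence Bonferroni gives a family-wise error rate slightly below 0.05 (about 0.0488).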

The problem we now face, as Benjamini and Hochberg did, is that not all tests are completely independent. Therefore, the Bonferroni correction, though robust and flexible, is an over-correction. Consider the case in genetics where two genes are linked in a situation called linkage disequilibrium. That is, variants at the two loci are inherited together more often than chance would predict, so when one carries a mutation, the other is more likely to as well. These are obviously not independent tests, though the Bonferroni correction assumes they are. This is where we start to see that dividing the P value by M creates an artificially low threshold. The tests are assumed independent but really influence each other, ergo creating an M that is too large for our real situation, where things are not independent.

The procedure suggested by Benjamini and Hochberg, and augmented by Yekutieli (and many others), is more liberal than Bonferroni; in fact, the Bonferroni correction is now used only in the very largest studies. This is because, in FDR, we assume some interdependence among the tests. Counting all of them therefore gives an M that is too large and unrealistic, which eliminates results that we actually care about. So, in the case of 1000 tests that are not independent, the true M would not be 1000 but something smaller, because of the dependencies. Thus, when we divide 0.05 by 1000, the threshold is too strict and excludes some tests that may be of interest.

I'm not sure whether you care about the mechanics behind controlling for dependence; if you do, I have linked the Yekutieli paper for your reference. I'll also attach a few other things for your information and curiosity.

I hope this has helped in some way. If I have misrepresented anything, please let me know.

~ ~ ~

References

Yekutieli paper on positive dependency: http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_yekutieli_ANNSTAT2001.pdf

(see 1.3 - The problem).

Explanation of Bonferroni and other things of interest: Nature Reviews Genetics. Statistical power and significance testing in large-scale genetic studies - Pak C Sham and Shaun M Purcell

(see Box 3.)

http://en.wikipedia.org/wiki/Familywise_error_rate

EDIT:

In my previous answer I did not directly define positive dependency, which was what was asked. In the Yekutieli paper, section 2.2 is titled Positive dependency, and I recommend it, as it is very detailed. However, I think we can make it a bit more succinct.

[Image: definition of the PRDS property on $I_0$]

$X$ is our whole set of test statistics, and $I_0$ is our set of test statistics that correctly support the null. Thus, for $X$ to be PRDS (positively dependent) on $I_0$, the probability of $X$ being an element of $I_0$ (the nulls) increases over a nondecreasing set of test statistics $x$ (elements of $X$).

Interpreting this: as we order our $P$-values from lowest to highest, the probability of being part of the null set of test statistics is lowest at the smallest P value, and increases from there. The FDR sets a boundary on this list of test statistics such that the probability of being part of the null set is 0.05. This is what we are doing when controlling for FDR.

In summary, the property of positive dependency is really the property of positive regression dependency of our whole set of test statistics on our set of true null test statistics, and we control for an FDR of 0.05. Thus, as P values go from the bottom up (the step-up procedure), they increase in probability of being part of the null set.

My former answer in the comments about the covariance matrix was not incorrect, just a little bit vague. I hope this helps a little bit more.

Chris C


Thanks. You provide a clear overview of controlling family wise error rates (Bonferroni etc.) vs controlling the FDR, but I still don't understand what "positive dependency" means. Consider that I have 1000 P values, testing expression of 1000 different genes comparing people with and without some disease. I use the BH method to decide which of these comparisons are "discoveries". What does "positive dependency" mean in this context?

Harvey Motulsky


A small but important note: Bonferroni makes absolutely no assumption regarding independence. In fact, it will cover correctly in the mutually exclusive case, which, in a way, is about as far from independent as you can get. There is a correction procedure (Sidak) that does assume independence and will more strongly control FWER under that assumption. A few other aspects of this answer could use some light touch-up as well.

cardinal


@ChrisC I still don't understand. "Covariance matrix between elements"? I start with a list of P values, and want to decide which are low enough to be called "discoveries" worth following up on (with the FDR controlled). What are the elements of the covariance matrix? Say each P value is comparing expression of a particular gene between groups, and there are many such genes. For each gene, a t test compares the groups resulting in a P value. What does it mean, in this situation, for "elements to vary together" or having "positive correlations between themselves"?

Harvey Motulsky


@ChrisC Thanks. It is becoming clearer, but I still don't really grasp what this assumption means. The whole point of knowing about the assumption behind the method is to know when you are likely to be violating it. So it would help to list some scenarios where the assumption is not true. When would a lower P value not be associated with a higher probability of the null hypothesis being false?

Harvey Motulsky


This does not answer the question.

Alexis


I found this pre-print helpful in understanding the meaning. It should be said that I offer this answer not as an expert in the topic, but as an attempt at understanding to be vetted and validated by the community.

Thanks to amoeba for very helpful observations about the difference between PRD and PRDS (see comments).

Positive regression dependency (PRD) means the following: Consider the subset of p-values (or equivalently, test statistics) that correspond to true null hypotheses. Call the vector of these p-values $p$. Let $C$ be a set of vectors with length equal to the length of $p$, and let $C$ have the following property:

If some vector $q$ is in $C$, and
we construct some vector $r$ of the same length as $q$ so that all elements of $r$ are less than the corresponding elements of $q$ ($r_i < q_i$ for all $i$), then
$r$ is also in $C$.

(This means that $C$ is a "decreasing set".)

Assume we know something about the values of some of the elements of $p$, namely $p_1 < B_1, \ldots, p_n < B_n$. PRD means that the probability that $p$ is in $C$ never increases as $B_1, \ldots, B_n$ increase.

In plain language, notice that we can formulate an expectation for any element $p_i$. Since $p_i$ corresponds to a true null, its unconditional distribution should be uniform from 0 to 1. But if the p-values are not independent, then our conditional distribution for $p_i$ given some other elements of $p_1, \ldots, p_n$ might not be uniform. PRD means that increasing the values of $p_1, \ldots, p_n$ can never increase the probability that another element $p_i$ has a lower value.

Benjamini and Yekutieli (2001) show that the Benjamini and Hochberg procedure controls the FDR under a condition they term positive regression dependence on a subset (PRDS). PRDS is similar to, and implied by, PRD. However, it is a weaker condition, because it only conditions on one of $p_1, \ldots, p_n$ at a time.

To rephrase in plain language: again consider the set of p-values that correspond to true null hypotheses. For any one of these p-values (call it $p_n$), imagine that we know $p_n < B$, where $B$ is some constant. Then we can formulate a conditional expectation for the remaining p-values, given that $p_n < B$. If the p-values are independent, then our expectation for the remaining p-values is the uniform distribution from 0 to 1. But if the p-values are not independent, then knowing $p_n < B$ might change our expectation for the remaining p-values. PRDS says that increasing the value of $B$ must not decrease our expectation for any of the remaining p-values corresponding to the true null hypotheses.
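The PRDS condition in the previous paragraph can be checked numerically for a pair of null $p$-values. This sketch assumes two one-sided z-tests whose statistics have correlation $\rho$. The helper `conditional_means` and the cutoffs $B$ are illustrative choices. The sketch estimates $E[p_j \mid p_i < B]$ for increasing $B$. With $\rho = 0.5$ the estimates should rise towards 0.5 as $B$ grows, consistent with PRDS. With $\rho = -0.5$ they should fall, so PRDS fails. The negative case corresponds to testing in opposite directions, as in the ovaries/testes example from the comments.

```python
import random
from statistics import NormalDist

random.seed(6)
STD_NORMAL = NormalDist()


def conditional_means(rho, bounds=(0.05, 0.2, 0.5, 1.0), n_sims=50_000):
    """Estimate E[p_j | p_i < B] for each B in `bounds`, where p_i and p_j are
    one-sided null p-values whose z-statistics have correlation rho."""
    pairs = []
    for _ in range(n_sims):
        z_i = random.gauss(0.0, 1.0)
        z_j = rho * z_i + (1.0 - rho ** 2) ** 0.5 * random.gauss(0.0, 1.0)
        pairs.append((1.0 - STD_NORMAL.cdf(z_i), 1.0 - STD_NORMAL.cdf(z_j)))
    means = []
    for b in bounds:
        selected = [p_j for p_i, p_j in pairs if p_i < b]
        means.append(sum(selected) / len(selected))
    return means


positive = conditional_means(0.5)   # expected to rise with B (consistent with PRDS)
negative = conditional_means(-0.5)  # expected to fall with B (PRDS violated)
print("rho = +0.5:", [round(v, 3) for v in positive])
print("rho = -0.5:", [round(v, 3) for v in negative])
```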

Edited to add:

Here's a putative example of a system that is not PRDS (R code below). The logic is that when samples a and b are very similar, it is more likely that their product will be atypical. I suspect that this effect (and not the non-uniformity of p-values under the null for the (a*b), (c*d) comparison) is driving the negative correlation in the p-values, but I cannot be sure. The same effect appears if we do a t-test for the second comparison (rather than a Wilcoxon), but the distribution of p-values still isn't uniform, presumably due to violations of the normality assumption.

ab <- rep(NA, 100000)  # We'll repeat the comparison many times to assess the relationships among p-values.
abcd <- rep(NA, 100000)

for(i in 1:100000){
  a <- rnorm(10)    # Draw 4 samples from identical populations.
  b <- rnorm(10)
  c <- rnorm(10)
  d <- rnorm(10)

  ab[i] <- t.test(a,b)$p.value          # We perform 2 comparisons and extract p-values
  abcd[i] <- wilcox.test((a*b),(c*d))$p.value
}

summary(lm(abcd ~ ab))    # The p-values are negatively correlated

ks.test(ab, punif)    # The p-values are uniform for the first test
ks.test(abcd, punif)   # but non-uniform for the second test.
hist(abcd)

Jacob Socolar

I'm sorry, but I don't really follow this.

Harvey Motulsky

Does the new final paragraph clear it up at all?

Jacob Socolar

@amoeba, yeah, I think you're right. The Yekutieli papers linked by previous posters are treatments of PRDS. As far as I can tell, PRD is the same property, but across all of the test statistics (or p-values), not just the subset corresponding to true nulls.

Jacob Socolar


Yup, you're absolutely right. Editing now.

Jacob Socolar


Interesting example, but the effect is super-weak: I get a correlation coefficient (between ab and abcd) of around -0.03... But I don't get it: why do you say that "when samples a and b are very similar, it is more likely that their product will be atypical"?

ameba dice Reinstate Monica


In their paper, Benjamini and Yekutieli provide some examples of how positive regression dependence (PRD) is different from just being positively associated. The FDR control procedure relies on a weaker form of PRD which they call PRDS (i.e. PRD on each one from a subset of variables).

Positive dependency was originally proposed in the bivariate setting by Lehmann, but the multivariate version of this concept, known as positive regression dependency is what is relevant to multiple testing.

Here is a relevant excerpt from p. 6:

Nevertheless, PRDS and positive association do not imply one another, and the difference is of some importance. For example, a multivariate normal distribution is positively associated iff all correlations are nonnegative. Not all correlations need be nonnegative for the PRDS property to hold (see Section 3.1, Case 1 below). On the other hand, a bivariate distribution may be positively associated, yet not positive regression dependent [Lehmann (1966)], and therefore also not PRDS on any subset. A stricter notion of positive association, Rosenbaum’s (1984) conditional (positive) association, is enough to imply PRDS: $\mathbf{X}$ is conditionally associated, if for any partition $(\mathbf{X}_1, \mathbf{X}_2)$ of $\mathbf{X}$ , and any function $h(\mathbf{X}_1)$ , $\mathbf{X}_2$ given $h(\mathbf{X}_1)$ is positively associated. It is important to note that all of the above properties, including PRDS, remain invariant to taking comonotone transformations in each of the coordinates [Eaton (1986)].
$\ldots$ Background on these concepts is clearly presented in Eaton (1986), supplemented by Holland and Rosenbaum (1986).

user3303


Positive dependence in this case means that the tests are positively correlated. The idea, then, is that if the variables in the set of tests that you have P-values for are positively correlated, then the variables are not independent.

If you think back to a Bonferroni p-value correction, for example, you can guarantee that the family-wise type I error rate is at most 10% over, say, 100 statistically independent tests by setting your significance threshold to 0.1/100 = 0.001. But what if each of those 100 tests is correlated in some way? Then you haven't really performed 100 separate tests.
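Bonferroni's bound does not actually require independence: by the union bound, the family-wise error rate is at most $m \times \alpha/m = \alpha$. Positive correlation makes it conservative, which is the sense in which correlated tests amount to fewer "real" tests. Here is a minimal sketch, assuming 100 one-sided z-tests with all nulls true, whose statistics share a common Gaussian factor with correlation $\rho$:

```python
import random
from statistics import NormalDist

random.seed(7)
alpha, m, n_sims = 0.10, 100, 4000
threshold = alpha / m                            # Bonferroni: 0.1 / 100 = 0.001
z_crit = NormalDist().inv_cdf(1.0 - threshold)  # one-sided: p <= threshold iff z >= z_crit


def bonferroni_fwer(rho):
    """Fraction of simulated runs in which Bonferroni rejects at least one of m
    true nulls, for one-sided z-tests with common correlation rho >= 0."""
    hits = 0
    for _ in range(n_sims):
        shared = random.gauss(0.0, 1.0)
        z_max = max(
            rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * random.gauss(0.0, 1.0)
            for _ in range(m)
        )
        hits += z_max >= z_crit
    return hits / n_sims


fwer = {rho: bonferroni_fwer(rho) for rho in (0.0, 0.9)}
print(fwer)
```

With $\rho = 0$ the estimate should be close to $1 - 0.999^{100} \approx 0.095$. With $\rho = 0.9$ it should be much lower, well under the nominal 10%.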

In FDR, the idea is slightly different from the Bonferroni correction. The idea is to guarantee that only a certain percentage (say 10%) of the things you declare significant are falsely declared significant. If you have correlated markers (positive dependence) in your dataset, the FDR value is chosen based on the total number of tests you perform, but the actual number of statistically independent tests is smaller. In this way it is safer to conclude that the false discovery rate is falsely declaring significant 10% or less of the tests in your set of P-values.

Please see this book chapter for a discussion of positive dependence.

derrek


You explain FDR vs. Bonferroni, but don't define "positive dependency" but rather just reword it to "positively correlated" but I don't understand. Consider that I have 1000 P values, testing expression of 1000 different genes comparing people with and without some disease. I use the BH method to decide which of these comparisons are "discoveries". What does "positive dependency" mean in this context?

Harvey Motulsky


This answer is flat out wrong. Positive Regression Dependency and being positively associated are different from one another. The Benjamini Yekutieli paper explains this and provides references too. "Nevertheless, PRDS and positive association do not imply one another, and the difference is of some importance. For example, a multivariate normal distribution is positively associated iff all correlations are nonnegative. Not all correlations need be nonnegative for the PRDS property to hold (see Section 3.1, Case 1 below)." See pg. 6 of the paper.

user3303

The meaning of "positive dependency" as a condition for using the usual method of FDR control
