How can Simpson's paradox be resolved?

35

Simpson's paradox is a classic puzzle discussed in introductory statistics courses worldwide. However, my course was content to simply note that a problem existed and did not provide a solution. I would like to know how to resolve the paradox. That is, when faced with a Simpson's paradox, where two different choices seem to compete for being the best option depending on how the data are partitioned, which choice should one pick?

To make the problem concrete, let's consider the first example given in the corresponding Wikipedia article. It is based on a real study of a treatment for kidney stones.

[Table from the Wikipedia article: success rates of treatments A and B for kidney stones, broken down by small and large stones.]

Suppose I am a doctor and a test reveals that a patient has kidney stones. Using only the information provided in the table, I would like to determine whether I should adopt treatment A or treatment B. It seems that if I know the size of the stone, then we should prefer treatment A. But if we don't know it, then we should prefer treatment B.

But consider another plausible way of arriving at an answer. If the stone is large, we should choose A, and if it is small, we should again choose A. So even if we don't know the size of the stone, by reasoning case by case, we see that we should prefer A. This contradicts the previous reasoning.

So: a patient comes into my office. A test reveals that they have kidney stones, but gives me no information about their size. Which treatment do I recommend? Is there an accepted resolution to this problem?

Wikipedia suggests a resolution using "causal Bayesian networks" and a "back-door" test, but I have no idea what those are.

Potato
source
2
The Basic Simpson's Paradox link mentioned above is an example of observational data. We cannot decide unequivocally between the hospitals because the patients probably were not randomly assigned to hospitals, and the question as posed gives us no way of knowing whether, for example, one hospital tended to get higher-risk patients. Breaking the results down into operations A–E does not address that problem.
Emil Friedman
@EmilFriedman I agree that we cannot decide unequivocally between the hospitals. But the data certainly favor one over the other. (It is not true that the data tell us nothing about the quality of the hospitals.)
Potato

Answers:

14

In your question, you state that you don't know what "causal Bayesian networks" and "back-door tests" are.

Suppose you have a causal Bayesian network: a directed acyclic graph whose nodes represent propositions and whose directed edges represent possible causal relationships. You may have many such networks, one for each of your hypotheses. There are three ways to make a convincing argument about the strength or existence of an edge A → B.

The easiest way is an intervention. This is what the other answers are suggesting when they say that "proper randomization" will fix the problem. You randomly force A to take different values and you measure B. If you can do that, you're done, but you can't always do it. In your example, it may be unethical to give people ineffective treatments for deadly diseases, or they may have some say in their treatment, e.g., they may choose the less harsh one (treatment B) when their kidney stones are small and less painful.
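As a minimal sketch of what an intervention buys you (my own illustration, not part of the original answer; the recovery probabilities are loosely based on the Wikipedia table), here is a simulation in which stone size drives both treatment choice and recovery. The observational comparison shows the reversal, and randomization removes it:

```python
# Hypothetical simulation: stone size confounds the observational comparison,
# but randomizing treatment (an intervention) removes the bias.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
large = rng.random(n) < 0.5  # latent stone size: True = large

def recover(treat_a, large):
    # Assumed per-stratum recovery probabilities; A beats B within each stratum.
    p = np.where(large,
                 np.where(treat_a, 0.73, 0.69),   # large stones
                 np.where(treat_a, 0.93, 0.87))   # small stones
    return rng.random(len(p)) < p

# Observational assignment: treatment A tends to go to the harder (large) cases.
obs_a = rng.random(n) < np.where(large, 0.75, 0.25)
obs_y = recover(obs_a, large)
print("observational: A =", obs_y[obs_a].mean().round(3),
      " B =", obs_y[~obs_a].mean().round(3))   # A looks worse in aggregate

# Intervention: force treatment by a coin flip, independent of stone size.
rct_a = rng.random(n) < 0.5
rct_y = recover(rct_a, large)
print("randomized:    A =", rct_y[rct_a].mean().round(3),
      " B =", rct_y[~rct_a].mean().round(3))   # A now comes out ahead, as it should
```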

The second way is the front-door method. You want to show that A acts on B via C, i.e., A → C → B. If you assume that C is potentially caused by A but has no other causes, and you can measure that C is correlated with A and that B is correlated with C, then you can conclude that the evidence must flow through C. The original example: A is smoking, B is cancer, and C is tar buildup. Tar can only come from smoking, and it correlates with both smoking and cancer. Therefore, smoking causes cancer through tar (although there could be other causal paths that mitigate this effect).
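For reference (it is not written out in the answer above), Pearl's front-door adjustment, which formalizes this argument, is

$$P(B \mid do(A=a)) = \sum_{c} P(c \mid a) \sum_{a'} P(B \mid a', c)\, P(a').$$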

The third way is the back-door method. You want to show that A and B are not correlated via a "back door", e.g., a common cause, i.e., A ← D → B. Since you have assumed a causal model, you only need to block all the paths along which evidence can flow from A to B (by observing variables and conditioning on them). It is a bit tricky to block these paths, but Pearl gives a clear algorithm that tells you which variables you have to observe in order to block them.
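For reference, if D is a set of observed variables that blocks every back-door path from A to B, the standard back-door adjustment (again, not spelled out in the answer itself) is

$$P(B \mid do(A=a)) = \sum_{d} P(B \mid a, d)\, P(d).$$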

Gung is right that with good randomization, confounders don't matter. Since we assume you are not allowed to intervene on the hypothesized cause (treatment), any common cause of the hypothesized cause (treatment) and the effect (survival), such as age or kidney stone size, will be a confounder. The solution is to take the right measurements to block all the back doors. For more information see:

Pearl, Judea. "Causal diagrams for empirical research." Biometrika 82.4 (1995): 669–688.


Here X is the kidney stone size, Y is the treatment chosen, and Z is treatment success. X and Y are both causes of success Z. X may be a cause of Y if other doctors are assigning treatment based on kidney stone size. Clearly there are no other causal relationships between X, Y, and Z: Y comes after X, so it cannot be its cause; similarly, Z comes after X and Y.

Since X is a common cause, it should be measured. It is up to the experimenter to determine the universe of variables and potential causal relationships. For every experiment, the experimenter measures the necessary "back door variables" and then calculates the marginal probability distribution of treatment success for each configuration of variables. For a new patient, you measure the variables and follow the treatment indicated by the marginal distribution. If you can't measure everything or you don't have a lot of data but know something about the architecture of the relationships, you can do "belief propagation" (Bayesian inference) on the network.
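A minimal sketch of this recipe applied to the question's example, assuming stone size is the only back-door variable and using the counts commonly quoted from the Wikipedia table (check them against the article before relying on them):

```python
# Back-door adjustment sketch for the kidney-stone example.
counts = {
    # (treatment, stone size): (successes, patients)
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

# Marginal distribution of stone size over all patients: the P(d) term.
total_patients = sum(n for _, n in counts.values())
p_size = {
    size: sum(n for (_, s), (_, n) in counts.items() if s == size) / total_patients
    for size in ("small", "large")
}

for treatment in ("A", "B"):
    # Naive (pooled) success rate: reproduces the paradoxical summary row.
    successes = sum(s for (t, _), (s, _) in counts.items() if t == treatment)
    patients = sum(n for (t, _), (_, n) in counts.items() if t == treatment)
    naive = successes / patients

    # Back-door adjusted rate: sum over stone sizes of
    # P(success | treatment, size) * P(size).
    adjusted = sum(
        counts[(treatment, size)][0] / counts[(treatment, size)][1] * p_size[size]
        for size in ("small", "large")
    )
    print(f"{treatment}: naive={naive:.3f}, adjusted={adjusted:.3f}")
# Naive rates favor B (0.780 vs 0.826); adjusted rates favor A (about 0.833 vs 0.779).
```

With the stone-size back door blocked, the adjusted numbers agree with the case-by-case reasoning in the question: treatment A comes out ahead.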

Neil G
source
2
Very nice answer. Could you briefly say how to apply this framework to the example I give in the question? Does it give the expected answer (A)?
Potato
Thanks! Do you know of a good, short introduction to "belief propagation"? I am interested in learning more.
Potato
@Potato: I learned it from his book "Probabilistic Reasoning in Intelligent Systems". There are many tutorials online, but it's hard to find one that builds intuition rather than just presenting the algorithm.
Neil G
22

I have a prior answer that discusses Simpson's paradox here: Basic Simpson's paradox. It may help you to read that to better understand the phenomenon.

In short, Simpson's paradox occurs because of confounding. In your example, the treatment is confounded* with the kind of kidney stones each patient had. We know from the full table of results presented that treatment A is always better. Thus, a doctor should choose treatment A. The only reason treatment B looks better in the aggregate is that it was given more often to patients with the less severe condition, whereas treatment A was given to patients with the more severe condition. Nonetheless, treatment A performed better with both conditions. As a doctor, you don't care about the fact that in the past the worse treatment was given to patients who had the lesser condition, you only care about the patient before you, and if you want that patient to improve, you will provide them with the best treatment available.

*Note that the point of running experiments, and randomizing treatments, is to create a situation in which the treatments are not confounded. If the study in question was an experiment, I would say that the randomization process failed to create equitable groups, although it may well have been an observational study--I don't know.

gung - Reinstate Monica
source
You opt for the normalization approach also suggested by the other answer. I find this problematic. It is possible to exhibit two partitions of the same data set that give different conclusions when normalized. See my link and quote in reply to the other answer.
Potato
2
I haven't read the Stanford article. However, I don't find the reasoning in the quote compelling. It may well be that in some population, treatment B is better than treatment A. This doesn't matter. If that is true of some population, it is only because the population's characteristics are confounded. You are faced w/ a patient (not a population), & that patient is more likely to improve under treatment A w/o regard for whether that patient has large or small kidney stones. You should choose treatment A.
gung - Reinstate Monica
2
Is the young / old partition confounded? If not, this will not be a problem. If so, then we would use the full information to make the best decision. Based on what we know at present, the 'treatment B looks best in the aggregate' is a red herring. It only appears to be the case because of the confounding, but it is a (statistical) illusion.
gung - Reinstate Monica
2
You would have a more complicated table that took both kidney stone size & age into account. You can look at the Berkeley gender bias case example on the Wikipedia page.
gung - Reinstate Monica
1
Hate extending comments this long, but... I wouldn't say that the paradox is always due to confounding. It's due to a relationship among variables that a confounding variable will have, but I wouldn't call every variable leading to a Simpson's paradox a confounder (e.g., weight of 30-yr-olds and 90-yr-olds vs. amount of potato chips consumed per annum: because 90-yr-olds are much lighter to begin with, the main effect of chips may be negative without the interaction included, yet I wouldn't call age a confound). (See the first figure on the Wikipedia page.)
John
4

Do you want the solution to the one example or the paradox in general? There is none for the latter because the paradox can arise for more than one reason and needs to be assessed on a case by case basis.

The paradox is primarily problematic when reporting summary data and is critical in training individuals how to analyze and report data. We don't want researchers reporting summary statistics that hide or obfuscate patterns in the data or data analysts failing to recognize what the real pattern in the data is. No solution was given because there is no one solution.

In this particular case the doctor with the table would clearly always pick A and ignore the summary line. It makes no difference if they know the size of the stone or not. If someone analyzing the data had only reported the summary lines presented for A and B then there'd be an issue because the data the doctor received wouldn't reflect reality. In this case they probably should have also left the last line off of the table since it's only correct under one interpretation of what the summary statistic should be (there are two possible). Leaving the reader to interpret the individual cells would generally have produced the correct result.
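To make "there are two possible" concrete, here is a small sketch of the two ways that summary line can be computed (using the counts commonly quoted from the Wikipedia table, which may differ from the exact table in the question):

```python
# Two ways to fill in the "Both" summary row (illustrative counts only).
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

for treatment, strata in data.items():
    successes = sum(s for s, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    # 1) Pooled rate: weights each stone size by how many patients happened to
    #    receive this treatment for it (this is the summary line in the table).
    pooled = successes / patients
    # 2) Unweighted average of the per-size rates: treats both stone sizes equally,
    #    regardless of how the treatment was actually allocated.
    averaged = sum(s / n for s, n in strata.values()) / len(strata)
    print(treatment, f"pooled={pooled:.3f}", f"averaged={averaged:.3f}")
# Pooled rates favor B (0.780 vs 0.826); averaged rates favor A (0.831 vs 0.777).
```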

(Your copious comments seem to suggest you're most concerned about unequal N issues and Simpson is broader than that so I'm reluctant to dwell on the unequal N issue further. Perhaps ask a more targeted question. Furthermore, you seem to think I am advocating a normalization conclusion. I am not. I am arguing that you need to consider that the summary statistic is relatively arbitrarily selected and that selection by some analyst gave rise to the paradox. I'm further arguing that you look at the cells you have.)

John
source
You claim we should ignore the summary line. Why is this "clear"?
Potato
It's clear because treatment A is better with both large and small stones, and B only comes out ahead because of the unequal Ns. Furthermore, the final line is an interpretation, not gospel. There are at least two ways to calculate that line. You would only calculate it that way if you wanted to say something about the particular sample.
John
I'm sorry, I don't understand why the summary line is an incorrect report. I think I'm missing your central point. Could you please explain?
Potato
1
You could normalize and then average, which gives the "correct" result (A). But this is illicit. The following quote is from the relevant article in the Stanford Encyclopedia of Philosophy, available here: plato.stanford.edu/entries/paradox-simpson
Potato
2
"Simpson's Reversals show that there are numerous ways of partitioning a population that are consistent with associations in the total population. A partition by gender might indicate that both males and females fared worse when provided with a new treatment, while a partition of the same population by age indicated that patients under fifty, and patients fifty and older both fared better given the new treatment. Normalizing data from different ways of partitioning the same population will provide incompatible conclusions about the associations that hold in the total population."
Potato
4

One important "take away" is that if treatment assignments are disproportionate between subgroups, one must take subgroups into account when analyzing the data.

A second important "take away" is that observational studies are especially prone to delivering wrong answers due to the unknown presence of Simpson's paradox. That's because we cannot correct for the fact that Treatment A tended to be given to the more difficult cases if we don't know that it was.

In a properly randomized study we can either (1) allocate treatment randomly so that giving an "unfair advantage" to one treatment is highly unlikely and will automatically get taken care of in the data analysis or, (2) if there is an important reason to do so, allocate the treatments randomly but disproportionately based on some known issue and then take that issue into account during the analysis.
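A minimal sketch of the two designs described above (the proportions and variable names here are my own assumptions, not from the answer):

```python
# Hypothetical allocation sketch: simple vs. deliberately disproportionate randomization.
import random

random.seed(0)
patients = [{"id": i, "stone": random.choice(["small", "large"])} for i in range(10)]

# (1) Simple randomization: each patient gets A or B with equal probability,
#     so neither treatment is systematically handed the easier cases.
for p in patients:
    p["design1"] = random.choice(["A", "B"])

# (2) Random but disproportionate allocation based on a known issue (stone size),
#     with the issue recorded so the analysis can condition on it later.
p_a_given_stone = {"large": 0.7, "small": 0.3}   # assumed design choice
for p in patients:
    p["design2"] = "A" if random.random() < p_a_given_stone[p["stone"]] else "B"

for p in patients:
    print(p)
```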

Emil Friedman
source
+1, however "automatically get taken care of" isn't quite true (at least in the immediate situation, which is what you primarily care about). It is true in the long run, but you still can very much have type I & type II errors due to sampling error (ie, patients in 1 treatment condition tended to have more severe diseases by chance alone).
gung - Reinstate Monica
But the effect of sampling error will be taken into account when we analyze the contingency table and calculate and properly interpret the p-value.
Emil Friedman
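As one hedged illustration of the contingency-table analysis mentioned here (the counts are the commonly quoted Wikipedia figures, and a single 2×2 table is only part of the story; a stratified analysis across stone sizes would be more complete):

```python
# Chi-squared test on the small-stone 2x2 table (success/failure by treatment).
from scipy.stats import chi2_contingency

#               success  failure
small_stones = [[81,  87 - 81],     # treatment A, small stones
                [234, 270 - 234]]   # treatment B, small stones

chi2, p_value, dof, expected = chi2_contingency(small_stones)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
```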