What is the distribution of $R^2$ in linear regression under the null hypothesis?

What is the distribution of the coefficient of determination, or R squared, $R^2$, in linear univariate multiple regression under the null hypothesis $H_0:\beta=0$?

How does it depend on the number of predictors $k$ and the number of samples $n>k$? Is there a closed-form expression for the mode of this distribution?

In particular, I have a feeling that for simple regression (with one predictor $x$) this distribution has its mode at zero, but for multiple regression the mode is at a nonzero positive value. If this is true, is there an intuitive explanation of this "phase transition"?

Update

As @Alecos showed below, the distribution indeed peaks at zero when $k=2$ and $k=3$ and not at zero when $k>3$. I feel that there should be a geometric view of this phase transition. Consider the geometric view of OLS: $\mathbf y$ is a vector in $\mathbb R^n$, and $\mathbf X$ defines a $k$-dimensional subspace there. OLS amounts to projecting $\mathbf y$ onto this subspace, and $R^2$ is the squared cosine of the angle between $\mathbf y$ and its projection $\hat{\mathbf y}$.

Now, from @Alecos's answer it follows that if all vectors are random, then the probability distribution of this angle will peak at $90^\circ$ for $k=2$ and $k=3$, but will have a mode at some other value $<90^\circ$ for $k>3$. Why?!

Update 2: I am accepting @Alecos's answer, but I still have a feeling that I am missing some important insight here. If anybody suggests any other view (geometric or not) on this phenomenon that would make it "obvious", I will be happy to offer a bounty.

regression mathematical-statistics r-squared intuition
amoeba says Reinstate Monica

Are you willing to assume normality of the error?

Dimitriy V. Masterov

Yes, I suppose one has to assume it for this question to be answerable (?).

amoeba says Reinstate Monica

Have you checked davegiles.blogspot.jp/2013/05/good-old-r-squared.html?

Khashaa

@Khashaa: in fact, I have to admit that I found that blogspot page before posting my question here. Honestly, I still wanted to have a discussion of this phenomenon on our forum, so I pretended not to have seen it.

amoeba says Reinstate Monica

Very related CV question: stats.stackexchange.com/questions/123651/…

Alecos Papadopoulos

Answers:

For the specific hypothesis (that all regressor coefficients are zero, not including the constant term, which is not examined in this test) and under normality, we know (see e.g. Maddala 2001, p. 155, but note that there $k$ counts the regressors without the constant term, so the expression looks a bit different) that the statistic

$F = \frac{n-k}{k-1}\frac{R^2}{1-R^2}$

is distributed as a central $F(k-1, n-k)$ random variable.

Note that although we do not test the constant term, $k$ counts it as well.

Rearranging,

$(k-1)F - (k-1)FR^2 = (n-k)R^2 \Rightarrow (k-1)F = R^2\big[(n-k) + (k-1)F\big]$

$\Rightarrow R^2 = \frac{(k-1)F}{(n-k) + (k-1)F}$

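But the right-hand side is Beta-distributed (a standard property of the $F$ distribution: if $F \sim F(d_1, d_2)$ then $\frac{d_1 F}{d_1 F + d_2} \sim \mathrm{Beta}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$, here with $d_1 = k-1$ and $d_2 = n-k$), specifically

$R^2 \sim \mathrm{Beta}\left(\frac{k-1}{2}, \frac{n-k}{2}\right)$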
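As a quick numerical sanity check of this F-to-Beta step, here is a minimal sketch in base R, assuming arbitrary illustrative values $k=5$ and $n=30$ (not taken from the answer). It compares the CDF of $R^2$ obtained through the $F$ distribution with the Beta CDF on a grid of $R^2$ values.

```r
# Check numerically that R^2 = (k-1)F / ((n-k) + (k-1)F) maps a central
# F(k-1, n-k) variable onto a Beta((k-1)/2, (n-k)/2) variable.
# k and n are arbitrary illustrative values (k counts the intercept).
k <- 5
n <- 30
r <- seq(0.01, 0.99, by = 0.01)          # grid of R^2 values

# R^2 is increasing in F, so P(R^2 <= r) = P(F <= f(r)), with f the inverse map
f_of_r <- (n - k) / (k - 1) * r / (1 - r)
cdf_via_F    <- pf(f_of_r, df1 = k - 1, df2 = n - k)
cdf_via_beta <- pbeta(r, shape1 = (k - 1) / 2, shape2 = (n - k) / 2)

max(abs(cdf_via_F - cdf_via_beta))       # floating-point noise level
```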

The mode of this distribution is

$\text{mode}\,R^2 = \frac{\frac{k-1}{2}-1}{\frac{k-1}{2}+\frac{n-k}{2}-2} = \frac{k-3}{n-5}$
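For completeness, the Beta mode formula used here follows from setting the derivative of the log-density to zero (standard calculus, valid when $\alpha>1$ and $\beta>1$, so that the stationary point is an interior maximum), and then substituting $\alpha = \frac{k-1}{2}$, $\beta = \frac{n-k}{2}$:

$\frac{d}{dx}\log f(x) = \frac{\alpha-1}{x} - \frac{\beta-1}{1-x} = 0 \;\Rightarrow\; (\alpha-1)(1-x) = (\beta-1)x \;\Rightarrow\; x^{*} = \frac{\alpha-1}{\alpha+\beta-2}$

$x^{*} = \frac{\frac{k-3}{2}}{\frac{n-1}{2}-2} = \frac{k-3}{n-5}$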

FINITE AND UNIQUE MODE
From the above relation we can infer that for the distribution to have a unique and finite mode we must have

$k\geq 3, n >5$

This is consistent with the general requirement for a Beta distribution, which is

$\{\alpha >1, \beta \geq 1\},\;\; \text{OR}\;\; \{\alpha \geq 1, \beta > 1\}$

as can be inferred from this CV thread or read here.
Note that if $\{\alpha = 1, \beta = 1\}$, we obtain the Uniform distribution, so all density points are modes (finite but not unique). Which raises the question: why, if $k=3, n=5$, is $R^2$ distributed as a $U(0,1)$?

IMPLICATIONS
Assume that you have $k=5$ regressors (including the constant) and $n=99$ observations. Quite a nice regression, no overfitting. Then

$R^2\Big|_{\beta=0} \sim \mathrm{Beta}(2, 47), \quad \text{mode}\,R^2 = \frac{1}{47} \approx 0.021$

and the density plot

[Figure: density of $R^2$ under $H_0$, i.e. $\mathrm{Beta}(2, 47)$]

Intuition please: this is the distribution of $R^2$ under the hypothesis that no regressor really belongs to the regression. So a) the distribution is independent of the regressors, b) as the sample size increases, its distribution concentrates towards zero, as the increased information swamps the small-sample variability that may produce some "fit", but also c) as the number of irrelevant regressors increases for a given sample size, the distribution concentrates towards $1$, and we have the "spurious fit" phenomenon.
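Points b) and c) can be read directly off the closed-form mean and standard deviation of $\mathrm{Beta}\left(\frac{k-1}{2}, \frac{n-k}{2}\right)$. A minimal sketch in base R; `beta_moments()` is a hypothetical helper introduced here, and the grids of $k$ and $n$ are arbitrary choices.

```r
# Mean and SD of the null Beta((k-1)/2, (n-k)/2) distribution of R^2.
# beta_moments() is a hypothetical helper; k counts the intercept.
beta_moments <- function(k, n) {
  a <- (k - 1) / 2
  b <- (n - k) / 2
  c(mean = a / (a + b),
    sd   = sqrt(a * b / ((a + b)^2 * (a + b + 1))))
}

# b) fixed k = 5, growing n: the distribution concentrates towards 0
n_grid <- c(20, 50, 100, 500)
by_n <- sapply(n_grid, function(n) beta_moments(5, n))
colnames(by_n) <- paste0("n=", n_grid)

# c) fixed n = 50, more irrelevant regressors: the mass moves towards 1
k_grid <- c(5, 15, 30, 45)
by_k <- sapply(k_grid, function(k) beta_moments(k, 50))
colnames(by_k) <- paste0("k=", k_grid)

round(by_n, 3)
round(by_k, 3)
```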

But also, note how "easy" it is to reject the null hypothesis: in this particular example, at $R^2=0.13$ the cumulative probability has already reached about $0.99$, so an obtained $R^2>0.13$ will reject the null of "insignificant regression" at the $1$% significance level.
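The figures in this example can be reproduced with base R's `pbeta()` and `qbeta()`; a minimal sketch, assuming the $\mathrm{Beta}(2, 47)$ null from the example above.

```r
# Null distribution of R^2 in the example: Beta(2, 47)
# (k = 5 including the intercept, n = 99).
a <- 2
b <- 47

# Cumulative probability at R^2 = 0.13 (close to 0.99)
p_at_013 <- pbeta(0.13, a, b)

# Exact 1% critical value: R^2 above this rejects H0 at the 1% level
crit_1pct <- qbeta(0.99, a, b)

c(cdf_at_0.13 = p_at_013, critical_R2 = crit_1pct)
```

The exact critical value comes out just above 0.13, consistent with the rounded figures quoted.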

APPENDIX
To respond to the new issue regarding the mode of the distribution of $R^2$, I can offer the following (not geometric) line of thought, which links it to the phenomenon of "spurious fit": when we run least squares on a data set, we essentially solve a system of $n$ linear equations with $k$ unknowns (the only difference from high-school mathematics is that back then we called "known coefficients" what in linear regression we call "variables/regressors", "unknown x" what we now call "unknown coefficients", and "constant terms" what we know as the "dependent variable"). As long as $k<n$ the system is over-identified and there is no exact solution, only an approximate one, and the difference emerges as "unexplained variance of the dependent variable", which is captured by $1-R^2$. If $k=n$ the system has one exact solution (assuming linear independence). In between, as we increase $k$, we reduce the "degree of over-identification" of the system and we "move towards" the single exact solution. Under this view, it makes sense why $R^2$ increases spuriously with the addition of irrelevant regressors, and consequently why its mode moves gradually towards $1$ as $k$ increases for a given $n$.
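The "spurious fit" argument can be illustrated with simulated pure-noise data. A minimal sketch in base R, assuming an arbitrary seed and $n=12$: it regresses a noise $y$ on an increasing number of noise regressors (plus an intercept). Because the models are nested, $R^2$ can only go up, and it reaches 1 (up to rounding) when $k=n$; at that point `summary()` may warn about an essentially perfect fit.

```r
# Spurious fit: pure-noise y regressed on more and more pure-noise regressors.
# Seed and n are arbitrary; the data are simulated, not from the answer.
set.seed(1)
n <- 12
y <- rnorm(n)
X <- matrix(rnorm(n * (n - 1)), nrow = n)   # n - 1 irrelevant regressors

r2 <- sapply(1:(n - 1), function(j) {
  fit <- lm(y ~ X[, 1:j, drop = FALSE])      # k = j + 1 parameters incl. intercept
  summary(fit)$r.squared
})
names(r2) <- paste0("k=", 2:n)
round(r2, 3)   # non-decreasing in k, equal to 1 at k = n
```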

Alecos Papadopoulos

It's mathematical: for $k=2$ the first parameter of the Beta distribution (the "$\alpha$" in standard notation) becomes smaller than unity. In that case the Beta distribution does not have a finite mode; play with keisan.casio.com/exec/system/1180573226 to see how the shapes change.

Alecos Papadopoulos

@Alecos Excellent answer! (+1) May I suggest you add to your answer the requirement for the mode to exist? This is usually stated as $\alpha>1$, $\beta>1$, but more subtly it's fine if equality holds in one of the two... I think for our purposes this becomes $k \geq 3$, $n \geq k + 2$, and at least one of these inequalities is strict.

Silverfish

@Khashaa Unless theory demands it, I never exclude the intercept from the regression: it is the average level of the dependent variable, regressors or no regressors (and this level is usually positive, so omitting it would be a foolishly self-created misspecification). But I always exclude it from the F-test of the regression, since what I care about is not whether the dependent variable has a non-zero unconditional mean, but whether the regressors have any explanatory power with respect to deviations from this mean.

Alecos Papadopoulos

+1! Are there results for the distribution of $R^2$ for non-zero $\beta_j$?

Christoph Hanck

@ChristophHanck See also davegiles.blogspot.jp/2013/05/good-old-r-squared.html

Alecos Papadopoulos

I won't rederive the $\mathrm{Beta}(\frac{k-1}{2}, \, \frac{n-k}{2})$ distribution in @Alecos's excellent answer (it's a standard result, see here for another nice discussion), but I want to fill in more details about the consequences. Firstly, what does the null distribution of $R^2$ look like for a range of values of $n$ and $k$? The graph in @Alecos's answer is quite representative of what occurs in practical multiple regressions, but sometimes insight is gleaned more easily from smaller cases. I've included the mean, mode (where it exists) and standard deviation. The graph/table deserves a good eyeballing: it's best viewed at full size. I could have included fewer facets, but the pattern would have been less clear; I have appended R code so that readers can experiment with different subsets of $n$ and $k$.
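For readers who want to see the standard result confirmed before looking at the plots, here is a minimal Monte Carlo sketch in base R. It assumes normal errors and random normal regressors (the null distribution does not depend on them), with arbitrary choices of $n=20$, $k=4$, 4000 replications and the seed, and compares the simulated mean and SD of $R^2$ with those of $\mathrm{Beta}\left(\frac{k-1}{2}, \frac{n-k}{2}\right)$.

```r
# Monte Carlo check of R^2 ~ Beta((k-1)/2, (n-k)/2) under H0 with normal errors.
# n, k, reps and the seed are arbitrary illustrative choices.
set.seed(42)
n <- 20
k <- 4                      # intercept + 3 regressors
reps <- 4000

r2_sim <- replicate(reps, {
  X <- matrix(rnorm(n * (k - 1)), nrow = n)
  y <- rnorm(n)             # beta = 0: y is unrelated to X
  summary(lm(y ~ X))$r.squared
})

a <- (k - 1) / 2
b <- (n - k) / 2
rbind(simulated   = c(mean = mean(r2_sim), sd = sd(r2_sim)),
      theoretical = c(mean = a / (a + b),
                      sd = sqrt(a * b / ((a + b)^2 * (a + b + 1)))))
```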

[Figure: Distribution of $R^2$ for small sample sizes]

Values of shape parameters

The colour scheme of the graph indicates whether each shape parameter is less than one (red), equal to one (blue), or more than one (green). The left-hand side shows the value of $\alpha$ while $\beta$ is on the right. Since $\alpha = \frac{k-1}{2}$, its value increases in arithmetic progression by a common difference of $\frac{1}{2}$ as we move right from column to column (add a regressor to our model) whereas, for fixed $n$, $\beta = \frac{n-k}{2}$ decreases by $\frac{1}{2}$. The total $\alpha + \beta = \frac{n-1}{2}$ is fixed for each row (for a given sample size). If instead we fix $k$ and move down the column (increase the sample size by 1), then $\alpha$ stays constant and $\beta$ increases by $\frac{1}{2}$. In regression terms, $\alpha$ is half the number of regressors included in the model, and $\beta$ is half the residual degrees of freedom. To determine the shape of the distribution, we are particularly interested in where $\alpha$ or $\beta$ equal one.
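The arithmetic of the shape parameters along a row and down a column can be tabulated in a few lines; a minimal sketch in base R, assuming $n=9$ for the row and $k=4$ for the column (both arbitrary picks within the plotted ranges).

```r
# Shape parameters alpha = (k-1)/2, beta = (n-k)/2 along one row (fixed n)
n <- 9
k <- 2:8
row_params <- data.frame(k = k, alpha = (k - 1) / 2, beta = (n - k) / 2)
row_params$sum <- row_params$alpha + row_params$beta   # constant (n-1)/2
row_params

# ... and down one column (fixed k): alpha constant, beta up by 1/2 per row
k0 <- 4
n_col <- 5:9
col_params <- data.frame(n = n_col, alpha = (k0 - 1) / 2, beta = (n_col - k0) / 2)
col_params
```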

The algebra is straightforward for $\alpha$: we have $\frac{k-1}{2}=1$ so $k=3$. Indeed, this is the only column of the facet plot that is filled blue on the left. Similarly $\alpha < 1$ for $k<3$ (the $k=2$ column is red on the left) and $\alpha > 1$ for $k>3$ (from the $k=4$ column onwards, the left side is green).

For $\beta=1$ we have $\frac{n-k}{2}=1$, so $k=n-2$. Notice how these cases (marked with a blue right-hand side) cut a diagonal line across the facet plot. For $\beta > 1$ we obtain $k < n - 2$ (the plots with a green right-hand side lie to the left of the diagonal line). For $\beta < 1$ we need $k > n - 2$, which involves only the right-most cases in my graph: at $n=k$ we have $\beta=0$ and the distribution is degenerate, but $k=n-1$, where $\beta = \frac{1}{2}$, is plotted (right side in red).

Since the PDF is $f(x;\,\alpha,\,\beta) \propto x^{\alpha-1} (1-x)^{\beta-1}$, it is clear that if (and only if) $\alpha<1$ then $f(x) \to \infty$ as $x \to 0$. We can see this in the graph: when the left side is shaded red, observe the behaviour at 0. Similarly, when $\beta<1$ then $f(x) \to \infty$ as $x \to 1$. Look where the right side is red!
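The three regimes for $\alpha$ can also be checked numerically by evaluating the density close to 0; a minimal sketch in base R, assuming $n=10$ (arbitrary) and $k=2,3,4$.

```r
# Null density of R^2 near 0 for the three regimes of alpha = (k-1)/2, n = 10:
#   k = 2 -> alpha < 1: density blows up as x -> 0
#   k = 3 -> alpha = 1: density is finite and positive at 0
#   k = 4 -> alpha > 1: density vanishes at 0
n <- 10
x_small <- c(1e-2, 1e-4, 1e-6)
near_zero <- sapply(2:4, function(k) dbeta(x_small, (k - 1) / 2, (n - k) / 2))
dimnames(near_zero) <- list(paste("x =", x_small), paste0("k=", 2:4))
signif(near_zero, 3)
```

For $k=2$ the values grow without bound as $x$ shrinks, for $k=3$ they settle at $f(0)=\beta=3.5$, and for $k=4$ they fall towards 0.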

Symmetries

One of the most striking features of the graph is the level of symmetry, but when it comes to the Beta distribution, this shouldn't be surprising!

The Beta distribution itself is symmetric if $\alpha = \beta$. For us this occurs if $n = 2k-1$, which correctly identifies the panels $(k=2, n=3)$, $(k=3, n=5)$, $(k=4, n=7)$ and $(k=5, n=9)$. The extent to which the distribution is symmetric about $R^2 = 0.5$ depends on how many regressor variables we include in the model for that sample size. If $k = \frac{n+1}{2}$ the distribution of $R^2$ is perfectly symmetric about 0.5; if we include fewer variables than that it becomes increasingly asymmetric and the bulk of the probability mass shifts closer to $R^2 = 0$; if we include more variables then it shifts closer to $R^2 = 1$. Remember that $k$ includes the intercept in its count, and that we are working under the null, so the regressor variables should have coefficient zero in the correctly specified model.

There is also an obvious symmetry between distributions for any given $n$, i.e. any row in the facet grid. For example, compare $(k=3, n=9)$ with $(k=7, n=9)$. What's causing this? Recall that the distribution of $\mathrm{Beta}(\alpha, \beta)$ is the mirror image of $\mathrm{Beta}(\beta, \alpha)$ across $x=0.5$. Now we had $\alpha_{k,n} = \frac{k-1}{2}$ and $\beta_{k,n} = \frac{n-k}{2}$. Consider $k'=n-k+1$; we find:

$\alpha_{k',n} = \frac{(n-k+1)-1}{2} = \frac{n-k}{2} = \beta_{k,n}$

$\beta_{k',n} = \frac{n-(n-k+1)}{2} = \frac{k-1}{2} = \alpha_{k,n}$

So this explains the symmetry as we vary the number of regressors in the model for a fixed sample size. It also explains the distributions that are themselves symmetric as a special case: for them, $k' = k$ so they are obliged to be symmetric with themselves!

This tells us something we might not have guessed about multiple regression: for a given sample size $n$ , and assuming no regressors have a genuine relationship with $Y$ , the $R^2$ for a model using $k-1$ regressors plus an intercept has the same distribution as $1 - R^2$ does for a model with $k-1$ residual degrees of freedom remaining.
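The reflection identity can be verified numerically; a minimal sketch in base R using the $(k=3, n=9)$ versus $(k=7, n=9)$ pair mentioned above.

```r
# Reflection symmetry: for fixed n, the null density of R^2 with k parameters
# equals the density with k' = n - k + 1 parameters reflected about 0.5.
n <- 9
k <- 3
k_prime <- n - k + 1            # = 7

x <- seq(0.01, 0.99, by = 0.01)
dens_k      <- dbeta(x,     (k - 1) / 2,       (n - k) / 2)
dens_kprime <- dbeta(1 - x, (k_prime - 1) / 2, (n - k_prime) / 2)

max(abs(dens_k - dens_kprime))  # zero up to floating-point error
```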

Special distributions

When $k=n$ we have $\beta=0$, which isn't a valid parameter. However, as $\beta \to 0$ the distribution becomes degenerate with a spike such that $\mathsf{P}(R^2 = 1)=1$. This is consistent with what we know about a model with as many parameters as data points: it achieves perfect fit. I haven't drawn the degenerate distribution on my graph but did include the mean, mode and standard deviation.

When $k=2$ and $n=3$ we obtain $\mathrm{Beta}(\frac{1}{2}, \, \frac{1}{2})$ which is the arcsine distribution. This is symmetric (since $\alpha = \beta$ ) and bimodal (0 and 1). Since this is the only case where both $\alpha < 1$ and $\beta < 1$ (marked red on both sides), it is our only distribution which goes to infinity at both ends of the support.
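A quick check that $\mathrm{Beta}(\frac{1}{2}, \frac{1}{2})$ is indeed the arcsine distribution, whose density is $\frac{1}{\pi\sqrt{x(1-x)}}$ on $(0,1)$; a minimal base R sketch, with arbitrary evaluation points.

```r
# k = 2, n = 3 gives Beta(1/2, 1/2), the arcsine distribution
k <- 2
n <- 3
x <- c(0.001, 0.1, 0.5, 0.9, 0.999)
beta_dens    <- dbeta(x, (k - 1) / 2, (n - k) / 2)
arcsine_dens <- 1 / (pi * sqrt(x * (1 - x)))
cbind(x, beta_dens, arcsine_dens)   # the two columns agree
```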

The $\mathrm{Beta}(1, \, 1)$ distribution is the only Beta distribution to be rectangular (uniform). All values of $R^2$ from 0 to 1 are equally likely. The only combination of $k$ and $n$ for which $\alpha = \beta =1$ occurs is $k=3$ and $n=5$ (marked blue on both sides).

The previous special cases are of limited applicability but the case $\alpha > 1$ and $\beta=1$ (green on left, blue on right) is important. Now $f(x;\,\alpha,\,\beta) \propto x^{\alpha-1} (1-x)^{\beta-1} = x^{\alpha-1}$ so we have a power-law distribution on [0, 1]. Of course it's unlikely we'd perform a regression with $k=n-2$ and $k>3$ , which is when this situation occurs. But by the previous symmetry argument, or some trivial algebra on the PDF, when $k=3$ and $n > 5$ , which is the frequent procedure of multiple regression with two regressors and an intercept on a non-trivial sample size, $R^2$ will follow a reflected power law distribution on [0, 1] under $H_0$ . This corresponds to $\alpha=1$ and $\beta>1$ so is marked blue on left, green on right.

You may also have noticed the triangular distributions at $(k=5,n=7)$ and its reflection $(k=3,n=7)$ . We can recognise from their $\alpha$ and $\beta$ that these are just special cases of the power-law and reflected power-law distributions where the power is $2-1=1$ .
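The closed forms of these special cases can be compared against base R's `dbeta()`; a minimal sketch, where `null_dens()` is a hypothetical helper and the $(k, n)$ pairs are illustrative members of each family.

```r
# Special cases of the null density Beta((k-1)/2, (n-k)/2) vs closed forms.
# null_dens() is a hypothetical helper; (k, n) pairs are illustrative.
x <- seq(0.05, 0.95, by = 0.05)
null_dens <- function(x, k, n) dbeta(x, (k - 1) / 2, (n - k) / 2)

deviations <- c(
  uniform    = max(abs(null_dens(x, 3, 5)  - 1)),                  # Beta(1, 1)
  power_law  = max(abs(null_dens(x, 6, 8)  - 2.5 * x^1.5)),        # Beta(2.5, 1)
  reflected  = max(abs(null_dens(x, 3, 10) - 3.5 * (1 - x)^2.5)),  # Beta(1, 3.5)
  triangle   = max(abs(null_dens(x, 5, 7)  - 2 * x)),              # Beta(2, 1)
  triangle_r = max(abs(null_dens(x, 3, 7)  - 2 * (1 - x)))         # Beta(1, 2)
)
deviations   # all at floating-point noise level
```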

Mode

If $\alpha>1$ and $\beta>1$, all green in the plot, $f(x; \, \alpha, \, \beta)$ is unimodal with $f(0)=f(1)=0$, and the Beta distribution has a unique mode $\frac{\alpha-1}{\alpha+\beta-2}$. Putting these in terms of $k$ and $n$, the condition becomes $k>3$ and $n>k+2$, while the mode is $\frac{k-3}{n-5}$.

All other cases have been dealt with above. If we relax the inequality to allow $\beta=1$ , then we include the (green-blue) power-law distributions with $k=n-2$ and $k>3$ (equivalently, $n>5$ ). These cases clearly have mode 1, which actually agrees with the previous formula since $\frac{(n-2)-3}{n-5}=1$ . If instead we allowed $\alpha=1$ but still demanded $\beta>1$ , we'd find the (blue-green) reflected power-law distributions with $k=3$ and $n>5$ . Their mode is 0, which agrees with $\frac{3-3}{n-5}=0$ . However, if we relaxed both inequalities simultaneously to allow $\alpha=\beta=1$ , we'd find the (all blue) uniform distribution with $k=3$ and $n=5$ , which does not have a unique mode. Moreover the previous formula can't be applied in this case, since it would return the indeterminate form $\frac{3-3}{5-5}=\frac{0}{0}$ .

When $n=k$ we get a degenerate distribution with mode 1. When $\beta < 1$ (in regression terms, $k=n-1$, so there is only one residual degree of freedom) then $f(x) \to \infty$ as $x \to 1$, and when $\alpha < 1$ (in regression terms, $k=2$, so a simple linear model with intercept and one regressor) then $f(x) \to \infty$ as $x \to 0$. These would be unique modes except in the unusual case where $k=2$ and $n=3$ (fitting a simple linear model to three points), which is bimodal at 0 and 1.

Mean

The question asked about the mode, but the mean of $R^2$ under the null is also interesting: it has the remarkably simple form $\frac{k-1}{n-1}$. For a fixed sample size it increases in arithmetic progression as more regressors are added to the model, until the mean value is 1 when $k=n$. The mean of a Beta distribution is $\frac{\alpha}{\alpha+\beta}$, so such an arithmetic progression was inevitable from our earlier observation that, for fixed $n$, the sum $\alpha+\beta$ is constant but $\alpha$ increases by 0.5 for each regressor added to the model.

$\frac{\alpha}{\alpha+\beta} = \frac{(k-1)/2}{(k-1)/2 + (n-k)/2} = \frac{k-1}{n-1}$
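A minimal base R sketch that checks the mean formula by numerically integrating $x f(x)$ for one row of the facet grid ($n=9$, an arbitrary choice; the degenerate case $k=n$ is left out) and shows the constant step of $\frac{1}{n-1}$ per added regressor.

```r
# Null mean (k-1)/(n-1) vs numerical integration of x * f(x), fixed n = 9.
n <- 9
ks <- 2:(n - 1)   # k = n is a point mass at 1, so it is excluded
mean_num <- sapply(ks, function(k) {
  integrate(function(x) x * dbeta(x, (k - 1) / 2, (n - k) / 2),
            lower = 0, upper = 1, rel.tol = 1e-8)$value
})
mean_closed <- (ks - 1) / (n - 1)
rbind(numerical = mean_num, closed_form = mean_closed)
diff(mean_closed)   # constant step 1/(n-1) per added regressor
```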

Code for plots

library(ggplot2)  # ggplot(), geom_line(), geom_area(), facet_grid(), theme()
library(grid)     # unit()
library(dplyr)    # mutate()

nlist <- 3:9 #change here which n to plot
klist <- 2:8 #change here which k to plot

totaln <- length(nlist)
totalk <- length(klist)

# One row per (x, k, n): 100 grid points on [0, 1] for every panel
df <- data.frame(
    x = rep(seq(0, 1, length.out = 100), times = totaln * totalk),
    k = rep(klist, times = totaln, each = 100),
    n = rep(nlist, each = totalk * 100)
)

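# Shape parameters, null Beta density, and fill group: the left half of each
# panel is coloured by alpha, the right half by beta
df <- mutate(df,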
    kname = paste("k =", k),
    nname = paste("n =", n),
    a = (k-1)/2,
    b = (n-k)/2,
    density = dbeta(x, (k-1)/2, (n-k)/2),
    groupcol = ifelse(x < 0.5, 
        ifelse(a < 1, "below 1", ifelse(a ==1, "equals 1", "more than 1")),
        ifelse(b < 1, "below 1", ifelse(b ==1, "equals 1", "more than 1")))
)

# Facet grid: rows are sample sizes n, columns are numbers of parameters k.
# linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0);
# panel.spacing replaces the deprecated panel.margin
g <- ggplot(df, aes(x, density)) +
    geom_line(linewidth=0.8) + geom_area(aes(group=groupcol, fill=groupcol)) +
    scale_fill_brewer(palette="Set1") +
    facet_grid(nname ~ kname)  + 
    ylab("probability density") + theme_bw() + 
    labs(x = expression(R^{2}), fill = expression(alpha~(left)~beta~(right))) +
    theme(panel.spacing = unit(0.6, "lines"), 
        legend.title=element_text(size=20),
        legend.text=element_text(size=20), 
        legend.background = element_rect(colour = "black"),
        legend.position = c(1, 1), legend.justification = c(1, 1))


# One row per panel, with y positions for the mean, mode and SD labels
df2 <- data.frame(
    k = rep(klist, times = totaln),
    n = rep(nlist, each = totalk),
    x = 0.5,
    ymean = 7.5,
    ymode = 5,
    ysd = 2.5
)

# Closed-form mean, mode and SD of Beta(a, b); NaN where undefined, so no label is printed
df2 <- mutate(df2,
    kname = paste("k =", k),
    nname = paste("n =", n),
    a = (k-1)/2,
    b = (n-k)/2,
    meanR2 = ifelse(k > n, NaN, a/(a+b)),
    modeR2 = ifelse((a>1 & b>=1) | (a>=1 & b>1), (a-1)/(a+b-2), 
        ifelse(a<1 & b>=1 & n>=k, 0, ifelse(a>=1 & b<1 & n>=k, 1, NaN))),
    sdR2 = ifelse(k > n, NaN, sqrt(a*b/((a+b)^2 * (a+b+1)))),
    meantext = ifelse(is.nan(meanR2), "", paste("Mean =", round(meanR2,3))),
    modetext = ifelse(is.nan(modeR2), "", paste("Mode =", round(modeR2,3))),
    sdtext = ifelse(is.nan(sdR2), "", paste("SD =", round(sdR2,3)))
)

# Print the summary statistics in each panel
g <- g + geom_text(data=df2, aes(x, ymean, label=meantext)) +
    geom_text(data=df2, aes(x, ymode, label=modetext)) +
    geom_text(data=df2, aes(x, ysd, label=sdtext))
print(g)

Silverfish

Really illuminating visualization. +1

Khashaa

Great addition, +1, thanks. I noticed that you call $0$ a mode when the distribution goes to $+\infty$ as $x\to 0$ (and nowhere else), something @Alecos above (in the comments) did not want to do. I agree with you: it is convenient.

amoeba says Reinstate Monica

@amoeba from the graphs we'd like to say "values around 0 are most likely" (or 1). But the answer of Alecos is also both self-consistent and consistent with many authorities (people differ on what to do about the 0 and 1 full stop, let alone whether they can count as a mode!). My approach to the mode differs from Alecos mostly because I use conditions on alpha and beta to determine where the formula is applicable, rather than taking my starting point as the formula and seeing which k and n give sensible answers.

Silverfish

(+1), this is a very meaty answer. By keeping $k$ too close to $n$ and both small, the question studies in detail, and so decisively, the case of really small samples with relatively too many and irrelevant regressors.

Alecos Papadopoulos

@amoeba You probably noticed that this answer furnishes an algebraic answer for why, for sufficiently large $n$, the mode of the distribution is 0 for $k=3$ but positive for $k>3$. Since $f(x) \propto x^{(k-3)/2}(1-x)^{(n-k-2)/2}$, for $k=3$ we have $f(x) \propto (1-x)^{(n-5)/2}$, which will clearly have its mode at 0 for $n>5$, whereas for $k=4$ we have $f(x) \propto x^{1/2}(1-x)^{(n-6)/2}$, whose maximum can be found by calculus to be the quoted mode formula. As $k$ increases, the power of $x$ rises by 0.5 each time. It's this $x^{\alpha-1}$ factor which makes $f(0)=0$ and so kills the mode at 0.
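The same transition can be seen numerically by locating the maximum of the density for increasing $k$ at a fixed sample size; a minimal base R sketch, assuming $n=30$ (arbitrary).

```r
# "Phase transition" of the mode of Beta((k-1)/2, (n-k)/2) at fixed n = 30
n <- 30
ks <- 3:8
mode_closed <- (ks - 3) / (n - 5)
mode_num <- sapply(ks, function(k) {
  optimize(function(x) dbeta(x, (k - 1) / 2, (n - k) / 2),
           interval = c(0, 1), maximum = TRUE, tol = 1e-10)$maximum
})
rbind(k = ks, numerical = round(mode_num, 4), closed_form = round(mode_closed, 4))
```

The numerical maximiser sits at (essentially) 0 for $k=3$, where the density is decreasing, and then steps away from 0 by $\frac{1}{n-5}$ for each additional parameter.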

Silverfish