How to calculate prediction bands for nonlinear regression?

The help page for Prism gives the following explanation of how it calculates prediction bands for nonlinear regression. Please excuse the long quote, but I am not following the second paragraph (which explains how G|x is defined and how dY/dP is computed). Any help would be greatly appreciated.

The calculation of the confidence and prediction bands are fairly standard. Read on for the details of how Prism computes prediction and confidence bands of nonlinear regression.

First, let's define G|x, which is the gradient of the parameters at a particular value of X and using all the best-fit values of the parameters. The result is a vector, with one element per parameter. For each parameter, it is defined as dY/dP, where Y is the Y value of the curve given the particular value of X and all the best-fit parameter values, and P is one of the parameters.

G'|x is that gradient vector transposed, so it is a column rather than a row of values.

Cov is the covariance matrix (inversed Hessian from last iteration). It is a square matrix with the number of rows and columns equal to the number of parameters. Each item in the matrix is the covariance between two parameters.

Now compute c = G'|x * Cov * G|x. The result is a single number for any value of X.

The confidence and prediction bands are centered on the best fit curve, and extend above and below the curve an equal amount.

The confidence bands extend above and below the curve by: sqrt(c) * sqrt(SS/DF) * CriticalT(Confidence%, DF)

The prediction bands extend a further distance above and below the curve, equal to: sqrt(c + 1) * sqrt(SS/DF) * CriticalT(Confidence%, DF)

Joe Listerr
This is indeed known as the delta method and uses a first order Taylor approximation. It is better though to use a 2nd order Taylor approximation for this - the predictNLS function in the propagate package does that if you're interested!
Tom Wenseleers

Answers:

This is called the Delta Method.

Suppose that you have some function $y = G(\beta, x) + \epsilon$; note that $G()$ is a function of the parameters that you estimate, $\beta$, and the values of your predictors, $x$. First, find the derivative of this function with respect to your vector of parameters, $\beta$: $G'(\beta, x)$. This says, if you change a parameter by a little bit, how much does your function change? Note that this derivative may be a function of your parameters themselves as well as the predictors. For example, if $G(\beta, x) = \exp(\beta x)$, then the derivative is $x \exp(\beta x)$, which depends upon the value of $\beta$ and the value of $x$. To evaluate this, you plug in the estimate of $\beta$ that your procedure gives, $\hat\beta$, and the value of the predictor $x$ where you want the prediction.
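
As a small illustration of the derivative in that example, here is a minimal Python sketch that compares the analytic gradient of exp(βx) with a central finite difference. The values of `beta_hat` and `x0` are made up for illustration, and `numeric_gradient` is a hypothetical helper, not something taken from Prism.

```python
import numpy as np

def G(beta, x):
    """Model from the example above: G(beta, x) = exp(beta * x)."""
    return np.exp(beta * x)

def dG_dbeta(beta, x):
    """Analytic derivative with respect to beta: x * exp(beta * x)."""
    return x * np.exp(beta * x)

def numeric_gradient(func, beta, x, h=1e-6):
    """Central finite difference in beta, a fallback when no analytic form is handy."""
    return (func(beta + h, x) - func(beta - h, x)) / (2.0 * h)

beta_hat = 0.3  # hypothetical parameter estimate, for illustration only
x0 = 2.0        # hypothetical predictor value where the prediction is wanted

analytic = dG_dbeta(beta_hat, x0)
numeric = numeric_gradient(G, beta_hat, x0)
print(analytic, numeric)
```

With several parameters, the same finite-difference step is repeated for each parameter in turn, which gives one element of the gradient vector per parameter.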

The Delta Method, derived from maximum likelihood procedures, states that the variance of $G(\hat\beta, x)$ is going to be

$$G'(\hat\beta, x)^T \, \mathrm{Var}(\hat\beta) \, G'(\hat\beta, x),$$
where $\mathrm{Var}(\hat\beta)$ is the variance-covariance matrix of your estimates (this is equal to the inverse of the Hessian, the matrix of second derivatives of the likelihood function at your estimates). The function that your statistics package employs calculates this value for each different value of the predictor $x$. This is just a number, not a vector, for each value of $x$.

This gives the variance of the value of the function at each point and this is used just like any other variance in calculating confidence intervals: take the square root of this value, multiply by the critical value for the normal or applicable t distribution relevant for a particular confidence level, and add and subtract this value to the estimate of G() at the point.
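
The recipe above can be sketched in Python with SciPy. This is only an illustration under stated assumptions: the two-parameter exponential model, the simulated data, and the names `model` and `gradient` are all hypothetical. It relies on `scipy.optimize.curve_fit` returning, with its default settings, a covariance matrix that is already scaled by the residual variance, so it plays the role of the variance-covariance matrix of the estimates.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t as student_t

def model(x, a, b):
    """Hypothetical example model: a * exp(b * x)."""
    return a * np.exp(b * x)

def gradient(x, a, b):
    """One row per x value; columns are dY/da and dY/db."""
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 30)
y = model(x, 2.0, 0.8) + rng.normal(scale=0.2, size=x.size)

# pcov is the estimated variance-covariance matrix of the fitted parameters.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0])
dof = x.size - len(popt)

x_new = np.linspace(0.0, 2.0, 5)
J = gradient(x_new, *popt)                      # gradient at each new x
var_fit = np.einsum("ij,jk,ik->i", J, pcov, J)  # g^T Var(beta_hat) g, row by row

tcrit = student_t.ppf(0.975, dof)               # two-sided 95% critical value
half_width = tcrit * np.sqrt(var_fit)
y_hat = model(x_new, *popt)
lower, upper = y_hat - half_width, y_hat + half_width
```

The `einsum` call evaluates the quadratic form separately for every row of `J`, so `var_fit` holds one number per value of x.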

For prediction intervals, we need to take the variance of the outcome given the predictors $x$, $\mathrm{Var}(y \mid x) \equiv \sigma^2$, into account. Hence, we must boost our variance from the Delta Method by our estimate of the variance of $\epsilon$, $\hat\sigma^2$, to get the variance of $y$, rather than the variance of the expected value of $y$ that is used for confidence intervals. Note that $\hat\sigma^2$ is the sum of squared errors (SS in help file notation) divided by the degrees of freedom (DF).
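
To make the difference between the two intervals concrete, here is a minimal Python sketch. The helper `band_halfwidths` and all the numbers fed into it are hypothetical; it simply adds the residual variance estimate to the Delta Method variance before taking the square root.

```python
import numpy as np
from scipy.stats import t as student_t

def band_halfwidths(g, var_beta, sigma2_hat, dof, level=0.95):
    """Half-widths of the confidence and prediction intervals at one x.

    g          : gradient of the curve with respect to the parameters at x
    var_beta   : variance-covariance matrix of the parameter estimates
    sigma2_hat : residual variance estimate, SS / DF
    dof        : residual degrees of freedom, DF
    """
    g = np.asarray(g, dtype=float)
    var_mean = g @ var_beta @ g                    # Delta Method variance of the fitted value
    tcrit = student_t.ppf(0.5 + level / 2.0, dof)  # two-sided critical value
    conf = tcrit * np.sqrt(var_mean)               # confidence interval half-width
    pred = tcrit * np.sqrt(var_mean + sigma2_hat)  # prediction interval half-width
    return conf, pred

# Made-up inputs, for illustration only.
g = [1.0, 2.0]
var_beta = np.array([[0.04, -0.01],
                     [-0.01, 0.02]])
sigma2_hat = 0.25

conf, pred = band_halfwidths(g, var_beta, sigma2_hat, dof=10)
print(conf, pred)
```

The prediction half-width is always the larger of the two, because the residual variance is added inside the square root.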

In the notation used in the help file above, it looks like their value of c does not take $\sigma^2$ into account; that is, the variance-covariance matrix that I give is $\sigma^2$ times the inverse of their Hessian. I'm not sure why they do that. It could be a way of writing the confidence and prediction intervals in a more familiar way (of $\sigma$ times some number times some critical value). The variance that I give is actually c*SS/DF in their notation.

For example, in the familiar case of linear regression, the matrix inside their c would be $(x^T x)^{-1}$, while $\mathrm{Var}(\hat\beta) = \sigma^2 (x^T x)^{-1}$.
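
The linear regression case can be checked numerically. The sketch below uses simulated data and assumes, as in my reading of the help file, that their Cov is the unscaled matrix $(x^T x)^{-1}$. Under that assumption the help file's expressions and the Delta Method expressions give the same standard errors.

```python
import numpy as np

# Simulated straight-line data, for illustration only.
rng = np.random.default_rng(1)
n = 25
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])  # design matrix: intercept and slope
y = 1.5 + 0.7 * x + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
SS = resid @ resid                    # sum of squared errors
DF = n - X.shape[1]                   # residual degrees of freedom
sigma2_hat = SS / DF

XtX_inv = np.linalg.inv(X.T @ X)      # unscaled matrix (assumed to be the help file's Cov)
var_beta = sigma2_hat * XtX_inv       # Var(beta_hat) as used in the Delta Method

g = np.array([1.0, 4.0])              # gradient at x0 = 4; for a linear model it is the design row
c = g @ XtX_inv @ g                   # help-file notation
delta_var = g @ var_beta @ g          # Delta Method variance

conf_se_helpfile = np.sqrt(c) * np.sqrt(SS / DF)
conf_se_delta = np.sqrt(delta_var)
pred_se_helpfile = np.sqrt(c + 1.0) * np.sqrt(SS / DF)
pred_se_delta = np.sqrt(delta_var + sigma2_hat)
print(conf_se_helpfile, conf_se_delta, pred_se_helpfile, pred_se_delta)
```

Multiplying either standard error by the critical t value for DF degrees of freedom gives the corresponding half-width of the band.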

Charlie
Can you explain the ci calculation? Doesn't look like critical point of t * sqrt(var)
B_Miner
I think that I understand their calculation; I updated my response.
Charlie
Charlie, thanks very much for a detailed response. I intend to write code to be able to compute the 95% prediction band. I will let you know how that goes.
Joe Listerr
@Charlie - very very nice!
B_Miner
@Charlie. Thanks. I've added a sentence to our GraphPad Prism FAQ explaining that we use cov to mean the normalized covariance matrix (each value ranges from -1 to 1). I've also added a link to this page, which is great for anyone seeking mathematical details.
Harvey Motulsky