Why does a correlation matrix need to be positive semi-definite, and what does it mean to be or not to be positive semi-definite?


I have been investigating the meaning of the positive semi-definite property of correlation and covariance matrices.

I am looking for any information on

  • the definition of positive semi-definiteness;
  • its important properties and practical implications;
  • the consequences of having a negative determinant, and the impact on multivariate analysis or simulation results, etc.
Melón
Do you want to understand what semi-definiteness is, do you want to know why correlation matrices must be semi-definite, or do you want to know what important results are implied by this property?
whuber
If correlation matrices were not positive semi-definite, then you could get variances that were negative.
I edited your question a bit; please check it. Also, note that a matrix with an even number of negative eigenvalues will still have a positive determinant.
ttnphns
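A quick numerical illustration of those two remarks (an editor's sketch in Python/NumPy, not part of the original thread): a symmetric matrix with an even number of negative eigenvalues has a positive determinant, yet it is not positive semi-definite and produces a negative "variance" for some weight vector.

```python
import numpy as np

# Symmetric matrix with eigenvalues 2, -1, -1: an even number of negative
# eigenvalues, so det(A) = 2 * (-1) * (-1) = 2 > 0 even though A is not PSD.
A = np.diag([2.0, -1.0, -1.0])
print(np.linalg.det(A))   # 2.0: positive determinant despite negative eigenvalues

# The quadratic form a' A a plays the role of a variance; here it goes negative.
a = np.array([0.0, 1.0, 0.0])
print(a @ A @ a)          # -1.0: a "variance" that is negative
```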
A covariance matrix is NOT always equal to the correlation matrix! The correlation matrix is based on standardized variables, while the covariance matrix is not.
Manoj Kumar
Related questions: "Is every covariance matrix positive definite?" considers the broader case of covariance matrices, of which correlation matrices are a special case; see also "Is every correlation matrix positive semi-definite?" and "Is every correlation matrix positive definite?"
Silverfish

Answers:


The variance of a weighted sum $\sum_i a_i X_i$ of random variables must be nonnegative for all choices of real numbers $a_i$. Since the variance can be expressed as

$$\operatorname{var}\Big(\sum_i a_i X_i\Big) = \sum_i \sum_j a_i a_j \operatorname{cov}(X_i, X_j) = \sum_i \sum_j a_i a_j \Sigma_{i,j},$$

we have that the covariance matrix $\Sigma = [\Sigma_{i,j}]$ must be positive semidefinite (which is sometimes called nonnegative definite). Recall that a symmetric matrix $C$ is called positive semidefinite if and only if

$$\sum_i \sum_j a_i a_j C_{i,j} \geq 0 \quad \text{for all } a_i, a_j \in \mathbb{R}.$$
Dilip Sarwate
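To make the quadratic-form argument concrete, here is a minimal numerical sketch (my own addition, assuming NumPy): for a covariance matrix estimated from data, the quadratic form $a^\top \Sigma a$ equals the sample variance of the weighted sum $\sum_i a_i X_i$, and is therefore nonnegative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))   # 1000 observations of 3 variables
Sigma = np.cov(X, rowvar=False)      # 3x3 sample covariance matrix

a = np.array([1.5, -2.0, 0.7])       # arbitrary real weights
quad_form = a @ Sigma @ a            # sum_i sum_j a_i a_j Sigma_{i,j}
var_sum = np.var(X @ a, ddof=1)      # sample variance of sum_i a_i X_i

print(np.isclose(quad_form, var_sum))  # True: the quadratic form IS a variance
print(quad_form >= 0)                  # True for every choice of a
```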
Thanks, I removed my downvote but I did not upvote because it does not answer about practical implications. Say I have a matrix that is not positive definite (due, for example, to modification by an 'expert'). What would happen if I used it to calibrate and/or simulate data? Specifically, is this a real problem when trying to study a big sum and there are only a few negative eigenvalues? What would be an efficient algorithm to transform a non-positive semi-definite correlation matrix into a positive semi-definite one? What would be the impact of this algorithm?
lcrmorin
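One simple (if crude) answer to the repair question above is eigenvalue clipping: zero out the negative eigenvalues and rescale back to a unit diagonal. A hedged sketch follows (my own illustration, assuming NumPy; the function name and the example matrix are invented here, and Higham's 2002 alternating-projections method is the standard, more careful algorithm for the nearest correlation matrix):

```python
import numpy as np

def clip_to_psd_correlation(C):
    """Crude repair of a non-PSD 'correlation' matrix: clip negative
    eigenvalues to zero, then rescale so the diagonal is all 1s.
    (Assumes no variable ends up with a zero diagonal entry.)"""
    C = (C + C.T) / 2                       # symmetrize against round-off
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 0.0, None)         # discard negative eigenvalues
    C_psd = vecs @ np.diag(vals) @ vecs.T   # PSD by construction
    d = np.sqrt(np.diag(C_psd))
    return C_psd / np.outer(d, d)           # restore the unit diagonal

# A hand-edited "correlation" matrix that is not PSD:
C_bad = np.array([[ 1.0,  0.9, -0.9],
                  [ 0.9,  1.0,  0.9],
                  [-0.9,  0.9,  1.0]])
print(np.linalg.eigvalsh(C_bad))                           # one negative eigenvalue
print(np.linalg.eigvalsh(clip_to_psd_correlation(C_bad)))  # all >= 0 (up to round-off)
```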
@Were_cat Thanks for the reversal of the downvote.
Dilip Sarwate
Could you please explain the first equality in the first equation?
Vivek Subramanian
$\operatorname{var}(X) = \operatorname{cov}(X, X)$, and covariance is linear in each argument:

$$\operatorname{cov}\Big(\sum_i a_i X_i, Y\Big) = \sum_i a_i \operatorname{cov}(X_i, Y), \qquad \operatorname{cov}\Big(X, \sum_j b_j Y_j\Big) = \sum_j b_j \operatorname{cov}(X, Y_j).$$
Dilip Sarwate

The answer is quite simple.

The correlation matrix is defined thus:

Let $X = [x_1, x_2, \ldots, x_n]$ be the $m \times n$ data matrix: $m$ observations, $n$ variables.

Define $X_b = \left[\frac{x_1 - \mu_1 e}{s_1}, \frac{x_2 - \mu_2 e}{s_2}, \frac{x_3 - \mu_3 e}{s_3}, \ldots\right]$ as the matrix of normalized data, with $\mu_1$ the mean of variable 1, $\mu_2$ the mean of variable 2, etc., $s_1$ the standard deviation of variable 1, etc., and $e$ a vector of all 1s.

The correlation matrix is then

$C = X_b^\top X_b$ (up to a constant factor $1/m$, which does not affect positive semi-definiteness).

A matrix $A$ is positive semi-definite if there is no vector $z$ such that $z^\top A z < 0$.

Suppose $C$ is not positive semi-definite. Then there exists a vector $w$ such that $w^\top C w < 0$.

However, $w^\top C w = w^\top X_b^\top X_b w = (X_b w)^\top (X_b w) = z_1^2 + z_2^2 + \cdots$, where $z = X_b w$; thus $w^\top C w$ is a sum of squares and therefore cannot be less than zero, a contradiction.

So not only the correlation matrix but any matrix $U$ that can be written in the form $U = V^\top V$ is positive semi-definite.

gregor
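A short numerical check of this argument (an added sketch, assuming NumPy): build the normalized data matrix $X_b$, form the correlation matrix, and confirm both that $w^\top C w$ is a sum of squares and that all eigenvalues are nonnegative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))            # m = 50 observations, n = 4 variables
Xb = (X - X.mean(axis=0)) / X.std(axis=0)   # normalized data: mean 0, sd 1 per column
m = len(X)
C = Xb.T @ Xb / m                           # correlation matrix

w = rng.standard_normal(4)                  # any vector w
z = Xb @ w
print(np.allclose(w @ C @ w, z @ z / m))        # True: w'Cw is a sum of squares / m >= 0
print((np.linalg.eigvalsh(C) >= -1e-12).all())  # True: no (materially) negative eigenvalues
```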
This is by far the clearest, most concise, and most useful answer. Thanks!
Yohan Obadia

(Any looseness in the reasoning would be mine. I'm not a mathematician: this is a depiction, not a proof, and it comes from my numerical experimenting, not from books.)

  1. A positive semidefinite (psd) matrix, also called a Gramian matrix, is a matrix with no negative eigenvalues. A matrix with negative eigenvalues is not positive semidefinite, i.e. it is non-Gramian. Either kind can be definite (no zero eigenvalues) or singular (at least one zero eigenvalue). [The word "Gramian" is used with several different meanings in math, so perhaps it should be avoided.]
  2. In statistics, we usually apply these terms to an SSCP-type matrix, also called a scalar product matrix. Correlation and covariance matrices are particular cases of such a matrix.
  3. Any scalar product matrix is a summary characteristic of some multivariate data (a cloud). For example, given data on n cases × p variables, we could compute the p×p covariance matrix between the variables or the n×n covariance matrix between the cases. When you compute it from real data, the matrix will always be Gramian. You may get a non-Gramian (non-psd) matrix if (1) it is a similarity matrix measured directly (i.e. not computed from the data) or the similarity measure isn't of SSCP type; (2) the matrix values were entered incorrectly; (3) the matrix is in fact Gramian but is (or is so close to being) singular that the spectral method of computing eigenvalues sometimes produces tiny negative values in place of true zeros or tiny positive ones.
  4. An alternative and equivalent summary of the cloud is the matrix of Euclidean distances. A scalar product (such as covariance) between a pair of items and the corresponding squared Euclidean distance between them are tied by the law of cosines (cosine theorem; look at the picture there): $d_{12}^2 = h_1^2 + h_2^2 - 2s_{12}$, where $s_{12}$ is the scalar product and the $h$'s are the distances of the two items from the origin. For the covariance between variables $X$ and $Y$, this formula reads $d_{xy}^2 = \sigma_x^2 + \sigma_y^2 - 2\operatorname{cov}_{xy}$.
  5. As an interim conclusion: a covariance (or correlation, or other scalar product) matrix between m items is a configuration of points embedded in Euclidean space, so Euclidean distances are defined between all these m points.
  6. Now, if [point 5] holds exactly, then the configuration of points is a truly Euclidean configuration, which entails that the scalar product matrix at hand (e.g. the covariance one) is Gramian. Otherwise it is non-Gramian. Thus, to say "an m×m covariance matrix is positive semi-definite" is to say "the m points plus the origin fit in Euclidean space perfectly".
  7. What are possible causes or versions of non-Gramian (non-Euclidean) configuration? The answers follow upon contemplating [point 4].
    • Cause 1. Evil is among the points themselves: the m×m distance matrix isn't fully Euclidean. Some of the pairwise distances d are such that they cannot agree with the rest of the points in Euclidean space. See Fig. 1.
    • Cause 2. There is a general (matrix-level) mismatch between the h's and the d's. For example, with the d's fixed and some h's given, the other h's may only vary within certain bounds in order to stay consistent with Euclidean space. See Fig. 2.
    • Cause 3. There is a localized (pair-level) mismatch between a d and the pair of corresponding h's connected to those two points. Namely, the triangle inequality is violated; that rule demands $h_1 + h_2 \geq d_{12} \geq |h_1 - h_2|$. See Fig. 3.
  8. To diagnose the cause, convert the non-Gramian covariance matrix into a distance matrix using the above law of cosines. Double-center it. If the resulting matrix has negative eigenvalues, cause 1 is present. Else, if any $|\operatorname{cov}_{ij}| > \sigma_i \sigma_j$, cause 3 is present. Else cause 2 is present. Sometimes more than one cause coexists in the same matrix. (A computational sketch of this diagnostic follows this answer.)

[Fig. 1]

[Fig. 2]

[Fig. 3]

ttnphns
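The diagnostic in point 8 can be sketched in a few lines (an editor's illustration, assuming NumPy; the function name and tolerances are invented): convert the scalar-product matrix to squared distances via the law of cosines, double-center it, and inspect the eigenvalues.

```python
import numpy as np

def diagnose_non_gramian(S, tol=1e-10):
    """Point-8 diagnostic for a non-Gramian scalar-product matrix S.
    Squared distances via the cosine law: d_ij^2 = S_ii + S_jj - 2*S_ij."""
    h2 = np.diag(S)                           # squared distances from the origin
    D2 = h2[:, None] + h2[None, :] - 2 * S    # law of cosines
    n = len(S)
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ D2 @ J                     # double-centered distance matrix
    if (np.linalg.eigvalsh(B) < -tol).any():
        return "cause 1: the distance matrix is not fully Euclidean"
    if (np.abs(S) > np.sqrt(np.outer(h2, h2)) + tol).any():
        return "cause 3: some |cov_ij| exceeds sigma_i * sigma_j"
    return "cause 2: matrix-level mismatch between the h's and d's"

# A hand-edited "covariance" matrix that is not PSD:
S = np.array([[1.0, 2.5],
              [2.5, 1.0]])
print(diagnose_non_gramian(S))
```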
Point 6 needs demonstration: you have shown that a matrix of squared Euclidean distances is p-d, but you assert without proof that to each p-d matrix corresponds a Euclidean configuration of points. Also you haven't connected your definition of p-d ("no negative eigenvalues") to any of your subsequent characterizations. The key idea comes at the end (point 8): a p-d matrix can be used to define a distance. Logically, this is where you should begin the analysis.
whuber
@whuber: Thank you for the critical appraisal. I'm afraid that when it comes to mathematically proving something, I sink. I've reported part of my practical experience (as I said); the answer wasn't really an analytical sequence. Wouldn't you then like to add your own answer that corrects/improves mine? It might turn out to be a valuable aid. Or you are free to work on my text to improve it, if you don't find it downright futile.
ttnphns
P.S. My point 8 implies that since double-centering anchors a configuration of points to its centroid, this operation itself does not introduce non-Euclideanness (it introduces only singularity, because the new point, the centre, belongs to the same space). Hence we can check whether the initial configuration was Euclidean. Is that not correct?
ttnphns