Cómo crear una matriz de covarianza arbitraria

21

Por ejemplo, en R, la MASS::mvrnorm()función es útil para generar datos para demostrar varias cosas en las estadísticas. Toma un Sigmaargumento obligatorio que es una matriz simétrica que especifica la matriz de covarianza de las variables. ¿Cómo crearía un n × n simétricon×n matrix with arbitrary entries?

rsl
fuente
3
I think this question would benefit from being edited to focus on "how can I create an arbitrary covariance matrix" and less on the coding aspect. There is certainly an on-topic underlying statistical issue here, as demonstrated by the answer.
Silverfish

Respuestas:

22

Create an n×n matrix A with arbitrary values

and then use Σ=ATA as your covariance matrix.

For example

n <- 4  
A <- matrix(runif(n^2)*2-1, ncol=n) 
Sigma <- t(A) %*% A
Henry
fuente
Likewise, Sigma <- A + t(A).
rsl
6
@MoazzemHossen: Your suggestion will produce a symmetric matrix, but it may not always be positive semidefinite (e.g. your suggestion could produce a matrix with negative eigenvalues) and so it may not be suitable as a covariance matrix
Henry
Yes, I noticed that R returns error in the event my suggested way produced unsuitable matrix.
rsl
4
Note that if you prefer a correlation matrix for better interpretability, there is the ?cov2cor function, which can be applied subsequently.
gung - Reinstate Monica
1
@B11b: You need your covariance matrix to be positive semi-definite. That would put some limits on the covariance values, not totally obvious ones when n>2
Henry
24

I like to have control over the objects I create, even when they might be arbitrary.

Consider, then, that all possible n×n covariance matrices Σ can be expressed in the form

Σ=P Diagonal(σ1,σ2,,σn) P

where P is an orthogonal matrix and σ1σ2σn0.

Geometrically this describes a covariance structure with a range of principal components of sizes σi. These components point in the directions of the rows of P. See the figures at Making sense of principal component analysis, eigenvectors & eigenvalues for examples with n=3. Setting the σi will set the magnitudes of the covariances and their relative sizes, thereby determining any desired ellipsoidal shape. The rows of P orient the axes of the shape as you prefer.

One algebraic and computing benefit of this approach is that when σn>0, Σ is readily inverted (which is a common operation on covariance matrices):

Σ1=P Diagonal(1/σ1,1/σ2,,1/σn) P.

Don't care about the directions, but only about the ranges of sizes of the the σi? That's fine: you can easily generate a random orthogonal matrix. Just wrap n2 iid standard Normal values into a square matrix and then orthogonalize it. It will almost surely work (provided n isn't huge). The QR decomposition will do that, as in this code

n <- 5
p <- qr.Q(qr(matrix(rnorm(n^2), n)))

This works because the n-variate multinormal distribution so generated is "elliptical": it is invariant under all rotations and reflections (through the origin). Thus, all orthogonal matrices are generated uniformly, as argued at How to generate uniformly distributed points on the surface of the 3-d unit sphere?.

A quick way to obtain Σ from P and the σi, once you have specified or created them, uses crossprod and exploits R's re-use of arrays in arithmetic operations, as in this example with σ=(σ1,,σ5)=(5,4,3,2,1):

Sigma <- crossprod(p, p*(5:1))

As a check, the Singular Value decomposition should return both σ and P. You may inspect it with the command

svd(Sigma)

The inverse of Sigma of course is obtained merely by changing the multiplication by σ into a division:

Tau <- crossprod(p, p/(5:1))

You may verify this by viewing zapsmall(Sigma %*% Tau), which should be the n×n identity matrix. A generalized inverse (essential for regression calculations) is obtained by replacing any σi0 by 1/σi, exactly as above, but keeping any zeros among the σi as they were.

whuber
fuente
It might help to demonstrate how to use the rows of P to orient the axes as preferred.
gung - Reinstate Monica
1
Might be worth mentioning that the singular values in svd(Sigma) will be reordered -- that confused me for a minute.
FrankD
1

You can simulate random positive definite matrices from the Wishart distribution using the function "rWishart" from the widely used package "stats".

n <- 4
rWishart(1,n,diag(n))
Carlos Llosa
fuente
1

There is a package specifically for that, clusterGeneration (written among other by Harry Joe, a big name in that field).

There are two main functions:

  • genPositiveDefMat generate a covariance matrix, 4 different methods
  • rcorrmatrix : generate a correlation matrix

Quick example:

library(clusterGeneration)
#> Loading required package: MASS
genPositiveDefMat("unifcorrmat",dim=3)
#> $egvalues
#> [1] 15.408962  5.673916  1.228842
#> 
#> $Sigma
#>          [,1]     [,2]     [,3]
#> [1,] 6.714871 1.643449 6.530493
#> [2,] 1.643449 6.568033 2.312455
#> [3,] 6.530493 2.312455 9.028815
genPositiveDefMat("eigen",dim=3)
#> $egvalues
#> [1] 8.409136 4.076442 2.256715
#> 
#> $Sigma
#>            [,1]       [,2]      [,3]
#> [1,]  2.3217300 -0.1467812 0.5220522
#> [2,] -0.1467812  4.1126757 0.5049819
#> [3,]  0.5220522  0.5049819 8.3078880

Created on 2019-10-27 by the reprex package (v0.3.0)

Finally, note that an alternative approach is to do a first try from scratch, then use Matrix::nearPD() to make your matrix positive-definite.

Matifou
fuente