When I was doing some work on optimal design of experiments I frequently found references to matrix update schemes that gave an almost magical looking form for the update but gave no motivation and sometimes didn't even get a reference. There was some cool linear algebra tricks being used for sure, but since the bulk of this work was done in the '70s it made it difficult to figure out what was going on.

I had talked with my friend Sarah about this as she had just read a paper that said update methods were trivially simple to apply to GLMs, and we both laughed about it. I started trying to pull together the papers that explain how the updating works for experimental design, and decided to make a post about it in the hopes that some poor soul will be helped by it. If nothing else I think the linear algebra used by the folks that derived these results is pretty cool.

What follows is a description of the problem of finding the determinants and inverses for the expected Fisher information matrix (or a related quantity). Next, the updating schemes are discussed. This discussion is separated into rank-one updates of the inverse and determinant, followed by rank-two updates of the inverse and determinant. This post is concluded with a discussion of the application of these schemes, followed by an empty comments section because no one ever reads or comments on my web site. Enjoy!

## Problem Setting

In linear modeling and generalized linear modeling we often speak of the *design matrix* \(\mat{X}\) composed of \(d\)-length rows \(\vect{x}_1, \vect{x}_2, \ldots, \vect{x}_n\) each of which represents the conditions under which we're going to run an experiment and get an observation. Obviously, we can make good or bad choices of \(\mat{X}\) which will affect our ability to estimate parameters and predict outcomes using our model.

Let there be a \(p\)-length parameter vector \(\vect{\beta}\). Traditionally we define a function \(\vect{f} : \mathbb{R}^d \rightarrow \mathbb{R}^p\) for which our model fit for experiment \(i\) will be some function of \(\vect{f}(\vect{x}_i)^\transpose \vect{\beta}\). Define the *model matrix* \(\mat{F}\) as being composed of rows \(\vect{f}(\vect{x}_1), \vect{f}(\vect{x}_2), \ldots, \vect{f}(\vect{x}_n)\). This matrix is important in describing the variance of our parameter estimates.

The \(D\)-criterion for optimal design concerns itself with the determinant of the expected Fisher information matrix \(\mat{M}\). By optimizing with respect to the \(D\)-criterion we help to minimize the variance of our parameter estimates. For the linear model, \(\mat{M}\) is proportional to \(\mat{F}^\transpose \mat{F}\) and does not vary based on \(\vect{\beta}\). For generalized linear models (with independent runs) the matrix \(\mat{M}\) is proportionate to \(\mat{F}^\transpose \mat{D} \mat{F}\) where \(\mat{D}\) is a diagonal matrix which is a function of the design matrix and \(\vect{\beta}\).

Numerical procedures exist to help speed along updates to \(\mat{M}\), \(|\mat{M}|\), and \(\mat{M}^{-1}\), which helps make computer search for designs feasible and efficient. The following is an excerpt from Goos (2002):

The low computational cost of updating the information matrix, its inverse and its determinant after a design change is a direct consequence of the fact that the information matrix can be written as a sum of outer products:

$$ \mat{F}^\transpose \mat{F} = \sum_{i=1}^n \vect{f}(\vect{x}_i)\vect{f}(\vect{x}_i)^{\transpose} $$

It's somewhat quirky of statistics to define \(\mat{F}\) by row, but it is easy to see that the identity given by Goos holds up. Note that the \(ij\)th element of \(\mat{F}^\transpose \mat{F}\) denoted \([\mat{F}^\transpose \mat{F}]_{ij}\) is

Most people remember from linear algebra the fact that for square matrices \(\mat{A}\) and \(\mat{B}\) the determinant

Recall that the information matrix for GLMs (that are independent) can always be expressed \(\mat{F}^\transpose \mat{D} \mat{F}\) where \(\mat{D}\) is a diagonal matrix. While matrix multiplication does not normally commute, in certain special cases it does. Note that when \(\mat{D}\) is diagonal then

Thus, for GLMs we can use the updating schemes for linear models which focus on updating \(\mat{F}^\transpose \mat{F}\). Usually updating \(\mat{D}\) is trivially simple, so we're in good shape to proceed.

## Updating Schemes for \(\mat{F}^\transpose \mat{F}\)

I'm going to discuss rank 1 and rank 2 updating schemes for \((\mat{F}^\transpose \mat{F})^{-1}\) and \(|\mat{F}^\transpose \mat{F}|\). Here, rank 1 update means adding a point or removing a point, and rank 2 updates means swapping a point for a new one. Most of the results can be found on Wikipedia but you can also look at Brodlie et al. (1973) for much of this, and Pearson (1969) gives in his Appendix B the last result.

Let's say we're removing a point \(\vect{x}\) from the design and adding a point \(\vect{y}\) to the design. Then the new information matrix \(\mat{F}_\star^\transpose \mat{F}_\star\) is

Next will be presented a rank 1 update for the inverse and determinant, followed by rank 2 updates for the inverse and determinant.

### Rank 1 Inverse Update

We will show that the inverse can be updated using the equality

To see this, note that when we solve for \(\alpha\)

which is the identity matrix only when \(1 = \alpha (1+\vect{d}^\transpose \vect{c})\) so that

In reality we're concerned not with updating an identity matrix, but with updating some other matrix \(\mat{A}\). But it is easy to see that when \(\mat{A}\) is invertible then

Applying this to the problem at hand, it stands that we can remove point \(\vect{x}\) to get

and we can add point \(\vect{y}\) to get

One can chain these operations to generate rank 2 updates of the inverse, but there is a direct rank 2 way we'll get to later.

### Rank 1 Determinant Update

The rank-one update of the determinant is slightly less intuitive, but still fun. Begin by noting that we can express a block matrix

when the inverse \(\mat{A}^{-1}\) exists.

We can use this to show that the determinant of a block matrix is

Thus, since \(\det(\vect{0}) = 0\), the determinant of a block-triangular matrix is

Now, consider the left hand side of the following equation

Note that the determinants are 1, \(\det(\mat{I}+\vect{c} \vect{d}^\transpose)\), and 1 respectively. The product of those determinants is equal to the determinant on the right, \(\det(\mat{I}+\vect{c} \vect{d}^\transpose)\). Now, the determinant of the right side is \(1 + \vect{d}^\transpose \vect{c}\), so it must be that

Again, we find by factoring out an \(\mat{A}\) that

Applying this to the problem at hand we find that removing a point \(\vect{x}\) gives us

and adding a point \(\vect{y}\) gives us

## Rank 2 Inverse Update

We can directly apply the Sherman-Morrison-Woodbery theorem to get this update. (Sidebar: when I took a course in linear statistical models we called this the Woody Harrelson theorem because we couldn't remember all the names). The result states that:

where of course \(\mat{A}\) and \(\mat{C}\) must be invertable. There are several good proofs in the link to Wikipedia so I won't reproduce them here. But let's spell out the application in our context here.

We want to find the matrix resulting from moving \(\vect{x}_i\) to \(\vect{y}_i\) in the design matrix, so in the model matrix we want:

It's easy to show that

and that \(\mat{C}^{-1} = \mat{C}\). That makes the rank-2 update of the inverse very simple indeed, only requiring the calculation of a \(2 \times 2\) inverse:

### Rank 2 Determinant Update

A rank 2 update of the determinant is given by Goos (2002) as

To demonstrate this we need a few intermediate results first. We begin by showing the existence of a particular linear transformation. Next, we Use this to show the determinant for a simpler form. Finally, we show that by factoring out \(\mat{F}^\transpose \mat{F}\) we can restate the problem in this form and arrive at the above equality.

#### Existence of Transformation Matrix \(\mat{T}\)

Demonstration of this first requires we establish a specific matrix \(\mat{T}\) representing a transformation of coordinate basis. We need \(\mat{T}\) to be full rank and thus invertable. We want a very specific \(\mat{T}\) satisfying the following properties:

where \(\vect{e}_i\) is \(1\) in the \(i\)th coordinate and \(0\) everywhere else. Pearson (1969) just states that such a matrix exists, and it clearly does, and this is sufficient for the argument, but let's construct it anyway. Define

Note that

The transformation \(\mat{T}\) can then be assembled:

which is (usually) full rank and thus \(\mat{T}^{-1}\) exists and \(|\mat{T}| \ne 0\).

We don't need to worry about the specific form of \(\mat{T}^{-1}\) except to note that, necessarily, \(\mat{T}^{-1} \vect{e}_1 = \vect{x}\) and \(\mat{T}^{-1} \vect{e}_2 = \vect{y}\). That's all we need to continue with the argument from Pearson (1969).

#### Determinant of \(|\mat{I} + \vect{x} \vect{y}^\transpose + \vect{u} \vect{v}^\transpose|\)

Let \(\mat{T}\) be a non-singular transformation, and recall that in general the determinant of the inverse is the reciprocal of the determinant, then note that

For any independent vector \(\vect{y}\), \(\vect{v}\) we have

Here we choose \(\mat{T}\) so that the following hold true:

Then let \(\mat{T}^{-1} \vect{x} = \vect{a}\) and \(\mat{T}^{-1} \vect{u} = \vect{b}\). Then it is obvious looking at the determinant by cofactor expansion that the value of the determinant becomes

where

Thus,

#### Applying to Update Determinant

Returning to our original labeling scheme, note that

From here applying Pearson's result to achieve the result in Goos is straitforward.

## How is this even used?

One caution about using updating formula that I've read frequently involved the accumulation of round-off error in the result. Authors have advised recalculating the inverse and determinant periodically to keep things sane. I've always done this so I'm not sure if it's necessary and under what conditions.

In designing experiments using the \(D\)-optimality criterion we want to search across the design space to optimize the determinant. Recent literature prefers coordinate exchange from Meyer and Nachtsheim (1995) which simply optimizes a single coordinate in the design matrix at a time, because of the fast updating schemes available. This does not however inform one how to do such updating.

I've had luck using Brent's method for minimization. This method switches between fitting a quadratic to three local points and using that to guess the new minimum, and the method switches to regular golden section search when such inverse quadratic interpolation looks like it will fail. This can be very fast in finding an optima. However, such a method is very single-threaded in terms of computational requirements. A simple grid search is embarrassingly parallelizable and may be easy to offload onto a GPU for really fast performance.

## Bibliography

Ken W Brodlie, AR Gourlay, and John Greenstadt.
Rank-one and rank-two corrections to positive definite matrices expressed in product form.
*IMA Journal of Applied Mathematics*, 11(1):73–82, 1973. ↩

P. Goos.
*The Optimal Design of Blocked and Split-Plot Experiments*.
Springer Verlag, 2002. ↩ ^{1} ^{2}

Ruth K. Meyer and Christopher J. Nachtsheim.
The Coordinate-Exchange Algorithm for Constructing Exact Optimal Experimental Designs.
*Technometrics*, 37(1):60–69, 1995. ↩

John D Pearson.
Variable metric methods of minimisation.
*The Computer Journal*, 12(2):171–178, 1969. ↩ ^{1} ^{2} ^{3}