Modeling Carryover Effects in Industrial Settings

Experimental Design
Carryover Effects

Sometimes a test can leave lingering effects on the experimental unit. Here I try a simple approach to model this phenomenon and then use genetic algorithms to try to find a good design to mitigate as much as possible the impact of the carryover effects.


May 1, 2016

Carryover effects are effects that come from a previous testing of an experimental unit. You probably hear about them most in clinical trials, where a design might give one person drug A, B, then C and another person drugs C, A, then B. There can be an interaction between what’s left of C and drug A for the second person that will be different than what is experienced by the first person. Usually things like Latin Square designs are used to help average out these effects, and sometimes these effects are modeled directly.

In an industrial setting there are times when we only care about carryover effects. Often devices hold pressures, charges, or other levels that carry forward some amount of internal saturation from one test to the next. If quality inspection tests of such devices occur multiple times it’s possible that you can get very different results. In these situations it’s important to both have a model for how the carryover effect decreases over time, and some idea of maximum “charge” before the device explodes or simply can hold no more from the previous runs. Here I’m concerned with the first problem, modeling the decrease of carryover over time.

Modeling Carryover Effects

I’m assuming a log-linear normal error model is appropriate. Specifically, \(log(y) \sim \mathrm{N}\bigl(p(x)^c \boldsymbol\beta),\sigma^2\bigr)\). Here I’m assuming a simple additive relationship between carryover effects, that the decay is well represented at log scale by a certain low order polynomial, and that the magnitude of the carryover effect is less than the maximum described above.

Let \(f(t)\) be the amount of charge that remains after time \(t\) in the presence of no other tests. Let \(g(t)\) represent the amount of total charge at time \(t\) given a sequence of \(n\) tests occurs at times \(T = \{t_1,\ldots,t_n\}\) on the experimental unit. Then

\[ g(t) = \sum_{s \in T : s < t} f(s) . \]

Let \(p(x) : \mathbb{R} \rightarrow \mathbb{R}^m\) where the \(i\)th element of \(p(x)\) is \(x^{i-1}\), then this further simplifies to

\[ g(t) = \sum_{s \in T : s < t} p(s)^\mathrm{T} \boldsymbol\beta \]

for some fixed parameter vector \(\boldsymbol\beta\). Thus we can express the vector of responses \(\mathbf{Y}\) as

\[ \mathbf{Y} = \mathbf{X}\boldsymbol\beta. \]

The design matrix \(\mathbf{X}\) can then be used in traditional optimal design algorithms, and is easy to construct:

## D-optimal criterion for a time sequence
## Note that we first generate the matrix of differences, then we normalize
## them to the experimental design settings (so that x goes from -1 to 1)
d.crit <- function(
    ts,             # A sequence of times at which to test
    use.model       # An R formula object indicating the polynomial model
    X <-,lapply(
    xx <- 2*X[,2]/(max(X[,2])-min(X[,2])) - 1
    Y <- model.matrix(use.model,data.frame('x'=xx))
    Y[,1] <- X[,1]

This function returns (a scaled version of) the log determinant of the inverse of the information matrix, so we want to minimize this to end up with the \(D\)-optimal design. Minimizing this is the same as maximizing the determinant of the information matrix.

This function also normalizes the polynomials to have domain \([-1,1]\) which is common in experimental design. Thus, it is without loss of generality to assume the times all fall on \([0,1]\) and that \(t_1=0\) and \(t_n = 1\). You can then use something simple like coordinate exchange to find the best times \(t_2,\ldots,t_{n-1}\).

Optimal Continuous Time Testing of A Single Unit

I wrote some R code to do this in 4,000 random starts. If I was going to use a \(m\)th order polynomial then I’d find a design with \(m+2\) points. The optimal time spacing of the effects is given in the following table:

\(n\) \(t_1\) \(t_2\) \(t_3\) \(t_4\) \(t_5\) \(t_6\) \(t_7\)
5 0.000 0.006 0.368 0.780 1.000
6 0.000 0.413 0.478 0.686 0.910 1.000
7 0.000 0.003 0.202 0.492 0.733 0.932 1.000

Note that the spacings are relatively well spaced throughout \([0,1]\), but that for the 5th and 7th order models the \(t_2\) test time is very near to \(t_1\). It may not be possible to run such a test because of the spacing of the points. I added a constraint that the spacing between units for an order \(m\) model is at least \(1/(2m)\). This yielded the following table of optimal designs:

\(n\) \(t_1\) \(t_2\) \(t_3\) \(t_4\) \(t_5\) \(t_6\) \(t_7\)
5 0.000 0.430 0.532 0.830 1.000
6 0.000 0.356 0.447 0.699 0.916 1.000
7 0.000 0.183 0.317 0.531 0.755 0.927 1.000

This is a more manageable run schedule. We could also trivially modify our criterion to handle independent experimental units and construct testing times for many units. Rather than talk about that, I’ll move on to a more challenging problem that sometimes arises from needing to run all of your tests in a sequence without idle time between tests.

Best Sequence for Testing Multiple Units Under Time Constraints

Consider a situation where only a single testing apparatus is available, and our time on this apparatus is restricted such that it doesn’t make sense to spread out the tests very far apart. Further, assume there are multiple units we wish to test. The problem now shits to being one of ordering the tests of the experimental units to best achieve estimation of the model parameters per experimental unit.

Consider units labeled A, B, and C. I’ll write experimental plans as a string like ABCCAACBBACBACBACB, where that particular string represent a testing schedule where A is tested first, then B, then C, then C again, and so on. We want to arrange the testing schedule to get the most information about the decay.

Designs are ranked by the product of the \(D\)-criteria for each unit over a the fixed time interval. Designs were coded in such a way that the first letter to appear is A, the next different letter to appear is B, and so on. Further, we required balance in the sense that the same number of each letters must appear. In the situation where we have 3 letters and a design of 18 tests this yields 2,858,856 possible designs. This is in the realm of computable problems, and a direct enumeration of the space revealed that ABCCAACBBACBACBACB is in fact the unique optimal design.

It is interesting that a single unique optimum exists. It may be the case that, for certain numbers of units and testing lengths, multiple designs may share the optimum criterion value.

Things quickly leave the realm of computable-in-a-day as we allow extra units (letters). If we plan to test \(m\) units a total of \(n\) times each then the number of unique plans of length \(mn\) is

\[ \frac{1}{m!} \prod_{k=0}^{m-1} \binom{mn-kn}{n}. \]

Because of how quickly this grows, we need a better way to construct such designs. The most desperate and last-ditch-effort of such approaches is genetic algorithms (GAs). GAs allow us to balance exploration with exploiting good designs to random walk into optimal or near-optimal designs.

Genetic Algorithm Construction

A great description of genetic algorithms is in [@talbi]. The idea is that you have a small population of designs, you apply some random noise to them (mutation), you combine them in some way (breeding), and then after this you decide which to keep in your population (selection). By selecting those that are better designs in terms of the product of \(D\)-criteria then the population tends to drift toward the best designs overall. In practice, we may not find the optimal design, but we may find a good design.

The individual \(D\)-criterion values are generated with the following:

gene.determinants <- function(vv){
    w <-,lapply(
            ldx <- sort(which(vv==a))
            times <- data.frame('x'=ldx[2:length(ldx)]-ldx[1:(length(ldx)-1)])
            times[,'x'] <- 2*(times[,'x']-gene.length/2)/gene.length
            WW <- model.matrix(use.model,data=times)
            -log(abs(det(t(WW) %*% WW)))

The overall criterion would then be the product of the vector returned from this function.

We can normalize our design strings because all designs that come from permuting the labels are isomorphic. The normalization I chose was that the first appearance of the letters in the design string should be alphabetical. To this end I constructed the following normalization function to be applied to all strings after mutation and breeding.

normalize.gene <- function(vv){ <- unique(vv)
    vv.out <- LETTERS[vapply(
            which( == w)

Another important thing to note is that we want balance in our design strings. Because of this, we have to be careful that mutation and breeding preserves balance. For mutation, we achieve this by randomly choosing one of each letter, and rearranging their order. We do this a number of times governed by a random Poisson draw. Mutation occurs via the following:

mutate <- function(vv){
    for(i in 1:min(c(1,rpois(1,0.5)))){
        idx <- vapply(
                sample(which(vv == a),1)
        vv[idx] <- sample(unique(vv))

You can think of the mutation step as fueling the amount of exploration both awar from and within your population, whereas breeding is exploration into the interior of the region enclosed by your population. It is important that every design in the space is connected under mutation. By connected I mean that there do not exist two designs such that one design cannot be repreatedly mutated to become the other.

For breeding, ensuring balance can be a bit tricky. Here, I lazily just look at all combinations of the two genes and then only keep those that are balanced. I return a matrix where each row is an offspring of the first two genes:

breed <- function(vv,ww){
    idx <- which(vv != ww)
    breeding.pairs <- cbind(
    letter.table <- table(vv[idx]) # must equal table(ww[idx])
    all.combs <-
            'stringsAsFactors' = FALSE
    keep.idx <- apply(
            if(length(unique(m)) != length(letter.table)) return(FALSE)
            all(table(m) == letter.table)
                vw <- vv
                vw[idx] <- chg

Breeding is reserved for only the best solutions, so it’s somewhat justified to be a far more expensive operation than mutation (in GAs and in life).

The code first generates random solutions, and their fitness (product of \(D\)-criterion) is evaluated. Next, a loop is entered. We randomly select solutions and add their mutations (and fitness) to the population. We then take the best 20% of solutions and breed them with the top 50% of solutions in terms of fitness, adding the offspring and offspring fitness to the population. Then, we eliminate anything not in the top \(n\) of fitness. This process is continued until the best solution has an improvement less than a threshhold for \(k\) generations in a row.

Overall, this worked to generate some good solutions for larger problems. It’s slow, but when time on equipment is valued enough to make having such a design desierable it usually means you have a day or two ahead of time to plan your experiment.

This method is also not anywhere near optimal. Using the top \(n\) in terms of fitness as a selection criteria causes the population to cluster around a few solutions, and so we don’t explore the space well. Alternative methods for selection exist that do better at this.

In general, the space of all non-isomorphic designs is rather large. Another alternative approach is to focus on exploring a subspace of much smaller size. Our objective function here is the product of the \(D\)-criterion for each letter and is thus intrinsically multi-objective in nature. If we knew the Pareto front of all non-dominated designs in terms of each \(D\)-criterion then we know that the point nearest to the utopia point is also the optimal design. Thus, searching over the Pareto front may be a much more efficient method of searching for an optimal design.

Converting to Multi-Objective Optimization

Consider again the test problem for which we brute-forced out the optimal design ABCCAACBBACBACBACB. Normalizing for the order of the first three letters, the total number of non-isomorphic designs was 2,858,856. Of these, 5003 designs are on the Pareto front. Using as a distance metric of the number of pairs that don’t match we find the following distribution across the front.

Distance 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Count 1 19 19 75 81 164 272 393 519 592 655 684 612 493 290 113 21

Searching over the Pareto front found that 5002 of the designs are within a 2-swap of another Pareto front design, and the design ABBBCAACBCCACBACBA is not within a 2-swap of any Pareto front design but is within a 4-swap of some other Pareto front design. This doesn’t establish connectedness of the space under mutation, but does demonstrate that it is a necessary condition of connectedness to have at least a chance of a 4-swap mutation.

Unfortunately, rather than fully explore the multi-objective approach I had to abandon it for time issues. It does appear promising as an approach for much larger problems to restrict selection and mutation to the Pareto front.


Here, a method of design and analysis for additive exponentially decaying carryover effects is presented. Proposed heuristics were presented to generate designs for larger problems. I’m curious how the Pareto-front approach works for large problems, so let me know if you try it out. I also wasn’t able to find literature on this problem, though my library access is cut off so perhaps it’s not surprising. This seems like one of those problems that was solved in the ’20s, so also post a comment if you’ve found a similar problem in the literature.