Department of Industrial Engineering and Management Sciences


Northwestern University, Evanston, Illinois 60208-3119, U.S.A.

Working Paper No. 08-04

Mixtures of Multiple Testing Procedures with Gatekeeping Applications

Alex Dmitrienko

Eli Lilly and Company

Ajit C. Tamhane

Northwestern University

Lingyun Liu

Northwestern University

December 2008


Abstract

This paper introduces a general framework for constructing gatekeeping procedures

for multiple testing problems arising in clinical trials with hierarchical objectives.

These problems frequently exhibit a complex structure, including multiple families of

hypotheses and logical restrictions. The framework is based on combining multiple

tests across families and enables clinical trial sponsors to set up powerful and flexible

multiple testing procedures (e.g., gatekeeping procedures based on Dunnett tests

that account for logical restrictions among the hypotheses of interest). A clinical trial

example is used to illustrate the general approach.


Keywords and Phrases: Multiple comparisons; Closure principle; Gatekeeping procedures;

Bonferroni test; Dunnett test.


1. Introduction

Gatekeeping procedures are commonly used in multiple testing problems with a

hierarchical structure, including problems arising in clinical trials with multiple

objectives. These objectives may represent primary endpoints, secondary endpoints and

subgroup analyses, etc. To account for the hierarchical structure of these objectives,

null hypotheses associated with the objectives are grouped into families. Consider,

for example, a multiple testing problem involving n null hypotheses H1,...,Hn that

are grouped into m families:

Fk = {Hi, i ∈ Nk}, k = 1,...,m, m ≥ 2,

where N1 = {1,...,n1}, Nk = {n1 + ... + nk−1 + 1,...,n1 + ... + nk}, k = 2,...,m,

and n1 + ... + nm = n.

Dmitrienko, Tamhane and Wiens (2008) introduced a framework for constructing

multistage parallel gatekeeping procedures. A parallel gatekeeping procedure tests

hypotheses in Family Fk, k = 2,...,m, only if one or more hypotheses are rejected in

Fk−1. Dmitrienko, Tamhane and Wiens proposed a general algorithm for setting up

parallel gatekeeping procedures with an attractive stepwise form based on tests from

a broad class of multiple testing procedures (known as separable procedures).

One of the limitations of the framework proposed by these authors is that it

cannot be used in problems with logical restrictions, i.e., when the acceptance or

rejection of hypotheses in Fk, k = 2,...,m, depends on the outcomes of significance

tests in F1,...,Fk−1. Multiple testing problems with logical restrictions are

frequently encountered in clinical trials. Examples are given in Chen, Luo and

Capizzi (2005), Quan, Luo and Capizzi (2005), Dmitrienko, Offen, Wang and Xiao

(2006), Dmitrienko, Wiens, Tamhane and Wang (2007), Dmitrienko, Tamhane, Liu

and Wiens (2008).

This paper describes a framework that enables clinical trial sponsors to set up

flexible multiple testing procedures for problems with a very general class of logical

restrictions (monotone logical restrictions). The framework is based on combining

multiple tests across families using the concept of a mixture of multiple testing

procedures. This term is used here to make an analogy with mixtures of distributions

(Everitt and Hand, 1981). To specify a mixture distribution, one needs to specify

component distributions and a mixing distribution. Similarly, in the case of

mixture procedures, one needs to select component procedures and a mixing function.

The mixing function is selected to take into account the logical relationships among

multiple families and provide strong control of the familywise error rate (FWER)

(Hochberg and Tamhane, 1987). The mixture-based framework uses the closure

principle (Marcus, Peritz and Gabriel, 1976) to achieve FWER control.


The paper is organized as follows. Section 2 introduces the mixture-based

framework for an arbitrary number of families. Section 3 defines a class of monotone logical

restrictions. Section 4 describes mixing functions that can be used to construct

mixtures of multiple testing procedures. Properties of mixture procedures are described

in Section 5. Lastly, Section 6 gives examples of mixture procedures (including

mixtures of Bonferroni and Dunnett procedures) and a clinical trial example to illustrate

the mixture-based framework.

2. Mixture procedures

Consider the multiple testing problem defined in the Introduction. Let Hk denote

the closed family associated with Fk, i.e.,

Hk = {HIk , Ik ⊆ Nk}, where HIk = ∩i∈Ik Hi.

Further, consider multiple testing procedures, known as component procedures, T1,...,Tm.

The procedure Tk, k = 1,...,m, is assumed to be a closed testing procedure that

controls the FWER in the strong sense within Fk. This means that there exists a

set of α-level tests for each intersection hypothesis in Hk such that Tk rejects Hi,

i ∈ Nk, if and only if (iff) all intersection hypotheses including Hi are rejected by the

intersection hypothesis tests. For example, if Tk is the Holm procedure, each

intersection hypothesis is tested using the Bonferroni test at α. Let pk(Ik), Ik ⊆ Nk, be

the p-value for the intersection hypothesis test associated with HIk . The intersection

hypothesis HIk is rejected iff pk(Ik) ≤ α.

A mixture of the component procedures, denoted by T, is a procedure for testing

all hypotheses in F = F1 ∪ ... ∪ Fm. Let N = {1,...,n} and let H denote the closed

family associated with F, i.e., H = {HI,I ⊆ N}. For each index set I ⊆ N, let

Ik = I ∩ Nk, k = 1,...,m. To define a mixture procedure, one needs to define α-level

tests for all intersection hypotheses in H. Consider any non-empty intersection

hypothesis HI, I ⊆ N. The test for this intersection hypothesis is defined as follows:

Case 1. HI contains hypotheses only from Fk, k = 1,...,m, i.e., I = Ik. The p-value

for HI is given by p(I) = pk(Ik).

Case 2. HI contains hypotheses from Fi1 ,...,Fis for s ≥ 2, i.e., I = Ii1 ∪ ... ∪ Iis .

The p-value for HI is given by

p(I) = mI(pi1 (Ii1 ),...,pis (Iis )),

where mI(xi1 ,...,xis ) is a mixing function.

Mixing functions have the following properties:


• 0 ≤ mI(xi1 ,...,xis ) ≤ 1, 0 ≤ xik ≤ 1, k = 1,...,s.

• mI(xi1 ,...,xis ) ≤ α if xi1 ≤ α.

• The test for HI is an α-level test, i.e., P(p(I) ≤ α) ≤ α.

Examples of mixing functions are given in Section 4.

Given the p-values for each intersection in the closed family, the p-value for a

hypothesis in F is computed using the closure principle. For the hypothesis Hi,

i ∈ N, the adjusted p-value is defined as the maximum over the p-values for the

intersections containing this hypothesis, i.e.,

\[
\tilde{p}_i = \max_{I:\, i \in I} p(I).
\]

Since the mixture procedure is constructed using α-level tests for all intersection

hypotheses in H, the procedure controls the FWER in the strong sense at an α level.
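The closure computation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the three raw p-values are assumptions made for the example and do not come from the paper. The unweighted Bonferroni intersection test is used because, as noted earlier in this section, closing it yields the Holm procedure, which makes the output easy to check by hand.

```python
from itertools import combinations


def closed_adjusted_p_values(indices, intersection_p):
    """Adjusted p-values via the closure principle.

    indices        -- iterable of hypothesis labels (the index set N)
    intersection_p -- function mapping a frozenset I to the p-value p(I)
    """
    indices = list(indices)
    adjusted = {i: 0.0 for i in indices}
    for size in range(1, len(indices) + 1):
        for subset in combinations(indices, size):
            p = intersection_p(frozenset(subset))
            for i in subset:
                # The adjusted p-value of H_i is the maximum of p(I)
                # over all index sets I that contain i.
                adjusted[i] = max(adjusted[i], p)
    return adjusted


# Hypothetical raw p-values for a single family of three hypotheses.
raw = {1: 0.01, 2: 0.04, 3: 0.03}


def bonferroni_p(index_set):
    """Unweighted Bonferroni p-value for the intersection hypothesis H_I."""
    return min(1.0, len(index_set) * min(raw[i] for i in index_set))


holm_adjusted = closed_adjusted_p_values(raw, bonferroni_p)
print(holm_adjusted)
```

The enumeration visits all 2^n − 1 non-empty index sets, so this brute-force form is practical only for a small number of hypotheses.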

3. Logical restrictions

Mixtures of multiple testing procedures are constructed to account for logical

relationships among the hypotheses in F1,...,Fm. Dmitrienko, Wiens, Tamhane and

Wang (2007) and Dmitrienko, Tamhane, Liu and Wiens (2008) proposed to formulate

logical relationships in terms of serial and parallel gatekeeping sets. In this case a

hypothesis in Fk+1, k = 1,...,m−1, is tested iff all hypotheses are rejected in a certain

subset of F1,...,Fk (known as the serial gatekeeping set) and at least one hypothesis

is rejected in another subset of F1,...,Fk (known as the parallel gatekeeping set).

A more general family of monotone logical restrictions is introduced below. The

restrictions are defined using restriction functions. Consider a hypothesis in Fs+1,

s = 1,...,m−1, say, Hi, i ∈ Ns+1. The restriction function Li(I), I ⊆ N1 ∪...∪Ns,

assumes two values, Li(I) = 0 or 1. Here Li(I) = 0 means that Hi is not testable, i.e.,

it is accepted without test if the hypotheses Hj, j ∈ I, are accepted and Li(I)=1

means that Hi is testable. The function Li(I) meets the following conditions:

• Monotonicity condition: If Li(I) = 0 and I ⊆ I′ then Li(I′) = 0.

• Parallel gatekeeping condition: If Nk ⊆ I then Li(I) = 0 for all i ∈ Ns for

s = k + 1,...,m.

Note that, by the monotonicity condition, if a hypothesis in Fs+1 is not testable

given a set of accepted hypotheses in F1,...,Fs, it will remain non-testable if more

hypotheses are accepted in F1,...,Fs. Further, it follows from the parallel gatekeeping


condition that all hypotheses are non-testable (and are automatically accepted) in

Fs+1 if all hypotheses are accepted in Fk, k = 1,...,s.
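To make the monotonicity condition concrete, the sketch below checks it by brute force for a restriction function defined on the subsets of an earlier family. The helper names and the three toy restriction functions are hypothetical and serve only as an illustration for a two-hypothesis first family N1 = {1, 2}.

```python
from itertools import combinations


def all_subsets(items):
    """All subsets of a collection, as frozensets, including the empty set."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_monotone(restriction, earlier_indices):
    """True if L(I) = 0 and I being a subset of J always imply L(J) = 0."""
    subsets = all_subsets(earlier_indices)
    return all(restriction(bigger) == 0
               for smaller in subsets if restriction(smaller) == 0
               for bigger in subsets if smaller <= bigger)


N1 = frozenset({1, 2})


def serial_like(accepted):
    """Not testable as soon as H2 is among the accepted hypotheses."""
    return 0 if 2 in accepted else 1


def parallel(accepted):
    """Not testable only when every hypothesis in F1 is accepted."""
    return 0 if N1 <= accepted else 1


def not_monotone(accepted):
    """Not testable when exactly H1 is accepted: violates monotonicity."""
    return 0 if accepted == frozenset({1}) else 1


print(is_monotone(serial_like, N1), is_monotone(parallel, N1),
      is_monotone(not_monotone, N1))
```

The third function fails the check because it declares the hypothesis non-testable when only H1 is accepted but testable again when both H1 and H2 are accepted.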

In order to account for logical restrictions, the definition of a mixture of

multiple testing procedures given in Section 2 needs to be modified as follows:

Case 1. HI contains hypotheses only from Fk, k = 1,...,m, i.e., I = Ik. The p-value

for HI is given by p(I) = pk(Ik).

Case 2. HI contains hypotheses from Fi1 ,...,Fis for s ≥ 2, i.e., I = Ii1 ∪ ... ∪ Iis .

For any k = 2,...,s, let $I^*_{i_k}$ be the subset of $I_{i_k}$ that includes the indices of hypotheses that are logically consistent, i.e., testable, with the hypotheses from $F_{i_1},\ldots,F_{i_{k-1}}$. In other words,

\[
I^*_{i_k} = \{\, i : i \in I_{i_k} \text{ and } L_i(I_{i_1} \cup \ldots \cup I_{i_{k-1}}) = 1 \,\}.
\]

Assume first that $I^*_{i_s}$ is not empty. In this case the p-value for HI is given by

\[
p(I) = m_I\bigl(p_{i_1}(I_{i_1}),\, p_{i_2}(I^*_{i_2}),\, \ldots,\, p_{i_s}(I^*_{i_s})\bigr),
\]

where $p_{i_k}(I^*_{i_k}) = 1$ if $I^*_{i_k}$ is empty, k = 2,...,s − 1. Further, if $I^*_{i_{r+1}},\ldots,I^*_{i_s}$ are empty for some r = 1,...,s − 1 then

\[
p(I) = m_J\bigl(p_{i_1}(I_{i_1}),\, p_{i_2}(I^*_{i_2}),\, \ldots,\, p_{i_r}(I^*_{i_r})\bigr),
\]

where $J = I_{i_1} \cup \ldots \cup I_{i_r}$ and $p_{i_k}(I^*_{i_k}) = 1$ if $I^*_{i_k}$ is empty.

4. Mixing functions

This section defines mixing functions based on the Bonferroni and Dunnett global

tests. Both these mixing functions satisfy the properties listed in Section 2 and have

the same general form:

\[
m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{c_{i_1}}, \ldots, \frac{x_{i_s}}{c_{i_s}} \right),
\]

where I = Ii1 ∪ ... ∪ Iis as before and ci1 ,...,cis is a non-increasing sequence of co-

efficients with 1 = ci1 ≥ ... ≥ cis ≥ 0. This sequence is non-increasing to account for

the hierarchical structure of the problem, i.e., families placed earlier in the sequence

are more important (and receive greater weights) than those later in the sequence.

The Bonferroni and Dunnett mixing functions differ in terms of the choice of these

coefficients. For the Bonferroni mixing function, the coefficients are denoted by b’s

and for the Dunnett mixing function by d’s.


4.1 Bonferroni mixing function

To define this function, consider the error rate function of the procedure Tk, k =

1,...,m − 1, introduced in Dmitrienko, Tamhane and Wiens (2008). Since an exact

expression for the error rate function is, in general, difficult to derive, we will focus

on an upper bound, ek(Ik), for the true error rate function, i.e.,

P(pk(Ik) ≤ α) ≤ ek(Ik)

for fixed α. As in Dmitrienko, Tamhane and Wiens (2008), we will treat ek(Ik) as

the actual error rate function. Error rate functions have the following properties:

ek(∅) = 0, ek(I) ≤ ek(I′) if I ⊆ I′, ek(Nk) = α.

Also, let fk(Ik) = ek(Ik)/α.

Assume that T1,...,Tm−1 are separable, i.e., fk(Ik) < 1 for all α if Ik is a proper

subset of Nk, k = 1,...,m − 1. The Bonferroni mixing function is given by

\[
m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{b_{i_1}}, \ldots, \frac{x_{i_s}}{b_{i_s}} \right),
\]

where bi1 = 1 and bik = bik−1 (1 − fik−1 (Iik−1 )), k = 2,...,s. It is clear that

0 ≤ mI(xi1 ,...,xis ) ≤ 1 if 0 ≤ xik ≤ 1

and

mI(xi1 ,...,xis ) ≤ α if xi1 ≤ α.

Since T1,...,Tm−1 are separable, bik > 0 if Iir−1 is a proper subset of Nir−1 for all

r = 2,...,k. On the other hand, bik = ... = bis = 0 if Iik−1 = Nik−1 and thus

\[
m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{b_{i_1}}, \ldots, \frac{x_{i_{k-1}}}{b_{i_{k-1}}} \right).
\]

It is easy to verify that the resulting test for HI is an α-level test. By the Bonferroni

inequality,

\[
P(p(I) \le \alpha) \le \sum_{k=1}^{s} P\bigl(p_{i_k}(I_{i_k}) \le \alpha b_{i_k}\bigr)
\le \sum_{k=1}^{s-1} \alpha\, b_{i_k} f_{i_k}(I_{i_k}) + \alpha\, b_{i_s}
\]


since P(pik (Iik ) ≤ x) ≤ xfik (Iik ), k = 1,...,s − 1, and P(pis (Iis ) ≤ x) ≤ x. Further,

it is easy to see that bis−1 fis−1 (Iis−1 ) + bis = bis−1 since bis = bis−1 (1 − fis−1 (Iis−1 )).

Doing this recursively, we have

\[
\sum_{k=1}^{s-1} b_{i_k} f_{i_k}(I_{i_k}) + b_{i_s} = b_{i_1} = 1
\]

and thus P(p(I) ≤ α) ≤ α.
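The recursion for the Bonferroni coefficients and the telescoping identity used in the argument above can be checked numerically. The following sketch uses exact rational arithmetic; the function names, the f-values and the three component p-values are hypothetical inputs chosen for illustration (they correspond to a Bonferroni procedure with two of three equally weighted hypotheses retained in each of the first two families).

```python
from fractions import Fraction


def bonferroni_coefficients(f_values):
    """Return [b_1, ..., b_s] given [f_1(I_1), ..., f_{s-1}(I_{s-1})].

    Uses b_1 = 1 and b_k = b_{k-1} * (1 - f_{k-1}(I_{k-1})).
    """
    b = [Fraction(1)]
    for f in f_values:
        b.append(b[-1] * (1 - f))
    return b


def bonferroni_mixing(p_values, b):
    """min over k of x_k / b_k, dropping families whose coefficient is zero."""
    return min(x / bk for x, bk in zip(p_values, b) if bk > 0)


f = [Fraction(2, 3), Fraction(2, 3)]          # hypothetical f-values, s = 3
b = bonferroni_coefficients(f)                # b = [1, 1/3, 1/9]

# Telescoping identity: sum_{k<s} b_k f_k + b_s equals b_1 = 1.
total = sum(bk * fk for bk, fk in zip(b, f)) + b[-1]

# Hypothetical component p-values for the three families.
x = [Fraction(15, 1000), Fraction(39, 1000), Fraction(18, 1000)]
p_mixture = bonferroni_mixing(x, b)
print(b, total, p_mixture)
```

Dropping the terms with a zero coefficient mirrors the truncated form of the mixing function that applies when an earlier family is fully contained in the intersection.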

4.2 Dunnett mixing function

The Bonferroni mixing function defined above is based on the Bonferroni inequality

and thus does not account for the correlation among pi1 (Ii1 ),...,pis (Iis ). By

contrast, the Dunnett mixing function explicitly utilizes the joint distribution of the

p-values.

Assume again that Tk is separable, k = 1,...,m−1. The Dunnett mixing function

is given by

\[
m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{d_{i_1}}, \ldots, \frac{x_{i_s}}{d_{i_s}} \right),
\]

where di1 = 1 and dik , k = 2,...,s, are defined sequentially as follows:

P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ni2 ) ≤ αdi2 ) = α,

P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ii2 ) ≤ αdi2 or pi3 (Ni3 ) ≤ αdi3 ) = α,

...

P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ii2 ) ≤ αdi2 or ... or

pis−2 (Iis−2 ) ≤ αdis−2 or pis−1 (Nis−1 ) ≤ αdis−1 ) = α,

P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ii2 ) ≤ αdi2 or ... or

pis−1 (Iis−1 ) ≤ αdis−1 or pis (Iis ) ≤ αdis ) = α.

It follows from the equations that dik > 0 if Iir−1 is a proper subset of Nir−1 , r =

2,...,k, and dik = ... = dis = 0 if Iik−1 = Nik−1 .

As in Section 4.1, it is easy to see that 0 ≤ mI(xi1 ,...,xis ) ≤ 1 if 0 ≤ xik ≤ 1 and

mI(xi1 ,...,xis ) ≤ α if xi1 ≤ α. Further, by the definition of dik , k = 1,...,s,

P(p(I) ≤ α) = P(pi1 (Ii1 ) ≤ αdi1 or ... or pis (Iis ) ≤ αdis )

= α

and thus the resulting test for HI is an α-level test.

Since the Dunnett mixing function takes into account the joint distribution of test

statistics, mixture procedures based on this function are more powerful than those

based on the Bonferroni mixing function.


5. Properties of mixture procedures

This section summarizes key properties of mixture procedures.

5.1 General properties

We will begin with a discussion of general properties, including consistency with

logical restrictions and independence (inferences in F1 are independent of inferences

in F2).

Proposition 1 If T is consonant in F1,...,Fk, k = 1,...,m − 1, then

the mixture procedure T is consistent with the logical restrictions in Fk+1. In other

words, T accepts Hi, i ∈ Nk+1, at the α level if Li(A1 ∪ ... ∪ Ak)=0, where Ar is

the index set of accepted hypotheses in Fr, r = 1,...,k.

Note that, if T is not consonant in F1,...,Fk, the logical restrictions may be

violated in Fk+1 in the sense that Hi, i ∈ Nk+1, may be rejected even though Li(A1 ∪

... ∪ Ak) = 0. However, the logical restrictions can always be enforced by modifying

multiplicity-adjusted p-values in Fk+1. This can be done using an algorithm similar

to that proposed in Kordzakhia et al. (2008).

Proposition 2 The mixture procedure T is equivalent to the procedure T1 within the

first family. In other words, T rejects a hypothesis in F1 at the α level iff T1 rejects

this hypothesis at the α level.

Proposition 3 The mixture procedure T is equivalent to the procedure Tk, k =

2,...,m, within Fk if T rejects all hypotheses in F1,...,Fk−1. In other words, T

rejects a hypothesis in Fk at the α level iff Tk rejects this hypothesis at the α level

provided all hypotheses in F1,...,Fk−1 are rejected by T.

The proofs of Propositions 1, 2 and 3 are given in the Appendix.

5.2 Stepwise mixture procedures with parallel gatekeeping restrictions

When parallel gatekeeping restrictions are considered, mixture procedures based

on the Bonferroni mixing function admit a stepwise representation. This means that

the mixture procedure is, in fact, identical to a stepwise application of the

component procedures with an adjustment of the significance level in the last m − 1

families. This result is equivalent to the main result in Dmitrienko, Tamhane and

Wiens (2008) and shows that the mixture framework is an extension of the framework

of multistage gatekeeping procedures introduced in that paper. In particular,

multistage gatekeeping procedures considered by Dmitrienko, Tamhane and Wiens are


mixtures of component procedures used at individual stages based on the Bonferroni

mixing function.

To demonstrate that mixture procedures based on the Bonferroni mixing function

are equivalent to multistage gatekeeping procedures proposed by Dmitrienko,

Tamhane and Wiens, we will consider a two-family problem. The proof can be

extended to the general case of m families by recursion.

Proposition 4 Assume that

• Only parallel gatekeeping restrictions are imposed, i.e., Li(N1)=0, i ∈ N2, and

Li(I1)=1, i ∈ N2, I1 ⊂ N1.

• The procedure T1 is separable and consonant.

• The Bonferroni mixing function is used.

The mixture procedure T has the following two-stage structure:

• The hypotheses in F1 are tested at the familywise level α1 = α using T1.

• The hypotheses in F2 are tested at the level α2 = α−e1(A1) using T2, where e1(I)

is the error rate function of T1 and A1 is the index set of accepted hypotheses

in F1.

The proof of Proposition 4 is given in the Appendix.
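The two-stage structure in Proposition 4 can be sketched as follows for one particular choice of components: an unweighted Bonferroni procedure in F1 (which is separable and consonant) and an unweighted Holm procedure in F2. The function names and the raw p-values are hypothetical; the error rate function e1(A1) = α|A1|/n1 is the one for the equally weighted Bonferroni procedure.

```python
def bonferroni_reject(p_values, alpha):
    """Indices rejected by the unweighted Bonferroni procedure at level alpha."""
    n = len(p_values)
    return {i for i, p in p_values.items() if p <= alpha / n}


def holm_reject(p_values, alpha):
    """Indices rejected by the unweighted Holm procedure at level alpha."""
    rejected = set()
    if alpha <= 0:
        return rejected
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    n = len(ordered)
    for step, (i, p) in enumerate(ordered):
        if p > alpha / (n - step):
            break
        rejected.add(i)
    return rejected


def two_stage_parallel_gatekeeping(p_family1, p_family2, alpha):
    """Stage 1: T1 at alpha. Stage 2: T2 at alpha2 = alpha - e1(A1)."""
    n1 = len(p_family1)
    rejected1 = bonferroni_reject(p_family1, alpha)
    accepted1 = set(p_family1) - rejected1
    # e1(A1) = alpha * |A1| / n1 for the equally weighted Bonferroni procedure
    alpha2 = alpha * (1 - len(accepted1) / n1)
    rejected2 = holm_reject(p_family2, alpha2)
    return rejected1, rejected2, alpha2


# Hypothetical raw p-values for two families of three hypotheses each.
family1 = {1: 0.004, 2: 0.012, 3: 0.030}
family2 = {4: 0.010, 5: 0.030, 6: 0.015}
rej1, rej2, alpha2 = two_stage_parallel_gatekeeping(family1, family2, 0.05)
print(rej1, rej2, round(alpha2, 4))
```

When every hypothesis in F1 is accepted, α2 becomes zero and no hypothesis in F2 can be rejected, which is the parallel gatekeeping restriction.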

5.3 Mixture procedures with general logical restrictions

As shown in Proposition 4, mixture procedures in problems with parallel gatekeeping

restrictions have an attractive stepwise form. The following counterexample

shows that mixtures of testing procedures with general logical restrictions may not

have a stepwise form.

Consider a two-family problem with N1 = {1,2} and N2 = {3,...,n}. Assume

that the hypotheses within each family are equally weighted. Further, consider a

mixture of the Bonferroni procedure in F1 and Holm procedure in F2 based on the

Bonferroni mixing function. The following logical restrictions are assumed:

• H3,...,Hn−1 are testable iff H2 is rejected.

• Hn is testable iff at least one hypothesis in F1 is rejected.

In other words,


• If I1 = ∅ or I1 = {1}, then L3(I1) = L4(I1) = ... = Ln(I1) = 1.

• If I1 = {2}, then L3(I1) = L4(I1) = ... = Ln−1(I1) = 0 and Ln(I1) = 1.

• If I1 = {1,2}, then L3(I1) = L4(I1) = ... = Ln(I1) = 0.

To demonstrate that the mixture of the two procedures does not have a stepwise

form, it is sufficient to focus on the case when T1 (Bonferroni procedure) rejects H1

but accepts H2. By the logical restrictions, only one hypothesis is testable in F2 in

this case (namely, the hypothesis Hn). If the mixture procedure had a stepwise form,

this hypothesis would have been tested by T2 (Holm procedure), i.e., its decision

rule would have been expressed in terms of pn compared to an appropriately chosen

significance level. However, as shown in Proposition 5, this is not the case.

Proposition 5 Let q(1) ≤ ... ≤ q(n−2) denote the ordered p-values in F2 and assume

that pn is the kth ordered p-value, i.e., pn = q(k), k = 1,...,n−2. Then the hypothesis

Hn is rejected iff all of the following conditions are met

pn ≤ α/2, q(i) ≤ α/(n − i − 1), i = 1,...,k.

The proof of Proposition 5 is given in the Appendix.
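The rejection rule in Proposition 5 can be written as a short check. The sketch below assumes that the raw p-values of F2 are supplied in the order p3,...,pn with no ties, and that H1 has been rejected and H2 accepted in F1; the function name and the numerical inputs are hypothetical.

```python
def rejects_last_hypothesis(p_family2, alpha):
    """Evaluate the conditions of Proposition 5 for H_n.

    p_family2 -- raw p-values p_3, ..., p_n in that order (n - 2 values)
    """
    n = len(p_family2) + 2
    p_n = p_family2[-1]
    ordered = sorted(p_family2)              # q_(1) <= ... <= q_(n-2)
    k = ordered.index(p_n) + 1               # p_n is the k-th ordered p-value
    holm_like = all(ordered[i - 1] <= alpha / (n - i - 1)
                    for i in range(1, k + 1))
    return p_n <= alpha / 2 and holm_like


# n = 5: F2 contains H3, H4, H5 and only H5 is testable.
print(rejects_last_hypothesis([0.020, 0.040, 0.024], 0.05))
print(rejects_last_hypothesis([0.010, 0.040, 0.024], 0.05))
```

In both calls pn = 0.024 ≤ α/2, yet the decision differs because it also depends on the p-value of H3, which is itself not testable. This is the sense in which the rule cannot be expressed through pn alone.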

6. Examples

In this section we will give examples of mixture procedures that help illustrate

the general method introduced in Section 2.

6.1 Mixtures of Bonferroni procedures

Consider a problem of testing n hypotheses and let w1,...,wn denote the weights

assigned to the hypotheses in the m families. The weights are non-negative and sum

to 1 within each family, i.e.,

\[
w_i \ge 0,\ i = 1,\ldots,n, \qquad \sum_{i \in N_k} w_i = 1,\ k = 1,\ldots,m.
\]

The n hypotheses are grouped into m families. Assume that the first m−1 families

are tested using a weighted version of the Bonferroni procedure and the last family

is tested using a weighted version of the Holm procedure. In other words,

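\[
p_k(I_k) = \min_{i \in I_k} (p_i / w_i) \quad \text{if } I_k \subseteq N_k,\ k = 1,\ldots,m-1,
\]

\[
p_m(I_m) = \Bigl( \sum_{k \in I_m} w_k \Bigr) \min_{i \in I_m} (p_i / w_i) \quad \text{if } I_m \subseteq N_m.
\]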


We will assume first that parallel gatekeeping restrictions are imposed, i.e.,

Li(Nk−1) = 0, i ∈ Nk, k = 2,...,m,

Li(Ik−1) = 1, i ∈ Nk, Ik−1 ⊂ Nk−1, k = 2,...,m.

Noting that the error rate function for the weighted Bonferroni procedure is given by

\[
e_k(I_k) = \alpha \sum_{i \in I_k} w_i, \quad k = 1,\ldots,m-1,
\]

it can be shown that the mixture of the m procedures based on the Bonferroni mixing

function is defined as follows. Let HI, I ⊆ N, be a non-empty intersection hypothesis.

If I ⊆ Nk, k = 1,...,m, then p(I) = pk(Ik), where Ik = I ∩ Nk. If HI contains

hypotheses from Fi1 ,...,Fis for s ≥ 2, the p-value for HI is given by

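\[
p(I) = \min_{i \in I} \frac{p_i}{v_i(I)},
\]

where

\[
\begin{aligned}
v_i(I) &= v^*_k(I)\, w_i, && i \in I_{i_k},\ k = 1,\ldots,s-1,\\
v_i(I) &= v^*_s(I)\, w_i, && i \in I_{i_s} \text{ and } i_s < m,\\
v_i(I) &= v^*_s(I)\, w_i \Big/ \sum_{k \in I_{i_s}} w_k, && i \in I_{i_s} \text{ and } i_s = m,\\
v^*_1(I) &= 1, \quad v^*_{k+1}(I) = v^*_k(I) \Bigl( 1 - \sum_{i \in I_{i_k}} w_i \Bigr), && k = 1,\ldots,s-1.
\end{aligned}
\]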

The resulting procedure is equivalent to the Bonferroni-based parallel gatekeeping

procedure (Dmitrienko, Offen and Westfall, 2003).

Further, we will consider the general case of monotone logical restrictions. The

mixture procedure based on the Bonferroni mixing function has a structure similar

to that of the parallel gatekeeping procedure. First p(I) = pk(Ik) if I ⊆ Nk, k =

1,...,m, where Ik = I ∩ Nk. Further, if HI contains hypotheses from Fi1 ,...,Fis for

s ≥ 2, then the p-value for HI is given by

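\[
p(I) = \min_{i \in I^*} \frac{p_i}{v_i(I)},
\]

where $I^* = I_{i_1} \cup I^*_{i_2} \cup \ldots \cup I^*_{i_s}$ and, writing $I^*_{i_1} = I_{i_1}$,

\[
\begin{aligned}
v_i(I) &= v^*_k(I)\, w_i, && i \in I^*_{i_k},\ k = 1,\ldots,s-1,\\
v_i(I) &= v^*_s(I)\, w_i, && i \in I^*_{i_s} \text{ and } i_s < m,\\
v_i(I) &= v^*_s(I)\, w_i \Big/ \sum_{k \in I_{i_s}} w_k, && i \in I^*_{i_s} \text{ and } i_s = m,\\
v^*_1(I) &= 1, \quad v^*_{k+1}(I) = v^*_k(I) \Bigl( 1 - \sum_{i \in I_{i_k}} w_i \Bigr), && k = 1,\ldots,s-1.
\end{aligned}
\]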


Note that the presence of logical restrictions has an impact only on the index sets used

in the decision rule in the sense that a hypothesis is removed from the decision rule

if it is not consistent with the logical restrictions. The process of combining component

procedures is not affected by logical restrictions and therefore $v^*_1(I),\ldots,v^*_s(I)$ remain

the same. This mixture procedure is equivalent to the tree gatekeeping procedure

based on Algorithm III (Kordzakhia et al., 2008).

It is also important to note that the weighting scheme used in this mixture

procedure satisfies the monotonicity condition (Condition 3) formulated in Dmitrienko,

Tamhane, Liu and Wiens (2008). Weighting schemes proposed in other papers,

including Algorithm 2 in Dmitrienko, Tamhane, Liu and Wiens (2008), do not always

satisfy the monotonicity condition and gatekeeping procedures based on those schemes

can be inconsistent with the prespecified logical restrictions. In this case, the logical

restrictions need to be enforced as explained in Section 5.1.

6.2 Mixtures of Dunnett procedures

The algorithm given in Section 6.1 can be easily extended to construct more

powerful mixture procedures, e.g., mixtures of Dunnett procedures based on the Bonferroni

mixing function. Considering a general problem of testing n hypotheses grouped into

m families, let ti, i ∈ N, denote the test statistic associated with Hi and assume that

ti, i ∈ Nk, follow a multivariate t distribution for any k = 1,...,m. Suppose that the

hypotheses in Fk, k = 1,...,m, are tested using the Dunnett procedure. In this case,

the p-value for the intersection hypothesis HIk , Ik ⊆ Nk, k = 1,...,m, is given by

\[
p_k(I_k) = 1 - G_{|I_k|}\Bigl( \max_{i \in I_k} t_i \Bigr),
\]

where $G_n(x)$ is the cumulative distribution function of the n-variate one-sided Dunnett distribution, i.e.,

\[
G_{|I_k|}(x) = P\Bigl( \max_{i \in I_k} t^*_i \le x \Bigr),
\]

and $t^*_i$, i ∈ Nk, have the same joint distribution as ti, i ∈ Nk, under the global null

hypothesis. A mixture of the Dunnett procedures based on the Bonferroni mixing

function can now be defined using the steps described in Section 6.1.

6.3 Clinical trial example

The mixture procedures introduced in Sections 6.1 and 6.2 will be illustrated here

using a clinical trial example from Dmitrienko, Offen, Wang and Xiao (2006) and

Dmitrienko, Wiens, Tamhane and Wang (2007, Section 6). Consider a clinical trial

in patients with Type II diabetes conducted to test three doses of an experimental

treatment versus placebo. The three doses are labeled L, M and H and the placebo is

labeled Plac. The dose-placebo comparisons are performed with respect to three

ordered endpoints, Endpoint P (Hemoglobin A1c), Endpoint S1 (Fasting serum glucose)

and Endpoint S2 (HDL cholesterol). The sample size per arm is 87 patients.

Table 1. Test statistics and raw p-values in the Type II diabetes clinical trial example.

Family   Null hypothesis   Test statistic   P-value
F1       H1                2.81             0.005
F1       H2                2.56             0.011
F1       H3                2.39             0.018
F2       H4                2.61             0.009
F2       H5                2.24             0.026
F2       H6                2.50             0.013
F3       H7                2.60             0.010
F3       H8                2.78             0.006
F3       H9                1.96             0.051

The resulting nine hypotheses of no treatment effect (three dose-placebo comparisons

times three endpoints) are grouped into three families:

• Family F1: H-Plac (H1), M-Plac (H2) and L-Plac (H3) comparisons for Endpoint P.

• Family F2: H-Plac (H4), M-Plac (H5) and L-Plac (H6) comparisons for Endpoint S1.

• Family F3: H-Plac (H7), M-Plac (H8) and L-Plac (H9) comparisons for Endpoint S2.

The three doses are assumed to be equally important and thus the hypotheses are

equally weighted within each family, i.e., wi = 1/3, i = 1,...,9. The two-sample t

statistics and associated p-values for the nine hypotheses are listed in Table 1.

The null hypotheses in this clinical trial example will be tested using three multiple

testing procedures:

• Procedure 1 (Mixture of Bonferroni and Holm procedures with parallel gatekeeping

restrictions). The hypotheses in F1 and F2 are tested using the Bonferroni

procedure and the hypotheses in F3 are tested using the Holm procedure.

The mixture procedure is based on the Bonferroni mixing function with the

parallel gatekeeping restrictions defined in Section 6.1.


• Procedure 2 (Mixture of Bonferroni and Holm procedures with multiple-sequence

restrictions). This procedure is similar to Procedure 1 in the sense that it is also

a mixture of the Bonferroni procedures in F1 and F2 and Holm procedure in

F3 based on the Bonferroni mixing function. However, unlike Procedure 1, this

procedure uses a more general type of logical restrictions known as multiple-sequence

restrictions. A hypothesis in Fk, k = 2,3, is tested if higher-level

hypotheses associated with the same dose are rejected, e.g., H7 is testable iff

H1 and H4 are rejected. More formally,

– Li(I1) = 0 if I1 contains i − 3 and Li(I1) = 1 otherwise, i = 4, 5, 6.

– Li(I1 ∪ I2) = 0 if I1 ∪ I2 contains i − 3 or i − 6 and Li(I1 ∪ I2) = 1 otherwise,

i = 7, 8, 9.

• Procedure 3 (Mixture of Dunnett procedures with multiple-sequence restrictions).

This procedure is a mixture of the Dunnett procedures in F1, F2 and

F3 based on the Bonferroni mixing function and imposes the multiple-sequence

restrictions defined above.

Beginning with Procedures 1 and 2, adjusted p-values can be computed using the

algorithm given in Section 6.1. This algorithm is based on a complete enumeration

of all non-empty intersections of the original nine hypotheses. A p-value is computed

for each intersection and then the p-values for the original hypotheses are found using

the closure principle (see Section 2 for more details). As an illustration, consider the

intersection hypothesis corresponding to the index set I = {1,3,5, 6,7,8,9}, i.e.,

HI = H1 ∩ H3 ∩ H5 ∩ H6 ∩ H7 ∩ H8 ∩ H9.

Assuming parallel gatekeeping restrictions (Procedure 1), one first needs to define

p-values for HI1 , HI2 and HI3 , where I1 = {1,3}, I2 = {5, 6} and I3 = {7,8,9}.

Using the raw p-values displayed in Table 1, the p-values are computed based on

the Bonferroni and Holm procedures as shown below

p1(I1) = n1 min(p1,p3) = 0.015,

p2(I2) = n2 min(p5,p6) = 0.039,

p3(I3) = |I3| min(p7,p8,p9) = 0.018,

where n1 = 3, n2 = 3 and |I3| = 3. Using the Bonferroni mixing function, the p-value

for HI is given by

\[
p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I_2)}{b_2}, \frac{p_3(I_3)}{b_3} \right),
\]

where b1 = 1 and, to compute bk, k = 2,3, one needs to utilize the error rate function of the Bonferroni procedure. As shown in Section 6.1, ek(Ik) = α|Ik|/nk or, equivalently, fk(Ik) = |Ik|/nk, k = 1, 2, and thus

\[
b_2 = b_1 (1 - f_1(I_1)) = 1 - \frac{|I_1|}{n_1} = \frac{1}{3}, \qquad
b_3 = b_2 (1 - f_2(I_2)) = \frac{1}{3} \left( 1 - \frac{|I_2|}{n_2} \right) = \frac{1}{9}.
\]

This immediately implies that p(I) = min(0.015, 0.117, 0.162) = 0.015.

Table 2. Mixtures of three procedures with parallel gatekeeping restrictions (Procedure 1) and multiple-sequence restrictions (Procedure 2) in the Type II diabetes clinical trial example. The asterisk identifies the adjusted p-values that are significant at the 0.05 level.

Columns: Family; Null hypothesis; Adjusted p-value (Procedure 1, Procedure 2).

[Table body incomplete in the source; surviving entries in order: 0.015*, 0.033*, 0.054, 0.041*, 0.078, 0.054, 0.045*, 0.054, 0.078, 0.077.]
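The calculation for Procedure 1 can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' implementation: the nine raw p-values are assumed to be listed in Table 1 in the order H1,...,H9 (which agrees with the worked values above), the weights are equal within each family, and all function names are hypothetical. The script evaluates p(I) for the intersection discussed above and then enumerates all non-empty intersections to obtain adjusted p-values by the closure principle.

```python
from itertools import combinations

# Raw p-values, assuming the rows of Table 1 are listed in the order H1,...,H9.
RAW = {1: 0.005, 2: 0.011, 3: 0.018,
       4: 0.009, 5: 0.026, 6: 0.013,
       7: 0.010, 8: 0.006, 9: 0.051}
FAMILIES = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]


def component_p(k, idx):
    """p_k(I_k) with equal weights: Bonferroni in F1 and F2, Holm in F3."""
    is_last = (k == len(FAMILIES) - 1)
    multiplier = len(idx) if is_last else len(FAMILIES[k])
    return multiplier * min(RAW[i] for i in idx)


def mixture_p(index_set):
    """p(I) under parallel gatekeeping with the Bonferroni mixing function."""
    b = 1.0
    terms = []
    for k, family in enumerate(FAMILIES):
        idx = [i for i in family if i in index_set]
        if not idx:
            continue
        terms.append(component_p(k, idx) / b)
        b *= 1.0 - len(idx) / len(family)    # b_next = b * (1 - f_k(I_k))
        if b <= 0.0:                         # I_k = N_k: later families drop out
            break
    return min(terms)


def adjusted_p_values():
    """Closure principle: maximum of p(I) over all I containing each index."""
    hyps = sorted(RAW)
    adjusted = {i: 0.0 for i in hyps}
    for size in range(1, len(hyps) + 1):
        for subset in combinations(hyps, size):
            p = mixture_p(set(subset))
            for i in subset:
                adjusted[i] = max(adjusted[i], p)
    return adjusted


p_example = mixture_p({1, 3, 5, 6, 7, 8, 9})
adjusted = adjusted_p_values()
rejected = {i for i, p in adjusted.items() if p <= 0.05}
print(round(p_example, 4), sorted(rejected))
```

The set of hypotheses with adjusted p-values at or below 0.05 can be compared with the rejections reported for Procedure 1 in the text.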

Now consider the case of multiple-sequence restrictions (Procedure 2). The index

sets I2 and I3 need to be modified to account for the logical restrictions. Note that

H6 depends on H3, H7 depends on H1, H8 depends on H5, H9 depends on H3 and

H6. Thus the modified index sets are given by $I^*_2 = \{5\}$ and $I^*_3 = \emptyset$. The next step is to compute the p-values for $H_{I_1}$ and $H_{I^*_2}$:

\[
p_1(I_1) = n_1 \min(p_1, p_3) = 0.015, \qquad p_2(I^*_2) = n_2\, p_5 = 0.078.
\]

Lastly, the p-value for HI is given by

\[
p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I^*_2)}{b_2} \right),
\]

where bk, k = 1,2, are defined above, i.e., b1 = 1 and b2 = 1/3, and therefore p(I) = 0.015.

Table 2 displays the raw p-values for the nine hypotheses of interest along with

the adjusted p-values produced by the two procedures. Procedure 1 rejects three

hypotheses in this problem (H1, H2 and H4) and Procedure 2 rejects one more hypothesis

(H7). It is easy to verify that, as shown in Proposition 1, both procedures are

consistent with the logical restrictions (note that the Bonferroni procedures in F1 and

F2 are consonant and thus there is no need to enforce the logical restrictions).

Further, as stated in Proposition 2, Procedures 1 and 2 are equivalent to the Bonferroni

procedure in F1. Indeed, the adjusted p-values for the hypotheses in F1 are equal to

Bonferroni-adjusted p-values (each raw p-value is multiplied by 3).

Further, it is worth noting that Procedure 1 is based on parallel gatekeeping

restrictions and thus, as shown in Proposition 4, it has a stepwise representation.

This procedure is identical to a stepwise application of the Bonferroni procedures in

F1 and F2 and Holm procedure in F3 with appropriate adjustments of the significance

levels in F2 and F3. For more information, see Dmitrienko, Tamhane and Wiens (2008,

Section 6).

The calculation of adjusted p-values for Procedure 3 is based on an algorithm

similar to the one used in Section 6.1. The only change that needs to be made is that

the Bonferroni and Holm p-values for intersection hypotheses need to be replaced by

the Dunnett p-values defined in Section 6.2. To illustrate the process, select the same

intersection as above, i.e.,

HI = H1 ∩ H3 ∩ H5 ∩ H6 ∩ H7 ∩ H8 ∩ H9.

and consider the multiple-sequence restrictions. The modified index sets are $I^*_2 = \{5\}$ and $I^*_3 = \emptyset$. Given the sample size per arm (87 patients) and number of doses (3 doses), the Dunnett p-values for $H_{I_1}$ and $H_{I^*_2}$ are computed using the one-sided Dunnett distribution with 3 and 344 degrees of freedom. These p-values are given by

\[
p_1(I_1) = 1 - G_2(\max(t_1, t_3)) = 0.0073, \qquad p_2(I^*_2) = 1 - G_1(t_5) = 0.0336,
\]

where $G_n(x)$ is the cumulative distribution function of the Dunnett distribution defined in Section 6.2. Further, this mixture is also based on the Bonferroni mixing function and thus b1 = 1 and b2 = 1/3. Therefore,

\[
p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I^*_2)}{b_2} \right) = 0.0073.
\]

The adjusted p-values produced by Procedure 3 are shown in Table 3. One can see from this table that the mixture of Dunnett procedures rejects more hypotheses than a similar procedure based on the Bonferroni and Holm procedures (Procedure 2). Specifically, Procedure 3 rejects eight hypotheses whereas Procedure 2 rejects only four hypotheses. This is a direct consequence of the fact that the Dunnett procedure is uniformly more powerful than the Bonferroni procedure.

Table 3. Mixture of three procedures with multiple-sequence restrictions (Procedure 3) in the Type II diabetes clinical trial example. The asterisk identifies the adjusted p-values that are significant at the 0.05 level.

Family    Null hypothesis    Adjusted p-value

0.007∗

0.015∗

0.023∗

0.019∗

0.034∗

0.023∗

0.034∗

0.064
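The statement that the Dunnett procedure is uniformly more powerful than the Bonferroni procedure can be illustrated numerically. The sketch below is not the computation used in the paper: it estimates the one-sided Dunnett tail probability by Monte Carlo simulation, assuming a balanced one-way layout with three doses compared to a common control and 344 error degrees of freedom. The function name `dunnett_upper_tail_mc` and the test statistic value 1.8 are hypothetical.

```python
import math
import random


def dunnett_upper_tail_mc(t_obs, n_comparisons, df, n_sim=100_000, seed=2008):
    """Monte Carlo estimate of P(max_i T_i >= t_obs) under the global null.

    Assumes a balanced one-way layout: each T_i compares one dose with a shared
    control, so the T_i have pairwise correlation 1/2 and share one pooled
    standard deviation estimate with `df` degrees of freedom.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        control = rng.gauss(0.0, 1.0)                        # standardized control mean
        s = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)  # sqrt(chi-square(df) / df)
        best = max(rng.gauss(0.0, 1.0) for _ in range(n_comparisons)) - control
        if best / (math.sqrt(2.0) * s) >= t_obs:
            hits += 1
    return hits / n_sim


t_obs, m, df = 1.8, 3, 344                      # t_obs is a hypothetical statistic value
p_raw = dunnett_upper_tail_mc(t_obs, 1, df)     # one comparison: ordinary one-sided t-test
p_bonferroni = min(1.0, m * p_raw)              # Bonferroni adjustment for m comparisons
p_dunnett = dunnett_upper_tail_mc(t_obs, m, df)
print(p_raw, p_dunnett, p_bonferroni)
```

Up to simulation error, the Dunnett-adjusted p-value lies between the raw p-value and the Bonferroni-adjusted p-value, because the Dunnett distribution accounts for the positive correlation among the dose-control comparisons.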

It is worth noting that the mixture of Dunnett procedures defined above can serve

as a computationally attractive alternative to the Dunnett-based parallel gatekeeping

procedure with logical restrictions introduced in Dmitrienko, Offen, Wang and Xiao

(2006). The parallel gatekeeping procedure requires the computation of a vector

of critical values for each intersection hypothesis in the closed family based on the

multivariate distribution of the associated test statistics. Even in the case of nine

hypotheses, the algorithm is computationally intensive (it involves the evaluation

of multivariate probabilities for up to six dimensions). By contrast, the mixture

procedure is based on regular Dunnett-adjusted p-values that are combined across

the three families. This approach considerably simplifies the calculation of adjusted

p-values and leads to a relatively small reduction in the overall power compared to

the Dunnett-based parallel gatekeeping procedure.

References

Chen, X., Luo, X., Capizzi, T. (2005). The application of enhanced parallel gatekeeping strategies. Statistics in Medicine. 24, 1385–1397.

Dmitrienko, A., Offen, W.W., Westfall, P.H. (2003). Gatekeeping strategies for clinical trials that do not require all primary effects to be significant. Statistics in Medicine. 22, 2387–2400.

Dmitrienko, A., Offen, W., Wang, O., Xiao, D. (2006). Gatekeeping procedures in dose-response clinical trials based on the Dunnett test. Pharmaceutical Statistics. 5, 19–28.

Dmitrienko, A., Wiens, B.L., Tamhane, A.C., Wang, X. (2007). Tree-structured gatekeeping tests in clinical trials with hierarchically ordered multiple objectives. Statistics in Medicine. 26, 2465–2478.

Dmitrienko, A., Tamhane, A., Liu, L., Wiens, B. (2008). A note on tree gatekeeping procedures in clinical trials. Statistics in Medicine. 27, 3446–3451.

Dmitrienko, A., Tamhane, A., Wiens, B. (2008). General multistage gatekeeping procedures. Biometrical Journal. 50, 667–677.

Everitt, B.S., Hand, D.J. (1981). Finite Mixture Distributions. Chapman and Hall, London, New York.

Hochberg, Y., Tamhane, A.C. (1987). Multiple Comparison Procedures. New York: John Wiley and Sons.

Kordzakhia, G., Dinh, P., Bai, S., Lawrence, J., Yang, P. (2008). Bonferroni-based tree-structured gatekeeping testing procedures. Unpublished manuscript.

Marcus, R., Peritz, E., Gabriel, K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika. 63, 655–660.

Quan, H., Luo, X., Capizzi, T. (2005). Multiplicity adjustment for multiple endpoints in clinical trials with multiple doses of an active treatment. Statistics in Medicine. 24, 2151–2170.


Appendix

Proof of Proposition 1. Consider a hypothesis in Fk+1, k = 1,...,m − 1, say,

Hi, i ∈ Nk+1, and assume that Li(A1 ∪ ... ∪ Ak) = 0. Let Is = As, s = 1,...,k,

Ik+1 = {i}. Further, let J = I1 ∪ ... ∪ Ik and I = I1 ∪ ... ∪ Ik+1. Considering the

intersection hypothesis HI, note that Li(I1 ∪ ... ∪ Ik) = 0 and thus

$$I_{k+1}^* = \{i : i \in I_{k+1} \text{ and } L_i(I_1 \cup \dots \cup I_k) = 1\}$$

is empty. Therefore,

$$p(I) = m_J(p_1(I_1), p_2(I_2^*), \dots, p_k(I_k^*)).$$

Note that the mixture procedure T accepts all hypotheses Hj, j ∈ J, and T is consonant in F1,...,Fk. Therefore, p(J) > α (if p(J) were less than or equal to α, then T would reject at least one hypothesis Hj with j ∈ J; however, all hypotheses Hj, j ∈ J, are accepted, which implies that p(J) > α). Further, p(I) = p(J) > α and the index set I contains i. Thus, T accepts Hi. The proof is complete.

Proof of Proposition 2. Assume first that T1 rejects Hi, i ∈ N1. This means that

p1(I1) ≤ α for any I1 ⊆ N1 if i ∈ I1. Now consider any index set I ⊆ N that contains

i. In general, $I = I_{i_1} \cup \dots \cup I_{i_s}$, where $I_{i_r} \subseteq N_{i_r}$, $r = 1,\dots,s$, $i_1 = 1$ and $s = 1,\dots,m$, and

$$p(I) = m_I(p_{i_1}(I_{i_1}), \dots, p_{i_s}(I_{i_s})).$$

By the definition of a mixture function, $m_I(x_{i_1}, \dots, x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$ and, since $x_{i_1} = p_1(I_1) \le \alpha$, we conclude that p(I) is no greater than α and thus T rejects Hi.

Now assume that T rejects Hi, i ∈ N1. In this case, p(I) ≤ α for any I ⊆ N

that contains i, which immediately implies that p1(I1) ≤ α for any I1 ⊆ N1 if i ∈ I1.

Therefore, T1 rejects Hi. The proof is complete.

Proof of Proposition 3. Assume that all hypotheses in F1,...,Fk−1 are rejected by

T and Tk rejects Hi, i ∈ Nk, i.e., pk(Ik) ≤ α for any Ik ⊆ Nk if i ∈ Ik. Consider any

index set I ⊆ N that contains i. If this set includes any indices from N1 ∪...∪Nk−1,

p(I) is no greater than α since T rejects all hypotheses in F1,...,Fk−1. If this set

does not include any indices from N1 ∪ ... ∪ Nk−1, the p-value for HI is given by

$$p(I) = m_I(p_{i_1}(I_{i_1}), \dots, p_{i_s}(I_{i_s})),$$

where $I = I_{i_1} \cup \dots \cup I_{i_s}$, $I_{i_r} \subseteq N_{i_r}$, $r = 1,\dots,s$, $i_1 = k$ and $s = 1,\dots,m-k+1$. As in Proposition 2, recall that $m_I(x_{i_1}, \dots, x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$; here $x_{i_1} = p_k(I_k) \le \alpha$. Thus p(I) ≤ α, which implies that T rejects Hi.

On the other hand, if T rejects Hi, i ∈ Nk, the arguments used in the proof of

Proposition 2 can be applied to show that Tk rejects Hi. The proof is complete.


Proof of Proposition 4. The first statement follows from Proposition 2. Consider

the second statement and assume that T1 rejects Hk, k ∈ R1, at the α level and T2

rejects Hj, j ∈ N2, at the level α2 = α − e1(A1). Here R1 ⊆ N1 is the index set of

hypotheses rejected in F1. Considering any I ⊆ N with j ∈ I, let I1 = I ∩ N1 and

I2 = I ∩ N2. If I1 ∩ R1 ≠ ∅, then p1(I1) ≤ α and thus

$$p(I) = \min\left(p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)}\right) \le p_1(I_1) \le \alpha.$$

Further, if I1 ∩ R1 = ∅, then I1 ⊆ A1 and, by the monotonicity of the error rate

function, f1(I1) ≤ f1(A1). Since T2 rejects Hj at the level α2 = α − e1(A1),

$$p_2(I_2) \le \alpha - e_1(A_1) = \alpha(1 - f_1(A_1)) \le \alpha(1 - f_1(I_1))$$

and

$$p(I) = \min\left(p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)}\right) \le \frac{p_2(I_2)}{1 - f_1(I_1)} \le \alpha.$$

This means that T rejects Hj at the α level.

Assume now that T rejects Hk, k ∈ R1, and Hj, j ∈ N2, at the α level. Consider

any I2 ⊆ N2 such that j ∈ I2 and let I = I1 ∪ I2, where I1 = A1. Recall that T is equivalent to T1 in F1 and thus T1 also rejects Hk, k ∈ R1, and accepts Hk, k ∈ A1. Since T1 is consonant and accepts every hypothesis with an index in I1 = A1, we conclude that p1(I1) > α. On the other hand, T rejects Hj and thus

$$p(I) = \min\left(p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)}\right) \le \alpha.$$

This implies that

$$p_2(I_2) \le \alpha(1 - f_1(I_1)) = \alpha - e_1(I_1)$$

if I2 ⊆ N2 and j ∈ I2. Therefore, T2 rejects Hj, j ∈ N2, at the level α2 = α − e1(A1).

The proof is complete.
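The equivalence established in this proof can be checked by brute force in a small special case. The Python sketch below assumes a Bonferroni procedure in F1 and a Holm procedure in F2 without logical restrictions, using the component p-values and the error rate function e1(I1) = |I1|α/n1 quoted in the proof of Proposition 5. It compares the closed mixture procedure with the two-step procedure that tests F2 at the level α2 = α − e1(A1). All function names and the raw p-values are hypothetical.

```python
from itertools import combinations


def subsets(indices):
    """All subsets of `indices`, including the empty set."""
    items = sorted(indices)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def mixture_pvalue(I1, I2, p, n1):
    """p(I) = min(p1(I1), p2(I2) / (1 - f1(I1))) for a non-empty I = I1 | I2."""
    p1 = n1 * min(p[i] for i in I1) if I1 else None        # Bonferroni test in F1
    p2 = len(I2) * min(p[i] for i in I2) if I2 else None   # local test behind Holm in F2
    if p1 is None:
        return p2
    f1 = len(I1) / n1                                      # f1(I1) = e1(I1) / alpha
    if p2 is None or f1 == 1.0:
        return p1                                          # nothing is carried over to F2
    return min(p1, p2 / (1.0 - f1))


def closed_mixture_rejections(p, N1, N2, alpha):
    """Reject H_i iff p(I) <= alpha for every index set I that contains i."""
    rejected = set()
    for i in sorted(N1 | N2):
        relevant = [(I1, I2) for I1 in subsets(N1) for I2 in subsets(N2)
                    if i in I1 or i in I2]
        if all(mixture_pvalue(I1, I2, p, len(N1)) <= alpha for I1, I2 in relevant):
            rejected.add(i)
    return rejected


def stepwise_rejections(p, N1, N2, alpha):
    """Bonferroni in F1 at alpha, then Holm in F2 at alpha2 = alpha - e1(A1)."""
    n1 = len(N1)
    R1 = {i for i in N1 if n1 * p[i] <= alpha}
    A1 = N1 - R1
    alpha2 = alpha - len(A1) * alpha / n1                  # e1(A1) = |A1| * alpha / n1
    R2 = set()
    for step, i in enumerate(sorted(N2, key=lambda j: p[j])):
        if p[i] > alpha2 / (len(N2) - step):
            break                                          # Holm stops at the first failure
        R2.add(i)
    return R1 | R2


# Hypothetical raw p-values: H1, H2 form F1 and H3, H4, H5 form F2.
p = {1: 0.010, 2: 0.040, 3: 0.005, 4: 0.012, 5: 0.300}
N1, N2 = {1, 2}, {3, 4, 5}
print(closed_mixture_rejections(p, N1, N2, 0.05))
print(stepwise_rejections(p, N1, N2, 0.05))
```

In this example H1 is rejected and H2 is accepted in F1, so α2 = 0.025, and both procedures reject H1, H3 and H4. The closed version enumerates every intersection hypothesis, whereas the stepwise version needs only one pass through each family.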

Proof of Proposition 5. Note first that the p-values for intersection hypotheses in

H1 and H2 are given by

$$p_1(I_1) = 2 \min_{i \in I_1} p_i, \quad I_1 \subseteq N_1,$$

$$p_2(I_2) = |I_2| \min_{i \in I_2} p_i, \quad I_2 \subseteq N_2,$$

where pi is the raw p-value for testing Hi, i ∈ N. Also, the error rate function for

the Bonferroni procedure is e1(I1) = |I1|α/2, where |I1| is the cardinality of the set

I1 (Dmitrienko, Tamhane and Wiens, 2008). Therefore, the p-values for intersection

hypotheses in H are given by


Case 1. If $I_1 = \{1, 2\}$,

$$p(I) = 2 \min_{i \in I_1} p_i.$$

Case 2. If $I_1 = \{1\}$,

$$p(I) = 2 \min\left(p_1, |I_2| \min_{i \in I_2} p_i\right).$$

Case 3. If $I_1 = \{2\}$,

$$p(I) = 2 \min(p_2, p_n).$$

Case 4. If $I_1 = \emptyset$,

$$p(I) = |I_2| \min_{i \in I_2} p_i.$$

Recall now that H1 is rejected and H2 is accepted. This means that all intersection

hypotheses in H that include H1 are rejected. Therefore, to determine the conditions

under which the mixture procedure rejects Hn, it is sufficient to concentrate on the

intersection hypotheses that include Hn but exclude H1. It follows from Cases 3 and

4 that Hn is rejected iff pn ≤ α/2 (note that p2 > α/2 since H2 is accepted) and the

k smallest p-values are significant at the Holm-adjusted significance levels in F2, i.e.,

q(i) ≤ α/(n − i − 1), i = 1,...,k. The proof is complete.
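The Holm-adjusted levels quoted in this proof follow from the usual Holm levels once the family size is substituted. Assuming, as the index sets in Cases 1 to 4 suggest, that F1 contains H1 and H2 and that F2 contains the remaining $n_2 = n - 2$ hypotheses, the i-th ordered p-value in F2 is compared with

$$\frac{\alpha}{n_2 - i + 1} = \frac{\alpha}{(n - 2) - i + 1} = \frac{\alpha}{n - i - 1}, \quad i = 1, \dots, n_2.$$

For instance, with n = 5 the three ordered p-values in F2 would be compared with α/3, α/2 and α, respectively.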