Department of Industrial Engineering and Management Sciences
Northwestern University, Evanston, Illinois 60208-3119, U.S.A.
Working Paper No. 08-04
Mixtures of Multiple Testing Procedures with Gatekeeping Applications
Alex Dmitrienko
Eli Lilly and Company
Ajit C. Tamhane
Northwestern University
Lingyun Liu
Northwestern University
December 2008
Abstract
This paper introduces a general framework for constructing gatekeeping procedures
for multiple testing problems arising in clinical trials with hierarchical objectives.
These problems frequently exhibit a complex structure, including multiple families of
hypotheses and logical restrictions. The framework is based on combining multiple
tests across families and enables clinical trial sponsors to set up powerful and flexible
multiple testing procedures (e.g., gatekeeping procedures based on Dunnett tests
that account for logical restrictions among the hypotheses of interest). A clinical trial
example is used to illustrate the general approach.
Keywords and Phrases: Multiple comparisons; Closure principle; Gatekeeping proce-
dures; Bonferroni test; Dunnett test.
1. Introduction
Gatekeeping procedures are commonly used in multiple testing problems with a
hierarchical structure, including problems arising in clinical trials with multiple ob-
jectives. These objectives may represent primary endpoints, secondary endpoints and
subgroup analyses, etc. To account for the hierarchical structure of these objectives,
null hypotheses associated with the objectives are grouped into families. Consider,
for example, a multiple testing problem involving n null hypotheses H1,...,Hn that
are grouped into m families:
Fk = {Hi, i ∈ Nk}, k = 1,...,m, m ≥ 2,
where N1 = {1,...,n1}, Nk = {n1 + ... + nk−1 + 1,...,n1 + ... + nk}, k = 2,...,m,
and n1 + ... + nm = n.
Dmitrienko, Tamhane and Wiens (2008) introduced a framework for constructing
multistage parallel gatekeeping procedures. A parallel gatekeeping procedure tests
hypotheses in Family Fk, k = 2,...,m, only if one or more hypotheses are rejected in
Fk−1. Dmitrienko, Tamhane and Wiens proposed a general algorithm for setting up
parallel gatekeeping procedures with an attractive stepwise form based on tests from
a broad class of multiple testing procedures (known as separable procedures).
One of the limitations of the framework proposed by these authors is that it
cannot be used in problems with logical restrictions, i.e., when the acceptance or
rejection of hypotheses in Fk, k = 2,...,m, depends on the outcomes of signifi-
cance tests in F1,...,Fk−1. Multiple testing problems with logical restrictions are
frequently encountered in clinical trials. Examples are given in Chen, Luo and
Capizzi (2005), Quan, Luo and Capizzi (2005), Dmitrienko, Offen, Wang and Xiao
(2006), Dmitrienko, Wiens, Tamhane and Wang (2007), and Dmitrienko, Tamhane, Liu
and Wiens (2008).
This paper describes a framework that enables clinical trial sponsors to set up
flexible multiple testing procedures for problems with a very general class of logical
restrictions (monotone logical restrictions). The framework is based on combining
multiple tests across families using the concept of a mixture of multiple testing pro-
cedures. This term is used here to make an analogy with mixtures of distributions
(Everitt and Hand, 1981). To specify a mixture distribution, one needs to specify
component distributions and a mixing distribution. Similarly, in the case of mix-
ture procedures, one needs to select component procedures and a mixing function.
The mixing function is selected to take into account the logical relationships among
multiple families and provide strong control of the familywise error rate (FWER)
(Hochberg and Tamhane, 1987). The mixture-based framework uses the closure prin-
ciple (Marcus, Peritz and Gabriel, 1976) to achieve FWER control.
The paper is organized as follows. Section 2 introduces the mixture-based frame-
work for an arbitrary number of families. Section 3 defines a class of monotone logical
restrictions. Section 4 describes mixing functions that can be used to construct mix-
tures of multiple testing procedures. Properties of mixture procedures are described
in Section 5. Lastly, Section 6 gives examples of mixture procedures (including mix-
tures of Bonferroni and Dunnett procedures) and a clinical trial example to illustrate
the mixture-based framework.
2. Mixture procedures
Consider the multiple testing problem defined in the Introduction. Let Hk denote
the closed family associated with Fk, i.e.,
Hk = {HIk , Ik ⊆ Nk}, where HIk = ∩i∈Ik Hi.
Further, consider multiple testing procedures, known as component procedures, T1,...,Tm.
The procedure Tk, k = 1,...,m, is assumed to be a closed testing procedure that
controls the FWER in the strong sense within Fk. This means that there exists a
set of α-level tests for each intersection hypothesis in Hk such that Tk rejects Hi,
i ∈ Nk, if and only if (iff) all intersection hypotheses including Hi are rejected by the
intersection hypothesis tests. For example, if Tk is the Holm procedure, each inter-
section hypothesis is tested using the Bonferroni test at α. Let pk(Ik), Ik ⊆ Nk, be
the p-value for the intersection hypothesis test associated with HIk . The intersection
hypothesis HIk is rejected iff pk(Ik) ≤ α.
A mixture of the component procedures, denoted by T, is a procedure for testing
all hypotheses in F = F1 ∪ ... ∪ Fm. Let N = {1,...,n} and let H denote the closed
family associated with F, i.e., H = {HI,I ⊆ N}. For each index set I ⊆ N, let
Ik = I ∩ Nk, k = 1,...,m. To define a mixture procedure, one needs to define α-
level tests for all intersection hypotheses in H. Consider any non-empty intersection
hypothesis HI, I ⊆ N. The test for this intersection hypothesis is defined as follows:
Case 1. HI contains hypotheses only from Fk, k = 1,...,m, i.e., I = Ik. The p-value
for HI is given by p(I) = pk(Ik).
Case 2. HI contains hypotheses from Fi1 ,...,Fis for s ≥ 2, i.e., I = Ii1 ∪ ... ∪ Iis .
The p-value for HI is given by
p(I) = mI(pi1 (Ii1 ),...,pis (Iis )),
where mI(xi1 ,...,xis ) is a mixing function.
Mixing functions have the following properties:
• 0 ≤ mI(xi1 ,...,xis ) ≤ 1, 0 ≤ xik ≤ 1, k = 1,...,s.
• mI(xi1 ,...,xis ) ≤ α if xi1 ≤ α.
• The test for HI is an α-level test, i.e., P(p(I) ≤ α) ≤ α.
Examples of mixing functions are given in Section 4.
Given the p-values for each intersection in the closed family, the p-value for a
hypothesis in F is computed using the closure principle. For the hypothesis Hi,
i ∈ N, the adjusted p-value is defined as the maximum over the p-values for the
intersections containing this hypothesis, i.e.,
\[ \tilde{p}_i = \max_{I:\, i \in I} p(I). \]
Since the mixture procedure is constructed using α-level tests for all intersection
hypotheses in H, the procedure controls the FWER in the strong sense at an α level.
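The closure-based calculation of adjusted p-values described above can be sketched in a few lines. This is only an illustration: the function names (closure_adjusted_p, bonferroni_intersection) and the three raw p-values are assumptions made for the example and do not come from the paper. The intersection tests here are unweighted Bonferroni tests within a single family, so the output is the set of Holm-adjusted p-values.

```python
from fractions import Fraction
from itertools import combinations


def closure_adjusted_p(n, intersection_p):
    """Adjusted p-value of H_i = max of p(I) over all index sets I containing i."""
    adjusted = [Fraction(0)] * n
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            p_I = intersection_p(I)
            for i in I:
                adjusted[i] = max(adjusted[i], p_I)
    return adjusted


def bonferroni_intersection(raw):
    """Return a function computing the Bonferroni p-value |I| * min(p_i), capped at 1."""
    def p(I):
        return min(Fraction(1), len(I) * min(raw[i] for i in I))
    return p


# Hypothetical raw p-values, stored exactly to avoid floating-point noise.
raw = [Fraction("0.01"), Fraction("0.04"), Fraction("0.03")]
holm_adjusted = closure_adjusted_p(len(raw), bonferroni_intersection(raw))
print([float(p) for p in holm_adjusted])
```

The enumeration visits all 2^n − 1 non-empty intersections, which is practical only for a small number of hypotheses such as the nine-hypothesis example in Section 6.3.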
3. Logical restrictions
Mixtures of multiple testing procedures are constructed to account for logical
relationships among the hypotheses in F1,...,Fm. Dmitrienko, Wiens, Tamhane and
Wang (2007) and Dmitrienko, Tamhane, Liu and Wiens (2008) proposed to formulate
logical relationships in terms of serial and parallel gatekeeping sets. In this case a
hypothesis in Fk+1, k = 1,...,m−1, is tested iff all hypotheses are rejected in a certain
subset of F1,...,Fk (known as the serial gatekeeping set) and at least one hypothesis
is rejected in another subset of F1,...,Fk (known as the parallel gatekeeping set).
A more general family of monotone logical restrictions is introduced below. The
restrictions are defined using restriction functions. Consider a hypothesis in Fs+1,
s = 1,...,m−1, say, Hi, i ∈ Ns+1. The restriction function Li(I), I ⊆ N1 ∪...∪Ns,
assumes two values, Li(I) = 0 or 1. Here Li(I) = 0 means that Hi is not testable, i.e.,
it is accepted without test if the hypotheses Hj, j ∈ I, are accepted and Li(I)=1
means that Hi is testable. The function Li(I) meets the following conditions:
• Monotonicity condition: If Li(I) = 0 and I ⊆ I′ then Li(I′) = 0.
• Parallel gatekeeping condition: If Nk ⊆ I then Li(I) = 0 for all i ∈ Ns for
s = k + 1,...,m.
Note that, by the monotonicity condition, if a hypothesis in Fs+1 is not testable
given a set of accepted hypotheses in F1,...,Fs, it will remain non-testable if more
hypotheses are accepted in F1,...,Fs. Further, it follows from the parallel gatekeeping
condition that all hypotheses are non-testable (and are automatically accepted) in
Fs+1 if all hypotheses are accepted in Fk, k = 1,...,s.
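As a rough illustration, the two conditions can be checked by brute force for a small problem. The sketch below assumes three families of three hypotheses each and encodes two restriction functions: the parallel gatekeeping restrictions and the multiple-sequence restrictions used later in Section 6.3. The helper names (subsets, is_monotone, satisfies_parallel_condition) are hypothetical and chosen only for this example.

```python
from itertools import combinations

FAMILIES = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # assumed layout: three families of three


def parallel(i, I):
    """L_i(I) = 0 iff I contains every hypothesis of some earlier family."""
    return 0 if any(set(fam) <= I for fam in FAMILIES) else 1


def multiple_sequence(i, I):
    """L_i(I) = 0 iff I contains i - 3 or i - 6 (same dose, higher-level endpoint)."""
    return 0 if (i - 3) in I or (i - 6) in I else 1


def subsets(items):
    """Yield all subsets of items as frozensets."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_monotone(L, families):
    """Check: L_i(I) = 0 and I a subset of J imply L_i(J) = 0."""
    for s in range(1, len(families)):
        earlier = [j for fam in families[:s] for j in fam]
        candidates = list(subsets(earlier))
        for i in families[s]:
            for I in candidates:
                if L(i, I) == 0 and any(L(i, J) == 1 for J in candidates if I <= J):
                    return False
    return True


def satisfies_parallel_condition(L, families):
    """Check: if some earlier family N_k is contained in I, then L_i(I) = 0."""
    for s in range(1, len(families)):
        earlier = [j for fam in families[:s] for j in fam]
        for I in subsets(earlier):
            if any(set(fam) <= I for fam in families[:s]):
                if any(L(i, I) == 1 for i in families[s]):
                    return False
    return True


for name, L in [("parallel", parallel), ("multiple-sequence", multiple_sequence)]:
    print(name, is_monotone(L, FAMILIES), satisfies_parallel_condition(L, FAMILIES))
```

Both restriction functions pass the two checks in this setting; a function that, say, blocks a hypothesis only when exactly one earlier hypothesis is accepted would fail the monotonicity check.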
In order to account for logical restrictions, the definition of a mixture of
multiple testing procedures given in Section 2 needs to be modified as follows:
Case 1. HI contains hypotheses only from Fk, k = 1,...,m, i.e., I = Ik. The p-value
for HI is given by p(I) = pk(Ik).
Case 2. HI contains hypotheses from Fi1 ,...,Fis for s ≥ 2, i.e., I = Ii1 ∪ ... ∪ Iis .
For any k = 2,...,s, let $I^*_{i_k}$ be the subset of $I_{i_k}$ that includes the indices of
hypotheses that are logically consistent, i.e., testable, with the hypotheses from
$F_{i_1},\ldots,F_{i_{k-1}}$. In other words,
\[ I^*_{i_k} = \{ i : i \in I_{i_k} \text{ and } L_i(I_{i_1} \cup \ldots \cup I_{i_{k-1}}) = 1 \}. \]
Assume first that $I^*_{i_s}$ is not empty. In this case the p-value for $H_I$ is given by
\[ p(I) = m_I\bigl( p_{i_1}(I_{i_1}), p_{i_2}(I^*_{i_2}), \ldots, p_{i_s}(I^*_{i_s}) \bigr), \]
where $p_{i_k}(I^*_{i_k}) = 1$ if $I^*_{i_k}$ is empty, k = 2,...,s − 1. Further, if
$I^*_{i_{r+1}},\ldots,I^*_{i_s}$ are empty for some r = 1,...,s − 1 then
\[ p(I) = m_J\bigl( p_{i_1}(I_{i_1}), p_{i_2}(I^*_{i_2}), \ldots, p_{i_r}(I^*_{i_r}) \bigr), \]
where $J = I_{i_1} \cup \ldots \cup I_{i_r}$ and $p_{i_k}(I^*_{i_k}) = 1$ if $I^*_{i_k}$ is empty.
4. Mixing functions
This section defines mixing functions based on the Bonferroni and Dunnett global
tests. Both these mixing functions satisfy the properties listed in Section 2 and have
the same general form:
\[ m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{c_{i_1}}, \ldots, \frac{x_{i_s}}{c_{i_s}} \right), \]
where I = Ii1 ∪ ... ∪ Iis as before and ci1 ,...,cis is a non-increasing sequence of co-
efficients with 1 = ci1 ≥ ... ≥ cis ≥ 0. This sequence is non-increasing to account for
the hierarchical structure of the problem, i.e., families placed earlier in the sequence
are more important (and receive greater weights) than those later in the sequence.
The Bonferroni and Dunnett mixing functions differ in terms of the choice of these
coefficients. For the Bonferroni mixing function, the coefficients are denoted by b’s
and for the Dunnett mixing function by d’s.
4.1 Bonferroni mixing function
To define this function, consider the error rate function of the procedure Tk, k =
1,...,m − 1, introduced in Dmitrienko, Tamhane and Wiens (2008). Since an exact
expression for the error rate function is, in general, difficult to derive, we will focus
on an upper bound, ek(Ik), for the true error rate function, i.e.,
P(pk(Ik) ≤ α) ≤ ek(Ik)
for fixed α. As in Dmitrienko, Tamhane and Wiens (2008), we will treat ek(Ik) as
the actual error rate function. Error rate functions have the following properties:
ek(∅) = 0, ek(I) ≤ ek(I′) if I ⊆ I′, ek(Nk) = α.
Also, let fk(Ik) = ek(Ik)/α.
Assume that T1,...,Tm−1 are separable, i.e., fk(Ik) < 1 for all α if Ik is a proper
subset of Nk, k = 1,...,m − 1. The Bonferroni mixing function is given by
\[ m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{b_{i_1}}, \ldots, \frac{x_{i_s}}{b_{i_s}} \right), \]
where bi1 = 1 and bik = bik−1 (1 − fik−1 (Iik−1 )), k = 2,...,s. It is clear that
0 ≤ mI(xi1 ,...,xis ) ≤ 1 if 0 ≤ xik ≤ 1
and
mI(xi1 ,...,xis ) ≤ α if xi1 ≤ α.
Since T1,...,Tm−1 are separable, bik > 0 if Iir−1 is a proper subset of Nir−1 for all
r = 2,...,k. On the other hand, bik = ... = bis = 0 if Iik−1 = Nik−1 and thus
\[ m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{b_{i_1}}, \ldots, \frac{x_{i_{k-1}}}{b_{i_{k-1}}} \right). \]
It is easy to verify that the resulting test for HI is an α-level test. By the Bonferroni
inequality,
\[ P(p(I) \le \alpha) \le \sum_{k=1}^{s} P\bigl( p_{i_k}(I_{i_k}) \le \alpha b_{i_k} \bigr) \le \sum_{k=1}^{s-1} \alpha b_{i_k} f_{i_k}(I_{i_k}) + \alpha b_{i_s} \]
since P(pik (Iik ) ≤ x) ≤ xfik (Iik ), k = 1,...,s − 1, and P(pis (Iis ) ≤ x) ≤ x. Further,
it is easy to see that bis−1 fis−1 (Iis−1 ) + bis = bis−1 since bis = bis−1 (1 − fis−1 (Iis−1 )).
Doing this recursively, we have
\[ \sum_{k=1}^{s-1} b_{i_k} f_{i_k}(I_{i_k}) + b_{i_s} = b_{i_1} = 1 \]
and thus P(p(I) ≤ α) ≤ α.
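The recursion for the Bonferroni coefficients and the telescoping identity used in this argument can be checked numerically. The following sketch is illustrative only; the function names are hypothetical and the values of f are arbitrary choices (the first set corresponds to two Bonferroni families in which two of three hypotheses are included in the intersection).

```python
from fractions import Fraction


def bonferroni_coefficients(f_values):
    """Return [b_1, ..., b_s] with b_1 = 1 and b_k = b_{k-1} * (1 - f_{k-1}).

    f_values holds f for the first s - 1 families in the intersection.
    """
    b = [Fraction(1)]
    for f in f_values:
        b.append(b[-1] * (1 - f))
    return b


def error_bound_factor(f_values):
    """Compute sum_{k<s} b_k f_k + b_s, which should telescope to b_1 = 1."""
    b = bonferroni_coefficients(f_values)
    return sum(b_k * f_k for b_k, f_k in zip(b, f_values)) + b[-1]


f = [Fraction(2, 3), Fraction(2, 3)]
print(bonferroni_coefficients(f))   # coefficients b_1, b_2, b_3
print(error_bound_factor(f))        # telescoping sum
```

Multiplying the telescoping sum by α gives the upper bound on P(p(I) ≤ α), so the identity is what guarantees an α-level test for any admissible choice of the f values.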
4.2 Dunnett mixing function
The Bonferroni mixing function defined above is based on the Bonferroni inequal-
ity and thus does not account for the correlation among pi1 (Ii1 ),...,pis (Iis ). By
contrast, the Dunnett mixing function explicitly utilizes the joint distribution of the
p-values.
Assume again that Tk is separable, k = 1,...,m−1. The Dunnett mixing function
is given by
\[ m_I(x_{i_1},\ldots,x_{i_s}) = \min\left( \frac{x_{i_1}}{d_{i_1}}, \ldots, \frac{x_{i_s}}{d_{i_s}} \right), \]
where di1 = 1 and dik , k = 2,...,s, are defined sequentially as follows:
P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ni2 ) ≤ αdi2 ) = α,
P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ii2 ) ≤ αdi2 or pi3 (Ni3 ) ≤ αdi3 ) = α,
...
P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ii2 ) ≤ αdi2 or ... or
pis−2 (Iis−2 ) ≤ αdis−2 or pis−1 (Nis−1 ) ≤ αdis−1 ) = α,
P(pi1 (Ii1 ) ≤ αdi1 or pi2 (Ii2 ) ≤ αdi2 or ... or
pis−1 (Iis−1 ) ≤ αdis−1 or pis (Iis ) ≤ αdis ) = α.
It follows from the equations that dik > 0 if Iir−1 is a proper subset of Nir−1, r =
2,...,k, and dik = ... = dis = 0 if Iik−1 = Nik−1.
As in Section 4.1, it is easy to see that 0 ≤ mI(xi1 ,...,xis ) ≤ 1 if 0 ≤ xik ≤ 1 and
mI(xi1 ,...,xis ) ≤ α if xi1 ≤ α. Further, by the definition of dik , k = 1,...,s,
P(p(I) ≤ α) = P(pi1 (Ii1 ) ≤ αdi1 or ... or pis (Iis ) ≤ αdis )
= α
and thus the resulting test for HI is an α-level test.
Since the Dunnett mixing function takes into account the joint distribution of test
statistics, mixture procedures based on this function are more powerful than those
based on the Bonferroni mixing function.
5. Properties of mixture procedures
This section summarizes key properties of mixture procedures.
5.1 General properties
We will begin with a discussion of general properties, including consistency with
logical restrictions and independence (inferences in F1 are independent of inferences
in F2).
Proposition 1 If T is consonant in F1,...,Fk, k = 1,...,m − 1, then
the mixture procedure T is consistent with the logical restrictions in Fk+1. In other
words, T accepts Hi, i ∈ Nk+1, at the α level if Li(A1 ∪ ... ∪ Ak)=0, where Ar is
the index set of accepted hypotheses in Fr, r = 1,...,k.
Note that, if T is not consonant in F1,...,Fk, the logical restrictions may be
violated in Fk+1 in the sense that Hi, i ∈ Nk+1, may be rejected even though Li(A1 ∪
... ∪ Ak) = 0. However, the logical restrictions can always be enforced by modifying
multiplicity-adjusted p-values in Fk+1. This can be done using an algorithm similar
to that proposed in Kordzakhia et al. (2008).
Proposition 2 The mixture procedure T is equivalent to the procedure T1 within the
first family. In other words, T rejects a hypothesis in F1 at the α level iff T1 rejects
this hypothesis at the α level.
Proposition 3 The mixture procedure T is equivalent to the procedure Tk, k =
2,...,m, within Fk if T rejects all hypotheses in F1,...,Fk−1. In other words, T
rejects a hypothesis in Fk at the α level iff Tk rejects this hypothesis at the α level
provided all hypotheses in F1,...,Fk−1 are rejected by T.
The proofs of Propositions 1, 2 and 3 are given in the Appendix.
5.2 Stepwise mixture procedures with parallel gatekeeping restrictions
When parallel gatekeeping restrictions are considered, mixture procedures based
on the Bonferroni mixing function admit a stepwise representation. This means that
the mixture procedure is, in fact, identical to a stepwise application of the com-
ponent procedures with an adjustment of the significance level in the last m − 1
families. This result is equivalent to the main result in Dmitrienko, Tamhane and
Wiens (2008) and shows that the mixture framework is an extension of the framework
of multistage gatekeeping procedures introduced in that paper. In particular, mul-
tistage gatekeeping procedures considered by Dmitrienko, Tamhane and Wiens are
mixtures of component procedures used at individual stages based on the Bonferroni
mixing function.
To demonstrate that mixture procedures based on the Bonferroni mixing func-
tion are equivalent to multistage gatekeeping procedures proposed by Dmitrienko,
Tamhane and Wiens, we will consider a two-family problem. The proof can be ex-
tended to the general case of m families by recursion.
Proposition 4 Assume that
• Only parallel gatekeeping restrictions are imposed, i.e., Li(N1)=0, i ∈ N2, and
Li(I1)=1, i ∈ N2, I1 ⊂ N1.
• The procedure T1 is separable and consonant.
• The Bonferroni mixing function is used.
The mixture procedure T has the following two-stage structure:
• The hypotheses in F1 are tested at the familywise level α1 = α using T1.
• The hypotheses in F2 are tested at the level α2 = α−e1(A1) using T2, where e1(I)
is the error rate function of T1 and A1 is the index set of accepted hypotheses
in F1.
The proof of Proposition 4 is given in the Appendix.
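The two-stage structure in Proposition 4 can be sketched as follows for an unweighted Bonferroni procedure in F1 and an unweighted Holm procedure in F2. The code assumes the Bonferroni error rate function e1(I) = α|I|/n1; the function names are hypothetical, and the p-values are used purely as an illustrative two-family data set.

```python
from fractions import Fraction


def bonferroni_stage(pvals, alpha):
    """Indices rejected by the unweighted Bonferroni procedure at level alpha."""
    n = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / n}


def holm_stage(pvals, alpha):
    """Indices rejected by the unweighted Holm step-down procedure at level alpha."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    rejected = set()
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (len(pvals) - step):
            rejected.add(i)
        else:
            break
    return rejected


def two_stage(p_family1, p_family2, alpha):
    """Stage 1: Bonferroni in F1 at alpha. Stage 2: Holm in F2 at alpha - e1(A1)."""
    rejected1 = bonferroni_stage(p_family1, alpha)
    n_accepted = len(p_family1) - len(rejected1)
    # Assumed error rate function of the Bonferroni procedure: e1(A1) = alpha * |A1| / n1.
    alpha2 = alpha - alpha * Fraction(n_accepted, len(p_family1))
    rejected2 = holm_stage(p_family2, alpha2) if alpha2 > 0 else set()
    return rejected1, alpha2, rejected2


alpha = Fraction(1, 20)
p1 = [Fraction("0.005"), Fraction("0.011"), Fraction("0.018")]
p2 = [Fraction("0.009"), Fraction("0.026"), Fraction("0.013")]
rejected1, alpha2, rejected2 = two_stage(p1, p2, alpha)
print(sorted(rejected1), alpha2, sorted(rejected2))
```

If no hypothesis is rejected in F1, then A1 = N1, e1(A1) = α and α2 = 0, so nothing is tested in F2, which is exactly the parallel gatekeeping restriction.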
5.3 Mixture procedures with general logical restrictions
As shown in Proposition 4, mixture procedures in problems with parallel gate-
keeping restrictions have an attractive stepwise form. The following counterexample
shows that mixtures of testing procedures with general logical restrictions may not
have a stepwise form.
Consider a two-family problem with N1 = {1,2} and N2 = {3,...,n}. Assume
that the hypotheses within each family are equally weighted. Further, consider a
mixture of the Bonferroni procedure in F1 and Holm procedure in F2 based on the
Bonferroni mixing function. The following logical restrictions are assumed:
• H3,...,Hn−1 are testable iff H2 is rejected.
• Hn is testable iff at least one hypothesis in F1 is rejected.
In other words,
• If I1 = ∅ or I1 = {1}, then L3(I1) = L4(I1) = ... = Ln(I1) = 1.
• If I1 = {2}, then L3(I1) = L4(I1) = ... = Ln−1(I1) = 0 and Ln(I1) = 1.
• If I1 = {1,2}, then L3(I1) = L4(I1) = ... = Ln(I1) = 0.
To demonstrate that the mixture of the two procedures does not have a stepwise
form, it is sufficient to focus on the case when T1 (Bonferroni procedure) rejects H1
but accepts H2. By the logical restrictions, only one hypothesis is testable in F2 in
this case (namely, the hypothesis Hn). If the mixture procedure had a stepwise form,
this hypothesis would have been tested by T2 (Holm procedure), i.e., its decision
rule would have been expressed in terms of pn compared to an appropriately chosen
significance level. However, as shown in Proposition 5, this is not the case.
Proposition 5 Let q(1) ≤ ... ≤ q(n−2) denote the ordered p-values in F2 and assume
that pn is the kth ordered p-value, i.e., pn = q(k), k = 1,...,n−2. Then the hypothesis
Hn is rejected iff all of the following conditions are met
pn ≤ α/2, q(i) ≤ α/(n − i − 1), i = 1,...,k.
The proof of Proposition 5 is given in the Appendix.
6. Examples
In this section we will give examples of mixture procedures that help illustrate
the general method introduced in Section 2.
6.1 Mixtures of Bonferroni procedures
Consider a problem of testing n hypotheses and let w1,...,wn denote the weights
assigned to the hypotheses in the m families. The weights are non-negative and sum
to 1 within each family, i.e.,
\[ w_i \ge 0, \quad i = 1,\ldots,n, \qquad \sum_{i \in N_k} w_i = 1, \quad k = 1,\ldots,m. \]
The n hypotheses are grouped into m families. Assume that the first m−1 families
are tested using a weighted version of the Bonferroni procedure and the last family
is tested using a weighted version of the Holm procedure. In other words,
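\[ p_k(I_k) = \min_{i \in I_k} (p_i / w_i) \quad \text{if } I_k \subseteq N_k,\ k = 1,\ldots,m-1, \]
\[ p_m(I_m) = \Bigl( \sum_{k \in I_m} w_k \Bigr) \min_{i \in I_m} (p_i / w_i) \quad \text{if } I_m \subseteq N_m. \]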
We will assume first that parallel gatekeeping restrictions are imposed, i.e.,
Li(Nk−1) = 0, i ∈ Nk, k = 2,...,m,
Li(Ik−1) = 1, i ∈ Nk, Ik−1 ⊂ Nk−1, k = 2,...,m.
Noting that the error rate function for the weighted Bonferroni procedure is given by
\[ e_k(I_k) = \alpha \sum_{i \in I_k} w_i, \quad k = 1,\ldots,m-1, \]
it can be shown that the mixture of the m procedures based on the Bonferroni mixing
function is defined as follows. Let HI, I ⊆ N, be a non-empty intersection hypothesis.
If I ⊆ Nk, k = 1,...,m, then p(I) = pk(Ik), where Ik = I ∩ Nk. If HI contains
hypotheses from Fi1 ,...,Fis for s ≥ 2, the p-value for HI is given by
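\[ p(I) = \min_{i \in I} \frac{p_i}{v_i(I)}, \]
where
\[ v_i(I) = v^*_k(I) w_i, \quad i \in I_{i_k},\ k = 1,\ldots,s-1, \]
\[ v_i(I) = v^*_s(I) w_i, \quad i \in I_{i_s} \text{ and } i_s < m, \]
\[ v_i(I) = v^*_s(I) w_i \Big/ \sum_{k \in I_{i_s}} w_k, \quad i \in I_{i_s} \text{ and } i_s = m, \]
\[ v^*_1(I) = 1, \quad v^*_{k+1}(I) = v^*_k(I) \Bigl( 1 - \sum_{i \in I_{i_k}} w_i \Bigr), \quad k = 1,\ldots,s-1. \]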
The resulting procedure is equivalent to the Bonferroni-based parallel gatekeeping
procedure (Dmitrienko, Offen and Westfall, 2003).
Further, we will consider the general case of monotone logical restrictions. The
mixture procedure based on the Bonferroni mixing function has a structure similar
to that of the parallel gatekeeping procedure. First p(I) = pk(Ik) if I ⊆ Nk, k =
1,...,m, where Ik = I ∩ Nk. Further, if HI contains hypotheses from Fi1 ,...,Fis for
s ≥ 2, then the p-value for HI is given by
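\[ p(I) = \min_{i \in I^*} \frac{p_i}{v_i(I)}, \]
where $I^* = I_{i_1} \cup I^*_{i_2} \cup \ldots \cup I^*_{i_s}$ and
\[ v_i(I) = v^*_k(I) w_i, \quad i \in I^*_{i_k},\ k = 1,\ldots,s-1, \]
\[ v_i(I) = v^*_s(I) w_i, \quad i \in I^*_{i_s} \text{ and } i_s < m, \]
\[ v_i(I) = v^*_s(I) w_i \Big/ \sum_{k \in I^*_{i_s}} w_k, \quad i \in I^*_{i_s} \text{ and } i_s = m, \]
\[ v^*_1(I) = 1, \quad v^*_{k+1}(I) = v^*_k(I) \Bigl( 1 - \sum_{i \in I_{i_k}} w_i \Bigr), \quad k = 1,\ldots,s-1. \]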
Note that the presence of logical restrictions has an impact only on the index sets used
in the decision rule in the sense that a hypothesis is removed from the decision rule
if it is not consistent with the logical restrictions. The process of combining component
procedures is not affected by logical restrictions and therefore $v^*_1(I),\ldots,v^*_s(I)$ remain
the same. This mixture procedure is equivalent to the tree gatekeeping procedure
based on Algorithm III (Kordzakhia et al., 2008).
It is also important to note that the weighting scheme used in this mixture pro-
cedure satisfies the monotonicity condition (Condition 3) formulated in Dmitrienko,
Tamhane, Liu and Wiens (2008). Weighting schemes proposed in other papers, in-
cluding Algorithm 2 in Dmitrienko, Tamhane, Liu and Wiens (2008), do not always
satisfy the monotonicity condition and gatekeeping procedures based on those schemes
can be inconsistent with the prespecified logical restrictions. In this case, the logical
restrictions need to be enforced as explained in Section 5.1.
6.2 Mixtures of Dunnett procedures
The algorithm given in Section 6.1 can be easily extended to construct more power-
ful mixture procedures, e.g., mixtures of Dunnett procedures based on the Bonferroni
mixing function. Considering a general problem of testing n hypotheses grouped into
m families, let ti, i ∈ N, denote the test statistic associated with Hi and assume that
ti, i ∈ Nk, follow a multivariate t distribution for any k = 1,...,m. Suppose that the
hypotheses in Fk, k = 1,...,m, are tested using the Dunnett procedure. In this case,
the p-value for the intersection hypothesis HIk , Ik ⊆ Nk, k = 1,...,m, is given by
\[ p_k(I_k) = 1 - G_{|I_k|}\Bigl( \max_{i \in I_k} t_i \Bigr), \]
where Gn(x) is the cumulative distribution function of the n-variate one-sided Dun-
nett distribution, i.e.,
\[ G_{|I_k|}(x) = P\Bigl( \max_{i \in I_k} t^*_i \le x \Bigr), \]
and $t^*_i$, i ∈ Nk, have the same joint distribution as ti, i ∈ Nk, under the global null
hypothesis. A mixture of the Dunnett procedures based on the Bonferroni mixing
function can now be defined using the steps described in Section 6.1.
6.3 Clinical trial example
The mixture procedures introduced in Sections 6.1 and 6.2 will be illustrated here
using a clinical trial example from Dmitrienko, Offen, Wang and Xiao (2006) and
Dmitrienko, Wiens, Tamhane and Wang (2007, Section 6). Consider a clinical trial
in patients with Type II diabetes conducted to test three doses of an experimental
treatment versus placebo. The three doses are labeled L, M and H and the placebo is
labeled Plac. The dose-placebo comparisons are performed with respect to three ordered
endpoints, Endpoint P (Hemoglobin A1c), Endpoint S1 (Fasting serum glucose)
and Endpoint S2 (HDL cholesterol). The sample size per arm is 87 patients.

Table 1. Test statistics and raw p-values in the Type II diabetes clinical
trial example.

Family   Null hypothesis   Test statistic   P-value
F1       H1                2.81             0.005
         H2                2.56             0.011
         H3                2.39             0.018
F2       H4                2.61             0.009
         H5                2.24             0.026
         H6                2.50             0.013
F3       H7                2.60             0.010
         H8                2.78             0.006
         H9                1.96             0.051

The resulting nine hypotheses of no treatment effect (three dose-placebo compar-
isons times three endpoints) are grouped into three families:
• Family F1: H-Plac (H1), M-Plac (H2) and L-Plac (H3) comparisons for End-
point P.
• Family F2: H-Plac (H4), M-Plac (H5) and L-Plac (H6) comparisons for End-
point S1.
• Family F3: H-Plac (H7), M-Plac (H8) and L-Plac (H9) comparisons for End-
point S2.
The three doses are assumed to be equally important and thus the hypotheses are
equally weighted within each family, i.e., wi = 1/3, i = 1,...,9. The two-sample t
statistics and associated p-values for the nine hypotheses are listed in Table 1.
The null hypotheses in this clinical trial example will be tested using three multiple
testing procedures:
• Procedure 1 (Mixture of Bonferroni and Holm procedures with parallel gate-
keeping restrictions). The hypotheses in F1 and F2 are tested using the Bonfer-
roni procedure and the hypotheses in F3 are tested using the Holm procedure.
The mixture procedure is based on the Bonferroni mixing function with the
parallel gatekeeping restrictions defined in Section 6.1.
• Procedure 2 (Mixture of Bonferroni and Holm procedures with multiple-sequence
restrictions). This procedure is similar to Procedure 1 in the sense that it is also
a mixture of the Bonferroni procedures in F1 and F2 and Holm procedure in
F3 based on the Bonferroni mixing function. However, unlike Procedure 1, this
procedure uses a more general type of logical restrictions known as multiple-
sequence restrictions. A hypothesis in Fk, k = 2,3, is tested if higher-level
hypotheses associated with the same dose are rejected, e.g., H7 is testable iff
H1 and H4 are rejected. More formally,
– Li(I1) = 0 if I1 contains i − 3 and Li(I1) = 1 otherwise, i = 4, 5, 6.
– Li(I1 ∪ I2) = 0 if I1 ∪ I2 contains i − 3 or i − 6 and Li(I1 ∪ I2) = 1 otherwise,
i = 7, 8, 9.
• Procedure 3 (Mixture of Dunnett procedures with multiple-sequence restric-
tions). This procedure is a mixture of the Dunnett procedures in F1, F2 and
F3 based on the Bonferroni mixing function and imposes multiple-sequence re-
strictions defined above.
Beginning with Procedures 1 and 2, adjusted p-values can be computed using the
algorithm given in Section 6.1. This algorithm is based on a complete enumeration
of all non-empty intersections of the original nine hypotheses. A p-value is computed
for each intersection and then the p-values for the original hypotheses are found using
the closure principle (see Section 2 for more details). As an illustration, consider the
intersection hypothesis corresponding to the index set I = {1,3,5, 6,7,8,9}, i.e.,
HI = H1 ∩ H3 ∩ H5 ∩ H6 ∩ H7 ∩ H8 ∩ H9.
Assuming parallel gatekeeping restrictions (Procedure 1), one first needs to define
p-values for HI1 , HI2 and HI3 , where I1 = {1,3}, I2 = {5, 6} and I3 = {7,8,9}.
Using the raw p-values displayed in Table 1, the p-values are computed based on
the Bonferroni and Holm procedures as shown below
p1(I1) = n1 min(p1,p3) = 0.015,
p2(I2) = n2 min(p5,p6) = 0.039,
p3(I3) = |I3| min(p7,p8,p9) = 0.018,
where n1 = 3, n2 = 3 and |I3| = 3. Using the Bonferroni mixing function, the p-value
for HI is given by
\[ p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I_2)}{b_2}, \frac{p_3(I_3)}{b_3} \right), \]
where b1 = 1 and, to compute bk, k = 2,3, one needs to utilize the error rate
function of the Bonferroni procedure. As shown in Section 6.1, ek(Ik) = α|Ik|/nk or,
equivalently, fk(Ik) = |Ik|/nk, k = 1, 2, and thus
\[ b_2 = b_1(1 - f_1(I_1)) = 1 - \frac{|I_1|}{n_1} = \frac{1}{3}, \]
\[ b_3 = b_2(1 - f_2(I_2)) = \frac{1}{3}\left( 1 - \frac{|I_2|}{n_2} \right) = \frac{1}{9}. \]
This immediately implies that p(I) = min(0.015, 0.117, 0.162) = 0.015.

Table 2. Mixtures of three procedures with parallel gatekeeping restric-
tions (Procedure 1) and multiple-sequence restrictions (Procedure 2) in
the Type II diabetes clinical trial example. The asterisk identifies the
adjusted p-values that are significant at the 0.05 level.

Family   Null hypothesis   Adjusted p-value
                           Procedure 1   Procedure 2
F1       H1                0.015*        0.015*
         H2                0.033*        0.033*
         H3                0.054         0.054
F2       H4                0.041*        0.041*
         H5                0.078         0.078
         H6                0.054         0.054
F3       H7                0.054         0.045*
         H8                0.054         0.078
         H9                0.077         0.077

Now consider the case of multiple-sequence restrictions (Procedure 2). The index
sets I2 and I3 need to be modified to account for the logical restrictions. Note that
H6 depends on H3, H7 depends on H1, H8 depends on H5, H9 depends on H3 and
H6. Thus the modified index sets are given by $I^*_2 = \{5\}$ and $I^*_3 = \emptyset$. The next step
is to compute the p-values for $H_{I_1}$ and $H_{I^*_2}$,
\[ p_1(I_1) = n_1 \min(p_1, p_3) = 0.015, \]
\[ p_2(I^*_2) = n_2 p_5 = 0.078. \]
Lastly, the p-value for HI is given by
\[ p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I^*_2)}{b_2} \right), \]
where bk, k = 1,2, are defined above, i.e., b1 = 1 and b2 = 1/3, and therefore
p(I) = 0.015.
Table 2 displays the adjusted p-values produced by the two procedures for the nine
hypotheses of interest (the raw p-values are listed in Table 1). Procedure 1 rejects three
hypotheses in this problem (H1, H2 and H4) and Procedure 2 rejects one more hypothesis
(H7). It is easy to verify that, as shown in Proposition 1, both procedures are con-
sistent with the logical restrictions (note that the Bonferroni procedures in F1 and
F2 are consonant and thus there is no need to enforce the logical restrictions). Fur-
ther, as stated in Proposition 2, Procedures 1 and 2 are equivalent to the Bonferroni
procedure in F1. Indeed, the adjusted p-values for the hypotheses in F1 are equal to
Bonferroni-adjusted p-values (each raw p-value is multiplied by 3).
Further, it is worth noting that Procedure 1 is based on parallel gatekeeping
restrictions and thus, as shown in Proposition 4, it has a stepwise representation.
This procedure is identical to a stepwise application of the Bonferroni procedures in
F1 and F2 and Holm procedure in F3 with appropriate adjustments of the significance
levels in F2 and F3. For more information, see Dmitrienko, Tamhane and Wiens (2008,
Section 6).
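For readers who wish to check the adjusted p-values for Procedures 1 and 2, the complete enumeration over the closed family can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: equal weights within each family, Bonferroni intersection p-values in F1 and F2, Holm intersection p-values in F3 computed over the testable index set, and hypothetical function names throughout.

```python
from fractions import Fraction
from itertools import combinations

# Raw p-values from Table 1, stored exactly.
P = {i + 1: Fraction(v) for i, v in enumerate(
    ["0.005", "0.011", "0.018", "0.009", "0.026", "0.013",
     "0.010", "0.006", "0.051"])}
FAMILIES = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # F1, F2 (Bonferroni), F3 (Holm)


def parallel(i, earlier):
    """Parallel gatekeeping: not testable once a whole earlier family lies in I."""
    return not any(set(fam) <= earlier for fam in FAMILIES)


def multiple_sequence(i, earlier):
    """Multiple-sequence restrictions: i - 3 and i - 6 must not lie in I."""
    return (i - 3) not in earlier and (i - 6) not in earlier


def intersection_p(I, testable):
    """p(I) based on the Bonferroni mixing function with equal weights."""
    b, terms, earlier = Fraction(1), [], set()
    last = len(FAMILIES) - 1
    for k, fam in enumerate(FAMILIES):
        I_k = I & set(fam)
        if not I_k:
            continue
        if b == 0:          # an earlier family is fully contained in I
            break
        # The first family present in I is used as is; later ones are restricted.
        I_star = {i for i in I_k if testable(i, earlier)} if earlier else I_k
        if I_star:
            size = len(I_star) if k == last else len(fam)
            terms.append(size * min(P[i] for i in I_star) / b)
        if k < last:        # coefficient update uses the original index set I_k
            b *= 1 - Fraction(len(I_k), len(fam))
        earlier |= I_k
    return min(min(terms), Fraction(1))


def adjusted_p_values(testable):
    """Closure principle: maximum of p(I) over all I containing each hypothesis."""
    adjusted = {i: Fraction(0) for i in P}
    for size in range(1, len(P) + 1):
        for I in combinations(sorted(P), size):
            p_I = intersection_p(set(I), testable)
            for i in I:
                adjusted[i] = max(adjusted[i], p_I)
    return adjusted


proc1 = adjusted_p_values(parallel)
proc2 = adjusted_p_values(multiple_sequence)
for i in sorted(P):
    print(f"H{i}: {float(proc1[i]):.4f}  {float(proc2[i]):.4f}")
```

Under these assumptions the enumeration over the 511 non-empty intersections gives values that agree with Table 2 after rounding to three decimals (the exact values 0.0405 for H4 and 0.0765 for H9 correspond to the table entries 0.041 and 0.077).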
The calculation of adjusted p-values for Procedure 3 is based on an algorithm
similar to the one used in Section 6.1. The only change that needs to be made is that
the Bonferroni and Holm p-values for intersection hypotheses need to be replaced by
the Dunnett p-values defined in Section 6.2. To illustrate the process, select the same
intersection as above, i.e.,
HI = H1 ∩ H3 ∩ H5 ∩ H6 ∩ H7 ∩ H8 ∩ H9.
and consider the multiple-sequence restrictions. The modified index sets are $I^*_2 = \{5\}$
and $I^*_3 = \emptyset$. Given the sample size per arm (87 patients) and number of doses
(3 doses), the Dunnett p-values for $H_{I_1}$ and $H_{I^*_2}$ are computed using the one-sided
Dunnett distribution with 3 and 344 degrees of freedom. These p-values are given by
\[ p_1(I_1) = 1 - G_2(\max(t_1, t_3)) = 0.0073, \]
\[ p_2(I^*_2) = 1 - G_1(t_5) = 0.0336, \]
where Gn(x) is the cumulative distribution function of the Dunnett distribution defined
in Section 6.2. Further, this mixture is also based on the Bonferroni mixing function and
thus b1 = 1 and b2 = 1/3. Therefore,
\[ p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I^*_2)}{b_2} \right) = 0.0073. \]
The adjusted p-values produced by Procedure 3 are shown in Table 3. One can
see from this table that the mixture of Dunnett procedures rejects more hypotheses
than a similar procedure based on the Bonferroni and Holm procedures (Procedure
2). Specifically, Procedure 3 rejects eight hypotheses whereas Procedure 2 rejects only
four hypotheses. This is a direct consequence of the fact that the Dunnett procedure
is uniformly more powerful than the Bonferroni procedure.

Table 3. Mixture of three procedures with multiple-sequence restric-
tions (Procedure 3) in the Type II diabetes clinical trial example. The
asterisk identifies the adjusted p-values that are significant at the 0.05
level.

Family   Null hypothesis   Adjusted p-value
F1       H1                0.007*
         H2                0.015*
         H3                0.023*
F2       H4                0.019*
         H5                0.034*
         H6                0.023*
F3       H7                0.023*
         H8                0.034*
         H9                0.064

It is worth noting that the mixture of Dunnett procedures defined above can serve
as a computationally attractive alternative to the Dunnett-based parallel gatekeeping
procedure with logical restrictions introduced in Dmitrienko, Offen, Wang and Xiao
(2006). The parallel gatekeeping procedure requires the computation of a vector
of critical values for each intersection hypothesis in the closed family based on the
multivariate distribution of the associated test statistics. Even in the case of nine
hypotheses, the algorithm is computationally intensive (it involves the evaluation
of multivariate probabilities for up to six dimensions). By contrast, the mixture
procedure is based on regular Dunnett-adjusted p-values that are combined across
the three families. This approach considerably simplifies the calculation of adjusted
p-values and leads to a relatively small reduction in the overall power compared to
the Dunnett-based parallel gatekeeping procedure.
References
Chen, X., Luo, X., Capizzi, T. (2005). The application of enhanced parallel
gatekeeping strategies. Statistics in Medicine. 24, 1385–1397.
Dmitrienko, A., Offen, W.W., Westfall, P.H. (2003). Gatekeeping strategies for
clinical trials that do not require all primary effects to be significant. Statistics
in Medicine. 22, 2387–2400.
Dmitrienko, A., Offen, W., Wang, O., Xiao, D. (2006). Gatekeeping procedures in
dose-response clinical trials based on the Dunnett test. Pharmaceutical
Statistics. 5, 19–28.
Dmitrienko, A., Wiens, B.L., Tamhane, A.C., Wang, X. (2007). Tree-structured
gatekeeping tests in clinical trials with hierarchically ordered multiple
objectives. Statistics in Medicine. 26, 2465–2478.
Dmitrienko, A., Tamhane, A., Liu, L., Wiens, B. (2008). A note on tree gatekeeping
procedures in clinical trials. Statistics in Medicine. 27, 3446–3451.
Dmitrienko, A., Tamhane, A., Wiens, B. (2008). General multistage gatekeeping
procedures. Biometrical Journal. 50, 667–677.
Everitt, B.S., Hand, D.J. (1981). Finite Mixture Distributions. Chapman and Hall,
London, New York.
Hochberg, Y., Tamhane, A.C. (1987). Multiple Comparison Procedures. New York:
John Wiley and Sons.
Kordzakhia, G., Dinh, P., Bai, S., Lawrence, J., Yang, P. (2008). Bonferroni-based
tree-structured gatekeeping testing procedures. Unpublished manuscript.
Marcus, R., Peritz, E., Gabriel, K.R. (1976). On closed testing procedures with
special reference to ordered analysis of variance. Biometrika. 63, 655–660.
Quan, H., Luo, X., Capizzi, T. (2005). Multiplicity adjustment for multiple
endpoints in clinical trials with multiple doses of an active treatment. Statistics in
Medicine. 24, 2151–2170.
Appendix
Proof of Proposition 1. Consider a hypothesis in Fk+1, k = 1,...,m − 1, say,
Hi, i ∈ Nk+1, and assume that Li(A1 ∪ ... ∪ Ak) = 0. Let Is = As, s = 1,...,k,
Ik+1 = {i}. Further, let J = I1 ∪ ... ∪ Ik and I = I1 ∪ ... ∪ Ik+1. Considering the
intersection hypothesis HI, note that Li(I1 ∪ ... ∪ Ik) = 0 and thus
$$I_{k+1}^* = \{i : i \in I_{k+1} \text{ and } L_i(I_1 \cup \ldots \cup I_k) = 1\}$$
is empty. Therefore,
$$p(I) = m_J(p_1(I_1), p_2(I_2^*), \ldots, p_k(I_k^*)).$$
Note that the mixture procedure T accepts all hypotheses Hj, j ∈ J, and T is
consonant in F1,...,Fk. Therefore, p(J) > α (if p(J) were less than or equal to α, then
T would reject at least one hypothesis Hj with j ∈ J; however, all hypotheses Hj,
j ∈ J, are accepted, which implies that p(J) > α). Further, p(I) = p(J) > α and the
index set I contains i. Thus, T accepts Hi. The proof is complete.
Proof of Proposition 2. Assume first that T1 rejects Hi, i ∈ N1. This means that
p1(I1) ≤ α for any I1 ⊆ N1 if i ∈ I1. Now consider any index set I ⊆ N that contains
i. In general, $I = I_{i_1} \cup \ldots \cup I_{i_s}$, where $I_{i_r} \subseteq N_{i_r}$, $r = 1,\ldots,s$, $i_1 = 1$ and
$s = 1,\ldots,m$, and
$$p(I) = m_I(p_{i_1}(I_{i_1}), \ldots, p_{i_s}(I_{i_s})).$$
By the definition of a mixing function, $m_I(x_{i_1},\ldots,x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$ and, since
$x_{i_1} = p_1(I_1) \le \alpha$, we conclude that p(I) is no greater than α and thus T rejects Hi.
Now assume that T rejects Hi, i ∈ N1. In this case, p(I) ≤ α for any I ⊆ N
that contains i, which immediately implies that p1(I1) ≤ α for any I1 ⊆ N1 if i ∈ I1.
Therefore, T1 rejects Hi. The proof is complete.
Proof of Proposition 3. Assume that all hypotheses in F1,...,Fk−1 are rejected by
T and Tk rejects Hi, i ∈ Nk, i.e., pk(Ik) ≤ α for any Ik ⊆ Nk if i ∈ Ik. Consider any
index set I ⊆ N that contains i. If this set includes any indices from N1 ∪...∪Nk−1,
p(I) is no greater than α since T rejects all hypotheses in F1,...,Fk−1. If this set
does not include any indices from N1 ∪ ... ∪ Nk−1, the p-value for HI is given by
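$$p(I) = m_I(p_{i_1}(I_{i_1}), \ldots, p_{i_s}(I_{i_s})),$$
where $I = I_{i_1} \cup \ldots \cup I_{i_s}$, $I_{i_r} \subseteq N_{i_r}$, $r = 1,\ldots,s$, $i_1 = k$ and $s = 1,\ldots,m-k+1$.
As in Proposition 2, recall that $m_I(x_{i_1},\ldots,x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$, and here $x_{i_1} = p_k(I_k) \le \alpha$.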
Thus p(I) ≤ α, which implies that T rejects Hi.
On the other hand, if T rejects Hi, i ∈ Nk, the arguments used in the proof of
Proposition 2 can be applied to show that Tk rejects Hi. The proof is complete.
Proof of Proposition 4. The first statement follows from Proposition 2. Consider
the second statement and assume that T1 rejects Hk, k ∈ R1, at the α level and T2
rejects Hj, j ∈ N2, at the level α2 = α − e1(A1). Here R1 ⊆ N1 is the index set of
hypotheses rejected in F1. Considering any I ⊆ N with j ∈ I, let I1 = I ∩ N1 and
I2 = I ∩ N2. If I1 ∩ R1 ≠ ∅, then p1(I1) ≤ α and thus
$$p(I) = \min\left(p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)}\right) \le p_1(I_1) \le \alpha.$$
Further, if I1 ∩ R1 = ∅, then I1 ⊆ A1 and, by the monotonicity of the error rate
function, f1(I1) ≤ f1(A1). Since T2 rejects Hj at the level α2 = α − e1(A1),
$$p_2(I_2) \le \alpha - e_1(A_1) = \alpha(1 - f_1(A_1)) \le \alpha(1 - f_1(I_1))$$
and
$$p(I) = \min\left(p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)}\right) \le \frac{p_2(I_2)}{1 - f_1(I_1)} \le \alpha.$$
This means that T rejects Hj at the α level.
Assume now that T rejects Hk, k ∈ R1, and Hj, j ∈ N2, at the α level. Consider
any I2 ⊆ N2 such that j ∈ I2 and let I = I1 ∪ I2, where I1 = A1. Recall that T is
equivalent to T1 in F1 and thus T1 also rejects Hk, k ∈ R1. Since T1 is consonant, we
conclude that p1(I1) > α. On the other hand, T rejects Hj and thus
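$$p(I) = \min\left(p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)}\right) \le \alpha.$$
Since p1(I1) > α, the minimum must be attained by the second argument. This implies that
$$p_2(I_2) \le \alpha(1 - f_1(I_1)) = \alpha - e_1(I_1) = \alpha - e_1(A_1)$$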
if I2 ⊆ N2 and j ∈ I2. Therefore, T2 rejects Hj, j ∈ N2, at the level α2 = α − e1(A1).
The proof is complete.
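The two cases in the first half of this proof can be traced numerically. The Python sketch below is illustrative only: the function name `mixture_p` and all numerical values are hypothetical, and the convention that p(I) reduces to p1(I1) when f1(I1) = 1 is an assumption made to avoid division by zero.

```python
def mixture_p(p1, p2, f1):
    """p(I) = min(p1(I1), p2(I2) / (1 - f1(I1))); taken as p1 when f1 = 1."""
    if f1 >= 1.0:
        return p1
    return min(p1, p2 / (1.0 - f1))

alpha = 0.05
f1_A1 = 0.5                      # hypothetical value of f1(A1)
alpha2 = alpha * (1.0 - f1_A1)   # alpha - e1(A1), using e1 = alpha * f1

# Case "I1 within A1": f1(I1) <= f1(A1), and T2 rejects at level alpha2.
p1_I1, p2_I2, f1_I1 = 0.30, 0.02, 0.5
p_case_accept = mixture_p(p1_I1, p2_I2, f1_I1)

# Case "I1 meets R1": p1(I1) <= alpha already bounds p(I).
p_case_reject = mixture_p(0.01, 0.90, 1.0)

print(p_case_accept <= alpha, p_case_reject <= alpha)
```

In both cases the intersection p-value stays at or below α, which is what the proof needs in order to conclude that T rejects Hj.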
Proof of Proposition 5. Note first that the p-values for intersection hypotheses in
$\mathcal{H}_1$ and $\mathcal{H}_2$ are given by
$$p_1(I_1) = 2 \min_{i \in I_1} p_i, \quad I_1 \subseteq N_1,$$
$$p_2(I_2) = |I_2| \min_{i \in I_2} p_i, \quad I_2 \subseteq N_2,$$
where $p_i$ is the raw p-value for testing $H_i$, $i \in N$. Also, the error rate function for
the Bonferroni procedure is $e_1(I_1) = |I_1|\alpha/2$, where $|I_1|$ is the cardinality of the set
$I_1$ (Dmitrienko, Tamhane and Wiens, 2008). Therefore, the p-values for intersection
hypotheses in $\mathcal{H}$ are given as follows.
Case 1. If $I_1 = \{1,2\}$,
$$p(I) = 2 \min_{i \in I_1} p_i.$$
Case 2. If $I_1 = \{1\}$,
$$p(I) = 2 \min\left(p_1, |I_2| \min_{i \in I_2} p_i\right).$$
Case 3. If $I_1 = \{2\}$,
$$p(I) = 2 \min(p_2, p_n).$$
Case 4. If $I_1 = \emptyset$,
$$p(I) = |I_2| \min_{i \in I_2} p_i.$$
Recall now that $H_1$ is rejected and $H_2$ is accepted. This means that all intersection
hypotheses in $\mathcal{H}$ that include $H_1$ are rejected. Therefore, to determine the conditions
under which the mixture procedure rejects $H_n$, it is sufficient to concentrate on the
intersection hypotheses that include $H_n$ but exclude $H_1$. It follows from Cases 3 and
4 that $H_n$ is rejected if and only if $p_n \le \alpha/2$ (note that $p_2 > \alpha/2$ since $H_2$ is accepted) and the
k smallest p-values are significant at the Holm-adjusted significance levels in $F_2$, i.e.,
$q_{(i)} \le \alpha/(n - i - 1)$, $i = 1,\ldots,k$. The proof is complete.
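As a worked check, Cases 1 and 2 can be recovered from the mixing function used in the proof of Proposition 4. This is a sketch under two assumptions: that the same mixing function applies here, and that the second argument of the minimum drops out when $1 - f_1(I_1) = 0$. The relation $e_1(I_1) = \alpha f_1(I_1)$ is the one used in the proof of Proposition 4.

```latex
% Assumes e_1(I_1) = \alpha f_1(I_1), so that f_1(I_1) = |I_1|/2.
\begin{align*}
\text{Case 1 } (I_1 = \{1,2\}):\quad
  & f_1(I_1) = 1, \quad
    p(I) = p_1(I_1) = 2 \min_{i \in I_1} p_i, \\
\text{Case 2 } (I_1 = \{1\}):\quad
  & f_1(I_1) = \tfrac{1}{2}, \quad
    p(I) = \min\left(2 p_1, \frac{|I_2| \min_{i \in I_2} p_i}{1 - 1/2}\right)
         = 2 \min\left(p_1, |I_2| \min_{i \in I_2} p_i\right).
\end{align*}
```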