Department of Industrial Engineering and Management Sciences

Northwestern University, Evanston, Illinois 60208-3119, U.S.A.
Working Paper No. 08-04
Mixtures of Multiple Testing Procedures with Gatekeeping Applications
Alex Dmitrienko
Eli Lilly and Company
Ajit C. Tamhane
Northwestern University
Lingyun Liu
Northwestern University
December 2008

Abstract
This paper introduces a general framework for constructing gatekeeping procedures
for multiple testing problems arising in clinical trials with hierarchical objectives.
These problems frequently exhibit a complex structure, including multiple families of
hypotheses and logical restrictions. The framework is based on combining multiple
tests across families and enables clinical trial sponsors to set up powerful and flexible
multiple testing procedures (e.g., gatekeeping procedures based on Dunnett tests
that account for logical restrictions among the hypotheses of interest). A clinical trial
example is used to illustrate the general approach.

Keywords and Phrases: Multiple comparisons; Closure principle; Gatekeeping proce-
dures; Bonferroni test; Dunnett test.

1. Introduction
Gatekeeping procedures are commonly used in multiple testing problems with a
hierarchical structure, including problems arising in clinical trials with multiple ob-
jectives. These objectives may represent primary endpoints, secondary endpoints and
subgroup analyses, etc. To account for the hierarchical structure of these objectives,
null hypotheses associated with the objectives are grouped into families. Consider,
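for example, a multiple testing problem involving $n$ null hypotheses $H_1, \ldots, H_n$ that
are grouped into $m$ families:
    $F_k = \{H_i, i \in N_k\}$, $k = 1, \ldots, m$, $m \ge 2$,
where $N_1 = \{1, \ldots, n_1\}$, $N_k = \{n_1 + \ldots + n_{k-1} + 1, \ldots, n_1 + \ldots + n_k\}$, $k = 2, \ldots, m$,
and $n_1 + \ldots + n_m = n$.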
Dmitrienko, Tamhane and Wiens (2008) introduced a framework for constructing
multistage parallel gatekeeping procedures. A parallel gatekeeping procedure tests
hypotheses in Family $F_k$, $k = 2, \ldots, m$, only if one or more hypotheses are rejected in
$F_{k-1}$. Dmitrienko, Tamhane and Wiens proposed a general algorithm for setting up
parallel gatekeeping procedures with an attractive stepwise form based on tests from
a broad class of multiple testing procedures (known as separable procedures).
One of the limitations of the framework proposed by these authors is that it
cannot be used in problems with logical restrictions, i.e., when the acceptance or
rejection of hypotheses in $F_k$, $k = 2, \ldots, m$, depends on the outcomes of significance
tests in $F_1, \ldots, F_{k-1}$. Multiple testing problems with logical restrictions are
frequently encountered in clinical trials. Examples are given in Chen, Luo and
Capizzi (2005), Quan, Luo and Capizzi (2005), Dmitrienko, Offen, Wang and Xiao
(2006), Dmitrienko, Wiens, Tamhane and Wang (2007), Dmitrienko, Tamhane, Liu
and Wiens (2008).
This paper describes a framework that enables clinical trial sponsors to set up
flexible multiple testing procedures for problems with a very general class of logical
restrictions (monotone logical restrictions). The framework is based on combining
multiple tests across families using the concept of a mixture of multiple testing pro-
cedures. This term is used here to make an analogy with mixtures of distributions
(Everitt and Hand, 1981). To specify a mixture distribution, one needs to specify
component distributions and a mixing distribution. Similarly, in the case of mix-
ture procedures, one needs to select component procedures and a mixing function.
The mixing function is selected to take into account the logical relationships among
multiple families and provide strong control of the familywise error rate (FWER)
(Hochberg and Tamhane, 1987). The mixture-based framework uses the closure prin-
ciple (Marcus, Peritz and Gabriel, 1976) to achieve FWER control.

The paper is organized as follows. Section 2 introduces the mixture-based frame-
work for an arbitrary number of families. Section 3 defines a class of monotone logical
restrictions. Section 4 describes mixing functions that can be used to construct mix-
tures of multiple testing procedures. Properties of mixture procedures are described
in Section 5. Lastly, Section 6 gives examples of mixture procedures (including mix-
tures of Bonferroni and Dunnett procedures) and a clinical trial example to illustrate
the mixture-based framework.
2. Mixture procedures
Consider the multiple testing problem defined in the Introduction. Let $\mathcal{H}_k$ denote
the closed family associated with $F_k$, i.e.,
    $\mathcal{H}_k = \{H_{I_k}, I_k \subseteq N_k\}$, where $H_{I_k} = \bigcap_{i \in I_k} H_i$.
Further, consider $m$ multiple testing procedures, known as component procedures, $T_1, \ldots, T_m$.
The procedure $T_k$, $k = 1, \ldots, m$, is assumed to be a closed testing procedure that
controls the FWER in the strong sense within $F_k$. This means that there exists a
set of $\alpha$-level tests for each intersection hypothesis in $\mathcal{H}_k$ such that $T_k$ rejects $H_i$,
$i \in N_k$, if and only if (iff) all intersection hypotheses including $H_i$ are rejected by the
intersection hypothesis tests. For example, if $T_k$ is the Holm procedure, each intersection
hypothesis is tested using the Bonferroni test at level $\alpha$. Let $p_k(I_k)$, $I_k \subseteq N_k$, be
the p-value for the intersection hypothesis test associated with $H_{I_k}$. The intersection
hypothesis $H_{I_k}$ is rejected iff $p_k(I_k) \le \alpha$.
A mixture of the component procedures, denoted by $T$, is a procedure for testing
all hypotheses in $F = F_1 \cup \ldots \cup F_m$. Let $N = \{1, \ldots, n\}$ and let $\mathcal{H}$ denote the closed
family associated with $F$, i.e., $\mathcal{H} = \{H_I, I \subseteq N\}$. For each index set $I \subseteq N$, let
$I_k = I \cap N_k$, $k = 1, \ldots, m$. To define a mixture procedure, one needs to define $\alpha$-level
tests for all intersection hypotheses in $\mathcal{H}$. Consider any non-empty intersection
hypothesis $H_I$, $I \subseteq N$. The test for this intersection hypothesis is defined as follows:
Case 1. $H_I$ contains hypotheses only from $F_k$, $k = 1, \ldots, m$, i.e., $I = I_k$. The p-value
for $H_I$ is given by $p(I) = p_k(I_k)$.
Case 2. $H_I$ contains hypotheses from $F_{i_1}, \ldots, F_{i_s}$ for $s \ge 2$, i.e., $I = I_{i_1} \cup \ldots \cup I_{i_s}$.
The p-value for $H_I$ is given by
    $p(I) = m_I(p_{i_1}(I_{i_1}), \ldots, p_{i_s}(I_{i_s}))$,
where $m_I(x_{i_1}, \ldots, x_{i_s})$ is a mixing function.
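Mixing functions have the following properties:
• $0 \le m_I(x_{i_1}, \ldots, x_{i_s}) \le 1$, $0 \le x_{i_k} \le 1$, $k = 1, \ldots, s$.
• $m_I(x_{i_1}, \ldots, x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$.
• The test for $H_I$ is an $\alpha$-level test, i.e., $P(p(I) \le \alpha) \le \alpha$.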
Examples of mixing functions are given in Section 4.
Given the p-values for each intersection in the closed family, the p-value for a
hypothesis in $F$ is computed using the closure principle. For the hypothesis $H_i$,
$i \in N$, the adjusted p-value is defined as the maximum over the p-values for the
intersections containing this hypothesis, i.e.,
    $\tilde{p}_i = \max_{I:\, i \in I} p(I)$.
Since the mixture procedure is constructed using $\alpha$-level tests for all intersection
hypotheses in $\mathcal{H}$, the procedure controls the FWER in the strong sense at an $\alpha$ level.
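The closure step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `adjusted_pvalues` and the callback `intersection_pvalue` are hypothetical, and the unweighted Bonferroni intersection test with three illustrative raw p-values is used only because closing it is known to give the Holm adjusted p-values.

```python
from itertools import combinations


def adjusted_pvalues(n, intersection_pvalue):
    """Closure principle: the adjusted p-value of H_i is the maximum of p(I)
    over all index sets I that contain i (complete enumeration of 2^n - 1 sets)."""
    adjusted = {i: 0.0 for i in range(1, n + 1)}
    for size in range(1, n + 1):
        for subset in combinations(range(1, n + 1), size):
            p_I = intersection_pvalue(frozenset(subset))
            for i in subset:
                adjusted[i] = max(adjusted[i], p_I)
    return adjusted


# Illustrative raw p-values (assumption: a single family of three hypotheses).
raw = {1: 0.005, 2: 0.011, 3: 0.018}


def bonferroni(I):
    # Unweighted Bonferroni test of the intersection hypothesis H_I.
    return min(1.0, len(I) * min(raw[i] for i in I))


holm = adjusted_pvalues(3, bonferroni)
```

Any mixture procedure fits the same template once `intersection_pvalue` returns the mixture p-value $p(I)$; the enumeration grows as $2^n$, so the sketch is only practical for a moderate number of hypotheses.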
3. Logical restrictions
Mixtures of multiple testing procedures are constructed to account for logical
relationships among the hypotheses in F1,...,Fm. Dmitrienko, Wiens, Tamhane and
Wang (2007) and Dmitrienko, Tamhane, Liu and Wiens (2008) proposed to formulate
logical relationships in terms of serial and parallel gatekeeping sets. In this case a
hypothesis in $F_{k+1}$, $k = 1, \ldots, m-1$, is tested iff all hypotheses are rejected in a certain
subset of $F_1, \ldots, F_k$ (known as the serial gatekeeping set) and at least one hypothesis
is rejected in another subset of $F_1, \ldots, F_k$ (known as the parallel gatekeeping set).
A more general family of monotone logical restrictions is introduced below. The
restrictions are defined using restriction functions. Consider a hypothesis in $F_{s+1}$,
$s = 1, \ldots, m-1$, say, $H_i$, $i \in N_{s+1}$. The restriction function $L_i(I)$, $I \subseteq N_1 \cup \ldots \cup N_s$,
assumes two values, $L_i(I) = 0$ or 1. Here $L_i(I) = 0$ means that $H_i$ is not testable, i.e.,
it is accepted without test if the hypotheses $H_j$, $j \in I$, are accepted, and $L_i(I) = 1$
means that $H_i$ is testable. The function $L_i(I)$ meets the following conditions:
• Monotonicity condition: If $L_i(I) = 0$ and $I \subseteq I'$ then $L_i(I') = 0$.
• Parallel gatekeeping condition: If $N_k \subseteq I$ then $L_i(I) = 0$ for all $i \in N_r$ for
$r = k+1, \ldots, m$.
Note that, by the monotonicity condition, if a hypothesis in Fs+1 is not testable
given a set of accepted hypotheses in F1,...,Fs, it will remain non-testable if more
hypotheses are accepted in $F_1, \ldots, F_s$. Further, it follows from the parallel gatekeeping
condition that all hypotheses are non-testable (and are automatically accepted) in
$F_{s+1}$ if all hypotheses are accepted in $F_k$, $k = 1, \ldots, s$.
In order to account for logical restrictions, the definition of a mixture of
multiple testing procedures needs to be modified as follows:
Case 1. $H_I$ contains hypotheses only from $F_k$, $k = 1, \ldots, m$, i.e., $I = I_k$. The p-value
for $H_I$ is given by $p(I) = p_k(I_k)$.
Case 2. $H_I$ contains hypotheses from $F_{i_1}, \ldots, F_{i_s}$ for $s \ge 2$, i.e., $I = I_{i_1} \cup \ldots \cup I_{i_s}$.
For any $k = 2, \ldots, s$, let $I^*_{i_k}$ be the subset of $I_{i_k}$ that includes the indices of
hypotheses that are logically consistent, i.e., testable, with the hypotheses from
$F_{i_1}, \ldots, F_{i_{k-1}}$. In other words,
    $I^*_{i_k} = \{i : i \in I_{i_k} \text{ and } L_i(I_{i_1} \cup \ldots \cup I_{i_{k-1}}) = 1\}$.
Assume first that $I^*_{i_s}$ is not empty. In this case the p-value for $H_I$ is given by
    $p(I) = m_I(p_{i_1}(I_{i_1}), p_{i_2}(I^*_{i_2}), \ldots, p_{i_s}(I^*_{i_s}))$,
where $p_{i_k}(I^*_{i_k}) = 1$ if $I^*_{i_k}$ is empty, $k = 2, \ldots, s-1$. Further, if $I^*_{i_{r+1}}, \ldots, I^*_{i_s}$ are
empty for some $r = 1, \ldots, s-1$ then
    $p(I) = m_{I'}(p_{i_1}(I_{i_1}), p_{i_2}(I^*_{i_2}), \ldots, p_{i_r}(I^*_{i_r}))$,
where $I' = I_{i_1} \cup \ldots \cup I_{i_r}$ and $p_{i_k}(I^*_{i_k}) = 1$ if $I^*_{i_k}$ is empty.
4. Mixing functions
This section defines mixing functions based on the Bonferroni and Dunnett global
tests. Both these mixing functions satisfy the properties listed in Section 2 and have
the same general form:
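    $m_I(x_{i_1}, \ldots, x_{i_s}) = \min\left( \frac{x_{i_1}}{c_{i_1}}, \ldots, \frac{x_{i_s}}{c_{i_s}} \right)$,
where $I = I_{i_1} \cup \ldots \cup I_{i_s}$ as before and $c_{i_1}, \ldots, c_{i_s}$ is a non-increasing sequence of
coefficients with $1 = c_{i_1} \ge \ldots \ge c_{i_s} \ge 0$. This sequence is non-increasing to account for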
the hierarchical structure of the problem, i.e., families placed earlier in the sequence
are more important (and receive greater weights) than those later in the sequence.
The Bonferroni and Dunnett mixing functions differ in terms of the choice of these
coefficients. For the Bonferroni mixing function, the coefficients are denoted by b’s
and for the Dunnett mixing function by d’s.

4.1 Bonferroni mixing function
To define this function, consider the error rate function of the procedure $T_k$,
$k = 1, \ldots, m-1$, introduced in Dmitrienko, Tamhane and Wiens (2008). Since an exact
expression for the error rate function is, in general, difficult to derive, we will focus
on an upper bound, $e_k(I_k)$, for the true error rate function, i.e.,
    $P(p_k(I_k) \le \alpha) \le e_k(I_k)$
for fixed $\alpha$. As in Dmitrienko, Tamhane and Wiens (2008), we will treat $e_k(I_k)$ as
the actual error rate function. Error rate functions have the following properties:
    $e_k(\emptyset) = 0$, $e_k(I) \le e_k(I')$ if $I \subseteq I'$, $e_k(N_k) = \alpha$.
Also, let $f_k(I_k) = e_k(I_k)/\alpha$.
Assume that $T_1, \ldots, T_{m-1}$ are separable, i.e., $f_k(I_k) < 1$ for all $\alpha$ if $I_k$ is a proper
subset of $N_k$, $k = 1, \ldots, m-1$. The Bonferroni mixing function is given by
    $m_I(x_{i_1}, \ldots, x_{i_s}) = \min\left( \frac{x_{i_1}}{b_{i_1}}, \ldots, \frac{x_{i_s}}{b_{i_s}} \right)$,
where $b_{i_1} = 1$ and $b_{i_k} = b_{i_{k-1}}(1 - f_{i_{k-1}}(I_{i_{k-1}}))$, $k = 2, \ldots, s$. It is clear that
    $0 \le m_I(x_{i_1}, \ldots, x_{i_s}) \le 1$ if $0 \le x_{i_k} \le 1$
and
    $m_I(x_{i_1}, \ldots, x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$.
Since $T_1, \ldots, T_{m-1}$ are separable, $b_{i_k} > 0$ if $I_{i_{r-1}}$ is a proper subset of $N_{i_{r-1}}$ for all
$r = 2, \ldots, k$. On the other hand, $b_{i_k} = \ldots = b_{i_s} = 0$ if $I_{i_{k-1}} = N_{i_{k-1}}$ and thus
    $m_I(x_{i_1}, \ldots, x_{i_s}) = \min\left( \frac{x_{i_1}}{b_{i_1}}, \ldots, \frac{x_{i_{k-1}}}{b_{i_{k-1}}} \right)$.
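It is easy to verify that the resulting test for $H_I$ is an $\alpha$-level test. By the Bonferroni
inequality,
    $P(p(I) \le \alpha) \le \sum_{k=1}^{s} P(p_{i_k}(I_{i_k}) \le \alpha b_{i_k}) \le \sum_{k=1}^{s-1} \alpha b_{i_k} f_{i_k}(I_{i_k}) + \alpha b_{i_s}$
since $P(p_{i_k}(I_{i_k}) \le x) \le x f_{i_k}(I_{i_k})$, $k = 1, \ldots, s-1$, and $P(p_{i_s}(I_{i_s}) \le x) \le x$. Further,
it is easy to see that $b_{i_{s-1}} f_{i_{s-1}}(I_{i_{s-1}}) + b_{i_s} = b_{i_{s-1}}$ since $b_{i_s} = b_{i_{s-1}}(1 - f_{i_{s-1}}(I_{i_{s-1}}))$.
Doing this recursively, we have
    $\sum_{k=1}^{s-1} b_{i_k} f_{i_k}(I_{i_k}) + b_{i_s} = b_{i_1} = 1$
and thus $P(p(I) \le \alpha) \le \alpha$.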
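The recursion for the coefficients and the telescoping identity used in the proof can be checked numerically. The sketch below is illustrative only: `bonferroni_coefficients` and `bonferroni_mixing` are hypothetical helper names, and the inputs (three families with $f$-values $2/3$ and $2/3$, and three component p-values) are assumed example values.

```python
from fractions import Fraction


def bonferroni_coefficients(f_values):
    """Return b_{i_1}, ..., b_{i_s} from f_{i_1}(I_{i_1}), ..., f_{i_{s-1}}(I_{i_{s-1}}),
    using b_{i_1} = 1 and b_{i_k} = b_{i_{k-1}} (1 - f_{i_{k-1}}(I_{i_{k-1}}))."""
    b = [Fraction(1)]
    for f in f_values:
        b.append(b[-1] * (1 - f))
    return b


def bonferroni_mixing(x, b):
    """min_k x_{i_k} / b_{i_k}; terms with b_{i_k} = 0 drop out of the minimum."""
    return min(x_k / b_k for x_k, b_k in zip(x, b) if b_k > 0)


f = [Fraction(2, 3), Fraction(2, 3)]           # assumed f-values of the first two families
b = bonferroni_coefficients(f)                 # [1, 1/3, 1/9]
# Telescoping identity from the proof: sum_{k<s} b_k f_k + b_s = b_1 = 1.
identity = sum(b_k * f_k for b_k, f_k in zip(b, f)) + b[-1]
p_I = bonferroni_mixing([0.015, 0.039, 0.018], b)
```

Exact rational arithmetic (`fractions.Fraction`) is used for the coefficients so that the case $b_{i_k} = 0$ is detected exactly rather than through a floating-point tolerance.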
4.2 Dunnett mixing function
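The Bonferroni mixing function defined above is based on the Bonferroni inequality
and thus does not account for the correlation among $p_{i_1}(I_{i_1}), \ldots, p_{i_s}(I_{i_s})$. By
contrast, the Dunnett mixing function explicitly utilizes the joint distribution of the
p-values.
Assume again that $T_k$ is separable, $k = 1, \ldots, m-1$. The Dunnett mixing function
is given by
    $m_I(x_{i_1}, \ldots, x_{i_s}) = \min\left( \frac{x_{i_1}}{d_{i_1}}, \ldots, \frac{x_{i_s}}{d_{i_s}} \right)$,
where $d_{i_1} = 1$ and $d_{i_k}$, $k = 2, \ldots, s$, are defined sequentially as follows:
    $P(p_{i_1}(I_{i_1}) \le \alpha d_{i_1}$ or $p_{i_2}(N_{i_2}) \le \alpha d_{i_2}) = \alpha$,
    $P(p_{i_1}(I_{i_1}) \le \alpha d_{i_1}$ or $p_{i_2}(I_{i_2}) \le \alpha d_{i_2}$ or $p_{i_3}(N_{i_3}) \le \alpha d_{i_3}) = \alpha$,
    ...
    $P(p_{i_1}(I_{i_1}) \le \alpha d_{i_1}$ or $p_{i_2}(I_{i_2}) \le \alpha d_{i_2}$ or ... or
        $p_{i_{s-2}}(I_{i_{s-2}}) \le \alpha d_{i_{s-2}}$ or $p_{i_{s-1}}(N_{i_{s-1}}) \le \alpha d_{i_{s-1}}) = \alpha$,
    $P(p_{i_1}(I_{i_1}) \le \alpha d_{i_1}$ or $p_{i_2}(I_{i_2}) \le \alpha d_{i_2}$ or ... or
        $p_{i_{s-1}}(I_{i_{s-1}}) \le \alpha d_{i_{s-1}}$ or $p_{i_s}(I_{i_s}) \le \alpha d_{i_s}) = \alpha$.
It follows from the equations that $d_{i_k} > 0$ if $I_{i_{r-1}}$ is a proper subset of $N_{i_{r-1}}$,
$r = 2, \ldots, k$, and $d_{i_k} = \ldots = d_{i_s} = 0$ if $I_{i_{k-1}} = N_{i_{k-1}}$.
As in Section 4.1, it is easy to see that $0 \le m_I(x_{i_1}, \ldots, x_{i_s}) \le 1$ if $0 \le x_{i_k} \le 1$ and
$m_I(x_{i_1}, \ldots, x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$. Further, by the definition of $d_{i_k}$, $k = 1, \ldots, s$,
    $P(p(I) \le \alpha) = P(p_{i_1}(I_{i_1}) \le \alpha d_{i_1}$ or ... or $p_{i_s}(I_{i_s}) \le \alpha d_{i_s}) = \alpha$
and thus the resulting test for $H_I$ is an $\alpha$-level test.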
Since the Dunnett mixing function takes into account the joint distribution of test
statistics, mixture procedures based on this function are more powerful than those
based on the Bonferroni mixing function.

5. Properties of mixture procedures
This section summarizes key properties of mixture procedures.
5.1 General properties
We will begin with a discussion of general properties, including consistency with
logical restrictions and independence (inferences in $F_1$ are independent of inferences
in $F_2$).
Proposition 1 Assume that $T$ is consonant in $F_1, \ldots, F_k$, $k = 1, \ldots, m-1$. Then
the mixture procedure $T$ is consistent with the logical restrictions in $F_{k+1}$. In other
words, $T$ accepts $H_i$, $i \in N_{k+1}$, at the $\alpha$ level if $L_i(A_1 \cup \ldots \cup A_k) = 0$, where $A_r$ is
the index set of accepted hypotheses in $F_r$, $r = 1, \ldots, k$.
Note that, if $T$ is not consonant in $F_1, \ldots, F_k$, the logical restrictions may be
violated in $F_{k+1}$ in the sense that $H_i$, $i \in N_{k+1}$, may be rejected even though
$L_i(A_1 \cup \ldots \cup A_k) = 0$. However, the logical restrictions can always be enforced by modifying
multiplicity-adjusted p-values in $F_{k+1}$. This can be done using an algorithm similar
to that proposed in Kordzakhia et al. (2008).
Proposition 2 The mixture procedure $T$ is equivalent to the procedure $T_1$ within the
first family. In other words, $T$ rejects a hypothesis in $F_1$ at the $\alpha$ level iff $T_1$ rejects
this hypothesis at the $\alpha$ level.
Proposition 3 The mixture procedure $T$ is equivalent to the procedure $T_k$, $k = 2, \ldots, m$,
within $F_k$ if $T$ rejects all hypotheses in $F_1, \ldots, F_{k-1}$. In other words, $T$
rejects a hypothesis in $F_k$ at the $\alpha$ level iff $T_k$ rejects this hypothesis at the $\alpha$ level
provided all hypotheses in $F_1, \ldots, F_{k-1}$ are rejected by $T$.
The proofs of Propositions 1, 2 and 3 are given in the Appendix.
5.2 Stepwise mixture procedures with parallel gatekeeping restrictions
When parallel gatekeeping restrictions are considered, mixture procedures based
on the Bonferroni mixing function admit a stepwise representation. This means that
the mixture procedure is, in fact, identical to a stepwise application of the com-
ponent procedures with an adjustment of the significance level in the last m − 1
families. This result is equivalent to the main result in Dmitrienko, Tamhane and
Wiens (2008) and shows that the mixture framework is an extension of the framework
of multistage gatekeeping procedures introduced in that paper. In particular, multistage
gatekeeping procedures considered by Dmitrienko, Tamhane and Wiens are
mixtures of component procedures used at individual stages based on the Bonferroni
mixing function.
To demonstrate that mixture procedures based on the Bonferroni mixing func-
tion are equivalent to multistage gatekeeping procedures proposed by Dmitrienko,
Tamhane and Wiens, we will consider a two-family problem. The proof can be extended
to the general case of $m$ families by recursion.
Proposition 4 Assume that
• Only parallel gatekeeping restrictions are imposed, i.e., $L_i(N_1) = 0$, $i \in N_2$, and
$L_i(I_1) = 1$, $i \in N_2$, $I_1 \subset N_1$.
• The procedure $T_1$ is separable and consonant.
• The Bonferroni mixing function is used.
The mixture procedure $T$ has the following two-stage structure:
• The hypotheses in $F_1$ are tested at the familywise level $\alpha_1 = \alpha$ using $T_1$.
• The hypotheses in $F_2$ are tested at the level $\alpha_2 = \alpha - e_1(A_1)$ using $T_2$, where $e_1(I)$
is the error rate function of $T_1$ and $A_1$ is the index set of accepted hypotheses
in $F_1$.
The proof of Proposition 4 is given in the Appendix.
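The two-stage structure in Proposition 4 can be sketched for one concrete choice of component procedures. The sketch assumes an unweighted Bonferroni procedure in $F_1$, whose error rate function is $e_1(I) = \alpha |I| / n_1$, and an unweighted Holm procedure in $F_2$; the function names and the p-values are illustrative assumptions rather than part of the paper.

```python
def bonferroni_stage(pvals, alpha):
    """Unweighted Bonferroni: reject H_i iff p_i <= alpha / n."""
    n = len(pvals)
    return {i for i, p in pvals.items() if p <= alpha / n}


def holm_stage(pvals, alpha):
    """Unweighted Holm step-down procedure at level alpha."""
    rejected = set()
    ordered = sorted(pvals, key=pvals.get)
    for step, i in enumerate(ordered):
        if pvals[i] <= alpha / (len(ordered) - step):
            rejected.add(i)
        else:
            break
    return rejected


def two_stage(p_family1, p_family2, alpha=0.05):
    """Stage 1: T_1 at alpha_1 = alpha. Stage 2: T_2 at alpha_2 = alpha - e_1(A_1)."""
    rejected1 = bonferroni_stage(p_family1, alpha)
    n_accepted = len(p_family1) - len(rejected1)
    alpha2 = alpha - alpha * n_accepted / len(p_family1)  # e_1(A_1) = alpha |A_1| / n_1
    rejected2 = holm_stage(p_family2, alpha2) if alpha2 > 0 else set()
    return rejected1, alpha2, rejected2


# Illustrative p-values for two families of three hypotheses each.
rejected1, alpha2, rejected2 = two_stage(
    {1: 0.005, 2: 0.011, 3: 0.018}, {4: 0.009, 5: 0.026, 6: 0.013})
```

In this illustration one hypothesis is accepted in $F_1$, so one third of $\alpha$ is used up and the second family is tested at $\alpha_2 = 2\alpha/3$. If every hypothesis in $F_1$ is accepted, $\alpha_2 = 0$ and nothing is tested in $F_2$, which is the parallel gatekeeping restriction.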
5.3 Mixture procedures with general logical restrictions
As shown in Proposition 4, mixture procedures in problems with parallel gate-
keeping restrictions have an attractive stepwise form. The following counterexample
shows that mixtures of testing procedures with general logical restrictions may not
have a stepwise form.
Consider a two-family problem with $N_1 = \{1, 2\}$ and $N_2 = \{3, \ldots, n\}$. Assume
that the hypotheses within each family are equally weighted. Further, consider a
mixture of the Bonferroni procedure in $F_1$ and Holm procedure in $F_2$ based on the
Bonferroni mixing function. The following logical restrictions are assumed:
• $H_3, \ldots, H_{n-1}$ are testable iff $H_2$ is rejected.
• $H_n$ is testable iff at least one hypothesis in $F_1$ is rejected.
In other words,
• If $I_1 = \emptyset$ or $I_1 = \{1\}$, then $L_3(I_1) = L_4(I_1) = \ldots = L_n(I_1) = 1$.
• If $I_1 = \{2\}$, then $L_3(I_1) = L_4(I_1) = \ldots = L_{n-1}(I_1) = 0$ and $L_n(I_1) = 1$.
• If $I_1 = \{1, 2\}$, then $L_3(I_1) = L_4(I_1) = \ldots = L_n(I_1) = 0$.
To demonstrate that the mixture of the two procedures does not have a stepwise
form, it is sufficient to focus on the case when $T_1$ (Bonferroni procedure) rejects $H_1$
but accepts $H_2$. By the logical restrictions, only one hypothesis is testable in $F_2$ in
this case (namely, the hypothesis $H_n$). If the mixture procedure had a stepwise form,
this hypothesis would have been tested by $T_2$ (Holm procedure), i.e., its decision
rule would have been expressed in terms of $p_n$ compared to an appropriately chosen
significance level. However, as shown in Proposition 5, this is not the case.
Proposition 5 Let $q_{(1)} \le \ldots \le q_{(n-2)}$ denote the ordered p-values in $F_2$ and assume
that $p_n$ is the $k$th ordered p-value, i.e., $p_n = q_{(k)}$, $k = 1, \ldots, n-2$. Then the hypothesis
$H_n$ is rejected iff all of the following conditions are met:
    $p_1 \le \alpha/2$, $q_{(i)} \le \alpha/(n - i - 1)$, $i = 1, \ldots, k$.
The proof of Proposition 5 is given in the Appendix.
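The decision rule in Proposition 5 is easy to evaluate directly, which makes the lack of a stepwise form visible: the outcome for $H_n$ depends on the p-values of hypotheses in $F_2$ that are themselves not testable. The sketch below is illustrative; the function name `rejects_Hn` and the numerical inputs are assumptions, and ties in the p-values are resolved by taking the first matching rank.

```python
def rejects_Hn(p1, family2, alpha):
    """Decision rule of Proposition 5 for H_n.

    family2 lists the raw p-values p_3, ..., p_n, so n = len(family2) + 2.
    The rule refers to the case where H_1 is rejected and H_2 is accepted.
    """
    n = len(family2) + 2
    p_n = family2[-1]
    q = sorted(family2)
    k = q.index(p_n) + 1  # p_n is the k-th ordered p-value in F_2
    return p1 <= alpha / 2 and all(
        q[i - 1] <= alpha / (n - i - 1) for i in range(1, k + 1))


# n = 5: F_2 contains H_3, H_4, H_5; p_5 = 0.024 in both calls.
case_a = rejects_Hn(0.010, [0.001, 0.020, 0.024], alpha=0.05)
case_b = rejects_Hn(0.010, [0.020, 0.030, 0.024], alpha=0.05)
```

Both calls use the same $p_1$ and the same $p_n$, yet the second one fails because the smallest p-value in $F_2$ exceeds $\alpha/3$. A rule that compared $p_n$ alone to a fixed significance level could not produce this pattern.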
6. Examples
In this section we will give examples of mixture procedures that help illustrate
the general method introduced in Section 2.
6.1 Mixtures of Bonferroni procedures
Consider a problem of testing $n$ hypotheses and let $w_1, \ldots, w_n$ denote the weights
assigned to the hypotheses in the families. The weights are non-negative and sum
to 1 within each family, i.e.,
    $w_i \ge 0$, $i = 1, \ldots, n$, $\sum_{i \in N_k} w_i = 1$, $k = 1, \ldots, m$.
The hypotheses are grouped into $m$ families. Assume that the first $m-1$ families
are tested using a weighted version of the Bonferroni procedure and the last family
is tested using a weighted version of the Holm procedure. In other words,
    $p_k(I_k) = \min_{i \in I_k} (p_i / w_i)$ if $I_k \subseteq N_k$, $k = 1, \ldots, m-1$,
    $p_m(I_m) = \left( \sum_{k \in I_m} w_k \right) \min_{i \in I_m} (p_i / w_i)$ if $I_m \subseteq N_m$.

We will assume first that parallel gatekeeping restrictions are imposed, i.e.,
    $L_i(N_{k-1}) = 0$, $i \in N_k$, $k = 2, \ldots, m$,
    $L_i(I_{k-1}) = 1$, $i \in N_k$, $I_{k-1} \subset N_{k-1}$, $k = 2, \ldots, m$.
Noting that the error rate function for the weighted Bonferroni procedure is given by
    $e_k(I_k) = \alpha \sum_{i \in I_k} w_i$, $k = 1, \ldots, m-1$,
it can be shown that the mixture of the procedures based on the Bonferroni mixing
function is defined as follows. Let $H_I$, $I \subseteq N$, be a non-empty intersection hypothesis.
If $I \subseteq N_k$, $k = 1, \ldots, m$, then $p(I) = p_k(I_k)$, where $I_k = I \cap N_k$. If $H_I$ contains
hypotheses from $F_{i_1}, \ldots, F_{i_s}$ for $s \ge 2$, the p-value for $H_I$ is given by
    $p(I) = \min_{i \in I} \frac{p_i}{v_i(I)}$,
where
    $v_i(I) = v^*_k(I) w_i$, $i \in I_{i_k}$, $k = 1, \ldots, s-1$,
    $v_i(I) = v^*_s(I) w_i$, $i \in I_{i_s}$ and $i_s < m$,
    $v_i(I) = v^*_s(I) w_i / \sum_{k \in I_{i_s}} w_k$, $i \in I_{i_s}$ and $i_s = m$,
    $v^*_1(I) = 1$, $v^*_{k+1}(I) = v^*_k(I) \left( 1 - \sum_{i \in I_{i_k}} w_i \right)$, $k = 1, \ldots, s-1$.
The resulting procedure is equivalent to the Bonferroni-based parallel gatekeeping
procedure (Dmitrienko, Offen and Westfall, 2003).
Further, we will consider the general case of monotone logical restrictions. The
mixture procedure based on the Bonferroni mixing function has a structure similar
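to that of the parallel gatekeeping procedure. First, $p(I) = p_k(I_k)$ if $I \subseteq N_k$,
$k = 1, \ldots, m$, where $I_k = I \cap N_k$. Further, if $H_I$ contains hypotheses from $F_{i_1}, \ldots, F_{i_s}$ for
$s \ge 2$, then the p-value for $H_I$ is given by
    $p(I) = \min_{i \in I^*} \frac{p_i}{v_i(I)}$,
where $I^* = I_{i_1} \cup I^*_{i_2} \cup \ldots \cup I^*_{i_s}$ and, with $I^*_{i_1} = I_{i_1}$,
    $v_i(I) = v^*_k(I) w_i$, $i \in I^*_{i_k}$, $k = 1, \ldots, s-1$,
    $v_i(I) = v^*_s(I) w_i$, $i \in I^*_{i_s}$ and $i_s < m$,
    $v_i(I) = v^*_s(I) w_i / \sum_{k \in I^*_{i_s}} w_k$, $i \in I^*_{i_s}$ and $i_s = m$,
    $v^*_1(I) = 1$, $v^*_{k+1}(I) = v^*_k(I) \left( 1 - \sum_{i \in I_{i_k}} w_i \right)$, $k = 1, \ldots, s-1$.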

Note that the presence of logical restrictions has an impact only on the index sets used
in the decision rule in the sense that a hypothesis is removed from the decision rule
if it is not consistent with the logical restrictions. The process of combining component
procedures is not affected by logical restrictions and therefore $v^*_1(I), \ldots, v^*_s(I)$ remain
the same. This mixture procedure is equivalent to the tree gatekeeping procedure
based on Algorithm III (Kordzakhia et al., 2008).
It is also important to note that the weighting scheme used in this mixture pro-
cedure satisfies the monotonicity condition (Condition 3) formulated in Dmitrienko,
Tamhane, Liu and Wiens (2008). Weighting schemes proposed in other papers, in-
cluding Algorithm 2 in Dmitrienko, Tamhane, Liu and Wiens (2008), do not always
satisfy the monotonicity condition and gatekeeping procedures based on those schemes
can be inconsistent with the prespecified logical restrictions. In this case, the logical
restrictions need to be enforced as explained in Section 5.1.
6.2 Mixtures of Dunnett procedures
The algorithm given in Section 6.1 can be easily extended to construct more power-
ful mixture procedures, e.g., mixtures of Dunnett procedures based on the Bonferroni
mixing function. Considering a general problem of testing $n$ hypotheses grouped into
$m$ families, let $t_i$, $i \in N$, denote the test statistic associated with $H_i$ and assume that
$t_i$, $i \in N_k$, follow a multivariate $t$ distribution for any $k = 1, \ldots, m$. Suppose that the
hypotheses in $F_k$, $k = 1, \ldots, m$, are tested using the Dunnett procedure. In this case,
the p-value for the intersection hypothesis $H_{I_k}$, $I_k \subseteq N_k$, $k = 1, \ldots, m$, is given by
    $p_k(I_k) = 1 - G_{|I_k|}\left( \max_{i \in I_k} t_i \right)$,
where $G_n(x)$ is the cumulative distribution function of the $n$-variate one-sided Dunnett
distribution, i.e.,
    $G_{|I_k|}(x) = P\left( \max_{i \in I_k} t'_i \le x \right)$,
and $t'_i$, $i \in N_k$, have the same joint distribution as $t_i$, $i \in N_k$, under the global null
hypothesis. A mixture of the Dunnett procedures based on the Bonferroni mixing
function can now be defined using the steps described in Section 6.1.
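A quantity of the form $1 - G_n(x)$ can be approximated by simulation when no numerical integration routine is at hand. The sketch below is a rough Monte Carlo illustration and not the computation used in the paper. It assumes a balanced one-way layout in which $n$ treatment arms share one control arm (so the statistics have pairwise correlation 0.5) and a pooled variance estimate with `df` degrees of freedom; the function name, the number of simulations and the seed are arbitrary choices.

```python
import math
import random


def dunnett_pvalue(t_max, n_comparisons, df, n_sim=20000, seed=0):
    """Monte Carlo estimate of 1 - G_n(t_max) = P(max_i t'_i >= t_max) under the
    global null, for n one-sided treatment-versus-control comparisons."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        control = rng.gauss(0.0, 1.0)                       # standardized control mean
        scale = math.sqrt(rng.gammavariate(df / 2, 2) / df)  # sqrt(chi-square_df / df)
        largest = max((rng.gauss(0.0, 1.0) - control) / math.sqrt(2)
                      for _ in range(n_comparisons)) / scale
        if largest >= t_max:
            exceed += 1
    return exceed / n_sim


# Assumed inputs: 344 error degrees of freedom (four arms of 87 patients).
p_one = dunnett_pvalue(2.0, 1, 344)    # reduces to a one-sided t tail probability
p_three = dunnett_pvalue(2.0, 3, 344)  # adjusted for three comparisons
```

With one comparison the estimate reduces to an ordinary one-sided $t$ tail probability, and it grows with the number of comparisons. The simulation error is of order $1/\sqrt{\texttt{n\_sim}}$, so a deterministic integration routine would be preferable for reporting purposes.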
6.3 Clinical trial example
The mixture procedures introduced in Sections 6.1 and 6.2 will be illustrated here
using a clinical trial example from Dmitrienko, Offen, Wang and Xiao (2006) and
Dmitrienko, Wiens, Tamhane and Wang (2007, Section 6). Consider a clinical trial
in patients with Type II diabetes conducted to test three doses of an experimental
treatment versus placebo. The three doses are labeled L, M and H and the placebo is
labeled Plac. The dose-placebo comparisons are performed with respect to three ordered
endpoints, Endpoint P (Hemoglobin A1c), Endpoint S1 (Fasting serum glucose)
and Endpoint S2 (HDL cholesterol). The sample size per arm is 87 patients.

Table 1. Test statistics and raw p-values in the Type II diabetes clinical
trial example.

Family   Null hypothesis   Test statistic   P-value
F1       H1                2.81             0.005
F1       H2                2.56             0.011
F1       H3                2.39             0.018
F2       H4                2.61             0.009
F2       H5                2.24             0.026
F2       H6                2.50             0.013
F3       H7                2.60             0.010
F3       H8                2.78             0.006
F3       H9                1.96             0.051

The resulting nine hypotheses of no treatment effect (three dose-placebo compar-
isons times three endpoints) are grouped into three families:
• Family F1: H-Plac (H1), M-Plac (H2) and L-Plac (H3) comparisons for End-
point P.
• Family F2: H-Plac (H4), M-Plac (H5) and L-Plac (H6) comparisons for End-
point S1.
• Family F3: H-Plac (H7), M-Plac (H8) and L-Plac (H9) comparisons for End-
point S2.
The three doses are assumed to be equally important and thus the hypotheses are
equally weighted within each family, i.e., $w_i = 1/3$, $i = 1, \ldots, 9$. The two-sample t
statistics and associated p-values for the nine hypotheses are listed in Table 1.
The null hypotheses in this clinical trial example will be tested using three multiple
testing procedures:
• Procedure 1 (Mixture of Bonferroni and Holm procedures with parallel gatekeeping
restrictions). The hypotheses in $F_1$ and $F_2$ are tested using the Bonferroni
procedure and the hypotheses in $F_3$ are tested using the Holm procedure.
The mixture procedure is based on the Bonferroni mixing function with the
parallel gatekeeping restrictions defined in Section 6.1.

• Procedure 2 (Mixture of Bonferroni and Holm procedures with multiple-sequence
restrictions). This procedure is similar to Procedure 1 in the sense that it is also
a mixture of the Bonferroni procedures in $F_1$ and $F_2$ and Holm procedure in
$F_3$ based on the Bonferroni mixing function. However, unlike Procedure 1, this
procedure uses a more general type of logical restrictions known as multiple-sequence
restrictions. A hypothesis in $F_k$, $k = 2, 3$, is tested if higher-level
hypotheses associated with the same dose are rejected, e.g., $H_7$ is testable iff
$H_1$ and $H_4$ are rejected. More formally,
– $L_i(I_1) = 0$ if $I_1$ contains $i - 3$ and $L_i(I_1) = 1$ otherwise, $i = 4, 5, 6$.
– $L_i(I_1 \cup I_2) = 0$ if $I_1 \cup I_2$ contains $i - 3$ or $i - 6$ and $L_i(I_1 \cup I_2) = 1$ otherwise,
$i = 7, 8, 9$.
• Procedure 3 (Mixture of Dunnett procedures with multiple-sequence restrictions).
This procedure is a mixture of the Dunnett procedures in $F_1$, $F_2$ and
$F_3$ based on the Bonferroni mixing function and imposes multiple-sequence
restrictions defined above.
Beginning with Procedures 1 and 2, adjusted p-values can be computed using the
algorithm given in Section 6.1. This algorithm is based on a complete enumeration
of all non-empty intersections of the original nine hypotheses. A p-value is computed
for each intersection and then the p-values for the original hypotheses are found using
the closure principle (see Section 2 for more details). As an illustration, consider the
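intersection hypothesis corresponding to the index set $I = \{1, 3, 5, 6, 7, 8, 9\}$, i.e.,
    $H_I = H_1 \cap H_3 \cap H_5 \cap H_6 \cap H_7 \cap H_8 \cap H_9$.
Assuming parallel gatekeeping restrictions (Procedure 1), one first needs to define
p-values for $H_{I_1}$, $H_{I_2}$ and $H_{I_3}$, where $I_1 = \{1, 3\}$, $I_2 = \{5, 6\}$ and $I_3 = \{7, 8, 9\}$.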
Using the raw p-values displayed in Table 1, the p-values are computed based on
the Bonferroni and Holm procedures as shown below
    $p_1(I_1) = n_1 \min(p_1, p_3) = 0.015$,
    $p_2(I_2) = n_2 \min(p_5, p_6) = 0.039$,
    $p_3(I_3) = |I_3| \min(p_7, p_8, p_9) = 0.018$,
where $n_1 = 3$, $n_2 = 3$ and $|I_3| = 3$. Using the Bonferroni mixing function, the p-value
for $H_I$ is given by
    $p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I_2)}{b_2}, \frac{p_3(I_3)}{b_3} \right)$,
where $b_1 = 1$ and, to compute $b_k$, $k = 2, 3$, one needs to utilize the error rate
function of the Bonferroni procedure. As shown in Section 6.1, $e_k(I_k) = \alpha |I_k| / n_k$ or,
equivalently, $f_k(I_k) = |I_k| / n_k$, $k = 1, 2$, and thus
    $b_2 = b_1 (1 - f_1(I_1)) = 1 - \frac{|I_1|}{n_1} = \frac{1}{3}$,
    $b_3 = b_2 (1 - f_2(I_2)) = \frac{1}{3} \left( 1 - \frac{|I_2|}{n_2} \right) = \frac{1}{9}$.
This immediately implies that $p(I) = \min(0.015, 0.117, 0.162) = 0.015$.

Table 2. Mixtures of three procedures with parallel gatekeeping restrictions
(Procedure 1) and multiple-sequence restrictions (Procedure 2) in
the Type II diabetes clinical trial example. The asterisk identifies the
adjusted p-values that are significant at the 0.05 level.

Family   Null hypothesis   Adjusted p-value
                           Procedure 1   Procedure 2
F1       H1                0.015*        0.015*
F1       H2                0.033*        0.033*
F1       H3                0.054         0.054
F2       H4                0.041*        0.041*
F2       H5                0.078         0.078
F2       H6                0.054         0.054
F3       H7                0.054         0.045*
F3       H8                0.054         0.078
F3       H9                0.077         0.077

Now consider the case of multiple-sequence restrictions (Procedure 2). The index
sets $I_2$ and $I_3$ need to be modified to account for the logical restrictions. Note that
$H_6$ depends on $H_3$, $H_7$ depends on $H_1$, $H_8$ depends on $H_5$, $H_9$ depends on $H_3$ and
$H_6$. Thus the modified index sets are given by $I^*_2 = \{5\}$ and $I^*_3 = \emptyset$. The next step
is to compute the p-values for $H_{I_1}$ and $H_{I^*_2}$,
    $p_1(I_1) = n_1 \min(p_1, p_3) = 0.015$,
    $p_2(I^*_2) = n_2 p_5 = 0.078$.
Lastly, the p-value for $H_I$ is given by
    $p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I^*_2)}{b_2} \right)$,
where $b_k$, $k = 1, 2$, are defined above, i.e., $b_1 = 1$ and $b_2 = 1/3$, and therefore
$p(I) = 0.015$.
Table 2 displays the adjusted p-values produced by the two procedures for the nine
hypotheses of interest. Procedure 1 rejects three
hypotheses in this problem ($H_1$, $H_2$ and $H_4$) and Procedure 2 rejects one more hypothesis
($H_7$). It is easy to verify that, as shown in Proposition 1, both procedures are consistent
with the logical restrictions (note that the Bonferroni procedures in $F_1$ and
$F_2$ are consonant and thus there is no need to enforce the logical restrictions). Further,
as stated in Proposition 2, Procedures 1 and 2 are equivalent to the Bonferroni
procedure in $F_1$. Indeed, the adjusted p-values for the hypotheses in $F_1$ are equal to
Bonferroni-adjusted p-values (each raw p-value is multiplied by 3).
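The complete enumeration behind Procedures 1 and 2 can be sketched as follows. This is a minimal illustration written for this example only, not the authors' code: the names `intersection_pvalue`, `parallel`, `multiple_sequence` and `adjusted_pvalues` are hypothetical, the raw p-values are those of Table 1, and the weights are the equal weights $1/3$. The parallel restrictions need no explicit rule here because a fully accepted family drives the coefficient $b$ to zero, which removes all later families from the minimum.

```python
from fractions import Fraction
from itertools import combinations

# Raw p-values from Table 1; equal weights 1/3 within each family.
P = {1: 0.005, 2: 0.011, 3: 0.018, 4: 0.009, 5: 0.026,
     6: 0.013, 7: 0.010, 8: 0.006, 9: 0.051}
FAMILIES = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
W = {i: Fraction(1, 3) for i in P}


def parallel(i, accepted):
    # Procedure 1: hypotheses stay testable; b = 0 handles a fully accepted family.
    return True


def multiple_sequence(i, accepted):
    # Procedure 2: H_i is not testable if H_{i-3} or H_{i-6} (same dose) is accepted.
    return (i - 3) not in accepted and (i - 6) not in accepted


def intersection_pvalue(I, testable):
    """p(I) for the mixture of Bonferroni (F1, F2) and Holm (F3) procedures
    based on the Bonferroni mixing function."""
    parts = [(k, [i for i in fam if i in I]) for k, fam in enumerate(FAMILIES)]
    parts = [(k, part) for k, part in parts if part]
    b, accepted, terms = Fraction(1), set(), []
    for pos, (k, part) in enumerate(parts):
        star = part if pos == 0 else [i for i in part if testable(i, accepted)]
        if star and b > 0:
            p_k = min(P[i] / W[i] for i in star)   # weighted Bonferroni p-value
            if k == len(FAMILIES) - 1:             # weighted Holm in the last family
                p_k *= sum(W[i] for i in star)
            terms.append(p_k / b)
        b *= 1 - sum(W[i] for i in part)           # next b uses the unmodified index set
        accepted |= set(part)
    return min(1.0, min(terms))


def adjusted_pvalues(testable):
    """Closure principle over all 2^9 - 1 non-empty intersections."""
    adj = {i: 0.0 for i in P}
    for size in range(1, len(P) + 1):
        for subset in combinations(sorted(P), size):
            p_I = intersection_pvalue(set(subset), testable)
            for i in subset:
                adj[i] = max(adj[i], p_I)
    return adj


proc1 = adjusted_pvalues(parallel)
proc2 = adjusted_pvalues(multiple_sequence)
```

Under these assumptions the two dictionaries should agree with the columns of Table 2 after rounding to three decimals; for instance, the unrounded values behind the entries 0.041 and 0.077 are 0.0405 and 0.0765.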
Further, it is worth noting that Procedure 1 is based on parallel gatekeeping
restrictions and thus, as shown in Proposition 4, it has a stepwise representation.
This procedure is identical to a stepwise application of the Bonferroni procedures in
$F_1$ and $F_2$ and Holm procedure in $F_3$ with appropriate adjustments of the significance
levels in $F_2$ and $F_3$. For more information, see Dmitrienko, Tamhane and Wiens (2008,
Section 6).
The calculation of adjusted p-values for Procedure 3 is based on an algorithm
similar to the one used in Section 6.1. The only change that needs to be made is that
the Bonferroni and Holm p-values for intersection hypotheses need to be replaced by
the Dunnett p-values defined in Section 6.2. To illustrate the process, select the same
intersection as above, i.e.,
    $H_I = H_1 \cap H_3 \cap H_5 \cap H_6 \cap H_7 \cap H_8 \cap H_9$,
and consider the multiple-sequence restrictions. The modified index sets are $I^*_2 = \{5\}$
and $I^*_3 = \emptyset$. Given the sample size per arm (87 patients) and number of doses
(3 doses), the Dunnett p-values for $H_{I_1}$ and $H_{I^*_2}$ are computed using the one-sided
Dunnett distribution with 3 and 344 degrees of freedom. These p-values are given by
    $p_1(I_1) = 1 - G_2(\max(t_1, t_3)) = 0.0073$,
    $p_2(I^*_2) = 1 - G_1(t_5) = 0.0336$,
where $G_n(x)$ is the cumulative distribution function of the Dunnett distribution. Further,
this mixture is also based on the Bonferroni mixing function and thus $b_1 = 1$
and $b_2 = 1/3$. Therefore,
    $p(I) = \min\left( \frac{p_1(I_1)}{b_1}, \frac{p_2(I^*_2)}{b_2} \right) = 0.0073$.
The adjusted p-values produced by Procedure 3 are shown in Table 3. One can
see from this table that the mixture of Dunnett procedures rejects more hypotheses
than a similar procedure based on the Bonferroni and Holm procedures (Procedure
2). Specifically, Procedure 3 rejects eight hypotheses whereas Procedure 2 rejects only
four hypotheses. This is a direct consequence of the fact that the Dunnett procedure
is uniformly more powerful than the Bonferroni procedure.

Table 3. Mixture of three procedures with multiple-sequence restrictions
(Procedure 3) in the Type II diabetes clinical trial example. The
asterisk identifies the adjusted p-values that are significant at the 0.05
level.

Family   Null hypothesis   Adjusted p-value
F1       H1                0.007*
F1       H2                0.015*
F1       H3                0.023*
F2       H4                0.019*
F2       H5                0.034*
F2       H6                0.023*
F3       H7                0.023*
F3       H8                0.034*
F3       H9                0.064

It is worth noting that the mixture of Dunnett procedures defined above can serve
as a computationally attractive alternative to the Dunnett-based parallel gatekeeping
procedure with logical restrictions introduced in Dmitrienko, Offen, Wang and Xiao
(2006). The parallel gatekeeping procedure requires the computation of a vector
of critical values for each intersection hypothesis in the closed family based on the
multivariate distribution of the associated test statistics. Even in the case of nine
hypotheses, the algorithm is computationally intensive (it involves the evaluation
of multivariate probabilities for up to six dimensions). By contrast, the mixture
procedure is based on regular Dunnett-adjusted p-values that are combined across
the three families. This approach considerably simplifies the calculation of adjusted
p-values and leads to a relatively small reduction in the overall power compared to
the Dunnett-based parallel gatekeeping procedure.
References
Chen, X., Luo, X., Capizzi, T. (2005). The application of enhanced parallel gate-
keeping strategies. Statistics in Medicine. 24, 1385–1397.
Dmitrienko, A., Offen, W.W., Westfall, P.H. (2003). Gatekeeping strategies for
clinical trials that do not require all primary effects to be significant. Statistics
in Medicine. 22, 2387–2400.
Dmitrienko, A., Offen, W., Wang, O., Xiao, D. (2006). Gatekeeping procedures in
dose-response clinical trials based on the Dunnett test. Pharmaceutical Statis-
tics. 5, 19–28.
Dmitrienko, A., Wiens, B.L., Tamhane, A.C., Wang, X. (2007). Tree-structured
gatekeeping tests in clinical trials with hierarchically ordered multiple objec-
tives. Statistics in Medicine. 26, 2465–2478.
Dmitrienko, A., Tamhane, A., Liu, L., Wiens, B. (2008). A note on tree gatekeeping
procedures in clinical trials. Statistics in Medicine. 27, 3446–3451.
Dmitrienko, A., Tamhane, A., Wiens, B. (2008). General multistage gatekeeping
procedures. Biometrical Journal. 50, 667–677.
Everitt, B.S., Hand, D.J. (1981). Finite Mixture Distributions. Chapman and Hall,
London, New York.
Hochberg, Y., Tamhane, A.C. (1987). Multiple Comparison Procedures. New York:
John Wiley and Sons.
Kordzakhia, G., Dinh, P., Bai, S., Lawrence, J., Yang, P. (2008). Bonferroni-based
tree-structured gatekeeping testing procedures. Unpublished manuscript.
Marcus, R. Peritz, E., Gabriel, K.R. (1976). On closed testing procedures with
special reference to ordered analysis of variance. Biometrika. 63, 655–660.
Quan, H., Luo, X., Capizzi, T. (2005). Multiplicity adjustment for multiple end-
points in clinical trials with multiple doses of an active treatment. Statistics in
Medicine. 24, 2151–2170.

Appendix
Proof of Proposition 1. Consider a hypothesis in $F_{k+1}$, $k = 1,\ldots,m-1$, say,
$H_i$, $i \in N_{k+1}$, and assume that $L_i(A_1 \cup \ldots \cup A_k) = 0$. Let $I_s = A_s$, $s = 1,\ldots,k$,
and $I_{k+1} = \{i\}$. Further, let $J = I_1 \cup \ldots \cup I_k$ and $I = I_1 \cup \ldots \cup I_{k+1}$. Considering the
intersection hypothesis $H_I$, note that $L_i(I_1 \cup \ldots \cup I_k) = 0$ and thus
\[ I^*_{k+1} = \{ i : i \in I_{k+1} \text{ and } L_i(I_1 \cup \ldots \cup I_k) = 1 \} \]
is empty. Therefore,
\[ p(I) = m(p_1(I_1), p_2(I^*_2), \ldots, p_k(I^*_k)). \]
Note that the mixture procedure accepts all hypotheses $H_j$, $j \in J$, and is
consonant in $F_1,\ldots,F_k$. Therefore, $p(J) > \alpha$ (if $p(J)$ were less than or equal to $\alpha$, then
the mixture procedure would reject at least one hypothesis $H_j$ with $j \in J$; however, all
hypotheses $H_j$, $j \in J$, are accepted, which implies that $p(J) > \alpha$). Further,
$p(I) = p(J) > \alpha$ and the index set $I$ contains $i$. Thus, the mixture procedure accepts $H_i$.
The proof is complete.
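Proof of Proposition 2. Assume first that $T_1$ rejects $H_i$, $i \in N_1$. This means that
$p_1(I_1) \le \alpha$ for any $I_1 \subseteq N_1$ if $i \in I_1$. Now consider any index set $I \subseteq N$ that contains
$i$. In general, $I = I_{i_1} \cup \ldots \cup I_{i_s}$, where $I_{i_j} \subseteq N_{i_j}$, $j = 1,\ldots,s$, $i_1 = 1$ and $s = 1,\ldots,m$,
and
\[ p(I) = m_I(p_{i_1}(I_{i_1}), \ldots, p_{i_s}(I_{i_s})). \]
By the definition of a mixture function, $m_I(x_{i_1},\ldots,x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$ and, since
$x_{i_1} = p_1(I_1) \le \alpha$, we conclude that $p(I)$ is no greater than $\alpha$ and thus the mixture
procedure rejects $H_i$.
Now assume that the mixture procedure rejects $H_i$, $i \in N_1$. In this case, $p(I) \le \alpha$ for
any $I \subseteq N$ that contains $i$, which immediately implies that $p_1(I_1) \le \alpha$ for any $I_1 \subseteq N_1$ if $i \in I_1$.
Therefore, $T_1$ rejects $H_i$. The proof is complete.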
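Proof of Proposition 3. Assume that all hypotheses in $F_1,\ldots,F_{k-1}$ are rejected by
the mixture procedure and $T_k$ rejects $H_i$, $i \in N_k$, i.e., $p_k(I_k) \le \alpha$ for any $I_k \subseteq N_k$ if
$i \in I_k$. Consider any index set $I \subseteq N$ that contains $i$. If this set includes any indices
from $N_1 \cup \ldots \cup N_{k-1}$, $p(I)$ is no greater than $\alpha$ since the mixture procedure rejects all
hypotheses in $F_1,\ldots,F_{k-1}$. If this set does not include any indices from
$N_1 \cup \ldots \cup N_{k-1}$, the p-value for $H_I$ is given by
\[ p(I) = m_I(p_{i_1}(I_{i_1}), \ldots, p_{i_s}(I_{i_s})), \]
where $I = I_{i_1} \cup \ldots \cup I_{i_s}$, $I_{i_j} \subseteq N_{i_j}$, $j = 1,\ldots,s$, $i_1 = k$ and $s = 1,\ldots,m-k+1$.
As in Proposition 2, recall that $m_I(x_{i_1},\ldots,x_{i_s}) \le \alpha$ if $x_{i_1} \le \alpha$ and $x_{i_1} = p_k(I_k) \le \alpha$.
Thus $p(I) \le \alpha$, which implies that the mixture procedure rejects $H_i$.
On the other hand, if the mixture procedure rejects $H_i$, $i \in N_k$, the arguments used in
the proof of Proposition 2 can be applied to show that $T_k$ rejects $H_i$. The proof is complete.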

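Proof of Proposition 4. The first statement follows from Proposition 2. Consider
the second statement and assume that $T_1$ rejects $H_k$, $k \in R_1$, at the $\alpha$ level and $T_2$
rejects $H_j$, $j \in N_2$, at the level $\alpha_2 = \alpha - e_1(A_1)$. Here $R_1 \subseteq N_1$ is the index set of
hypotheses rejected in $F_1$. Considering any $I \subseteq N$ with $j \in I$, let $I_1 = I \cap N_1$ and
$I_2 = I \cap N_2$. If $I_1 \cap R_1 \ne \emptyset$, then $p_1(I_1) \le \alpha$ and thus
\[ p(I) = \min\left( p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)} \right) \le p_1(I_1) \le \alpha. \]
Further, if $I_1 \cap R_1 = \emptyset$, then $I_1 \subseteq A_1$ and, by the monotonicity of the error rate
function, $f_1(I_1) \le f_1(A_1)$. Since $T_2$ rejects $H_j$ at the level $\alpha_2 = \alpha - e_1(A_1)$,
\[ p_2(I_2) \le \alpha - e_1(A_1) = \alpha(1 - f_1(A_1)) \le \alpha(1 - f_1(I_1)) \]
and
\[ p(I) = \min\left( p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)} \right) \le \frac{p_2(I_2)}{1 - f_1(I_1)} \le \alpha. \]
This means that the mixture procedure rejects $H_j$ at the $\alpha$ level.
Assume now that the mixture procedure rejects $H_k$, $k \in R_1$, and $H_j$, $j \in N_2$, at the
$\alpha$ level. Consider any $I_2 \subseteq N_2$ such that $j \in I_2$ and let $I = I_1 \cup I_2$, where $I_1 = A_1$. Recall
that the mixture procedure is equivalent to $T_1$ in $F_1$ and thus $T_1$ also rejects $H_k$, $k \in R_1$,
and accepts the hypotheses indexed by $A_1$. Since $T_1$ is consonant, we
conclude that $p_1(I_1) > \alpha$. On the other hand, the mixture procedure rejects $H_j$ and thus
\[ p(I) = \min\left( p_1(I_1), \frac{p_2(I_2)}{1 - f_1(I_1)} \right) \le \alpha. \]
This implies that
\[ p_2(I_2) \le \alpha(1 - f_1(I_1)) = \alpha - e_1(I_1) \]
if $I_2 \subseteq N_2$ and $j \in I_2$. Therefore, $T_2$ rejects $H_j$, $j \in N_2$, at the level $\alpha_2 = \alpha - e_1(A_1)$.
The proof is complete.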
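The two-family p-value used throughout the proof of Proposition 4, $p(I) = \min(p_1(I_1), p_2(I_2)/(1 - f_1(I_1)))$, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function name `mixture_p_value`, the use of `None` for an empty component, and the numerical inputs are all hypothetical, and $f_1(I_1)$ is taken to be $e_1(I_1)/\alpha$, as the identity $\alpha - e_1(A_1) = \alpha(1 - f_1(A_1))$ in the proof suggests.

```python
from typing import Optional


def mixture_p_value(p1: Optional[float], p2: Optional[float], f1: float) -> float:
    """Two-family mixture p-value min(p1, p2 / (1 - f1)).

    p1 -- p-value of the first-family component I1 (None if I1 is empty)
    p2 -- p-value of the second-family component I2 (None if I2 is empty)
    f1 -- error rate fraction e1(I1) / alpha, assumed to lie in [0, 1]
    """
    candidates = []
    if p1 is not None:
        candidates.append(p1)
    # When f1 equals 1 no error rate is carried over, so the second term drops out.
    if p2 is not None and f1 < 1.0:
        candidates.append(p2 / (1.0 - f1))
    if not candidates:
        raise ValueError("no component p-value is available for this index set")
    return min(candidates)


alpha = 0.05
# Hypothetical case: I1 = A1 with f1(A1) = 0.5, so alpha - e1(A1) = 0.025.
p = mixture_p_value(p1=0.30, p2=0.02, f1=0.5)
print(p <= alpha, 0.02 <= alpha * (1 - 0.5))
```

In this hypothetical case the mixture p-value is significant at the $\alpha$ level exactly when $p_2(I_2)$ is significant at the reduced level $\alpha - e_1(A_1)$, which is the equivalence the proof establishes.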
Proof of Proposition 5. Note first that the p-values for intersection hypotheses in
the two families are given by
\[ p_1(I_1) = 2 \min_{i \in I_1} p_i, \quad I_1 \subseteq N_1, \]
\[ p_2(I_2) = |I_2| \min_{i \in I_2} p_i, \quad I_2 \subseteq N_2, \]
where $p_i$ is the raw p-value for testing $H_i$, $i \in N$. Also, the error rate function for
the Bonferroni procedure is $e_1(I_1) = |I_1|\alpha/2$, where $|I_1|$ is the cardinality of the set
$I_1$ (Dmitrienko, Tamhane and Wiens, 2008). Therefore, the p-values for intersection
hypotheses $H_I$, $I = I_1 \cup I_2$, under the mixture procedure are given by
Case 1. If $I_1 = \{1,2\}$,
\[ p(I) = 2 \min_{i \in I_1} p_i. \]
Case 2. If $I_1 = \{1\}$,
\[ p(I) = 2 \min\left( p_1, |I_2| \min_{i \in I_2} p_i \right). \]
Case 3. If $I_1 = \{2\}$,
\[ p(I) = 2 \min(p_2, p_n). \]
Case 4. If $I_1 = \emptyset$,
\[ p(I) = |I_2| \min_{i \in I_2} p_i. \]
Recall now that $H_1$ is rejected and $H_2$ is accepted. This means that all intersection
hypotheses that include $H_1$ are rejected. Therefore, to determine the conditions
under which the mixture procedure rejects $H_n$, it is sufficient to concentrate on the
intersection hypotheses that include $H_n$ but exclude $H_1$. It follows from Cases 3 and
4 that $H_n$ is rejected iff $p_n \le \alpha/2$ (note that $p_2 > \alpha/2$ since $H_2$ is accepted) and the
smallest p-values are significant at the Holm-adjusted significance levels in $F_2$, i.e.,
$q_{(i)} \le \alpha/(n - i - 1)$, $i = 1,\ldots,k$. The proof is complete.
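The closing step of the proof of Proposition 5 can be checked by brute force. The sketch below compares two ways of deciding whether $H_n$ is rejected: enumerating the Case 3 and Case 4 p-values over all index sets that contain $n$, and applying the shortcut "$p_n \le \alpha/2$ plus Holm step-down thresholds $\alpha/(n - i - 1)$". It rests on assumptions that are not stated in this appendix: the second family is taken to be $H_3,\ldots,H_n$ (so it has $n - 2$ hypotheses), $k$ is read as the rank of $p_n$ among the second-family p-values, and all function names and numerical inputs are hypothetical.

```python
from itertools import combinations


def holm_rejects_target(p, target, alpha):
    """Holm step-down decision for the hypothesis at position `target` in `p`."""
    m = len(p)  # m = n - 2, so alpha / (m - step + 1) = alpha / (n - step - 1)
    order = sorted(range(m), key=lambda idx: p[idx])
    for step, idx in enumerate(order, start=1):
        if p[idx] > alpha / (m - step + 1):
            return False  # step-down stops before the target is rejected
        if idx == target:
            return True   # every threshold up to the target's rank was met
    return False


def closure_rejects_hn(p_h2, p_family2, alpha):
    """Brute-force check of Cases 3 and 4; H_n is the last entry of p_family2."""
    target = len(p_family2) - 1
    p_n = p_family2[target]
    if 2 * min(p_h2, p_n) > alpha:          # Case 3: I1 = {2}
        return False
    others = range(target)
    for size in range(target + 1):          # Case 4: I1 empty, every I2 containing n
        for subset in combinations(others, size):
            members = [p_family2[i] for i in subset] + [p_n]
            if len(members) * min(members) > alpha:
                return False
    return True


def shortcut_rejects_hn(p_family2, alpha):
    """Shortcut from the proof; valid when H_2 is accepted, i.e. p_2 > alpha / 2."""
    target = len(p_family2) - 1
    return p_family2[target] <= alpha / 2 and holm_rejects_target(p_family2, target, alpha)


alpha = 0.05
p_h2 = 0.04                      # hypothetical raw p-value for H_2, above alpha / 2
family2 = [0.030, 0.011, 0.020]  # hypothetical raw p-values for H_3, H_4, H_5 (n = 5)
print(closure_rejects_hn(p_h2, family2, alpha), shortcut_rejects_hn(family2, alpha))
```

For these hypothetical inputs both checks agree. The exhaustive enumeration grows exponentially with the size of the second family, which is why a step-down shortcut of this kind is the practical choice.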