**2.7 Bilan du chapitre 2**

**3.1.2 Article accepté dans la revue Statistical Methods in Medical Research . 88**

Cet article a été accepté par la revue Statistical Methods in Medical Research et présente de manière approfondie la fonction d’utilité déﬁnie précédemment, détaille sa justiﬁcation, pro-pose une méthode d’inférence associée au seuil optimal, et en évalue les performances. Les informations supplémentaires disponibles sur le site de la revue sont présentées en Annexe C.

### A Bayesian method to estimate

### the optimal threshold of a marker

### used to select patients’ treatment

Yoann Blangero,^{1,2} Muriel Rabilloud,^{1,2} Rene´ Ecochard^{1,2} and
Fabien Subtil^{1,2}

Abstract

The use of a quantitative treatment selection marker to choose between two treatment options requires the estimate of an optimal threshold above which one of these two treatments is preferred. Herein, the optimal threshold expression is based on the definition of a utility function which aims to quantify the expected utility of the population (e.g. life expectancy, quality of life) by taking into account both efficacy (success or failure) and toxicity of each treatment option. Therefore, the optimal threshold is the marker value that maximizes the expected utility of the population. A method modelling the marker distribution in patient subgroups defined by the received treatment and the outcome is proposed to calculate the parameters of the utility function so as to estimate the optimal threshold and its 95% credible interval using the Bayesian inference. The simulation study found that the method had low bias and coverage probability close to 95% in multiple settings, but also the need of large sample size to estimate the optimal threshold in some settings. The method is then applied to the PETACC-8 trial that compares the efficacy of chemotherapy with a combined chemotherapy þ anti-epidermal growth factor receptor in stage III colorectal cancer.

Keywords

Biomarker, threshold, expected utility, treatment selection, Bayesian, predictive marker

1 Introduction

Biomarkers help improve patient outcomes by targeting treating those who are likely to beneﬁt from a given treatment and avoiding treating those who would not have great interest in precision medicine (e.g. oncology,

cardiology). ‘‘Treatment selection’’ biomarkers, often called ‘‘predictive’’ biomarkers,^{1,2}are generally sought for

when the beneﬁt of a new treatment in comparison to a reference treatment is considered, and that this beneﬁt is suspected to vary according to the characteristics of the patients. One example of a binary treatment selection

biomarker is the absence of KRAS gene mutation that predicts the beneﬁt of a combined

chemotherapy þ epidermal growth factor receptor (EGFR) inhibitor over chemotherapy alone in metastatic colorectal cancer; patients with tumors harboring mutated KRAS exon 2 are known to be resistant to EGFR

inhibitors, whereas patients with KRAS wild-type tumors do beneﬁt from the combined treatment.^{3}

While it is easy to identify new binary treatment selection biomarkers, this is more diﬃcult for quantitative ones. Various methods have been proposed to assess quantitative treatment selection biomarkers by modelling the risk of

event given the treatment options and the biomarker values,^{4–7}or by modelling the beneﬁt to use one treatment

rather than the other one (i.e. the diﬀerence in outcomes) for a patient given its biomarker value.^{8,9}Classically,

modelling the risk is performed using logistic regression models or survival regression models (e.g. Cox models), by introducing an interaction between the biomarker and the treatments. Modelling the beneﬁt is a harder task. Indeed, the individual diﬀerence in outcomes is unobservable because each patient can receive only one treatment, that is

1

Service de Biostatistique-Bioinformatique, Poˆle Sante´ Publique, Hospices Civils de Lyon, Lyon, France

2

Universite´ de Lyon, Universite´ Lyon 1, CNRS, Laboratoire de Biome´trie et Biologie Evolutive UMR 5558, Villeurbanne, France Corresponding author:

Yoann Blangero, Service de Biostatistique-Bioinformatique, Poˆle Sante´ Publique, Hospices Civils de Lyon, 162 avenue Lacassagne, F-69003, Lyon, France.

Email: yoann.blangero@chu-lyon.fr

0(0) 1–15

! The Author(s) 2018 Article reuse guidelines: sagepub.com/journals-permissions DOI: 10.1177/0962280218821394 journals.sagepub.com/home/smm

why strong assumptions have to be made in order to estimate this individual beneﬁt. For example, Huang et al.^{8}
made the monotonicity assumption (one treatment is always at least as eﬃcient as the other one) to estimate the

individual beneﬁt. Zhang et al.^{9} relaxed the latter assumption by assuming that the potential outcomes are

independent given observed covariates which means that the beneﬁt for a patient can be calculated by comparing its outcome to the outcome of patients that are similar given the observed covariates, and that these observed covariates are suﬃcient to explain the dependence between the potential outcomes.

After identiﬁcation of a quantitative treatment selection biomarker, its use in clinical practice needs the

determination of a threshold to help clinicians decide which treatment to administrate to a given patient.^{10}

Some of the above-cited papers deﬁne a threshold for the diﬀerence in risks (between the two treatment

options) above or below which one of the two treatment options should be preferred.^{5–7} In order for this

threshold to be optimal (or clinically relevant), some of these methods were extended to take into account the

relative toxicities of both treatment options^{5,11}as well as the impact of treatment failure. The conversion of the

diﬀerence in risks threshold to a threshold on the marker scale is straightforward but relies on a good calibration

of the model used to predict the risk, or the diﬀerence in risks,^{12,13} and on modelling assumptions that may be

diﬃcult to check (e.g. the increase of the biomarker value must lead to a linear increase of the logit of the risk in a logistic regression). To the best of our knowledge, no method of inference has been proposed in the literature for estimating a threshold directly on the marker scale.

These remarks have motivated the development of a new method to estimate the optimal threshold of a treatment selection biomarker, based on the maximization of a utility function and on the modelling of the marker distribution rather than the risk of event. A Bayesian inference method was used in order to obtain a punctual estimation of the threshold and its credible interval. In our opinion, modelling the marker distribution has the advantage that multiple tools exist to check the adequacy of a theoretical distribution ﬁtted on the

observed marker distribution,^{14–16} contrary to risk models for which it may be diﬃcult to check and correct

calibration issues. Moreover, an extension of decision curves,^{17} initially proposed for the analysis of diagnostic

and prognostic markers, is presented and its connection with a previous approach^{18}is discussed.

A simulation study was conducted to evaluate the bias in the optimal threshold estimate, the coverage probability and the mean width of its credible interval. The method was then applied to estimate the threshold of the DDR2 gene expression level when this marker is used to choose between FOLFOX4 and FOLFOX4 þ cetuximab in stage III colon adenocarcinoma.

2 Method

The method described hereafter relies on the expression of a utility function that quantiﬁes the expected utility of the target population when using a marker-based treatment strategy by taking into account both the eﬃcacy and toxicity of the treatment options. The motivation behind utility analysis is the assumption that decision-makers choose the most preferable treatment option for each patient in order to maximize the expected utility of the population. For example, when the utility is measured by the length of the patient’s life, the expected utility

analysis identiﬁes the most preferable treatment-strategy to maximize the life expectancy of the population.^{10}

When a marker-based treatment strategy is used to guide treatment decision, the marker optimal threshold is the marker value that maximizes the expected utility of the target population; it is therefore the marker value that maximizes the utility function.

2.1 Expected utility function

Let us consider the simple context in which the task is to decide between two treatment options, referred to as ‘‘innovative treatment’’ (T ¼ 1) and ‘‘reference treatment’’ (T ¼ 0). The binary event of interest is denoted by E,

where E ¼ 1 indicates the presence of the event of interest in a post-treatment interval, and E ¼ 0 (or E) indicates its

absence. For example, E might be an indicator of cancer recurrence or death. Let 0¼ PðE ¼ 1jT ¼ 0Þ and

1¼ PðE ¼ 1jT ¼ 1Þ, indicating the mean risk of event under each treatment. The mean risk of event measures

the eﬃcacy of each treatment option. At equal eﬃcacy, it is assumed that the harm of the innovative treatment is greater than that of the reference treatment; if the harm of the innovative treatment is lower than that of the reference treatment the following developments could be adapted (see online Supplemental Material). The candidate marker, denoted by X, is a quantitative measurement, or it may be a score that combines multiple measurements. It is assumed that the innovative treatment is recommended for marker values lower than the marker threshold c, and the reference treatment is recommended for marker values greater than the

marker threshold. If the innovative treatment is recommended for marker values greater than the marker threshold c, the following developments could also be adapted (see online Supplemental Material).

Under these assumptions, four populations are deﬁned:

. Population 0E: Patients who developed the event of interest under the reference treatment (X > c) . Population 0 E: Patients who did not develop the event of interest under the reference treatment (X > c) . Population 1E: Patients who developed the event of interest under the innovative treatment (X c) . Population 1 E: Patients who did not develop the event of interest under the innovative treatment (X c).

Let UijðxÞ deﬁne the utility associated with a patient in the population ij and having a marker value equal to x

with i ¼ 0, 1 for patients treated with the reference or the innovative treatment, respectively, and j ¼ E, E for

patients who developed the event of interest or who did not develop the event, respectively.

The expected utility function, U(c), that quantiﬁes the mean utility given the above-mentioned treatment decision rule for threshold c, is expressed as

UðcÞ ¼
Zþ1
c
PðE ¼1, X ¼ xjT ¼ 0Þ U_{0E}ðxÞ þ PðE ¼ 0, X ¼ xjT ¼ 0Þ U_{0 }_{E}ðxÞ dx
þ
Z _{c}
1PðE ¼1, X ¼ xjT ¼ 1Þ U1EðxÞ þ PðE ¼ 0, X ¼ xjT ¼ 1Þ U_{1 }_{E}ðxÞ dx

This function quantiﬁes the expected utility that corresponds to a weighted sum of utilities for a given threshold c. As the marker is measured before any treatment intervention, the marker distribution and the treatments are

independent, therefore it is assumed that PðX cjT ¼ 0Þ ¼ PðX cjT ¼ 1Þ ¼ PðX cÞ ¼ FXðcÞ, 8c.

Then the utility function can be expressed as
UðcÞ ¼
Zþ1
c
fXðxÞ½0ðxÞ U_{0E}ðxÞ þ f1 0ðxÞg U_{0 }_{E}ðxÞ dx
þ
Z_{c}
1f_{X}ðxÞ½1ðxÞ U_{1E}ðxÞ þ f1 1ðxÞg U_{1 }_{E}ðxÞ dx
ð1Þ

where f_{X}ðxÞ is the probability density function of the marker X, 0ðxÞ ¼ PðE ¼ 1jX ¼ x, T ¼ 0Þ is the risk of event

under the reference treatment given the marker value X ¼ x, and1ðxÞ ¼ PðE ¼ 1jX ¼ x, T ¼ 1Þ is the risk of event

under the innovative treatment given the marker value X ¼ x.

It is assumed, with very little restriction, that U_{0 }_{E}ðxÞ 4 U0EðxÞ and U_{1 }_{E}ðxÞ 4 U1EðxÞ, which means that this is

preferable not to develop the event of interest in either treatment arm. Here, the utilities are allowed to depend on the marker value x. For example, age could be a candidate marker in some settings, and it is likely that utilities will diﬀer between young patients and older ones.

With a little algebra manipulation, one can see that cancelling the derivative (with respect to c) of the expected utility function leads to the following equation

f_{X}ðcÞ½0ðcÞU_{0E}ðcÞ þ f1 0ðcÞgU_{0 }_{E}ðcÞ þ f_{X}ðcÞ½1ðcÞU_{1E}ðcÞ þ f1 1ðcÞgU_{1 }_{E}ðcÞ ¼ 0

So, the expected utility is maximal for the threshold value that veriﬁes the following equality^{11}

0ðcÞ½U_{0E}ðcÞ U_{0 }_{E}ðcÞ 1ðcÞ½U_{1E}ðcÞ U_{1 }_{E}ðcÞ ¼ U_{1 }_{E}ðcÞ U_{0 }_{E}ðcÞ ð2Þ

This expression helps deﬁne the optimal diﬀerence in risks threshold from which one of the two treatment options is preferred.

In practice, it may be complex to specify meaningful quantities for each utility. Vickers et al.^{5} made some

assumptions on utilities to avoid having to estimate the four utilities separately. They assume that U_{0E}ðxÞ ¼

U0E, U_{0 }_{E}ðxÞ ¼ U_{0 }_{E}¼ 1, U1EðxÞ ¼ U1E and U_{1 }_{E}ðxÞ ¼ U_{1 }_{E}, which means that the utilities are not allowed to

depend on the marker value, and that each utility is expressed relatively to the others. They also assume that

U1E U_{1 }_{E}¼ U_{0E} U_{0 }_{E}, which means that the cost associated with the development of an event is the same in

Under these assumptions, it can be shown (details in online Supplemental Material) that the utility function (1) is proportional to

Uðc, rÞ / FXðcÞ½PðE ¼ 1jX c, T ¼ 0Þ PðE ¼ 1jX c, T ¼ 1Þ r þ A ð3Þ

In this expression, r ¼treatment

event , treatment ¼ U_{1 }_{E} U_{0 }_{E}¼ U_{1E} U_{0E} denotes the additional cost of the

innovative treatment regarding the reference one, event ¼ U0E U_{0 }_{E}¼ U_{1E} U_{1 }_{E} denotes the cost associated

with the development of the event of interest in either arm and A ¼ ^{U}0E0þU_{0 }_{E}ð10Þ

U0EU_{0 }_{E} is an additive constant

relative to c. As one can see, applying the Vickers et al.^{5} assumptions directly to equation (2), or maximizing

the utility function presented in equation (3), leads to the expression of the optimal diﬀerence in risks threshold as

deﬁned by Vickers et al. in their original paper^{5}(details in online Supplemental Material)

0ðcÞ 1ðcÞ ¼ r ð4Þ

This equation means that a patient should be treated with the reference treatment if the diﬀerence in risks of event between the two treatment arms exceeds the ratio r. This result is interesting as it gives a simple interpretation of the ratio r which helps to deﬁne the utilities. This is the diﬀerence in risks of event from which it is acceptable to use the most harmful treatment option. The inverse of the ratio has also a meaning; this corresponds to the number of patients that a clinician is ready to treat with the most harmful treatment option to avoid one additional event-case compared with the less harmful treatment option.

As these assumptions simplify the utility function and that r is meaningful, the same assumptions are made in the following sections. An optimal threshold can be estimated for each r value; however, deﬁning the treatment allocation using the marker is only relevant for a range of r values. This range can be deﬁned using decision curves.

2.2 Decision curves

Decision curves were originally designed for diagnostic or prognostic markers.^{17}Herein, it helps to compare the

expected utility of using the marker to guide the treatment choice with the expected utility of extreme strategies

(‘‘Treat everyone with the innovative treatment’’: U_{T¼1}ðrÞ or ‘‘Treat everyone with the reference treatment’’:

U_{T¼0}ðrÞ) for multiple r settings.

Based on equation (3), it is possible to deﬁne the expected utility of extreme strategies as

UT¼1ðrÞ ¼ lim

c!þ1Uðc, rÞ /0 1 r þ A

UT¼0ðrÞ ¼ lim

c!1Uðc, rÞ / A

In fact, decision curves enable a sensitivity analysis to be conducted using the r ratio as the sensitivity parameter. The relevant range of r values is deﬁned when the marker-based strategy is better than any of the two extreme strategies.

An extension of decision curves was proposed by Baker et al.^{19}for diagnostic markers and is named ‘‘relative

utility’’ curves. Huang et al.^{18}proposed to apply this concept to treatment selection markers. In this case, these

relative utility curves standardize the utility of the marker-based strategy in order to give to its value a meaningful interpretation (0: useless marker, 1: perfect marker). The expression of the relative utility function is proposed in the online Supplemental Material.

2.3 Estimation method

In this section, it is assumed that the marker is measured before treatment administration in a controlled parallel randomized clinical trial with two treatment arms, and that the treatment allocation does not depend on the marker values. Such a study ensures that the treatment eﬃcacies are measured without confounders, and that

PðX cjT ¼0Þ ¼ PðX cjT ¼ 1Þ ¼ PðX cÞ. The latter formula is called the ‘‘randomization constraint’’, and

we demonstrate later that this constraint is useful in the estimation method.

Previously reported methods propose to estimate the optimal diﬀerence in risks threshold by modelling the risk

of event given the marker values and the treatment arms.^{5,7,11}The optimal marker threshold can be derived from a

on the good calibration of the model (which is not always easy to control), and no inference method was proposed to obtain a conﬁdence interval of the marker threshold.

Herein, an alternative approach that does not require calibration is proposed; it is based on modelling the marker distribution in each group of patients previously deﬁned rather than the risk of event given the treatments and the marker values.

Simply applying Bayes’ theorem, and with a little algebra manipulation, the utility function (3) can be expressed as

Uðc, rÞ / F_{X}_{0E}ðcÞ0 F_{X}_{1E}ðcÞ1 r ½PðT ¼ 0ÞfF_{X}_{0E}0þ F_{X}_{0 }_{E}ð1 0Þg

þ PðT ¼ 1ÞfF_{X}_{1E}1þ F_{X}_{1 }_{E}ð1 1Þg þ A

where F_{X}_{ij}ðcÞ is the cumulative distribution function of the marker in the population ij. PðT ¼ 0Þ and PðT ¼ 1Þ are

ﬁxed by the study design. The quantities0and1are estimated by the mean risk of event in each arm. Empirical

cumulative distribution functions may be used to estimate the marker cumulative distribution functions

F_{X}_{0E}ðcÞ, F_{X}_{0 }_{E}ðcÞ, F_{X}_{1E}ðcÞ and F_{X}_{1 }_{E}ðcÞ; however, depending on the sample size, it could lead to an important

uncertainty in the optimal threshold estimate, and the only way to obtain a conﬁdence interval would be by

applying bootstrap methods. Bootstrap methods^{20} have been proposed to obtain the conﬁdence interval of the

optimal threshold of a diagnostic marker; however, it was shown by Schisterman and Perkins^{21}that it may have a

poor coverage performance in some situations. This is why modelling the cumulative distribution functions of the

marker has been preferred.^{22,23} The proposed approach relies on ﬁnding, for each of the four aforementioned

populations, a parametric distribution that ﬁts well the data.

In practice, there is no need to ﬁt each of the four distributions. Using the randomization constraint, one of the four marker cumulative distribution functions can be deﬁned indirectly by the three others and the mean risks of event. We chose to express the most diﬃcult distribution to ﬁt, with respect to the three others. For example, if

FX0EðcÞ is the most diﬃcult distribution to estimate, the randomization constraint states that

FX0E¼ fF_{X}_{1E}1þ F_{X}_{1 }_{E}ð1 1Þ F_{X}_{0 }_{E}ð1 0Þg=0 ð5Þ

Then, the utility function is estimated as ^

Uðc, rÞ / ^FX_{1 }_{E}ðcÞð1 ^1Þ ^FX_{0 }_{E}ðcÞð1 ^0Þ r f ^FX1E^1þ ^FX_{1 }_{E}ðcÞð1 ^1Þg þ A

Obviously, the same process could be applied for any of the four marker distributions.

Suppose, for example, that the biomarker values follow a Gaussian distribution in the populations i Ewith mean

_{i }_{E} and variance2

i E, and that r ¼ 0 (the two treatment options have equal toxicities), then the utility function is

expressed as
Uðc, rÞ / ^{c }^{}1 E
_{1 }_{E}
ð1 1Þ ^{c }^{}0 E
_{0 }_{E}
ð1 0Þ þ A

where denotes the standard normal cumulative distribution function. In this very special case, an explicit

solution exists for the optimal threshold, c^{}:

c^{} ¼_{1 }_{E}ðb2 1Þ a þ b ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ
a2þ ðb2 1Þ2
1 Elogðb2R2Þ
q
b2 1
with a ¼ ð_{0 }_{E} _{1 }_{E}Þ, b ¼^{}0 E
1 E and R ¼^{1}0
11.
Or, when_{0 }_{E}¼ _{1 }_{E}
c^{}¼ 0^{2}logðR^{2}Þ þ _{0 }_{E} _{1 }_{E}
2a

wherer0denotes the standard deviation common to the 0 Eand 1 Epopulations.

The variance of the optimal threshold can be derived using the Delta method. However, it is likely that in most cases there is no explicit solution for the optimal threshold. In such cases, numerical algorithms may be used to

optimize the utility function so as to estimate the threshold.^{21,22} In this case, the Delta method cannot be
used anymore. One solution is to use the Bayesian inference to derive an estimate and a credible interval of the
optimal threshold.

An estimate of the optimal threshold could be, for example, the median of its posterior distribution, and a 95% credible interval could be obtained using the 2.5% and 97.5% quantiles of its posterior distribution. As the optimal threshold is a function of the marker distribution in each subgroup of patients and of the mean risk of event in each treatment group, its posterior distribution can be derived from the posterior distribution of the marker distribution parameters and of the posterior distribution of the mean risks of event using a Monte-Carlo

method.^{24}

Lethijdenote the parameter(s) of the distribution function of the marker or the parameters of the distribution

function of the mean risk of event in each treatment group (note thathijmay be a vector of parameters). Let also

pðhijÞ denote the prior distribution on hij. This prior distribution may be non-informative, or vaguely informative,^{24}

when there is no prior information. Let xij¼ x_{ij1},. . . , xijnij be the set of n_{ij} biomarker measurements in the