Conference Agenda

Overview and details of the sessions of this conference.

Session Overview

Session: S48: Sample size considerations
Time: Wednesday, 06 Sept 2023, 8:30am - 10:10am
Session Chair: Lorenz Uhlmann
Session Chair: Fetene Tekle
Location: Seminar Room U1.197 (hybrid)


Presentations
8:30am - 8:50am

Sample Size Re-estimation for the Wilcoxon-Mann-Whitney and Brunner-Munzel Test

Stephen Schüürhuis1, Tobias Mütze2, Georg Zimmermann3, Frank Konietschke1

1Institute of Biometry and Clinical Epidemiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; 2Statistical Methodology, Novartis Pharma AG, Basel, Switzerland; 3Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria

Proper sample size determination during the planning stage of a randomized controlled trial is crucial. The sample size is usually determined as the (minimum) sample size needed to detect an alternative δ, say, with target power 1 − β (e.g. 80%) at significance level α (e.g. 5%). In general, however, the computed sample size depends not only on these parameters but also on nuisance parameters (e.g. the variance). Hence, the appropriateness of the resulting sample size depends in particular on the validity of the specified assumptions about the effect.

In practice, however, a priori knowledge about parameter values might be scarce, e.g. in novel indications or rare diseases. Accordingly, the assumptions on the effect (and its variability) can be highly uncertain. Allowing for modifications of the preplanned sample size during the trial, based on updated knowledge about the effect, can therefore be an attractive alternative. Interim sample size re-estimation is a particular class of interim adaptations within the general framework of adaptive designs (see, e.g., [1]). In (unblinded) sample size re-estimation designs, a first-stage cohort of patient data can be used to adaptively decrease or increase the overall sample size, relative to the initially planned overall sample size, based on the interim effect estimate.

While extensive theory has been developed for binary, continuous and survival endpoints (e.g. [1, 2]), there has been comparatively little discussion in the adaptive design literature on how to perform sample size re-estimation when the underlying statistical procedure is nonparametric, e.g. when the analysis is to be performed with the Wilcoxon-Mann-Whitney or the Brunner-Munzel test. In disease areas such as amyotrophic lateral sclerosis (ALS), however, rank-based methods are commonly used (see, e.g., [3]), and they are considered more robust when distributional assumptions or asymptotics do not hold. The effect size of these tests is the so-called relative effect p = P(X < Y) + 0.5 · P(X = Y) for X ∼ F_X, Y ∼ F_Y, where p > 0.5 means that Y stochastically tends to larger values than X.
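As a minimal illustration (a Python/NumPy sketch with made-up data, not material from the talk), the relative effect can be estimated by comparing all pairs of observations, with tied pairs counting one half:

```python
import numpy as np

def relative_effect(x, y):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) by comparing all pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    less = (x[:, None] < y[None, :]).sum()   # pairs with x_i < y_j
    ties = (x[:, None] == y[None, :]).sum()  # tied pairs count one half
    return (less + 0.5 * ties) / (len(x) * len(y))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.5, 1.0, size=50)
print(relative_effect(x, y))  # > 0.5 indicates Y tends to larger values
```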

In this talk, we will present unblinded sample size re-estimation procedures for the Wilcoxon-Mann-Whitney and the Brunner-Munzel test. In particular, we will focus on interim sample size adaptations based on estimates of the relative effect p, utilizing the conditional power of these tests, i.e. the probability of obtaining a significant result given the interim data already observed. Moreover, we will provide simulation studies investigating the designs with respect to type I error rate control in various settings for continuous and ordered categorical data.
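For intuition, one classical way to turn an assumed or interim-estimated relative effect into a sample size is Noether's (1987) approximation for the WMW test; the conditional-power-based procedures presented in the talk are more refined, so the following Python sketch (with illustrative α, β and planning values) is only a rough stand-in:

```python
import numpy as np
from scipy.stats import norm

def noether_total_n(p, alpha=0.05, beta=0.2, t=0.5):
    """Total sample size N for a two-sided WMW test via Noether's (1987)
    approximation; p is the assumed relative effect and t = n_X / N the
    allocation fraction."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)
    return int(np.ceil(z**2 / (12 * t * (1 - t) * (p - 0.5) ** 2)))

print(noether_total_n(0.65))  # N under the planning assumption p = 0.65
print(noether_total_n(0.58))  # re-estimated N after a smaller interim estimate
```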

[1] G. Wassmer and W. Brannath. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer, Heidelberg, 2016.

[2] C. Chuang-Stein, K. Anderson, P. Gallo, and S. Collins. Sample size reestimation: a review and recommendations. Drug Information Journal, 40(4):475–484, 2006.

[3] J. D. Berry, R. Miller, D. H. Moore, M. E. Cudkowicz, L. H. Van Den Berg, D. A. Kerr, Y. Dong, E. W. Ingersoll, and D. Archibald. The Combined Assessment of Function and Survival (CAFS): a new endpoint for ALS clinical trials. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration, 14(3):162–168, 2013.



8:50am - 9:10am

Sample size recalculation in three-stage clinical trials

Björn Bokelmann1, Geraldine Rauch1, Jan Meis2, Meinhard Kieser2, Carolin Herrmann1

1Institute of Biometry and Clinical Epidemiology, Charité - Universitätsmedizin Berlin; 2Institute of Medical Biometry, University Medical Center, Ruprecht-Karls University Heidelberg

Choosing an adequate sample size for a clinical trial is an important task and can be challenging. To find a sample size that puts as few patients as possible at risk while maintaining high power, information about the effect size of the medical treatment is required, among other things. Wrong assumptions about the effect size can lead to underpowering or oversizing of a trial. One potential remedy to this problem is a multi-stage trial combined with sample size recalculation. In a first stage, a number of patients is recruited and the outcomes of the primary endpoint are examined in an interim analysis. The information obtained from the interim analysis is then used to (re-)calculate the sample size of the following stage.

Sample size recalculation for two-stage trials has been investigated and the corresponding approaches are already applied in practice (Friede & Kieser, 2006; Bauer et al., 2016). For designs with more than two stages, however, previous literature only examines the potential for sample size reduction due to efficacy and futility stopping, without considering the possibility of sample size recalculation (Chen, 1997; Chen & Shan, 2008).

In our research, we consider three-stage trials with the option of futility and efficacy stopping after each of the first two stages. We examine under which conditions, and to what extent, sample size recalculation at the final stage can prevent underpowering or oversizing when the assumed effect size deviates from the true effect size. While sample size recalculation can yield these potential benefits, it also poses a practical challenge due to the uncertainty about the total number of patients to recruit when starting the trial. To measure this disadvantage, we also examine the variance of the sample size.
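As a rough sketch of such a design (with illustrative stopping boundaries and a naive pooled final analysis for brevity — not the authors' method, which would need to control the type I error rate), the following simulation runs a three-stage trial with a recalculated final stage and reports power together with the mean and variability of the realized sample size:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
alpha, beta = 0.05, 0.2
z_eff = (2.5, 2.3)   # efficacy boundaries after stages 1 and 2 (placeholders)
z_fut = (0.0, 0.5)   # futility boundaries after stages 1 and 2 (placeholders)
n_stage, n3_max = 50, 300   # per-group stage size and cap on the final stage

def z_stat(x, y):
    n = len(x)
    return (y.mean() - x.mean()) / np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / n)

def run_trial(delta, sd=1.0):
    x, y = np.empty(0), np.empty(0)
    for k in range(2):   # stages 1 and 2 with interim looks
        x = np.append(x, rng.normal(0.0, sd, n_stage))
        y = np.append(y, rng.normal(delta, sd, n_stage))
        z = z_stat(x, y)
        if z >= z_eff[k]: return True, len(x)    # stop for efficacy
        if z < z_fut[k]:  return False, len(x)   # stop for futility
    # Final stage: recalculate the per-group sample size from the interim estimate.
    d_hat = max(y.mean() - x.mean(), 1e-3)
    n_req = 2 * ((norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)) * sd / d_hat) ** 2
    n3 = int(np.clip(np.ceil(n_req) - len(x), 10, n3_max))
    x = np.append(x, rng.normal(0.0, sd, n3))
    y = np.append(y, rng.normal(delta, sd, n3))
    # Naive pooled final test; a real design would use, e.g., a combination
    # test or the conditional error approach to protect the type I error rate.
    return z_stat(x, y) >= norm.ppf(1 - alpha / 2), len(x)

results = [run_trial(delta=0.3) for _ in range(2000)]
success = np.array([r[0] for r in results])
n = np.array([r[1] for r in results])
print(f"power ~ {success.mean():.2f}, mean n/group = {n.mean():.0f}, sd of n = {n.std():.0f}")
```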

References

Friede, T., & Kieser, M. (2006). Sample size recalculation in internal pilot study designs: a review. Biometrical Journal, 48(4), 537-555.

Bauer, P., Bretz, F., Dragalin, V., König, F., & Wassmer, G. (2016). Twenty‐five years of confirmatory adaptive designs: opportunities and pitfalls. Statistics in Medicine, 35(3), 325-347.

Chen, T. T. (1997). Optimal three-stage designs for phase II cancer clinical trials. Statistics in Medicine, 16(23), 2701-2711.

Chen, K., & Shan, M. (2008). Optimal and minimax three-stage designs for phase II oncology clinical trials. Contemporary Clinical Trials, 29(1), 32-41.



9:10am - 9:30am

Sample size calculations for cluster randomised trials using assurance

Sarah Faye Williamson1, Svetlana V. Tishkovskaya2, Kevin J. Wilson3

1Biostatistics Research Group, Newcastle University, UK; 2Faculty of Health and Care, Lancashire Clinical Trials Unit, University of Central Lancashire, UK; 3School of Mathematics, Statistics & Physics, Newcastle University, UK

Sample size determination for cluster randomised trials (CRTs) is challenging because it requires robust estimation of the intra-cluster correlation coefficient (ICC). Typically, the sample size is chosen to provide a certain level of power to reject the null hypothesis in a two-sample hypothesis test. This relies on the minimal clinically important difference (MCID) and estimates of the overall standard deviation, the ICC and, if cluster sizes are assumed to be unequal, the coefficient of variation of the cluster size. Varying any of these parameters can have a strong effect on the required sample size. The ICC is particularly challenging to estimate and, if the value used in the power calculation is far from the unknown true value, the trial can be substantially over- or under-powered.

In this talk, we present a hybrid approach which uses the Bayesian concept of assurance (or expected power) to determine the sample size for a CRT in combination with a frequentist analysis. Assurance is a robust alternative to traditional power which incorporates uncertainty about key parameters through prior distributions. We suggest specifying prior distributions for the overall standard deviation, the ICC and the coefficient of variation of the cluster size, while still utilising the MCID.
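A minimal sketch of the assurance computation might look as follows in Python, assuming illustrative priors, a normal-approximation power formula, and the common design effect for unequal cluster sizes, 1 + ((CV² + 1)·m̄ − 1)·ICC; the priors and analysis in the motivating trial may differ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def assurance(n_per_arm, m=20, mcid=0.3, alpha=0.05, draws=10_000):
    """Monte Carlo assurance for a two-arm parallel CRT with mean cluster
    size m; the priors below are illustrative placeholders."""
    sd  = rng.gamma(shape=100.0, scale=0.01, size=draws)  # prior on overall SD (mean 1)
    icc = rng.beta(2.0, 38.0, size=draws)                 # prior on ICC (mean 0.05)
    cv  = rng.uniform(0.3, 0.7, size=draws)               # prior on CV of cluster size
    deff = 1 + ((cv**2 + 1) * m - 1) * icc                # design effect, unequal clusters
    se = sd * np.sqrt(2 * deff / n_per_arm)               # SE of the mean difference
    power = norm.cdf(mcid / se - norm.ppf(1 - alpha / 2)) # power given each prior draw
    return power.mean()                                   # assurance = average power

for n in (200, 400, 600, 800):
    print(n, round(assurance(n), 3))
```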

This approach is motivated by a parallel-group CRT in post-stroke incontinence. Although a pilot study was conducted for this trial, the resulting ICC estimate was of low precision and could not be used as a reliable source for the sample size calculation. We illustrate the effects of redesigning this trial using the hybrid approach and compare the results to those obtained from a standard power calculation. The impacts of misspecifying the ICC prior distribution are also considered.



9:30am - 9:50am

May the power be with you? Influence of sample size calculation on replication success.

Anja Collazo, Meggie Danziger

Berlin Institute of Health, Germany

In preclinical replication projects such as the Reproducibility Project: Cancer Biology, the central goal is to solidify evidence with respect to a knowledge claim. To enable reliable conclusions from a replication project, sample size choices are an important yet underrated aspect of study design. First, the choice reflects how well researchers balance resources and potential information gain given the ethical constraints inherent to animal experimentation. Second, the sample size calculation implicitly provides a decision criterion for whether a replication study might seem unnecessary or unfeasible. Third, many definitions of replication success depend on the precision of effect size estimates, which in turn depends on sample size. As there is little established guidance, researchers often resort to the “standard” approach of basing their sample size estimation on the original effect size estimate. It has been shown that this likely results in replication failure due to underpowered studies.

Here, we explore how different conceptual starting points for sample size estimation influence the probability of declaring replication success. Building on empirical data from 86 original studies from three preclinical replication projects, we conducted a simulation study contrasting the standard approach to calculating the sample size for a replication with three other approaches. One approach employs a smallest effect size of interest (SESOI), the safeguard method uses the lower bound of the confidence interval, and the skeptical p-value is a reverse-Bayesian method in which a prior centered around the null is applied to the original effect estimate. The approaches differ in how they incorporate the uncertainty of the original study estimate into the sample size estimation in order to increase reliability.
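To make the contrast concrete, here is a small Python sketch (a hypothetical original study; the CI level, effect metric and approximations used in the actual simulation study may differ) comparing the standard, safeguard and SESOI planning effects; the skeptical p-value approach is omitted, as it requires the full reverse-Bayesian machinery:

```python
import numpy as np
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.8):
    """Per-group n to detect standardized effect d (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * (z / d) ** 2))

def safeguard_d(d, n1, n2, level=0.6):
    """Lower bound of a (level)-confidence interval for d, used as the
    safeguard planning effect."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - norm.ppf(1 - (1 - level) / 2) * se

d_orig, n1, n2 = 0.8, 10, 10           # hypothetical original study
print("standard: ", n_per_group(d_orig))
print("safeguard:", n_per_group(safeguard_d(d_orig, n1, n2)))
print("SESOI:    ", n_per_group(0.5))  # hypothetical smallest effect of interest
```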

Based on the estimated sample sizes, studies were categorized according to whether a replication is feasible, unfeasible (n_total > 280), or not necessary (n_total < 4). With the SESOI approach, all 86 experiments were carried forward to replication, whereas the standard approach selected 78/86 experiments, the safeguard method 68/86, and the skeptical p-value method 49/86.

The standard approach on average advised reducing sample sizes in the replication compared to the original study. In contrast, all other approaches suggested an increase in sample sizes for the replication, in accordance with the goal of increasing reliability. In addition, we assessed replication success for each of the approaches. The standard approach fares worst, achieving less than 50% replication success even when a true moderate to large effect is present. While the SESOI and safeguard approaches achieve the highest success rates at over 90%, substantially more animals are needed for the replication effort. The skeptical p-value approach best balances success rates and the number of animals invested across the 86 experiments.

Our results reveal that the standard approach to sample size estimation for replication fails to increase reliability and decreases the chances of declaring replication success compared to all other approaches. We reason that preclinical replication studies are worthwhile only if conducted ethically. This might mean that more animals are needed; however, they are then used in studies bound to strengthen the robustness of evidence rather than wasted in studies bound to be inconclusive.



9:50am - 10:10am

Researcher Degrees of Freedom in Power Analyses and Sample Size Planning

Nicole Ellenbach1, Anne-Laure Boulesteix1, Sabine Hoffmann2, Bruno L. Cadilha3, Sebastian Kobold3,4,5, Juliane C. Wilcke1

1Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians University of Munich, Munich, Germany; 2Department of Statistics, Ludwig-Maximilians University of Munich, Munich, Germany; 3Center of Integrated Protein Science Munich and Division of Clinical Pharmacology, Department of Medicine IV, Klinikum der Universität München, Munich, Germany; 4German Center for Translational Cancer Research (DKTK), Partner Site Munich, Germany; 5Einheit für Klinische Pharmakologie (EKLiP), Helmholtz Zentrum München, German Research Center for Environmental Health (HMGU), Neuherberg, Germany

There is an increasing awareness that data analysts face many uncertain choices when analyzing empirical data. These uncertain choices, which are commonly referred to as “researcher degrees of freedom”, lead to a multiplicity of possible analysis strategies that may yield overoptimistic and non-replicable research findings if combined with result-dependent selective reporting. Improvements in statistical planning and study design, such as power analyses and the pre-registration of studies, are often advocated as solutions to improve the replicability and credibility of research findings.

More specifically, appropriate power analyses and sample size calculations are essential to avoid underpowered studies, which have a higher risk of producing false negative findings and are thus more likely to yield misleading results. At the same time, a smaller sample size is often desirable, for practical and financial reasons for example, and, in the case of preclinical animal studies or clinical studies on humans, even essential from an ethical point of view. The sample size should therefore be no larger than necessary to achieve the desired power.

However, power analyses and sample size calculations require many assumptions concerning parameters such as the expected effect size and the variability of the outcome in the target population, as well as the validity of distributional assumptions. Of course, if researchers had perfect knowledge of all parameters and assumptions, they would not need to conduct the planned study in the first place. Power analyses are thus affected by several researcher degrees of freedom which, depending on the combination of choices, can lead to very different required sample sizes.
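A small Python sketch (illustrative numbers only, not from the study) makes this concrete: varying the assumed effect size and outcome standard deviation within plausible ranges changes the required sample size severalfold:

```python
import numpy as np
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Per-group n for a two-sample z-test; every input is an assumption."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * (z * sd / delta) ** 2))

# The same planned study under equally defensible assumptions:
for delta in (0.4, 0.5, 0.6):        # assumed effect size
    for sd in (0.9, 1.0, 1.2):       # assumed outcome standard deviation
        print(f"delta={delta}, sd={sd}: n per group = {n_per_group(delta, sd)}")
```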

We discuss different researcher degrees of freedom in power analyses and sample size planning and evaluate their impact on the resulting statistical power and the required sample size. The opportunistic use of some of these researcher degrees of freedom is problematic, whereas others can be used in an unproblematic way to ensure the smallest possible sample size while still providing the study with sufficient sensitivity and validity. We illustrate these ideas using several examples from preclinical and clinical research, including a confirmatory preclinical animal study on the effectiveness of CAR T cells in tumour therapy.


