CEN 2023 Conference Agenda

Overview and details of the sessions of this conference.

 
 
Session Overview
Session: S42: Simulation studies
Time: Tuesday, 05/Sept/2023, 4:10pm - 5:50pm

Session Chair: Mark Baillie
Session Chair: Louis Meir Dijkstra
Location: Seminar Room U1.195 (hybrid)


Presentations
4:10pm - 4:30pm

Towards more practically relevant method comparison studies by generating simulations based on a sample of real data sets

Christina Nießl1,2, Julian Lange1,2, Maria Thurow3, Ina Dormuth3, Markus Pauly3,5, Marc Ditzhaus4, Anne-Laure Boulesteix1,2

1LMU Munich, Germany; 2Munich Center for Machine Learning (MCML); 3TU Dortmund University; 4Otto-von-Guericke-University Magdeburg; 5UA Ruhr, Research Center Trustworthy Data Science and Security

Simulation studies are an essential approach for comparing statistical methods. Two of the key advantages that set them apart from benchmark studies based on real data are (1) the availability of the ground truth and (2) the wide range of parameters that can be explored. However, these features come at a price: Simulation studies are often criticized for being too simplistic and not reflecting reality. Moreover, the infinite parameter space presents researchers with the often difficult decision of choosing realistic and informative parameter values. This process is also prone to the selective reporting of parameter values that lead to favorable results (e.g., a good performance for a specific method), a questionable research practice that threatens the neutrality of simulation studies.

To overcome these drawbacks, several approaches have been proposed to design simulation studies based on real data, for example by setting key parameters (e.g., sample sizes, means, variances, or correlations) according to real data examples. However, the number of underlying data sets is usually restricted to one or two, and it is often not clear how these data sets were selected.

In this work, we present the idea of systematically basing simulations on a whole sample of real data sets that were selected according to pre-specified inclusion criteria as a means to obtain comprehensive and practically relevant results. We illustrate this approach using two examples. For the first example, we simulate data reflecting two-arm trials with ordinal endpoints. Here, the parameter of interest is the distribution of the ordinal endpoint in the two treatment groups. We set this parameter by sampling from all articles that were published in selected issues of the New England Journal of Medicine and that analyzed two-arm trials with ordinal endpoints.

For the second example, we consider the comparison of differential gene expression methods that aim to identify genes with differences in their expression levels between two conditions. In this more complex simulation, there are several parameters to be specified, such as the mean expression level or dispersion of each gene. In this application, we specify our sample as all cancer data sets provided in The Cancer Genome Atlas (TCGA) database.

For both examples, the results based on the sampled simulation parameters differ from the results obtained with user-specified parameters and with parameters based on a single data set, suggesting the potential of "simulation-sampling" as a useful complement to standard simulation approaches.
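
To make the sampling idea concrete, the following R sketch illustrates one way such a study could be set up for the two-arm ordinal-endpoint example: a pool of endpoint distributions is sampled for each simulation run, a trial is simulated, and an example analysis method is applied. The parameter pool below contains invented placeholder values, not the parameters extracted from the NEJM articles, and the Wilcoxon test stands in for any method under comparison.

# Illustrative sketch only; parameter pool and analysis method are placeholders.
set.seed(1)

param_pool <- list(   # each entry: endpoint distribution per arm, as taken from one published trial
  list(control = c(0.30, 0.40, 0.20, 0.10), treatment = c(0.20, 0.35, 0.30, 0.15)),
  list(control = c(0.50, 0.30, 0.15, 0.05), treatment = c(0.40, 0.30, 0.20, 0.10))
)

simulate_trial <- function(pool, n_per_arm = 100) {
  p  <- pool[[sample(length(pool), 1)]]                           # draw one real-data-based parameter set
  y0 <- sample(1:4, n_per_arm, replace = TRUE, prob = p$control)
  y1 <- sample(1:4, n_per_arm, replace = TRUE, prob = p$treatment)
  suppressWarnings(wilcox.test(y1, y0)$p.value)                   # ties are expected with ordinal data
}

p_values <- replicate(1000, simulate_trial(param_pool))
mean(p_values < 0.05)                                             # empirical rejection rate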



4:30pm - 4:50pm

A simple-to-use R package for mimicking study data by simulations

Giorgos Koliopanos1, Francisco M. Ojeda2, Andreas Ziegler1,2,3

1Cardio-CARE, Medizincampus Davos, Switzerland; 2Department of Cardiology, University Heart & Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; 3School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa

Background: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome such legal restrictions, simulated data that mimic the structure of the existing study data, but differ from them, can be transferred instead.

Objectives: The aim of this work is to introduce the simple-to-use R package modgo, which can be used to simulate data from existing study data with continuous, ordinal categorical, and dichotomous variables.

Methods: The core is to combine a rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal distribution and transformed back to the original scale of the variables. Unique features of modgo are that it allows users to change the correlation between variables, to perform perturbation analyses, to handle multicenter data, and to change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.

Results: modgo mimicked the structure of the original study data. Results of modgo were similar to those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated using several expansions.

Conclusions: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits the simulation of truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.
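
The core idea described in the Methods section can be sketched in a few lines of R. The sketch below is a generic illustration under simplifying assumptions (numeric variables only, empirical quantiles for the back-transformation); it does not use or reproduce the modgo API.

# Minimal sketch of the described approach: rank inverse normal transform,
# correlation estimation, multivariate normal simulation, back-transformation.
library(MASS)

mimic_data <- function(dat, n_sim = nrow(dat)) {
  # 1. Rank-based inverse normal transformation of each variable
  z <- apply(dat, 2, function(x) qnorm((rank(x) - 0.5) / length(x)))
  # 2. Correlation matrix on the transformed scale
  R <- cor(z)
  # 3. Simulate from a multivariate normal with this correlation structure
  z_sim <- mvrnorm(n_sim, mu = rep(0, ncol(dat)), Sigma = R)
  # 4. Map back to the original scale via the empirical quantiles
  out <- sapply(seq_len(ncol(dat)), function(j)
    quantile(dat[, j], probs = pnorm(z_sim[, j]), type = 1))
  colnames(out) <- colnames(dat)
  as.data.frame(out)
}

# Example: mimic the continuous columns of the built-in iris data
sim <- mimic_data(iris[, 1:4])
round(cor(sim) - cor(iris[, 1:4]), 2)   # correlation structure is roughly preserved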



4:50pm - 5:10pm

Comparison of methods for quantifying similarity of datasets

Marieke Stolte, Jörg Rahnenführer, Andrea Bommert

Department of Statistics, TU Dortmund University, Germany

Quantifying the similarity between two datasets has widespread applications in statistics and machine learning. Generalizability of a statistical model refers to the performance of the model on new or unseen datasets and depends on the similarity between the dataset used for fitting the model and the new datasets. In meta-learning and in transfer learning, a central component is to exploit or transfer insights between different datasets. In simulation studies, the similarity between the distributions assumed in the simulation and the distributions of the datasets on which the performance of methods is assessed is crucial.

A large number of approaches for quantifying dataset similarity have been proposed in the literature. Here, we present an extensive review and comparison of such methods. We examined more than 70 methods for quantifying the similarity of datasets and classified them into ten subclasses, including comparisons of cumulative distribution functions, comparisons of densities or characteristic functions, kernel- and graph-based discrepancy measures, methods based on inter-point distances, probability metrics, divergences, and comparisons based on binary classification. We compared all methods in terms of their applicability, interpretability, and theoretical properties in order to provide recommendations for selecting an appropriate data similarity measure based on the goal of the dataset comparison and on the properties of the datasets at hand.

Based on insights from these comparisons, we aim to compare methods for simulating datasets (parametric, nonparametric, plasmode) in order to design more appropriate simulation studies. We will use the best-suited dataset similarity measures in a comparison of parametric and plasmode simulations to quantify how similar simulated datasets are to data from the true data-generating process. We present preliminary findings from a study on measuring the quality of regression models.
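
As one concrete example from the inter-point-distance subclass mentioned above, the following R sketch computes a sample version of the energy distance between two datasets from pairwise Euclidean distances. It is only an illustration of this class of measures, not code from the study.

# Sample energy distance between two datasets (illustrative sketch)
energy_distance <- function(x, y) {
  x <- as.matrix(x); y <- as.matrix(y)
  d_all <- as.matrix(dist(rbind(x, y)))                               # all pairwise Euclidean distances
  d_xy  <- mean(d_all[seq_len(nrow(x)), nrow(x) + seq_len(nrow(y))])  # between-dataset distances
  2 * d_xy - mean(dist(x)) - mean(dist(y))   # population value is 0 iff the distributions coincide
}

set.seed(1)
a  <- matrix(rnorm(200), ncol = 2)            # reference dataset
a2 <- matrix(rnorm(200), ncol = 2)            # second draw from the same distribution
b  <- matrix(rnorm(200, mean = 1), ncol = 2)  # dataset from a shifted distribution
energy_distance(a, a2)   # small
energy_distance(a, b)    # clearly larger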



5:10pm - 5:30pm

How to Simulate Realistic Survival Data? A Simulation Study to Compare Realistic Simulation Models

Maria Thurow1, Ina Dormuth1, Christina Nießl2, Anne-Laure Boulesteix2, Marc Ditzhaus3, Markus Pauly1,4

1TU Dortmund University, Germany; 2Ludwig-Maximilians-University of Munich, Germany; 3Otto-von-Guericke-University Magdeburg, Germany; 4UA Ruhr, Research Center Trustworthy Data Science and Security, Dortmund, Germany

In statistics, it is important to have realistic data sets available for a particular context to allow an appropriate and objective method comparison. For many use cases, benchmark data sets for method comparison are already available online. However, in most medical applications and especially for clinical trials in oncology, there is a lack of adequate benchmark data sets, as patient data can be sensitive and therefore cannot be published. Another possible challenge is the need for a larger number of data sets or observations. Furthermore, if methods need to be compared for a specific setting, this may not be covered by the available data sets. Simulation studies are a potential solution to these problems.

However, it is sometimes not clear which simulation models are suitable for generating realistic data, e.g., for time-to-event analyses. A challenge here is that potentially unrealistic assumptions have to be made about the distributions. Instead, benchmark data sets can be used as a basis for the simulations, which has the following advantages: the actual properties are known, and more realistic data can be simulated.

There are several possibilities to simulate realistic data from benchmark data sets. In order to make recommendations on which models are best suited for a specific survival setting, we conducted a simulation study comparing simulation models based on kernel density estimation, fitted distributions, and two different bootstrap approaches. We used the runtime and different accuracy measures (e.g., the p-values of the log-rank test with respect to the benchmark data sets) as criteria for comparison.

Using the example of comparing two-sample procedures for lung cancer studies, we propose a way to simulate realistic survival data in two steps. In a first step, we provide reconstructed benchmark data sets from recent studies on lung cancer patients. To do so, we first searched for adequate data sets: we considered phase III clinical studies from oncology in which a log-rank test (our benchmark method) was applied and the Kaplan-Meier estimator was reported. This resulted in 290 potential studies. Restricting our analysis to lung cancer as a gender-independent cancer type and to studies providing the information required for reconstruction finally resulted in seven studies. We then reconstructed the data sets using a state-of-the-art reconstruction algorithm. In a second step, we build upon the reconstructed benchmark data sets to propose different realistic simulation models for model comparison.

Besides slight differences in runtime, our results show that, in our setting, simulations based on kernel density estimation and case resampling lead to data sets that represent the original data well, while simulating data from a fitted distribution does not. This demonstrates that it is possible to simulate realistic survival data when benchmark data sets (or at least the information required to reconstruct them) from real-world studies are available. In future applications, these results can be used for method comparison or further analyses, e.g., for sample size planning (for follow-up studies).
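
For orientation, the following R sketch shows, in simplified form, what three of the compared simulation models can look like for a single survival arm. The benchmark data frame here is a random placeholder rather than one of the reconstructed lung cancer data sets, and the parametric model is fixed to a Weibull distribution purely for illustration.

library(survival)

set.seed(1)
bench <- data.frame(time = rexp(150, rate = 0.1),      # placeholder benchmark arm (not reconstructed data)
                    status = rbinom(150, 1, 0.8))
n <- nrow(bench)

# (a) Case resampling: draw complete (time, status) pairs with replacement
sim_case <- bench[sample(n, n, replace = TRUE), ]

# (b) Kernel density estimation: smoothed bootstrap of the observed times
#     (statuses resampled independently here for simplicity)
bw <- density(bench$time)$bw
sim_kde <- data.frame(time = pmax(0, sample(bench$time, n, replace = TRUE) + rnorm(n, sd = bw)),
                      status = sample(bench$status, n, replace = TRUE))

# (c) Fitted parametric distribution (Weibull); no censoring added in this sketch
fit <- survreg(Surv(time, status) ~ 1, data = bench, dist = "weibull")
sim_fit <- data.frame(time = rweibull(n, shape = 1 / fit$scale, scale = exp(coef(fit))),
                      status = 1)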



5:30pm - 5:50pm

Simulation study to compare methods to analyze time-to-event endpoints in trials with delayed treatment effects

Rouven Behnisch, Marietta Kirchner, Meinhard Kieser

Institute of Medical Biometry, University of Heidelberg, Germany

The advance of immuno-oncology therapies comes with the challenge of dealing with the unique mechanisms of action of these drugs, especially when the primary efficacy endpoint is a time-to-event endpoint. It has been shown that the most powerful methods to compare time-to-event endpoints are weighted log-rank tests with weights proportional to the hazard ratio. This implies that the standard log-rank test, often required by regulatory authorities, is most powerful under proportional-hazards alternatives. This assumption is often violated by the mechanism of action, leading to delayed treatment effects or crossing survival curves and resulting in a substantial loss in power. Hence, a rather long follow-up period is required to detect a significant effect in immuno-oncology trials when the log-rank test is used. Another way to compensate for this loss in power would be to prespecify weights proportional to the hazard ratio, but this is often not feasible since the exact mechanism is usually not known in advance. Recently, different alternatives have been advocated, including well-known methods such as the family of Fleming-Harrington weighted log-rank statistics, accelerated failure time models, or additive hazard models, but also newly developed methods such as the modestly weighted log-rank test, the MaxCombo test, and tests based on the restricted mean survival time or combinations of different test statistics.

To obtain a better overview of the multitude of methods that have been proposed so far, we conducted a systematic literature search. The resulting set of methods was then compared systematically with regard to type I error and power in an extensive simulation study. To incorporate different mechanisms of action, we simulate data based on a generalized linear lag model for varying study durations, accrual periods, and delays, as well as different treatment effects. For methods where parameters need to be prespecified, the influence of misspecification of these parameters on the power was also assessed.

Most of the methods control the type I error rate and achieve reasonable power in the case of proportional hazards. A delayed treatment effect results in a power reduction for all methods, but the extent of this reduction varies between methods.

As expected, the performance of the log-rank test decreases with increasing treatment delay. There is no single method that performs best in all scenarios, so the choice of the optimal analysis strategy depends on the assumed delay pattern.
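
To illustrate the kind of data-generating mechanism and power behaviour discussed above, the following R sketch simulates a delayed treatment effect with a simple piecewise-exponential step lag (a simplification, not the generalized linear lag model used in the study) and estimates the empirical power of the standard log-rank test. All numbers (hazards, sample sizes, follow-up) are arbitrary illustration values.

library(survival)

sim_arm <- function(n, h0, hr, lag) {
  # Piecewise exponential: hazard h0 before the lag, h0 * hr afterwards
  t1 <- rexp(n, rate = h0)                                        # time under the pre-lag hazard
  t  <- ifelse(t1 < lag, t1, lag + rexp(n, rate = h0 * hr))
  data.frame(time = pmin(t, 24), status = as.integer(t <= 24))    # administrative censoring at 24 months
}

one_trial <- function(lag) {
  control   <- cbind(sim_arm(200, h0 = 0.08, hr = 1.0, lag = 0),   group = 0)
  treatment <- cbind(sim_arm(200, h0 = 0.08, hr = 0.6, lag = lag), group = 1)
  d <- rbind(control, treatment)
  1 - pchisq(survdiff(Surv(time, status) ~ group, data = d)$chisq, df = 1)   # log-rank p-value
}

set.seed(1)
mean(replicate(500, one_trial(lag = 0)) < 0.05)   # power with an immediate effect
mean(replicate(500, one_trial(lag = 6)) < 0.05)   # power drops with a 6-month delay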



 