Conference Agenda

Overview and details of the sessions of this conference.

 
 
Session Overview
Session
S54: Software engineering
Time:
Wednesday, 06/Sept/2023:
10:40am - 12:20pm

Session Chair: Oliver Boix
Session Chair: Lukas Widmer
Location: Seminar Room U1.191 (hybrid)


Presentations
10:40am - 11:00am
ID: 187 / S54: 1
Presentation Submissions - Regular Session (Default)
Topic of Submission: Statistical modelling (regression modelling, prediction models, …)
Keywords: MMRM, PROC MIXED, R, glmmTMB, nlme, mmrm

Comparing R libraries with SAS’s PROC MIXED for the analysis of longitudinal continuous endpoints using MMRM

Gonzalo Duran-Pacheco1, Julia Dedic2, Philippe Boileau3

1Roche, Switzerland; 2Roche, Canada; 3Genentech, US

Mixed-effects models for repeated measures (MMRM) are widely accepted as the statistical approach to analyse longitudinal continuous endpoints in clinical trials. The clinical outcome is observed in study patients over time, making values within a subject more similar to each other than to values from other individuals, which implies within-subject correlation. MMRM accounts for this correlation by explicitly modeling the corresponding covariance structure, which can be specified in alternative ways (unstructured, compound symmetry, first-order autoregressive, Toeplitz, antedependence, and others). SAS’s PROC MIXED has long been the broadly accepted software for implementing MMRM in clinical trials. However, owing to its growing popularity in academia and industry, and its increasing compliance with regulatory requirements, the open-source software R has become increasingly favored in clinical trial settings. In this study we compare MMRM results obtained by three R libraries (nlme, glmmTMB and mmrm) against SAS’s PROC MIXED, attempting to reproduce the clinical study report results of five phase-3 clinical trials, and we also conduct simulations. We will report results regarding execution time, marginal expected means, contrasts, corresponding standard errors and p-values, as well as results of MMRM under various alternative covariance structures.
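
The abstract gives no code, but a minimal sketch of how an MMRM with unstructured covariance might be specified in the three compared R packages could look as follows. It uses fev_data, the example dataset shipped with the mmrm package; the variable names come from that dataset, not from the five trials analysed in the talk.

```r
# Minimal sketch (not the authors' code): the same MMRM with an
# unstructured covariance matrix in the three compared R packages,
# using fev_data from the mmrm package.
library(mmrm)
library(glmmTMB)
library(nlme)

# mmrm: the covariance structure is declared directly in the formula
fit_mmrm <- mmrm(
  FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

# glmmTMB: emulate an MMRM by fixing the residual variance to ~0
# and placing an unstructured random effect on the visit factor
fit_tmb <- glmmTMB(
  FEV1 ~ ARMCD * AVISIT + us(AVISIT + 0 | USUBJID),
  dispformula = ~ 0, data = fev_data, REML = TRUE
)

# nlme: generalized least squares with a general within-subject
# correlation structure and visit-specific variances
fit_gls <- gls(
  FEV1 ~ ARMCD * AVISIT,
  correlation = corSymm(form = ~ as.numeric(AVISIT) | USUBJID),
  weights = varIdent(form = ~ 1 | AVISIT),
  data = fev_data, na.action = na.omit
)
```

Marginal means and treatment contrasts for the comparison could then be extracted in a uniform way, e.g. with the emmeans package, which supports all three model classes.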



11:00am - 11:20am
ID: 374 / S54: 2
Presentation Submissions - Regular Session (Default)
Topic of Submission: Statistical modelling (regression modelling, prediction models, …), Software Engineering, Free Contributions
Keywords: simulation, software, review, reproducibility, transparency

An overview of R software tools to support simulation studies: towards standardizing coding practices

Michael Kammer1,2, Lorena Hafermann3, Georg Heinze1

1Medical University of Vienna, Center for Medical Data Science, Institute of Clinical Biometrics, Vienna, Austria; 2Medical University of Vienna, Department of Medicine III, Division of Nephrology and Dialysis, Vienna, Austria; 3Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Berlin, Germany

Simulation studies comparing different approaches to a particular research task play an important role in establishing the evidence base for biostatistical methods. However, conducting such studies is not a trivial task in itself. Key issues discussed in the biostatistical literature include the replicability of the simulations, transparent and complete reporting, and neutrality. Well-designed, easy-to-use software tools can help address these concerns. However, while software packages exist for many different types of simulation tasks, there seems to be little consensus on how to standardize the actual coding of a simulation study. Consequently, authors of publications often develop their own ad-hoc simulation code.

As a step towards the standardization of coding practices and code sharing, we provide an overview of existing software packages in the programming language R that support simulation studies, with a focus on the coding of the data-generating mechanism. We found that many powerful and general simulation packages are available, but only a few of them are accompanied by peer-reviewed publications. Most packages adopt approaches that specify the data explicitly through distributional assumptions, in contrast to methods that create variations of an existing dataset, e.g., similar to fully conditional specification.
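
To make the contrast concrete, here is a small, package-agnostic R sketch of the two styles of data-generating mechanism described above; all function and variable names are illustrative and not taken from any of the reviewed packages.

```r
# Two styles of data-generating mechanism, in plain R; all names are
# illustrative and not taken from any reviewed package.

# Style 1: explicit distributional specification
generate_parametric <- function(n) {
  x1 <- rnorm(n)                       # continuous covariate
  x2 <- rbinom(n, 1, 0.4)              # binary covariate
  y  <- 1 + 0.5 * x1 - 0.8 * x2 + rnorm(n, sd = 2)
  data.frame(y, x1, x2)
}

# Style 2: variations of an existing dataset, here by resampling the
# covariates and regenerating the outcome from a fitted model (a much
# simpler stand-in for fully conditional specification)
generate_from_data <- function(template, n) {
  fit  <- lm(y ~ x1 + x2, data = template)
  boot <- template[sample(nrow(template), n, replace = TRUE), ]
  boot$y <- predict(fit, newdata = boot) + rnorm(n, sd = sigma(fit))
  boot
}

set.seed(42)  # fix the seed for replicability
simulated <- generate_parametric(200)
varied    <- generate_from_data(simulated, 200)
```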

In addition, we developed an R package for data generation intended to be easy to use, thus lowering the barrier to conducting proper comparison studies for newly developed methods. To complement the existing ecosystem, a key goal of our work is to build a library of interesting data-generating models derived from real-world datasets, which are then directly and easily available to other users. Such a library of presets serves as a starting point for comparison studies and facilitates full replicability and data protection, as well as the standardization of simulation setups by sharing configurations rather than full datasets.

We demonstrate a selection of the identified packages, including our own, through example analyses of real-world datasets for which we derived plausible data-generating models via the different approaches. Simulation studies are very diverse, and therefore no single tool is sufficient to perform all kinds of such studies. Nevertheless, software packages may facilitate the standardization and exchange of code, thereby providing a framework essential to designing better simulation studies.



11:20am - 11:40am
ID: 394 / S54: 3
Presentation Submissions - Regular Session (Default)
Topic of Submission: Software Engineering
Keywords: Julia, Design of Experiments, Bayesian Statistics

A Julia Package for Bayesian Optimal Design of Experiments

Ludger Sandig

Technische Universität Dortmund, Germany

Suppose a toxicologist wants to study the effects of a drug. But at which concentration(s) and at which point(s) in time should they take measurements? Mathematically, this question can be framed as a problem of optimal experimental design. For the underlying nonlinear regression model this means selecting covariate values and corresponding sample sizes such that the observations are as informative as possible about the unknown model parameters. One way to formalize this is the D-criterion: maximize the expected gain in Shannon information with respect to some prior density, or equivalently minimize the average volume of a confidence region around the maximum likelihood estimate of the model parameters. Because the optimal number of design points is not known beforehand, designs are represented as probability measures. The resulting optimization problem is computationally intensive, and it is typically solved using particle-based global optimization heuristics.
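
Restated in the standard notation of the Bayesian design literature (e.g. Chaloner and Verdinelli's review; the symbols below are defined here, not taken from the talk), the criterion reads:

```latex
% Bayesian D-criterion in standard notation: \xi is a design measure on
% the design region \mathcal{X}, m(x, \theta) the elemental information
% matrix at point x, M the normalized Fisher information matrix, and
% p(\theta) the prior density of the model parameters.
\phi_D(\xi) = \int \log\det M(\xi, \theta)\, p(\theta)\, \mathrm{d}\theta,
\qquad
M(\xi, \theta) = \int_{\mathcal{X}} m(x, \theta)\, \xi(\mathrm{d}x).
```

A Bayesian D-optimal design maximizes \phi_D over all probability measures \xi on the design region, which under the usual asymptotic normal approximation corresponds to the expected Shannon information gain described above.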

Existing free and open-source software packages for this task have several drawbacks. Typically, only a small number of response functions and design criteria are implemented. Further ones can be added only by the maintainer of the package, and fiddly interfacing with external C/C++ code is necessary. Where user-defined functions can be supplied, they run much slower than pre-packaged ones. This is especially notable when the covariate has more than one dimension, e.g. in dose-time-response models. Moreover, package code is often not well modularised, making it hard for third parties to contribute extensions. For these reasons it is difficult to adapt existing software for complex experimental setups.

In this talk, we present a Julia package that addresses these issues. Julia is a high-level dynamic programming language for scientific computing with performance comparable to statically compiled C. Julia's type system and multiple dispatch mechanism allow for a concise implementation that elegantly reflects the optimization problem's mathematical structure. We demonstrate its flexibility on a wide range of examples.



11:40am - 12:00pm
ID: 409 / S54: 4
Presentation Submissions - Regular Session (Default)
Topic of Submission: Software Engineering, Statistical hypothesis testing (covariate adjustment, nonparametric methods, multiple comparisons, …)
Keywords: Sample size calculation, power calculation, online calculator, hypothesis test, exact test

Online sample size calculator

Robin Ristl

Medical University of Vienna, Austria

Deciding on the required sample size for a study is an integral part of study design. The necessary sample size and power calculations are typically performed using dedicated software. Besides commercial software packages, several free-to-use programs exist, with varying scopes of supported hypothesis tests and varying options to accommodate particular aspects of a planned study, such as unequal sample sizes.

A new online sample size calculator was developed with the aim of meeting the needs of both applied researchers and statisticians involved in study design. The calculator was coded in JavaScript and HTML, using distribution functions from the jStat library, supplemented by additional functions for the non-central F-distribution and the non-central hypergeometric distribution. The software is available at https://homepage.univie.ac.at/robin.ristl/samplesize.php and is free to use.

The calculator’s interface was designed with the philosophy that only the necessary input options should be encountered at first sight, and that users should be able to enable extended input options when needed.

Regarding continuous outcomes, the calculator currently features the two-sample t-test, the paired t-test, analysis of variance with an arbitrary number of groups, and the Wilcoxon-Mann-Whitney rank sum test. For binary outcomes, tests comparing two proportions are supported, including asymptotic tests and Fisher’s exact test, as well as one-sample proportion tests and McNemar’s test for paired proportions. Further, calculations for the logrank test for comparing two survival functions and for a correlation test based on Fisher’s z-transformation are available.

For all tests comparing parallel groups, the calculator can compute sample sizes for unequal allocation ratios between groups, as well as power for unequally sized groups. It has built-in options to apply a Bonferroni adjustment and to inflate the final sample size to account for an assumed drop-out rate.

To validate the calculator, results for selected scenarios, covering all functions of the software, were compared to results from established commercial software and to independent simulation results.
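
As an illustration of such a cross-check (the scenario below is an example input, not one taken from the validation study), R's stats::power.t.test can serve as an independent reference for the two-sample t-test:

```r
# Illustrative cross-check of one scenario against an independent
# reference implementation (example inputs, not a scenario from the
# validation study): two-sample t-test, two-sided alpha = 0.05.
power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.8)
# Gives n = 63.77 per group, i.e. 64 per group after rounding up;
# a validated calculator should reproduce this reference value.
```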

A current development step is a general implementation of sample size calculation for exact tests. Exact tests typically have a non-monotone power function, i.e. the power may decrease when the sample size is increased. Thus, the corresponding sample size calculation can have more than one solution, and the challenges involve identifying all solutions and presenting them concisely to the user. Major sample size tools typically do not include the option to calculate the sample size for exact tests; rather, the user is limited to calculating the power for a given sample size.
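
The non-monotonicity is straightforward to reproduce numerically: the exact power of, e.g., Fisher's exact test can be computed by enumerating the two binomial outcomes. The sketch below is a standard calculation written for illustration, not code from the calculator.

```r
# Exact power of Fisher's exact test for two proportions, computed by
# enumerating the two binomial outcomes. A standard calculation written
# for illustration, not code from the calculator.
fisher_power <- function(n, p1, p2, alpha = 0.05) {
  x <- 0:n
  reject <- outer(x, x, Vectorize(function(x1, x2) {
    fisher.test(matrix(c(x1, n - x1, x2, n - x2), nrow = 2))$p.value <= alpha
  }))
  sum(outer(dbinom(x, n, p1), dbinom(x, n, p2)) * reject)
}

# Scanning a range of per-group sample sizes exposes the saw-toothed,
# non-monotone power curve: power can drop when n increases by one,
# because the discrete rejection region changes.
round(sapply(20:30, fisher_power, p1 = 0.2, p2 = 0.6), 3)
```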

The aim of the talk is to present the rationale and the challenges in developing a sample size tool, to present the validation study, and to make the calculator accessible to a wider audience. Further, the particular challenge of sample size calculation for exact tests will be discussed.



12:00pm - 12:20pm
ID: 199 / S54: 5
Presentation Submissions - Regular Session (Default)
Topic of Submission: Software Engineering, Free Contributions
Keywords: Programming, reporting, multi-platform, collaboration, PDR technology

A multi-platform PDR technology to efficiently build complex tables & reports, flexibly iterate with stakeholders, and easily maintain across workflows

Ming Zou1,2

1RepTik Analytics Solution, Switzerland; 2University Hospital of Basel, Switzerland

Background: Current programming techniques for customized data analysis and reporting, e.g., for a clinical study report for regulatory stakeholders, mostly follow a “brute force” approach and have seen little breakthrough in the last 30 years. Programmers waste a great deal of recurrent effort on supportive jobs such as format/layout/shell programming and the associated debugging, QC, modification, manual edits, and training. Because of these limitations, programmers are reluctant to iterate with report stakeholders on changes. Furthermore, this leads to highly fragmented and constrained practices across different teams/platforms/file types, which makes collaboration very difficult.

Method: We set out to address these problems by first analyzing different teams’ analysis and reporting workflows across multiple platforms (SAS, R, SQL…) and multiple report file types (Word, Excel, PowerPoint…). Based on this analysis, we designed and developed a multi-platform Presentation Data Referencing (PDR) technology, which enables an efficient, flexible, and interoperable approach to customized analysis and reporting for clinical studies.

Result: The workflow analysis revealed that, due to limitations in current reporting techniques, data analysts typically narrow themselves to one preferred report file type and one preferred analysis and scripting software, and then maximize efficiency by simply wrapping complex and constrained code into big functions. For new situations, the recurrent iterative cycles of scripting, debugging, modifying, and quality checking cost a great deal of effort. Meanwhile, the value-adding work lies mainly in the data analysis and result generation, not in generating formats and layouts. Inspired by the use of an index structure to operate on numeric arrays, we designed and developed a multi-platform Presentation Data Referencing (PDR) technology that references the result placeholders in a report template (e.g., a Word shell report) and references the specific results during the analysis, so as to inject the values directly into the report template without programming it. In this way, users can create, format, and modify tables and reports with the ease of office software and fill them with data from different platforms automatically through the PDR technology.
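
PDR itself is patent-pending and its interface is not described in the abstract. As a rough analogy only, the general placeholder-injection idea can be sketched in R with the officer package; the template path and placeholder tags below are invented for illustration.

```r
# Rough analogy of the placeholder-injection idea using the officer
# package; PDR's actual interface is not public. Template path and
# placeholder tags ("{{N_ITT}}", "{{MEAN_AGE}}") are invented.
library(officer)

doc <- read_docx("csr_shell.docx")  # Word shell with text placeholders
results <- c("{{N_ITT}}" = "412", "{{MEAN_AGE}}" = "63.4")

for (tag in names(results)) {
  doc <- body_replace_all_text(doc, old_value = tag,
                               new_value = results[[tag]], fixed = TRUE)
}
print(doc, target = "csr_filled.docx")  # write the filled-in report
```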

Conclusion: Our multi-platform PDR technology is applicable across different programming platforms, analysis teams, and report file types. It can boost the productivity, collaboration, and interoperability of clinical studies for different organizations and stakeholders. The PDR technology is now patent-pending.



 