Conference Agenda

Overview and details of the sessions of this conference. Sessions of the 3 May Online Day are listed under the location "Online".

Please note that all times are shown in the time zone of the conference (CEST).

Session Overview
Session
SES-03 (PANEL): INSTITUTIONAL WEB ARCHIVING INITIATIVES TO SUPPORT DIGITAL SCHOLARSHIP
Time: Thursday, 11/May/2023, 1:30pm - 2:30pm

Session Chair: Martin Klein, Los Alamos National Laboratory
Location: Theatre 1


Presentations

Institutional Web Archiving Initiatives to Support Digital Scholarship

Martin Klein¹, Emily Escamilla², Sarah Potvin³, Vicky Rampin⁴, Talya Cooper⁴

¹Los Alamos National Laboratory, United States of America; ²Old Dominion University, United States of America; ³Texas A&M University, United States of America; ⁴New York University, United States of America

Panel description:
Scholarship happens on the web, but unlike more traditional outputs such as scientific papers in PDF format, increasingly prominent scholarly artifacts such as source code, datasets, workflows, and protocols still lack comprehensive institutional web archiving approaches. This panel will feature scholars from three institutions (Old Dominion University, Texas A&M University, and New York University) who will provide an overview of their investigations into the use of scholarly artifacts and their (in-)accessibility on the live web. The panelists will further outline how these findings inform institutional collection policies for such artifacts, web archiving efforts aligned with institutional infrastructure, and outreach and education opportunities for students and faculty. The panel will conclude with an interactive discussion, welcoming input and feedback from the WAC audience.

Individual presentations:

Emily Escamilla (Old Dominion University):

Title: Source Code Archiving for Scholarly Publications

Abstract:

Git Hosting Platforms (GHPs) are commonly used by software developers and scholars to host source code and data and make them available for collaboration and reuse. However, GHPs and their content are not permanent: Gitorious and Google Code are examples of GHPs that are no longer available, even though users deposited their code expecting an element of permanence. Scholarly publications are well preserved thanks to archiving efforts by organizations like LOCKSS, CLOCKSS, and Portico; however, no analogous effort has yet emerged to preserve the data and code referenced in publications, particularly scholarly code hosted online in GHPs. The Software Heritage Foundation is working to archive public source code, but issue threads, pull requests, wikis, and other features that add context to the source code are not currently preserved. Institutional repositories seek to preserve all research outputs, including data, source code, and ephemera; however, current publicly available implementations do not preserve source code and its associated ephemera, which presents a problem for scholarly projects where reproducibility matters. To discuss the importance of institutions archiving scholarly content like source code, we first need to understand the prevalence of source code within scholarly publications and electronic theses and dissertations (ETDs). We analyzed over 2.6 million publications across three categories of sources: preprints, peer-reviewed journals, and ETDs. We found that authors are increasingly referencing the Web in their scholarly publications, with an average of five URIs per publication in 2021, and that one in five arXiv articles included at least one link to a GHP. In this panel, we will discuss some of the questions that result from these findings, such as: Are these GHP URIs still available on the live Web? Are they available in Software Heritage? Are they available in web archives, and if so, how often and how well are they archived?

Sarah Potvin (Texas A&M University):

Title: Designing a Sociotechnical Intervention for Reference Rot in Electronic Theses

Abstract:

Intertwined publication and preservation practices have become widespread with the establishment of institutional digital repositories and libraries’ stewardship of institutional research output, including open educational resources and electronic theses and dissertations. Most digital preservation work seeks to preserve a whole text, such as a dissertation, in digital form. This presentation reports on an ongoing research effort, a collaboration among Klein, Potvin, Katherine Anders, and Tina Budzise-Weaver, intended to prevent potential information loss within the thesis itself, such as cited web resources that disappear or change, through interventions that can be integrated into trainings and thesis management tools. This approach draws on research into graduate training and citation practices, web archiving, open source software development, and digital collection stewardship, with the goal of recommending systematized sociotechnical interventions to prevent reference rot in institutionally hosted graduate theses. The presentation will detail findings from qualitative surveys and interviews conducted at Texas A&M University on graduate student perceptions of reference rot.

Vicky Rampin and Talya Cooper (New York University):

Title: Collaborating on Software Archiving for Institutions

Abstract:

Inarguably, software and code are part of our scholarly record. Software preservation is a necessary prerequisite for long-term access to and reuse of computational research across many fields of study. Open research software is most commonly shared on the Web via Git hosting platforms (GHPs), which are excellent for fostering open source communities and research transparency, and which add useful features such as wikis, continuous integration, merge requests, and issue threads. However, the source code and the useful scholarly ephemera (e.g., wikis) are archived separately, often by “breadth over depth” approaches. I’ll discuss the Collaborative Software Archiving for Institutions (CoSAI) project from NYU, LANL, ODU, and OCCAM, which addresses the pressing need for machine-repeatable, human-understandable workflows for preserving web-based scholarship, scholarly code in particular, alongside the components that make it most useful. I’ll present the results of ongoing efforts in the project’s three main streams of work: 1) technical development of open source, community-led tools for collecting, curating, and preserving open scholarship, with a focus on research software; 2) community building around open scholarship, software collection and curation, and archiving of open scholarship; and 3) optimizing machine-actionable and manual workflows for archiving open scholarship together with its ephemera.



Conference: IIPC WAC 2023