Conference Agenda

Session Overview
Session
SES-16: PRESERVATION & COMPLEX DIGITAL PUBLICATIONS
Time:
Friday, 12/May/2023:
2:20pm - 3:50pm

Session Chair: Kiki Lennaerts, Sound & Vision
Location: Theatre 1


These presentations will be followed by a 10 min Q&A.

Presentations
2:20pm - 2:40pm

Preservability and Preservation of Digital Scholarly Editions

Michael Kurzmeier¹, James O'Sullivan¹, Mike Pidd², Orla Murphy¹, Bridgette Wessels³

¹University College Cork, Ireland; ²University of Sheffield; ³University of Glasgow

Digital Scholarly Editions (DSEs) are web resources and are thus subject to data loss. While DSEs are usually the result of funded research, their longevity and preservation are uncertain. DSEs might be partially or completely captured during web archiving crawls, in some cases making web archives the only remaining publicly available source of information about a DSE. Patrick Sahle’s Catalogue of DSEs (2020) lists ~800 URLs referring to DSEs, of which 46 refer to the Internet Archive. This shows the overlap between DSEs and web archives and highlights the need for a closer look at the longevity and archiving of these important resources. This presentation will introduce a recent study on the availability and longevity of DSEs, along with preservation models and examples specific to DSEs. Examples of lost and partially preserved editions will be used to illustrate the problem of preservation and preservability of DSEs. The presentation will also outline the specific challenges of archiving DSEs.

The C21 Editions project is a three-year international research collaboration researching the state of the art and the future of DSEs. As part of the project output, this presentation will introduce the main data sources on DSEs and demonstrate the workflow to assess DSE availability over time. It will illustrate the role web archives play in the preservation of DSEs as well as highlight specific challenges DSEs present to web archiving. As DSEs are complex projects, featuring multiple layers of data, transcription and annotation, their full preservation usually includes ongoing maintenance of the often custom-built backend system. Once project funding ends, these structures are very prone to deterioration and loss. Besides ongoing maintenance, other preservation models exist, generally reducing the archiving scope in order to reduce the ongoing work required (Dillen 2019; Pierazzo 2019; Sahle and Kronenwett 2016). Editions that use compatible rather than bespoke solutions are more likely to be fully preserved. Other approaches include “preservability by design” through minimal computing (Elwert n.d.) or standardization through existing services such as DARIAH or GitHub. The presentation will outline these models using examples of successful preservation as well as lost editions.
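As a rough illustration of one step in such an availability workflow, the sketch below separates catalogue URLs that point at the live web from those that already point at an Internet Archive capture. It is not the project's actual tooling: the function names, the `Capture` type and the example URLs are hypothetical, and the only external fact assumed is the usual Wayback Machine capture URL layout, `https://web.archive.org/web/<timestamp>/<original URL>`.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed capture URL layout:
#   https://web.archive.org/web/<4-14 digit timestamp>[two-letter modifier_]/<original URL>
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{4,14})(?:[a-z]{2}_)?/(.+)$"
)


class Capture(NamedTuple):
    """A catalogue entry that refers to an archived copy (hypothetical type)."""
    timestamp: str
    original_url: str


def parse_wayback(url: str) -> Optional[Capture]:
    """Return the capture details if `url` is a Wayback capture URL, else None."""
    match = WAYBACK_RE.match(url.strip())
    if match is None:
        return None
    return Capture(match.group(1), match.group(2))


def split_catalogue(urls: Iterable[str]) -> Tuple[List[str], List[Capture]]:
    """Split catalogue URLs into live-web URLs and archived captures."""
    live: List[str] = []
    archived: List[Capture] = []
    for url in urls:
        capture = parse_wayback(url)
        if capture is None:
            live.append(url)
        else:
            archived.append(capture)
    return live, archived
```

The live URLs could then be checked periodically for availability, while the archived entries already document editions whose only public source is a web archive.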

This presentation is part of the larger C21 Editions project, a three-year international collaboration jointly funded by the Arts & Humanities Research Council (AH/W001489/1) and Irish Research Council (IRC/W001489/1).



2:40pm - 3:00pm

Collecting and presenting complex digital publications

Ian Cooke, Giulia Carla Rossi

The British Library, United Kingdom

'Emerging Formats' is a term used by UK legal deposit libraries to describe experimental and innovative digital publications for which there are no collection management solutions that can operate at scale. They are important to the libraries and their users because they document a period of creativity and rapid change, often include authors and experiences that are less well represented in more mainstream publications, and are at high risk of loss. For six years, the UK legal deposit libraries have been working collaboratively and experimentally both to survey the types of publications and to test approaches to collection that will support preservation, discovery and access. An important concept in this work has been 'contextual collecting', which seeks to preserve the best possible archival instance of a work, alongside information that documents how a work was created and how it was experienced by users.

Web archiving has formed an important part of this work, both in providing practical tools to support collection management, including access, and also in supporting the collection of contextual information. An example of this can be seen in the New Media Writing Prize thematic collection https://www.webarchive.org.uk/en/ukwa/collection/2912

In this presentation, we will step back from specific examples, and talk about what we have learned so far from our work as a whole. We will outline how this work, including user research and engagement, has shaped policy at the British Library, through the creation of our 'Content Development Plan' for Emerging Formats, and the role of web archiving within that plan.

This presentation contributes to the Collections themes of 'blurring the boundaries between web archives and other born digital collections' and 'reuse of web archived materials for other born digital collections'. It builds on previous presentations to the Web Archiving Conference, which have focused on specific challenges related to collecting complex digital publications, to demonstrate how this research has informed the policy direction at the British Library and how web archiving infrastructure will be built into efforts to collect, assess and make accessible new publications.



3:00pm - 3:20pm

What can web archiving history tell us about preservation risks?

Susanne van den Eijkel, Daniel Steinmeier

KB, National Library of the Netherlands

When people talk about the necessity of preservation, the first thing that comes to mind is the supposed risk of file format obsolescence. Within the preservation community there have been voices raising the concern that this might not be the most pressing risk. If we are actually solving the wrong problem, this means we neglect the real problem. Therefore, it is important to know that the solutions we create are solving demonstrably real problems. Web archiving could be a great source of information for researching the most urgent risks, because developments and standards on the web are very fluid. There are examples of file formats on the web, such as Flash, that are not supported anymore by modern browsers. However, these formats can still be rendered using widely available software. We have also seen that website owners migrated their content from Flash to HTML5. So, can we really say that obsolescence has resulted in loss of data? How can we find out more about this? And more importantly, can we find out which risks are actually more relevant?

At the National Library of the Netherlands, we have been building a web collection since 2007. By looking at a few historical webpages we will illustrate where to look for answers and how to formulate better preservation risks using source data and context information. At iPRES 2022 we presented a short paper on the importance of context information for web collections. This information helps us understand the scope and the creation process of the archived website. In this presentation, we will demonstrate how we use this context information to search out sustainability risks for web collections. This will also give us insight into sustainability risks in general so we can create better-informed preservation strategies.



3:20pm - 3:40pm

Towards an effective long-term preservation of the web. The case of the Publications Office of the EU

Corinne Frappart

Publications Office of the European Union, Luxembourg

Much is being written about web archiving in general, where new and improved methods to capture the World Wide Web and to facilitate access to the resulting archives are constantly being described and shared. But when it comes to the long-term preservation of websites, i.e. safeguarding the ARC/WARC files with proper planning of preservation actions beyond simple bit preservation, the literature is much less abundant.

The Publications Office of the EU is responsible for the preservation of the websites authored by the EU institutions. In addition to harvesting the content and making it accessible through our public web archive (https://op.europa.eu/en/web/euwebarchive), we have started to delve more deeply into the management of content preserved for the long term.

Our reflection focused on long-term risks such as obsolescence or loss of file usability, and on the availability of a disaster recovery mechanism for the platform providing access to the web archive. Ingesting web archive files into a long-term preservation system raises many questions:

  • Should we expect different difficulties with ARC and WARC files? Is it worth migrating the ARC files to WARC files, and having a consistent collection on which the same tools can be applied?
  • Does ARC/WARC file compression impact storage, processing time, or preservation actions?
  • What is the best granularity for the preservation of a web archive?
  • Should the characterization of the numerous files embedded in ARC/WARC files occur during or after ingestion? With what impact on the preservation actions?
  • How can descriptive, technical and provenance metadata be enriched, possibly automatically, and where can they be stored?
  • What kind of information about the context of the crawls, the format description and the data structure should also be preserved to help future users understand the content of the ARC/WARC files?

To get advice on these questions and others, the Publications Office commissioned a study of published and grey literature, supplemented by a series of interviews with leading institutions in the field of web archiving. This paper presents the findings and offers recommendations on how to answer the questions above.
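To make the characterization and granularity questions above more concrete, here is a minimal sketch of inventorying the records inside a WARC file by record type and payload size. It is an illustration only, not the Publications Office's workflow or the study's recommendation: the function names are hypothetical, it reads uncompressed WARC data using just the Python standard library, and it ignores details such as folded header lines that a dedicated WARC library would handle.

```python
import io
from collections import Counter
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the named header fields of each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line where a record should start: {line!r}")
        headers: Dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            if raw in (b"\r\n", b"\n"):
                break  # blank line ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the record body; Content-Length gives its size in bytes.
        stream.read(int(headers["Content-Length"]))
        yield headers


def inventory(stream: BinaryIO) -> Tuple[Counter, Counter]:
    """Count records and sum payload bytes per WARC-Type."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for headers in iter_warc_headers(stream):
        record_type = headers.get("WARC-Type", "unknown")
        counts[record_type] += 1
        sizes[record_type] += int(headers["Content-Length"])
    return counts, sizes


def inventory_bytes(data: bytes) -> Tuple[Counter, Counter]:
    """Convenience wrapper for WARC data already held in memory."""
    return inventory(io.BytesIO(data))
```

For gzip-compressed WARC files, passing a stream opened with Python's `gzip.open(path, "rb")` to `inventory` should work in the same way, since the standard `gzip` module reads concatenated members transparently. Comparing run time on compressed and uncompressed copies is one simple way to explore the processing-time question.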



 
Conference: IIPC WAC 2023