Conference Agenda

SESSION #05: Sustainability
Time:
Thursday, 10/Apr/2025:
10:05am - 11:15am

Session Chair: Bjarne Andersen, Royal Danish Library
Location: Store Auditorium (ground floor)

main entrance at street level

Presentations
10:05am - 10:25am

42 Tips to Diminish the CO2 Impact of Websites

Tamara van Zwol (2), Lotte Wijsman (1), Jasper Snoeren (3), Tineke van Heijst (4)

(1) National Archives of the Netherlands, Netherlands; (2) Dutch Digital Heritage Network, Netherlands; (3) Netherlands Institute for Sound and Vision, Netherlands; (4) Van Heijst Information Consulting, Netherlands

The internet has become indispensable to modern life, yet its environmental impact is often overlooked. Despite terms like "virtual" and "cloud" suggesting a minimal footprint, the global internet is a significant energy consumer. In 2020, it accounted for approximately 4% of global energy consumption, and if usage trends persist, this figure could rise to 14% by 2040.

Archiving even a small number of websites contributes to the growing carbon footprint of digital archives, which compounds over time.

To address this, the Dutch Digital Heritage Network commissioned research to assess the CO2 impact of current websites across various heritage organizations. The study provided practical recommendations to reduce this impact, such as optimizing image sizes, employing green hosting, and streamlining unnecessary code. These strategies not only benefit the public-facing side of websites but also hold potential for the backend, such as in the harvesting process for archiving.

In our presentation, we will share these research findings and highlight actionable steps organizations can take to create more energy-efficient digital archives. Additionally, we will explore the question of what should be archived: Is every aspect of a website equally essential for long-term preservation?

Lastly, we are investigating incremental archiving as a solution to reduce both storage needs and emissions. This approach, which focuses on capturing specific updates rather than performing full harvests, offers a more sustainable alternative for digital preservation.
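The idea of capturing only what has changed can be illustrated with a minimal sketch. It is not the authors' method: it assumes a hypothetical harvester that keeps a SHA-256 digest per URL from the previous harvest, and the function names `content_digest` and `plan_incremental_capture` are invented for this example.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a hex SHA-256 digest of a fetched resource body."""
    return hashlib.sha256(body).hexdigest()


def plan_incremental_capture(
    previous: dict[str, str], current: dict[str, bytes]
) -> dict[str, list[str]]:
    """Compare the current fetch against the digests of the last harvest.

    previous: URL -> digest recorded at the last harvest (assumed to exist).
    current:  URL -> body fetched now.
    Returns sorted URL lists under the keys "capture", "skip" and "gone".
    """
    plan: dict[str, list[str]] = {"capture": [], "skip": [], "gone": []}
    for url, body in current.items():
        if previous.get(url) == content_digest(body):
            plan["skip"].append(url)      # unchanged: no new copy needed
        else:
            plan["capture"].append(url)   # new or modified: store it
    # URLs seen last time but absent now are reported, not deleted.
    plan["gone"] = sorted(set(previous) - set(current))
    plan["capture"].sort()
    plan["skip"].sort()
    return plan
```

Note that this sketch still fetches every resource in order to hash it, so it saves storage rather than network transfer. Avoiding the transfer as well would need something like HTTP conditional requests, which is outside this example.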



10:25am - 10:45am

Building Towards Environmentally Sustainable Web Archiving: The UK Government Web Archive and Beyond

Jane Winters (1), Eirini Goudarouli (2), Jake Bickford (2)

(1) University of London, United Kingdom; (2) The National Archives (UK), United Kingdom

There is an urgent need to foster more environmentally sustainable archival methods and approaches that place sustainability frameworks at the centre of archival practice, helping archiving institutions achieve their Net Zero ambitions. This will involve sector-wide collaboration to develop new ways of working and to rethink long-established best practice in order to define and adopt ways of working that are ‘good enough’. The challenge is particularly urgent for born-digital archives, which form an increasingly significant (and rapidly growing) part of the archival record. Pendergrass et al. (2019) have argued for fundamental change in ‘practices for appraisal, permanence, and availability of digital content’ (p. 4), and the Digital Preservation Coalition has similarly called for a re-evaluation of all aspects of digital preservation (Kilbride 2023).

This paper will discuss one approach to the development of a framework for more environmentally sustainable web archiving, using the UK Government Web Archive as a case study. First, it will present the findings of a workshop on ‘Archives and the environment’, which was held at The UK National Archives in 2023. One of the main strands of discussion was the environmental cost both of creating and preserving born-digital and digitised archives and of the digital infrastructure, tools and methods used to analyse them. Recommendations arising from the event and subsequent report have informed an action plan for the UK Government Web Archive (UKGWA) as it begins to explore its environmental footprint.

The UKGWA action plan involves four main strands of work: establishing, as far as possible, the current environmental impact of the web archive, drawing on a range of metrics; identifying those aspects of the web archiving workflows that may be streamlined or redeveloped in order to reduce that impact; designing and prototyping new and more sustainable processes within the UKGWA; and producing recommendations for good practice that may be adopted and/or adapted by other national and international web archives. The planned research is concerned not just with environmentally sustainable practice within the UKGWA but also with Scope 3 carbon emissions (that is, emissions that are produced not by an organisation itself but by those for whom it is indirectly responsible, in this case users and suppliers).

The research is at an early stage, but we hope that the development of an extensible and customisable framework, accompanied by a toolkit that builds on the work of the Digital Humanities Climate Coalition, will provide an opportunity for wider collaboration. The work presented here is grounded in the experience and practice of the UK Government Web Archive, but it will benefit enormously from being placed in dialogue with the work of the IIPC and other national and institutional web archives concerned with the impact of climate change on digital archival practice and of digital archiving and preservation on climate change.

K. Pendergrass et al., ‘Toward environmentally sustainable digital preservation’, The American Archivist (2019), 82:1, 165-206

W. Kilbride, ‘The Anthropocene remembered: digital memory after the climate crisis’, Digital Preservation Coalition Blog (2019)



10:45am - 11:05am

Preservation of Historical Data: Using Warchaeology to Process 20 Years of Harvesting

Andreas Børsheim, Marius André Elsfjordstrand Beck

National Library of Norway, Norway

The National Library of Norway has been harvesting the internet since the beginning of the millennium, with a primary focus and priority on the collection and storage of data. Over 25 years, web harvesting methods and preservation systems have changed. Consequently, the collection is composed of various file types, including ARC, WARC, and files produced by NEDLIB[1].

In more recent years our focus has shifted towards access and quality assurance, and the need to include the older data has increased. But how do we utilize this data, which by now is poorly structured, has little to no documentation, and is hard to read by modern software?
In addition, the National Library of Norway is migrating to a new digital preservation system, so all of our data is expected to be moved, providing us an opportunity to clean, index and organize our collection.

To address and resolve these issues and move toward the ultimate goal of making the collections fully discoverable and available, the National Library of Norway developed an open-source tool, Warchaeology[2], capable of converting, validating and deduplicating web archive collections data.

This presentation will outline how we have used this tool to process 2 PB of data, harvested since 2001. The objective is better management and preservation, which includes identifying collections and groupings of data, parsing and sorting metadata, identifying formats and how these should be processed or converted, deduplicating files, and gathering insight about the collections generally.


We will talk about the challenges in deduping, converting, and maintaining large web archive collections, including infrastructural issues like securing sufficient storage space to complete the work. This will be a time-intensive process; we estimate several months will be required for shuffling files between storage solutions, converting and deduplicating our data. The goals of this work are a collection of data that is cleaner, smaller, easier to maintain, and, at the end of the day, accessible for our users.


[1] https://web.archive.org/web/20040604032621/http://www.kb.nl/coop/nedlib/
[2] https://github.com/nlnwa/warchaeology/



11:05am - 11:10am

Analysing the Publications Office of the European Union Web Archive for the Rationalisation of Digital Content Generation

Alexandre Angers

Publications Office of the European Union, Luxembourg

More and more information from EU institutions, bodies and agencies is only made available on their public websites. However, web content often has a short lifespan, and this information is at risk of getting lost when websites are updated, substantially redesigned or taken offline. As part of its preservation activities, the Publications Office of the EU crawls, curates and preserves the content and design of these websites, making them available for current and future generations. We are also preparing to ingest this collection into our digital archive, to ensure its long-term preservation.

We have recently performed a full export of the most recent crawls from our web archive collection, spanning from March 2019 to September 2024, as a set of WARC files. We have extracted relevant information regarding all the “response” and “revisit” records in the collection and inserted it into a relational database, allowing efficient custom analyses. In this presentation, we will show various statistics we have generated about the content of our web archive. These include an analysis of large response payloads (more than 100 MB), as well as the relative footprint of crawled video files. We also investigate the amount of duplication of records: both the duplicates that were avoided through “revisit” records and the duplicate “response” records that are still present in the archive.
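A minimal sketch of what such a relational analysis could look like, using Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the table layout, the column names, the helper functions, and the reading of the large-payload threshold as 10^8 bytes. The abstract does not describe the actual schema or database engine.

```python
import sqlite3

# Assumed threshold: "large" means more than 10**8 bytes of payload.
LARGE_PAYLOAD_BYTES = 100 * 1000 * 1000


def build_index(rows):
    """Load per-record metadata (already extracted from WARC files) into SQLite.

    Each row: (record_type, url, crawl_date, mime_type, payload_digest, payload_bytes)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE records (
               record_type    TEXT NOT NULL,   -- 'response' or 'revisit'
               url            TEXT NOT NULL,
               crawl_date     TEXT NOT NULL,
               mime_type      TEXT,
               payload_digest TEXT NOT NULL,
               payload_bytes  INTEGER NOT NULL
           )"""
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def large_responses(conn, threshold=LARGE_PAYLOAD_BYTES):
    """Response records whose payload exceeds the threshold, largest first."""
    return conn.execute(
        """SELECT url, payload_bytes FROM records
           WHERE record_type = 'response' AND payload_bytes > ?
           ORDER BY payload_bytes DESC""",
        (threshold,),
    ).fetchall()


def duplicate_responses(conn):
    """Digests stored more than once as full responses, with the redundant bytes."""
    return conn.execute(
        """SELECT payload_digest,
                  COUNT(*) AS copies,
                  (COUNT(*) - 1) * MAX(payload_bytes) AS redundant_bytes
           FROM records
           WHERE record_type = 'response'
           GROUP BY payload_digest
           HAVING COUNT(*) > 1
           ORDER BY redundant_bytes DESC"""
    ).fetchall()


def video_share(conn):
    """Fraction of all response payload bytes that belong to video/* content."""
    total, video = conn.execute(
        """SELECT SUM(payload_bytes),
                  SUM(CASE WHEN mime_type LIKE 'video/%'
                           THEN payload_bytes ELSE 0 END)
           FROM records
           WHERE record_type = 'response'"""
    ).fetchone()
    return video / total if total else 0.0
```

Grouping by payload digest is one plausible way to find duplicate “response” records; it treats two records as duplicates whenever their payloads hash identically, regardless of URL.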

We also explain how we have used this information to refine our crawling strategies in order to rationalise our digital content generation going forward. We also define potential policies to curate the existing archive prior to ingestion in a long-term digital repository, where the impact on the carbon footprint may be even more significant.