Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view.
Session Overview
SHORT TALKS
Presentations
9:20am - 9:30am
Environmentally-friendly digital preservation policies and infrastructure at the National Library of Norway
National Library of Norway, Norway
The National Library of Norway has been a certified environmental “lighthouse” organization since 2015, indicating that it complies with a set of environmentally-friendly criteria. This has required the library to implement and sustain many environmentally-friendly policies, including several related to digital preservation and storage, that may be of interest to the international community. One core aspect of this work is energy efficiency. The library’s digital collections currently total more than 18 petabytes of data. This data is regularly checked for bit rot and is preserved using the 3-2-1 standard of digital preservation, wherein we preserve 3 copies of each file on 2 different storage technologies, with 1 copy stored at a different geographical location. To reduce our energy use in this work, the library uses an energy-efficient technology for our disk systems called MAID (Massive Array of Idle Disks). This storage technology reduces power consumption by allowing disks to spin only when they are in active use, so that most hard drives are kept inactive and turned off to save energy and extend their lifespan. Although it affects application performance during data access, MAID is effective for storing data that is rarely used, such as archival data that does not change and is rarely accessed. This provides almost 60% energy savings. Another aspect of the library’s sustainable data storage practices focuses on data minimization. The library stores material in file types that meet international standards and that can also be compressed to reduce the total volume of information we store, such as the JPEG 2000 file format. Our data is also stored in what is often referred to as a “cold climate” data storage facility.
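The bit-rot checking mentioned above relies on fixity checksums. A minimal sketch of the idea in Python (the function names, chunk size, and choice of SHA-256 are illustrative assumptions, not the library's actual implementation):

```python
import hashlib
from pathlib import Path

def compute_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute a fixity checksum ("fingerprint") for a file, reading in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path: Path, stored_checksum: str) -> bool:
    """Bit-rot check: compare the stored checksum against a freshly computed one."""
    return compute_checksum(path) == stored_checksum
```

A preservation system would store the checksum at ingest and re-run the comparison on each scheduled check or retrieval, as the abstract describes.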
The National Library's northern storage site is in Mo i Rana, a city 30 kilometers south of the Arctic Circle, where the storage facilities are built into the side of Mofjellet mountain. For seven months of the year, the monthly average temperature is below 0 degrees Celsius. This stable, cold climate requires less energy to keep the storage servers cool. Finally, the library uses 98% renewable energy, including wind and hydroelectric sources, to maintain this infrastructure. There are still more measures the library can take to improve sustainability in our operations. For example, we soon plan to further optimize our energy use by recycling heat from the data center to warm buildings. Another area of improvement is our file degradation checking, which is not as efficient as it could be. We use checksum technology to check for bit rot: all preserved files are assigned a checksum, or fingerprint, and computing power is needed every time a check is run to confirm that a file has not changed. We compare the stored checksum against the calculated checksum for a file each time it is retrieved from our digital preservation system, but this is processing that could be avoided if we used technology that more effectively maintained the integrity of a file.
9:30am - 9:38am
Environmental Issues on the Web: Building and Promoting a Thematic Archive
National Library of France, France
In 2020, our institution took part in the IIPC Climate Change collaborative collection and drew inspiration from this initiative to set up its own collection on environmental issues. We felt it was essential to include these major issues for our contemporary society in our collections. That is why, since 2020, we have been launching an annual collection entitled ‘Environmental Issues’. The aim of this collection is to highlight expressions, reactions, actions, representations, or reflections relating to environmental issues on the internet. It comprises eight themes, in order to cover the multiple aspects of these issues (scientific, economic, artistic, etc.) as well as the different types of website producers. It currently has more than 800 selections made internally by librarians, as well as by partner libraries in the regions. In this lightning talk, we would like to present this collaborative collection on a national scale, as well as the various initiatives implemented to promote it to the public. In December 2023, we published a thematic and edited selection of archived pages (also known as a “guided tour”) about “The environment on the web”. This tour is divided into 14 themes such as “Issues, Concepts and Theories”, “Biodiversity and Species Extinction”, “Urban Planning and Land Use”, and “Everyday Citizen Action”. As our collections can only be accessed within the research rooms of our library, we have also published on our website the seed list of this collection as well as a version of the tour with screenshots, for which we asked the website owners' authorization. This collection and its promotion are a good example of how we build and develop a thematic collection in our library and how we can help the public better understand the challenges posed by climate change.
9:38am - 9:46am
Storing URLs, targets, and other time-varying entities in a database as a path to sustainable recordkeeping
Hungarian National Museum Public Collection Centre, National Széchényi Library, Hungary
A recurring problem with mass web archiving, e.g., at the top-level-domain scale, is how to record the targeted content and the changes in the associated URL(s) over time. This issue is related to seed list maintenance: in the case of larger harvests, it is necessary to exclude websites that were previously saved but are no longer functional, meaning that there is no longer any content behind a given URL, or it no longer belongs to that website. The lightning talk presents a flexible concept that can be used to manage the relationships between URLs of different structures (with or without the http or https protocol, with or without www), their changes over time, and their connection to the website as an entity. The essence of the solution is an entity-based SQL database that is capable of recording all changes over time in a non-redundant manner by ensuring Third Normal Form (3NF). The main entities stored in the database, such as target and URL, are linked to each other, to themselves, and to tables containing information about them using junction tables. This solution ensures scalability: the information stored about each entity can be expanded arbitrarily, and the 'date_from' and 'date_to' fields in the junction tables can be used to record when the given relations were valid. Linking the entity tables to themselves allows us, for example, to link alternative URLs to each other over time. The information stored about each entity allows for complex queries; for example, in the case of the target, the type (website, web page, file, etc.) is stored in a separate table, as is the status code in the case of URLs.
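The entity-and-junction-table design described above can be sketched with SQLite; apart from the 'date_from' and 'date_to' fields named in the abstract, the table and column names below are illustrative assumptions, not the actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target (
    target_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    type      TEXT NOT NULL           -- website, web page, file, ...
);
CREATE TABLE url (
    url_id  INTEGER PRIMARY KEY,
    address TEXT NOT NULL UNIQUE      -- with/without protocol or www
);
-- Junction table linking targets to URLs, with a validity interval
CREATE TABLE target_url (
    target_id INTEGER REFERENCES target(target_id),
    url_id    INTEGER REFERENCES url(url_id),
    date_from TEXT NOT NULL,
    date_to   TEXT                    -- NULL = relation still valid
);
""")

def urls_for_target(conn, target_id, on_date):
    """Which URL(s) belonged to a given target during a given period?"""
    rows = conn.execute("""
        SELECT u.address FROM url u
        JOIN target_url tu ON tu.url_id = u.url_id
        WHERE tu.target_id = ?
          AND tu.date_from <= ?
          AND (tu.date_to IS NULL OR tu.date_to >= ?)
    """, (target_id, on_date, on_date)).fetchall()
    return [r[0] for r in rows]
```

Because each fact (target, URL, relation validity) lives in exactly one table, changes over time can be recorded without redundancy, as 3NF requires.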
The junction tables also ensure that changes over time are recorded, so that, for example, it is possible to query which URL belonged to a given entity (e.g., a file on a website) during a given period. All this contributes greatly to sustainability, as it provides a much more economical, easier-to-use, and more flexible query solution than previous data storage methods, such as Google Sheets spreadsheets.
9:46am - 9:54am
Web archiving automation at the Mexico Digital Preservation Group: error assessment and quality control
National Library of Mexico, Mexico; Digital Preservation Group, Mexico
In Mexico, progress continues to be made in web archiving, which has become a fundamental strategy for preserving digital heritage, especially given the volatile and ephemeral nature of online content. In this context, the Digital Preservation Group of Mexico (GPD) has experimented with an automated web archiving system to capture, store, and preserve digital resources relevant to the country's collective memory. This study focuses on detecting errors during the capture process and on the strategies applied to ensure the quality of the resulting archives. Using an empirical, applied approach that combines observation and experimentation to address practical problems, the automated tool Browsertrix (from Webrecorder) was used, along with systematic reviews of the files generated in WARC format. Twenty-four websites were captured in 2025, including catalogs, databases, and repositories. The analysis focused on the frequency, type, and cause of detected errors (e.g., broken links, missing sitemaps, uncaptured dynamic content, JavaScript issues, or multimedia format problems) and on the effectiveness of the applied quality control mechanisms. The results reveal that while automation allows for a significant increase in archiving coverage, it also introduces considerable technical challenges, which we will discuss in the lightning talk. Recurring error patterns were identified, linked to highly dynamic sites with complex structures, highlighting the need for specialized configurations and iterative validation processes. The importance of establishing contextualized quality criteria, beyond purely technical parameters, is also discussed, integrating aspects of cultural, institutional, and legal relevance.
The lightning talk concludes with a series of practical recommendations for similar projects in Latin American contexts, emphasizing the importance of a flexible technical infrastructure, automated monitoring capabilities, and a clear policy for collaborative digital preservation. This work contributes to the development of standards and best practices for institutional web archiving in the region, and opens the door to future research on the automated curation and preservation of emerging content such as social networks, alternative media, and ephemeral resources.
9:54am - 10:02am
Sustainable and systematic: building a search index of research and practice in web archiving and digital preservation
Digital Preservation Coalition, United Kingdom; IIPC, United States of America; Cartlann Digital Services, Ireland
Over the years, through events such as the IIPC Web Archiving Conference, iPRES (the International Conference on Digital Preservation), and various collaborative projects, the digital preservation and web archiving communities have built an extensive repository of knowledge. However, a persistent challenge has been to provide a single, citable point of access to these dispersed resources. Our project introduces the Awesome Indexer [1], which brings together digital preservation and web archiving resources into a single search interface and database. Our key argument is that centralised discovery is crucial for the long-term sustainability of these resources, encouraging reuse of and investment in those resources rather than attempting to replace them. This tool works by accepting a range of standardised bookmark and bibliographic sources, such as Awesome Lists, Zotero [2], and Zenodo collections. Zotero is a particularly powerful source, as the established tools and workflows around Zotero collection management make it easy to pull in records from a wide range of sources, from traditional publisher websites through to YouTube playlists and content hosted by digital libraries [3]. The Awesome Indexer combines the data from these sources to generate a dedicated faceted search system, built using off-the-shelf tools and packaged as a simple static website. It also creates SQLite and Apache Parquet versions of the same data, allowing richer exploration and analysis of the sources in the index. The Indexer is an open-source tool that can be used by anyone to build their own index. This “work-in-progress” short talk will briefly trace the development of the Indexer, detailing the steps it required and the challenges posed by its underlying resources.
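The consolidation idea (merging records from several bookmark and bibliographic sources into one searchable SQLite database) can be illustrated with a minimal Python sketch; the record fields and the facet on 'source' are assumptions for illustration, not the Awesome Indexer's actual data model:

```python
import sqlite3

def build_index(records):
    """Merge records from multiple bookmark/bibliographic sources into one
    SQLite table: a simplified sketch of the consolidation idea."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT, source TEXT, url TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?)",
                     [(r["title"], r["source"], r["url"]) for r in records])
    return conn

def search(conn, keyword, source=None):
    """Keyword search with an optional 'source' facet filter."""
    sql = "SELECT title FROM docs WHERE lower(title) LIKE ?"
    params = [f"%{keyword.lower()}%"]
    if source is not None:
        sql += " AND source = ?"
        params.append(source)
    return [row[0] for row in conn.execute(sql, params)]
```

The same merged table could then be exported as the SQLite and Parquet artefacts the abstract mentions, or fed to a static faceted-search front end.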
The current version of the Digital Preservation Publications Index (DPPI) will be demonstrated to highlight how the Indexer consolidates decades of content from across multiple platforms into a single, comprehensive entry point. This significantly improves discoverability, facilitates citation, contributes to training, and maximises the impact of our collective knowledge for practitioners and researchers.
References
[3] An example of a web archiving collection hosted by the University of North Texas Digital Library: https://digital.library.unt.edu/explore/partners/IIPC/
10:02am - 10:10am
Querying the archived web with an AI assistant
Aarhus University, Denmark; Macquarie University, Australia
The archived web is an indescribably rich primary source for contemporary history. However, only a handful of historians have started including the archived web in their source material when investigating phenomena from the 1990s and 2000s (Mackinnon, 2022; Millward, 2025; Winters, 2017). This lightning talk presents exploratory work on discovering and exploring content from web archives through an *AI Research Assistant* and research questions from the discipline of history.
10:10am - 10:18am
Online annotation platform for web archives
Arquivo.pt, Portugal
Search engine evaluation relies heavily on high-quality test collections that reflect user information needs and relevance judgments. However, building such collections is resource-intensive, requiring systematic annotation of queries and results. The service is a web-based platform designed to streamline this process by enabling the annotation of search engine results in a user-friendly and collaborative environment. The tool allows assessors to annotate retrieved documents according to predefined relevance criteria, supporting the creation of standardized datasets for training, tuning, and benchmarking retrieval models. Our web archive is a research infrastructure that provides tools to preserve and exploit data from the web to meet the needs of scientists and ordinary citizens, and our mission is to provide digital infrastructures to support the academic and scientific community. However, until now, our web archive has focused on collecting data from websites hosted under the .PT domain, which is not enough to guarantee the preservation of relevant content for the academic and scientific community. Our web archive provides a “Google-like” service that enables searching pages and images collected from the web since the 1990s. Note that web archive search complements live-web search engines because it enables temporal search over information that is no longer available online on its original websites. Developed within the context of our web archive, the service facilitates the generation of reliable ground-truth data, while remaining adaptable to different domains and languages. By lowering the barriers to annotation, this platform contributes to the reproducibility, scalability, and improvement of search technologies.
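Relevance judgments of the kind such a platform collects are conventionally exported in TREC-style "qrels" form for benchmarking retrieval models. A minimal sketch (the field names and the 0-2 relevance scale are illustrative assumptions, not the platform's actual format):

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    """One assessor's relevance judgment for a (query, document) pair."""
    query_id: str
    document_url: str
    relevance: int  # e.g. 0 = not relevant, 1 = partially relevant, 2 = relevant

def to_qrels(judgments):
    """Serialize judgments as TREC-style qrels lines: 'query_id 0 doc relevance'."""
    return "\n".join(
        f"{j.query_id} 0 {j.document_url} {j.relevance}" for j in judgments
    )

def precision_at_k(judgments, ranked_urls, k, threshold=1):
    """Fraction of the top-k results judged at or above the relevance threshold."""
    relevant = {j.document_url for j in judgments if j.relevance >= threshold}
    return sum(1 for url in ranked_urls[:k] if url in relevant) / k
```

A ground-truth dataset in this shape is what makes benchmarking and tuning of web-archive retrieval models reproducible.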
The main objective is to provide, in the future, a dataset with public access to support researchers. This contributes to comparing users' search behavior between live-web and web-archive search engines.
10:18am - 10:26am
Warc School - fellowship & training program update
College of Wooster Libraries, United States of America; Shift Collective
Archiving the Black Web was founded on a commitment to create pathways for underrepresented voices and marginalized communities to access web archiving skills, knowledge, and networks. Our work addresses not only “ensuring equitable access to archived web content,” but also ensuring equitable access to who gets to participate in the practice of web archiving and what gets privileged to be part of a web archive collection. At IIPC WAC 2024, Archiving the Black Web shared details about our project’s efforts to reduce these disparities with the upcoming launch of our fellowship and training program, Warc School. Developed for memory workers dedicated to collecting and preserving Black history and culture online, the fellowship offers web archiving training to enhance their memory work or digital content creation practice. In April 2025, Warc School welcomed 22 fellows representing traditional archives, community-based archives, Historically Black Colleges and Universities, public libraries, and independent scholars and creators to complete our 10-month training program, which includes five courses and a practicum. In this session, join Archiving the Black Web for a brief update on lessons learned while developing a training program and its curriculum and recruiting fellows and faculty, as well as highlights from student practicum projects. Attendees will also hear about our new initiative to strengthen social sustainability, with details about the launch of our second cohort. This cohort will include fellowship opportunities not only for memory workers but also for journalists at Black newspapers interested in digital preservation through web archiving training. Information integrity and ethical considerations related to artificial intelligence will be incorporated into the 2026 Warc School curriculum.
10:26am - 10:34am
Organizing the 'Social Mess': a comprehensive Tool for Social Media and Instant Messaging Archiving
University of Pavia, Italy; University of Bologna, Italy
The exponential growth of digital content through social media and instant messaging platforms presents critical challenges for digital preservation. Born-digital communications—created in fragmented, proprietary environments where personal and public spheres overlap—remain largely excluded from systematic archival practices despite their historical and cultural significance. Within the national archival context, there are no comprehensive tools to preserve and manage these materials for individuals, institutions, or public figures whose digital traces hold substantial value for future research. This gap affects personal archives of political and institutional figures and collections of broader cultural relevance. As part of a collaborative research initiative on preserving contemporary digital archives, we are developing a software tool for individual users and institutional archivists. This collaborative effort, which draws on our professional experience, highlights an urgent need to address technical and methodological shortcomings in this field. Existing tools—typically command-line utilities or platform-specific applications—allow for the separate management of content from social media, messaging services, email, etc., but do not provide integrated support within a unified solution. Our framework, in contrast, is comprehensive in its capacity to manage the complete spectrum of digital materials: traditional files alongside social media content, instant messages, and emails within a unified environment. This comprehensive approach addresses the complexity of contemporary digital archives. The software enables users to reorganize their materials systematically, making it valuable for a variety of contexts.
Use cases include individuals managing personal digital heritage, prominent figures preparing materials for donation, and institutions controlling and facilitating access to collections. Our Java-based solution integrates core modules, ensuring usability and data integrity. Operating through manual download and ingest processes rather than APIs, it provides user control while supporting standard formats (JSON, CSV) for interoperability. The embedded database and exclusive use of open-source libraries enable platform-independent installation without external dependencies. Key functionalities include AES-256 encryption, automatic backups, metadata extraction, device synchronization, and granular permissions. Critically, access settings apply at both the file and individual-message levels, which is essential for managing diverse privacy requirements and enabling selective disclosure within complex digital collections. Currently under active development, the project aims to support institutions in visualizing and managing heterogeneous digital materials, to enhance accessibility for researchers through reorganization and categorization tools, and to foster inter-institutional collaboration. This session will provide participants, particularly archivists and records managers, with an overview of a collaborative project and its outcomes, highlighting an integrated approach that offers significant advancements for digital preservation practice and academic scholarship.
10:34am - 10:42am
Social media archiving, right now
Digital Preservation Coalition, United Kingdom
As funding cuts bite, some organisations have had to shut down offices and services at very short notice. These closures put history at risk, especially where social media is concerned. Interactions with patrons and the wider public are a crucial part of the function of any modern organisation, and their content, comments, and context are important historical records. These should not be lost simply because funding has been pulled at short notice. Unfortunately, in situations like this, already cash-strapped archives are rapidly swamped and struggling to cope with the deluge of digital records and requests for assistance. The individuals with access to the social media accounts are often not the archivists themselves, nor do they have the archival or technical skills required to archive things alone. Short of time and resources, what should they do? And with little hope of booming budgets anytime soon, what are the most sustainable approaches for the safekeeping of these complex records? This presentation will share a wide range of lessons learned while attempting to assist organisations as they rush to capture what they can from Facebook, Instagram, LinkedIn, X/Twitter, and Flickr. The investigation considered and experimented with a range of strategies, including direct web archiving, API access, third-party archiving services, and data exports, combined with tools like Browsertrix, ArchiveBox, and wget. The advantages and limitations of these approaches will be explored and compared, highlighting the gaps between what is possible and what is practical in the context of an urgent shutdown operation.
