Conference Agenda

Session Overview
Date: Tuesday, 08/Apr/2025
9:00am - 9:40am REGISTRATION: General Assembly (For IIPC members only)
9:40am - 9:50am Opening Remarks
Location: Målstova (upstairs)
9:50am - 10:00am Chair Address
Location: Målstova (upstairs)
10:00am - 10:45am IIPC Strategic Plan 2026-2030
Location: Målstova (upstairs)
10:45am - 11:15am BREAK
Location: Folkestova (upstairs)
If you signed up for a guided exhibition tour, please be in the exhibition room at 10:45. To check whether you signed up for a tour, see your registration details in ConfTool.
11:15am - 12:45pm Framework for Tools Sustainability
Location: Målstova (upstairs)
11:15am - 12:45pm Content Development Working Group Meeting
Location: Slottsbiblioteket (ground floor)
11:15am - 12:45pm TBC
Location: VIP - rommet (upstairs)
12:45pm - 2:00pm LUNCH
Location: CREDO Restaurant | Kantine (downstairs)
If you signed up for a guided exhibition tour, please be in the exhibition room at 12:50. To check whether you signed up for a tour, see your registration details in ConfTool.
2:00pm - 3:30pm Research Working Group Meeting
Location: Målstova (upstairs)
2:00pm - 3:30pm Training Working Group Meeting
Location: Slottsbiblioteket (ground floor)
Actual session length: 60 minutes
2:00pm - 3:30pm TBC
Location: VIP - rommet (upstairs)
3:30pm - 4:00pm BREAK
Location: Folkestova (upstairs)
If you signed up for a guided exhibition tour, please be in the exhibition room at 3:30. To check whether you signed up for a tour, see your registration details in ConfTool.
4:00pm - 5:30pm Crawling National Domain: Towards Best Practices
Location: Målstova (upstairs)
4:00pm - 5:30pm TWG WORKSHOP: Case Studies ‘Write-a-thon’ - Documenting Best Practices
Location: Slottsbiblioteket (ground floor)
4:00pm - 5:30pm TBC
Location: VIP - rommet (upstairs)
7:00pm - 9:00pm WELCOME RECEPTION
Location: Folkestova (upstairs)
[IIPC Members Only] Includes light refreshments and drinks. Attendees are encouraged to have dinner beforehand.
Date: Wednesday, 09/Apr/2025
9:00am - 9:40am REGISTRATION: Web Archiving Conference (WAC)
9:40am - 9:50am Opening Remarks
Location: Målstova (upstairs)
Streamed to Store Auditorium.
9:50am - 10:45am Opening Keynote: Libraries, Copyright, and Language Models
Location: Målstova (upstairs)
Session Chair: Andrew Jackson, Digital Preservation Coalition
Streamed to Store Auditorium.
10:45am - 10:55am SHORT BREAK
Streaming video from Målstova to Store Auditorium ends. Lightning Talk Session 2 will begin in the Store Auditorium after the break.
10:55am - 11:00am LIGHTNING TALK SESSION 1: INTRODUCTION
Location: Målstova (upstairs)
Session Chair: Ben Els, National Library of Luxembourg
10:55am - 11:00am LIGHTNING TALK SESSION 2: INTRODUCTION
Location: Store Auditorium (ground floor)
Session Chair: Sawood Alam, Internet Archive
11:00am - 11:25am LIGHTNING TALK SESSION 1
Location: Målstova (upstairs)
Session Chair: Ben Els, National Library of Luxembourg
11:00am - 11:25am LIGHTNING TALK SESSION 2
Location: Store Auditorium (ground floor)
Session Chair: Sawood Alam, Internet Archive
11:25am - 11:55am BREAK
Location: Folkestova (upstairs)
Participants in the 2025 Mentoring Program can meet at the top of the old granite stairs outside Målstova. Seating is available in the cafeteria/bar (upstairs) and in the library hallways (upstairs and ground floor). If the weather is nice, there are also small parks immediately in front of and behind the National Library building.
11:55am - 1:00pm PANEL #01: Engaging Audiences
Location: Målstova (upstairs)
Session Chair: Eveline Vlassenroot, University of Ghent
11:55am - 1:00pm SESSION #01: Tools Under Construction: Lessons Learned (National Library Perspective)
Location: Store Auditorium (ground floor)
Session Chair: Katherine Boss, National Library of Norway
11:55am - 1:00pm WORKSHOP #01: Exploring Dilemmas in the Archiving of Legacy Webportals: An Exercise in Reflective Questioning
Location: Slottsbiblioteket (ground floor)
Since 2023, the National Library of the Netherlands (KBNL) has been proud to curate a digital collection that has become UNESCO world heritage: the Digital City (De Digitale Stad, henceforth: DDS). The collection consists of an original freeze from 1996, two student projects, and miscellaneous material contributed by users and founders over the course of multiple events. The two student projects were the first attempts to revive the DDS portal and store it as a disk image. The two groups of students used two different methods for this revival: one based on emulation, the other on migration. But what choices were made during restoration, and which version is more authentic? Furthermore, KBNL holds several websites, scientific articles and newspaper clippings in its collections that might serve as context information. Do we consider this context information crucial for understanding DDS, or do we leave users to find these resources themselves if they are interested?

 

Even without considering the plethora of archival material that currently makes up DDS, the original portal was already a mixed bag of different protocols. Most of these, such as IRC and Usenet newsgroups, are no longer mainstream and were never part of DDS itself but were only linked to. The portal also contained links to offsite websites that were not archived, such as some of the users' homepages or ‘houses’. The original hardware, which is not part of the collection, ran proprietary software that is now thoroughly obsolete. There was a multi-user dungeon where users could program their own objects, but this depended on real-time user interaction. Some of the functionality depended on live data that is no longer available, such as who was logged in. The original software was command-line based and built on Free-net software. Shortly after the initial launch an HTML interface was introduced. Even then, the command-line interface remained available for less-privileged users. The navigation of the HTML version relied heavily on image maps that require a binary executable to function correctly. From newspaper evidence we can gather that functionality was sometimes unavailable or stopped working. There was both a general part of the portal and a personalized part based on login, the latter also containing email. There have also been cases of harmful or polarizing content being published in newsgroups. At the time the norm was self-regulation by the community and laissez-faire, but times have changed and our users may now expect a more active approach to regulation, or at least some form of acknowledgement, from us as heritage organizations.

 

As this description shows, archiving DDS and making it accessible to our users involves a great deal of complexity. Deciding what to archive and how to present it raises many difficult dilemmas. Do we want users to experience what it was like to create a homepage in DDS, or do we want to present a historically correct picture of the homepages that existed at the time? What should be considered part of the object and what part of the context? Is the migrated or the emulated version more authentic? What is more important, the privacy of the original users or providing full access to researchers? What do we consider to belong to DDS and what not? Only the HTML? Or also any newsgroup material that might still be online but isn’t part of the archival material? Do users want a truly authentic experience or rather a convenient way of viewing the content?

 

Even though DDS was a Dutch portal, it was based on the software of the American Free-nets and inspired other cities in Europe and Asia. We therefore think this case has many recognizable features that also apply to the archiving of other legacy portals. Arguably, there are no right or wrong answers. These are typically dilemmas in which each option has both benefits and drawbacks. In our workshop we want to present a couple of these real-world dilemmas to participants to stimulate discussion based on principles of reflective questioning and open dialogue. The idea is that we present a few cases related to DDS that participants can discuss in groups. Each group has to choose a preferred solution and present its reasoning to the other participants. People are encouraged to explore the reasons for choosing one option or the other, for instance by reflecting on their own organizational context or personal assumptions regarding digital preservation. We try to stay away from providing clear-cut answers or guidance and instead give participants the opportunity to explore these questions together. Participants will learn how to ask the right questions to delve deeper into their own reasoning during decision making, based on our method of reflective questioning. Participants should be able to use this method and the cases presented to benefit their own curatorial decision-making process regarding legacy webportals in their own collections. For KBNL, the group discussions may provide important community input and food for thought on some of the decisions we will be making regarding DDS in the near future.
1:00pm - 2:00pm LUNCH
Location: CREDO Restaurant | Kantine (downstairs)
If you signed up for a guided exhibition tour, please be in the exhibition room at 13:05. To check whether you signed up for a tour, see your registration details in ConfTool.
2:05pm - 3:40pm SESSION #02: Crawling Tools
Location: Målstova (upstairs)
Session Chair: László Tóth, National Library of Luxembourg
2:05pm - 3:40pm SESSION #03: Advocacy & User Engagement
Location: Store Auditorium (ground floor)
Session Chair: Mark Phillips, University of North Texas Libraries
2:05pm - 3:40pm WORKSHOP #02: Web Archive Collections As Data
Location: Slottsbiblioteket (ground floor)
3:40pm - 4:10pm BREAK
Location: Folkestova (upstairs)
Participants in the 2025 Mentoring Program can meet at the top of the old granite stairs outside Målstova. Seating is available in the cafeteria/bar (upstairs) and in the library hallways (upstairs and ground floor). If the weather is nice, there are also small parks immediately in front of and behind the National Library building.
4:10pm - 4:20pm POSTER SLAM INTRO
Location: Målstova (upstairs)
Session Chair: Olga Holownia, IIPC
Streamed to Store Auditorium.
4:20pm - 4:40pm POSTER SLAM
Location: Målstova (upstairs)
Session Chair: Olga Holownia, IIPC
Streamed to Store Auditorium.
4:40pm - 6:00pm POSTER SESSION
Location: Folkestova (upstairs)
7:30pm - 9:30pm DINNER
Location: CREDO Restaurant | Kantine (downstairs)
Date: Thursday, 10/Apr/2025
9:00am - 9:20am MORNING COFFEE
Location: Folkestova (upstairs)
9:20am - 9:25am LIGHTNING TALK SESSION 3: INTRODUCTION
Location: Målstova (upstairs)
Session Chair: Helena Byrne, British Library
9:20am - 9:25am LIGHTNING TALK SESSION 4: INTRODUCTION
Location: Store Auditorium (ground floor)
Session Chair: Dorothée Benhamou-Suesser, National Library of France
9:25am - 9:55am LIGHTNING TALK SESSION 3
Location: Målstova (upstairs)
Session Chair: Helena Byrne, British Library
9:25am - 9:55am LIGHTNING TALK SESSION 4
Location: Store Auditorium (ground floor)
Session Chair: Dorothée Benhamou-Suesser, National Library of France
9:55am - 10:05am SHORT BREAK
10:05am - 11:15am SESSION #04: Discovery & Access (News/Newspapers)
Location: Målstova (upstairs)
Session Chair: Tita Enstad, National Library of Norway
10:05am - 11:15am SESSION #05: Sustainability
Location: Store Auditorium (ground floor)
Session Chair: Bjarne Andersen, Royal Danish Library
10:05am - 11:15am WORKSHOP #03: Introduction to Web Graphs
Location: Slottsbiblioteket (ground floor)

The workshop will begin with a brief introduction to the concept of the webgraph, or hyperlink graph: a directed graph whose nodes correspond to web pages and whose edges correspond to hyperlinks from one web page to another. We will also look at aggregations of the page-level webgraph at the level of Internet hosts or pay-level domains. The host-level and domain-level graphs are at least an order of magnitude smaller than the original page-level graph, which makes them easier to study.
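
As a rough illustration of the aggregation described above, the following Python sketch collapses a handful of page-level links into host-level edges. It is not taken from the workshop materials: the URLs, the `host_of` helper and the `aggregate_to_hosts` function are hypothetical, and the choice to drop links that stay within one host is an assumption made for this example.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host name of a URL (empty string if missing)."""
    return (urlsplit(url).hostname or "").lower()


def aggregate_to_hosts(page_edges):
    """Collapse page-level edges (source URL, target URL) into host-level edges.

    Assumption for this sketch: links that stay within a single host are
    dropped, and parallel links between the same pair of hosts are merged.
    """
    host_edges = set()
    for source, target in page_edges:
        src_host, dst_host = host_of(source), host_of(target)
        if src_host and dst_host and src_host != dst_host:
            host_edges.add((src_host, dst_host))
    return host_edges


# Hypothetical page-level links, invented for illustration.
page_edges = [
    ("https://a.example/index.html", "https://a.example/about.html"),
    ("https://a.example/index.html", "https://b.example/"),
    ("https://a.example/about.html", "https://b.example/news"),
    ("https://b.example/news", "https://c.example/item?id=1"),
]

host_edges = aggregate_to_hosts(page_edges)
print(sorted(host_edges))
```

Four page-level links shrink to two host-level edges here. Aggregating to pay-level domains would additionally need a public suffix list, which this sketch leaves out.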

 

To represent and process webgraphs, we use the WebGraph framework, which was developed at the Laboratory for Web Algorithmics (LAW) of the University of Milan. As a "framework for graph compression aimed at studying web graphs," it allows very large webgraphs to be stored and accessed efficiently. Even on a laptop computer, it is possible to store and explore a graph with 100 million nodes and more than 1 billion edges. The WebGraph framework is also used to compress other types of graphs, such as social network graphs or software dependency graphs. In addition, the framework and related software projects include tools for analyzing web graphs and computing their statistical and topological properties. The WebGraph framework implements a number of graph algorithms, including PageRank and other centrality measures. It is an open-source Java project, and a re-implementation in the Rust language has recently been released. Over the past two decades, the WebGraph format has been widely used by researchers, for example those at LAW or Web Data Commons, to distribute graph dumps. It has also been used by open data initiatives, including the Common Crawl Foundation and the Software Heritage project.
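
To give a feel for why webgraphs compress well, here is a minimal Python sketch of gap encoding, one of the ideas that graph compression schemes of this kind build on. It is a simplified illustration only, not the WebGraph format itself, which combines gaps with further techniques and variable-length integer codes. The function names and the node IDs are made up for this example.

```python
def encode_gaps(successors):
    """Turn a sorted successor list into its first ID followed by gaps."""
    gaps, previous = [], None
    for node in successors:
        gaps.append(node if previous is None else node - previous)
        previous = node
    return gaps


def decode_gaps(gaps):
    """Rebuild the sorted successor list from the first ID and the gaps."""
    successors, current = [], 0
    for index, gap in enumerate(gaps):
        current = gap if index == 0 else current + gap
        successors.append(current)
    return successors


# Hypothetical successor list: pages tend to link to pages with nearby IDs
# when nodes are numbered in URL order, so the gaps are small numbers.
successors = [1000001, 1000002, 1000005, 1000009]
gaps = encode_gaps(successors)
print(gaps)  # [1000001, 1, 3, 4]
```

Small gaps need far fewer bits than full node IDs once a variable-length code is applied, which is where most of the saving comes from in this simplified picture.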

 

The workshop focuses on interactive exploration of one of the precompiled and publicly available webgraphs. We look at graph properties and metrics, learn how to map between node identifiers (just numbers) and node labels (URLs), and compute the shortest path between two nodes. We also show how to detect "cliques", i.e. densely connected subgraphs, and how to run PageRank and related centrality algorithms to rank the nodes of our graph. We share our experiments on how these applications are used for collection curation: how cliques can be used to discover sites with content in a regional language, how link spam is detected, and how global domain ranks are used to select a representative sample of websites. Finally, we will build a small webgraph from scratch using crawl data.
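
The operations mentioned above can be sketched on a toy graph in plain Python. This is only an illustration under stated assumptions: it does not use the WebGraph framework or its API, the four host names and their links are invented, and `shortest_path` and `pagerank` are hypothetical helpers implementing breadth-first search and a basic power iteration.

```python
from collections import deque

# Node labels (host names) and the mapping from label to numeric node ID.
labels = ["a.example", "b.example", "c.example", "d.example"]
ids = {label: node for node, label in enumerate(labels)}

# Directed toy graph as adjacency lists: node ID -> list of successor IDs.
successors = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of node IDs or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; rank held by dangling nodes is spread uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        dangling = sum(rank[node] for node, out in graph.items() if not out)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in graph}
        for node, out in graph.items():
            for nxt in out:
                new_rank[nxt] += damping * rank[node] / len(out)
        rank = new_rank
    return rank


path = shortest_path(successors, ids["a.example"], ids["d.example"])
path_labels = [labels[node] for node in path]
ranks = pagerank(successors)
top_label = labels[max(ranks, key=ranks.get)]
print(path_labels)
print(top_label)
```

On this toy graph the shortest path from `a.example` to `d.example` runs through `c.example`, which is also the top-ranked node because it receives the most incoming rank. On graphs with hundreds of millions of nodes the same ideas apply, but a compressed representation and the framework's own implementations are needed.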

 

Participants will learn how to explore webgraphs (even large ones) in an interactive way and learn how graphs can be used to curate collections. Basic programming skills and basic knowledge of the Java programming language are a plus but not required. Since this is an interactive workshop, attendees should bring their own laptops, preferably with the Java 11 (or higher) JDK and Maven installed. Nevertheless, it will be possible to follow the steps and explanations without having to type them into a laptop. We will provide download and installation instructions, as well as all teaching materials, prior to the workshop.

11:15am - 11:45am BREAK
Location: Folkestova (upstairs)
11:45am - 1:15pm PANEL #02: Cross-Institutional Collaborations
Location: Målstova (upstairs)
Session Chair: Abbie Grotke, Library of Congress
11:45am - 1:15pm SESSION #06: Curating Social Media
Location: Store Auditorium (ground floor)
Session Chair: Tom Smyth, Library and Archives Canada
11:45am - 1:15pm WORKSHOP #04: How to Develop a New Browsertrix Behavior
Location: Slottsbiblioteket (ground floor)

Behaviors are a key part of Browsertrix and Browsertrix Crawler: they let the crawler's browsers automatically take certain actions on web pages to help capture important content. This tutorial will walk attendees through the process of creating a new behavior and using it with Browsertrix Crawler.

 

Browsertrix Crawler includes a suite of standard behaviors, including auto-scrolling pages, auto-playing videos, and capturing posts and comments on particular social media sites. By default, all of the standard behaviors are enabled for each crawl. Users can instead disable behaviors entirely or select only a subset of the standard behaviors to use on a crawl.

 

At times, users may need additional custom behaviors that automatically navigate and interact with a site in specific ways during crawling, so that the resulting web archive and replay reflect the full experience of the live site. For instance, a new behavior could click on interactive buttons in a particular order, “drive” interactive components on a page, or open up posts sequentially on a new social media site and load comments.

 

This tutorial will walk through the process of creating a new behavior step by step, using the existing written tutorial for creating new behaviors on GitHub as a model. In addition to demonstrating how to write a behavior’s code (using JavaScript), the tutorial will also discuss how to know when a behavior is the appropriate solution for a given crawling problem, how to test behaviors during development, how to use custom behaviors with Browsertrix Crawler running locally in Docker, and finally how to use custom behaviors from the Browsertrix web interface (a feature that is currently planned and will be completed by the conference date).

 

Participants will not be expected to write any code or follow along on their own laptops in real time during the tutorial. The purpose is instead to demonstrate how one would approach developing a new behavior, to lower the barrier to entry for developers and practitioners who may be interested in doing so, and to give attendees the opportunity to ask questions of Webrecorder developers in real time. We would also love to foster a conversation about how to develop a community library of available behaviors, making it easier than ever for users to find and use behaviors that meet their needs.

 

The tutorial will be led by Ilya Kreymer and Tessa Walsh, developers at Webrecorder with intimate knowledge of the Browsertrix ecosystem. The target audience is technically minded web archiving practitioners and developers, in other words, people who could either write new custom behaviors themselves or communicate the salient points to developers at their institutions. Because this is not a hackathon-style workshop, the tutorial could have as many participants as the venue allows. By the conclusion of the tutorial, attendees should understand how Browsertrix behaviors work, when developing a new behavior is a good solution to their problems, the steps involved in developing and testing a new behavior, and where to find additional resources to help them along the way. Our hope is to foster a decentralized community of practice around behaviors to the benefit of the entire IIPC community.

1:15pm - 2:15pm LUNCH
Location: CREDO Restaurant | Kantine (downstairs)
If you signed up for a guided exhibition tour, please be in the exhibition room at 13:20. To check whether you signed up for a tour, see your registration details in ConfTool.
2:15pm - 3:40pm SESSION #07: Research & Access
Location: Målstova (upstairs)
Session Chair: Marie Roald, National Library of Norway
2:15pm - 3:40pm SESSION #08: Handling What You Captured
Location: Store Auditorium (ground floor)
Session Chair: Meghan Lyon, Library of Congress
2:15pm - 3:40pm PANEL #03: Cross-Institutional Collaboration: the End of Term Archive
Location: Slottsbiblioteket (ground floor)
Session Chair: Jeffrey van der Hoeven, National Library of the Netherlands (KB)
3:40pm - 4:10pm BREAK
Location: Folkestova (upstairs)
4:10pm - 5:05pm Closing Keynote: Quantifying Complexity: Using Web Data to Decode Online Public Debate
Location: Målstova (upstairs)
Session Chair: Jon Carlstedt Tønnessen, National Library of Norway
Streamed to Store Auditorium.
5:05pm - 5:30pm Closing Remarks
Location: Målstova (upstairs)
Streamed to Store Auditorium.

 
Conference: IIPC WAC 2025