Conference Agenda

Overview and details of the sessions of this conference.

Session Overview

Date: Tuesday, 08/Apr/2025

9:00am - 9:40am

REGISTRATION: General Assembly (For IIPC members only)

9:40am - 9:50am

Opening Remarks
Location: Målstova (upstairs)

9:50am - 10:00am

Chair Address
Location: Målstova (upstairs)

10:00am - 10:45am

IIPC Strategic Plan 2026-2030
Location: Målstova (upstairs)

10:45am - 11:15am

BREAK
Location: Folkestova (upstairs)

If you signed up for a guided exhibition tour, please be in the exhibition room at 10:45am. To check whether you signed up for a tour, see your registration details in ConfTool.

11:15am - 12:45pm

Framework for Tools Sustainability
Location: Målstova (upstairs)

11:15am - 12:45pm

Content Development Working Group Meeting
Location: Slottsbiblioteket (ground floor)

11:15am - 12:45pm

TBC
Location: VIP - rommet (upstairs)

12:45pm - 2:00pm

LUNCH
Location: CREDO Restaurant | Kantine (downstairs)

If you signed up for a guided exhibition tour, please be in the exhibition room at 12:50pm. To check whether you signed up for a tour, see your registration details in ConfTool.

2:00pm - 3:30pm

Research Working Group Meeting
Location: Målstova (upstairs)

2:00pm - 3:30pm

Training Working Group Meeting
Location: Slottsbiblioteket (ground floor)

Actual session length: 60 minutes

2:00pm - 3:30pm

TBC
Location: VIP - rommet (upstairs)

3:30pm - 4:00pm

BREAK
Location: Folkestova (upstairs)

If you signed up for a guided exhibition tour, please be in the exhibition room at 3:30pm. To check whether you signed up for a tour, see your registration details in ConfTool.

4:00pm - 5:30pm

Crawling National Domain: Towards Best Practices
Location: Målstova (upstairs)

4:00pm - 5:30pm

TWG WORKSHOP: Case Studies ‘Write-a-thon’ - Documenting Best Practices
Location: Slottsbiblioteket (ground floor)

4:00pm - 5:30pm

TBC
Location: VIP - rommet (upstairs)

7:00pm - 9:00pm

WELCOME RECEPTION
Location: Folkestova (upstairs)

[IIPC Members Only] Includes light refreshments and drinks. Attendees are encouraged to have dinner beforehand.

Date: Wednesday, 09/Apr/2025

9:00am - 9:40am

REGISTRATION: Web Archiving Conference (WAC)

9:40am - 9:50am

Opening Remarks
Location: Målstova (upstairs)

Streamed to Store Auditorium.

9:50am - 10:45am

Opening Keynote: Libraries, Copyright, and Language Models
Location: Målstova (upstairs)
Session Chair: Andrew Jackson, Digital Preservation Coalition

Streamed to Store Auditorium.

10:45am - 10:55am

SHORT BREAK

Streaming video from Målstova to Store Auditorium ends. Lightning Talk Session 2 will begin in the Store Auditorium after the break.

10:55am - 11:00am

LIGHTNING TALK SESSION 1: INTRODUCTION
Location: Målstova (upstairs)
Session Chair: Ben Els, National Library of Luxembourg

10:55am - 11:00am

LIGHTNING TALK SESSION 2: INTRODUCTION
Location: Store Auditorium (ground floor)
Session Chair: Sawood Alam, Internet Archive

11:00am - 11:25am

LIGHTNING TALK SESSION 1
Location: Målstova (upstairs)
Session Chair: Ben Els, National Library of Luxembourg

11:00am - 11:25am

LIGHTNING TALK SESSION 2
Location: Store Auditorium (ground floor)
Session Chair: Sawood Alam, Internet Archive

11:25am - 11:55am

BREAK
Location: Folkestova (upstairs)

Participants in the 2025 Mentoring Program can meet at the top of the old granite stairs outside Målstova. Seating is available in the cafeteria/bar (upstairs) and in the library hallways (upstairs and ground floor). If the weather is nice, there are also small parks immediately in front of and behind the National Library building.

11:55am - 1:00pm

PANEL #01: Engaging Audiences
Location: Målstova (upstairs)
Session Chair: Eveline Vlassenroot, University of Ghent

11:55am - 1:00pm

SESSION #01: Tools Under Construction: Lessons Learned (National Library Perspective)
Location: Store Auditorium (ground floor)
Session Chair: Katherine Boss, National Library of Norway

11:55am - 1:00pm

WORKSHOP #01: Exploring Dilemmas in the Archiving of Legacy Webportals: An Exercise in Reflective Questioning
Location: Slottsbiblioteket (ground floor)

Since 2023, the National Library of the Netherlands (KBNL) has been proud to curate a digital collection that has become UNESCO world heritage: the Digital City (De Digitale Stad, henceforth DDS). The collection consists of an original freeze from 1996, two student projects, and miscellaneous material contributed by users and founders over the course of multiple events. The two student projects were the first attempts to revive the DDS portal and store it as a disk image. The two groups of students used different methods for this revival: one based on emulation, the other on migration. But what choices were made during restoration, and which version is more authentic? Furthermore, KBNL holds several websites, scientific articles and newspaper clippings in its collections that might serve as context information. Do we consider this context information crucial for understanding DDS, or do we leave users to find these resources themselves if they are interested?

Even without considering the plethora of archival material that currently makes up DDS, the original portal was already a mixed bag of different protocols. Most of them, such as IRC and Usenet newsgroups, are no longer mainstream, and they were never part of DDS itself but only linked to from it. The portal also contained links to offsite websites that were not archived, such as some of the users' homepages or ‘houses’. The original hardware, which is not part of the collection, ran proprietary software that is now thoroughly obsolete. There was a multi-user dungeon where users could program their own objects, but this depended on real-time user interaction. Some of the functionality depended on live data that is no longer available, such as who was logged in. The original software was command-line based and built on Freenet software. Shortly after the initial launch, an HTML interface was introduced. Even then, the command-line interface remained available for less-privileged users. The navigation of the HTML version relied heavily on image maps that require a binary executable to function correctly. Newspaper evidence shows that functionality was sometimes unavailable or stopped working. The portal had both a general part and a personalized part based on login, the latter also containing email. There have also been cases of harmful or polarizing content being published in newsgroups. At the time, the norm was self-regulation by the community and laissez-faire, but time has moved on and our users may have come to expect a more active approach to regulation, or at least some form of acknowledgement, from us as heritage organizations.

As this description shows, archiving DDS and making it accessible to our users is complex. Many difficult dilemmas arise when deciding what to archive and how to present it. Do we want users to experience what it was like to create a homepage in DDS, or do we want to present a historically correct picture of the homepages that existed at the time? What should be considered part of the object, and what part of the context? Is the migrated or the emulated version more authentic? What is more important: the privacy of the original users or full access for researchers? What do we consider to belong to DDS, and what not? Only the HTML? Or also any newsgroup material that might still be online but isn’t part of the archival material? Do users want a truly authentic experience or rather a convenient way of viewing the content?

Even though DDS was a Dutch portal, it was based on the software of the American Free-nets and inspired other cities in Europe and Asia. We therefore think this case has many recognizable features that also apply to the archiving of other legacy portals. Arguably, there are no right or wrong answers: these are dilemmas in which each option has both benefits and drawbacks. In our workshop, we will present a few of these real-world dilemmas to participants to stimulate discussion based on principles of reflective questioning and open dialogue. We will present a few cases related to DDS for participants to discuss in groups. Each group has to choose a preferred solution and present its reasoning to the other participants. People are encouraged to explore their reasons for choosing one option or the other, for instance by reflecting on their own organizational context or personal assumptions regarding digital preservation. We try to stay away from providing clear-cut answers or guidance and instead give participants the opportunity to explore these questions together. Participants will learn how to ask the right questions to delve deeper into their own reasoning process during decision making, based on our method of reflective questioning. They should be able to use this method and the cases presented to benefit their own curatorial decision-making process regarding legacy webportals in their own collections. For KBNL, the group discussions may provide important community input and food for thought on some of the decisions we will be making regarding DDS in the near future.

1:00pm - 2:00pm

LUNCH
Location: CREDO Restaurant | Kantine (downstairs)

If you signed up for a guided exhibition tour, please be in the exhibition room at 1:05pm. To check whether you signed up for a tour, see your registration details in ConfTool.

2:05pm - 3:40pm

SESSION #02: Crawling Tools
Location: Målstova (upstairs)
Session Chair: László Tóth, National Library of Luxembourg

2:05pm - 3:40pm

SESSION #03: Advocacy & User Engagement
Location: Store Auditorium (ground floor)
Session Chair: Mark Phillips, University of North Texas Libraries

2:05pm - 3:40pm

WORKSHOP #02: Web Archive Collections As Data
Location: Slottsbiblioteket (ground floor)

3:40pm - 4:10pm

BREAK
Location: Folkestova (upstairs)

4:10pm - 4:20pm

POSTER SLAM INTRO
Location: Målstova (upstairs)
Session Chair: Olga Holownia, IIPC

Streamed to Store Auditorium.

4:20pm - 4:40pm

POSTER SLAM
Location: Målstova (upstairs)
Session Chair: Olga Holownia, IIPC

Streamed to Store Auditorium.

4:40pm - 6:00pm

POSTER SESSION
Location: Folkestova (upstairs)

7:30pm - 9:30pm

DINNER
Location: CREDO Restaurant | Kantine (downstairs)

Date: Thursday, 10/Apr/2025

9:00am - 9:20am

MORNING COFFEE
Location: Folkestova (upstairs)

9:20am - 9:25am

LIGHTNING TALK SESSION 3: INTRODUCTION
Location: Målstova (upstairs)
Session Chair: Helena Byrne, British Library

9:20am - 9:25am

LIGHTNING TALK SESSION 4: INTRODUCTION
Location: Store Auditorium (ground floor)
Session Chair: Dorothée Benhamou-Suesser, National Library of France

9:25am - 9:55am

LIGHTNING TALK SESSION 3
Location: Målstova (upstairs)
Session Chair: Helena Byrne, British Library

9:25am - 9:55am

LIGHTNING TALK SESSION 4
Location: Store Auditorium (ground floor)
Session Chair: Dorothée Benhamou-Suesser, National Library of France

9:55am - 10:05am

SHORT BREAK

10:05am - 11:15am

SESSION #04: Discovery & Access (News/Newspapers)
Location: Målstova (upstairs)
Session Chair: Tita Enstad, National Library of Norway

10:05am - 11:15am

SESSION #05: Sustainability
Location: Store Auditorium (ground floor)
Session Chair: Bjarne Andersen, Royal Danish Library

10:05am - 11:15am

WORKSHOP #03: Introduction to Web Graphs
Location: Slottsbiblioteket (ground floor)

The workshop will begin with a brief introduction to the concept of the webgraph or hyperlink graph - a directed graph whose nodes correspond to web pages and whose edges correspond to hyperlinks from one web page to another. We will also look at aggregations of the page-level webgraph at the level of Internet hosts or pay-level domains. The host-level and domain-level graphs are at least an order of magnitude smaller than the original page-level graph, which makes them easier to study.
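
The aggregation from a page-level graph to a host-level graph can be illustrated with a minimal Python sketch. It does not use the WebGraph framework; the URLs are made-up examples, the helper names `host_of` and `aggregate_to_hosts` are hypothetical, and dropping links that stay within one host is an assumption made here for brevity.

```python
from urllib.parse import urlsplit

# Hypothetical page-level edges (source URL, target URL); illustrative data only.
page_edges = [
    ("https://a.example/index.html", "https://a.example/about.html"),
    ("https://a.example/index.html", "https://b.example/"),
    ("https://a.example/about.html", "https://b.example/news"),
    ("https://b.example/news", "https://c.example/"),
]


def host_of(url):
    """Return the host name of a URL."""
    return urlsplit(url).hostname


def aggregate_to_hosts(edges):
    """Collapse page-level edges into a set of distinct host-level edges.

    Assumption: links between pages on the same host are dropped.
    """
    host_edges = set()
    for src, dst in edges:
        src_host, dst_host = host_of(src), host_of(dst)
        if src_host != dst_host:
            host_edges.add((src_host, dst_host))
    return host_edges


host_edges = aggregate_to_hosts(page_edges)
page_nodes = {url for edge in page_edges for url in edge}
host_nodes = {host_of(url) for url in page_nodes}

print(len(page_nodes), "pages and", len(page_edges), "page-level edges")
print(len(host_nodes), "hosts and", len(host_edges), "host-level edges")
```

Even in this toy example the host-level graph has fewer nodes and edges than the page-level graph; on real crawl data the reduction is far larger.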

To represent and process webgraphs, we utilize the WebGraph framework, which was developed at the Laboratory of Web Algorithms (LAW) of the University of Milan. As a "framework for graph compression aimed at studying web graphs," it allows very large webgraphs to be stored and accessed efficiently. Even on a laptop computer, it's possible to store and explore a graph with 100 million nodes and more than 1 billion edges. The WebGraph framework is also used to compress other types of graphs, such as social network graphs or software dependency graphs. In addition, the framework and related software projects include tools for the analysis of web graphs and the computation of their statistical and topological properties. The WebGraph framework implements a number of graph algorithms, including PageRank and other centrality measures. It is an open-source Java project, but a re-implementation in the Rust language has recently been released. Over the past two decades, the WebGraph format has been widely used by researchers, for example those at LAW or Web Data Commons, to distribute graph dumps. It has also been used by open data initiatives, including the Common Crawl Foundation and the Software Heritage project.

The workshop focuses on interactive exploration of one of the precompiled and publicly available webgraphs. We look at graph properties and metrics, learn how to map node identifiers (just numbers) and node labels (URLs), and compute the shortest path between two nodes. We also show how to detect "cliques", i.e. densely connected subgraphs, or how to run PageRank and related centrality algorithms to rank the nodes of our graph. We share our experiments on how these applications are used for collection curation: how cliques can be used to discover sites with content in a regional language, how link spam is detected or how global domain ranks are used to select a representative sample of websites. Finally, we will build a small webgraph from scratch using crawl data.
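
As a rough illustration of three of these steps (mapping node identifiers to labels, finding a shortest path, and ranking nodes with PageRank), here is a small plain-Python sketch. It is not the WebGraph framework's API: the four-node graph, the labels, and the function names `shortest_path` and `pagerank` are assumptions made for this example, and the PageRank shown is the textbook power iteration with a damping factor of 0.85.

```python
from collections import deque

# Hypothetical toy graph: node identifiers are integers, labels are host names.
labels = ["a.example", "b.example", "c.example", "d.example"]  # id -> label
ids = {label: node_id for node_id, label in enumerate(labels)}  # label -> id
successors = {0: [1, 2], 1: [2], 2: [3], 3: [0]}  # directed edges by node id


def shortest_path(succ, start, goal):
    """Breadth-first search; returns the list of node ids on a shortest path."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in succ.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable from start


def pagerank(succ, n, damping=0.85, iterations=50):
    """Textbook power iteration; dangling nodes spread their rank evenly."""
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for node in range(n):
            out = succ.get(node, [])
            if out:
                share = damping * rank[node] / len(out)
                for nxt in out:
                    new_rank[nxt] += share
            else:
                for other in range(n):
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


path = shortest_path(successors, ids["a.example"], ids["d.example"])
print(" -> ".join(labels[node] for node in path))

ranks = pagerank(successors, len(labels))
for node in sorted(range(len(labels)), key=lambda i: ranks[i], reverse=True):
    print(labels[node], round(ranks[node], 4))
```

On graphs with hundreds of millions of nodes, the same ideas apply, but the adjacency lists would be read from a compressed representation rather than a Python dictionary.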

Participants will learn how to explore webgraphs (even large ones) in an interactive way and learn how graphs can be used to curate collections. Basic programming skills and basic knowledge of the Java programming language are a plus but not required. Since this is an interactive workshop, attendees should bring their own laptops, preferably with the Java 11 (or higher) JDK and Maven installed. Nevertheless, it will be possible to follow the steps and explanations without having to type them into a laptop. We will provide download and installation instructions, as well as all teaching materials, prior to the workshop.

11:15am - 11:45am

BREAK
Location: Folkestova (upstairs)

11:45am - 1:15pm

PANEL #02: Cross-Institutional Collaborations
Location: Målstova (upstairs)
Session Chair: Abbie Grotke, Library of Congress

11:45am - 1:15pm

SESSION #06: Curating Social Media
Location: Store Auditorium (ground floor)
Session Chair: Tom Smyth, Library and Archives Canada

11:45am - 1:15pm

WORKSHOP #04: How to Develop a New Browsertrix Behavior
Location: Slottsbiblioteket (ground floor)

Behaviors are a key part of Browsertrix and Browsertrix Crawler, as they make it possible to automatically have the crawler browsers take certain actions on web pages to help capture important content. This tutorial will walk attendees through the process of creating a new behavior and using it with Browsertrix Crawler.

Browsertrix Crawler includes a suite of standard behaviors, including auto-scrolling pages, auto-playing videos, and capturing posts and comments on particular social media sites. By default, the entire standard set of behaviors is enabled for each crawl. Users can instead disable behaviors entirely or select only a subset of the standard behaviors to use on a crawl.

At times, users may need additional custom behaviors to navigate and interact with a site in specific ways automatically during crawling if they want the resulting web archive and replay to reflect the full experience of the live site. For instance, a new behavior could click on interactive buttons in a particular order, “drive” interactive components on a page, or open up posts sequentially on a new social media site and load comments.

This tutorial will walk through the process of creating a new behavior step by step, using the existing written tutorial for creating new behaviors on GitHub as a model. In addition to demonstrating how to write a behavior’s code (using JavaScript), the tutorial will also discuss how to know when a behavior is the appropriate solution for a given crawling problem, how to test behaviors during development, how to use custom behaviors with Browsertrix Crawler running locally in Docker, and finally how to use custom behaviors from the Browsertrix web interface (a feature that is currently planned and will be completed by the conference date).

Participants will not be expected to write any code or follow along on their own laptops in real time during the tutorial. The purpose is instead to demonstrate how one would approach developing a new behavior, lower the barrier to entry for developers and practitioners who may be interested in doing so, and give attendees the opportunity to ask questions of Webrecorder developers in real time. We would additionally love to foster a conversation about how to develop a community library of available behaviors moving forward, making it easier than ever for users to find and use behaviors that meet their needs.

The tutorial will be led by Ilya Kreymer and Tessa Walsh, developers at Webrecorder with intimate knowledge of the Browsertrix ecosystem. The target audience is technically-minded web archiving practitioners and developers - in other words, people who could either themselves write new custom behaviors or communicate the salient points to developers at their institutions. Because this is not a hackathon-style workshop, the tutorial could have as many participants as the venue allows. By the conclusion of the tutorial, attendees should understand the concept of how Browsertrix Behaviors work, when developing a new behavior is a good solution to their problems, the steps involved in developing and testing a new behavior, and where to find additional resources to help them along the way. Our hope is to foster a decentralized community of practice around behaviors to the entire IIPC community’s benefit.

1:15pm - 2:15pm

LUNCH
Location: CREDO Restaurant | Kantine (downstairs)

If you signed up for a guided exhibition tour, please be in the exhibition room at 1:20pm. To check whether you signed up for a tour, see your registration details in ConfTool.

2:15pm - 3:40pm

SESSION #07: Research & Access
Location: Målstova (upstairs)
Session Chair: Marie Roald, National Library of Norway

2:15pm - 3:40pm

SESSION #08: Handling What You Captured
Location: Store Auditorium (ground floor)
Session Chair: Meghan Lyon, Library of Congress

2:15pm - 3:40pm

PANEL #03: Cross-Institutional Collaboration: the End of Term Archive
Location: Slottsbiblioteket (ground floor)
Session Chair: Jeffrey van der Hoeven, National Library of the Netherlands (KB)

3:40pm - 4:10pm

BREAK
Location: Folkestova (upstairs)

4:10pm - 5:05pm

Closing Keynote: Quantifying Complexity: Using Web Data to Decode Online Public Debate
Location: Målstova (upstairs)
Session Chair: Jon Carlstedt Tønnessen, National Library of Norway

Streamed to Store Auditorium.

5:05pm - 5:30pm

Closing Remarks
Location: Målstova (upstairs)

Streamed to Store Auditorium.