Conference Agenda
Session Overview |
Date: Tuesday, 08/Apr/2025 | |
9:00am - 9:40am | REGISTRATION: General Assembly (For IIPC members only) |
9:40am - 9:50am | Opening Remarks Location: Målstova (upstairs) |
9:50am - 10:00am | Chair Address Location: Målstova (upstairs) |
10:00am - 10:45am | IIPC Strategic Plan 2026-2030 Location: Målstova (upstairs) |
10:45am - 11:15am | BREAK Location: Folkestova (upstairs) If you signed up for a guided exhibition tour, please be in the exhibition room at 10:45. To know if you signed up for a tour, check your registration details in ConfTool. |
11:15am - 12:45pm | Framework for Tools Sustainability Location: Målstova (upstairs) |
11:15am - 12:45pm | Content Development Working Group Meeting Location: Slottsbiblioteket (ground floor) |
11:15am - 12:45pm | TBC Location: VIP - rommet (upstairs) |
12:45pm - 2:00pm | LUNCH Location: CREDO Restaurant | Kantine (downstairs) If you signed up for a guided exhibition tour, please be in the exhibition room at 12:50. To know if you signed up for a tour, check your registration details in ConfTool. |
2:00pm - 3:30pm | Research Working Group Meeting Location: Målstova (upstairs) |
2:00pm - 3:30pm | Training Working Group Meeting Location: Slottsbiblioteket (ground floor) Actual session length: 60 minutes |
2:00pm - 3:30pm | TBC Location: VIP - rommet (upstairs) |
3:30pm - 4:00pm | BREAK Location: Folkestova (upstairs) If you signed up for a guided exhibition tour, please be in the exhibition room at 3:30. To know if you signed up for a tour, check your registration details in ConfTool. |
4:00pm - 5:30pm | Crawling National Domain: Towards Best Practices Location: Målstova (upstairs) |
4:00pm - 5:30pm | TWG WORKSHOP: Case Studies ‘Write-a-thon’ - Documenting Best Practices Location: Slottsbiblioteket (ground floor) |
4:00pm - 5:30pm | TBC Location: VIP - rommet (upstairs) |
7:00pm - 9:00pm | WELCOME RECEPTION Location: Folkestova (upstairs) [IIPC Members Only] Includes light refreshments and drinks. Attendees are encouraged to have dinner beforehand. |
Date: Wednesday, 09/Apr/2025 | |
9:00am - 9:40am | REGISTRATION: Web Archiving Conference (WAC) |
9:40am - 9:50am | Opening Remarks Location: Målstova (upstairs) Streamed to Store Auditorium. |
9:50am - 10:45am | Opening Keynote: Libraries, Copyright, and Language Models Location: Målstova (upstairs) Session Chair: Andrew Jackson, Digital Preservation Coalition Streamed to Store Auditorium. |
10:45am - 10:55am | SHORT BREAK Streaming video from Målstova to Store Auditorium ends. Lightning Talk Session 2 will begin in the Store Auditorium after the break. |
10:55am - 11:00am | LIGHTNING TALK SESSION 1: INTRODUCTION Location: Målstova (upstairs) Session Chair: Ben Els, National Library of Luxembourg |
10:55am - 11:00am | LIGHTNING TALK SESSION 2: INTRODUCTION Location: Store Auditorium (ground floor) Session Chair: Sawood Alam, Internet Archive |
11:00am - 11:25am | LIGHTNING TALK SESSION 1 Location: Målstova (upstairs) Session Chair: Ben Els, National Library of Luxembourg |
11:00am - 11:25am | LIGHTNING TALK SESSION 2 Location: Store Auditorium (ground floor) Session Chair: Sawood Alam, Internet Archive |
11:25am - 11:55am | BREAK Location: Folkestova (upstairs) Participants in the 2025 Mentoring Program can meet at the top of the old granite stairs outside of Målstova. Seating is available in the cafeteria/bar (upstairs) and library hallways (upstairs and ground floor). If the weather is nice, there are also small parks immediately in front of and behind the National Library building. |
11:55am - 1:00pm | PANEL #01: Engaging Audiences Location: Målstova (upstairs) Session Chair: Eveline Vlassenroot, University of Ghent |
11:55am - 1:00pm | SESSION #01: Tools Under Construction: Lessons Learned (National Library Perspective) Location: Store Auditorium (ground floor) Session Chair: Katherine Boss, National Library of Norway |
11:55am - 1:00pm | WORKSHOP #01: Exploring Dilemmas in the Archiving of Legacy Webportals: An Exercise in Reflective Questioning Location: Slottsbiblioteket (ground floor) Since 2023, the National Library of the Netherlands (KBNL) has been proud to curate a digital collection that has become UNESCO world heritage: the Digital City (De Digitale Stad, henceforth: DDS). Material belonging to this collection consists of an original freeze from 1996, as well as two student projects and miscellaneous material that was contributed by users and founders over the course of multiple events. The two student projects were the first attempt to revive the portal of DDS and store it as a disk image. The two groups of students used two different methods for this revival: one based on emulation, the other based on migration. But what choices were made during restoration, and which version is more authentic? Furthermore, KBNL has several websites, scientific articles and newspaper clippings in its collections that might serve as context information. Do we consider this context information crucial for understanding DDS, or do we rather leave users to find these resources by themselves if they are interested?
Even without considering the plethora of archival material that currently makes up DDS, the original portal was already a mixed bag of different protocols. Most of them, like IRC and Usenet newsgroups, are no longer mainstream, and they were never part of DDS itself but only linked to. The portal also contained links to offsite websites that were not archived, like some of the users' homepages or ‘houses’. The original hardware, which is not part of the collection, ran proprietary software that is now thoroughly obsolete. There was a multi-user dungeon where users could program their own objects, but this depended on real-time user interaction. Some of the functionality depended on live data that is no longer available, such as who was logged in. The original software was command-line and based on Free-net software. Shortly after the initial launch an HTML interface was introduced. Even then the command-line interface stayed available for less-privileged users. The navigation of the HTML version relied heavily on image maps that require a binary executable to function correctly. From newspaper evidence we can gather that functionality was sometimes unavailable or stopped working. There was both a general part of the portal and a personalized part based on login, the latter also containing email. There have also been cases of harmful or polarizing content being published in newsgroups. At the time the norm was self-regulation by the community and laissez-faire, but time has moved on and our users may have come to expect a more active approach to regulation, or at least some form of acknowledgement, from us as heritage organizations. As can be seen from this description, there is a lot of complexity when we consider archiving DDS and making it accessible to our users. We can think of a lot of difficult dilemmas when making decisions on what to archive and how to present it. 
Do we want users to experience what it was like to create a homepage in DDS, or do we want to present a historically correct picture of the homepages existing at the time? What should be considered part of the object and what part of the context? Is the migrated or the emulated version more authentic? What is more important, the privacy of the original users or providing full access to researchers? What do we consider as belonging to DDS and what not? Only the HTML? Or also any newsgroup material that might still be online but isn’t part of the archival material? Do users want a truly authentic experience or rather a convenient way of viewing the content? Even though DDS was a Dutch portal, it was based on software of the American Free-nets and inspired other cities in Europe and Asia. Therefore, we think this case might have a lot of recognizable features that also apply to the archiving of other legacy portals. Arguably, there are no right or wrong answers. These are typically dilemmas where multiple options have both benefits and drawbacks. In our workshop we want to present a couple of these real-world dilemmas to participants to stimulate discussion based on principles of reflective questioning and open dialogue. The idea is that we present a few cases related to DDS that participants can discuss in groups. Each group has to choose a preferred solution and present their reasoning to the group. People are encouraged to explore the reasons for choosing one or the other, for instance by reflecting on their own organizational context or personal assumptions regarding digital preservation. We try to stay away from providing clear-cut answers or guidance but rather provide participants with the opportunity to explore these questions together. Participants will learn how to ask the right questions to delve deeper into their own reasoning process during decision making, based on our method of reflective questioning. 
Participants should be able to use this method and the cases presented to benefit their own curatorial decision-making process regarding legacy web portals in their own collections. For KBNL, the group discussions may provide important community input and food for thought on some of the decisions we are going to be making regarding DDS in the near future. |
1:00pm - 2:00pm | LUNCH Location: CREDO Restaurant | Kantine (downstairs) If you signed up for a guided exhibition tour, please be in the exhibition room at 1:05pm. To know if you signed up for a tour, check your registration details in ConfTool. |
2:05pm - 3:40pm | SESSION #02: Crawling Tools Location: Målstova (upstairs) Session Chair: László Tóth, National Library of Luxembourg |
2:05pm - 3:40pm | SESSION #03: Advocacy & User Engagement Location: Store Auditorium (ground floor) Session Chair: Mark Phillips, University of North Texas Libraries |
2:05pm - 3:40pm | WORKSHOP #02: Web Archive Collections As Data Location: Slottsbiblioteket (ground floor) |
3:40pm - 4:10pm | BREAK Location: Folkestova (upstairs) Participants in the 2025 Mentoring Program can meet at the top of the old granite stairs outside of Målstova. Seating is available in the cafeteria/bar (upstairs) and library hallways (upstairs and ground floor). If the weather is nice, there are also small parks immediately in front of and behind the National Library building. |
4:10pm - 4:20pm | POSTER SLAM INTRO Location: Målstova (upstairs) Session Chair: Olga Holownia, IIPC Streamed to Store Auditorium. |
4:20pm - 4:40pm | POSTER SLAM Location: Målstova (upstairs) Session Chair: Olga Holownia, IIPC Streamed to Store Auditorium. |
4:40pm - 6:00pm | POSTER SESSION Location: Folkestova (upstairs) |
7:30pm - 9:30pm | DINNER Location: CREDO Restaurant | Kantine (downstairs) |
Date: Thursday, 10/Apr/2025 | |
9:00am - 9:20am | MORNING COFFEE Location: Folkestova (upstairs) |
9:20am - 9:25am | LIGHTNING TALK SESSION 3: INTRODUCTION Location: Målstova (upstairs) Session Chair: Helena Byrne, British Library |
9:20am - 9:25am | LIGHTNING TALK SESSION 4: INTRODUCTION Location: Store Auditorium (ground floor) Session Chair: Dorothée Benhamou-Suesser, National Library of France |
9:25am - 9:55am | LIGHTNING TALK SESSION 3 Location: Målstova (upstairs) Session Chair: Helena Byrne, British Library |
9:25am - 9:55am | LIGHTNING TALK SESSION 4 Location: Store Auditorium (ground floor) Session Chair: Dorothée Benhamou-Suesser, National Library of France |
9:55am - 10:05am | SHORT BREAK |
10:05am - 11:15am | SESSION #04: Discovery & Access (News/Newspapers) Location: Målstova (upstairs) Session Chair: Tita Enstad, National Library of Norway |
10:05am - 11:15am | SESSION #05: Sustainability Location: Store Auditorium (ground floor) Session Chair: Bjarne Andersen, Royal Danish Library |
10:05am - 11:15am | WORKSHOP #03: Introduction to Web Graphs Location: Slottsbiblioteket (ground floor) The workshop will begin with a brief introduction to the concept of the webgraph or hyperlink graph - a directed graph whose nodes correspond to web pages and whose edges correspond to hyperlinks from one web page to another. We will also look at aggregations of the page-level webgraph at the level of Internet hosts or pay-level domains. The host-level and domain-level graphs are at least an order of magnitude smaller than the original page-level graph, which makes them easier to study.
To represent and process webgraphs, we utilize the WebGraph framework, which was developed at the Laboratory of Web Algorithms (LAW) of the University of Milan. As a "framework for graph compression aimed at studying web graphs," it allows very large webgraphs to be stored and accessed efficiently. Even on a laptop computer, it is possible to store and explore a graph with 100 million nodes and more than 1 billion edges. The WebGraph framework is also used to compress other types of graphs, such as social network graphs or software dependency graphs. In addition, the framework and related software projects include tools for the analysis of web graphs and the computation of their statistical and topological properties. The WebGraph framework implements a number of graph algorithms, including PageRank and other centrality measures. It is an open-source Java project, but a re-implementation in the Rust language has recently been released. Over the past two decades, the WebGraph format has been widely used by researchers, for example those at LAW or Web Data Commons, to distribute graph dumps. It has also been used by open data initiatives, including the Common Crawl Foundation and the Software Heritage project.
The workshop focuses on interactive exploration of one of the precompiled and publicly available webgraphs. We look at graph properties and metrics, learn how to map node identifiers (just numbers) and node labels (URLs), and compute the shortest path between two nodes. We also show how to detect "cliques", i.e. densely connected subgraphs, or how to run PageRank and related centrality algorithms to rank the nodes of our graph. We share our experiments on how these applications are used for collection curation: how cliques can be used to discover sites with content in a regional language, how link spam is detected or how global domain ranks are used to select a representative sample of websites. Finally, we will build a small webgraph from scratch using crawl data.
Participants will learn how to explore webgraphs (even large ones) in an interactive way and learn how graphs can be used to curate collections. Basic programming skills and basic knowledge of the Java programming language are a plus but not required. Since this is an interactive workshop, attendees should bring their own laptops, preferably with the Java 11 (or higher) JDK and Maven installed. Nevertheless, it will be possible to follow the steps and explanations without having to type them into a laptop. We will provide download and installation instructions, as well as all teaching materials, prior to the workshop. |
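As a rough illustration of the ideas in the Web Graphs workshop description above, the following minimal sketch builds a tiny page-level link list, aggregates it to a host-level graph, and finds a shortest path between two hosts. It is plain Python and does not use the WebGraph framework or its file format; the URLs, the `PAGE_LINKS` list and the function names `host_graph` and `shortest_path` are all hypothetical and exist only for this example.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical page-level links (source URL, target URL), invented for illustration.
PAGE_LINKS = [
    ("https://a.example/index.html", "https://a.example/about.html"),
    ("https://a.example/about.html", "https://b.example/"),
    ("https://b.example/", "https://c.example/news"),
    ("https://c.example/news", "https://a.example/index.html"),
    ("https://b.example/", "https://b.example/contact"),
]


def host_graph(page_links):
    """Aggregate page-level edges into a host-level directed graph.

    Returns a dict mapping each host to the set of hosts it links to.
    Links between pages on the same host are dropped.
    """
    graph = {}
    for source, target in page_links:
        src_host = urlsplit(source).hostname
        dst_host = urlsplit(target).hostname
        # Make sure both hosts exist as nodes, even without outgoing edges.
        graph.setdefault(src_host, set())
        graph.setdefault(dst_host, set())
        if src_host != dst_host:
            graph[src_host].add(dst_host)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest directed path, or None if there is none."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    hosts = host_graph(PAGE_LINKS)
    print(shortest_path(hosts, "a.example", "c.example"))
```

In this toy data the five page-level links collapse to three host-level edges, which shows on a small scale why aggregated graphs are smaller than the page-level graph. A real webgraph with millions of nodes would need a compressed representation such as the one the workshop covers.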
11:15am - 11:45am | BREAK Location: Folkestova (upstairs) |
11:45am - 1:15pm | PANEL #02: Cross-Institutional Collaborations Location: Målstova (upstairs) Session Chair: Abbie Grotke, Library of Congress |
11:45am - 1:15pm | SESSION #06: Curating Social Media Location: Store Auditorium (ground floor) Session Chair: Tom Smyth, Library and Archives Canada |
11:45am - 1:15pm | WORKSHOP #04: How to Develop a New Browsertrix Behavior Location: Slottsbiblioteket (ground floor) Behaviors are a key part of Browsertrix and Browsertrix Crawler, as they make it possible to automatically have the crawler browsers take certain actions on web pages to help capture important content. This tutorial will walk attendees through the process of creating a new behavior and using it with Browsertrix Crawler.
Browsertrix Crawler includes a suite of standard behaviors, including auto-scrolling pages, auto-playing videos, and capturing posts and comments on particular social media sites. By default, the full standard set of behaviors is enabled for each crawl. Users can instead disable behaviors entirely or select only a subset of the standard set to use on a crawl.
At times, users may need additional custom behaviors to navigate and interact with a site in specific ways automatically during crawling if they want the resulting web archive and replay to reflect the full experience of the live site. For instance, a new behavior could click on interactive buttons in a particular order, “drive” interactive components on a page, or open up posts sequentially on a new social media site and load comments.
This tutorial will walk through the process of creating a new behavior step by step, using the existing written tutorial for creating new behaviors on GitHub as a model. In addition to demonstrating how to write a behavior’s code (using JavaScript), the tutorial will also discuss how to know when a behavior is the appropriate solution for a given crawling problem, how to test behaviors during development, how to use custom behaviors with Browsertrix Crawler running locally in Docker, and finally how to use custom behaviors from the Browsertrix web interface (a feature that is currently planned and will be completed by the conference date).
Participants will not be expected to write any code or follow along on their own laptops in real time during the tutorial. The purpose is instead to demonstrate how one would approach developing a new behavior, lower the barrier to entry for developers and practitioners who may be interested in doing so, and to give attendees the opportunity to ask questions of Webrecorder developers in real time. We would additionally love to foster a conversation about how to develop a community library of available behaviors moving forward to make it easier than ever for users to find and use behaviors that meet their needs.
The tutorial will be led by Ilya Kreymer and Tessa Walsh, developers at Webrecorder with intimate knowledge of the Browsertrix ecosystem. The target audience is technically-minded web archiving practitioners and developers - in other words, people who could either themselves write new custom behaviors or communicate the salient points to developers at their institutions. Because this is not a hackathon-style workshop, the tutorial could have as many participants as the venue allows. By the conclusion of the tutorial, attendees should understand the concept of how Browsertrix Behaviors work, when developing a new behavior is a good solution to their problems, the steps involved in developing and testing a new behavior, and where to find additional resources to help them along the way. Our hope is to foster a decentralized community of practice around behaviors to the entire IIPC community’s benefit. |
1:15pm - 2:15pm | LUNCH Location: CREDO Restaurant | Kantine (downstairs) If you signed up for a guided exhibition tour, please be in the exhibition room at 1:20pm. To know if you signed up for a tour, check your registration details in ConfTool. |
2:15pm - 3:40pm | SESSION #07: Research & Access Location: Målstova (upstairs) Session Chair: Marie Roald, National Library of Norway |
2:15pm - 3:40pm | SESSION #08: Handling What You Captured Location: Store Auditorium (ground floor) Session Chair: Meghan Lyon, Library of Congress |
2:15pm - 3:40pm | PANEL #03: Cross-Institutional Collaboration: the End of Term Archive Location: Slottsbiblioteket (ground floor) Session Chair: Jeffrey van der Hoeven, National Library of the Netherlands (KB) |
3:40pm - 4:10pm | BREAK Location: Folkestova (upstairs) |
4:10pm - 5:05pm | Closing Keynote: Quantifying Complexity: Using Web Data to Decode Online Public Debate Location: Målstova (upstairs) Session Chair: Jon Carlstedt Tønnessen, National Library of Norway Streamed to Store Auditorium. |
5:05pm - 5:30pm | Closing Remarks Location: Målstova (upstairs) Streamed to Store Auditorium. |
Conference: IIPC WAC 2025 |