Conference Agenda


Session

SOCIAL MEDIA PANEL

Time:

Wednesday, 22/Apr/2026:

9:20am - 10:45am

Session Chair: Katrien Weyns, Vlaams Architectuurinstituut

Location: AUDITORIUM [-2]

Floor: -2 [Ground Floor | Main entrance]

Presentations

9:20am - 10:25am

Social media archiving in small institutions: working alone together

Katrien Weyns¹, Sophie Bossaert², Jeroen Fernandez-Alonso³, Nastasia Vanderperren⁴, Julie Birkholz⁵, Eva Van den Hurk-Van 't Klooster⁶, Elise Storme⁷

¹Vlaams Architectuurinstituut, Belgium; ²Archive for national movements, Belgium; ³Amsab-Instituut voor Sociale Geschiedenis, Belgium; ⁴meemoo, Belgium; ⁵KBR | Royal Library of Belgium & Ghent University, Belgium; ⁶Regionaal Historisch Centrum Eindhoven, Netherlands; ⁷UGent, Belgium

This panel takes a critical look at the work of several private cultural archives. In recent years, these organisations have incorporated the findings of a joint research project on social media archiving into their operations. To make progress with limited resources, they rolled out a specific approach to social media archiving. The cultural archives are subsidised at regional level and build collections that complement the national collections of [INSTITUTION] and the [ARCHIVES]. These small private institutions work on a different scale and approach social media archiving from a different perspective. During this session, three cultural archives will briefly present concrete steps taken in the areas of selection, knowledge sharing and archiving to further embed the project in regular operations, supported by a community of practice. A 40-minute panel discussion will follow, delving deeper into the challenges for each component and evaluating the steps taken. Starting from a set of propositions, experts from different institutions and backgrounds (research, regional public archive abroad, technical profile, national heritage institution) will engage in a discussion that will, hopefully, yield new insights. The presenters (from different private archives) will moderate the panel. Some example propositions:

- The small steps being taken by cultural archives, alongside those of national heritage institutions, are valuable. Social media must be archived at various levels by heritage institutions (national, regional, local). (What should be the role of large archives and libraries? Should there be coordination and how?)

- It is more important to secure and preserve the data than to make it available. (Should we be concerned about our ecological footprint?)

- It is not worthwhile to archive comments on posts. They mainly contain nonsense and rarely relevant information. (How unique is the information on social media: what would we miss if we did not preserve it?)

- Archiving incomplete datasets is not worthwhile and therefore irresponsible. (What minimum criteria should heritage institutions use to determine what is worthwhile?)

- We must ask permission from all parties involved before archiving.

- We must do more to convince our archive creators to export their data themselves. (What are the arguments for and against? How do we do that?)

Small scale selection of social media (presentation)

When you are a small archive with the equivalent of only half to one full-time digital archivist, you have to be content with small steps. After all, that archivist is responsible for setting up a digital preservation system and for acquiring, preserving and giving access to a multitude of complex digital file formats. Despite these many tasks, it is necessary to start archiving social media before the data becomes inaccessible. A first step is to map and select the social media you want to archive. We recently started drawing up a seed list and establishing selection criteria. We use our own collection plan, websites and the MoSCoW principles (must have, should have, could have, won't have) to determine priorities. In a short presentation, some examples illustrate this approach, its challenges (e.g. deduplication) and its gaps (e.g. randomness and bias), as input for the later panel discussion.
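
As an illustration only, the following minimal Python sketch shows one way a seed list could be deduplicated and ordered by MoSCoW priority. It is not the presenters' workflow: the function names, the normalisation rule (ignoring "www.", letter case and trailing slashes) and the example URLs are all assumptions made for this sketch.

```python
from urllib.parse import urlsplit

# MoSCoW ranks: a lower number means a higher priority.
MOSCOW_RANK = {"must": 0, "should": 1, "could": 2, "wont": 3}


def normalize(url):
    """Reduce a profile URL to a comparable key (assumed rule: host + path,
    ignoring a leading 'www.', letter case and trailing slashes)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/").lower()
    return host + path


def build_seed_list(candidates):
    """Deduplicate (url, priority) pairs, keeping the highest priority
    recorded for each account, then sort by MoSCoW rank."""
    best = {}
    for url, priority in candidates:
        key = normalize(url)
        if key not in best or MOSCOW_RANK[priority] < MOSCOW_RANK[best[key][1]]:
            best[key] = (url, priority)
    return sorted(best.values(), key=lambda item: MOSCOW_RANK[item[1]])


# Hypothetical candidates: the first two entries point at the same account.
candidates = [
    ("https://www.instagram.com/Example_Archive/", "could"),
    ("https://instagram.com/example_archive", "must"),
    ("https://bsky.app/profile/example.org", "should"),
]
seed_list = build_seed_list(candidates)
```

A rule like this only catches trivially different spellings of the same URL; the same organisation present on several platforms would still need a manual decision.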

The community of social media archiving in practice (presentation)

A community of practice on social media archiving developed various initiatives to safeguard its knowledge and experience. Working groups were set up to share best practices (Twitter/X research) and test results of replay tools (SolrWayback). We organized edit-a-thons to update existing manuals and created new ones for a wide range of archiving tools. Developing a sustainable network helps us to ensure that our knowledge and expertise are not lost but can be embedded within our small private archival institutions. But what is the balance between effort and output? What roles do we take on as institutions and archivists within that network?

The inherent incompleteness of archived social media data (presentation)

Regardless of the method used to preserve social media content, archived datasets will almost always be incomplete or imperfect. With participatory archiving, where the archive creator uses the platform's export function to obtain a copy of their own data, significant contextual information is lost. For example, we receive only the archive creator's own comments, without the surrounding interactions that give them meaning. Web scraping methods also lead to imperfect archived datasets. For instance, depending on the tools used, the visual appearance and user experience of the original platform are often not preserved. Certain elements, such as comments or embedded media, are in practice also difficult or impossible to capture in full. These limitations are not solely technical; human factors also contribute. Delays in initiating the archiving process, particularly in event-driven archiving, can result in the loss of valuable content that has already been removed from the web. This raises a difficult question for web archivists: how should we address these imperfect conditions? By examining a series of cases where the archiving process went wrong, we propose a pragmatic approach that demonstrates how even flawed or partial efforts can still yield historically valuable data.

Panel of external voices from different organisations and backgrounds (names were removed at the request of the WAC program committee; they are available in the remarks for the program committee and chair):

Panelist-technician
Panelist-national heritage institution
Panelist-regional public archive abroad
Panelist-research

10:25am - 10:45am

Making 1.2 billion social media posts accessible: a user-centric search interface for large-scale Twitter archives

Mehdi Bourgeois

INA - Institut national de l'audiovisuel, France

Archiving social media platforms represents a major scientific, documentary, and civic challenge. In order to secure our digital heritage, our institution has undertaken the task of collecting and archiving content from Twitter and, more recently, Bluesky. Over the past decade, the chosen strategy has resulted in an archive of 1.2 billion tweets and posts from 16,000 accounts and 3,200 thematic hashtags, accompanied by 25 million archived videos.

While the massive scale of these archives creates a multitude of opportunities, it also brings new challenges. How do we design access systems that remain sustainable as archives scale from millions to billions of items? How can such a vast archive be made accessible, intelligible, and useful? Researchers require sophisticated filtering capabilities to construct meaningful corpora, as simple keyword searches on collections of this magnitude return overwhelming and unusable results. The general public needs intuitive and reliable tools to explore topics of interest, such as media events, cultural trends, and political and societal discussions.

This presentation demonstrates a production-ready consultation interface designed to address these challenges. Built as a JavaScript web application with an Elasticsearch cluster backend, it provides multiple access points tailored to diverse research methodologies:

- Faceted Search Engine: Full-text search combined with progressive filters for media type, language, hashtags, emojis, and engagement metrics (likes, retweets, replies, citations), enabling users to refine queries across multiple dimensions simultaneously
- Data Visualization: Interactive visualizations including word clouds, temporal histograms, distribution charts, and image galleries that provide immediate corpus overviews and facilitate iterative query refinement through visual feedback
- Metadata Transparency: Complete metadata visibility for each archived post, supporting reproducible research practices and proper citation
- Progressive Disclosure: Researchers can begin with broad queries and iteratively narrow their focus using visual feedback from result distributions
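
To make the faceted-search idea concrete, here is a minimal Python sketch that assembles an Elasticsearch query body combining full-text search, optional filters and two aggregations of the kind that could feed a hashtag facet and a temporal histogram. It is not the interface described in this abstract: the function name and every field name (`text`, `lang`, `media_type`, `like_count`, `hashtags`, `created_at`) are hypothetical, and the sketch assumes the facet fields are mapped as keyword, numeric or date fields.

```python
def build_faceted_query(text, language=None, media_type=None,
                        min_likes=None, page_size=20):
    """Build an Elasticsearch search body: a full-text match plus optional
    filters, with aggregations that can drive facets and a timeline.
    All field names are hypothetical."""
    filters = []
    if language is not None:
        filters.append({"term": {"lang": language}})
    if media_type is not None:
        filters.append({"term": {"media_type": media_type}})
    if min_likes is not None:
        filters.append({"range": {"like_count": {"gte": min_likes}}})

    return {
        "size": page_size,
        "query": {
            "bool": {
                # Scored full-text match on the post body.
                "must": [{"match": {"text": text}}],
                # Filters narrow the result set without affecting scoring.
                "filter": filters,
            }
        },
        "aggs": {
            # Most frequent hashtags in the current result set.
            "top_hashtags": {"terms": {"field": "hashtags", "size": 10}},
            # Monthly post counts for a temporal histogram.
            "posts_over_time": {
                "date_histogram": {
                    "field": "created_at",
                    "calendar_interval": "month",
                }
            },
        },
    }


# Example: French-language posts matching a keyword with at least 100 likes.
body = build_faceted_query("élections", language="fr", min_likes=100)
```

Because the aggregations are computed over the filtered result set, each added filter also updates the facet counts and the histogram, which is one way the iterative narrowing described above could be supported.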

The presentation will include a live demonstration highlighting real research use cases that illustrate how archived content enables important scholarly investigations.

Beyond demonstrating the interface, this contribution aims to foster discussion about broader sustainability challenges in social media archiving. Platform migrations, such as the ongoing transition from Twitter to Bluesky, raise further fundamental questions: how can we design interfaces and data models that adapt to evolving platform ecosystems while maintaining data integrity and access? How can we ensure these archives serve as sustainable tools for research communities and the public?