Conference Agenda
Session: SOCIAL MEDIA
Presentations
9:20am - 10:25am
Social media archiving in small institutions: working alone together

1 Vlaams Architectuurinstituut, Belgium; 2 ADVN, Belgium; 3 Amsab-isg, Belgium; 4 meemoo, Belgium; 5 KBR-UGent, Belgium; 6 Regionaal Historisch Centrum Eindhoven, Netherlands; 7 UGent, Belgium

This panel takes a critical look at the work of several private cultural archives. In recent years, these organisations have incorporated the findings of a joint research project on social media archiving into their work. To make progress with limited resources, a specific approach to social media archiving was rolled out. The cultural archives are subsidised at regional level and build collections that complement the national collections of [INSTITUTION] and the [ARCHIVES]. These small private institutions work on a different scale and approach social media archiving from a different perspective.

During this session, three cultural archives will briefly present concrete steps that have been taken in the areas of selection, knowledge sharing and archiving to further embed the project in regular operations, supported by a community of practice. This will be followed by a 40-minute panel discussion that will delve deeper into the challenges for each component and evaluate the steps taken. Based on propositions, experts from different backgrounds (research, regional public archive abroad, technical profile, national heritage institution) and institutions will engage in a discussion that will (hopefully) yield new insights. The presenters (different private archives) will moderate the panel.

Some example propositions:
- The small steps being taken by cultural archives, alongside those of national heritage institutions, are valuable. Social media must be archived at various levels by heritage institutions (national, regional, local). (What should be the role of large archives and libraries? Should there be coordination, and how?)
- It is more important to secure and preserve the data than to make it available. (Should we be concerned about our ecological footprint?)
- It is not worthwhile to archive comments on posts. They mainly contain nonsense and rarely relevant information.
- Archiving incomplete datasets is not worthwhile and therefore irresponsible. (What minimum criteria should heritage institutions use to determine what is worthwhile?)
- We must ask permission from all parties involved before archiving.
- We must better convince our archive creators to export their data themselves. (What are the arguments for and against? How do we do that?)

Small scale selection of social media (presentation)

When you are a small archive with only half to one digital archivist, you have to be happy with small steps. After all, that archivist is responsible for setting up a digital preservation system and for acquiring, preserving and giving access to a multitude of complex digital file formats. Despite the many tasks, it is necessary to start archiving social media before the data becomes inaccessible. A first step is to map and select the social media you want to archive. We recently started drawing up a seed list and establishing selection criteria. We use our own collection plan, websites and the MoSCoW principles to determine priorities. In a short presentation, some examples illustrate this approach, the challenges (e.g. deduplication) and the gaps (e.g. randomness and bias) to feed the later panel discussion.

The community of social media archiving in practice (presentation)

A community of practice for social media archiving developed various initiatives to safeguard its knowledge and experience. Working groups were set up to share best practices (Twitter/X research) and test results of replay tools (SolrWayback). We organized edit-a-thons to update existing manuals and created new ones for a wide range of archiving tools. Developing a sustainable network helps us to ensure that our knowledge and expertise are not lost but can be embedded within our small private archival institutions. But what is the balance between effort and output? What roles do we take as an institution and as an archivist within that network?

The inherent incompleteness of archived social media data (presentation)

Regardless of the method used to preserve social media content, archived datasets will almost always be incomplete or imperfect. With participatory archiving, where the archive creator uses the platform's export function to obtain a copy of their own data, significant contextual information is lost. For example, we only receive the comments of the archive creator, without the surrounding interactions that give them meaning. Web scraping methods also lead to imperfect archived datasets. For instance, depending on the tools used, the visual appearance and user experience of the original platform are often not preserved. Certain elements, such as comments or embedded media, are in practice also difficult or impossible to capture in full. These limitations are not solely technical; human factors also contribute. Delays in initiating the archiving process, particularly in event-driven archiving, can result in the loss of valuable content that has already been removed from the web. This raises a difficult question for web archivists: how should we address these imperfect conditions? By examining a series of cases where the archiving process went wrong, we propose a pragmatic approach that demonstrates how even flawed or partial efforts can still yield historically valuable data.

Panel of external voices from different organisations and backgrounds (names were removed at the request of the WAC program committee; they are available in the remarks for the program committee and chair).
10:25am - 10:45am
Making 1.2 billion social media posts accessible: a user-centric search interface for large-scale Twitter archives

INA - Institut national de l'audiovisuel, France

Archiving social media platforms represents a major scientific, documentary, and civic challenge. In order to secure our digital heritage, our institution has undertaken the task of collecting and archiving content from Twitter and, more recently, Bluesky. Over the past decade, the chosen strategy has resulted in an archive of 1.2 billion tweets and posts from 16,000 accounts and 3,200 thematic hashtags, accompanied by 25 million archived videos.

While the massive scale of these archives creates a multitude of opportunities, it also comes with new challenges. How do we design access systems that remain sustainable as archives scale from millions to billions of items? How can such a vast archive be made accessible, intelligible, and useful? Researchers require sophisticated filtering capabilities to construct meaningful corpora, because simple keyword searches on collections of this magnitude return overwhelming and unusable results. The general public needs intuitive and reliable tools to explore topics of interest, such as media events, cultural trends, and political and societal discussions.

This presentation demonstrates a production-ready consultation interface designed to address these challenges. Built as a JavaScript web application with an Elasticsearch cluster backend, it provides multiple access points tailored to diverse research methodologies:
- Faceted Search Engine: Full-text search combined with progressive filters for media type, language, hashtags, emojis, and engagement metrics (likes, retweets, replies, citations), enabling users to refine queries across multiple dimensions simultaneously.

The presentation will include a live demonstration highlighting real research use cases that illustrate how archived content enables important scholarly investigations.
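As an illustration only, a faceted search of the kind described above might translate into an Elasticsearch request body along the following lines. The abstract does not publish the index schema or the application code, so the helper `build_faceted_query` and the field names (`text`, `lang`, `hashtags`, `media_type`, `retweet_count`) are hypothetical. The sketch only assembles the request body as a Python dictionary using standard query DSL constructs (a `bool` query with `must` and `filter` clauses, plus `terms` aggregations for the facet counts); it does not contact a cluster.

```python
import json
from typing import Any, Optional


def build_faceted_query(
    text: str,
    language: Optional[str] = None,
    hashtags: Optional[list[str]] = None,
    min_retweets: Optional[int] = None,
    facet_size: int = 10,
) -> dict[str, Any]:
    """Build a request body: full-text match plus optional facet filters.

    All field names are assumptions made for this sketch, not the
    schema of the archive described in the abstract.
    """
    filters: list[dict[str, Any]] = []
    if language is not None:
        # Exact match on a keyword field holding the post language.
        filters.append({"term": {"lang": language}})
    if hashtags:
        # Keep posts carrying at least one of the selected hashtags.
        filters.append({"terms": {"hashtags": hashtags}})
    if min_retweets is not None:
        # Engagement threshold expressed as a numeric range filter.
        filters.append({"range": {"retweet_count": {"gte": min_retweets}}})

    return {
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"text": text}}],
                # Non-scoring filters that narrow the result set.
                "filter": filters,
            }
        },
        # Bucket counts used to render the facets next to the results.
        "aggs": {
            "by_language": {"terms": {"field": "lang", "size": facet_size}},
            "by_hashtag": {"terms": {"field": "hashtags", "size": facet_size}},
            "by_media_type": {"terms": {"field": "media_type", "size": facet_size}},
        },
    }


if __name__ == "__main__":
    body = build_faceted_query("election", language="fr", min_retweets=100)
    print(json.dumps(body, indent=2))
```

Each additional filter a user selects appends one clause to `filter`, which is one way to model the progressive refinement the abstract mentions; the aggregations are recomputed on the narrowed result set so the facet counts stay in step with the current selection.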
Beyond demonstrating the interface, this contribution aims to foster discussion about broader sustainability challenges in social media archiving. Platform migrations, such as the ongoing transition from Twitter to Bluesky, raise further fundamental questions: how can we design interfaces and data models that adapt to evolving platform ecosystems while maintaining data integrity and access? How can we ensure these archives serve as sustainable tools for research communities and the public?
