11:45am - 12:05pm
Developing Social Media Archiving Guidelines at the National Archives of the Netherlands
Lotte Wijsman, Geert Leloup, Susanne Van den Eijkel, Sander Wellens
National Archives of the Netherlands, Netherlands
At the beginning of 2024, we started a project to develop a nationwide guideline for archiving public social media content. This project aimed to address the increasing use of social media by Dutch government bodies and the current lack of archiving of that content. Our presentation at the Web Archiving Conference 2025 will focus on the process of creating this guideline and will present the final version.
The primary target audience for this guideline is information professionals, who play a vital role in managing and preserving the archived social media content. However, we also recognise communication professionals as an important target audience, given their role in setting up and using the accounts.
The guideline is structured into six modules:
This module provides a definition of social media and identifies what constitutes public information on the various platforms.
In this module, we examine the Dutch and European legal requirements and constraints related to archiving public social media content. Understanding the legal landscape is essential to ensure compliance and address any legal challenges.
- Recommendations for communication professionals
This module provides practical recommendations on using social media in a way that facilitates easier archiving. Aimed at those managing the social media accounts, it includes tips on account settings and content creation.
This module addresses how content can be appraised and selected, and how to ensure that historically important information is transferred to the Dutch National Archives in due course.
In this module, we establish quality criteria for archiving social media content and explore various archiving techniques, including screen capturing and API usage. This module aims to equip professionals with the knowledge to choose the most effective archiving methods.
The final module presents real-world examples from the Netherlands and abroad. These case studies illustrate diverse methods and results, providing practical insights and lessons learned from other practitioners in the field.
The creation of this guideline was a collaborative and intensive year-long process. We systematically engaged with a wide range of stakeholders and incorporated their feedback to ensure the guideline is comprehensive and practical. Our goal is to support government agencies in archiving their social media communications effectively.
We are excited to share our journey and the outcomes of this project with our colleagues at the Web Archiving Conference. By presenting our experiences and insights, we hope to contribute to the ongoing discourse on social media archiving and inspire others in the field.
12:05pm - 12:25pm
Archiving the Social Media Profiles of Members of Government
Ben Els
National Library of Luxembourg, Luxembourg
In the context of the 2023 national elections, the National Library of Luxembourg, in collaboration with the National Archives and the Ministry of State, launched a pilot project to archive the social media profiles of members of the government. The technical obstacles to archiving social platforms are becoming increasingly problematic, to the point that none of the major platforms can currently be archived effectively by our harvesters and service providers. Since most social media platforms are practically inaccessible to web crawlers and conventional web archiving methods, we decided to try a more direct approach: asking the members of government themselves to download the data from their profiles and hand it over to the National Library and National Archives.
With the help of the Ministry of State, we sent out a call for participation, with specific guidelines for exporting datasets from social networks, to the archive delegates and communication departments of each ministry, as well as to the ministers themselves. The response to this first call for participation was very positive - despite the time pressure between the election and the formation of a new government, with a high chance of many ministers leaving office.
In addition to drawing up the guidelines for downloading datasets from different platforms, we offered direct technical support to the people involved in the ministries, including the members of government themselves, and retrieved the data individually on site.
We were able to retrieve the majority of the government's profiles, covering the five years of their term. This pilot project represents a direct and effective method of securing the data of profiles of high public interest. The National Library and National Archives of Luxembourg are looking to repeat the same collection process by the end of 2024 and hope to move to a regular operation after that.
This presentation will cover the different steps of the collection process, the lessons learned from the pilot project and the second operation at the end of 2024. We will conclude with an outlook on the changes we hope to implement in the future, a possible extension of the collection scope and our plans in terms of public access to the collections.
12:25pm - 12:45pm
From Posts to Archives: The National Library of Singapore’s Journey in Collecting Social Media
Shereen Tay, Meiyu Lee
National Library Board Singapore, Singapore
Social media plays a huge role in our everyday life today. It is used for a myriad of activities such as communication, entertainment, business, and even as personal diaries. In Singapore, about 85% of the population uses social media, the most popular platforms being Facebook, Instagram, YouTube, and TikTok. Besides individuals, many organisations have also turned to social media to engage and communicate with their followers. With such prevalent use, social media is becoming an important source of information about the lives and stories of our country and people.
Recognising this, the National Library of Singapore (NLS) began looking at collecting social media. Our journey started in 2017, and the initial years focused on research and experiments, such as conducting an environmental scan of other heritage institutions’ experiences in collecting social media, carrying out proof-of-concept work using web archiving and available APIs, and trialling commercial vendors’ solutions. Our experience was similar to that of many institutions around the world. Collecting social media is complex and poses many technical, legal, and ethical challenges, such as limited access to APIs and the need to manage personal data and third-party content.
Despite these challenges, we knew that we had to start collecting social media given its increasing significance. This was not only to meet our mandate of collecting and preserving our country’s digital memories, but also to gain practical experience in how to collect, organise, and manage this format.
Putting together what we have learnt, we developed a social media collecting framework in 2023 to provide guidance on how to collect social media amidst these challenges while ensuring that a representative set of social media content can be collected for future generations and research. Our framework covered the selection criteria, the collecting methods, and our collecting approach for key social media platforms that are widely used in Singapore.
We piloted our first social media collecting in the same year, under NLS’ new 2-year project to collect contemporary materials on Singapore food and youth. The purpose was to assess the receptiveness of individuals and organisations to contributing their social media accounts to us; they were more forthcoming than we anticipated. In 2024, we made collecting social media part of our operational work. Our collection strategy was three-pronged: 1) outsourcing the archiving of significant persons/organisations’ social media accounts to a commercial vendor; 2) approaching identified organisations based on subjects to contribute their social media accounts; and 3) engaging and promoting social media collecting through advocates and an annual public call to nominate favourite Singapore social media accounts, YouTube and TikTok videos, as well as websites.
This presentation will highlight NLS’ journey in collecting social media, our collecting framework and strategy, as well as learning points and future plans.
12:45pm - 1:05pm
Innovative Web Archiving Amid Crisis: Leveraging Browsertrix and Hybrid Working Models to Capture the UK General Election 2024
Nicola Bingham, Jennie Grimshaw
British Library, United Kingdom
The British Library, in collaboration with the National Libraries of Scotland and Wales, the Bodleian Library and Cambridge University Library, has created collections of archived websites for all UK general elections since 2005. This time series shows how internet use in political communication has evolved, and how the fortunes of political parties have changed. The 2024 general election was called unexpectedly on May 22nd and took place on July 4th, at a time when the UK Web Archive was inaccessible and our Web Archiving and Curation Tool was unavailable following a devastating ransomware attack on the British Library on October 29th 2023. Working together, we nevertheless created a collection of 2,253 archived websites covering candidates' campaign sites, social media feeds of significant politicians and journalists, local and national party sites, comment by think tanks, community engagement, news sources, and manifestos of a plethora of interest groups seeking to influence the new government. To facilitate use by researchers tracking change over time, we have organised the material into the same sub-collections since 2005. We collected campaign websites for a sample of English candidates for the same counties and urban areas as we have covered since 2005, but gathered the sites of all Scottish and Welsh candidates, as the numbers are manageable. We also targeted marginal constituencies, whose number had increased dramatically since 2019. The 2024 general election saw the rise of formerly minor parties such as Reform UK to national prominence, a Liberal Democrat resurgence, the growing influence of independent candidates, the rise of identity politics, with groups encouraged to vote as a bloc on issues such as the war in Gaza, and an increasingly sophisticated use of social media.
The technical outage caused by the ransomware attack disrupted our usual workflows and necessitated a different approach. Despite the challenges, websites continued to be archived using Heritrix on AWS servers rather than the Library's in-house infrastructure. This shift required a new workflow, involving the use of simple spreadsheets and collaborative efforts to quickly refine metadata definitions and crawl scope, aiming to replicate our existing curatorial software as closely as possible. In addition, the British Library secured a free-trial subscription to Browsertrix, which allowed us to explore and learn this new tool’s capabilities ahead of a more formal subscription. With it, we successfully captured 1,600 snapshots of social media content, including posts from X (formerly Twitter), Facebook, and Instagram.
This experience introduced library staff to working within data and time constraints, enhancing our understanding of how to effectively scope crawls, monitor them in real-time, and implement new quality assurance practices. The project resulted in a hybrid collecting model, utilising both Heritrix and Browsertrix for the same thematic collection.
The presentation will discuss the challenges and opportunities encountered during this project. It will provide insights for those interested in Browsertrix’s capabilities and in carrying out web archiving with a mixed-model approach across different institutions with diverse interests and expertise, in unusually challenging circumstances, and within the framework provided by a historic time series.