2:05pm - 2:25pm Insufficiency of Human-Centric Ethical Guidelines in the Age of AI: Considering Implications of Making Legacy Web Content Openly Accessible
Gaja Zornada, Boštjan Špetič
Computer History Museum Slovenia (Računalniški muzej), Slovenia
While the preservation of web history is crucial for maintaining a cultural and informational record of our age, reconstructing and resurfacing legacy content without appropriate context nowadays presents new ethical concerns.
Legacy content may be misleading to users when consumed in isolation, as it often reflects outdated norms, technologies, and information that are no longer relevant. Moreover, individuals featured in such content may be unfairly subjected to scrutiny based on past actions or statements that, in today's context, could harm their personal or professional reputation. The consequences of resurfacing this content without adequate contextualization are amplified when AI technologies are involved.
AI’s ability to synthesize and amplify such data across platforms can create a ripple effect, where even content that does not explicitly reveal personal information can still have far-reaching consequences. By connecting disparate data points, AI may draw conclusions or inferences about individuals, influencing public perception and potentially affecting career prospects or even legal outcomes. A human reader can infer from context that a piece of reconstructed online content belongs to a legacy web segment, presented as a historical monument to the online world of times past. AI, by contrast, will not be able to distinguish such content from contemporary sources and will weight it inappropriately in its analysis. The ethical challenge here lies not just in the publication of legacy content and archival access, but in AI’s ability to endlessly circulate and reinterpret it in ways that were never intended by the original authors.
This proposal explores the delicate balance between the preservation of historical digital records and respecting individuals' right to be forgotten (RTBF) in the age of AI. It seeks to question how AI-powered tools reshaping the reading and presentation of web archives challenge existing ethical norms. By examining potential frameworks for responsible digital archiving, the proposal aims to identify solutions that mitigate the risks posed by AI-driven resurfacing of legacy content in the public domain.
2:25pm - 2:45pm Web Archives for Music Research
Andreas Lenander Ægidius
Royal Danish Library, Denmark
The Royal Danish Library has set a strategic goal to make more of its cultural heritage materials accessible and engaging for researchers by 2027. In this paper, we present findings from an advocacy initiative targeted at researchers at national universities in music-related fields. The national web archive provides primary sources and contextual information relevant to music researchers as they engage with our music collections. However, there is room for improvement in the connection between these collections and our understanding of user needs.
Reports by Healy et al. (2022) and Healy & Byrne (2023) explore the challenges researchers face when using web archives, highlighting the ongoing need to examine the skills, tools, and methods associated with web archiving. Additionally, the sounds of the web—from MIDI to streaming—are an integral part of its history, yet this aspect is often overlooked by tools like the Internet Archive's Wayback Machine (Morris, 2019).
Through semi-structured interviews with fellow curators and music researchers at universities, we identify current barriers to access and user requirements for improved utilization of web archival resources. Our advocacy initiative also allows us to summarize current research trends as feedback for web curators. In conclusion, we describe how the web curators processed our findings into suggestions for updates and refinements to web crawling strategies and the built-in tools in the SolrWayBack installation.
References
Healy, S., & Byrne, H. (2023). Scholarly Use of Web Archives Across Ireland: The Past, Present & Future(s) (WARCnet Special Reports). Aarhus University. https://cc.au.dk/fileadmin/dac/Projekter/WARCnet/Healy_Byrne_Scholarly_Use_01.pdf
Healy, S., Byrne, H., Schmid, K., Bingham, N., Holownia, O., Kurzmeier, M., & Jansma, R. (2022). Skills, Tools, and Knowledge Ecologies in Web Archive Research (WARCnet Special Reports). Aarhus University. https://cc.au.dk/fileadmin/dac/Projekter/WARCnet/Healy_et_al_Skills_Tools_and_Knowledge_Ecologies.pdf
Morris, J. W. (2019). Hearing the Past: The Sonic Web from MIDI to Music Streaming. In N. Brügger & I. Milligan (Eds.), The SAGE Handbook of Web History (pp. 491–510). Sage.
2:45pm - 3:05pm IXP History Collection: Recording the Early Development of the Core of the Public Internet
Sharon Healy¹, Gerard Best¹, Lara Díaz Martínez²
¹Independent Researcher, Ireland; ²University of Barcelona, Spain
The IXP History Collection is an ongoing project which seeks to record and document histories of the Internet exchange points (IXPs) which form the core of the Internet’s topology. An IXP is the point at which Internet Service Providers and Content Delivery Networks connect and exchange data with each other (“peering”). IXPs form the topological core of the Internet backbone, their histories are inextricably linked to the commercialization of the Internet, and their development is a significant milestone in the global history of media and communications. Efforts should therefore be made to ensure that we preserve IXP histories for future generations.
The main purpose of the project is to collect and preserve networking and IXP histories due to valid concerns that these histories will be lost from the global record unless attempts are made to start preserving them now. In particular, the project is concerned with the fragility of electronic information and born-digital documents, records, and multimedia, otherwise known as born-digital heritage. As a starting point, the project utilizes the Internet Exchange Directory, which is maintained by Packet Clearing House, an intergovernmental treaty organization responsible for providing operational support and security to critical Internet infrastructure, including Internet exchange points. The PCH IX Directory is one of the earliest organized efforts to develop and maintain a database for recording and tracking the establishment, development and global growth of IXPs.
The project then focuses on documenting IXP histories through as many online sources as possible (e.g., websites/pages, reports, journals, magazines/newspaper articles, old emails on public mail lists). The project relies on the use of web archives as a research tool for tracing IXP histories, as well as a preservation tool using the Save Page functions in the Wayback Machine and Arquivo.pt.
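As a rough illustration of the web-archive workflow described above, the following sketch builds a Wayback Machine capture request address and splits a replay address into its capture time and original URL. The function names are hypothetical, and the `/save/` and `/web/<timestamp>/` address patterns are assumptions based on the publicly visible Wayback Machine URL scheme; this is not the project's own tooling, and Arquivo.pt is not covered.

```python
import re
from datetime import datetime, timezone

WAYBACK = "https://web.archive.org"

# Assumed replay pattern: /web/<14-digit timestamp>[optional modifier]/<original URL>
_REPLAY = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def save_request_url(page_url: str) -> str:
    """Return the address that asks the Wayback Machine to capture page_url."""
    return f"{WAYBACK}/save/{page_url}"


def parse_replay_url(replay_url: str) -> tuple[datetime, str]:
    """Split a replay address into (capture time in UTC, original URL)."""
    match = _REPLAY.match(replay_url)
    if match is None:
        raise ValueError(f"not a Wayback replay URL: {replay_url}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured.replace(tzinfo=timezone.utc), match.group(2)
```

Recording the capture time alongside the original address, rather than the replay address alone, keeps a citation usable even if the source is later consulted through a different archive.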
In this presentation we discuss our approach and methodology for developing the collection and making it available online as a reference resource, and we offer an overview of the importance of using web archives for documenting and preserving Internet and IXP histories. By presenting our approach, we hope to offer a case study that demonstrates how web archive research can be integrated with traditional research methods (Healy et al., 2022), and promote more widespread use of web archives as research tools for historical inquiry, and the long-term preservation of digital research (Byrne et al., 2024).
Resources:
Arquivo.pt: https://arquivo.pt/
IXP History Collection - Information Directory | Zotero: https://www.zotero.org/groups/4944209/ixp_history_collection_-_information_directory/library
Packet Clearing House, Internet Exchange Directory: https://www.pch.net/ixp/dir
Wayback Machine: https://web.archive.org/
References:
Healy, S., Byrne, H., Schmid, K., Bingham, N., Holownia, O., Kurzmeier, M. and Jansma, R. (2022). Skills, Tools, and Knowledge Ecologies in Web Archive Research. WARCnet Special Report, Aarhus, Denmark: https://web.archive.org/web/20221003215455/https://cc.au.dk/fileadmin/dac/Projekter/WARCnet/Healy_et_al_Skills_Tools_and_Knowledge_Ecologies.pdf
Byrne, H., Boté-Vericad, J-J, and Healy, S. (2024). Exploring Skills and Training Requirements for the Web Archiving Community. In: Aasman, S., Ben-David, A., and Brügger, N., eds. The Routledge Companion to Transnational Web Archive Studies. Routledge.
3:05pm - 3:25pm Lost, but Preserved - A Web Archiving Perspective on the Ephemeral Web
Sawood Alam, Mark Graham
Internet Archive, United States of America
The World Wide Web, our era's most dynamic information ecosystem, is characterized by its transient nature. Recent studies have highlighted the alarming rate at which web content disappears or changes, a phenomenon known as "link-rot". A 2024 Pew Research Center study revealed that 38% of webpages from 2013 were inaccessible a decade later. Even more striking, Ahrefs, an SEO company, reported that at least 66.5% of links to sites created in the last nine years are now dead. These findings echo earlier research by Zittrain et al., which uncovered significant link-rot in journalistic references from New York Times articles.
While these statistics paint a grim picture of digital impermanence, they often overlook a crucial factor: the role of web archives. This talk aims to reframe the link-rot discussion by considering the preservation efforts of various web archiving institutions.
Our research revisiting the Pew dataset yielded a surprising discovery: only one in nine URLs from the original study were truly missing; the rest had at least one capture in a web archive. This finding suggests that the digital landscape, when viewed through the lens of web archiving, may be less ephemeral than commonly perceived.
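To make the idea of "at least one capture" concrete, here is a minimal sketch of how a single URL might be checked against one archive. It assumes the Wayback Machine availability endpoint and its `archived_snapshots.closest` response shape; the function names are hypothetical. It is not the methodology used in the study, which considered captures across web archives rather than one service.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability service.
API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup address; timestamp (YYYYMMDD...) is optional."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_capture(payload: dict) -> Optional[dict]:
    """Return the closest capture record, or None if the archive reports none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def has_capture(url: str, timeout: float = 30.0) -> bool:
    """Ask the service over the network whether any capture of url exists."""
    with urlopen(availability_query(url), timeout=timeout) as response:
        return closest_capture(json.load(response)) is not None
```

A URL that no longer resolves on the live web but for which `has_capture` returns true would count as lost but preserved in the sense used here; a real study would also need to handle rate limits, redirects, and captures that only recorded an error page.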
Key points we will explore:
1. The state of link-rot: We will review recent studies and their methodologies, discussing the implications of their findings for digital scholarship, journalism, and information access.
2. Web archives as digital preservationists: We will introduce major web archiving initiatives and explain their crucial role in maintaining the continuity of online information.
3. Reassessing link-rot with archives in mind: We will present our methodology and findings from reexamining the Pew dataset, demonstrating how web archives mitigate content loss.
4. Challenges and limitations of web archiving: Despite their importance, web archives face significant technical, legal, and resource constraints. We will discuss these challenges and their impact on preservation efforts.
5. The future of web preservation: We will explore emerging technologies and strategies in web archiving, including machine learning approaches to capture dynamic content and efforts to preserve the context of web pages.
6. Call to action: We will emphasize the importance of supporting and expanding web archiving efforts, discussing how researchers, institutions, and individuals can contribute to these initiatives.
This talk aims to provide a more nuanced understanding of digital impermanence and preservation. While acknowledging the real challenges posed by link-rot, we will highlight the often-overlooked role of web archives in maintaining our digital heritage. By doing so, we hope to foster greater appreciation for web archiving efforts and encourage increased support for these crucial initiatives.
Our goal is to leave the audience with a renewed perspective on the state of the web's preservability and a clear understanding of why supporting web archiving is essential for ensuring the longevity and accessibility of our shared digital knowledge. As we navigate an increasingly digital world, recognizing that much of what seems lost may actually be preserved is vital for researchers, educators, journalists, lawyers, and anyone who values the continuity of online information.