Wednesday, 03/May/2023:
7:25pm - 7:55pm

Session Chair: Tom Smyth, Library and Archives Canada
Virtual location: Online

Querying Queer Web Archives

Di Yoong¹, Filipa Calado¹, Corey Clawson²

¹The Graduate Center, CUNY, USA; ²Rutgers University, USA

Our paper explores the intersections of querying and queerness as they interact with, and are informed by, web spaces and their development over time. Working with hundreds of gigabytes of web archival records on queer and queer-ish online spaces, we are developing new methods for search and discovery, as well as for the ethical access and use of web archives. This paper reflects on our process of pursuing methodologies that accommodate diverse perspectives for querying web-based datasets and that embrace the qualities of play and pliancy in responding to a host of research questions and investments.

For example, one central concern is developing ethical methods for cleaning web archival data to maintain privacy and anonymity. While queer spaces have historically existed in the margins, confidential information is easily shared and retained in the process of collecting data. Because we are looking at queer spaces across 30 or so years, we are mindful of privacy and anonymity in two ways: first, in how the sense of anonymity has shifted since the early days of the internet; and second, in how collected sites are used once they are in repositories. In 1995, only 0.4% of the world's population had access to the internet (Mendel, 2012), compared to 60% in 2020 (The World Bank, n.d.). That sense of anonymity within a much smaller internet community meant that users were likely to share more private information than they might today. Our research therefore has to consider how to remove private information at scale, using tools such as bulk_extractor (Garfinkel, 2013) and bulk_reviewer (Walsh & Baggett, 2019). In addition, we work with repositories of archived websites whose original collections were obtained through informed consent. This means that while we may be able to access a collection, ethical secondary use requires additional consideration. Given the small size of the collection, we have been able to reach out to the original creators, but this approach will need to be reconsidered for larger collections.
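As a rough illustration of what bulk cleaning involves, here is a minimal Python sketch. It is not bulk_extractor or bulk_reviewer and does not use their interfaces. It assumes archived pages have already been extracted to plain text, and the `PATTERNS` table and `redact_text` function are hypothetical names. The patterns catch only simple email addresses and one common phone-number shape. Real privacy work would need far broader rules plus human review.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns for illustration only.
# They will miss many forms of personal information and may over-match others.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_text(text):
    """Replace each pattern match with a labelled placeholder.

    Returns the redacted text and a Counter of how many matches
    were replaced for each label, so a reviewer can audit the pass.
    """
    counts = Counter()
    for label, pattern in PATTERNS.items():
        # re.subn returns (new_string, number_of_substitutions)
        text, n = pattern.subn(f"[{label} REDACTED]", text)
        counts[label] += n
    return text, counts


if __name__ == "__main__":
    page = "Write to me at jo.doe@example.org or call 555-123-4567!"
    cleaned, counts = redact_text(page)
    print(cleaned)
    print(dict(counts))
```

The sketch reports counts per label rather than silently rewriting text. This loosely echoes the review step that a tool like bulk_reviewer supports: before trusting a pass, a researcher can see how much was removed and judge whether the patterns are too broad or too narrow.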

Beyond the Affidavit: Towards Better Standards for Web Archive Evidence

Nicholas Taylor

The Internet Archive's (IA) standard legal affidavit is used frequently and reliably in litigation to authenticate and admit evidence from the Wayback Machine (WM). While the affidavit has enabled the legal community to use IAWM evidence regularly and with relative confidence, that community's understanding of the contingencies of web archives, including qualifications to which the affidavit itself calls attention, is limited.

The tendency to conflate IA's attestation to the authenticity of IAWM /records/ with the authenticity of /historical webpages/ will eventually have material consequences in litigation. We may reasonably suppose that this will undermine confidence in the trustworthiness of web archives generally, and to a greater extent than is likely merited. The ever-increasing complexity of the web and the unfortunate growth in investment in disinformation only increase the probability that this will happen sooner rather than later.

In response to the looming (or present, but as yet undiscovered) threat to the current IA affidavit-favored regime for authenticating IAWM evidence, the web archiving community would do well to champion better, more institution-agnostic standards for evaluating and affirming the authenticity of archived web content. Some modest efforts have been made on this front, and there are a few places we can consult for implicit frameworks: judicial precedents, e-discovery community guidance, and the marketing of services by commercial archiving companies. I would argue, though, that these do not get us far enough.

To that end, I would like to elaborate a more expansive set of criteria that could serve as a basis for establishing the authenticity of web archives for evidentiary purposes. Some of these traits are foundational to web archiving generally and help to distinguish web archives from other forms of web content capture. Some reflect the affordances of standards and tools that we as a community already have in place. Others reflect under-addressed technical challenges, for which continued investment in mitigation will be necessary to maintain the trustworthiness of our archives for legal use. Together, they may better provide for the sustained and trustworthy use of web archives as evidence.
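One existing affordance of this kind is the digest that the WARC format can record for each captured payload, in the WARC-Payload-Digest header. It is offered here only as an illustration. The Python sketch below assumes the payload bytes and the recorded header value have already been read from a WARC file; that parsing step is not shown. The helpers `sha1_base32` and `payload_is_intact` are hypothetical names. The `sha1:` prefix with Base32 encoding reflects common crawler practice rather than the only form the standard permits.

A passing check shows only that the stored record has not changed since capture. It says nothing about whether the record faithfully represents the historical webpage, which is the distinction the abstract draws between records and pages.

```python
import base64
import hashlib


def sha1_base32(payload: bytes) -> str:
    """Format a SHA-1 digest as 'sha1:<BASE32>', the labelled-digest style
    commonly found in WARC-Payload-Digest headers (an assumption here)."""
    digest = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


def payload_is_intact(payload: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the value stored at capture.

    True means the bytes are unchanged since the digest was written;
    it does not establish that the capture matched the live page.
    """
    return sha1_base32(payload) == recorded_digest.strip()


if __name__ == "__main__":
    body = b"<html><body>archived page</body></html>"
    # Stands in for the header value a crawler would have written at capture time.
    recorded = sha1_base32(body)
    print(payload_is_intact(body, recorded))          # True
    print(payload_is_intact(body + b" ", recorded))   # False: one extra byte
```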

