Conference Agenda

Session
SESSION #06: Legal & Ethical
Time:
Thursday, 25/Apr/2024:
3:30pm - 4:30pm

Session Chair: Jeffrey van der Hoeven, National Library of the Netherlands
Location: Petit Auditorium [François-Mitterrand site]


Presentations
3:30pm - 3:50pm

Intellectual Property & Privacy Concerns of Web Harvesting in the EU

Anastasia Nefeli Vidaki

Vrije Universiteit Brussel, Belgium

With data regarded as the “oil of the 21st century”, special attention has been drawn worldwide to more technologically efficient, faster and more reliable methods of accumulating it. At the same time, new disruptive technologies such as, but not limited to, Artificial Intelligence (AI) and machine learning have appeared and gradually come to dominate the data scene, facilitating even more data collection. Web harvesting, which is often based on these technologies, plays a leading role in obtaining digital data.

Web harvesting tools use software, in many cases built upon AI, to browse the internet methodically and extract the data carrying the desired information. The process is rather easy to follow. An initial set of webpages is made available to the web harvester, which then fetches the further pages that are accessible via this initial set. Through this procedure some parts of each webpage are stored, and the content, possibly along with the metadata in it, is downloaded and stored as well. The subsequent use of the data obtained can vary from archiving to data analytics and distribution to third parties.
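The harvesting process described above can be illustrated with a minimal sketch. It assumes a breadth-first traversal and uses a hypothetical in-memory link graph (`TOY_WEB`) in place of real network requests; the names `harvest`, `fetch_links` and `max_pages` are illustrative and do not come from any particular harvesting tool.

```python
from collections import deque

def harvest(seeds, fetch_links, max_pages=100):
    """Breadth-first harvest: store each seed page, then follow its links to new pages."""
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queue = deque(unique_seeds)
    seen = set(unique_seeds)
    stored = []
    while queue and len(stored) < max_pages:
        url = queue.popleft()
        stored.append(url)  # a real harvester would save content and metadata here
        for link in fetch_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return stored

# Hypothetical toy "web": each page maps to the pages it links to.
TOY_WEB = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/"],
    "b.example/": ["b.example/about"],
    "b.example/about": [],
}

def toy_fetch_links(url):
    """Stand-in for downloading a page and extracting its outgoing links."""
    return TOY_WEB.get(url, [])

order = harvest(["a.example/"], toy_fetch_links)
print(order)
```

The `seen` set is what keeps the harvester from looping on pages that link back to each other, and `max_pages` bounds the crawl; real tools typically add scoping rules, politeness delays and exclusion handling on top of this skeleton.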

The paper focuses on the legal issues to which the aforementioned re-use of data gives rise. The most pertinent, at least within the European Union (EU), are considered to be the constraints imposed by intellectual property and data protection legislation. On the one hand, crawling websites that include copyright-protected content, and extracting and reproducing that content without authorization, violates EU copyright law and leads to the imposition of serious sanctions. However, the EU legislator has provided some statutory exceptions, namely the one for private use, the one for temporary copies and the most recent one for Text and Data Mining (TDM) purposes. They will be explored theoretically and practically in order to determine whether they provide a solution to the problem or whether they demand deeper interpretation in light of the jurisprudence and continuous technological development. The same observations can be made regarding the sui generis database right.

On the other hand, insofar as the crawled content may include personal data, matters of privacy and data protection come into play. With its strict regulatory framework, the EU has striven to combat the unlawful processing of personal data. Therefore, data-intensive technologies like web scrapers should be designed and operated taking into account the principles prescribed in the General Data Protection Regulation (GDPR). Nonetheless, compliance with data minimization, storage limitation, anonymization and lawfulness of processing might be required of the data controller and processor, along with complex organisational and technical measures. Questions also arise concerning the set of rules for data access and scrutiny imposed by the Digital Services Act (DSA), which has not yet completely entered into force.

Finally, bearing in mind the costs and burdens involved, the paper suggests striking a balance between the obligations enshrined in EU law and the aspiration for a more technologically deterministic open data policy powered by practices like web harvesting. There is a need for transnational and interdisciplinary debate and cooperation.



3:50pm - 4:10pm

DSM to the Rescue? Implications of the new EU Copyright Directive for Social Media Archiving: the Case of the Belgian Transposition and the Cultural Heritage Archives in Flanders.

Ellen Van Keer, Rony Vissers

meemoo, Belgium

Main topic: legal context

Keywords: social media archiving, copyright, reproduction, text and data mining

Social media archiving presents various legal obstacles for cultural heritage institutions (CHIs). The aim of this contribution is to clarify how recent developments in EU copyright legislation can lower the barriers for social media archiving projects in the heritage sector.

Much content on social media is protected by copyright. Rightsholders hold exclusive rights over the use of their works, and users need to obtain their prior permission to use them. However, due to the large scale and wide diversity of social media content, it is not realistic for heritage institutions to obtain permission from all potentially involved rightsholders before engaging in social media archiving practices. Of course, this is not a completely new problem. Many items in heritage collections are protected by copyright, which generally lasts until 70 years after the author's death; this term is harmonised across the EU.

In order to keep a fair balance with competing fundamental rights such as the right to information and access to culture, copyright systems provide for a number of exceptions allowing certain uses in the public interest without the burden of prior authorization. CHIs fulfil public tasks and are an important category of beneficiaries. While copyright remains a national competence, exceptions have been harmonised at European level. A first milestone in this development was the so-called InfoSoc Directive from 2001 (1), which included a closed list of 20 facultative exceptions (Art. 5 InfoSoc). A significant update came with the so-called DSM Directive in 2019 (2), which has been transposed into (most) national legal systems over the last few years. Two provisions in the DSM are particularly relevant here. First, an exception for the preservation of cultural heritage has become mandatory (Art. 6 DSM). It provides a legal solution for digitising and preserving digital cultural content. Secondly, a new exception for text and data mining has been introduced (Art. 3 DSM). This creates a legal framework for large-scale digital data collection and the application of AI.

This contribution will discuss the relevant provisions in more detail and clarify how they apply to social media archiving practices and projects in CHIs, with regard both to capture and preservation and to access and valorization of social media content. As a case in point we will look at the Belgian transposition and its implications for the cultural heritage archives in Flanders, but the legislative framework and archival questions addressed are relevant to the broader international web archiving community.

(1) https://eur-lex.europa.eu/legal-content/NL/TXT/?uri=CELEX%3A32001L0029

(2) https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32019L0790&qid=1695379375092



4:10pm - 4:30pm

Digital Legal Deposit beyond the web

Vladimir Tybin

National Library of France

Under the law on copyright and related rights in the information society passed in France in 2006, digital legal deposit was introduced at the Bibliothèque nationale de France (BnF) along with web legal deposit. At the time, BnF was made responsible for collecting all "signs, signals, writings, images, sounds and messages communicated to the public by electronic means in France". In reality, the library had begun archiving the web and building up collections of French websites long before: our web archive collections have a historical depth dating back to 1996 and to date represent more than 48 billion URLs and 2 petabytes of data. In this way, the heritage mission of collecting everything that is disseminated on the French web in order to build up a national digital memory has gradually developed and been strengthened to the point where it is now an essential component of BnF's historical legal deposit mission. However, it soon became apparent that many digital objects distributed electronically were escaping the automatic harvesting carried out by our robots, either for technical reasons or simply because of the commercial barriers behind which they were hidden. This is the case for digital books and scores on the market; for digital periodicals, journals, magazines and newspapers; for digital maps and plans; for digital photographs distributed by agencies and authors, and videos distributed on streaming and VOD platforms; for applications, software and e-games on the market; and also for all born-digital audio documents, including music distributed on platforms. A solution had to be found to guarantee continuity from the physical to the digital world, and to avoid any disruption to BnF's legal deposit collections and any gaps in the heritage record.

We therefore decided to set up a new system for collecting all born-digital documents in the same way as the historical legal deposit system, under which publishers, producers, authors and distributors deposit their digital files along with their metadata to guarantee long-term preservation, recording in our general catalogue and access for consultation in the reading rooms. From digital books to digital sound, not forgetting all other types of digital document, all these channels for entry, cataloguing, preservation and access are gradually being put in place, and are part of a major strategic challenge for the Bibliothèque nationale de France. The aim of this presentation is to describe the various projects that have gone into setting up these systems: changes to the legal framework, technical development of specific tools and workflows, the organisational work required to integrate these new processes and the implementation of a scientific policy for legal deposit.