Conference Agenda
Overview and details of the sessions of this conference.
Session Overview
Date: Monday, 07/Oct/2024
8:30am - 9:30am | Registration | Entrance, USAL Rectorado Building, ground floor (Planta Baja), Rodríguez Peña 640
9:30am - 1:00pm | Workshop 1 | Location: Aula 2 (first floor) | In Spanish
ID: 172 / WS 1: 1
Workshop Keywords: Handwritten Text Recognition, digital scholarly editing, Transkribus (READ Coop), XML-TEI, EVT (Edition Visualization Technology)
HTR systems, digital editions, and access to textual heritage: a workshop on Transkribus (READ Coop) and EVT (Edition Visualization Technology)
University of Verona, Italy
This workshop is situated within research on automatic transcription and digital editing. Its objective is to introduce participants to the Transkribus platform (READ Coop), currently the most reliable HTR system and the one with the largest user community. Among the workshop's aims is the idea that informed use of Transkribus can provide highly valuable information for the digital philologist, supporting the creation of digital scholarly editions of printed documents in a simplified way. Specifically, the following topics will be covered:
9:30am - 1:00pm | Workshop 2 | Location: Aula 6 (second floor) | In Spanish
ID: 127 / WS 2: 1
Workshop Keywords: Music encoding, Music Encoding Initiative, MEI
Introduction to music encoding using the Music Encoding Initiative (MEI) format
Universidade NOVA de Lisboa, Lisbon, Portugal
The Music Encoding Initiative (MEI) community was inspired by the Text Encoding Initiative (TEI). The MEI community works to develop best practices for encoding a wide range of musical documents, responding to the different needs of libraries, music archives, musicologists, and others. The MEI format, like TEI, is based on XML tags, which define the various elements of music encoding by means of their attributes. In this workshop, the audience will be introduced to the MEI format, its basic structure, and the tags used to encode musical elements such as notes, rests, measures, and clefs. We will also encode our first MEI file together using the mei-friend web app (https://mei-friend.mdw.ac.at/), which runs in any web browser except Safari. The application has several panels, including an editing panel where the user enters MEI code and a display panel where the user sees the music rendered by Verovio (an MEI visualization tool). Beyond encoding, the audience will be introduced to several of the tools developed around MEI, some of the uses of this format, and places to find further MEI documentation for future exploration (the MEI Guidelines and the tutorials available in Spanish). No prior knowledge of XML, TEI, or MEI is required to participate in this workshop. Basic musical knowledge, such as naming the notes on a staff in treble clef and telling note values apart (half note, quarter note, eighth note), is not indispensable, but it is highly beneficial for following the tutorial fully.
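By way of orientation (an illustration added here, not part of the workshop description), a minimal MEI sketch of the kind of encoding the workshop introduces: one measure in treble clef containing a quarter note and a quarter rest.

```xml
<!-- Minimal MEI sketch; the required meiHead metadata is elided. -->
<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music>
    <body>
      <mdiv>
        <score>
          <scoreDef>
            <staffGrp>
              <!-- One five-line staff with a treble (G2) clef -->
              <staffDef n="1" lines="5" clef.shape="G" clef.line="2"/>
            </staffGrp>
          </scoreDef>
          <section>
            <measure n="1">
              <staff n="1">
                <layer n="1">
                  <!-- A quarter note (middle C), then a quarter rest -->
                  <note pname="c" oct="4" dur="4"/>
                  <rest dur="4"/>
                </layer>
              </staff>
            </measure>
          </section>
        </score>
      </mdiv>
    </body>
  </music>
</mei>
```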
11:00am - 11:30am | Break 1 (Coffee Break)
1:00pm - 2:00pm | Lunch
2:00pm - 5:00pm | Workshop 3 | Location: Aula 2 (first floor) | In Spanish and English
ID: 141 / WS 3: 1
Workshop Keywords: TEI XML, Digital Editions, Semantic Encoding, LEAF-Writer, LEAF-VRE
Producing Semantic Digital Editions Using LEAF Commons Tools / Creación de ediciones digitales semánticas con las herramientas de LEAF Commons
(1) Bucknell University, USA; (2) Newcastle University, United Kingdom; (3) Universidad de Buenos Aires
[See uploaded PDF for full details and Spanish translation]
This half-day in-person workshop introduces textual scholars and practitioners to the LEAF Commons suite: web-based, easy-to-use tools that support text encoding, named entity recognition, web annotation, text analysis, and publication without requiring users to learn complex encoding languages, and that support easy movement from one interoperable tool to another depending on users' needs. These freely available tools support digital scholarly workflows for the collaborative production and publication of scholarly and documentary texts, editions, and collections on the web, without the need for software installation, while promoting best practices for text encoding, annotation, and metadata standards. The LEAF Commons suite enables the use of individual tools for specific purposes, and it also supports an end-to-end workflow that begins with outputs of optical character or handwritten text recognition systems, transcriptions, or born-digital texts and ends with publication on the web, allowing it to serve a wide range of research and pedagogical uses. LEAF stands for the Linked Editing Academic Framework, a collaborative software suite that provides both a comprehensive virtual research environment and a set of 'Commons' modular tools for text editing and publication. The LEAF Commons tools constitute an accessible, low-barrier, no-cost infrastructure for the production of online texts, editions, or collections, whether for teaching or for undertaking research and collaboration on a sustainable basis. The Commons makes LEAF tools freely available in the browser, enabling collaboration and publication through GitHub, in addition to local storage. Promoting the reuse of data in keeping with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, LEAF uses open-source software, open-access platforms, and open international standards for best practices in text encoding (TEI-XML) and web annotation (RDF). LEAF Commons offers communities of researchers, teachers, and students the opportunity to take part in digital knowledge production and open collaboration. The workshop will end with an open discussion about pursuing such forms of open knowledge production and collaboration. The LEAF Commons tools introduced will include:
LEAF-Writer: an open-source, open-access Extensible Markup Language (XML) editor that runs in a web browser and offers scholars and their students a rich textual editing experience without the need to download, install, and configure proprietary software, pay ongoing subscription fees, or learn complex coding languages. This user-friendly editing environment incorporates Text Encoding Initiative (TEI) and Resource Description Framework (RDF) standards, meaning that texts edited in LEAF-Writer are interoperable with other texts produced by the scholarly editing community and with other materials produced for the Semantic Web. It also incorporates named entity recognition and reconciliation with, or linking to, linked open data identifiers through the incorporation of the NERVE tool.
LEAF-Writer is particularly valuable for pedagogical purposes, allowing instructors to teach students best practices for encoding texts without also having to teach them how to code. LEAF-Writer is designed to help bridge the gap by providing access to all who want to engage in new and important forms of textual production, analysis, and discovery.
DToC (the Dynamic Table of Contexts): provides an online interactive reading and publication environment for digital scholarly texts in which the two conventional overviews provided in print editions, the table of contents and the index, have been dynamically merged into an interactive online e-reading experience that leverages the power of XML markup. Users can build a DToC edition from one or more TEI-XML files, then curate and label the underlying elements and attributes in order to understand where named entities, topics, and concepts can be traced within the edition. Editions can be stored using URLs and shared with readers as published or teaching texts.
LEAF-TE (the LEAF Turning Engine): a web interface that enables users to easily and automatically transform documents between formats. It converts HTR/OCR output (from various sources, including Transkribus) to TEI-XML for importing into LEAF-Writer or other editors, and it converts TEI-XML to HTML, Markdown, and plain text for exporting to web publishing and text analysis environments, including the Dynamic Table of Contexts.
NERVE (the Named Entity Relationship and Vetting Environment): an application that performs named entity recognition (NER) on machine-readable texts, allowing users to identify candidate entities in a document and to review and correct the results. NERVE suggests relevant Uniform Resource Identifiers (URIs) for entities, so users can reconcile data against an authority such as Wikidata or the Virtual International Authority File to provide the basis for Linked Open Data (LOD) web annotations. Users can export their reconciled data in TEI-XML or HTML formats to an online repository or to their desktop. NERVE can be used from within LEAF-Writer or as a stand-alone tool, and it will be demonstrated during the workshop.
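As a rough illustration (added here; not the authors' own example, and the names and URIs below are invented placeholders), entity reconciliation of the kind NERVE performs typically yields TEI in which names carry authority identifiers:

```xml
<!-- Hypothetical TEI fragment after NER and vetting: entities linked
     to linked open data authorities. All values are placeholders. -->
<p>In 1837, <persName ref="http://viaf.org/viaf/00000000">Ana
  García</persName> wrote from
  <placeName ref="https://www.wikidata.org/entity/Q0000000">Lima</placeName>
  about the new edition.</p>
```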
5:00pm - 8:00pm | Guided tour of Buenos Aires
Date: Tuesday, 08/Oct/2024
8:30am - 9:30am | Registration | Entrance, USAL Rectorado Building, ground floor (Planta Baja), Rodríguez Peña 640
9:30am - 1:00pm | Workshop 5 | Location: Aula 6 (second floor) | In English
ID: 161 / WS 5: 1
Workshop Keywords: digital publication, minimal computing, JavaScript, infrastructure
Introduction to publishing TEI with static sites and front-end technologies
University of Maryland, United States of America
Scope of the workshop
This half-day workshop will introduce strategies for handling TEI when publishing with static site generators and front-end technologies. The workshop will focus on isomorphic approaches to publishing TEI on the web: in other words, publishing TEI with little or no transformation, or with a structure-preserving mapping that allows working with the output as if it were the initial data source. In particular, attendees will be introduced to CETEIcean as a way of publishing TEI with minimal (or no) transformation (Cayless and Viglianti 2018). The main focus of the workshop will be learning how this approach can be used in conjunction with static site generators, working through examples in "vanilla" JavaScript, React, and Gatsby. This workshop is aimed at attendees who already have some experience with programming (including XSLT) and the command line; however, all are welcome and will be supported as much as possible throughout the workshop. A version of this workshop was previously given at the Text Encoding Initiative conference in 2022 and 2023. The 2023 workshop had an attendance of about 25 individuals, including graduate students, faculty at various stages of their careers, and research software developers. This version of the workshop will be updated with a new template and new examples. Instruction will be in English, with bilingual slides in English and Spanish.
Motivation
Digital humanities projects that result in the creation of a digital output, typically a website or digital edition, are prone to what Smithies et al. call the "digital entropy of software and digital infrastructure" (2019). Static sites have become a common choice for archiving legacy projects that risk going offline (Smithies et al. 2019, Summers 2016) because they require only the absolute minimum from hosting infrastructure: a server to distribute documents at a given address. The sites themselves, once created, require no active maintenance and can be easily moved and transferred like any other collection of files. The Endings project at the University of Victoria, British Columbia (https://endings.uvic.ca/), for example, recommends static sites as a viable strategy for ensuring the longevity of digital humanities project publications. The Endings Principles for Digital Longevity include, among other strategies, the reduction of both software complexity and dependency on infrastructure. On the other hand, static sites cannot support features that would require an active server, such as large-scale text search and user management; these features, therefore, are removed when projects are archived into static sites. Deriving static sites from an end-of-life project is the clear choice when access to infrastructure becomes limited. But this workshop addresses the question: what does it take to adopt static sites from the start?
Schedule and requirements
After an introduction on static sites and the motivations for using them, the workshop will cover the following topics:
In order to account for multiple levels of expertise, we may break into multiple groups for attendee-led collaborative work.
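To make the isomorphic approach concrete (an illustration added here, not workshop material): CETEIcean registers TEI elements as HTML custom elements, so the document structure survives the move into the browser DOM and can be styled and scripted directly. Roughly:

```xml
<!-- TEI source -->
<p>Quoted in <bibl><title level="m">Don Quijote</title></bibl>.</p>

<!-- Approximate HTML5 produced by CETEIcean: each TEI element becomes
     a "tei-"-prefixed custom element, preserving the original
     structure and attributes rather than flattening them into divs. -->
<tei-p>Quoted in <tei-bibl><tei-title level="m">Don Quijote</tei-title></tei-bibl>.</tei-p>
```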
Example TEI documents and a Gatsby site template will be provided. Attendees are encouraged to bring their own TEI to work with. Participants must bring their own laptop and be able to install (free) software on it. Internet access will be required. The tutor will require a projector.
References
Cayless, Hugh, and Raffaele Viglianti. "CETEIcean: TEI in the Browser." Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Cayless01.
Smithies, James, Carina Westling, Anna-Maria Sichani, Pam Mellen, and Arianna Ciula. 2019. "Managing 100 Digital Humanities Projects: Digital Scholarship & Archiving in King's Digital Lab." Digital Humanities Quarterly 13 (1).
Summers, Edward. 2016. "The Web's Past Is Not Evenly Distributed." Maryland Institute for Technology in the Humanities (blog). May 27, 2016. https://mith.umd.edu/webs-past-not-evenly-distributed.
Biography
Dr. Raffaele (Raff) Viglianti is a Senior Research Software Developer at the Maryland Institute for Technology in the Humanities, University of Maryland. His research is grounded in digital humanities and textual scholarship, where "text" includes musical notation. He researches new and efficient practices to model and publish textual sources as innovative and sustainable digital scholarly resources. Dr. Viglianti is currently an elected member of the Text Encoding Initiative Technical Council and the Technical Editor of the Scholarly Editing journal.
9:30am - 5:00pm | Workshop 4 | Location: Aula 2 (first floor) | In Spanish
ID: 150 / WS 4: 1
Workshop Keywords: publication, digital scholarly editing, computer-assisted annotation, processing model, interface
TEI Publisher: sophisticated editions without programming
(1) Jinntec, Germany; (2) e-editiones, Switzerland
TEI Publisher was initially conceived as a tool to bridge the gap between the TEI files encoding the sources and their publication as a digital scholarly edition. TEI Publisher is free, open-source software whose development is supported and coordinated by the international non-profit society e-editiones. The cornerstone of TEI Publisher is its implementation of the TEI Processing Model, which defines declaratively how the different elements of a TEI document should be rendered for publication. The development of a digital edition goes beyond the mere transformation of elements, however, so TEI Publisher also makes it easy to generate the edition's interface (navigation, pagination, presentation of different versions including facsimile editions, and much more). In its most recent version, 9.0.0, TEI Publisher offers numerous features that support every stage of the preparation of a digital edition: besides new components that enable a highly sophisticated presentation of the critical text (without any programming), TEI Publisher assists the editorial team in creating the TEI sources; in editorial, analytical, and/or semantic annotation; and in exploring the edition through a faceted and full-text search engine and through entity registers. The aim of this workshop is to show how TEI Publisher can be used to produce digital editions that respond to the informational needs of the editorial team. We will show how to build an interface adapted to each type of edition, so that it conveys the intended historical-philological discourse. We will place particular emphasis on the new features that facilitate editorial, analytical, and semantic annotation (the latter adapted to Spanish). The main points of the program are:
Participants are not required to know how to program, but they should be familiar with XML-TEI. Some familiarity with the TEI Processing Model and with XPath is recommended. Materials in English from previous workshops are available in a GitHub repository, which we will be extending with materials in Spanish.
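For orientation (a sketch added here, not workshop material): in the TEI Processing Model, rendering is declared in the ODD customization by attaching model elements to element specifications, for example:

```xml
<!-- Minimal TEI Processing Model sketch: head renders as a caption
     inside figures and as a heading elsewhere; hi renders inline
     in italics. Behaviours and predicates per the TEI Guidelines. -->
<elementSpec ident="head" mode="change">
  <model predicate="parent::figure" behaviour="caption"/>
  <model behaviour="heading"/>
</elementSpec>
<elementSpec ident="hi" mode="change">
  <model behaviour="inline">
    <outputRendition>font-style: italic;</outputRendition>
  </model>
</elementSpec>
```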
11:00am - 11:30am | Break 1 (Coffee Break)
1:00pm - 2:00pm | Lunch
2:00pm - 5:00pm | Workshop 6 | Location: Aula 6 (second floor) | In English
ID: 169 / WS 6: 1
Workshop Keywords: XPath, XSLT, data modeling, visualization, sustainability
Navigating and Processing Data from the TEI with XSLT
(1) Penn State Erie, United States of America; (2) University of Graz, Austria; (3) Heidelberg University, Germany
Knowing how to locate and explore data in your encoding can help you learn how to work with TEI and XML generally. This workshop is designed for people who have some experience with TEI and seek to learn how to work with XML markup for analysis and research. Participants will gain a working, practical knowledge of the query language XPath and the transformation language XSLT, and learn how these can help to reduce reliance on software, packages, and plugins that may become obsolete without warning. Further, XSLT's functional programming can serve as a way of articulating research questions around a document data model expressed in XML. The emphasis of our workshop is "pull-processing": that is, extracting data and metadata from markup documents for analysis, as opposed to producing the reading view of a digital scholarly edition. Markup in documents supplies structures and contexts that are especially useful for processing data, beyond what we can do with so-called "plain text". We will demonstrate some basic XPath navigation and calculation functions, and then show how XPath is applied in XSLT templates to address specific nodes that hold data of interest for visualization. We will process TEI documents composed in Spanish and in languages represented by our workshop members' projects, to show that the code we write is transferable to multiple projects across language and cultural borders. Workshop instructors will collaborate with and seek advice from the conference organizers on preparing Spanish-language source materials and documentation to establish an international foundation for this workshop. Participants will learn how to "pull" data from TEI and output the text formats required for simple online tools, where the structure of the output data is transferable to many different online calculation programs and amenable to statistical processing. During the workshop we will produce some simple structured documents for storing, sharing, and visualizing data: HTML lists and tables as well as plain-text tabulated data (CSV or TSV files), and (if we have time) simple SVG bar or line graphs. We hope to process some participant-supplied XML before, during, and after the workshop. We will carefully document the XSLT that we supply during the workshop to assist participants with revising and adapting the code to their own projects.
Outline
Review and refresh understanding of XML tree structures
Orientation to XPath
Teach basic XSLT to produce simple outputs ready for analysis and visualization
Room/materials required
Instructors need a projector that can connect to a laptop; network access is helpful. Participants should bring laptop computers if possible. If a classroom with computers is available, please provide guest login access to the computers and install the oXygen XML Editor. Instructors can provide complimentary 90- or 120-day licenses for the oXygen XML Editor. Limit enrollment to 25 participants so instructors can connect with everyone.
Workshop instructors
Elisa Beshero-Bondar, PhD. Program Chair of Digital Media, Arts, and Technology; Professor of Digital Humanities; Director of the Digital Humanities Lab at Penn State Erie, The Behrend College. An active member of the Text Encoding Initiative (TEI), Dr. Beshero-Bondar serves as an elected member and current chair of the TEI Technical Council, an eleven-member international committee that supervises amendments to the TEI Guidelines. She has been teaching humanities in web-savvy ways since the 1990s, and began teaching markup languages and XML stack processing almost as soon as she began learning them in the 2010s. Before moving to direct the DIGIT program at Penn State Erie, she directed Pitt-Greensburg's Center for the Digital Text. She has led TEI data modeling of the Frankenstein Variorum project, the Digital Mitford Project, and other digital research projects involving TEI XML to build editions and prepare structured analyses of variants and collocations in texts. Find her on GitHub at https://github.com/ebeshero and on her development site, named for her pet firebelly newts, at https://newtfire.org.
Dr. Martina Scholger. Centre for Information Modelling - Austrian Centre for Digital Humanities, University of Graz. Martina Scholger has a PhD in Digital Humanities and holds a Senior Scientist position at the Centre for Information Modelling - Austrian Centre for Digital Humanities at the University of Graz. Her main research fields are digital scholarly editing, the application of digital methods and semantic technologies to humanities source material, and text mining. In addition to teaching data modelling, text encoding, and X-technologies, her work at the centre involves the conceptual design, development, and implementation of numerous cooperation projects in the field of digital humanities (see http://gams.uni-graz.at). She has been an elected member (and past chair) of the TEI Technical Council since 2016, and a member of the Institute for Documentology and Scholarly Editing since 2014. She has taught at a number of summer schools and workshops in the context of digital scholarly editing, e.g. at the Digital Humanities at Oxford Summer School and at schools organised by the Institute for Documentology and Scholarly Editing (IDE).
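As a small taste of the pull-processing this workshop teaches (a sketch added here, not the instructors' materials): an XSLT 3.0 stylesheet that pulls every persName from a TEI document and writes a tab-separated table of name frequencies, ready for a spreadsheet or plotting tool.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <!-- Emit plain text (TSV) rather than XML -->
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>name&#9;count&#10;</xsl:text>
    <!-- Group all persName elements by their normalized string value
         and output one row per distinct name with its frequency. -->
    <xsl:for-each-group select="//tei:persName" group-by="normalize-space(.)">
      <xsl:sort select="count(current-group())" order="descending"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```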
5:00pm - 7:00pm | Guided visit to the National Library
Date: Wednesday, 09/Oct/2024
8:30am - 9:30am | Registration | Entrance, USAL Rectorado Building, ground floor (Planta Baja), Rodríguez Peña 640
9:30am - 11:00am | Long Papers 1 | Location: Aula 2 (first floor) | Session Chair: Clayton McCarl, Univ. of North Florida | TEI Good Practices
ID: 145 / LP 1: 1
Long Paper Keywords: book history, bibliography, inventories, private libraries, booksellers
The language of books: analyzing book inventories in Navarre in the sixteenth and seventeenth centuries
Indiana University Bloomington, United States of America
For book historians, the inventories that documented the contents of private libraries and of booksellers' and printers' shops are among the most important sources for understanding which books circulated in a given period and place. These inventories contain entries enumerating the books in a collection, and the usual approach for book historians is to try to decipher the descriptions of the books made by scribes, book professionals, or other individuals in order to identify a specific book. Despite being crucial sources, we still do not understand well how these inventories were made, and in particular whether norms of description existed, because the information given in an inventory entry can vary enormously. There is evidence that at least some norms existed, since certain popular works and authors were referred to by standard names. For example, in a lawsuit between book professionals in Navarre, Spain, in 1665, a bookseller's widow declared that the works of Pedro de Barbosa "ordinariamente se rotulan petri barbossa opera" ("are ordinarily labeled petri barbossa opera"; AGN, n. 189680, fol. 580r), suggesting that there was a regular way of referring to this author and his works. Applying TEI to these inventories offers a method for better understanding the bibliographic language formed by these norms. In this paper, I demonstrate the usefulness of TEI for the study of book inventories through a corpus of 15 inventories of clerics' private libraries and 3 booksellers' inventories from Navarre in the sixteenth and seventeenth centuries, all transcribed in TEI. First, I argue that TEI allows scholars to share their transcriptions more easily, not only as documents in digital editions but also as data about the inventories, especially compared with the traditional methods of disseminating transcribed inventories. Second, I argue that TEI lets us begin to uncover the aforementioned bibliographic language, because TEI's descriptive markup makes it possible to specify the different types of information in an entry and how individuals referred to different works and authors. My preliminary analysis of these entries as structured data reveals that a more standardized bibliographic language existed among book professionals with respect to the structure of the entries and the references to certain works and authors. Inventories made by scribes or other individuals, however, do not show the same consistency of structure, even though they more frequently use the normative designations for works and authors. Fundamentally, this presentation demonstrates the crucial role TEI can play in supporting a fuller understanding of a source type that is central to book history.
References
Pamplona, Archivo General y Real de Navarra (AGN), n. 189680
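A hypothetical sketch (added here; the element choices are illustrative, not the author's actual schema) of descriptive markup that distinguishes the parts of an inventory entry:

```xml
<!-- Hypothetical encoding of a single inventory entry, separating the
     author and title as recorded by the scribe from an editorial
     identification of the work. -->
<item n="23">
  <bibl>
    <author>petri barbossa</author>
    <title>opera</title>
    <note type="identification">Possibly an edition of the collected
      works of Pedro de Barbosa.</note>
  </bibl>
</item>
```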
ID: 139 / LP 1: 2
Long Paper Keywords: Project management, Authority files, Linked data, Digital humanities training, Correspondence
Encoding Edgeworth: TEI Development for Correspondence and Entity Indices
University of Tennessee, Knoxville, United States of America
The Maria Edgeworth Letters Project (MELP) aims to create an open-access digital archive containing the complete correspondence of the Anglo-Irish Regency author Maria Edgeworth (1768-1849) and her circle. TEI is central to this project: it is used to encode the text of the letters, to create indices of named persons, places, and works, and to ensure the long-term interoperability of the project. Letters in both English and French are present in the corpus. To date, over 200 letters have been encoded and made available on our beta site (https://melp.dh.tamu.edu/). In the next three years, this inter-institutional collaborative project plans to encode an additional 500 letters and to provide minimal metadata for all remaining digitized letters using elements within the teiHeader. This presentation will cover the workflows currently being used to achieve this work, as well as the challenges the team is currently working through. At present, the majority of encoding is completed by graduate assistants who are aiming to advance their skills in the digital humanities. These students use a TEI template to guide their letter encoding, while spreadsheet data entry is the foundation for the indices. Adding entities to the indices involves in-depth authority research using VIAF, GeoNames, Wikidata, and the Library of Congress Name Authority File (LCNAF). Separate TEI files for persons, places, and works are subsequently generated from the spreadsheets using GREL in OpenRefine. GitHub is key to managing the TEI contributions of all project contributors for version-control purposes. As many in the humanities have not used GitHub previously, training is essential. Possible solutions for future work will also be featured. As the team continues to add entities to our indices, opportunities to further capitalize upon the content-negotiable authorities we have added need to be explored. At present, the beta site simply displays the links when an established entity is present in a letter. In order to ensure the validity of our TEI, a goal is to set up automated testing in GitHub. Special considerations for encoding the French correspondence also need to be established and incorporated into the existing documentation. Solutions for the idiosyncrasies we uncover within the letters while encoding, such as letters within letters, notes added by librarians, and the inclusion of unique diagrams or figures, also need to be continually developed.
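By way of illustration (a sketch added here, not MELP's actual template; the names and dates are invented), letter metadata of this kind is conventionally recorded in the teiHeader using correspDesc:

```xml
<!-- Minimal correspondence metadata sketch; values are invented. -->
<correspDesc>
  <correspAction type="sent">
    <persName ref="#edgeworth_maria">Maria Edgeworth</persName>
    <placeName>Edgeworthstown</placeName>
    <date when="1820-03-02"/>
  </correspAction>
  <correspAction type="received">
    <persName ref="#unknown">Unidentified recipient</persName>
  </correspAction>
</correspDesc>
```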
9:30am - 11:00am | Long Papers 2 | Location: Aula 3 (first floor) | Session Chair: Elisa Beshero-Bondar, Penn State Erie | Large Language Models and TEI
ID: 159 / LP 2: 1
Long Paper Keywords: large language models, generative AI, letter edition
Empowering Text Encoding with Large Language Models: Benefits and Challenges
University of Graz, Austria
This contribution will discuss how Large Language Models (LLMs) can be used to support and enhance text encoding with the standard of the Text Encoding Initiative, demonstrating an exemplary workflow, from model creation to data extraction to data analysis to presentation, in the context of a letter edition.
ID: 163 / LP 2: 2
Long Paper Keywords: Knowledge acquisition, Technical documentation, RAG, LLM, Information retrieval
Enhancing Technical Knowledge Acquisition with RAG Systems: the TEI use case
(1) MandaNetwork, France; (2) ENSTA Paris, France
In an era dominated by an explosion of technical documentation across diverse domains, the need for effective knowledge-acquisition mechanisms has become paramount. The assimilation of the Text Encoding Initiative (TEI) guidelines, for instance, presents challenges for organizations and individuals seeking to adopt its encoding principles effectively. Retrieval-Augmented Generation (RAG) systems emerge as a promising paradigm to address this challenge, seamlessly integrating information retrieval with natural language generation to facilitate the acquisition of technical knowledge from large bodies of documentation. In this paper, we explore how RAG systems can mitigate these challenges while maximizing the benefits of TEI adoption, particularly in the context of learning and implementation. Challenges in learning and adopting the TEI guidelines revolve around the complexity of the markup language and the diverse skill levels of users. Mastering TEI requires familiarity with its intricate syntax, encoding conventions, and domain-specific applications, posing a steep learning curve for novices. Furthermore, the extensive volume of the published guidelines poses another challenge, even for experienced users, in efficiently retrieving relevant information. RAG systems provide a novel approach to technical knowledge acquisition by seamlessly integrating the power of Large Language Models (LLMs) with specialized knowledge. The logic behind a RAG system lies in its ability to leverage pre-trained LLMs to generate informative and contextually relevant responses based on retrieved information. This approach emerged to tackle the hallucination issue observed in generative models; it does so by enriching the context given to these models with knowledge sourced directly from relevant documents. Consequently, RAG systems offer a promising solution to TEI adoption and learning challenges by providing more adaptive and interactive content. Through advanced natural language generation capabilities, RAG systems can generate tailored explanations, examples, and walkthroughs of TEI encoding practices, catering to the specific needs and skill levels of users. By leveraging retrieval mechanisms, RAG systems can retrieve relevant TEI guidelines and examples from the extensive documentation, facilitating self-paced learning and knowledge acquisition. Such interactive systems can empower users by assisting in the creation of TEI-compliant markup, streamlining the encoding process, and thereby reducing errors and inconsistencies. Furthermore, RAG-generated summaries and explanations elucidate the rationale behind TEI encoding decisions, enhancing transparency and reproducibility in digital humanities research. In experiments with state-of-the-art models, we observed that some technical challenges persist. First, pre-processing of the documentation is necessary to overcome issues related to the tokenization process. Moreover, a chunking strategy for such rich documentation has to be carefully defined to enable more precise information retrieval and complete responses from the AI assistant. In addition, choosing the right prompt remains crucial for framing the context and the expected outcome in order to generate accurate responses. In conclusion, RAG systems offer a transformative approach to learning and adopting the TEI guidelines, mitigating challenges and maximizing benefits for knowledge acquisition. By leveraging RAG systems to facilitate TEI learning, organizations can empower users to unlock the full potential of TEI for data interoperability, scholarly communication, and digital humanities research.
ID: 137 / LP 2: 3
Long Paper Keywords: Digital edition, TEI-XML markup, Named Entity Recognition (NER), standardized metadata
From Catullus to Wikidata: Language Models, Metadata Schemes, and Ontologies in a Digital Edition in XML TEI
(1) UNLP; (2) CONICET; (3) AAHD; (4) UNSL
This paper details various markup and natural language processing tasks conducted in Aetatis Amoris, a project dedicated to exploring love poetry throughout literary history through a dedicated website. The project focuses on an enriched digital edition of works by classical Latin poets such as Gaius Valerius Catullus, Albius Tibullus, and Sextus Propertius, using the TEI-XML standard for structuring and encoding the texts. Initial tasks included the automatic generation of the main document elements, such as the header and body, and the automatic counting and tagging of verses and stanzas. Subsequently, using LatinCy, the advanced spaCy model for Latin, names of people and places were automatically extracted and tagged with the corresponding labels. In addition, searches for standardized metadata were carried out using external APIs, consulting databases such as the Virtual International Authority File (VIAF), the Pleiades project, and Wikidata. This allowed for the retrieval of standardized identifiers, rich and curated information, and images of historical places and figures. We conclude that while automatic tools significantly facilitated the digital editing process, the vast amount of information recovered also poses significant challenges in data curation and quality assessment, redefining the role of the digital scholarly editor in the process.
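To illustrate the kind of output such a pipeline produces (a sketch added here; the identifier is a placeholder, not the project's actual data), recognized entities can be serialized as TEI name elements carrying authority references:

```xml
<!-- Hypothetical result of NER plus reconciliation on a verse line
     (Catullus 5.1). The Wikidata URI is a placeholder; Pleiades and
     VIAF identifiers would be used analogously on placeName/persName. -->
<l n="1">Vivamus, mea
  <persName ref="https://www.wikidata.org/entity/Q0000000">Lesbia</persName>,
  atque amemus</l>
```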
9:30am - 11:00am | Long Papers 3 | Location: Aula 6 (second floor) | Session Chair: Raff Viglianti, University of Maryland | Non-Western Cultures
ID: 164 / LP 3: 1
Long Paper Keywords: right-to-left scripts, computing history, Unicode, Arabic
TEI and Right-to-Left Scripts: How We Got Here and What We Can Do About It
Duke University, United States of America
A variety of problems face those who wish to use TEI with texts in right-to-left (RTL) scripts, making the baseline "plain text" mode of editing unusable. This paper will survey the technical history of how we ended up in this situation, discuss possible mitigations and their benefits and drawbacks, and make a call for investment in addressing this serious deficiency in our tooling.
ID: 168 / LP 3: 2
Long Paper Keywords: fonts, text rendering, writing systems, typography
Dubsar: A New Approach to Rendering TEI Documents and Non-Western Text
Independent Researcher
Rendering text is hard. A typical internationalized text rendering system cumulatively represents hundreds of thousands of lines of code, spread between operating systems, software libraries, and application programs. And yet, despite millions of person-hours and dollars worth of engineering, much contemporary software still struggles to handle non-European, non-alphabetic text correctly. Programmers have heretofore typically assumed that "text", simpliciter, is fundamentally a one-dimensional sequence of symbols that is divided into a series of horizontal lines. In fact, neither of these assumptions is universally true. Unfortunately, modern text rendering stacks are frequently still oriented towards the needs of, and assumptions peculiar to, European alphabetic scripts. Attempting to retrofit non-European scripts onto an essentially European alphabetic model has only exacerbated the inherent complexity of the problem. The highly specialized domain knowledge required to handle non-European writing systems correctly is a serious impediment to internationalizing software; in the case of new or minority scripts, it might well be impossible at present. The fact that these scripts are excluded from common software and operating systems can contribute to their marginalization. The predominant font technology, OpenType (https://learn.microsoft.com/en-us/typography/opentype/spec/), addresses this problem by providing built-in "shaping models" that attempt to capture the logic of how the world's writing systems work. OpenType is highly effective at handling those classes of scripts within its purview, but it cannot easily account for behavior beyond what its designers have anticipated. I therefore present Dubsar, a new computer typography system that provides high-quality rendering of complex and minority writing systems and TEI-encoded documents. Dubsar collapses nearly all aspects of text rendering into a single mechanism: a simple programming language, DubsarScript, which allows users to programmatically describe how glyphs are drawn and positioned. Dubsar is not limited in what kinds of writing systems it can express, precisely because Dubsar fonts just are arbitrary DubsarScript programs. Moreover, Dubsar embeds an expansive notion of text designed to accommodate writing systems such as Maya, which has so far resisted computerization due to its incredible complexity. Technologists who wish to support internationalized text in their software need only implement an interpreter for the DubsarScript language, which can be readily accomplished without any knowledge of how potentially unfamiliar writing systems work. Due to its relative simplicity, Dubsar furthermore empowers users of uncomputerized minority scripts to create fonts that would otherwise be impossible. The difficulties of text rendering are only magnified in the context of TEI documents: assumptions that were already tenuous within the plain-text regime, such as unidimensionality, here cease to be true in any capacity. Dubsar is especially conducive to rendering TEI documents by virtue of the same properties that lend it to handling internationalized text. This paper will first introduce the landscape of contemporary computer typography. It will then describe the Dubsar system: how it works, how it is implemented, and how Dubsar can be used to render TEI documents. Finally, the paper will give a brief overview of how Dubsar fonts might be authored.
11:00am - 11:30am | Break 1 (Coffee Break)
11:30am - 1:00pm | Panel 1 | Location: Aula 3 (first floor) | Session Chair: Gimena del Rio Riande, CONICET
ID: 162 / P1: 1
Panel Keywords: scholarly editing, digital humanities, Black DH, multilingual, translation
TEI for Black DH: A Conversation between Revue des Colonies and Keywords for Black Louisiana
(1) University of Maryland, United States of America; (2) University of Chicago; (3) Yale University; (4) Johns Hopkins University; (5) University of Connecticut; (6) Université Sorbonne-Nouvelle
This seven-person roundtable brings together the team from the Revue des Colonies project with co-editors of the microedition "Kinship and Longing: Keywords for Black Louisiana" to discuss their experiences working with TEI and to make the case for modifications that better accommodate the representation of Black life in slavery's archives and the translation of the language of race and empire in the colonial archive. The Revue des Colonies project focuses on the eponymous journal, edited by Martinican abolitionist Cyrille Bissette. Published between 1834 and 1842, it was the first French periodical for and by people of color. Its monthly issues provided news about ongoing struggles for civil rights across the French colonial world and beyond, alongside original and reprinted fiction and poetry by global Black writers. Led by an international and interdisciplinary team of scholars including project director Maria Beliaeva Solomon, technical director Raffaele Viglianti, and co-editor and translator Grégory Pierrot, the project to digitally annotate and translate this invaluable record of the global history of colonization, enslavement, and abolition emerged in response to the absence of any complete and searchable collection, let alone translation, of the journal's complete print run. Supported by the Foundation for the Remembrance of Slavery, the Schomburg Center for Research in Black Culture (New York Public Library), the National Archives, and the Andrew W. Mellon Foundation, the project aims to restore the emancipatory rhetoric of the Revue des Colonies within the political, material, and cultural contexts of its publication and to make it accessible to new generations. At the heart of this endeavor lies the Text Encoding Initiative (TEI), which serves as a "technology of recovery," as articulated by Kim Gallon in "Making a Case for the Black Digital Humanities" (Debates in the Digital Humanities, 2016). TEI not only facilitates the digitization process but also becomes an instrument to guard against the unwitting reproduction of power dynamics embedded within the original texts. Building upon the imperative articulated by Kelly Baker Josephs and Roopika Risam in their introduction to The Digital Black Atlantic to resist technology's historical perpetuation of dominant narratives of oppression, our edition's critical apparatus aims to highlight the original contributions of Black authors, editors, journalists, and activists to the political and cultural transformations of the nineteenth century. Our TEI customization focuses on the encoding of named entities in order to provide contextual and critical annotation. The tagging features provide a perfect opportunity to create substantial, cross-navigable entries for individuals, events, and organizations that have been overlooked in scholarly discourse. To increase accessibility to the Revue's sources, we also provide professional English translations.
On the Keywords side, beginning in Fall of 2020, with the leadership and collaboration of founding director Jessica Marie Johnson, the guidance of Alex Gil, and the editorial assistance of Raffaele Viglianti, the editors Leila Blackbird, Olivia Barnard, Emma Katherine Bilski, and Ellie Palazzolo began working on a microedition for the journal Scholarly Editing (https://scholarlyediting.org/). The vision was to edit and publish a handful of documents from eighteenth-century colonial Louisiana, transcribed and translated via the Louisiana Historical Center Colonial Documents Digitization Project, to draw attention to stories of African and African-descended people in that archive. Keywords for Black Louisiana is supported by the National Historical Publications and Records Commission. The team has edited fifteen stories composed of twenty-one documents spanning 1740 to 1795, bridging French and Spanish colonialism in Louisiana, and has reflected carefully on the possibilities and limitations of TEI. Some of the original documents are in French and others are in Spanish; all have been transcribed and translated into English, and the transcriptions and translations are marked up in TEI. The primary tags used are <persName> to identify named and unnamed individuals and <seg> with @ana for keywords. One cornerstone presentation on Black DH and TEI, Caitlin Pollock and Jessica Lu's 2019 talk "Hacking TEI for Black Digital Humanities," influenced the Keywords project's approach to textual encoding early in the editorial process. Pollock and Lu invited editors working with TEI on Black history and archives to push some of the boundaries and conventions that the dictionary so meticulously documents. We encountered a number of questions and roadblocks that validated Pollock and Lu's call, and did a bit of hacking ourselves, which we propose to present at the annual meeting while speculating on and encouraging further formal interventions in TEI. By bringing together these two projects, we aim to foster conversation about how TEI can become more accountable to critical, postcolonial, and Black DH. We will build on the similarities, shared challenges, and commitments, as well as the differences between these two archives, towards a far-reaching set of interventions. We propose additions to the TEI dictionary related to the representation and translation of racialized language, to print as well as manuscript documents, and to documents written by and for, as well as about, Black actors.
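A schematic example (added here; not the project's actual markup, and the identifiers are invented) of the two primary tags described above, with an unnamed individual identified via persName and a passage linked to a project keyword via seg/@ana:

```xml
<!-- Hypothetical fragment: the xml:id reference and the keyword
     taxonomy pointer are invented for illustration. -->
<p>
  <seg ana="#kinship">The witness declared that
    <persName ref="#unnamed_woman_01">his sister</persName>
    had been taken to New Orleans.</seg>
</p>
```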
11:30am - 1:00pm | Short Papers 1 | Location: Aula 2 (first floor) | Session Chair: Hugh Cayless, Duke University | TEI Workflows
ID: 109 / SP1: 1
Short Paper Keywords: Python library, format conversion, NLP, genres
Lessons learned from developing a customisable tool for TEI processing and handling of various TEI schemas
(1) Eötvös Loránd University, Department of Digital Humanities; (2) National Laboratory for Digital Humanities; (3) Eötvös Loránd University, Atelier Department of Interdisciplinary History; (4) Eötvös Loránd University, Doctoral School of Informatics
While maintaining and systematically extending five (currently medium-sized) corpora of different genres encoded in TEI schemas, we have observed the limitations that non-technical researchers face in handling and enriching these documents. This bottleneck in further document-processing steps has hindered our efforts to attract a larger user base among students and researchers. Pursuing our goal of standardising the common processing steps that connect to the already standardised data storage format (i.e. the TEI schema), we developed a lightweight Python library for solving conversion, linguistic annotation, and metadata extraction tasks in a unified manner. Intended for users with minimal technical knowledge, our tool provides a high-level API for a range of TEI-XML-related tasks, including validation, format conversion, text and metadata extraction for downstream tasks, and TEI-compatible linguistic annotation. We treat TEI schemas (e.g. for poems, dramas, novels, folk songs, news articles) as genres, where each genre represents a unique (valid) TEI document structure. Our library (teiutils) consists of an API skeleton that provides handling of built-in genres and allows the easy development of custom bundles to be attached as Python modules without further restrictions. This approach creates a standardised framework extendible to numerous genres. Our library allows the use of multiple NLP pipelines to accommodate different languages, while supporting conversion to common output formats (JSONL, customisable HTML, sentence-per-line, and the vertical XML format for the Sketch Engine corpus query framework) for using our corpora outside of TEI. We have also defined different TEI schema levels to fit NLP and genre-specific annotations while adhering to the original text. This enables users to generate different annotation levels from raw TEI documents and then convert them into another format in batch, programmatically, with only a few API calls. Furthermore, we present our observations, experiences, and the best practices we have developed regarding compatibility between annotations, TEI structures, and fidelity to the original source texts.
ID: 133 / SP1: 2
Short Paper Keywords: text encoding, markdown, encoding tools, drama
EasyDrama: a lightweight solution for encoding plays in TEI/XML
Digital Humanities Potsdam
Although in many cases TEI/XML markup can be automated, a lot of TEI/XML documents are still encoded manually, due to limitations of technology, the complexity of the annotated phenomena, or the desire of the researcher(s) to stay close to the material and remain in control. In such cases, the entry threshold for manual TEI encoding becomes a challenge. To turn raw text into TEI, one has to become familiar with XML and learn heavyweight annotation tools like oXygen or CATMA. When it comes to markup workshops with non-digital scholars, one must spend considerable time getting the participants familiar with the tools and the format. In the DraCor (dracor.org) project, it is important to enable people without a technical background to encode drama in TEI/XML; we are therefore working on lowering the encoding threshold. One approach is EasyDrama (github.com/dracor-org/ezdrama), a markdown-like language for encoding the main structural elements of drama. In EasyDrama, speeches (TEI element <sp>), speakers (<speaker>), stage directions (<stage>), as well as acts and scenes (nested <div>s), are encoded with just a handful of metasymbols (#@$%\n). This encoding is automatically translated to TEI/XML following a deterministic procedure. EasyDrama has become popular within the DraCor community, and it is even sometimes preferred by people with technical skills and knowledge of XML. The balance between the simplicity of the markup and its unambiguous translation to TEI/XML seems to appeal to encoders. Depending on the uniformity of the source, markup can be accelerated with simple search-and-replace operations, regexes, or LLMs. It is easy to teach LLMs, with a few examples, to output EasyDrama, while still retaining more control than in end-to-end generation. While EasyDrama is a niche solution and does not replace other tools in the TEI/XML ecosystem, it can serve as a primer for interface simplification that increases the speed of drama encoding and lowers the threshold for encoders.
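A rough sketch of the translation step (added here; the shorthand input shown is hypothetical, and the exact metasymbol conventions are documented in the ezdrama repository):

```xml
<!-- Hypothetical EasyDrama-style input:
       ## Act One
       @HAMLET
       To be, or not to be...
       $Exit.
     A deterministic translation would yield TEI along these lines: -->
<div type="act">
  <head>Act One</head>
  <sp>
    <speaker>HAMLET</speaker>
    <p>To be, or not to be...</p>
    <stage>Exit.</stage>
  </sp>
</div>
```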
ID: 118 / SP1: 3
Short Paper Keywords: Calderón de la Barca, Character Annotation, DraCor, Natural Language Processing, Theater
From Annotations in TEI to Natural Language Processing: A Computational Analysis of Characters in the Calderón Drama Corpus
(1) Eberhard Karls Universität Tübingen, Germany; (2) Universität Stuttgart, Germany
The TEI-encoded Calderón Drama Corpus (https://dracor.org/cal) represents an important milestone, enabling the use of digital methods for investigating Calderón's work, such as the extent to which rules or genre conventions are followed (Ehrlicher et al. 2020; Lehmann & Padó 2022). An aspect of this corpus that is highly promising for future research concerns the treatment of characters and character types such as the 'gracioso'. However, character-level information, such as gender, social role, honorifics, or character type, is scarce: some of it can be recovered from cast lists, some from secondary literature, but whatever the source, normalization and representation remain a challenge. On this poster, we report on our studies enhancing the TEI encoding of the Calderón Drama Corpus with character information, where available, and outline how this information can be used to recognize other characters within the vast corpus that fit these archetypes, based on their speech and social relations, with minimal manual intervention, employing large language models and automatic classification.
Bibliographic references
Lehmann, Jörg & Sebastian Padó. "Clasificación de tragedias y comedias en las comedias nuevas de Calderón de la Barca." Revista de Humanidades Digitales 7 (27 November 2022): 80-103. https://doi.org/10.5944/rhd.vol.7.2022.34588.
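A hedged sketch (added here, not the authors' actual encoding; the values are illustrative only) of what such character-level enrichment can look like in the particDesc of a TEI drama header:

```xml
<!-- Hypothetical character metadata for a stock comic servant. -->
<particDesc>
  <listPerson>
    <person xml:id="clarin" sex="MALE">
      <persName>Clarín</persName>
      <socecStatus>servant</socecStatus>
      <note type="characterType">gracioso</note>
    </person>
  </listPerson>
</particDesc>
```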
ID: 179 / SP1: 4
Short Paper Keywords: TEI-C, GitHub, organization, metrics, community
C is for Co(nsortium|uncil|llaboration|ntributors|mmunity), or What Can GitHub Issues Tell Us About the TEI?
Simon Fraser University, Canada
The TEI's GitHub organization is the central home for development of the TEI Guidelines, Stylesheets, and many other associated tools, projects, and working groups. Among other things, each repository contains every version of every source file, a full log of every change committed, a list of all releases and their source files, and a list of all completed and outstanding issues (or "tickets"). These issues are key to the distributed, asynchronous, and transparent work of the TEI Technical Council and, much like the TEI listserv, the repositories provide an important channel for the TEI community to propose and suggest changes, raise issues, and ask questions. But they also serve as an incredibly useful record of the TEI's development work over the last decade or so (since the migration of the TEI's codebase from SourceForge in 2015). Every bug or feature request is (theoretically) logged in the repository, which also (theoretically) chronicles the history of that particular issue: who raised it, who resolved it, who responded, when it was closed, and under what circumstances. Drawing on recent research and initiatives for evaluating and measuring the "health" of open-source code and communities, this paper investigates what an analysis of the TEI's GitHub data might yield for understanding the relationship between the various groups that make up the TEI-C: council, consortium, contributors, collaborators, and community. Using metrics defined by the Linux Foundation's "Community Health Analytics in Open Source Software" (CHAOSS) project (e.g. time to first response, time to close, and "bus factor"), this paper will present a critical analysis of the issues raised on the TEI's two primary GitHub repositories (TEIC/TEI and TEIC/Stylesheets) and a discussion of what, if anything, these metrics can tell us about the past, present, and future of the TEI.
ID: 175 / SP1: 5
Short Paper Keywords: digital edition, commentary, Latin-Polish translation, metadata
Neolatina Sarmatica - from Web 1.0 to Web 3.0
Jagiellonian University, Poland
The original resource behind, and the reason for, the creation of the Neolatina Sarmatica project is Cochanovius Latinus, a collection of the complete Latin works of Jan Kochanowski, published electronically (2006-2011) with the first contemporary translation and commentary. That attempt to present Kochanowski's Latin in digital form was based on static HTML page structures, and the commentary reproduced the approach of print editions in its structure and visualisation. As new tools were developed and became more widespread, a new edition based on the TEI Publisher engine was proposed in 2024, with the aim of enriching it with metadata and consequently increasing the reader's ability to search and interact with the main text and its supplementary texts (critical apparatus, commentaries), supporting texts, comparisons with other texts, and comparisons of different translations. This approach makes the texts more accessible to readers with different philological competences: scholars who are well versed in the original language and the historical and cultural context of the works, as well as less specialised readers. This new opening of the texts to different types of readers has made it possible to update the commentaries and adapt them to the requirements of the modern reader, and to vary both the scope of the texts and the way in which they are visualised. Work is currently in progress to use the metadata collected for the edition in an ontology under development, enabling a semantic approach to the text as data.
11:30am - 1:00pm | Short Papers 2 Location: Aula 6 - Segundo piso Session Chair: Nicolás Lázaro, UCSF Minimal TEI - TEI para ediciones mínimas |
|
ID: 132
/ SP2: 1
Short Paper Keywords: minimal computing, ekphrasis, medieval Spanish poetry, controlled vocabularies, taxonomies Una solución minimalista al problema de la representación de la red de motivos artísticos en el Libro de Alexandre IIBICRIT-CONICET, Argentine Republic This short paper addresses a challenge encountered in the initial stages of developing my digital edition of the ekphrasis fragments of the Libro de Alexandre. Specifically, the markup of artistic motifs within these fragments emerged as a pivotal aspect of my research, alongside the creation of a website enabling readers to interact with the marked motifs. I'll discuss the solution adopted within the framework of minimal computing, which made this goal achievable with a static website. Additionally, I'll explore the creation of a controlled vocabulary in which each motif, submotif, and specific term is assigned a unique identifier within the hierarchical structure of a taxonomy. Finally, I'll reflect on the practical and theoretical implications of navigating this challenge, arguing that what began as a practical response to technological and financial constraints evolved into a fundamental aspect of my theoretical approach to representing ekphrasis in medieval Spanish poetry.
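By way of illustration (a sketch of ours, not drawn from the edition itself; all identifiers below are hypothetical), a controlled vocabulary of this kind can be declared as a TEI taxonomy in the header and referenced from the transcription via @ana:

  <encodingDesc>
    <classDecl>
      <taxonomy xml:id="motifs">
        <!-- hypothetical top-level motif -->
        <category xml:id="m.architecture">
          <catDesc>Architectural motifs</catDesc>
          <!-- submotif nested inside, inheriting the hierarchy -->
          <category xml:id="m.architecture.tomb">
            <catDesc>Tombs and funerary monuments</catDesc>
          </category>
        </category>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <!-- in the text of the edition -->
  <seg ana="#m.architecture.tomb">…</seg>

ID: 131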
/ SP2: 2
Short Paper Keywords: minimal editions; correspondence; workflows The accidental maximalist: Or, what's so minimal about minimal editions? Rutgers University, United States of America Minimal editions have become popular because they answer the problem of curating texts under a set of constraints. However, encoding and publication workflows, even for minimal editions, run the risk of becoming bloated if the editor loses sight of the project’s priorities. Over a project’s life cycle, it is normal for these priorities to shift under the pressure of one constraint or challenge after another. Several authors have noted that minimal computing stacks displace complexity away from users and onto the editor or technical partner (Dombrowski 2022; Giannetti 2019; Hughes 2016). Identifying the necessary technical complexity is rarely easy. I focus on the latter two of Risam and Gil’s four-question heuristic for minimal computing, “what must we prioritize?” and “what are we willing to give up?”, in order to demonstrate the importance of reengaging with these questions as new challenges come to light (Risam and Gil 2022). With the Personal Correspondence from the Rutgers College War Service Bureau, I initially made choices consonant with a minimal approach. The edition navigation replicated the file structure of the archival collection, organized by the name of the alumnus serving in World War I. The project schema exhibited straightforward choices regarding the encoding of people, places, and events. However, the plan to capture biographical information on each Rutgers person mentioned nearly ran aground when a single soldier mentioned fifty classmates in his letters. A desire for a more enriching reader experience led to a choice to provide subject access to the letters, which in turn brought classification and UI difficulties. In this case study, I give substance to the hard decisions editors face amidst evolving priorities and the need to curtail some plans. Documentation of best practices in this area will serve other editors who must make pragmatic choices in the service of local knowledge production. ID: 113
/ SP2: 3
Short Paper Keywords: Martín Fierro, digital edition, close reading, gaucho culture, Argentine literature MiniFierro: la voz del gaucho a través del marcado TEI Universidad del Salvador, Argentine Republic At the CONICET Digital Humanities laboratory we are preparing a digital edition of the first printing (1872) of the gaucho poem “Martín Fierro”. The edition is conceived as a close reading of the poem: through TEI markup, it recovers features that are absent from other digital editions. The process has posed several challenges, from locating a good-quality digitization to making choices within the markup. Our aim is to capture all the striking aspects of the poem’s presentation and to provide a vocabulary that eases its reading and comprehension. What is most striking about this poem is that it not only recounts the life of the gaucho and his culture but also reflects his distinctive way of speaking, which gives the poem its singular richness. Our edition foregrounds the editio princeps, staying as faithful to it as possible, in order to highlight its value as one of the summits of Argentine literature and of gaucho culture.
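As a hedged illustration of the kind of markup choices involved (the regularization below is ours, not necessarily the project's), TEI's <choice> mechanism can preserve the gaucho spelling of the 1872 printing alongside a normalized form that eases reading:

  <lg>
    <l>Aquí me pongo a cantar</l>
    <l>al compás de la <choice>
        <!-- spelling as printed -->
        <orig>vigüela</orig>
        <!-- normalized form for modern readers -->
        <reg>vihuela</reg>
      </choice></l>
  </lg>

ID: 112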
/ SP2: 4
Short Paper Keywords: history, colonial painting, painters, paintings, religion Edición digital anotada del "Diálogo sobre la historia de la pintura en México" Instituto de Investigaciones Estéticas, Mexico The Diálogo is considered a classic of Mexican art history, being the first work to offer a synthesis of colonial painting. I set out to create a primary reference resource, with non-downloadable materials, that includes the text encoded in the XML-TEI standard following Minimal Computing criteria, contextual notes, the digital facsimile of the 1899 book, and a timeline of the events and paintings mentioned. For the edition I have used the mandatory TEI modules (core, header, tei, and textstructure) and the optional modules drama, namesdates, transcr, and verse; hypertext markup (HTML), styling (CSS), and programming languages (JavaScript and Ruby); and Jekyll, a simple generator for static websites. The project is part of the final work for the Master in Digital Humanities at UNED, supervised by Dr. Gimena del Río Riande.
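The module selection described above corresponds to an ODD customization along these lines (a sketch; the schema identifier is hypothetical):

  <schemaSpec ident="dialogo-pintura" start="TEI">
    <!-- mandatory modules -->
    <moduleRef key="tei"/>
    <moduleRef key="header"/>
    <moduleRef key="core"/>
    <moduleRef key="textstructure"/>
    <!-- optional modules named in the abstract -->
    <moduleRef key="drama"/>
    <moduleRef key="namesdates"/>
    <moduleRef key="transcr"/>
    <moduleRef key="verse"/>
  </schemaSpec>

ID: 108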
/ SP2: 5
Short Paper Keywords: María Mercedes Carranza, digital critical edition, National Constituent Assembly, TEI markup, minimal computing Hacia una edición crítica digital de los documentos del archivo de María Mercedes Carranza relacionados con la ANC Universidad de los Andes, Colombia This paper presents the preliminary design of a digital philological edition of the documents linked to the participation of María Mercedes Carranza (1945-2003), Colombian poet, journalist, and cultural manager, in Colombia's National Constituent Assembly (ANC) of 1991. The main objective of this research is to contribute to the recognition of the relevance of Carranza's participation in the ANC by creating a digital philological edition based on her personal archive. The initiative will be accompanied by a web dissemination strategy intended to extend the edition's reach. The presentation opens with a concise introduction to the research problem and its objectives, followed by a section devoted to methodology. That section focuses on the use of the TEI standard to mark up all of Carranza's correspondence related to the constituent process, as well as the press coverage of it and the laws debated in the first commission, of which she was a member. It presents the key methodological details, interwoven with relevant theoretical debates such as the suitability of the TEI standard, its challenges (English-language tags, the scarcity of documentation in Spanish, and interoperability), and its benefits (great flexibility and ease of learning). It also addresses the methodology for publishing the edition, which relies on minimal computing, and raises a discussion of what minimalism can mean for the Digital Humanities of the Global South. Finally, the expected results are presented, together with their possible scope and the relevant bibliography. This design offers a clear and concise view of the research objectives, methods, and potential contributions, as well as their theoretical and practical implications. |
1:00pm - 3:00pm | Lunch - Almuerzo |
3:00pm - 4:30pm | Long Papers 4 Location: Aula 2 - Primer piso Session Chair: Carlos Nusch, UNLP CONICET AAHD Encoding and analysis 1 - Codificación y análisis 1
|
|
ID: 171
/ LP 4: 1
Long Paper Keywords: XML-TEI markup, digital scholarly editing, visualization interfaces, Progetto Mambrino, TEI Publisher Interfaces de visualización de ediciones digitales académicas en XML-TEI: el caso de la Biblioteca Digital del Progetto Mambrino University of Verona, Italy This paper reflects on the current state of visualization interfaces for digital scholarly editions and contributes new evidence about the options for publishing XML-TEI text files. The topic is approached through the experience of the Progetto Mambrino (University of Verona) in building a Digital Library of the Italian Renaissance chivalric romances, which are translations and continuations of the famous Spanish genre of the libros de caballerías. Specifically, we describe the successive phases through which the project customized the TEI Publisher application for displaying its editions, taking into account aspects of accessibility and user experience as well. ID: 152
/ LP 4: 2
Long Paper Keywords: tei, iiif, faircopy, editioncrafter, coredata TEI and IIIF Technologies in the Native Bound Unbound Project Performant Software, United States of America The Native Bound Unbound project seeks to document the lives of indigenous enslaved people in the Americas. Led by Dr. Estevan Rael-Gálvez, the project aims to tell the stories of the millions of indigenous people who were enslaved since the arrival of Columbus. Our firm, Performant Software Solutions, was selected to develop and implement the data infrastructure and website for Native Bound Unbound. In this presentation, we will discuss our approach, which utilizes technologies based on the TEI Guidelines and the International Image Interoperability Framework (IIIF). To create the fullest picture we can of the individuals whose lives are being documented, we need to gather information from as many sources as possible. This means we must manage many different types of primary sources, including baptismal records, census records, court cases, tombstones, and oral histories. Texts are often handwritten in a variety of languages, including Spanish, English, and Dutch. From this material, we need to distill information about individuals, events in their lives, and the places they lived and worked. One possible approach would be to read all this material and simply key it into a database. However, this would not make it possible for other researchers to examine our evidence and draw their own or further conclusions. Furthermore, this project is for the public as much as it is for researchers. So we need to both transcribe and translate the material we are working with, so that we can display it on the website alongside the page images. These sources are then linked to the records generated from them, and the whole process is open to inspection. Structuring the texts and marking up the people, places, and events in them allows us to programmatically enter the records into the database. These then need to be scanned for duplication and de-duplicated. Updates to the database or the transcriptions should not cause a loss of data or duplicate data. And, of course, our understanding of what information we want to collect and can collect has been evolving throughout this process, so we need a certain level of flexibility in our data models. Performant brought to this project an array of open source projects that we have developed for previous digital humanities projects, plus some projects developed by other groups that aligned with our client’s needs. These tools include: FairCopy, Core Data, EditionCrafter, FromThePage, Splink, and our IIIF CMS. Taken together, these tools provide a complete workflow for the data infrastructure of this ambitious project. They also provide an application programming interface on top of which we were able to construct the public-facing website. This presentation will provide a detailed look at our technical approach, these tools, and the final project website.
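For instance, a transcription-to-database pipeline of the kind described can hinge on stand-off person records that mentions in the text point to; the following sketch is ours, not the project's actual data model, and all names and values are hypothetical:

  <standOff>
    <listPerson>
      <person xml:id="p042">
        <!-- record distilled from a baptismal entry -->
        <persName>María</persName>
        <event type="baptism" when="1694-03-10">
          <desc>Baptism recorded at …</desc>
        </event>
      </person>
    </listPerson>
  </standOff>
  <!-- in the transcription, each mention points at the record -->
  <p>… <persName ref="#p042">María</persName> …</p>

ID: 151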
/ LP 4: 3
Long Paper Keywords: TEI Publisher, publishing, collaboration, interoperability, good practice Good practice built in: case study of e-editiones community e-editiones, Poland DH research usually works with very diverse datasets and tools: from facsimiles, through HTR or OCR, transcription, collation, entity recognition, and authority linking, to further annotation and, eventually, publication online or in print. Much more is within easy reach than even a decade ago, and these opportunities are widely embraced in our community. Nevertheless, creating a digital resource is an intrinsically complex undertaking, as a random assortment of acronyms from TEI to IIIF attests, and the average scholar cannot be expected to maintain an overview of the ever-fluctuating technological landscape. Yet she is still ultimately responsible for designing and orchestrating complex, collaborative workflows and has to directly face the short- and long-term consequences of each decision. I will discuss what happened when a community of practitioners, instead of accepting this unfavourable situation, worked together, firmly prioritizing the development of generic, systematic solutions over specific ones without compromising on the latter, analyzing the connected cases of TEI Simple, the TEI Processing Model, the TEI Publisher framework, and collectives like e-editiones and Sources Online. I would like to demonstrate the visible impact these collaborations, grounded in the mottos of "power to the editor" and "standardize where you can, customize where you must", have had on the landscape of digital scholarly editions in recent years. With Publisher now in version 9 and an impressive international community around e-editiones, we can make some interesting observations. About 40 projects have chosen to register on the e-editiones website, a similar number are currently in preparation, and we estimate perhaps another hundred that we have not yet heard about. I will discuss various observable trends among editorial projects that can be traced back to resources - software releases, demos, and samples - made available by the community across the years. These range from functional features, to text encoding strategies and data organization practices, to standards and open libraries: XML vocabularies like TEI, JATS, or DocBook; MathML and TeX; IIIF for images, together with the OpenSeadragon and TIFY viewers; Verovio for sheet music; OpenAPI for the specification of the programmatic interface; and DTS for machine consumption of digital text collections. This has a profound effect on establishing consensual good practice and a base level of interoperability for digital humanities projects. For example, each and every TEI Publisher-based application exposes a well-documented programmatic interface, e.g. the DTS protocol or authority registries. When such features are available by default and integrated into the framework by design, interoperability is treated as a priority and achieved out of the box. Similarly, the re-use approach has led to the emergence of specialized data models, successfully adopted wholesale or customized by different projects. In conclusion, the availability of an open source framework that allows users to mix existing solutions, strengthened by a supportive community of practice, has proven extremely successful. For a practical demonstration of the power this approach affords, I will discuss the Sources Online project.
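To give a flavour of the approach (a generic sketch of ours, not taken from any particular project), the TEI Processing Model lets an ODD declare how an element should be rendered, so that one customization can drive web, print, and ePub output alike:

  <elementSpec ident="hi" mode="change" module="core">
    <model behaviour="inline">
      <!-- render hi as inline italics in every output medium -->
      <outputRendition>font-style: italic;</outputRendition>
    </model>
  </elementSpec>

|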
3:00pm - 4:30pm | Long Papers 5 Location: Aula 3 - Primer piso Session Chair: Martha Eladia Thomae Elias, Universidade NOVA de Lisboa Encoding and analysis 2 - Codificación y análisis 2 |
|
ID: 125
/ LP 5: 1
Long Paper Keywords: slavery, editing, markup, editorial ethics ¿Cómo editar los censos de esclavos? Univ. of North Florida, United States of America Between the sixteenth and nineteenth centuries, enumerations of “human property” were drawn up in various parts of the Americas. Some are mercantile documents related to the transport and sale of enslaved people; others are inventories included in enslavers’ wills; still others are bureaucratic records produced for tax collection and manumission proceedings. They frequently record each person’s name, sex, and age. Sometimes they also indicate an occupation and, with chilling indifference, a monetary valuation. These texts can help us fill gaps in the historiography of slavery and of Afro-descendant populations in both Latin America and the United States. They are also documents that let us glimpse intimate aspects of the histories of individuals, and sometimes of families, and that invite us to think about the relationship between writing and oppression. In this presentation I reflect on the challenges we face when editing these texts. How can we convey the information they contain without reducing people to mere data? How do we avoid victimizing anew the individuals who appear in them? What are our responsibilities as editors before such dehumanizing texts? I first review the ideas of other scholars who have edited documents related to slavery. I then describe the experiments I have carried out in collaboration with my students on two nineteenth-century texts: a slave census from the 1840s in the region of Antioquia, Colombia, and the 1857 will of Abraham DuPont, owner of a plantation in St. Johns County, Florida. I share the preliminary ideas we have developed for a markup model that takes into account the humanity of the people enumerated. I also consider the design of interfaces that let us interact with the marked-up texts as assemblies of human beings and not as mere agglomerations of data.
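As one hedged sketch of the direction such a model might take (ours, not the presenter's; all names and values below are hypothetical), the encoding could centre the person and demote the source's dehumanizing "valuation" to an editorial note rather than encoding it as a property of the human being:

  <listPerson>
    <person xml:id="pers-ben">
      <persName>Ben</persName>
      <age>32</age>
      <occupation>carpenter</occupation>
      <!-- the monetary figure in the source is acknowledged,
           not reproduced as an attribute of the person -->
      <note type="editorial">The 1857 will assigns a monetary value;
        we record its presence without treating it as data about
        the person.</note>
    </person>
  </listPerson>

ID: 166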
/ LP 5: 2
Long Paper Keywords: XML-TEI, ontology, GIS, LOD, sentiment analysis Encoding Travel Literature: Analyzing European Cultural Heritage through the Perspective of Latin American Women Writers Universidade Nova de Lisboa, Portugal The representation of cultural heritage in travel literature often reflects a curated selection influenced by various factors, from authorised discourses shaped by prior readings or guided tours to the personal interests of the authors. We unveil a dynamic historical narrative by conducting a comparative analysis of the insights provided by Latin American women authors regarding European cultural heritage at the turn of the 20th century. This exploration bears witness to the multitude of sensibilities surrounding European cultural heritage and the discordances therein. Thus, from the perspective of otherness, we study which European artistic manifestations or cultural practices arouse the interest of these Latin American women travellers, given that they decide to dedicate a few words to them and even express their emotions. The methodology of the REWIND project integrates different methods and tools from the field of Digital Humanities that consider aspects of cultural heritage related to its spatiotemporal coordinates and its impact on the construction of historical memory. Within the framework of the Deep Data concept, we use a structured intelligent dataset to organise the information on cultural heritage in an ontological model that allows us to detect patterns or relationships difficult for humans to observe, as well as to apply Distant Reading techniques such as Sentiment Analysis. From the perspective of literary geography, GIS tools are used to analyse the relationship between space and cultural heritage. The dataset complies with the FAIR principles to facilitate the reproducibility of the results and promote the reuse of the data in other research. We create the dataset by encoding the travel books in XML following the standards proposed by the Text Encoding Initiative, extracting information on the cultural heritage elements mentioned by the female travellers. On the one hand, a modernised diplomatic digital edition of each publication is created, so that not only the structure of the books is annotated but also the original formatting of the text (italics, capital letters, indentation, etc.) and typos are preserved, while the Spanish is adapted to the latest edition of the Ortografía de la Lengua Española (2010). On the other hand, tagging is used to identify people, places, objects, and events, turning each book into a kind of XML database that can be queried using XQuery expressions. This annotation system makes it possible to differentiate all cultural heritage elements by linking them, whenever possible, both with geographical coordinates, through the GeoNames gazetteer, and with linked open data from Wikidata. In addition, the ROSSIO Thesaurus is used to categorise the different European cultural heritage elements through the @type attribute according to the UNESCO typology, which differentiates between intangible cultural heritage and tangible movable and immovable cultural heritage. Listening to the voice of these non-European women, committed to feminism and socio-cultural diversity, offers a counterbalance to the dominant Eurocentric discourse on cultural heritage.
Their storytelling, at times intertwined with European traditions yet often underscored by a distinct sense of otherness, provides fertile ground for constructing decolonial, inclusive, and plural historical narratives.
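For illustration (the identifiers and category label below are ours and merely indicative), a heritage mention in such an edition combines a thesaurus-based @type with authority links to GeoNames and Wikidata:

  <p>… visitamos la
    <name type="tangibleImmovableHeritage"
          ref="https://www.wikidata.org/entity/Q2981">catedral de
      Notre-Dame</name> de
    <placeName ref="https://www.geonames.org/2988507">París</placeName> …</p>

|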
3:00pm - 4:30pm | Long Papers 6 Location: Aula 6 - Segundo piso Session Chair: Gabriel Calarco, IIBICRIT-CONICET Théâtre classique, Siglo de Oro, and TEI - Teatro clásico, Siglo de Oro y TEI |
|
ID: 136
/ LP 6: 1
Long Paper Keywords: French drama, DraCor, Conversion, Workflow, XQuery French Drama in TEI: A Workflow for the Continuous Integration of the "Théâtre classique" Corpus into the DraCor Infrastructure 1Freie Universität Berlin, Germany; 2Universität Potsdam, Germany; 3Max-Planck-Institut für empirische Ästhetik, Germany Paul Fièvre's "Théâtre Classique" (TC), a growing corpus of French classical theater, has been maintained since 2007 and can be accessed at http://www.theatre-classique.fr/ (Schöch 2007). It is a unique source of currently 1,850 French-language plays and is available in different formats, including a TEI P4 version. TC is a dynamic corpus: Paul Fièvre keeps adding new plays, correcting errors, and updating the markup. However, the files tend to have several markup problems, including non-valid TEI; there is no version control of changes, and it is hard to keep track of them. There is also no simple way to prepare this corpus for research purposes. To address these challenges and bring the corpus in line with corpora of plays in other languages, we started a project to onboard the corpus to DraCor (Fischer et al. 2019) and keep it up to date. The French Drama Corpus (FreDraCor), as we call it, is by far the biggest corpus in the DraCor collection and can be accessed at https://dracor.org/fre. The most significant modifications performed on the original documents include simple things such as adding the TEI namespace and an XML declaration. The headers were updated with links to authority files, such as national libraries and Wikidata, to provide machine-readable data on authors and works, including premiere and publication dates. We added the particDesc element to collect the characters appearing in a play, to give them IDs, and to add gender information, which is the basis for many research approaches and visualisations based on structural data. For building the FreDraCor documents from the "Théâtre Classique" source, a scripted workflow has been set up that processes the original files with an XQuery transformation. To speed up the process, multiple eXist-db instances can be started in parallel using either Podman or Docker. Some types of corrections cannot be automated and ultimately rely on manual changes to orthography and markup. For example, about 65% of all stage directions needed to be extracted from headings and speech prefixes via careful examination of suspicious strings. To keep control over the source corpus and our own adaptation, and in order not to lose our own enhancements, be they automatic or manual, all changes to FreDraCor are version-controlled. In our presentation, we will detail our workflow and show how it can be adapted for similar projects. We will also demonstrate how DraCor corpora are used in research, i.e. how this collection of thousands of TEI documents in more than a dozen languages can be used to its full potential.
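A FreDraCor-style header addition looks roughly like this (character data simplified for illustration; the example character is from Molière's L'Avare):

  <profileDesc>
    <particDesc>
      <listPerson>
        <person xml:id="harpagon" sex="MALE">
          <persName>Harpagon</persName>
        </person>
      </listPerson>
    </particDesc>
  </profileDesc>
  <!-- speeches in the body can then point at the declared ID -->
  <sp who="#harpagon">
    <speaker>HARPAGON.</speaker>
    <l>…</l>
  </sp>

ID: 148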
/ LP 6: 2
Long Paper Keywords: theatre, Siglo de Oro, digital critical edition, orthodox encoding, digital humanities La codificación en XML-TEI del teatro español del Siglo de Oro: problemas, soluciones, ortodoxia y heterodoxia Universitat Autònoma de Barcelona, Spain The growing interest in the Digital Humanities over the last decade has produced many digital critical editing projects, several of them in the field of Spanish Golden Age theatre. Notable examples include La dama boba (dir. Marco Presotto, 2015), the editions of the Biblioteca Gondomar Digital (dir. Luigi Giuliani, 2017), those of the Biblioteca digital PROLOPE, connected with the PROLOPE research group, and two doctoral theses from the Universitat de València: Gemma Burgos's La discreta enamorada (2019) and Nadia Revenga's La estrella de Sevilla (2021). Each has made an invaluable contribution to the development of this field, revealing both the potential and the difficulties involved in building a digital product of this kind and in encoding Golden Age plays. In this respect, Spanish Golden Age theatre poses major challenges for XML-TEI encoding, because it combines the need to mark verses and stanzas with the characters' speeches, which produces overlapping hierarchies if one follows the most common models for metrical structures and dialogue. Moreover, an analysis of the markup of the projects mentioned shows that they encode certain textual phenomena (stanzas, split verses, variants, and the basic error types catalogued in the manuals of textual criticism) in different ways. In some cases an encoding more orthodox with respect to the TEI Guidelines is followed; in others, the editors themselves admit to adopting compromise solutions (conditioned in part by the visualizations they pursue), but insofar as those markings are heterodox and differ from one digital critical edition to the next, a uniform and genuinely interoperable markup is given up. My intention, therefore, is to offer a catalogue of cases, discussing the solutions proposed in the various projects and giving my perspective on their effectiveness. At the same time, it is worth studying whether heterodox solutions are really the only ones possible, or whether preferable orthodox alternatives exist for better interoperability. If not, it would be desirable to reach consensus solutions that different projects could pursue, or to submit proposals to the TEI Consortium for the resolution of these problems.
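A common orthodox workaround for the verse/speech overlap, sketched here with invented dialogue, keeps <sp> intact and marks the fragments of a metrical line shared between speakers with @part:

  <sp who="#rey">
    <speaker>REY</speaker>
    <!-- initial fragment of the metrical line -->
    <l part="I">¿Qué respondes?</l>
  </sp>
  <sp who="#criado">
    <speaker>CRIADO</speaker>
    <!-- final fragment of the same metrical line -->
    <l part="F">Que ya es tarde, señor.</l>
  </sp>

ID: 160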
/ LP 6: 3
Long Paper Keywords: digital philological editing, encoding, TEI, textual criticism, Siglo de Oro Hacia el nuevo paradigma de edición filológica digital: contratiempos, prejuicios y el valor de la labor digital University of Miami, United States of America This presentation shares a series of reflections on the creation of digital philological editions and the stages this involves: from data modeling, necessarily based on solid knowledge of the content and textual tradition of the primary source; through the establishment of guidelines for both editing and textual encoding, and a system of collaboration and collaborative workflow; to design, web publication, and digital preservation. Specifically, these experiences arise from the Pronapoli project, https://pronapoli.com/, whose main objectives are, on the one hand, to study the Italian sojourn of the poet Garcilaso de la Vega, spent in Naples at intervals from the summer of 1529 to the spring of 1536, and, on the other, to publish a new digital critical edition of his poetic work. Significant effort has already gone into a beta version of a first group of texts, available at https://pronapoli.com/ediciondigital/ For the next funding round (2024-2028), the same model is expected to be extended to the remaining texts. The talk will offer an overview of the digital critical edition of Garcilaso's work, as well as of the ecdotic process, the digital philological workflow, and the final online result. We are especially interested in sharing our digital practice, which has tried throughout to run in parallel with good practices established in the digital humanities in general and in digital editing in particular: XML-TEI encoding and the accompanying technical documentation written to homogenize teamwork, a transformation based on XSLT, and a minimal structure, all available in open access in an online repository. Some key issues, closely tied to the ecdotic process, concern, first, deeply rooted preconceptions about what a critical edition is and how it should be presented, no doubt the fruit of a long secular tradition, especially in the peninsular Hispanic sphere and in the editing of medieval and Golden Age texts. These preconceptions and prejudices are, in some cases, difficult even to perceive, yet they hinder the path toward the establishment of a new digital paradigm. Second, we problematize the search for balance between the philological editor (the one who establishes the critical text) and the one who carries out the encoding, two roles rarely united in a single person. The work of the former is well understood, while that of the latter remains in a limbo, more often than not relegated to a "technical", "mechanical" activity. Nothing could be further from the truth: encoding, and digital labor in general (e.g. the creation of datasets), is an intellectual and deeply hermeneutic activity. Finally, other perceptions about the supposedly ephemeral, superficial, or perishable character of online content, as opposed to print publications, challenge the solidity and authority of digital critical editions. |
4:30pm - 5:00pm | Break 2 - Café 2 |
5:00pm - 7:00pm | <TEI o no TEI, esa es la cuestión/> - <TEI or not TEI, that is the question/> Location: Aula Magna Medicina. Córdoba Avenue, 1601 José Manuel Fradejas Rueda (Universidad de Valladolid)
Keynote in Spanish |
7:00pm - 8:00pm | Welcome Wine - Vino de Bienvenida |
Date: Thursday, 10/Oct/2024 | |
9:00am - 11:30am | TEI Annual Members' Meeting Location: Aula 3 - Primer piso |
9:00am - 11:30am | Registration - Registro Entrance USAL Rectorado Building - Planta Baja Rectorado USAL Rodríguez Peña Street, 640 |
11:00am - 11:30am | Break 1 - Pausa Café 1 |
11:30am - 1:00pm | Short Papers 3 Location: Aula 2 - Primer piso Session Chair: Martina Scholger, University of Graz Markup and narratives |
|
ID: 176
/ SP 3: 1
Short Paper Keywords: antiracist markup, collaboration, libraries, project management Interlibrary Collaborations: Forging Antiracist, Decolonial, and Inclusive Markup Interventions in Partnership 1Simon Fraser University, Canada; 2University of British Columbia, Canada Collaboration and community have long been central to the TEI-C’s mission and, as many people and projects across institutions and libraries have demonstrated, a powerful way of rethinking texts, encoding, and markup practices (e.g. Warwick 2012; Flanders 2012; Flanders & Hamlin 2013; Green 2014; Lu & Pollock 2019). Yet while inter-institutional partnerships offer a promising model for building both infrastructures and capacity for collaborative text encoding projects, there remains, as Bonn et al. (2021) note, “much work to do” in developing best practices, frameworks, and working methods in support of such inter-institutional collaborations. This paper describes some of the challenges and opportunities of inter-institutional partnership that have arisen through the emergent partnership between the Digital Humanities Innovation Lab at Simon Fraser University and the Digital Scholarship in Arts (DiSA) initiative at the University of British Columbia, initially developed in support of UBC’s Adaptive TEI Network (ATN). Co-led by doctoral students and faculty, the ATN unites several TEI projects to implement antiracist, decolonial, and inclusive encoding practices and to challenge the stigma around multi-authorship and collaboration that persists within much humanities scholarship. The ATN has also served as the pilot for our cross-institutional collaboration, allowing us to bridge the two institutions to share resources, expertise, and infrastructure. In this short paper, we will describe how the need for antiracist markup strategies enabled this partnership, and discuss the administrative complexity, and necessity, of structuring embedded support for TEI projects across institutions. ID: 144
/ SP 3: 2
Short Paper Keywords: Black DH, AfroLatinidad, AfroMexicans, Nahuas people, Central Mexico Using FairCopy Editor to Encode Blackness and Indigeneity in Sor Juana’s Villancicos Negros Texas A&M University, United States of America This ten-minute tool demonstration will use FairCopy Editor to encode the voices of Afro-Mexicans and the Nahua people of Central Mexico depicted in Sor Juana Inés de la Cruz’s late-seventeenth-century villancicos negros. Specifically, I aim to demonstrate how TEI markup can be used to capture the remixes of Black and indigenous voices in Sor Juana Inés de la Cruz’s Villancico 224 (1676): A la aclamación festiva. ID: 116
/ SP 3: 3
Short Paper Keywords: LGBT, 2SLGBTQ+, sustainability, archiving Our History is Missing: Digital Sustainability to Preserve the Legacy of Canadian Lesbian Activism University of Ottawa, Canada This short paper introduces sustainability plans for the Lesbian and Gay Liberation in Canada project as we transition from a Neo4j database and node.js web app to an Endings Project-compliant site. In our short presentation, we will introduce our 2SLGBTQ+ digital history project as a case study for why infrastructural and community support is so critical to the transmission of 2SLGBTQ+ Canadian history to future generations. In undertaking this move we have drawn on concerns arising from the accessibility of material related to the movement we study. We ask how best to represent the politics of liberation for members of the 2SLGBTQ+ community in Canada, based on the intersectional principles articulated by Tremblay and Podmore (2015). The Lesbian and Gay Liberation in Canada (lglc.ca) project has long focused on men and English speakers, mainly because the material representing them is more readily archivable. How can we ensure that the material we produce is readily archivable for the future too? As a TEI-based project, we will introduce our move away from a 34,000-record database. The publicly accessible web app that sits atop the database is starting to show its age, leading us to ask: should we continue to migrate the database and the project web app forward indefinitely, or should we move to a static site model, with iterative releases, to help ensure that the site can live on without constant technical upgrades? We will introduce our plans to apply principles of sustainability and longevity to help keep the stories of 2SLGBTQ+ liberation on the web. |
11:30am - 1:00pm | Short Papers 4 Location: Aula 3 - Primer piso Session Chair: Frank Fischer, Freie Universität Berlin TEI in multiple languages - TEI en múltiples lenguas |
|
ID: 138
/ SP 4: 1
Short Paper Keywords: TEI, hypertext, digital humanities, multi-edition Edición digital e hipertexto: Plataforma multi editorial y multilingüe para textos académicos y literarios UBA, Argentine Republic The proliferation of digital content enables diverse ways of conceiving a written text. This paper presents a hypertext reading platform that allows users to view, in real time, different editions and translations deriving from the same source text. The digital editions it hosts contain metadata and tags compatible with the guidelines of the Text Encoding Initiative (TEI), based on the XML metalanguage. The tool thus supports personalized comparison and analysis of multi-edition or multilingual content on a single screen. Given the system's flexible visualization, it can expose intra- and intertextualities that enrich the analysis of academic and literary writing. At the same time, these resources lead us to reflect on the tension between reading, editing, and authorship, and to problematize the limits and scope of minimal digital editions. It is worth exploring the viability of a hybrid digital edition that keeps the advantages of minimal computing while incorporating resources characteristic of dynamic websites. The proposal therefore follows a practical, technical, and theoretical itinerary through the various avatars of hypertext. Given that writing settles into postmodern culture as a multiplicity, the platform and the debate proposed here aim to emphasize a possible destiny for hypertext in which its reading and writing do not remain endlessly fragmented, but instead follow a logic that enables critical, rigorous, and contextualized reading.
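One plausible TEI mechanism for the real-time alignment described (a sketch of ours, with hypothetical identifiers; the platform's actual data model may differ) is stand-off linking between corresponding segments of each edition or translation:

  <standOff>
    <linkGrp type="alignment">
      <!-- each link pairs a segment of one edition with its counterpart -->
      <link target="#ed1.s1 #ed2.s1"/>
      <link target="#ed1.s2 #ed2.s2"/>
    </linkGrp>
  </standOff>

ID: 140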
/ SP 4: 2
Short Paper Keywords: Yiddish, Digital Humanities, Encoding, DraCor, Linguistics The “Foreign” Element in Yiddish Freie Universität Berlin, Germany Despite its interesting and problematic theoretical nature, little attention has been paid to the "foreign" element in TEI. An exception is the work of Barbaric and Halonja, who investigated this issue with respect to Croatian; they claim that the "foreign" element should be applied to instances of code-switching (Barbaric 2012). Distinguishing between code-switching and borrowing is a contentious issue among linguists (Mahootian 2006, p. 512). The subtleties of these distinctions are not represented by the "foreign" element, and solutions, like Schlitz's "borrowing" element, have not been incorporated into the standard (“TEI element foreign (foreign)”, Schlitz 2009). This investigation proceeds from observations made while encoding plays for DraCor's new corpus of Yiddish drama (YiDraCor), and from questions raised over the use of the "foreign" element in the context of Yiddish. Yiddish is a mixed language which has usually been spoken in multilingual contexts (Weinreich 1955); it is thus difficult to distinguish between the mixedness present within Yiddish and borrowings from without. In the interest of consistent encoding, we have adopted a model of "foreign" element use that is primarily morphological and secondarily lexical. If a form evinces productive morphological features, then it is tagged as "foreign". Failing that, following Stutchkoff and Weinreich, we tag as "foreign" those words and phrases which had entered local Yiddish usage but had not yet entered the Yiddish standards, and which would not likely be understood by those unfamiliar with their language of origin (Stutchkoff & Weinreich 1951). Generally, we contend that the terms which should be marked as "foreign" are those that evince a certain degree of multilingualism (Grimstad 2017, p. 4). Ultimately, we find that the "foreign" tag is best modeled pragmatically: since there is no inherently foreign component to any language, its use should follow the stated goals of the encoders.
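Under the morphological-then-lexical model described, an encoding might look like this (the line itself is invented for illustration and given in romanization):

  <p xml:lang="yi">
    <!-- Yiddish sentence containing an unassimilated French phrase -->
    zayt azoy gut, <foreign xml:lang="fr">merci beaucoup</foreign>!
  </p>

ID: 154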
/ SP 4: 3
Short Paper Keywords: Unicode, East Asia, writing system, text structure, standardization Japanese Texts and TEI: A Gap Analysis 1International Institute for Digital Humanities; 2Keio University; 3Musashino University; 4National Institute for Japanese Language and Linguistics Japan is among the regions where some of the most complex writing systems and practices, encompassing typography, text structure, document formats, and their combinations, have been in daily use throughout history. The Japanese manuscript and printing tradition not only inherits traditional Chinese typographical conventions but also adds its own set of expansions, via ad hoc annotations or modifications, which have brought forth subsystems that are visually and structurally distinct, diverse, and often mutually incompatible. Unsurprisingly, a number of these idiomatic concepts still do not transfer well into TEI, due to limitations either in its vocabulary or in the environment surrounding it. We give a brief overview of the current progress and challenges in transcribing premodern Japanese textual elements fully digitally, mostly learned during our TEI encoding project for a large-scale text corpus of a Buddhist canon. As concrete examples, we will address the following topics: the portability of characters uncoded or coded in legacy character sets, especially the limitations of the current gaiji module; support in the Guidelines for structures found universally in East Asian documents, such as "interlinear annotations" (rubi, bōchū), "subline annotations" (warichū), "closing titles", etc.; the shortage of available vocabulary for metadata in various ISO and other international standards, most conspicuously with regard to the interaction with written variants of Japanese, which often indicate the genre of a document; community building towards standardization and the long-term preservability of digital texts, with a non-Anglophone user base in mind; and so on.
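Two of the structures mentioned do now have TEI expressions, sketched below: interlinear ruby annotation, and a locally declared character for a glyph not (yet) in Unicode (identifiers and the variant mapping are hypothetical):

  <!-- ruby reading attached to its base text -->
  <p><ruby><rb>東京</rb><rt>とうきょう</rt></ruby> …</p>
  <!-- gaiji: declare the character in the header, reference it with g -->
  <charDecl>
    <char xml:id="ch-var01">
      <mapping type="standard">剣</mapping>
      <note>Handwritten variant not encoded in Unicode.</note>
    </char>
  </charDecl>
  <p>… <g ref="#ch-var01"/> …</p>

ID: 105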
/ SP 4: 4
Short Paper Keywords: digital scholarly editions, critical editing, ecdotics, digital ecdotic paradigm, recensio, constitutio textus, dispositio textus Edición crítico-genética digital de "Frutos de mi tierra" (1896) de Tomás Carrasquilla Universidad de Antioquia, Colombia This paper presents the methodological development for producing a digital critical-genetic edition of Tomás Carrasquilla's "Frutos de mi tierra" (1896). It combines Susanna Allés Torrent's (2020) digital ecdotic paradigm: a. capture (computational data); b. remediation (the process through which new media transform earlier ones); c. structuring and modeling (the internal and external framework expressed in an XML-TEI schema); d. publication and interfaces; e. archiving and textual fluidity (paratextual elements) (p. 75); with the three phases of Colombian ecdotics formulated by Edwin Carvajal Córdoba (2017) for print critical editions, which derive from the Hispanic philology of Alberto Blecua (1983) and Miguel Ángel Pérez Priego (2011). Recensio: 1. account of the search for witnesses; 2. list of the witnesses; 3. bibliographic description; 4. selection of the base text. Constitutio textus: 5. collation and philological assessment. Dispositio textus: 6. formulation of the critical apparatus; 7. editorial criteria; 8. establishment of the text (Carvajal Córdoba, 2017, pp. 327-340). The aim is to propose a workflow for producing the digital critical-genetic edition of "Frutos de mi tierra" with the following structure. Recensio: a. synthesis of the transmission history; b. material description of the witnesses; c. selection of the base text; d. capture; e. remediation; f. formulation of the editor's critical judgment (iudicium). Constitutio textus: a. collation; b. philological assessment of the results; c. TEI: structuring and modeling. Dispositio textus: a. composition of the critical apparatus; b. editorial criteria; c. establishment of the text. References: Allés Torrent, S. (2020). Crítica textual y edición digital o ¿dónde está la crítica en las ediciones digitales? Studia Aurea, 14. https://doi.org/10.5565/rev/studiaaurea.395. Carvajal, E. (2017). Crítica textual y edición de textos literarios contemporáneos. In Cultura y Memoria. Medellín: Sílaba-Universidad de Antioquia.
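At the collation and apparatus stages, the TEI parallel-segmentation method would yield entries of roughly this shape (witness sigla, dates, and readings below are invented for illustration):

  <listWit>
    <witness xml:id="E1896">1896 princeps</witness>
    <witness xml:id="E1952">hypothetical later reprint</witness>
  </listWit>
  <!-- a variant location in the text -->
  <app>
    <lem wit="#E1896">frutos</lem>
    <rdg wit="#E1952">los frutos</rdg>
  </app>

ID: 146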
/ SP 4: 5
Short Paper Keywords: HTR, TEI XML, Data Up-Conversion, Transformation, XSLT Evolving Hands: Investigating Conversion Transformations for HTR to TEI Data Workflows 1Newcastle University, United Kingdom; 2Bucknell University, USA This short paper proposal originates from the Evolving Hands project, but its topic is not the project itself, or Handwritten Text Recognition (HTR), but the conversion workflows for the transformation, up-conversion, and enrichment of TEI data produced from HTR. There will be only a short introduction to the project, since we have discussed it at previous TEI conferences. *Transformation* Transkribus exports to PAGE XML (and also has premium formats like ALTO and TEI). The built-in TEI export had a number of issues when tested, so some of the case studies in the project used the more complete PAGE XML export and then Dario Kampkaspar’s Page2TEI XSLT conversion for transforming the HTR transcripts. The paper will investigate some of the options for transforming such files to TEI P5 XML. *XML Up-Conversion* Up-conversion is the generation of more detailed markup from a less detailed source, usually by recognising implicit structural patterns. During up-conversion the data may be supplemented by techniques such as: - probabilistic intuiting of existing data structures based on less detailed markup; - wholesale replacement of existing hierarchies with more detailed and expressive substitutes; - or determination of extra annotation by retrieving data from external sources using contextual clues. As a case study in XML up-conversion, the paper uses a dataset of structured print volumes where intra-word formatting is important but ignored by OCR. A cipher-based approach enabled us to preserve the intellectual content. *Data Enrichment* In conclusion, the paper will point to other forms of data enrichment, such as editors like LEAF-Writer and related tools that mark named entities as Linked Open Data. The presentation will highlight the ease of use of such tools for data enrichment, especially since LEAF-Writer has a dedicated import for HTR-generated TEI P5 XML files.
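To make the transformation step concrete, here is the kind of mapping a PAGE-to-TEI conversion performs (a simplified sketch of ours, not Page2TEI's exact output; all coordinates and text are invented):

  <!-- PAGE XML input (abridged) -->
  <TextLine id="l1">
    <Coords points="112,240 860,240 860,290 112,290"/>
    <TextEquiv><Unicode>Dear Sir,</Unicode></TextEquiv>
  </TextLine>

  <!-- TEI P5 output: a zone plus a line break linked via @facs -->
  <surface>
    <zone xml:id="z-l1" points="112,240 860,240 860,290 112,290"/>
  </surface>
  <p><lb facs="#z-l1"/>Dear Sir,</p>

|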
11:30am - 1:00pm | Short Papers 5 Location: Aula 6 - Segundo piso Session Chair: Helena Bermúdez Sabel, Jinntec Image, Music, and TEI - Imagen, música y TEI |
|
ID: 121
/ SP 5: 1
Short Paper Keywords: TEI, East Asia, Critical Apparatus, IIIF Developing An Integrated TEI Viewer for East Asian Classics 1International Institute for Digital Humanities, Japan and Keio University, Japan; 2FLX STYLE CO., LTD.; 3Musashino University and The University of Tokyo In East Asia in general, including Japan, the diffusion of TEI has been extremely limited, and the authors have made various efforts to overcome this barrier. In the process, we realized the need for a viewer that meets the needs of East Asian classics, which are mostly written vertically: to familiarize researchers of the East Asian classics with TEI, it was necessary to provide a viewer that is easy to use and centers on the vertical display of the text. To this end, Nagasaki developed a simple vertical viewer based on CETEIcean and customized it to meet individual researchers’ needs, as presented at previous TEI conferences. Mr. Homma then joined the team and integrated the customized viewers into a single viewer, while allowing users to easily customize some functions. This is the TEIviewer4EAJ[1] presented here. It supports several major styles of literature in the East Asian (especially Japanese) classics, in particular those involving a critical apparatus, waka poetry, and IIIF images, and is currently very popular. If a fragment of a IIIF image is described by <zone>, that part can be brought into focus in the viewer. The viewer also provides a function to display the relationships between xml:id and @ref or @corresp as a graph. In addition, in response to requests for special customization, we have developed a viewer[2] for trilingual texts based on the TEIviewer4EAJ. This viewer is also equipped with functions such as displaying multiple translations in parallel and magnifying images of the corresponding picture scrolls. In the future, we plan to fold what is possible from this customized version back into the integrated version, while developing the integrated viewer further.
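The <zone> linkage mentioned above works roughly as follows (the URL and coordinates are placeholders; a zone might delimit, e.g., one column of vertical text):

  <facsimile>
    <surface>
      <graphic url="https://example.org/iiif/image1/full/full/0/default.jpg"/>
      <!-- region of the page image -->
      <zone xml:id="z1" ulx="1200" uly="300" lrx="1380" lry="2100"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <!-- the transcription points back at the image region -->
      <p facs="#z1">…</p>
    </body>
  </text>

ID: 128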
/ SP 5: 2
Short Paper Keywords: digital humanities, information retrieval, music encoding, MEI, Guatemala, polyphonic choirbooks, preservation of national heritage Aplicación de tecnologías de digitalización y codificación para la preservación del patrimonio musical de Guatemala Universidade NOVA de Lisboa, Lisbon, Portugal In countries like Guatemala there has been little digitization and encoding of musical documents for preservation and access. This paper presents the digitization and encoding of the polyphonic choirbooks of Guatemala Cathedral, undertaken in order to preserve this musical heritage and make it known beyond the walls of the cathedral archive. This collection of music books is an example of the cultural transmission of European music in Latin America and an important part of the country's heritage and history. To contribute to its preservation and dissemination, I obtained permission from the Chancellor of the Ecclesiastical Curia to carry out a pilot project for the digitization and encoding of the first of these six books, GuatC 1. The paper presents the process followed to this end, which involves different digitization technologies and the integration of several music-encoding tools. Finally, the paper provides access to high-resolution color images of the folios of GuatC 1 and to its corpus of music encoded in MEI, which can be rendered and listened to online. ID: 142
/ SP 5: 3
Short Paper Keywords: Film Industry, Publishing, Book Market, Correspondence, Network Analysis From Cinema to Publishing, there and back: Encoding a Corpus of Letters Between Filmmakers and Publishers in XML TEI Université de Mons, Belgium The connection between literature and cinema has been extensively studied from various perspectives. However, the intricate interplay between the film industry and the publishing market remains underexplored. As part of a larger research project, the paper investigates how Italian filmmakers influenced the publishing market between the 1950s and 1980s. By employing the TEI P5 guidelines, the study reconstructs a corpus of published and unpublished letters from different archives. We will focus on three directors: Fellini, Pasolini, and Petri. While our selection may seem arbitrary, these filmmakers were chosen due to their relationships with various Italian and foreign publishers, authors, and literary agents. Following established models like the Darwin Correspondence Project (https://www.darwinproject.ac.uk), the Van Gogh Letters Project (https://tei-c.org/activities/projects/vincent-van-gogh-the-letters/), the Bellini Digital Correspondence (Del Grosso and Spampinato, 2023, http://bellinicorrespondence.cnr.it), and Vespasiano da Bisticci’s Letters (Tomasi, 2013, 10.6092/unibo/vespasianodabisticciletters), we created a macro.xml file containing access points to the letters. These include descriptions of individuals and their occupations, cited works, names of organizations (publishing houses, producers, etc.), places, and literary and film awards. Each letter is encoded in an XML file, employing the TEI elements <correspDesc> and <correspContext> to provide a comprehensive framework. A generic <rs> tag, with the attributes @ref and @type, specifies the elements mentioned in the texts (person, book, film, organization, profession). This detailed encoding sheds light on the publication process and filmmaker-publisher interactions. Considering the possible copyright issues regarding archival material, and drawing on programmable-corpora methodologies from the DraCor Project (Fischer et al., 2019), visualization software facilitates network analysis without consulting the texts where copyright restrictions apply. This study aims to contribute to our understanding of the historical interplay between cinema and publishing and to define a philological model for research in XML TEI. Ultimately, it seeks to provide a reusable and interdisciplinary model for encoding material concerning different creative industries.
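In this model, each letter's metadata and in-text references take roughly the following form (names, dates, and pointers below are illustrative, not taken from the corpus):

  <correspDesc>
    <correspAction type="sent">
      <persName ref="#fellini">Federico Fellini</persName>
      <placeName>Roma</placeName>
      <date when="1973-05-12"/>
    </correspAction>
    <correspAction type="received">
      <orgName ref="#publisher01">…</orgName>
    </correspAction>
  </correspDesc>
  <!-- in the letter body, a generic rs identifies a mentioned work -->
  <p>… <rs type="film" ref="#amarcord">Amarcord</rs> …</p>

ID: 158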
/ SP 5: 4
Short Paper Keywords: TEI, MEI, Measure, Zones, Open source, IIIF Cartographer App University of Paderborn The Cartographer app is an open-source image markup tool that automatically creates the corresponding bars (measures) in Music Encoding Initiative (MEI) encodings. It integrates with public libraries' resources through features like image file uploading, IIIF manifest loading, and MEI file generation. Additionally, it uses an AI library, the Measure Detector, to automate the detection of bar positions. While currently focused on MEI, plans include expanding to TEI; the app sets itself apart with its GitHub and IIIF integration for annotating musical and non-musical elements.
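The MEI it produces is of this general shape (a sketch; coordinates, filenames, and IDs are placeholders):

  <mei xmlns="http://www.music-encoding.org/ns/mei">
    <meiHead><!-- metadata omitted --></meiHead>
    <music>
      <facsimile>
        <surface>
          <graphic target="page-1.jpg"/>
          <!-- detected bounding box of the first bar -->
          <zone xml:id="z-m1" ulx="210" uly="400" lrx="980" lry="620"/>
        </surface>
      </facsimile>
      <body>
        <mdiv>
          <score>
            <!-- each measure points at its zone on the page image -->
            <section><measure n="1" facs="#z-m1"/></section>
          </score>
        </mdiv>
      </body>
    </music>
  </mei>

|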
1:00pm - 3:00pm | Lunch - Almuerzo |
3:00pm - 4:30pm | Long Papers 7 Location: Aula 2 - Primer piso Session Chair: Diane Jakacki, Bucknell University Accessibility, interoperability and sustainability
|
|
ID: 174
/ LP 7: 1
Long Paper Keywords: digital philology, semantic web technology, digital scholarly edition Integrating TEI XML with Existing Semantic Web Practices for Enhanced Accessibility and Interoperability in Scholarly Editions HUN-REN Research Centre for the Humanities, Hungary In recent years, the integration of the semantic web and linked open data has emerged as a pivotal area in digital philology. Numerous applications of semantic web technology and graph data models have appeared for modeling scholarly-edition data, and notably there has been a shift away from the traditional TEI XML format initially tailored to the needs of digital scholarly editing. My presentation will propose a refined architecture that aims to overcome the limitations observed in recent digital philological experiments. The prevalent approach, developing bespoke ontologies, data structures, and corresponding software, tends to isolate digital philology from broader scholarly engagement, inadvertently perpetuating the exclusivity of each edition. Our framework, hosted at DigiPhil (digiphil.hu), leverages TEI XML to ensure the accessibility of texts to a diverse academic audience, including historians and linguists. This accessibility is not limited to traditional close reading but extends to computer-assisted distant reading. We are actively developing tools that facilitate the conversion of XML into various data formats, such as plain text, CSV, and LaTeX. While TEI XML inherently possesses semantic properties, it does not by itself bridge the gap between a document network and a data network as envisaged by Berners-Lee. Previous efforts to integrate TEI XML with the semantic web have fallen short, failing to elevate the use of philological data beyond its original academic confines to a broader cultural and scientific arena. Our proposed architecture emphasizes the integration of services and software such as WikiData, GitHub, Zenodo.org, Wikibase, and InvenioRDM, cornerstones of the open data philosophy. However, linking these platforms is not straightforward. This presentation will outline the entire workflow, from the initial editing of scholarly texts to the publishing of semantically rich linked data, describing the metadata relational network of larger text units and the practices of semantic annotation and linking of smaller text segments. Ultimately, I will showcase the DigiPhil infrastructure and the comprehensive workflow we employ, from inception to both print and digital publication, through a case study of a multilingual (Latin and English) digital scholarly edition. References • Ries, Thorsten; Palkó, Gábor: Born-digital archives. In International Journal of Digital Humanities 2019/1, p. 1–11. DOI: https://doi.org/10.1007/s42803-019-00011-x • Palkó, Gábor: The Phenomenon of “Linked Data” from a Media Archaeological Perspective. In The (Web)sites of Memory: Cultural Heritage in the Digital Age, ed. Morse E., Donald; O. Réti, Zsófia; Takács, Miklós; 2018, p. 23–31. Handle: http://hdl.handle.net/2437/280285 • Fellegi, Zsófia: Digital Philology on the Semantic Web: Publishing Hungarian Avant-garde Magazines. In The (Web)sites of Memory: Cultural Heritage in the Digital Age, ed. Morse E., Donald; O. Réti, Zsófia; Takács, Miklós; 2018, p. 105–116. Handle: http://hdl.handle.net/2437/280285 • Graph Data-Models and Semantic Web Technologies in Scholarly Digital Editing. Ed. Spadini, Elena; Tomasi, Francesca; Vogeler, Georg, 2021. URL: https://kups.ub.uni-koeln.de/54580/1/SpadiniTomasi.pdf ID: 143
ID: 143
/ LP 7: 2
Long Paper Keywords: cmc, TEI guidelines, post, archive, correspondence Can we apply the new CMC chapter to the TEI Listserv Archives? An experiment with TEI for Correspondence and Computer-Mediated Communication 1Penn State Erie, United States of America; 2Northeastern University, United States of America

The TEI Technical Council is working with the Computer-Mediated Communication (CMC) special interest group (SIG) on introducing a new chapter on CMC to the TEI Guidelines; the new chapter is expected to be released either a few months before or shortly after the conference. We propose to test the applicability of the new module to e-mail by encoding a subset of the TEI Listserv archive and reporting on our successes, failures, and problems.

The authors have been involved in a project to transfer the TEI mailing lists currently on the Brown University Listserv server, many of them dating from the 1990s, to a Listserv server at Penn State University. Simultaneously, we have been reviewing and working on the introduction of the draft CMC chapter in its late stages of development in 2024. It is not clear to us how well the TEI with the new CMC Guidelines applies to the archiving of e-mail in general, and in particular to e-mail from a mailing list. At the time of this writing, the draft CMC chapter primarily addresses the kinds of dialogic, conversational messages we encounter in chat forums, social media platforms, and discussion boards as amenable to TEI encoding. These are media that the CMC draft authors Michael Beißwenger and Harald Lüngen describe as requiring packaging into "products" or "posts" prior to transmission over a network. While the draft chapter does not specifically address e-mail, we want to determine how much of the encoding proposed for CMC can apply to e-mail messages posted in the conversational space of an e-mail listserv.

We expect that e-mail messages can be handled by the encoding for correspondence already in the Guidelines, but we also wonder to what extent the CMC encoding can be blended with the TEI correspondence encoding in representing metadata about the transmission and distribution of messages. We expect that the CMC encoding is better suited to documenting the distribution of a Listserv post to a community of recipients, while an e-mail archive representing an individual's personal communications over time may be better represented by the TEI correspondence encoding.

We propose to explore the new CMC encoding introduced to the TEI Guidelines by applying it at scale to a subset of the TEI Listserv archives. We will work with the Listserv archive format currently supplied by Brown and Penn State Universities (versions 16.5 and 17.0 respectively) and will document our steps in transforming the Listserv archive format to TEI using text manipulation tools like Perl, Python, and XSLT. The authors expect to have distinct ideas about how to apply the CMC and correspondence encoding to the data, and we will document and present where our perspectives diverge, seeking the thoughts of TEI conference attendees on elaborating best practices for the encoding of e-mail list archives.
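A minimal sketch of the kind of transformation the authors describe, mapping RFC 5322 e-mail headers from a Listserv archive onto the TEI correspondence elements; how the draft CMC <post> markup would wrap the message body is precisely the paper's open question, so only the correspondence side is sketched here, and the sample message is invented:

```python
# Hedged sketch: e-mail headers mapped onto the TEI correspondence elements
# (<correspDesc>/<correspAction>); the sample message is invented.
from email import message_from_string
from xml.sax.saxutils import escape

raw = """From: jane@example.edu
To: TEI-L@LISTSERV.BROWN.EDU
Date: Tue, 12 Mar 1996 09:15:00 -0500
Subject: Re: encoding question

Has anyone encoded marginalia this way?
"""

msg = message_from_string(raw)
tei = f"""<correspDesc>
  <correspAction type="sent">
    <persName>{escape(msg["From"])}</persName>
    <date>{escape(msg["Date"])}</date>
  </correspAction>
  <correspAction type="received">
    <orgName>{escape(msg["To"])}</orgName>
  </correspAction>
</correspDesc>"""
print(tei)
```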
ID: 114
/ LP 7: 3
Long Paper Keywords: large language model, text encoding, generative artificial intelligence, drama Towards an LLM-powered encoding workflow for plays / Hacia un flujo de trabajo de codificación para obras de teatro impulsado por LLM 1University of Potsdam; 2University of Padua

Encoding new texts in TEI XML format plays a central role in established research projects such as DraCor (Fischer et al. 2019), a major computational infrastructure hosting 'programmable corpora' comprising thousands of dramatic texts. As outlined by Giovannini et al. 2023, current DraCor production workflows vary according to the initial markup of the texts to be onboarded: while computational transformations (mostly Python or XSLT scripts) are applied to sources with basic (HTML) or advanced (XML) markup, texts with no markup are usually encoded by applying the lightweight markup language easydrama (Skorinkin 2024) followed by a scripted conversion. The rise of large language models (LLMs), however, promises to further automate such encoding tasks, and scholars have already been exploring the potential of generative AI by developing advanced prompt-engineering techniques to enhance outputs (Czmiel et al. 2024, Pollin 2023). Most efforts, however, seem to have been devoted to shorter textual forms, like letters (e.g. Pollin, Steiner, and Zach 2023), which present comparatively fewer encoding challenges than longer texts like plays (in part because many models tend to shorten long outputs).

In this contribution, we present a proof of concept demonstrating how LLMs can be efficiently integrated into the corpus-building pipeline of a standard DraCor corpus. To this end, we conduct a series of experiments with several state-of-the-art models, both commercial (OpenAI's GPT, Google's Gemini) and open-source (Meta's Llama, Mistral AI's Mixtral), to assess to what extent a largely automated TEI encoding of plays is possible. We then discuss the strengths and limitations of this approach and propose a set of tailored LLM prompts which can be used to generate partially 'DraCor-ready' files from raw text input. In future developments, we also envision the possibility of fine-tuning existing LLMs with specific DraCor data to improve their performance. |
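A hedged sketch of such an LLM-assisted encoding step: raw play text is sent scene by scene (to work around the output-shortening tendency noted above) with a prompt requesting DraCor-style TEI. The OpenAI client API is real, but the model name and prompt wording are illustrative, not the authors' actual setup:

```python
# Illustrative LLM-assisted encoding step; model name and prompt are
# assumptions, not the setup described in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Convert the following raw play text into TEI XML as used by DraCor: "
    "wrap speeches in <sp> with @who, speaker labels in <speaker>, "
    "stage directions in <stage>. Return only XML."
)

def encode_scene(raw_scene: str) -> str:
    # One scene per request keeps generations short enough to avoid truncation.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": raw_scene},
        ],
    )
    return response.choices[0].message.content
```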
3:00pm - 4:30pm | Long Papers 8 Location: Aula 3 - Primer piso Session Chair: Gustavo Fernández Riva, University of Heidelberg Encoding and analysis 3 |
|
ID: 130
/ LP 8: 1
Long Paper Keywords: retrocomputing, born digital heritage, TEI encoding Editing Early Born-Digital Text in TEI University of Würzburg, Germany

With the increasing prevalence of digital inheritances and digital literary publication forms, digital cultural heritage is gradually coming into the purview of editorial philology. Owing to electronic storage, "born digital" resources possess no fixed material form, only temporary visualizations. Their materiality is instead displaced onto devices and storage media, whose period-specific technological design determines how data are stored. A closer examination of digital remnants from the 1980s makes clear the fundamental technological change that has taken place in just a few decades: neither software nor files are accessible on today's common systems, and handling original storage media and devices requires care and expertise. Emulation is possible only if corresponding digital images of the original storage media are available.

For the scholarly editing of early digital material, it follows that digital is not simply digital: digitality arises in various manifestations depending on the respective digital environment. On a ZX Spectrum, it was entirely different from on an Apple II or a Commodore Amiga 4000. This, in turn, creates different conditions for textual criticism, depending on the specific historical or system-specific concept of digitality.

To approach these conditions, it seems reasonable to examine the electronic representation of text more closely. In the simplest case, there is a direct, symbolic connection between the binary code on a storage medium and the display on an output device, defined by a standard such as ASCII (later subsumed into Unicode). Depending on the historical environment, however, a multi-tiered chain must be expected: companies like Commodore maintained proprietary standards in which the outward appearance of the character inventory could also be individually customized, while standard formats for character sets, or documentation of them, did not exist. Correctly mapping a byte sequence to the semantics of individual characters therefore requires sufficient knowledge of these adaptations, because otherwise neither can the intended character be inferred from the encoding nor the encoding from the outward form. This also means that text cannot be transmitted without transmitting the system environment. The same applies to historical compression algorithms, which were often applied because of limited storage space but must now be reconstructed, for lack of specifications, before one can access the digital original text.

But if the relationship between the displayed character and the underlying encoding is fully preserved only in the context of the original environment: what then is the text to be edited in the digital realm? And how can this relationship be meaningfully incorporated into textual criticism? Do emulations perhaps serve a facsimile function? And finally, the crucial question: what does the TEI need, first, to capture the necessary information in the metadata and, second, to encode the different digital text layers? The contribution discusses these questions theoretically and with reference to several examples from a current editing project.
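The system-specific byte-to-character mapping described above can be illustrated with a short Python sketch; the custom table below is entirely hypothetical, standing in for a proprietary character set of the kind Commodore machines used:

```python
# The same bytes decode differently under ASCII and under a custom table;
# the glyph assignments here are invented for illustration only.
HYPOTHETICAL_8BIT_TABLE = {
    0x41: "♠", 0x42: "│", 0x43: "─", 0x48: "♥",
}

def decode(data: bytes, table: dict[int, str]) -> str:
    # Fall back to '?' where the table has no entry, which is exactly the
    # information loss the paper describes when the environment is unknown.
    return "".join(table.get(b, "?") for b in data)

data = bytes([0x48, 0x41, 0x43, 0x42])
print(data.decode("ascii"))                   # 'HACB' under ASCII
print(decode(data, HYPOTHETICAL_8BIT_TABLE))  # '♥♠─│' under the custom table
```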
ID: 173
/ LP 8: 2
Long Paper Keywords: handwritten text recognition, deep learning, large language models, 19th-century documents in Hungarian Integrating Deep Learning and Philology: Challenges and Opportunities in the Digital Processing of János Arany’s Legacy HUN-REN Research Centre for the Humanities, Hungary

In the first decades of the 21st century, two parallel and closely related trends can be witnessed in the fields of culture and science. On the one hand, Artificial Intelligence is transforming and replacing various established cultural practices at an unforeseeable pace. On the other hand, partly owing to the digitization of cultural heritage and partly owing to the huge volume of 'born digital' materials being produced, data sets and networks of previously unimaginable scale are being created. Within the discourse of digital heritage, however, alongside easily processed and published printed or 'born digital' materials, "real" (that is, handwritten) manuscripts tend to be overshadowed, as they cannot be made searchable with general models that do not take into account the specific characteristics of the document group in question. A particular problem is that AI tools work better for major world languages spoken by hundreds of millions; Hungarian handwritten documents, for instance, are exceptionally underrepresented in the digital cultural heritage as a whole.

Addressing this issue is one of the main goals of the National Laboratory for Digital Heritage project. Led and professionally guided by digital humanities experts from the ELTE Faculty of Humanities, this collaboration of public collections and research institutions considers its primary task to be the application of AI tools optimized for the Hungarian language in public collections. One of the most significant achievements of this work is the development of a handwriting recognition model that has made János Arany's official documents searchable, thereby making an extremely valuable document corpus accessible to researchers and the general public.

In addition to briefly outlining the HTR processing of a significant portion—approximately 30,000 pages—of János Arany's legacy, this lecture focuses on two issues. The first concerns the potential and risks of various deep-learning technologies, such as synthetic handwriting generation and large language models (LLMs), in "improving" the text quality of a corpus too large to be checked by human effort and in making it researchable. The second issue, which I discuss in detail, is how a corpus converted into text by artificial intelligence can be integrated within the framework of critical text editions created by philologists. Specifically, it asks whether the TEI XML markup language is optimal for the publication of uncorrected HTR output and how texts meeting high philological standards in digital scholarly editions can coexist on a single platform with documents "read" by machines. I will present these issues in the context of specific IT developments.

References:
- Palkó, Gábor; Szekrényes, István; Bobák, Barbara 2023. A Digitális Örökség Nemzeti Laboratórium webszolgáltatásai automatikus kézírás-felismertetéshez. [Online Services of the National Laboratory for Digital Heritage for Automatic Handwritten Text Recognition] In: Tick, József; Kokas, Károly; Holl, András (eds.): Új technológiákkal, új tartalmakkal a jövő digitális transzformációja felé. Budapest: Hungarnet, pp. 164–169. https://doi.org/10.31915/NWS.2023.24
- Li et al. 2022. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. https://doi.org/10.48550/arXiv.2109.10282
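A minimal sketch of a TrOCR-style recognition step of the kind cited above (Li et al. 2022); the pretrained checkpoint name is illustrative, whereas the project would use a model fine-tuned for Hungarian handwriting:

```python
# Standard TrOCR inference for one line image; checkpoint name illustrative.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("arany_page_line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```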
ID: 107
/ LP 8: 3
Long Paper Keywords: Novella, Umberto Eco, Literary Forgery, Annotation, Analysis Chasing ‘Carmen Nova’: Encoding and Analysis of a TEI Version of the Crime Novella Allegedly Written by Umberto Eco 1Freie Universität Berlin, Germany; 2University of Siegen, Germany

On the first pages of Umberto Eco's world-famous 1980 novel Il nome della rosa, we read about a book that the narrator finds in an antiquarian bookshop on Avenida Corrientes in Buenos Aires and that sets the plot in motion. With this contribution to the TEI 2024 Conference, we want to bring Umberto Eco back to Buenos Aires under circumstances just as intricate as those in Eco's novel. Using a TEI document, our project thoroughly examines an unresolved case concerning the authorship of a crime novella: at the end of 2022, the literary scholar Niels Penke discovered on eBay a book titled 'Carmen Nova' that named Umberto Eco as its author, with an afterword allegedly written by Roland Barthes. Several copies of the book, which was published by a fictitious Swiss publishing house, are now known to exist. But as it turned out, the 64-page volume is a literary forgery, as no such work by Umberto Eco is known. The novella disguises itself as a German translation of an alleged Italian original. The plot revolves around the detective-style search for a certain Carmen, who appears to be not a concrete person but rather a world-literary concept originating in Mérimée's novella Carmen, published in 1845, which in turn was the basis for Bizet's opera of the same name, premiered in 1875. The Carmen Nova of the novella is one of these 'fluctuating individuals' (Eco 2009, pp. 86–89). Since Niels Penke's discovery, a community of scholars, interested readers, and journalists has formed to find out more about the author (or authors) of this literary forgery, which must have been written in the early 1980s. It is still not known who wrote, printed, and circulated this book.

The State and University Library Bremen (SuUB) made a digital scan of its copy in 2023. In order to carry out digital analyses of the full text of the work, we used OCR to convert the PDF to plain text and, after a round of corrections, encoded it in TEI (Araneda Lavín et al. 2023). We published the encoded file with the help of the JavaScript library CETEIcean (Cayless & Viglianti 2018). Among other things, we have annotated mentioned persons and obvious spelling mistakes, clues that may help uncover the nature of the text. We will present our annotation strategy and initial analysis results. Using a Jupyter notebook that relies on the lxml library to extract information from the TEI-encoded version, we generated quantitative results that provide new insights into the making of this mysterious text. We have released both the TEI and the Python code as open source (see bibliography) and hope that the presentation will arouse lively interest among the TEI community in this mystery, which has so far been confined to the German-speaking world. Last but not least, it is a nice touch that, through the TEI 2024 Conference, a pseudo-version of Umberto Eco finds his way to Buenos Aires by winding paths. |
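A sketch of the kind of lxml-based extraction the authors describe; the file name and the assumption that persons and flagged spelling mistakes are tagged with <persName> and <sic> are illustrative:

```python
# Counting annotated person mentions and flagged spelling mistakes in the
# TEI file; element choices are assumptions about the annotation scheme.
from collections import Counter
from lxml import etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
tree = etree.parse("carmen_nova.xml")

persons = Counter(
    "".join(p.itertext()).strip()
    for p in tree.xpath("//tei:text//tei:persName", namespaces=TEI_NS)
)
mistakes = tree.xpath("count(//tei:text//tei:sic)", namespaces=TEI_NS)

print(persons.most_common(10))
print(f"{int(mistakes)} flagged spelling mistakes")
```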
3:00pm - 4:30pm | Panel 2 Location: Aula 6 - Segundo piso Session Chair: Stefano Bazzaco, University of Verona ¿Qué está pasando con la TEI en español? |
|
ID: 157
/ P2: 1
Panel Keywords: TEI in Spanish, projects, challenges, multilingualism, community building ¿Qué está pasando con la TEI en español? 1CONICET, Argentine Republic; 2University of Miami, USA; 3University of Florida, USA; 4Universidad de Salamanca (España) y Universidad Pontificia Bolivariana (Colombia)

En los países hispanohablantes, el interés creciente por la fijación de textos y su transformación en formato electrónico vino en un primer momento motivado por el análisis de textos a través de concordancias, diccionarios o la creación de léxicos específicos. La adopción del sistema Madison para la transcripción de manuscritos, desarrollado en el Hispanic Seminary of Medieval Studies (HSMS) de la Universidad de Wisconsin y ampliamente adoptado para textos medievales hispánicos desde fines de la década del 80 del siglo XX, colaboraría con la necesidad de echar mano de un estándar de codificación que fuera más allá de los formatos y que incluyera tanto información lingüística como semántica. En ese contexto, si bien la TEI surge en estos años en el ámbito anglosajón, no tarda en interesar a algunos investigadores activos en los estudios hispánicos durante la siguiente década. En 1992, en el marco del Congreso de la Lengua Española celebrado en Sevilla, se organizó el panel “La lengua española y las nuevas tecnologías”, donde se empezó a discutir –quizás por primera vez y gracias a Charles Faulhaber– acerca de la TEI para textos en español. No es un dato menor el hecho de que en este evento se presentó la traducción al español de las primeras Directrices de la TEI, elaboradas entre 1988 y 1990. Las “Guidelines for the Encoding and Interchange of Machine-Readable Texts” (TEI P1), traducidas al español por Francisco Marcos Marín y las bibliotecarias argentinas Verónica Zumárraga y Marcela Tabanera, pueden entenderse hoy como claro ejemplo del temprano interés de la comunidad hispanohablante en la TEI. La TEI es hoy un estándar extendido en la edición de textos hispánicos, aunque, como bien hemos afirmado en otro lugar, muchas veces deslucido por el contexto de trabajo aislado y desfinanciado en muchas academias. Este panel busca ofrecer una serie de reflexiones sobre los derroteros que esta práctica digital ha tenido en los estudios hispánicos y describir los métodos, las prácticas, las necesidades y las colaboraciones que en este momento están definiendo a la comunidad de práctica de la TEI en español. También busca ser un espacio de intercambio y debate con los asistentes al panel, por lo que, después de las breves presentaciones, se buscará abrir un espacio de conversación acerca de los posibles futuros de la comunidad.

In Spanish-speaking countries, the growing interest in text markup and its transformation into electronic format initially stemmed from text analysis resources such as concordances, dictionaries, or the creation of specific lexicons. The adoption of the Madison system for manuscript transcription, developed at the Hispanic Seminary of Medieval Studies (HSMS) at the University of Wisconsin and widely adopted for Hispanic medieval texts since the late 1980s, would contribute to the need for an encoding standard that went beyond formats and included both linguistic and semantic information. In this context, although the TEI emerged in the anglophone world during these years, it quickly caught the interest of researchers in Hispanic studies during the following decade. In 1992, during the Congress of the Spanish Language held in Seville, the panel "The Spanish Language and New Technologies" was organized, where, perhaps for the first time and thanks to Charles Faulhaber, the use of TEI for Spanish texts began to be discussed. It is noteworthy that at this event, the Spanish translation of the first TEI Guidelines, developed between 1988 and 1990, was presented. The "Guidelines for the Encoding and Interchange of Machine-Readable Texts" (TEI P1), translated into Spanish by Francisco Marcos Marín and the Argentine librarians Verónica Zumárraga and Marcela Tabanera, is a clear example of the early interest of the Spanish-speaking community in the TEI. The TEI is now a widely used standard in the editing of Hispanic texts, although, as we have stated elsewhere, it is often overshadowed by the context of isolated and underfunded work in many academies. This panel seeks to offer a series of reflections on the paths this digital practice has taken in Hispanic studies and to describe the methods, practices, needs, and collaborations that are currently defining the TEI community of practice in Spanish. It also aims to be a space for exchange and debate with the panel attendees; after the brief presentations, we will open a conversation about the possible futures of the Spanish-speaking TEI community.

Gimena del Rio Riande is Independent Researcher at the Instituto de Investigaciones Bibliográficas y Crítica Textual (CONICET) in Argentina, where she coordinates the HD LAB. Susanna Allés Torrent is Professor in the Department of Modern Languages and Literatures at the University of Miami (FL, US). Clayton McCarl is Associate Professor of Spanish and Digital Humanities at the University of North Florida (FL, US). Cristian Suárez is Editor and Professor of Literary Theory at Universidad EAFIT, Colombia. |
4:30pm - 5:00pm | Break 2 - Café 2 |
5:00pm - 5:45pm | In Memoriam C. Michael Sperberg-McQueen Location: Aula 3 - Primer piso Session Chair: Elisa Beshero-Bondar, Penn State Erie |
5:45pm - 7:00pm | Poster Slam and Poster Session Location: Aula 3 - Primer piso Session Chair: James Cummings, Newcastle University |
|
ID: 110
/ PS: 1
Poster Keywords: TEI encoding, newspapers and periodicals, best practices, standardization, digitization Establishing Best Practices for TEI Encoding of Newspapers: A Case Study of the Darmstädter Tagblatt Centre for Digital Editions in Darmstadt (CEiD), ULB, TU Darmstadt, Germany

This poster addresses the reuse of periodicals and newspapers and proposes best practices for TEI encoding in digitization projects, focusing on the Darmstädter Tagblatt. As part of the newspaper working group of DHd (Association for Digital Humanities in the German-Speaking Areas), we recognize the growing need for standardized TEI encoding to facilitate data reuse across newspaper projects. Leveraging insights from a recently initiated series of workshops on the reuse of newspaper data, the first of which takes place in Darmstadt in May 2024, the objective is to explore the potential of TEI in enhancing access to historical newspaper data while fostering collaboration among researchers. Concepts are currently being developed and will come to fruition after the initial workshop. One idea is the creation of a universal TEI header suitable for editions of newspapers as well as periodicals. An ongoing digitization project at ULB Darmstadt, funded by the DFG (German Research Foundation), will serve as the scientific basis. The poster will provide an overview of the project "A Darmstadt Newspaper in Three Centuries: Digitisation of the Darmstädter Tagblatt, 1740–1986" (Thomas Stäcker, Marcus Müller, Dario Kampkaspar, et al.), highlighting its significance. As one of the longest-running periodicals in Germany, the Tagblatt represents a heterogeneous data set and provides an excellent case study for establishing best practice. The poster will also address the challenges and opportunities of TEI encoding of newspapers, proposing best practices tailored to the needs of diverse projects. Engaging with the international TEI community, we seek to foster discussions on standardization, collaborative markup, and the revitalization of the SIG "Newspapers and Periodicals". By establishing standardized TEI encoding practices for newspapers, we aim to facilitate collaboration, enhance access to historical resources, and advance research in digital humanities.
ID: 122
/ PS: 2
Poster Keywords: library, standardisation, digitisation, metadata Developing a Base Format for Heterogeneous Texts According to the TEI P5 Guidelines University and State Library Darmstadt, Germany

The University and State Library Darmstadt (Germany) is collecting and digitising texts, e.g., scientific journal articles and monographs, digital scholarly editions, etc., to make them available in open access. Since not only the text types but also the file formats differ greatly (for example, JATS, TEI, or non-XML formats), the goal is to transform these heterogeneous formats into one standardised format. For this, a so-called base format is being developed using the TEI P5 Guidelines. The base format is essentially a subset of the TEI, selected and assembled with consideration of what information is needed, what information is present, and how it has to be represented. The basic structure consists of a <teiHeader>, a <standOff> for entities, and a <text> containing at least a <body> and, when applicable, also <front> and <back>. While the entire content of the input file is mapped to and converted into TEI, the main focus lies on the metadata in the <teiHeader>. Problems arise in the details, for example, how to represent a <persName> (does the source material separate surname and forename?) or the title (is the title separated into main title and subtitle?). Moreover, different text types need different TEI modules; for example, an edition additionally requires a <msDesc>. The base format guarantees not only a structure that is the same for all texts but also that the handling of the metadata meets the criteria of the library, since the metadata of all texts will ultimately be entered into the library catalogue. The texts will be made available and searchable on the library's publishing platform for digital texts, 'TUeditions' (https://tueditions.ulb.tu-darmstadt.de), at the Centre for Digital Editions. The base format itself will be accompanied by a RELAX NG schema for validation as well as documentation in the DITA format.
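The basic structure described above can be sketched in a few lines; this is an illustration of the skeleton built with lxml, not ULB Darmstadt's actual code:

```python
# Building the base-format skeleton (<teiHeader>, <standOff>, <text>/<body>)
# described in the abstract; purely illustrative.
from lxml import etree

TEI = "http://www.tei-c.org/ns/1.0"

def el(parent, tag):
    return etree.SubElement(parent, f"{{{TEI}}}{tag}")

root = etree.Element(f"{{{TEI}}}TEI", nsmap={None: TEI})
header = el(root, "teiHeader")
fileDesc = el(header, "fileDesc")
titleStmt = el(fileDesc, "titleStmt")
el(titleStmt, "title").text = "Example article"  # main/sub split is a mapping decision
el(root, "standOff")                             # entities go here
text = el(root, "text")
el(text, "body")                                 # <front>/<back> only when applicable

print(etree.tostring(root, pretty_print=True, encoding="unicode"))
```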
ID: 117
/ PS: 3
Poster Keywords: Calderón de la Barca, literary corpus, DraCor, theater, quantitative analysis, network analysis TEI Encoding and Network Analysis: On the Calderón Drama Corpus (CalDraCor) v2.0 1Eberhard Karls Universität Tübingen, Germany; 2Universität Stuttgart, Germany

Since its full open-access publication in 2022 on DraCor (Fischer et al. 2019) under the name CalDraCor, the TEI encoding of a corpus of 205 plays by Calderón de la Barca has opened new paths for research on the Spanish Golden Age with digital methods, including quantitative analysis (e.g., Ehrlicher et al. 2020). However, new research questions have created the need to enrich the corpus beyond the original encoding. As an example, this poster focuses on TEI encoding for social network analysis. We present the methodology we developed to segment acts into scenes on the basis of the stage directions that indicate character entrances and exits, in order to capture interaction, and we discuss the added value of TEI encoding with examples from Calderón's theater.

Bibliografía
Ehrlicher, Hanno; Lehmann, Jörg; Reiter, Nils; Willand, Marcus. «La poética dramática desde una perspectiva cuantitativa: la obra de Calderón de la Barca». Revista de Humanidades Digitales 5 (25 November 2020): 1–25. https://doi.org/10.5944/rhd.vol.5.2020.27716
Fischer, Frank; Börner, Ingo; Göbel, Mathias; Hechtl, Angelika; Kittel, Christopher; Milling, Carsten; Trilcke, Peer. «Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama». In: Digital Humanities 2019: Book of Abstracts. Utrecht, 2019. https://doi.org/10.5281/zenodo.4284002
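A hedged sketch of the co-presence extraction described above: after segmentation, characters speaking within the same scene division are linked. The file name, the <div type="scene"> convention, and the choice of networkx are illustrative:

```python
# Co-presence network from scene-segmented TEI: speakers of <sp> elements
# inside the same <div type="scene"> become linked nodes.
from itertools import combinations
from lxml import etree
import networkx as nx

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
tree = etree.parse("calderon_play.xml")

G = nx.Graph()
for scene in tree.xpath('//tei:div[@type="scene"]', namespaces=TEI_NS):
    speakers = {
        who.lstrip("#")
        for sp in scene.xpath(".//tei:sp[@who]", namespaces=TEI_NS)
        for who in sp.get("who").split()
    }
    for a, b in combinations(sorted(speakers), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1  # shared scenes increase edge weight
        else:
            G.add_edge(a, b, weight=1)

print(G.number_of_nodes(), "characters,", G.number_of_edges(), "co-presence links")
```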
ID: 119
/ PS: 4
Poster Keywords: stage direction, drama, theater Enter the <stage>. A Review of the Encoding of Stage Directions within the TEI Max Planck Institute for Empirical Aesthetics, Germany

Stage directions and other non-dialogical aspects of dramatic texts have received only minor scholarly attention. It is therefore not surprising that they remain undertheorized in the TEI as well. This gives rise to a variety of problems. For one, certain not uncommon dramatic structures are impossible to encode within the TEI. For example, the blend of speech prefix and genuine stage direction in “Pendant ce temps, ADRIEN.” has no obvious or trivial markup within the TEI. More generally, the current Guidelines tend to mislead practitioners, resulting in questionable markup choices. Perhaps the most aggravating example occurs whenever a dialog is introduced not by a speech prefix but by a stage direction. In these cases, the dialog is habitually not recognized as a speech, or the stage direction is altered to conform to our idea of a classic speech prefix. These problems are compounded by the fact that the term "stage direction" lacks a stable equivalent in many languages. The poster explores "impossible" and unconventional stage directions from French, Spanish, and German plays of the 17th and 18th centuries, as well as their encoding within the TEI(-adjacent) projects Théâtre Classique, EMOTHE, Deutsches Textarchiv, and the TextGrid Repository. It aims at a productive critique of the current TEI schema and Guidelines and suggests ways to overcome existing shortcomings.
ID: 126
/ PS: 5
Poster Keywords: global, multilingual, antiracist, decolonial, inclusivity The Adaptive TEI Network: Antiracist, Decolonial, and Inclusive Markup Interventions University of British Columbia, Canada

This poster presentation introduces the "PhD CoLab" project (University of British Columbia, 2024–26, with the collaboration of the SFU Digital Humanities Innovation Lab, DHIL), which brings together graduate students, faculty, and staff from various fields in the humanities, languages, and literatures. While based in Vancouver, in an English-speaking North American education system, this multifaceted project is concerned with the continuities and limitations of text encoding across languages (English, Spanish, German, and Russian), geographic regions, and literary genres. This poster provides concrete examples to illustrate the larger objectives of the PhD CoLab. One of the encompassed projects is NovElla, which focuses on making short prose fiction by early modern Spanish writers visible and accessible. It includes a catalog of annotated bibliographic resources to help promote future research by both students and scholars. Another example, related to Latin America, is the Unión Cívica Project, which focuses on the newspaper Unión Cívica, published by the eponymous political movement founded in 1961 in the aftermath of the Rafael L. Trujillo dictatorship (1930–1961) in the Dominican Republic. We will offer high-resolution digital reproductions of 140 issues, with annotations, to provide political and historical context. Furthermore, the very structure of the Adaptive TEI Network, rooted in a team-oriented ethos, disrupts the traditional mode of solitary humanistic research. PhD students collaborate in a transdisciplinary, team-based, project-oriented environment where we learn from and with one another while proposing a new TEI schema for text-encoding projects that considers antiracist, decolonial, inclusive, and feminist markup practices. In short, the TEI schema aims to address some of the projects' research questions: Can we adapt current TEI modules, or does an antiracist, decolonial, and feminist engagement with the literary text necessitate new TEI markup standards or new modules? Is the TEI robust enough to handle multilingual texts?
ID: 165
/ PS: 6
Poster Keywords: tei transformation, JSON, DOCX, HTML, XSLT A Python Library for TEI Conversion into Edition Formats 1Independent researcher; 2Digital Humanities Potsdam

TEI XML was designed to encode the structure and purpose of document parts, not to represent them visually. Unlike HTML or DOCX, TEI is generally not tied to any particular software that would offer a standard way of rendering it. At the same time, TEI is not as easily interoperable as more popular data exchange formats like CSV or JSON, for which tools range from Excel to OpenRefine to programming libraries like pandas. Developers used to dealing with JSON may be confused when confronted with the task of using TEI. All this creates the need for easy-to-use transition tools from TEI to these more common formats. Traditionally, this has been accomplished with XSLT stylesheets (Rahtz, 2006). However, this approach has limitations: as of 2024, XSLT is not the most widespread technology in the developer world, and it is not part of most programming courses. For continuous support of TEI in the future, it seems crucial to produce tools that wrap TEI transformation in more widely known data-processing ecosystems than bare XSLT (even if XSLT is used under the hood). With that in mind, we have developed a TEI transformation library for Python (pypi.org/project/TEItransformer/). Currently the library converts TEI XML into three formats: JSON, DOCX, and HTML. Each format is handled by a separate class with its own settings. For the DOCX and HTML transformations, the user chooses a transformation scenario. The client interface is implemented by the TEITransformer class. Under the hood, the conversion is divided into three main steps: validation, transformation, and stylization. The library allows the user to specify an XML schema and to supply ODD or CSS files for output customization. Since ODD is XML-based, the user does not need any knowledge of XSLT to adjust the transformation.
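A hypothetical usage sketch of the library: the class name TEITransformer, the three output formats, and the scenario/schema/ODD/CSS options come from the abstract, but every method name and argument below is an assumption, not the documented API:

```python
# Hypothetical usage of the TEItransformer package
# (pypi.org/project/TEItransformer/); method names and arguments are
# assumptions made for illustration, not the package's documented API.
from teitransformer import TEITransformer  # module path assumed

tt = TEITransformer(scenario="drama")             # "transformation scenario" per the abstract
tt.load_tei("edition.xml", schema="edition.rng")  # optional schema validation step
tt.transform(output_format="html", odd="edition.odd", css="edition.css")
tt.transform(output_format="json")                # JSON needs no stylization step
```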
ID: 147
/ PS: 7
Poster Keywords: collation, svg, xslt, javascript, interface Visualizing the Frankenstein Variorum 1Penn State Erie, United States of America; 2University of Maryland, United States of America; 3Northeastern University, United States of America

The Frankenstein Variorum team has completed work on a digital scholarly edition that compares five distinct versions of the novel Frankenstein. While we have presented this project at several conferences in recent years, we propose to share a new view of the project at TEI 2024: a visual summary of our publication method and a visual survey of our collation data. Now that we have fully published the Frankenstein Variorum's TEI edition files, we are analyzing what we have learned about how much the novel changed over five distinct instantiations: the 1816 manuscript notebook; the 1818 first anonymous publication; the Thomas copy's marginal handwritten revisions, later lost; the 1823 edition produced by Mary Shelley's father; and the substantially revised 1831 edition. For the TEI 2024 conference, we propose a poster displaying two things: 1) our publishing process, which applies JavaScript-based static site generation to publish the edition, and 2) a "big picture" view of Frankenstein's changes over time, drawn in Scalable Vector Graphics (SVG) from our TEI data. The poster will show how our project's TEI grounds the edition's interactive visualizations of the novel's transformation over time. Our TEI standoff spine and edition files store the collation data of the novel's five versions, and the pipeline processing and algorithmic refinement of those files have been the subject of several past conference papers and presentations. Now that the edition is complete, the TEI data invites us to analyze the edition's moments of alignment, divergence, and gaps where material was missing or removed. For this poster and for our edition, we illuminate the Variorum interface design and visualize Frankenstein's transformations, each built directly from the TEI. If successful, our poster will welcome discussion of our publishing architecture and invite a holistic, nonlinear exploration of the digital variorum edition.
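An illustrative sketch (not the Variorum's actual code) of how such a "big picture" SVG strip can be generated from collation data: one rectangle per collation unit, darker where more of the five versions diverge; the input counts are invented:

```python
# Drawing a divergence strip as raw SVG; per-unit counts are invented data.
divergence = [0, 1, 4, 2, 0, 3, 5, 1]  # how many of the 5 versions diverge per unit

CELL, HEIGHT = 20, 30
rects = []
for i, d in enumerate(divergence):
    shade = 255 - int(d / 5 * 255)  # 0 divergence = white, 5 = black
    rects.append(
        f'<rect x="{i * CELL}" y="0" width="{CELL}" height="{HEIGHT}" '
        f'fill="rgb({shade},{shade},{shade})"/>'
    )
svg = (
    f'<svg xmlns="http://www.w3.org/2000/svg" '
    f'width="{len(divergence) * CELL}" height="{HEIGHT}">{"".join(rects)}</svg>'
)
with open("variorum_strip.svg", "w") as f:
    f.write(svg)
```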
ID: 155
/ PS: 8
Poster Keywords: French, Caribbean, Digital Edition, TEI The Revue des Colonies Scholarly Edition and Translation: a Distributed and Bilingual TEI Project 1University of Maryland, United States of America; 2University of Connecticut, United States of America

The Revue des Colonies Scholarly Edition and Translation project marks the first effort to digitally annotate and translate a landmark abolitionist periodical published in Paris between 1834 and 1842. Led by an international team of scholars, the project aims to make the Revue's invaluable store of journalistic and literary content accessible to the public. This poster outlines the project's operations and workflows and describes the TEI encoding strategies that we have adopted and plan to adopt. The project depends on a dynamic interplay between the encoding team and the editorial team, both geographically distributed and each with distinct roles and expertise contributing to the project's overarching goal. The encoding team includes graduate students currently completing training modules in transcription, TEI encoding, and research in online databases. The editorial team is composed of scholars covering a range of relevant disciplines who bring a deep understanding of the historical and cultural contexts in which the Revue des Colonies operated. These scholars compose meticulously researched annotations for the named entities identified in the text. Their annotations shed light on otherwise overlooked individuals, events, organizations, and historical documents, ensuring that the journal's original contributions to global discourse are recognized and contextualized within the relevant scholarly fields. Collaboration between the teams is facilitated by an online content management system that exports TEI data and adjusts the project's customization ODD as content is added. The project's TEI customization focuses on the encoding of named entities to enable the creation of bilingual, substantial, and cross-navigable entries. Translation of the Revue itself is undertaken by members of the editorial team with significant experience in professional French–English translation, ensuring both the accessibility of the text to the widest audience and its fidelity to the rhetorical and stylistic features of the original.
ID: 149
/ PS: 9
Poster Keywords: publication, computer-assisted annotation, entity extraction, FLOSS TEI Publisher 9: más allá de TEI y de la publicación / TEI Publisher 9: going beyond TEI and publication 1e-editiones, Switzerland; 2Jinntec, Germany

The philosophy of TEI Publisher revolves around modularity, reuse, and sustainability through the use of standards. TEI Publisher was created to ease the production of digital editions, so that humanists can build scholarly products that answer their research goals with little or no programming. This is achieved through a modular design that allows functionalities to be freely arranged and recombined. This conception lets people with technical profiles make adjustments easily, while users without programming knowledge can rely on defaults adapted to different edition models.

TEI Publisher supports different formats, both for input and output. Besides TEI, it can be used to publish documents in other standards such as DocBook, MS Word (DOCX), or JATS. Source documents do not have to follow a specific schema and can easily be transformed into a variety of output formats for publication: a web interface, an e-book, a PDF file, or its LaTeX source.

The most recent versions of TEI Publisher respond to demands from the user community for help in converting and importing data in different formats, as well as in enriching the annotation of sources without having to edit the XML directly. TEI Publisher is thus more than a toolbox for publishing and exploiting XML documents; it has become an instrument for the automatic generation of TEI documents and for automated annotation. This poster presents the most important features of TEI Publisher, with emphasis on the novelties introduced in version 9. |
9:00pm - 11:00pm | Conference Dinner Location: Esquina Homero Manzi
3601, San Juan Avenue (San Juan y Boedo)
|
Date: Friday, 11/Oct/2024 | |
11:00am - 1:00pm | TEI para las comunidades de Humanidades Digitales: edición digital, análisis computacional, escritura científica (TEI for the Digital Humanities Communities: digital editing, computational analysis, scholarly writing) Location: Botica del ángel. 543, Luis Sáenz Peña Street Ulrike Henny-Krahmer (University of Rostock) - Keynote in Spanish |
1:00pm - 3:00pm | Farewell Wine and Tango - Vino de despedida y tango + Botica del Ángel Tour. Orquesta Los Mazorqueros (tango) |