Open Islamicate Texts Initiative: a Machine-Readable Corpus of Texts Produced in the Premodern Islamicate World
1University of Vienna, Austria; 2Leipzig University, Germany; 3Aga Khan University—London, UK; 4University of Maryland—College Park, USA
The written heritage of the “Islamicate” cultures that stretch from modern Bengal to Spain is as vast as it is understudied and underrepresented in the digital humanities. The sheer volume and diversity of the surviving works produced in Arabic and Persian in the premodern period make this body of texts ideal for computational analysis. While a great number of texts have been digitized over the past two decades, OpenITI is the first corpus of Islamicate texts that is open, machine-readable, and aims to be comprehensive. OpenITI strives to provide the essential textual infrastructure in Persian and Arabic for new forms of macro textual analysis and digital scholarship. The corpus is already actively used in several ERC projects.
Tracing the History and Provenance of Medieval and Renaissance Manuscripts through Linked Data
University of Oxford
The poster will present the results of the first eighteen months of the Mapping Manuscript Migrations project, funded by the Digging into Data Challenge for 2017-2019. The topics covered will include the new digital platform which has been developed to aggregate heterogeneous manuscript data in order to enable large-scale research into manuscript histories and provenance.
Specific areas of interest will include the nature of the sources of data which have been combined, the data modelling which has been carried out to unify these disparate data sources, the Linked Data principles and techniques which have been deployed, and the ways in which the aggregated evidence has been presented and visualized.
Migrating Charters into the TEI P5
Zentrum für Informationsmodellierung, Karl-Franzens-Universität Graz, Austria
This poster will present approaches to the modelling and migration of encoded charter data that arose during the migration of the Charters Encoding Initiative (CEI: www.cei.lmu.de) to be compliant with the current version of the Text Encoding Initiative (TEI P5: www.tei-c.org/). It is part of a project to migrate and enhance encoded charter descriptions from the virtual charter platform monasterium.net in order to provide a well-documented, reusable environment that prolongs the data life cycle. As part of this, a new data model extension to the TEI was developed in order to model elements of legal documents in a cross-cultural way, including features of authentication, conventional legal language, person/organization-level legal actors, and the status of documents as originals or copies. As part of the migration process, structured ontologies
Data Beyond Vision
1Princeton University, United States of America; 2Pratt Institute, United States of America
Data visualization is frequently used in Digital Humanities to explore, to analyze, to make an argument, or to grapple with large-scale data. Increasing access to off-the-shelf data visualization tools is beneficial to the field, but it can lead to facile and homogenized visualizations. Data physicalization can be used to defamiliarize and refresh the insight that data visualizations initially brought to DH. Spatial, acoustic, and temporal dimensions of data representation can generate rich narratives and invite the audience to explore new relationships.
We will exhibit a multi-media installation consisting of data physicalization objects and dynamic displays at the conference poster session concurrently with an explanatory poster. Pieces in the installation will utilize space, time, and/or interaction to provide new ways of engaging with a dataset and the arguments and narratives behind it, in order to challenge the dominant paradigms of conventional screen-based data visualization.
New Beginnings: Using Keystroke Logging For Literary Writing
1Huygens ING, The Netherlands; 2University of Antwerp, Belgium
Our project studies the implications of the largely digital creative processes of present-day literary writers for textual scholarship's theories and methodologies. This poster presentation examines a born-digital literary story and shows how keystroke logging data provided by Inputlog can help interpret revisions made during the writing process. It focusses both on small revisions and on the construction of the beginning of the story (incipit) and tries to examine whether the small revisions can be linked to the changes made in the opening passage. We will study both the versions of the text and the process data from Inputlog, so that we can see not only which revisions were made to create the final incipit but also when in the overall writing process they were made.
LiLa: Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin
Università Cattolica del Sacro Cuore, Italy
The LiLa: Linking Latin project was recently awarded funding from the European Research Council (ERC) to build a knowledge base of linguistic resources for Latin. LiLa responds to the growing need in the fields of Computational Linguistics and Humanities Computing to create an interoperable ecosystem of NLP tools and resources for the automatic processing of Latin. To this end, LiLa makes use of Linked Open Data (LOD) practices and standards to connect words to distributed textual and lexical resources via unique identifiers. In so doing, it builds rich knowledge graphs, which can be used for research and teaching purposes alike.
Mediating Research Through Technology @ NEP4DISSENT
1Institute of Literary Research of the Polish Academy of Sciences (IBL PAN), Poland; 2Trinity College Dublin; 3DARIAH ERIC; 4Luxembourg Centre for Contemporary and Digital History; 5Central European University; 6Lab1100
With this poster, the EU-funded scholarly network New Exploratory Phase in Research on East European Cultures of Dissent (NEP4DISSENT) wishes to invite collaboration in facilitating the integration of DH methods and tools by the multidisciplinary community built around the study and curatorship of the cultural legacy of resistance and dissent in former socialist countries in comparative and transnational perspective.
The research and capacity building agenda of NEP4DISSENT represents a complex and original challenge for the marketplace of digital research infrastructures due to its multidisciplinary character, the uneven propagation of DH research practices between disciplines and national scholarly communities East and West, the uneven digital readiness of the sources, as well as its multilinguality. On the other hand, DH approaches are uniquely qualified to explore in full scope the comparative and transnational dimension of the dissident networks of solidarity, which has been one of the most extraordinary aspects of that legacy.
Digital Ecosystem For The French Archaeological Community
1CITERES-LAT CNRS/Université de Tours, France; 2consortium MASA (TGIR Huma-Num)
Created in 2012, the Mémoires des Archéologues et des Sites Archéologiques (MASA) Consortium has been labelled by the Very Large Research Infrastructure Huma-Num. MASA was born from the experience acquired by and within several Maisons des Sciences de l'Homme in the field of processing the documentation produced by archaeologists. MASA's partners have pooled their skills to meet the needs of the archaeological community. The issues identified are multiple and involve several intertwined levels of complexity.
The MASA consortium offers the archaeological community a data-handling process that runs from acquisition to publication and follows a systemic approach. The MASA digital ecosystem is composed of building blocks for archiving and sharing archaeological data sets. This digital ecosystem relies on the data culture of archaeologists and their long experience in computerization to encourage the community to follow the FAIR principles and to open these corpora as Linked Open Data.
CAT tools in DH training
Le Mans Université, France
Considered as a tool, Computer-Assisted Translation doesn't really belong in a DH curriculum. Considered from a user-interface perspective though, or as an approach that allows reflection on the impact of machine learning methods on the Humanities, CAT methods (e.g. their practice and the reflection on these) can legitimately be integrated into such a curriculum.
This poster presents the way we are integrating CAT tool-based translation training at Le Mans Université. The main part of the poster is dedicated to the training setting. The poster will also show the role the Computer Science research department played in setting up a solid infrastructure for these two environments as well as the type of data that has been gathered from the students’ input.
Towards a national collaborative network: Spatial Humanities Netherlands
1International Institute of Social History; 2University of Groningen; 3Triply; 4Webmapper; 5BertSpaan.nl; 6Fryske Akademy
Over the past decades, the Netherlands has fostered a rich variety of projects in a field we would today refer to as ‘spatial humanities’. Such projects include long-running infrastructural undertakings, e.g. the municipality boundaries of NLGIS and cadastral maps of HISGIS Netherlands. With the rise of Linked Data in recent years, the field of spatial humanities has gained strong momentum in the Netherlands, driven by cultural-heritage-oriented tech companies creating smart geo-tools. Yet the field is fragmented and there is little coordination regarding best practices, tools, and vocabularies.
With the input of four academic institutes, tech-companies, and cultural heritage partners, our aim is to move towards an approachable, national spatial humanities platform, for exchange and collaboration, within and outside the Netherlands. To ensure the latter, the network communicates with existing projects and SIGs around the world for the exchange of infrastructural knowledge, data models and vocabularies, benefitting researchers worldwide.
Gender and Intersectional Identities in the Digital Humanities
1Indiana University, United States of America; 2Stanford University, United States of America; 3Alan Turing Institute, United Kingdom; 4Western Sydney University, Australia
The role of gender and intersectional identities in digital humanities remains an urgent topic of conversation. Despite this, precious few spaces exist for open, safe, and inclusive discussions around intersectional gender. Digital spaces like the Crunk Feminist Collective (http://www.crunkfeministcollective.com/), FemTechNet (https://femtechnet.org/), and FemBot Collective (https://fembot.adanewmedia.org/) provide blogs, resources, and opportunities for public writing on issues that matter to female-identified researchers. Between January and June 2019, individual volunteers are organizing a series of monthly virtual meetings, each around a specific topic (e.g. credit, authority, (lack of) infrastructure, emotional and invisible labor, gender equity at panels, gender disparities in technical work, gender and leadership in digital humanities initiatives, etc).
Tracing People, Places And Dates In An Early Modern Context
1Huygens ING, The Netherlands; 2University of Oxford, United Kingdom
The accurate identification of people, places, and dates is fundamental to historical research. In practice each raises considerable difficulties for anyone working in the early modern period. Dating letters requires systems for mastering transitions between different calendars. Recording places requires capturing changes to how places are named and nested within larger entities. Identifying people requires the development of authority files for individuals not found in biographical dictionaries or library catalogues. To facilitate this process, the Cultures of Knowledge project at the University of Oxford and the Humanities Cluster at the Netherlands Academy of Sciences (KNAW) are developing three Linked Open Data resources for people, places, and dates. This poster presents the current development of EM Places – a collaboratively curated, historical geo-gazetteer for the 16-18th C. – and EM Dates, an early modern calendar conversion service. A prosopographical name authority, EM People, is being planned for development after these tools.
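To make the calendar problem concrete, the following is a minimal sketch of converting a Julian (Old Style) date to its Gregorian (New Style) equivalent by way of the Julian Day Number. It is an illustration only and is not taken from EM Dates: the function names are hypothetical, and the sketch ignores the questions a real early modern service has to handle, such as when each region adopted the Gregorian calendar and on which day the year was taken to begin.

```python
def julian_to_jdn(year, month, day):
    """Julian Day Number for a date in the Julian calendar (integer arithmetic)."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0 ... February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Gregorian calendar date (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # completed 400-year cycles
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461         # completed years within the cycle
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year, month, day):
    """Convert an Old Style (Julian) date to New Style (Gregorian)."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))


# The day after Julian 4 October 1582 was Gregorian 15 October 1582.
print(julian_to_gregorian(1582, 10, 5))   # (1582, 10, 15)
```

A letter dated in a territory that had not yet adopted the Gregorian calendar would be passed through `julian_to_gregorian` before being compared with letters dated New Style; deciding which letters need that treatment is the harder, source-critical part of the task.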
Using a Feminist and Inclusive Approach for Gender Identification in Film Research
Film University Babelsberg “Konrad Wolf”, Germany
Although there is a scientific consensus that gender is not binary, immutable, or physiological, it is still common to operationalize it in such a way. Recently, there have been more attempts to critically assess and change these exclusionary practices. This contribution joins these efforts by describing our attempt to measure the gender of film directors by relying on their own chosen self-representation. The research stems from a study on the circulation of films within film festival networks. The gender of directors constitutes an important piece of information due to known discriminatory practices in the film industry. In our operationalization of gender, we focused on the ways directors use personal pronouns on available web resources. We also compare our results to alternative findings when binary manual and automatic gender detection methods are used. In communicating the comparisons, we visualize the data to invite others to critically reflect on current practices of gender operationalization.
Determining And Visualizing Genesis: A Digital Edition of Goethe’s Faust
1Julius-Maximilians-Universität Würzburg; 2Goethe University Frankfurt
Since October 2018, a digital genetic edition of Goethe’s Faust has been publicly accessible online. The poster demonstrates its most specific interlinked visualizations, which let users explore Goethe’s lifelong work on Faust, and introduces a graph-based approach to infer genetic ordering from dating information in the literature of 120 years of Faust research.
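One way to picture a graph-based inference of genetic ordering is as a topological sort over "written before" assertions. The sketch below is an assumption about the general technique, not the edition's actual implementation: the witness sigla (H1 to H4) and the function name are invented for illustration, and real dating statements from the research literature are often uncertain or contradictory, which a production system would have to weigh rather than simply reject.

```python
from graphlib import TopologicalSorter, CycleError


def genetic_order(earlier_than):
    """Return one ordering of witnesses consistent with all (earlier, later) pairs.

    Raises CycleError if the assertions contradict each other.
    """
    sorter = TopologicalSorter()
    for earlier, later in earlier_than:
        # `later` depends on `earlier`, so `earlier` is emitted first.
        sorter.add(later, earlier)
    return list(sorter.static_order())


# Hypothetical relative datings gathered from the literature.
assertions = [("H1", "H2"), ("H2", "H3"), ("H1", "H4")]
order = genetic_order(assertions)
print(order)

# Contradictory statements surface as a cycle in the graph.
try:
    genetic_order([("H1", "H2"), ("H2", "H1")])
    contradiction_found = False
except CycleError:
    contradiction_found = True
```

Several orderings can satisfy the same assertions (here H4 may fall anywhere after H1), so the result is best read as one admissible sequence rather than the only one.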
Developing MORROIS (Mapping of Romance Realms and Other Imagined Spaces): Digitizing Geographic Data Drawn from Literary Sources
1Memorial University of Newfoundland, Canada; 2Zentrum für Informationsmodellierung, Karl-Franzens-Universität Graz
The MORROIS (the Mapping of Romance Realms and Other Imagined Spaces) project, a digital geographic concordance of literary spaces, collects line-by-line instances of explicitly geographic place-name usage in Middle English manuscripts. The end goal of MORROIS is to explore the research possibilities afforded through distant reading and various data visualizations (including GIS).
My poster will: (1) present my data migration from Omeka Classic to RDF, including the selection and customization of flexible and transferable ontologies and the benefits of RDF metadata modelling; (2) highlight some of the challenges inherent in data migration from a traditional relational database format to one geared for Linked Open Data; and (3) address methods for extracting data from the manifold formats of Middle English texts, including editions preserved in simple HTML format, printed and non-digitized critical and diplomatic editions, or in hard copy or digital manuscript facsimile.
Visualizing Poetic Meter in South Asian Languages
1Michigan State University, United States of America; 2Government Zamindar Post Graduate College, Pakistan
The explication of poetic meter in the modern languages of South Asia is a source of consternation even for experienced poets, let alone readers and scholars. Urdu poetry, for example, is written in meters drawn from Perso-Arabic and Indic sources. Traditionally, these two metrical systems take their rules for versification from poetic traditions in what are considered their traditional source-languages: classical Arabic, on the one hand, and Sanskrit, on the other. The trouble is, neither poetic system aligns well with the phonological features of modern South Asian languages. As a result, the Arabic system quickly becomes combinatorially explosive, leading to multiple acceptable scansions. Modern scholars have offered alternative ways to think of meter. We augment that work by presenting an interactive web-based software package under development to visualize poetic meter using directed graphs that accommodate multiple languages and scripts to make accessible poetic knowledge for readers, scholars, and poets.
EScripta: A New Digital Platform for the Study of Historical Texts and Writing
1École Pratique des Hautes Études – Université PSL; 2Université PSL
This poster presents a new platform for palaeographical, linguistic and textual studies of manuscripts, documents and inscriptions. The platform is conceived particularly for experts working in a very wide range of writing-systems and writing directions which are often not supported by existing frameworks, including not only alphabets but also abjads, ideoglyphs, hieroglyphs and others, from left to right, right to left, top to bottom, bottom to top and so on. The platform combines tools for manual and automatic approaches such as manual transcription and Handwritten Text Recognition (HTR), manual and automatic linguistic markup, deep structured palaeographical annotation, and the preparation and publication of editions. Rather than building everything from scratch, it also draws on the substantial existing tools which are now available using Web-based APIs and standards such as the International Image Interoperability Framework (IIIF) and Distributed Text Services (DTS).
Who Teaches When We Teach DH?
1Bucknell University, United States of America; 2Brigham Young University, United States of America
In this poster, we will present the work we have done to develop a survey of those teaching digital humanities throughout the world. First, we will discuss the development of the survey. Second, we will outline the methodology we have employed in developing the survey in order to best ascertain who these teachers are and how they teach. Third, we will begin the data collection in real time at the conference.
The Sound Environment as a Cultural Resource for the Selk'nam and the Yahgan: from Tierra del Fuego to Cape Horn
1Sorbonne Université, France; 2Institut de Recherche en Musicologie (UMR 8223); 3Lutheries-Acoustique-Musique (UMR 7190)
This poster presents a multidisciplinary study of the relationships that two Amerindian cultures (Yahgan and Selk'nam) maintained, and still maintain, with their respective territories and sound environments.
The subject raises several lines of inquiry: How can a landscape bear witness to a culture? How can the importance of sounds be demonstrated, mapped, and brought to light? How can these territories be studied from this perspective?
Beyond adapting existing tools and developing new methods and protocols, this research questions the position of the person who listens, becomes immersed, and moves through territories that bear witness to cultures different from their own, with all the emotional weight entailed by the genocide of these two peoples during the 19th and 20th centuries.
Polish Literary Bibliography - New Research Data Portal for Complex Cultural Dataset
Institute of Literary Research of the Polish Academy of Sciences, Poland
The poster presentation deals with the remediations of the Polish Literary Bibliography, a large cultural dataset available online (www.pbl.ibl.waw.pl).
PBL has experienced two comprehensive remediations in recent decades. The first transformed PBL from a printed book into an online database and provided a stable environment for 20 years of continuous creation of rich bibliographic metadata. Yet its Oracle-based production environment and its user interface were geared to faithfully represent the structure and layout of the bibliographic data of its printed predecessor, rather than to open up the possibilities of digitally enabled data exploration.
The second remediation aimed to better display the complexities of the PBL dataset and to facilitate data-driven uses of the bibliography.
The poster will present 1. the main characteristics of the new PBL service and 2. the main challenges that the project team, responsible for the second remediation, had to face and resolve.
Entangled Histories of Early Modern Ordinances. Segmentation of Text and Machine-Learned Metadating.
1Ghent University; 2KB-Library; 3Erasmus University Rotterdam
Libraries and archives throughout Europe host books of ordinances, or individual ordinances (‘laws’), from the 15th to the 18th century. These texts contain indications of how governments of burgeoning states dealt with unexpected threats to safety, security, and order through home-invented measures, borrowed rules, or adjustments of what was established elsewhere.
These ordinances are used widely within research, but only through cherry-picking those necessary for one’s research. Systematically searching the ordinances is not yet possible due to poor OCR and missing text segmentation. Therefore, this project will apply Transkribus to improve the text recognition. Being able to search through the thousands of texts requires uniform metadata, which this project will add automatically after supervised training through topic modeling. The metadata gathered from the sources will be accessible through an RDF-compliant tool in order to visualise the topics the ordinances dealt with in various regions and over time.
Repetition And Popularity In Early Modern Songs
1Utrecht University, The Netherlands; 2Meertens Institute, The Netherlands
This study explores the relation between repetition and popularity in Dutch historical songs. We quantitatively model the relationship between popularity and various forms of repetition in the lyrics of 15k seventeenth century songs from the Dutch Song Database (Nederlandse Liederenbank).
To establish a ranking of songs reflecting their contemporary popularity, we approximate early modern hit charts in which the popularity of historical songs is defined as the interaction of several variables that affect the popularity of a song.
After that, we employ different methods of text compression to quantitatively estimate a song's degree of repetitiveness. We use (i) the Shannon entropy, (ii) the Lempel-Ziv-Welch (LZW) algorithm and (iii) the Bloom Filter.
Using these compression methods as predictors, we model the relationship between popularity and repetition in early modern songs with regression models.
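As a rough illustration of how two of the named measures respond to repetition, the sketch below computes the Shannon entropy of a token sequence and the number of codes a basic LZW encoder emits for a text. It is a minimal sketch under stated assumptions, not the study's code: the function names and the normalisation in `lzw_repetitiveness` are illustrative choices, and tokenisation, the Bloom filter measure, and the regression step are left out.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (bits per token) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lzw_code_count(text):
    """Number of codes emitted by a basic LZW encoder over the characters of text."""
    if not text:
        return 0
    # Start with one dictionary entry per distinct character.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    current = ""
    emitted = 0
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            emitted += 1                             # emit the code for `current`
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            current = ch
    return emitted + 1                               # code for the final match


def lzw_repetitiveness(text):
    """Illustrative score: 0 when nothing compresses, closer to 1 when much repeats."""
    return 1 - lzw_code_count(text) / len(text)


refrain = "la la la la la la la la"
print(lzw_code_count(refrain), len(refrain))
print(shannon_entropy(refrain.split()))
```

Lower entropy and fewer LZW codes per character both indicate more repetition, but they pick up different things: entropy ignores order and only sees how unevenly tokens are distributed, whereas LZW rewards repeated sequences.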
COSME² - Complexities: 30 Years Of Research Of Medievalists DH Concerning A Thousand Years Of Medieval Sources
1COSME² (Huma-Num), France; 2UCLouvain, Belgium; 3ENS Lyon, France; 4UAvignon, France; 5EnC, France
Cosme², a Huma-Num consortium dedicated to the study of medieval sources, brings together a large part of the French medievalist community around the digital processing of medieval sources, mainly written ones. The complexity of the medieval digital landscape stems from its long history: medievalists were among the first to be concerned with the digital processing of their sources. Databases and digitised corpora are therefore many and varied: many remain dormant or need to be upgraded, some lack metadata, some are no longer online or sit on outdated media, and others lack interoperability even though their content would allow it. Thousands of digitised medieval charters are not yet effectively linked. Medievalists were among the first to design electronic publishing platforms (such as TELMA, Scripta, CBMA), but these are not yet interoperable. This poster will propose new solutions to these complex problems on behalf of COSME².
Towards Constructing An Ecosystem for Digital Scholarly Editions of East Asian Historical Sources: With the Focus on the TEI-Markup of the Engi-Shiki
1University of Tokyo, Japan; 2International Institute for Digital Humanities, Japan; 3National Museum of Japanese History; 4Ochanomizu University
The Text Encoding Initiative has long been the de facto standard for constructing digital scholarly editions in the humanities as interoperable data. Compared with European sources, however, there are fewer projects to create TEI documentation for East Asian materials.
This poster presents the importance of creating TEI documentation for East Asian sources through the markup project of the Engi-Shiki. The Engi-Shiki is a 50-volume work compiled between 907 and 927 C.E. The first ten volumes are Imperial Shinto regulations, and the last 40 are codifications of criminal and administrative law.
This poster demonstrates the further implications of the markup of the Engi-Shiki for East Asian studies through its textual features, varieties of literary styles, and connections with Chinese Tang history. Though we are still in the process of completing the documentation, it is valuable to invite feedback from TEI practitioners and DH researchers at the conference.
APOLLONIS: The Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation
Athens University of Economics and Business / ATHENA R.C.
APOLLONIS is the Greek national infrastructure for Digital Arts, Humanities and Language Research and Innovation. It brings together the leading strengths and capacities in the field by providing high-level computational tools, interoperable datasets and services. APOLLONIS was recently formed by the union of two existing ESFRI-related national research infrastructures: clarin:el, the CLARIN-related Greek network for language resources, technologies and services; and DARIAH-GR/DYAS, the DARIAH-related Greek network for digital research in the Humanities.
This poster will enable DH2019 audiences to engage with, comment on, and discuss the four main lines of action of the APOLLONIS infrastructure: Tools and Services, Resources, Education and Training, and Communities of Practice.
QuoteSalute - Inspiring Greetings for Your Correspondence
Berlin-Brandenburgische Akademie der Wissenschaften, Germany
quoteSalute (https://quotesalute.net/) aggregates salutes (closings of letters) from various openly available digital scholarly editions of letters based on the encoding of the TEI element <salute>. The project website hosts a corpus of curated salutes, so they can be copied into an e-mail with a single button press. Users can thus quote historically important persons in their daily correspondence. The project is available as part of the web service correspSearch (https://correspsearch.net/), which aggregates metadata of various scholarly editions of letters. The complete source code (data, scripts, etc.) is accessible on GitHub. Furthermore, templates and extensive documentation are provided, so other projects can quickly incorporate their own data into the corpus of salutes.
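To show what harvesting by TEI element can look like in practice, here is a minimal sketch that pulls closing salutes out of a TEI-encoded letter with Python's standard library. It is not quoteSalute's own pipeline: the sample letter, the function name, and the decision to take only `<salute>` elements inside `<closer>` are assumptions made for illustration. The TEI namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical letter fragment, invented for this example.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="letter">
      <opener><salute>Dear friend,</salute></opener>
      <p>Body of the letter.</p>
      <closer><salute>Your most   devoted
         servant</salute><signed>N. N.</signed></closer>
    </div>
  </body></text>
</TEI>"""


def closing_salutes(tei_xml):
    """Return the whitespace-normalised text of every <salute> inside a <closer>."""
    root = ET.fromstring(tei_xml)
    salutes = []
    for element in root.iterfind(".//tei:closer/tei:salute", TEI_NS):
        # itertext() also collects text nested in child elements such as <hi>.
        text = " ".join("".join(element.itertext()).split())
        if text:
            salutes.append(text)
    return salutes


print(closing_salutes(SAMPLE))   # ['Your most devoted servant']
```

Restricting the path to `<closer>` skips the opening greeting, which TEI also encodes with `<salute>`; an edition that encodes its closings differently would need a different path.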
How to Sustain an International Digital Infrastructure for the Arts and Humanities
1DARIAH ERIC; 2NOVA FCSH; 3Université de Neuchâtel
Europe has a long and rich tradition as a centre for the arts and humanities. However, the digital transformation poses challenges to the arts and humanities research landscape all over the world. Responding to these challenges the Digital Research Infrastructure for Arts and Humanities (DARIAH) was launched as a pan-European network and research infrastructure. After expansion and consolidation, which involved DARIAH’s inscription on the ESFRI roadmap, DARIAH became a European Research Infrastructure Consortium (ERIC) in August 2014.
The DESIR project sets out to strengthen the sustainability of DARIAH and firmly establish it as a long-term leader and partner within arts and humanities communities. It focuses on 6 key challenges for a research infrastructure: dissemination, growth, technology, robustness, trust, training and education.
Ethics and Legality in the Digital Arts and Humanities
1University of Graz, Austria; 2Austrian Academy of Sciences, Austria; 3Institute for Ethnology and Folklore, Croatia
The European Research Infrastructure Consortium "Digital Research Infrastructure for the Arts and Humanities" (DARIAH-EU) promotes open access to methods, data and tools, and stands for responsible scholarly conduct and community engagement.
The Working Group on "Ethics and Legality in Digital Arts and Humanities" (ELDAH) is dedicated to addressing the needs of the DH research and education community regarding the topics of legal issues and research ethics by producing recommendations, training and information materials on IPR, open licenses and Open Science in general, and offering workshops on these topics to scholars in the context of DARIAH events across Europe.
This poster will inform the audience about the main activities and topics covered by the ELDAH Working Group and enable us to engage with colleagues from outside of Europe to exchange and learn from experiences and practices on legal and ethical aspects of their work.
Madgwas: a Database of Ethiopian Binding Decoration
Zentrum für Informationsmodellierung, Karl-Franzens-Universität Graz, Austria
Ethiopia is home to the only remaining continuous tradition of widespread Christian scribal production, but the manuscripts produced by that tradition are little studied, and the resources for dating and describing Ethiopian manuscripts are few and poorly developed compared to their European relations. Ethiopian manuscripts are nonetheless a cognate part of the wider European/Mediterranean Christian manuscript tradition. Madgwas is a database for the identification, cataloguing, and dating of Ethiopian binding tools and decoration. It leverages European and international libraries’ increasing sharing of manuscript images through the International Image Interoperability Framework (IIIF) to produce a catalogue that links binding decoration, scribal tools, and individual manuscripts in a way that will serve a versatile set of researcher needs. This poster will present the results of the first stage of project development, the ingest of the Ethiopian manuscripts hosted by the British Library’s Endangered Archives Programme.
Conceptual Vocabularies and Changing Meanings of “Foreign” in Dutch Foreign News (1815-1914)
Utrecht University, The Netherlands
The nineteenth century saw the first waves of globalization. One of the prime vehicles through which nineteenth-century publics registered global changes was foreign news. Newspaper articles not only described but also defined what was considered global, international and foreign. This research traces the changing meaning of the concept "foreign" in Dutch newspapers between 1815 and 1914. Using collocations, n-grams and diachronic word embeddings, this research investigates the word senses and associations of words related to the concept "foreign". It shows how, over the course of the century, the meaning of the concept changed in ways that both reflected and stimulated globalization.
Orbis-in-a-Box (OIB): Modeling Historical Geographical Networks
1University of Vienna, Austria; 2Leipzig University, Germany; 3University of Pittsburgh, USA; 4AIT Austrian Institute of Technology, Austria
In 2012, researchers at Stanford (led by Walter Scheidel) developed ORBIS (http://orbis.stanford.edu/) which offered a complex model of connectivity by reconstructing the duration and financial cost of travel in antiquity. Revealing the true shape of the Roman world, ORBIS provided a unique perspective on premodern history and became an object of envy for scholars working in other historical contexts. Since ORBIS was not designed to be easily adaptable to other contexts, a DH-team at the University of XXXX organized a hackathon, where participants worked on a tool which historians with minimal DH skills could easily install and run, and, by supplying their own data, could explore their own historical networks in ways similar to ORBIS.
Her Hands On Her Hips: Body Language In Children’s Literature
University of Birmingham, United Kingdom
This poster will present our digital reading approach to body language in fiction. We use the web application CLiC – Corpus Linguistics in Context (freely available at clic.bham.ac.uk) and a range of corpus linguistic methods to identify gendered patterns of body language in literature for children. We are particularly interested in how the presentation of body language has changed over time and how the changes we identify reflect socially structured and gendered patterns of behaviour.
The Library In The Digital Humanities: Surveying Institutional Practices In The UK And Ireland
Research Libraries UK
It is widely accepted that research libraries play an important role in facilitating academic research and teaching. However, given the technological advances of the last few decades, this role has been continuously transforming; the emergence of digital humanities, in particular, raises new challenges for libraries. This paper investigates current practices in research libraries across the UK and Ireland concerning the support of or involvement in digital humanities research. Exploring the different models of engagement that libraries follow when it comes to working with digital humanities researchers, the nature of these professional relationships as well as the benefits and challenges they involve will hopefully increase our knowledge about an institutional side of the digital humanities in the UK and Ireland that remains largely undocumented.
Navigating the Complex Landscape of Digital Humanities Methods and Tools with the OpenMethods Metablog
1DARIAH EU; 2Université Paris-Nanterre; 3University of Coimbra; 4CONICET; 5Huma-Num / CNRS; 6Institute of Literary Research of the Polish Academy; 7DANS-KNAW; 8University of Applied Sciences Potsdam; 9Huygens Institute for the History of the Netherlands - Royal Academy of Arts and Sciences
Navigating through the rich and dynamically evolving Digital Humanities (henceforth DH) landscape can be a time-consuming task and difficult to integrate into researchers’ everyday routines. The OpenMethods metablog aims to explore and deliver a solution for this need in a DH context. It provides a platform to bring together all formats of openly available digital publications. The platform provides a convenient and easy way for DH experts from around the globe to select, propose, curate, and highlight online published content. Suitable online content may be proposed by Community Volunteers. The OpenMethods platform is intentionally interdisciplinary and multilingual to facilitate a timely disclosure and spread of knowledge and to raise peer recognition for the related research results. The group of DH experts, known as the OpenMethods Editorial Team, currently comprises 23 editors from 11 countries.
DSE Visualisation with EVT: Simplicity is Complex
1Università di Torino, Italy; 2Università di Pisa, Italy; 3Istituto di Linguistica Computazionale - CNR, Pisa, Italy
Developers of EVT, a web-publishing tool for TEI-based digital editions, are facing a dilemma: on the one hand, scholars using this tool appreciate its clean UI, the simple configuration and customization tools, and the features it offers; on the other hand, the growing number of features, the mixing of different edition levels (both diplomatic and critical, with support for multiple witnesses) and the complexity of the navigation layer have posed significant challenges with regard to the design and building of a flexible framework and of a User Interface layout that can manage all the aspects of a sophisticated Digital Scholarly Edition. The proposed poster will describe the latest developments and solutions devised by the EVT team to solve the issues hinted at above and more precisely described in the abstract.
DARIAH Beyond Europe
1Stanford University, United States of America; 2DARIAH; 3Ghent Centre for Digital Humanities, Ghent University, Belgium; 4Library of Congress, United States of America; 5Australian Academy of the Humanities, Australia; 6Australian Research Data Commons, Australia; 7eResearch South Australia, Australia
DARIAH, the digital humanities infrastructure with origins and an organizational home in Europe, is nearing the completion of its implementation phase. The significant investment from the European Commission and member countries has yielded a robust set of technical and social infrastructure, ranging from working groups, various registries, pedagogical materials, and software to support diverse approaches to digital humanities scholarship. While the funding and leadership of DARIAH to date has come from countries in, or contiguous with, Europe, the needs that drive its technical and social development are widely shared within the international digital humanities community at large. The DARIAH Beyond Europe workshop series, organized and financed under the umbrella of the DESIR project (“DARIAH ERIC Sustainability Refined,” 2017–2019), convened three meetings between September 2018 and March 2019, in the United States, and in Australia. This poster reflects on key outcomes and future directions arising from these workshops.
Living apart together: Research across Repositories
Akademie der Wissenschaften und Literatur | Mainz
The poster shows exemplarily how research questions can be raised across repositories. The repositories in question are both epigraphic: "Deutsche Inschriften Online" and "Epidat - Forschungsplattform für jüdische Epigraphie". Both make their research data available as TEI-XML. Using the generic web service XTriple, RDF statements can be extracted from XML resources. As soon as the data has been merged in an RDF store, research questions can then be asked across repositories. As a test case, it is examined how gender is distributed in the respective repositories.
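The workflow described above (extract statements from XML, merge them, then query across repositories) can be illustrated with a minimal sketch. It does not use XTriple or a real RDF store: the two XML snippets are hand-made stand-ins for the repositories, the `ex:` predicates and the `repoA:`/`repoB:` prefixes are hypothetical, and a plain Python set of tuples plays the role of the merged store.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, hand-made records standing in for the two repositories.
REPO_A = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><listPerson>
  <person xml:id="a1" sex="f"/><person xml:id="a2" sex="m"/>
</listPerson></TEI>"""
REPO_B = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><listPerson>
  <person xml:id="b1" sex="f"/><person xml:id="b2" sex="f"/>
</listPerson></TEI>"""


def extract_triples(xml_text, base):
    """Turn each TEI <person> into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter(TEI + "person"):
        subject = base + person.get(XML_ID)
        triples.add((subject, "ex:sex", person.get("sex")))   # assumed predicate
        triples.add((subject, "ex:repository", base))         # assumed predicate
    return triples


def gender_by_repository(store):
    """Count persons per (repository, sex) over the merged statements."""
    repo_of = {s: o for s, p, o in store if p == "ex:repository"}
    counts = Counter()
    for s, p, o in store:
        if p == "ex:sex":
            counts[(repo_of[s], o)] += 1
    return counts


# Merging is a set union here; a real setup would load both into one RDF store.
store = extract_triples(REPO_A, "repoA:") | extract_triples(REPO_B, "repoB:")
counts = gender_by_repository(store)
```

In a production setting the final step would more likely be a SPARQL query against the store; the counting function above only mimics what such a query would return.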
Designing the Database of Indigenous Slavery in the Americas
1Department of History, Brown University, United States of America; 2Department of History, Brown University, United States of America; 3Brown University Library, Brown University, United States of America; 4Brown University Library, Brown University, United States of America; 5Brown University Library, Brown University, United States of America
Scholars estimate that between 2.5 and 5 million Native people were enslaved in the Americas between 1492 and 1900. This is an astonishing number, even compared to the approximately 12.5 million Africans who were brought as slaves from Africa during the same period. Only in the past fifteen years, however, have researchers undertaken a sustained examination of the history of this nearly hidden form of slavery. The Database of Indigenous Slavery in the Americas (DISA) is developing a database to document as many instances as possible of indigenous enslavement in the Americas between 1492 and 1900, consulting records such as runaway slave ads, probate records, records of individual colonies, journals, financial records, ship manifests, correspondence, and church records. Our work details how we have engaged with a variety of complexities in designing a database about enslaved people.
DHmine: an Open Source Cloud-based Framework for DH Research
Budapest University of Technology and Economics, Hungary
The DHmine Toolkit is a collection of open source software tools including a Web front-end, non-structured and relational data stores, a cloud-based file store, an RDF triplestore and autonomous software tools that perform various tasks on demand (like OCR, TEI encoding, document conversions, content analysis, entity recognition and others). There are two statistical tools included in the system: a Web-based stylometry tool and RStudio for providing a programmable environment.
The toolkit employs Docker-based virtual machines to simplify installation and maintenance, and an auto-configured Web proxy server with Let's Encrypt support to securely relay its services to the clients. These enable rapid deployment and flexible service configuration in a cloud-based environment.
The software was used to process and publish a large text corpus from the 18th century extended with an author's dictionary, critical annotations and related knowledge entries from Linked Open Data sources.
Le Dictionnaire topographique. Une API pour les toponymes anciens français
École des chartes, France
The Dictionnaire topographique is a leading resource for historians and toponymists: it has more than 1,100,000 ancient French toponyms that have been dated and referenced. The 35 volumes have been digitized and an application is being developed. Its documented API provides standardized access to data, and uses data linking to locate place names. The objective of this API is to promote the re-use of this important resource, but also to continue to enrich it by providing researchers with an interface to correct and complete the content as they discover it. This paper aims to promote this essential resource for toponymic research: we will present the history of this publishing initiative, detailing the steps involved in digitization, restructuring and data enrichment. Finally, we will present the API and the associated application that makes it possible to exploit new relationships within the Dictionnaire, and above all, to revitalize an unfinished editorial initiative.
Early Career Researchers and Research Infrastructures: Barriers and Pathways to Engagement
Trinity College Dublin, Ireland
This poster will present the results of work conducted since November 2017 into Research Communities and Research Infrastructures (RIs), with a focus specifically on Early Career Researchers in the Arts, Humanities and Social Sciences. We look at practices within and issues particular to this group of researchers, and offer recommendations for how RIs might integrate the needs of this specific research community into their wider communications practices.
How we designed galassia Ariosto
1Net7, Italy; 2Net7, Italy; 3SNS, Italy
In the poster we present the UX design methodologies we applied within the project Galassia Ariosto (www.galassiaariosto.sns.it). The platform is the result of the project ERC AdG 2011 LOOKING AT WORDS THROUGH IMAGES, led by the Scuola Normale Superiore of Pisa.
Liquid Galaxy Visualization of IMS's Photographic Collections
1Instituto de Matemática Pura e Aplicada, Brazil; 2Instituto de Matemática Pura e Aplicada, Brazil; 3Instituto Moreira Salles, Brazil; 4Instituto Moreira Salles, Brazil; 5Instituto Moreira Salles, Brazil
This poster presents the first results of an ongoing project using the Liquid Galaxy (LG) platform, with a particular interest in its applications for panoramic, geographically based visualization, within the scope of a research agreement between two Brazilian institutions. One of the main goals of this agreement is to research and develop immersive panoramic and geospatial navigation interfaces using the LG platform to present the photographic collections of Instituto Moreira Salles (IMS).
The Problem of Hobbes and the Bible: A Textometric Approach
ENS de Lyon, France
The materialist philosopher Thomas Hobbes (1588-1679) developed a growing interest in scriptural issues that led him to scatter a myriad of biblical citations throughout his major political works. Identifying his scriptural references has remained a complex challenge ever since, let alone reaching an exhaustive comprehension of his use of the Bible (Jones 1984, Pacchi 1989, Somos 2015). With this poster, we aim to showcase the benefits of a textometric approach to ‘the problem of Hobbes and the Bible’, by presenting the TXM-based prototype corpus of XML-TEI P5 encoded EEBO-TCP diplomatic transcriptions of Hobbes’s English political works built for the ongoing ‘Digital Theological Hobbes’ project.
Using Data Visualization to Explore International Trade Agreements
University of Edinburgh, United Kingdom
This poster explores what can be learnt by applying different data visualization methods to a corpus of 450 preferential trade agreements, gathered and structured into XML format by the ToTA: Text of Trade Agreements project (Alschner et al. 2017) and available at https://github.com/mappingtreaties/tota. It seeks to understand the kinds of relationships between countries which can be discerned by examining the text of legal documents that regulate economic interactions between those countries, and the relationship between the documents themselves, with a particular focus on the influence of earlier documents on later documents. The visualisation methods used include the visual clustering of documents based on topic similarity, bimodal network visualisations, and word embeddings rendered in two dimensions.
The Begums of Bhopal: Digital Metadata Analysis In The Field of Representation
Loughborough University, United Kingdom
The use of imperial media in representing India and its people was an important aspect of the consolidation of colonial rule. This poster examines how the representation of Indian individuals links to colonial consolidation, analysed through metadata encoded in sources. I will demonstrate how an overlapping and layered approach to metadata encoding can better represent, and therefore aid research into, media representations.
This poster will explore the use of digital metadata analysis in the field of representation, and will demonstrate a custom-designed database system that allows for consistent layering of metadata on textual and material historical objects, digital reproductions, and enhanced fragments. The use of metadata will allow for comparisons between visual and textual sources that might otherwise never be discovered.
Which Services for User Participation? Representing Cooperation and Collaboration in a Participative Digital Library
Université Grenoble Alpes, France
This poster will present the cooperative and collaborative services defined by a French-Italian Digital Library (DL) founded on the principles of public engagement. Cooperation and collaboration are often confused with each other, but our experience leads us to distinguish them in order to better achieve the purpose of our project.
Tikkoun Sofrim – Combining HTR and Crowdsourcing for Automated Transcription of Hebrew Medieval Manuscripts
1EPHE, PSL, France; 2University of Haifa, Israel; 3CHart, France, EA 4004; 4Orient et Méditerranée UMR 8167
We present a pipeline combining HTR of medieval Hebrew manuscripts with a crowdsourcing-based correction process, aimed at use in scholarly editions and integration into a library manuscript service for long-term preservation. The project includes: (1) the design and structuring of an efficient document analysis pipeline that integrates and streamlines the multiple steps needed to turn an image of a handwritten document into machine-readable text, from transcribing and validating to making it publicly available; (2) the implementation of the pipeline by adopting an existing HTR tool for page segmentation and automated transcription; (3) the development of a crowdsourcing system for validation and correction of the machine-based transcriptions; (4) the design and implementation of policies for structuring a thriving community of volunteers; (5) the structuring of output data for future use in both library viewers and critical edition viewers, such as Mirador and TEI Publisher.
Branding East Asian Cultural Studies By “Opening” Access To Research Resources, Research Groups, and Know-Hows
Kansai University, Japan
Our project-based research center aims to build digital archives from our university’s East Asian collections and promote East Asian cultural studies. In this paper, we will explain the concepts behind our "openness" policy and the current status of the project.
Our digital collections are roughly divided into three groups: pre-modern Chinese materials, modern Japanese local resources, and archaeological research data related to ancient Japan.
We will provide the collections from the standpoint of the three concepts of “openness” and an open platform. The concepts are to open access to research resources, to wider research groups, and to provide research know-how. In addition, our open platform will employ the above three concepts of openness and provide a global search engine portal for East Asian IIIF collections.
Currently, we are building the digital archives with the aim of releasing them within FY2018.
Constructing A New Science Framework In Japanese Historical Studies Through Digital Infrastructure
National Museum of Japanese History, Japan
We are developing a new digital infrastructure to serve as a comprehensive digital network of Japanese historical resources. Using the system, we are constructing a new science framework for Japanese historical studies. The system enables access to resource data in universities, museums, and other institutes across Japan through interdisciplinary studies in the humanities and sciences. This paper introduces our system, which is called ‘Knowledgebase of Historical Resources in Institutes (khirin)’. As one of khirin’s prospects, we present an application of the scientific resource data for Japanese historical studies. We also show that disseminating historical resource information can promote advanced collaboration in historical studies between relevant Japanese and international institutes.
Multimedia Markup Editor (M3): A Semi-automatic Annotation Software for Static Image-Text Media
Paderborn University, Germany
This poster introduces editing software specifically designed for graphic narratives, including graphic novels and comics, but also other kinds of illustrated still-image media. Users are able to mark up these documents in XML via a Java-based GUI. The annotation language used in the system, which we call “Graphic Novel Markup Language” (GNML), is an extension of John Walsh's TEI-based “Comic Book Markup Language.” A number of automatic processes in the editor, such as the marching squares algorithm and livewire segmentation, simplify manual annotation. The editor facilitates the analysis of multimodal corpora with complex text-image interactions. Such evidence-based investigation may help revise existing theories of graphic narrative or falsify more qualitative scholarship.
Digital Database of WWI Victims from Slovenia (ZV1): Project Cooperation Between the Digital Humanities and Cultural Heritage
Institute of Contemporary History, Slovenia
Digital Edition and Analysis of the Mediality of Diplomatic Communication - Habsburg’s Envoys in Constantinople in the mid-17th century
1University of Graz, Austria; 2University of Salzburg, Austria
The project examines the transfer of information between the courts of Vienna and Constantinople in the mid-17th century with the help of digital methods. It focuses in particular on written sources of diplomatic missions of that time, building on the hypothesis that composing these media followed specific rules and was shaped by various factors (e.g. transport conditions, personal interests). The media had great impact on knowledge transfer between Habsburg’s diplomats and the imperial court in Vienna and determined public perspectives of the Ottoman Empire. Thus, the computer-aided analysis of the sources is conducted from a media-scientific perspective.
Several digital methods are employed:
1. The sources are available to the public as a digital edition. The texts are transcribed and enriched manually and automatically.
2. Analyses of the transcriptions will reveal dominant topics, diplomatic networks and structural specifics of the texts.
The resulting data is archived in a trusted digital repository.
Visualizing A Prosopographical Study Of The Young Turk Elites: Using Data Mining, Network Clusters And Spatial Mapping
Sabanci University, Turkey
This poster presentation aims to visualize the output of a research project that seeks to analyze biographic data about the members of a distinct group of late-Ottoman / early-Republican elites, the Young Turks, in order to better understand patterns of relationship and activity among the various networks of these political elites, whose roles were very significant in the making of modern Turkey. The poster is based on the applicant’s collaborative research project that aims to create a digital database and employ digital humanities tools to interpret that data, which would then constitute a basis for prosopographical research. The project brings together three humanities scholars, including the applicant as the supervisor, and a computer scientist who is consulted on the use of data mining and visualization techniques throughout the project.
Cultural Analysis of Spoken Linguistic Signalling: A Pipeline for the Alignment of Audio, Text, and Prosodic Features
1University of Richmond, United States of America; 2Université Paris Diderot
Linguistic elements are known to be powerful signals for social categories such as class, race, education, political affiliation, and gender. The vast majority of work on linguistic signalling in the digital humanities, however, has focused on the analysis of print culture due to the availability of large textual datasets and readily available methods. Spoken language, however, is known to vary considerably within communities, even when they share a common written language and dialect. Phonetic features such as tone, rhythm, and phoneme variation all serve to signal social identity. In this poster, we present a general pipeline for the construction, alignment, and analysis of spoken linguistic data. As a way of illustrating how this linguistic data pipeline is able to produce new scholarship, the poster focuses on an application to a corpus of spoken British English curated by the French-led Aix-MARSEC project.
Ranke.2 - A Teaching Platform for Digital Source Criticism
1University of Luxembourg, C2DH, Luxembourg Centre for Contemporary and Digital History; 2University of Luxembourg, C2DH, Luxembourg Centre for Contemporary and Digital History; 3University of Luxembourg, C2DH, Luxembourg Centre for Contemporary and Digital History; 4University of Luxembourg, C2DH, Luxembourg Centre for Contemporary and Digital History
This poster is about the teaching resource Ranke.2 - Source Criticism in the Digital Age. It consists of a series of online open source lessons created to teach students how to apply source criticism to retro-digitised and born-digital data.
The poster explains the relevance of digital source criticism, listing the key questions that should be posed to both analogue and digital sources.
It also presents the core Ranke.2 teaching principles:
1. Differentiation in complexity and time required, to connect to different teaching contexts. This means lecturers can choose between the modules SMALL (an animation and/or a quiz), MEDIUM (a series of assignments) or LARGE (a tutorial for a hands-on workshop).
2. Offering teaching content in a variety of attractive interactive formats: colourful animations, quizzes, assignments for web research, and tutorials for a hands-on workshop.
SemAntic: A Semantic Image Annotation Tool For The Humanities
1University of Passau, Germany; 2University of St. Gallen, Switzerland
In this paper we present SemAntic, a web-based application for semantically annotating images. We describe its high-level architecture and basic functionality, and finally outline future work. SemAntic accepts a variety of image formats and enables the user to mark parts of the image using circular, rectangular and polygonal regions, to associate them with classes from a user-loaded RDF ontology and, lastly, to export the resulting annotations to JSON according to the Web Annotation Data Model, a W3C Recommendation.
SemAntic was developed in the context of Neoclassica, where the automatic image classification component required an image corpus annotated according to the specifically developed Neoclassica domain ontology. SemAntic will be available as open source upon completion.
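To make the export format concrete, the following minimal sketch builds one annotation for a rectangular image region in the shape defined by the W3C Web Annotation Data Model. It is not taken from the tool itself: the function name, the example URLs and the ontology class IRI are hypothetical, and the choice of a `FragmentSelector` with a media-fragment `xywh` value is only one of the selector types the model allows (circular or polygonal regions would typically need an `SvgSelector` instead).

```python
import json


def rectangle_annotation(anno_id, image_url, class_iri, x, y, w, h):
    """Build a Web Annotation linking a rectangular image region to a class IRI."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "classifying",
        # The body is given directly as the IRI of the ontology class.
        "body": class_iri,
        "target": {
            "type": "SpecificResource",
            "source": image_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                # Region in pixels: left, top, width, height.
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# All identifiers below are placeholders for illustration only.
annotation = rectangle_annotation(
    anno_id="http://example.org/anno/1",
    image_url="http://example.org/images/chair.jpg",
    class_iri="http://example.org/ontology#Armchair",
    x=10, y=20, w=300, h=150,
)
serialized = json.dumps(annotation, indent=2)
```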
Book Formats and Reading Habits in Early Modern Europe
University of Helsinki, Finland
The eighteenth century entailed a rapid change in reading and writing books. To trace changing practices of reading, we have analysed how smaller book formats, in particular the octavo format, became more popular in the eighteenth century. Smaller books could be easily transported, carried in a pocket to places where individuals could read in solitude. To assess the change in the material dimensions of books and other print, we turned to four large bibliographies. Altogether, they cover 2.64 million harmonized entries from the period before 1830. The statistical analysis shows clearly how the octavo format became more popular in Europe toward the end of the eighteenth century, but also indicates that the development was uneven in the sense that the timing and speed of the development varied according to location. We further use the analysis to discuss types of towns based on the profiles of books produced in them.
IncipitSearch: a guide to collaboration
Academy of Sciences and Literature | Mainz, Germany
Centralized access to sources, editions, and further kinds of publications facilitates the research process and provides a comprehensive overview of existing information. To connect musicological collections and repositories, we created a metasearch for annotated music: IncipitSearch. It is a tool and a service specifically tailored for research on music incipits, the initial sequences of notes that characterize a work. IncipitSearch is a service to interconnect musical pieces via metadata. It is also a tool that can be reintegrated into existing digital research platforms. By connecting some of the largest digital collections of music metadata it already offers access to around 1 million incipits. In four comprehensible steps, this poster will be a guide explaining how data owners can add their data to IncipitSearch and how the reimplementation of the search functionality can be carried out.
Disentangling the Hairball: Observing International Style in Kazuo Ishiguro’s Novels in Network Visualisations
University of Konstanz, Germany
The poster strives to illustrate the possible correlation between stylistic particularities and thematic similarities in Kazuo Ishiguro’s oeuvre. The goal is to explore through digital methods Rebecca Walkowitz’s contemporary theory that deems Ishiguro’s literature as evidently and inherently international, which becomes apparent in his own dual-national identity and his methods to reflect the represented culture of a novel in a distinct style that appears as already translated. Visone – a program developed by Ulrik Brandes and Dorothea Wagner for social network analysis – will be used as the primary network program in order to demonstrate its potential for digital humanities as it combines an easily approachable design with in-depth methods of graph theory for means of multi-layered visual explorations.
Single Image Super Resolution Approach to the Signatures and Symbols Hidden in Buddhist Manuscript Sutras Written in Gold and Silver Inks on Indigo-Dyed Papers
1Okayama University, Japan; 2National Museum of Japanese History, Ritsumeikan University, Japan
Infrared imaging has revealed that signatures and symbols are hidden in Buddhist manuscript sutras written in gold and silver inks on indigo-dyed papers during the late Heian period in Japan. We have analyzed them with the help of single image super resolution technology, since many of the infrared images are of low resolution. The analysis leads us to conclude that some paper studios, aristocrats or noble priests drew their signs on the papers in order to mark their possession.
Digitalizing Old Diary and Reading Multi-layered Everyday Life: A Data Analysis of an Upper-class Elite Man’s Diary (1692-1699) in the Chosǒn, Korea
1EWHA WOMANS UNIVERSITY, Republic of Korea; 2Dept. of Computer Engineering DAEJIN University, Republic of Korea; 3Ewha womans University, Republic of Korea; 4The Graduate School of Korean Studies, The Academy of Korean Studies, Republic of Korea
This research analyzes the text of Jiamilgi (支菴日記, 1692-1699), an 8-year diary written by a man named "Yun Ihu" in the Chosǒn period of Korea. By reading an old diary in detail while translating and digitalizing its whole contents, this research attempts to trigger a dialogue between historical studies and computational methods, increase the density of the analysis of the historical materials, and expand the analytical horizons.
In this research, we extract various elements such as persons, historical events, everyday commodities, and places. These elements are to be organized into an ontology database, and relation models from various perspectives are to be created through visualization and quantitative analysis. The resulting interpretation presents a new research methodology as a case of DH-based diary research and, at the same time, shows the expandability of existing historical research on the Chosǒn period.
VR Video Production for Interactive Digital Maps
Molloy College, United States of America
This poster session showcases a combination of gear, open source code, and teaching materials for producing VR video experiences that correspond to narrative GIS projects. In addition to our poster, our session will offer faculty the opportunity to experience VR made for narrative-based digital maps and a tutorial for producing VR experiences with accessibility in mind. Faculty will also be able to access the gear in both high cost and low cost production kits, maps to which VR video corresponds, and instructional materials outlining each piece of gear’s use. In addition, we will offer faculty syllabi, access to our projects, as well as the source code for our maps.
Using Ngrams to Develop a Query Algebra for Conceptual History
1Department of Informatics, Karlsruhe Institute of Technology; 2Department of Humanities, Karlsruhe Institute of Technology
We present a query algebra for empirical analyses of temporal text corpora, the Conceptual History Query Language (CHQL). A temporal text corpus in our sense is a set of words and word chains, i.e., ngrams, together with their usage frequency at various points of time, like the Google Books Ngram Corpus. Our query language is meant to be useful for conceptual historians, i.e., be descriptive and complete (match all actual and potential hypotheses of conceptual history), and bear optimization potential to allow fast query processing on large data sets. We focus on an algebra inspired by the German tradition of Begriffsgeschichte (conceptual history), as exemplified by the work of Reinhart Koselleck. We also show first results, namely, the change of the words "East" and "West" from parallel concepts in the geographical sphere to counter concepts in the political sphere.
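The notion of a temporal text corpus used above (ngrams paired with usage frequencies at points in time) and the idea of composable query operators can be sketched as follows. This is not CHQL: the operator names `select`, `restrict` and `ratio` are hypothetical, and the counts are invented toy values, not data from the Google Books Ngram Corpus.

```python
from typing import Callable, Dict

Series = Dict[int, int]        # year -> usage count
Corpus = Dict[str, Series]     # ngram -> time series

# Toy corpus with made-up counts, for illustration only.
CORPUS: Corpus = {
    "east": {1900: 20, 1950: 40, 2000: 30},
    "west": {1900: 40, 1950: 80, 2000: 30},
    "east wind": {1900: 5, 1950: 4, 2000: 3},
}


def select(corpus: Corpus, predicate: Callable[[str], bool]) -> Corpus:
    """Keep only the ngrams that satisfy the predicate."""
    return {ngram: series for ngram, series in corpus.items() if predicate(ngram)}


def restrict(corpus: Corpus, start: int, end: int) -> Corpus:
    """Keep only the observations whose year lies in [start, end]."""
    return {
        ngram: {year: n for year, n in series.items() if start <= year <= end}
        for ngram, series in corpus.items()
    }


def ratio(a: Series, b: Series) -> Dict[int, float]:
    """Year-wise ratio of two series over the years they share."""
    return {year: a[year] / b[year] for year in sorted(a.keys() & b.keys()) if b[year]}


# Operators return corpora, so they compose like algebra expressions.
late_east = restrict(select(CORPUS, lambda ng: ng.startswith("east")), 1950, 2000)
east_vs_west = ratio(CORPUS["east"], CORPUS["west"])
```

Because every operator maps a corpus (or series) to another corpus (or series), expressions can be nested freely, which is also what gives an engine room to reorder and optimize them.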
Topography of Character's Body: a Case of Russian Children's Literature
Higher School of Economics in Saint Petersburg, Russian Federation
This poster presents quantitative data on the representation of characters’ bodies in the corpus of Russian children’s literature, visualized as a series of body heatmaps. Our literary topography represents what parts of a character’s body the author’s pen is allowed to touch. The central question of the research is how the selectiveness of authors in describing their characters’ bodies is related to the demographic features of characters, such as gender and age. Our data reveal gender differences in the representation of female and male characters, point out differences in the representation of adult and child characters, and provide comparative material for the study of character embodiment in literary fiction.
Visualizing Shakespeare’s Sonic Signatures
Carleton College, United States of America
Good authors imbue their characters with distinctive voices that are often discernible devoid of explicit dialog labels, both by their word choice as well as sometimes by the actual sound of the words. For instance, in Shakespeare's Othello, the speech of the titular character is said to be characterized by longer, rounder vowel sounds than the quick speech of his counterpart Iago. Such a phenomenon provokes a wide variety of questions. Can we detect these differences in speech computationally? If so, what would it tell us about these characters? What would it tell us about the author? We developed a web-based tool to visualize the differences between the “sonic signatures” of different characters within Shakespeare’s plays.
Diving Into The Complexities Of The Tech Blog Sphere
1German Historical Institute (GHI), Washington DC, United States of America; 2Roy Rosenzweig Center for History and New Media, George Mason University, USA; 3Berlin-Brandenburg Academy of Sciences (BBAW), Germany
Following the assumption that the tech blog sphere represents an avant-garde of technologically and socially interested experts, we describe an experimental setting to observe its input on the public discussion of matters situated at the intersection of technology and society. Our interdisciplinary approach consists in joining forces on a common base of texts and tools. This cooperative research effort stems from researchers working on the impact of digital media on democratic processes and institutions (German Historical Institute, Washington DC and the Roy Rosenzweig Center for History and New Media at George Mason University), corpus and computational linguistics for texts and microtexts written in German (Berlin-Brandenburg Academy of Sciences, BBAW), and linked open data for Digital Humanities projects and digital archiving at the Alexander von Humboldt Institute for Internet and Society in Berlin.
Chinese Dunhuang Mural Vocabulary Construction Based on Human-machine Cooperation
Wuhan University, China
As a significant intersection of Western and Eastern culture and economy on the ancient Silk Road, Dunhuang is regarded as a treasure trove of world culture and art. As one of the important forms of Dunhuang cultural heritage, the Dunhuang murals are of great value for research on history, art, religion and other fields. However, the absence of a Dunhuang mural vocabulary limits Dunhuang mural studies and the exploration of their value. The construction of current vocabularies in the humanities relies heavily on manual processes, with longer construction cycles and higher costs. In this project, we explore a human-machine cooperation mechanism for vocabulary construction, realized by combining a top-down process and a bottom-up process, and then apply the mechanism to our own vocabulary construction. With it we hope to improve the efficiency of construction and, more importantly, to promote humanities studies and further the development of digital humanities applications for the Dunhuang murals.
Training NLP Models for the Analysis of 16th Century Latin American Historical Documents: Tagtog and the Geographic Reports of New Spain
1Digital Humanities Hub-Department of History, Lancaster University, United Kingdom; 2Templo Mayor Museum, National Institute of Anthropology and History, México; 3tagtog.net; 4Computer Science and Engineering Department of IST, University of Lisbon, Portugal
The aims of this poster are to present the annotation model created to deepen knowledge and understanding of the economy and society of 16th-century New Spain, and the use of tagtog.net (an online tool for automatic annotation) to create and curate the resources required for developing NLP tools.
The CORLI Consortium: CORpus, Languages and Interaction
1INSERM/Modyco, CNRS/U. Paris Nanterre, CORLI, France; 2ICAR/CNRS, CORLI, France; 3U. Nice Sophia Antipolis, CORLI, France
CORLI (CORpus, Languages and Interaction) is a consortium of Huma-Num (https://huma-num.fr) dedicated to the sharing of methodological approaches, tools and software, best practices and training within the community of linguists building and investigating corpora.
We present the complexities underlying our goals.
Script Analysis In A World Of Anonymous Writers
1Huygens ING, Netherlands, The; 2Utrecht University
The presented project attempts to create a digital tool, based on a deep learning system, for the automatic clustering and classification of medieval scripts. The project responds to the increasing number of digitally available manuscript collections. It aims to develop a new approach to recognizing patterns in medieval scripts, which can help manuscript scholars to compare, date and localize the production of medieval writings and to gain new insights into the evolution and distribution of script types during the medieval period.
Conveying Uncertainty in Archived War Diaries with GeoBlobs
1City, University of London, United Kingdom; 2The National Archives, United Kingdom; 3University of Victoria, Canada
We introduce GeoBlobs, a visualization technique to represent ambiguous spatio-temporal data derived from handwritten War Diaries from the First World War (WWI), documenting the story of the British Army and its units on the Western Front.
ISEBEL an Intelligent Search Engine for Belief Legends
1KNAW Humanities Cluster, Netherlands, The; 2Meertens Institute, Netherlands The
A growing number of databases of folktales, including belief legends, have come into existence around the globe. Combining them might open up new and exciting research possibilities. ISEBEL is a project aiming to create a search engine that makes exactly this possible by providing unified search over the participants' databases while dealing intelligently with the various languages.
Topic Modeling with Interactive Visualizations in a GUI Tool
University of Würzburg, Germany
The DARIAH-TopicsExplorer is software that allows researchers to do topic modeling on their own computers, with their own text collections, relying on a graphical user interface for the entire process from unprocessed texts to visualized results.
Early prototypes and a number of 1.x versions have been presented to researchers and students in various workshops. These workshops generated user feedback that has fueled further development, resulting in standalone software for Windows, MacOS and Linux that features interactive visualizations and the export of results in CSV format.
The latest version 2 features a completely redesigned interface that allows users to browse through a topic model, explore the properties of a single document and find other texts with similar or related content.
With the development of the TopicsExplorer, we hope to increase the number of researchers that can use topic modeling, understand the method and are able to critically discuss it.
The Birth of Boston: Reconstructing Boston’s Social History in 1648
"The Birth of Boston" project uses one of the only maps of seventeenth-century Boston that exist and makes its history interactive through our online interface. It is a movable web map on which users can click the land parcels that made up the town in 1648 to see details of each Boston inhabitant. The web map was created with ArcGIS and combines the geographic data from the Samuel Chester Clough collection and person data from the Annie Haven Thwing collection, both housed at the Massachusetts Historical Society. The citizen data ranges from information on a person's spouses and children to their occupation and participation in the Church and municipality. Commercial and legal documentation is also included, if the records exist. Overall, "The Birth of Boston" is a resource that incorporates spatial and social data to create a history of Boston's early years of settlement.
Indexing and Linking Text in a Large Body of Family Writings
Université Paul-Valéry Montpellier 3 - Praxiling UMR 5267 CNRS
This poster presents Corpus 14, a corpus of correspondences between French soldiers and their families during WW1. We describe the TEI encoding of the writings and the ongoing project to develop a visualisation of the correspondences exploiting Named Entities annotation and Semantic Web resources.
Publishing Digital History: Integrating Methods, Sources, and Argument
University of North Carolina, Chapel Hill, United States of America
In this poster, we demonstrate how digital texts present a new and unique form of scholarly argumentation that both challenges and extends traditional methods by outlining our framework for a new digital book, Voice of a Nation: Mapping Documentary Expression in New America. This digital manuscript recovers the significant history of the Southern Life History Project (SLHP) by applying computational methods to analyze the collection. The SLHP was a unique project created under the New Deal in the U.S. to capture the stories of everyday Americans, especially those who had traditionally been marginalized in the historical record. The poster will make explicit the scholarly intervention of the project and then explain how the book’s arguments are being conveyed through digital forms, specifically organized around layers and thick mapping building off of the spatial turn in digital history.
Comparing diagrams in Euclid’s Elements
King's College London, UK
Diagrams are crucial to Greek mathematics and necessary to reading the text, but he notes that this fact was little discussed in modern literature. In recent years, however, there has been a growing interest in including diagrams and the manuscript evidence in the preparation of scholarly editions.
This poster aims to introduce a new research project on the potential of automated collation for non-textual data such as mathematical diagrams, focusing on the case of Euclid’s Elements.
Encoding the ‘Floating Gap’: Linking Cultural Memory, Identity, and Complex Place
Bucknell University, United States of America
In this poster the authors present a model for encoding what ethnographers term the “floating gap” when constructing an historical gazetteer of place names. This step is especially crucial as scholars make intersections and linkages between place-based, data-driven research projects. The authors argue that the concepts for Event and Place used to encode semantic relationships overlook the fact that it is the Actor or Agent who names the events, and thus by extension names the places at which those events occurred. Place names connected with those events must correspond to those agents. In the brave new world of linked data, the vagaries of named places constitute a vexed problem, and attempts to resolve the messiness and fuzziness of place, time, and perspective run the risk of eliding the floating gap of cultural memory.
Traveltext: (Re)Writing the Eastern Mediterranean, Complexities and Simplicities.
Gennadius Library, American School of Classical Studies at Athens, Greece
Traveltext focuses on travel accounts of the sixteenth and seventeenth centuries; it will, however, progressively incorporate the eighteenth and nineteenth centuries. It includes accounts written in English, French, German, Dutch and Italian. All accounts have been carefully read and indexed according to two basic criteria, space and subject. Each region a traveler visited constitutes a separate entry, as do cities such as Istanbul and Athens, and each of these entries is accompanied by a list of all the themes the author wrote about for that specific place, from political concerns to antiquities and relics to gift exchange, dinners and receptions. In addition, each passage and list of themes is followed by specific tagging, with information deriving from the material, alongside relevant iconography. In short, Traveltext provides a detailed taxonomy of the contents of West European travel accounts of the Ottoman Empire, and permits the user to approach the material comparatively.
Construction of a Corpus of “Christian Materials” for the Study of Colloquial Japanese of the Muromachi Period
1National Institute for Japanese Language and Linguistics, Japan; 2Nagoya Women's University, Japan
The main contribution of our paper is the construction of a corpus of “Christian Materials,” documents written by Catholic missionaries who came to Japan from the 16th to the 17th century AD. The original texts of our corpus were written in the Japanese colloquial language of the time and in the Roman alphabet with Portuguese spellings, which makes them quite valuable for the study of colloquial Japanese in the Muromachi period. Our corpus has three features.
1. Morphological information is annotated for each text.
2. The corpus has two texts, the Roman alphabet text and the Japanese character text.
3. The corpus includes a direct link to the image of the original print from the British Library.
With these features, this corpus not only functions as an index, but also enables more advanced research and statistical analyses in a wide range of fields, including phonetics, grammar, notation research, and so on.
Czech Literary Bibliography: Database Mirror of 250 Years of the Modern Czech Literature
Institute of Czech Literature, Czech Academy of Sciences, Czech Republic
The proposed poster presents the datasets and the current DH-related projects of the Czech Literary Bibliography research infrastructure (CLB), which has been in continuous operation for more than 70 years under a methodology developed over the long term. The CLB comprises a set of bibliographical and other specialized databases processing scholarly information on Czech literature. The parameters of the CLB bibliographical databases make them the most extensive specialist bibliography in the Czech Republic and one of the most complex sources of literary-scholarly information in Europe.
Emphasis is placed on the current project “Czech Literary Internet”, centered inter alia on the development of a set of superstructural analytical and statistical tools for the visualization of selected bibliographical data, and on the project “RETROBI”, within which a large card catalogue was digitized and presented in specialized software enabling, inter alia, semi-structured queries in the OCR-based representations of the original cards.
The European Literary Text Collection (ELTeC)
1University of Trier, Germany; 2Humboldt-Universität Berlin, Germany; 3Universidad de Alicante, Spain; 4Institute of Polish Language, Polish Academy of Sciences, Kraków, Poland; 5Independent Consultant
The COST Action Distant Reading for European Literary History is a collaborative, interdisciplinary network which aims “to facilitate the creation of a broader, more inclusive and better-grounded account of European literary history and cultural identity”. The network consists of European researchers from different disciplines and research fields such as computational linguistics, corpus linguistics, and (digital) literary studies. Currently, over 100 researchers from 30 different countries are working together in the Action. With the present poster, we would like to present our strategy for developing a key output of the project, the corpus which serves as an empirical basis for our project.
Traditional Methods Of Textual Criticism Vs. Juxta Commons: A Study Of One Poem Existing In Many Versions
Università Cattolica del Sacro Cuore di Milano, Italy
The paper presents a study of Uljalaevshhina (1924–1960s), a famous Soviet poem by Il'ya Sel'vinsky (1899–1968). Our aim was to reconstruct the history of Uljalaevshhina using traditional methods and digital instruments. First we collated the versions "manually", using common text editor apps; then we applied quantitative methods (in particular, we created and compared the frequency dictionaries of the versions); finally we made collation sets in Juxta Commons.
The paper discusses the advantages and disadvantages of Juxta for the work of the textual critic, and proposes options that, in our opinion, would help the system to be more effective.
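The quantitative step mentioned above, comparing the frequency dictionaries of different versions, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the tokenizer, the function names `frequency_dictionary` and `compare`, and the two sample lines are all invented for the example.

```python
import re
from collections import Counter


def frequency_dictionary(text):
    """Count lowercase word tokens in one version of a text."""
    # \w is Unicode-aware for Python 3 strings, so Cyrillic words match too.
    return Counter(re.findall(r"\w+", text.lower()))


def compare(freq_a, freq_b):
    """Return {word: count_b - count_a} for words whose counts differ."""
    words = set(freq_a) | set(freq_b)
    return {
        w: freq_b[w] - freq_a[w]
        for w in sorted(words)
        if freq_a[w] != freq_b[w]
    }


# Invented sample lines standing in for two versions of a poem.
version_1 = "the steppe is wide and the wind is cold"
version_2 = "the steppe is wide and the night is cold and long"

diff = compare(frequency_dictionary(version_1), frequency_dictionary(version_2))
```

Positive values mark words that became more frequent in the second version and negative values words that became rarer, which gives a quick overview of lexical change before a line-by-line collation.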
Nodes and Edges in Literary History. Modelling 19th Century Literary Landscapes
1University of Copenhagen; 2Society for Danish Language and Literature, Copenhagen; 3National Research University Higher School of Economics, Moscow
Who were the protagonists of 19th century European literature? And what are the promises and pitfalls when it comes to the modelling of the composition and dynamics of historiographical works with the means of network analysis? These are the central questions to be addressed and displayed in this poster. It aims to show the results of our endeavours to analyse and visualise central and, hopefully, new aspects of Georg Brandes’ Main Currents of 19th Century Literature (6 vols. 1872-1890), a vast and complex work, regarded not only as historiography, but also as a text reliant on features of fictionality, such as narration and plot.
Exploration of the Seventeenth Century Japanese Authors’ Writing Style Using a Quantitative Approach
Osaka University, Japan
This study explores the writing styles of three seventeenth-century Japanese authors, Saikaku Ihara (c.1642–93), Dansui Hōjō (1663-1711) and Ichirōemon Nishimura (?-c.1696), from a quantitative point of view. We compared Saikaku, Dansui, and Ichirōemon on the most frequent words, Japanese particles, Japanese particle bigrams, character unigrams, character bigrams and character trigrams, using principal component analysis (PCA) to see the differences between the authors. The novels of Saikaku, Dansui and Ichirōemon each formed a separate group. Moreover, as qualitative research has suggested, the novels of Saikaku and Dansui were closer to each other, while those of Ichirōemon, especially Sayogoromo, showed different characteristics. We are continuing to digitize texts by Dansui, Ichirōemon and other writers. In future analyses, we will add further works and writers in order to compare the relationships among the works of seventeenth-century Japanese authors.
Digital Humanities in European Research Libraries - a Survey
1Koninklijke Bibliotheek, National Library of the Netherlands, Netherlands, The; 2Maastricht University, Faculty of Arts and Social Sciences, Netherlands, The; 3University Library Humboldt University Berlin; 4Glucksman Library, University of Limerick; 5Consortium of European Research Libraries; 6University of Edinburgh Library; 7Dublin City University Library; 8KU Leuven Libraries
This paper will present the results of a Europe-wide survey amongst all 400 members of the European association for research libraries (LIBER) on their activities relating to digital humanities (DH).
In Europe, DH activities have seen a growth in academic libraries, but little has been published about this and no attention has been given to the library community as a whole. This survey, conducted by the LIBER working group Digital Humanities & Digital Cultural Heritage, is the first endeavor to map all activities of European research libraries in DH.
The Diaries of John Quincy Adams Digital Project
The Adams Papers, Massachusetts Historical Society, United States of America
This presentation will explain the progress that has been made since 2016 on the Diaries of John Quincy Adams Digital Project and the search tools that still need to be built. It will discuss the decisions that have been made on the project to provide increased access to a middle and high school audience. It will also highlight the challenges that come with utilizing a work force made up of staff members, interns (graduate students, undergraduates, and high schoolers), and volunteers.
The Prepare and Visualize Mallet Data Spreadsheet
New York University, United States of America
This paper will describe and demonstrate an Excel spreadsheet that imports data from Mallet topic models, visualizes the strength of the words in the topics, re-formats data about the strengths of topics in individual texts and the texts that are most heavily weighted for each topic. It also graphs the weights of topics in the texts, allows the user to toggle topics on and off in the graph, and also allows the user to select a topic and text and highlight the words of the topic in the text and then add highlighting for up to three additional topics.
Buy Healthy, Tasty, Pure! A Digital Text Analysis of Neoliberal Trends in Dutch Food Culture
1Utrecht University, Netherlands, The; 2Humanities Cluster, Royal Netherlands Academy of Arts and Sciences, Amsterdam, Netherlands, The
By digitally studying a commercial food magazine, this paper aims to bring to the surface the slow rise of neoliberal thinking in Dutch food culture. The impact of neoliberalism in food culture can be traced back to the postwar period and has been summarised by the paradox that worthy neoliberal citizens "must want less while spending more". This paper focuses on the extent to which the major Dutch retailer Albert Heijn appropriated this tendency in its framing of the products it tried to sell. A digital text analysis of Albert Heijn's food magazine Allerhande for the period 1954-1973 shows, for example, how the qualifications of "healthy", "slim", "tasty", and "pure" became discursively interlaced from the mid-1960s, and how the magazine offered a platform for products like margarine to brand themselves within this discourse. For the final paper, the studied period will be extended to cover the period until 2010.
An Unsupervised Lemmatization Model for Classical Languages
University at Buffalo, United States of America
The lemmatization task, wherein a computer model must group together inflected forms derived from the same stem, or 'lemma,' is a fundamental problem in language modeling. Software that handles more complex humanistic tasks like authorship attribution or intertextuality detection relies on lemmatization as an early step. Current models rely on either a set of annotated data, which can be labor-intensive to generate, or large aligned corpora, which do not exist for most classical languages. This paper presents a simple algorithm for deriving a lemmatization model from a corpus of text and a chart of possible word-forms, whose performance is on par with the best available models derived from annotation data or aligned corpora.
Extracting Drum Patterns in Traditional Folk Songs Among East Japan
1Doshisha University, Japan; 2Noboribetsu Onsenkyo Takinoya, Japan
In this research, in order to empirically capture the rhythm patterns hidden in traditional Japanese music, we focused on folk songs (having the most primitive characteristics of Japanese music) and analyzed the rhythm of the Japanese drums. This study aimed to compare the rhythm patterns of Japanese drums in the folk music of East Japan quantitatively and to clarify the rhythmic features of each region.
Paving the Way to Linked Open Data: Evaluating the Path to LOD for the Census of Antique Works of Art and Architecture Known in the Renaissance
Berlin-Brandenburg Academy of Sciences and Humanities, Germany
The Census of Antique Artworks and Architecture Known in the Renaissance identifies and collects antique monuments and related Renaissance documents in a database. Since the project was established in 1983, data has continually been added to the database. Due to constraints of the currently utilized proprietary system (easydb 4), the Census project is evaluating how to port its data and research-supporting functionalities with regard to a) openness, b) usability and c) maintainability. We evaluate multiple web applications that promise easy-to-understand user interfaces and Linked Open Data capabilities. However, we conclude that another holistic system always carries the risk of high maintenance in the future and instead focus on establishing a RESTful, API-centric approach. This approach makes it easier to swap, extend and update front- and back-end components as well as the database if the need should arise, as long as the API still functions as specified.
Lemmatizing Low-resource, Less-researched Languages: The Linked Open Text Reader and Annotator
Goethe University Frankfurt
There is a well-known dichotomy in the methods of natural language processing between data-driven and rule-based approaches: the former require a large amount of data, while the latter require deep knowledge of the language in question and are usually used for less-resourced languages.
The problem arises when the language is neither well-documented nor has a sufficient amount of data, which means that neither methodology could be used. This case often arises when developing computational aid for linguistic research of minority languages which are yet to be documented.
In this paper, we present our experiments for dealing with such languages, namely, performing lemmatization — finding a dictionary headword for wordforms. We show several methods which can be applied to any language and explore how much effort is required to achieve reasonable quality. We also present a new tool that allows linguists to use those methods for their language data.
“Fear in your Eyes”: Analyzing Threat Perception and Its Influence on Deadly Use of Force by Police Officers against Civilians Using Hebrew NLP Tools
1Faculty of Law, Hebrew University, Israel; 2Department of Computer Science, Ben Gurion University
This study seeks to explore and empirically substantiate the effects of the gap between objective and subjective threat perceptions of police officers in cases of lethal use of force against civilians.
It focuses on a newly accessible corpus of digitized transcripts of police officers who testified before an Israeli State Commission of Inquiry in Hebrew. Combining topic modeling and sentiment analysis to explore subjective threat perceptions of duty holders, we aim to establish a deeper, empirically validated, understanding of serious human rights violations.
The study makes an important methodological contribution to the field of NLP for non-English languages and non-Latin characters, by applying, for the first time, newly developed NLP tools for Hebrew on a Modern spoken Hebrew corpus. As such, it also contributes to the diversification of the DH community by opening it to minor languages.
Deciphering Lyrical Topics in Music and Their Connection to Audio Feature Dimensions Based on a Corpus of Over 100.000 Metal Songs
1University of Vienna, Austria; 2Otto-Friedrich-University Bamberg
Our contribution examines the connection between audio features and lyrical content of metal music. To do so, we combine a previously trained prediction model for automatic extraction of high-level audio features such as "hardness" and "darkness" with Latent Dirichlet Allocation used on a novel corpus of over 100.000 Metal lyrics.
Modelling a Catalogue: Bilingual texts in Tuscan Middle Ages (1260-1430)
University of Venice 'Ca' Foscari'
This poster introduces a domain-specific ontology built for a catalogue conceived for the study of multilingualism in a medieval context. The catalogue is the final outcome of a five-year project that systematically investigated various literary documents which circulated simultaneously in more than one language in Tuscany, and especially Florence, between the mid-13th century and the beginning of the 15th century. The conceptual model for this catalogue may serve as a model for the study of similar processes in other language regions.
Towards Creating A Best Practice Digital Processing Pipeline For Cuneiform Languages
Mainz University Of Applied Sciences, Germany
This publication introduces a proposal for a best practice workflow to document and annotate cuneiform tablets in terms of linguistics, semantic annotation, digital paleography, 3D scan annotation and information extraction, which is to be used in an upcoming research project dealing with the documentation of cuneiform tablets in Haft Tappeh. The goal of the best practice workflow is to create a pipeline which allows non-technical users to create datasets which can be used across disciplines and which therefore supports a variety of formats and ontologies. Challenges with this particular pipeline and the tools used, results and data provision are explained.
Designing Multilingual Digital Pedagogy Initiatives: The Programming Historian for English, Spanish, and French speaking DH Communities
1Universität zu Köln-Cologne Center for eHumanities; 2University of Sussex; 3Institut de recherches historiques du Septentrion
With this poster we would like to discuss the design of a multilingual strategy for The Programming Historian and to reflect on some theoretical concepts such as contact zone, lingua franca vs. vernacular, and writing for a global audience that foster our current work as editors. We would like also to present our recent progress made during the transition from a monolingual digital pedagogy initiative to a multilingual one available in English, Spanish and French: on the one hand, after translating 41 lessons from English, in April 2019, The Programming Historian en español has recently published two original lessons while two more are currently under review. On the other hand, also in April 2019, the Programming Historian en français was launched and has already published one translation from English into French.
Italian Resistance Goes Digital: Event And Participant Extraction From Partisans’ Memoirs
1Dipartimento di Informatica, Università di Torino, Italy; 2Data and Web Science Group, University of Mannheim, Germany
This contribution reports on a recently concluded project that developed a system for extracting events, participants and their roles from a digitized corpus of Italian memoirs of Resistance members during the Second World War. In particular, in our work we have adopted and adapted resources, techniques and tools from research literature in information extraction to provide advanced semantic access to the collection. We chose events as the structural concept for extracting and representing textual information in a text-to-data application to historical memoirs.
A Paper Full of Things. Quantitative and Qualitative Approaches to Early Modern Newspaper Advertisements.
University of Basel, Switzerland
Intelligence newspapers, quite popular in the eighteenth century, contained all kinds of classified advertisements, for selling books, renting rooms, offering jobs or looking for lost handkerchiefs or poodles. The huge mass of information contained in seemingly innumerable ads – these newspapers often appeared over decades – is the reason why this source type has not been considered for closer, systematic studies so far. By using computational methods and digital tools, we want to facilitate an extensive and comprehensive analysis of intelligence newspapers, combining quantitative with qualitative approaches. The segmentation of the issues of one source into single ads, connected with the underlying text, serves as the starting point for data mining as well as different research questions, and therefore for different workflows. Cascading classification of the ads into ad and content types makes it possible to handle the mass of advertisements and to prepare it for further and collaborative research.
Epistolary networks in Italian Humanism: Collecting, editing, analysing Italian humanistic letters - 1400-1499 (with a critical edition of familiar letters of Iovianus Pontanus)
1Università della Basilicata, Italy; 2Università Federico II di Napoli, Italy
The current project aims to encourage a critical review of the epistolary production of the fifteenth century through the creation of an online platform intended to map and catalogue letters, collect metadata, analyse epistolary networks, and edit texts that are free from copyright as well as unpublished works (starting with an edition of Pontanus’ familiar letters). The project consists of three phases: 1. Get 2. Analyse 3. Represent - disseminate. To execute these three phases, we will use the native XML database application server eXist-db. The XML format used to store and index data is TEI. The choice to use a native XML database and the TEI-XML format from the beginning will enable us to extend the metadata collected to the full text at any time, without facing platform migrations. Apache Jena will be used to publish the results in machine-readable format, to create linked data in the eXist-db API, and to project data into other formats.
Building a Diachronic and Contrastive Parallel Corpus - and an Intended Application in the Form of a Study of Germanic Complex Verb Constructions
1University of Gothenburg, Sweden; 2Meertens Instituut, The Netherlands; 3Radboud University Nijmegen, The Netherlands
We present a parallel corpus under construction, which is parallel diachronically (through time) as well as contrastively (between languages). The corpus is made up of Bible texts spanning almost 6 centuries in 4 languages. Our project's direct purpose in building the corpus is to track the development of verb combinations containing multiple auxiliary verbs through time in German, Dutch, English and Swedish. We will also make the corpus available to other researchers.
In this poster, we discuss the design of the corpus, our selection of sources, issues with bringing together a wide variety of sources, and alignment of the data. We will also touch upon intended future work concerning the automatic linguistic processing needed to facilitate the study of verb constructions, and the methodological challenges of doing corpus linguistic research on the varying quality of annotations produced by automatic methods on materials from such a wide range of origins.
Digital Edition And Linguistic Database: A Fully Lemmatized And Searchable Model
King's College London, United Kingdom
This poster reports on a fruitful collaboration between specialists in the fields of medieval French literature, lexicography, and digital humanities. The outcome will be the first fully lemmatized digital edition of a medieval French text, with a search page directly linked to one of the principal Old French dictionaries. The project team are editing the two most important manuscript copies of the earliest universal chronicle in French. In addition to publishing the first complete edition of the text, the project will provide an extensive corpus (over 350,000 words) of lemmatized and searchable linguistic data. The principal development in the last six months is the integration of the lemmatization data into the edition’s faceted search page. In addition to reporting on some of the challenges faced during this period, the poster will outline the custom pieces of software developed as part of the overall editorial workflow.
Recreating Dante’s Commedia in VR: The Intersection between Virtual Reality and Literature
University of California, Berkeley
The digital humanities project “Virtual Commedia” explores the intersection of virtual reality technology with literature, through the development of a virtual reality simulation of Dante Alighieri’s Commedia. The goal of the project is to create a 3D, immersive, and interactive tool of data visualization that functions as a high-concept “network” of the text. The VR user may move through the simulation linearly, following the poem’s chronology, or thematically, interacting with the network by tapping specific “nodes” of interest (recurring motifs, important encounters, or rhetorical turns, for instance). Due to the thematic complexity of the Commedia’s narrative and to the incredible visuality produced in its language, the text lends itself well to this virtual visualization. The project also raises questions about the developing intersection between literature and virtual reality technology; the future of reading and literary criticism; and opportunities for unprecedented scholarly collaboration within the virtual reality space.
Word Embeddings for Processing Historical Texts
Fondazione Bruno Kessler, Italy
In recent years, word embeddings have become important resources for many Natural Language Processing tasks. Several sets of pre-trained word vectors, built from huge amounts of contemporary text, have been released. Interest in this type of distributional approach has recently also emerged in the Digital Humanities community, with studies on vectors built from historical or literary texts and employed to track semantic shifts. This submission aims at expanding current research on historical word embeddings by presenting a set of English vectors pre-trained with three different algorithms on a corpus of texts published between 1860 and 1939. These embeddings have been used to train a new model for the identification of place names in historical travel writings, achieving very satisfactory results in terms of precision, recall and F-measure.
'Alle Begjin Is Swier': The Use Of The Frisian Web Domain Web Data For Digital Humanities Research
Koninklijke Bibliotheek, The Netherlands
This poster will describe the plans and the preliminary results of the Koninklijke Bibliotheek – National Library of the Netherlands (KB-NL) to map, harvest and create a web data set out of the Frisian web domain, starting with the .frl TLD. As a national library, KB-NL has collected born-digital material from the web through web archiving since 2007. It makes a selection of websites with cultural and academic content from the Dutch national web. A future harvest of the Frisian web domain will provide researchers with a unique born-digital data set of a minority language, which can be combined with other similar data sets of the Frisian language.
'How the World Jogges': Interconnectedness, Modularity and Virality in Seventeenth Century News
Queen Mary University of London, United Kingdom
Civil-war era London housed an early relatively free newspaper industry. This included regular news from abroad: news mostly from Europe but occasionally further afield. The regular structure of these early 'newsbooks' means that structured data can be mined from the texts. This poster will outline the methodology used to create such a structured dataset, which can be used in several ways: mapping geographic news 'hotspots', understanding the temporal variances in the transmission of news, and, for this poster, network analysis.
The poster will demonstrate how combining such a dataset with network analysis has led to the discovery of communities of news and information, specifically using network modularity and community detection to suggest the extent to which Europe could be divided into individual clusters of cities closely linked by the sharing of information, and how this can be used to understand the viral nature of early modern news.
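As a rough illustration of the modularity measure mentioned above, the following sketch computes the Newman modularity score Q of a given partition of a small undirected network. The poster does not state which community detection algorithm or software was used; this only shows the quantity that modularity-based methods try to maximise. The city names and the links between them are invented toy data, not results from the newsbook dataset.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community_of[node]] += d
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy network: an edge means two cities shared news.
edges = [
    ("London", "Paris"), ("Paris", "Amsterdam"), ("London", "Amsterdam"),
    ("Vienna", "Venice"), ("Venice", "Rome"), ("Vienna", "Rome"),
    ("Paris", "Venice"),  # a single link between the two clusters
]
north = {"London", "Paris", "Amsterdam"}
two_clusters = {city: ("north" if city in north else "south")
                for edge in edges for city in edge}
one_cluster = {city: "all" for city in two_clusters}

print(modularity(edges, two_clusters))  # higher: the split fits the links
print(modularity(edges, one_cluster))   # baseline with no division
```

A partition that groups densely linked cities together scores higher than the undivided network, which is the sense in which modularity suggests how far a news network can be divided into clusters.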
A CLARIAH Environment for Linguistic Research
KNAW/HuC, The Netherlands
The CLARIAH Virtual Research Environment offers a rich set of features, with the aim of providing researchers with uniform access to an increasingly diverse landscape of linguistic resources, tools and services, thus lowering the barrier for researchers to apply Digital Humanities methods.
A Collaborative System for Digital Research Environment via IIIF
1International Institute for Digital Humanities, Japan; 2The University of Tokyo
The poster will present a collaborative system for a digital research environment based on IIIF (International Image Interoperability Framework), which has recently spread among cultural institutions around the world as a means of making their high-resolution Web images interoperable. As a use case, the authors adopted the system for digital facsimiles of Buddhist scriptures which have been released as parts of digital collections around the world. The system aggregates the distributed digital images, embeds metadata, and provides them as JSON data in a collaborative manner. The system and the workflow will be useful for various fields in the humanities.
A Digital Enquiry On The Italian Reception Of The English Novel In The Periodical Press Of The Long Eighteenth Century
This short paper aims at showing the first results of the ongoing collaborative research project "The reception of the English novel in the Italian literary press between 1700 and 1830: a transcultural enquiry into the early shaping of the modern Italian literary and cultural identity". The project investigates the reception of English novels in the Italian literary press during the Long Eighteenth Century (1700-1830). The analysis focuses on an existing corpus of data relating to the publication, dissemination, translations, critical reviews, and editorial advertisements of English novels in Italian literary newspapers and journals of the time. The main purpose of the project is to use digital methods of investigation to uncover how English novels were introduced to the Italian readership through literary journalism.
A Geo-Sampling Model To Analyse Micro Level Historical Agricultural Production Data For Mid-19th Century Southeast European and Anatolian Regions
Koc University, Turkey
We are working on a sampling strategy to aggregate and analyse detailed, household-level micro data on agricultural production for five regions of the Ottoman Empire in the 1840s. In five regions centred on the cities of Ankara and Bursa in Turkey, Plovdiv and Ruse in Bulgaria, and Bitola in the Republic of North Macedonia, we have geo-located close to 2,000 villages. For almost all of the inhabitants of these villages we have detailed, household-based agricultural survey data from 1845 in the Ottoman state archives. These massive data have never been aggregated or tabulated. We are working on an HGIS sampling method that takes into consideration soil depth and quality (using remote sensing soil data from ESDAC (European Soil Data Centre: https://esdac.jrc.ec.europa.eu/) for Bulgaria); agricultural suitability, using SRTM data with a 30-metre spatial resolution; and connectivity (based upon historical transport networks from the 1900s).
A Kind of Magic: Migrating a Large Digital Edition of Letters into a New Infrastructure
University of Graz, Austria
The poster will introduce the approach taken to migrate a large-scale digital letter edition with accompanying material to a new technical infrastructure.
The underlying project is concerned with the scientific estate of the 19th-century linguist Hugo Schuchardt. The primary objective is the edition of the scientific correspondence, an endeavor that has been underway for decades.
The database comprises nearly 6500 letters transcribed in full text, with facsimiles and editorial comments. In addition to a large bibliographical database of primary and secondary literature, a former funding period also produced thesauri for persons, places and subjects. The main reason for the migration is the increasing difficulty of maintaining the proprietary infrastructure, which has been developed and extended for more than two decades.
A New Journey Through Shared Ethnological Archives For Understanding Anthropology: The “Archives Des Ethnologues”, A Multifaceted Consortium
1Institut des mondes africains (IMAF); 2Maison méditerranéenne des sciences de l'homme (MMSH), Aix-Marseille Université; 3CNRS; 4Consortium Archives des Ethnologues (TGIR Huma-Num), France
Social anthropologists have produced numerous materials in the field, some of which have been deposited in the documentation centres of research facilities. The nine resource centres that make up the consortium Archives des ethnologues, and their partners, house multi-media materials collected by French anthropologists. Once archived, these notes, field notebooks and various papers, photographs, films and sound recordings are digitized, and some of them are posted online in accordance with ethical and legal guidelines.
To combat the misleading way in which digital technology tends to standardise data, the Consortium Archives des ethnologues has chosen to diversify access to this data because the uniqueness of these archives reflects their scientific and heritage value, the wealth and diversity of the societies they attest to, the history of the sciences and the methodologies used in the course of time.
A Scientific Network Serving The Uses Of 3D For The Digital Humanities
1Archeovision - Consortium 3D, France; 2LS2N - Laboratory of Digital Sciences of Nantes; 3Institut Optique Graduate School
The 3D Consortium of the TGIR Huma-Num was created based on the observation that there are many initiatives around 3D for the Digital Humanities without real coordination between them. The proliferation of initiatives makes the task difficult, and only a consortium-type organization can bring forces together in order to define standardized solutions.
The difficulty is increased by the fact that we are dealing with multiple domains, combining science and technology with the humanities. The aim of the consortium is to facilitate discussions by bringing together as many research groups as possible that integrate the use of 3D digital data in their scientific practice, and to develop tools for the acquisition, visualization, interpretation and preservation of data for the Humanities.
A tool for multifaceted analysis of the Old Polish New Testament apocrypha
1Poznań Supercomputing and Networking Center, Poland; 2Faculty of Polish and Classical Philology, Adam Mickiewicz University, Poland
The Polish mediaeval Apocrypha of the New Testament are fundamental not only for the history of Polish culture, but also for the literature and language of East Slavdom. They are also the most extensive body of Polish mediaeval writing, comprising more than 2000 pages of manuscripts. Unfortunately, these texts are largely inaccessible or poorly accessible (unpublished, or published only in transliteration) or available only in excerpts. Moreover, the editions remaining in circulation are not sufficient to conduct in-depth research.
Due to their complexity and diverse character, the above-mentioned texts require a digital form of presentation. Consequently, one of the aims of the project is to develop a tool enabling fully interdisciplinary and multifaceted studies. This tool will be an advanced search engine with the functionality of comparing results, based on a meticulously developed database including, among others, Latin sources, Slavic contexts and the employed themes.
A Web-Based Tool for the Annotation of Scribe Data in Medieval Documents
Fulda University of Applied Sciences, Germany
Over the past year, the Signum system was developed for the interactive, web-based annotation of medieval manuscripts of the Bibliotheka Fuldensis.
The central aim is to record relevant properties of individual letters, or of letter complexes, within a document.
The properties recorded in this way are treated as weighted feature vectors and serve as the input to a classification algorithm.
This algorithm computes a mapping from a feature vector to a possible scribe, which then, according to the theoretical approach, enables scribe identification across documents.
Thanks to the gradual advancement of web technology, it is now possible to implement the annotation editor entirely as a high-performance web application, without any loss of quality in interaction and usability.
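The mapping from a weighted feature vector to a possible scribe can be sketched as follows. The abstract does not name the classification algorithm used in Signum, so a weighted nearest-centroid classifier is swapped in here purely as an assumption; the feature names, weights, scribe labels, and all numeric values are likewise hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature contributes according to its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def centroids(labelled_vectors):
    """Average the annotated feature vectors of each scribe."""
    sums, counts = {}, {}
    for scribe, vec in labelled_vectors:
        acc = sums.setdefault(scribe, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[scribe] = counts.get(scribe, 0) + 1
    return {s: [x / counts[s] for x in acc] for s, acc in sums.items()}


def predict_scribe(vec, centres, weights):
    """Assign the scribe whose centroid is closest under the weighted distance."""
    return min(centres, key=lambda s: weighted_distance(vec, centres[s], weights))


# Hypothetical features per letter: (ascender ratio, slant, stroke width).
training = [
    ("scribe_A", [0.8, 0.1, 0.3]),
    ("scribe_A", [0.9, 0.2, 0.3]),
    ("scribe_B", [0.4, 0.7, 0.6]),
    ("scribe_B", [0.5, 0.6, 0.5]),
]
weights = [2.0, 1.0, 0.5]  # assumed relative importance of each feature

centres = centroids(training)
print(predict_scribe([0.82, 0.2, 0.35], centres, weights))
```

Since the centroids are built from annotations gathered across documents, a letter annotated in a new document can be compared against them, which is the idea behind identification across documents.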
Adapting a system for Named Entity Recognition and Linking for 19th century French Novels
1Praxiling UMR 5267, Univ. Paul-Valéry Montpellier 3; 2Lattice UMR 8094, ENS, Univ. Paris 3-Sorbonne Nouvelle; 3ECSTRA, IHEC, Univ. de Carthage; 4CRH UMR 8558 / EHESS
This poster describes a Natural Language Processing pipeline combining two existing tools, one for named entity recognition and classification (NERC) and the other for named entity linking (NEL) to Knowledge Bases (KB), in other words Linked Data sets, and their adaptation for use in the annotation of 19th century French Novels. These tasks are crucial for producing enriched Digital Editions as well as for Digital Literary Stylistics and Spatial Humanities, which largely rely on Distant Reading techniques. Our pipeline is able to provide a dynamic cartography and allows for the exploration of the spatial dimension of texts by retrieving structured information about places. Besides tools and experiments, our contribution is more specifically an annotated corpus of 19th century French Novels, together with an adapted NER model and KB, all resources that can be reused by the digital humanities and NLP communities.