SP-15: Language, Languages
Thursday, 11/Jul/2019:
9:00am - 10:30am

Session Chair: Manolis Fragkiadakis
Location: Pandora Foyer
C-SALT APIs - Connecting and Exposing Heterogeneous Language Resources

Francisco Mondaca, Felix Rau, Claes Neuefeind, Börge Kiss, Daniel Kölligan, Uta Reinöhl, Patrick Sahle

Universität zu Köln, Germany

In this paper, we present a strategy for integrating existing heterogeneous language resources, such as texts and dictionaries, by connecting these resources and making them available to internal projects and third-party applications through APIs. We describe our approach in the context of the C-SALT (Cologne South Asian Languages and Texts) initiative, which presents projects and resources on South Asian languages hosted at the University of Cologne. To illustrate the potential use of our setup, we first introduce VedaWeb, a web-based platform that provides access to ancient Indian texts written in Vedic Sanskrit, the oldest form of ancient Indo-Aryan. We then describe the C-SALT APIs for Dictionaries, which make several large Pāli and Sanskrit dictionaries available. Building on that, we present the architecture behind these APIs, and we conclude by analyzing the potential role of APIs in Digital Humanities projects.

A European-Hindustani Dictionary? Reflections on Methods

Anna Pytlowany

University of Amsterdam, The Netherlands

This presentation is the first report on the project “Hindi Lexicography and the Cosmopolitan Cultural Encounter between Europe and India around 1700” at Uppsala University (UU). The primary goal of the project is to produce an online dictionary (Latin-Hindustani-French) on the basis of the unpublished 'Thesaurus Linguae Indianae' by François-Marie de Tours (1). The shortcomings of the Uppsala project will guide the design of an extended, cross-linked online dictionary of early modern Hindustani based on little-known wordlists and vocabularies compiled by European merchants and missionaries in 17th-century India. The novelty of the approach lies in combining multilingual sources that describe a foreign language to create a ‘pan-European perspective’, which may offer new comparative insights for the historical linguistics of the target languages. If successful, this approach can be applied to other early modern vocabularies that constitute unique and valuable descriptions of non-European languages.

Poetry In Motion: Quantified Self Data And Automated Poetry Generation

Justin Tonra¹, Brian Davis², David Kelly¹, Waqas Khawaja³

¹National University of Ireland Galway; ²Maynooth University; ³Insight Centre for Data Analytics, Galway

Eververse is a project which synthesises perspectives from disciplines in the humanities and sciences to develop critical and creative explorations of poetry and poetic identity in the digital age. Deploying tools and methods from poetic theory, data analysis, and Natural Language Generation (NLG), the automatic production of natural language output from a non-linguistic data source, Eververse uses data from quantified self (QS) devices to automatically generate and publish poetry which correlates to the wearer/poet’s varying physical states.
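As a rough illustration of the general idea of generating text from non-linguistic sensor data, the following minimal sketch maps heart-rate readings to a physical state and fills a line template for each state. It is not the Eververse pipeline: the thresholds, state labels, templates, and function names are all hypothetical and chosen only for this example.

```python
from statistics import mean

# Hypothetical thresholds (beats per minute) and line templates.
# None of these values or texts come from the Eververse project.
STATES = [(60, "resting"), (100, "active")]
TEMPLATES = {
    "resting": "The pulse is slow at {bpm}, and the line lies still.",
    "active": "At {bpm} beats the verse begins to walk.",
    "intense": "At {bpm} the stanza runs out of breath.",
}


def classify(bpm):
    """Return the first state whose upper bound exceeds bpm, else 'intense'."""
    for upper, label in STATES:
        if bpm < upper:
            return label
    return "intense"


def generate_poem(readings, window=3):
    """Average each window of readings and emit one templated line per window."""
    lines = []
    for start in range(0, len(readings), window):
        bpm = round(mean(readings[start:start + window]))
        lines.append(TEMPLATES[classify(bpm)].format(bpm=bpm))
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_poem([52, 55, 58, 88, 92, 90, 150, 148, 155]))
```

A template lookup is the simplest possible form of NLG; a real system would presumably vary vocabulary, metre, and structure rather than reuse fixed lines.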

From the Margins to the Center: A Method to Mine and Model Complex Relational Data from French Language Historical Texts

Ashley Sanders Garcia

University of California, Los Angeles, United States of America

This project employs the spaCy Python library to build an information extraction system that mines personal relational data from French-language sources. As a test corpus, it uses four digitized, OCRed, and hand-cleaned nineteenth-century French chronicles of Ottoman Algerian history in order to model socio-political networks and uncover the positions and roles of women in this society. The challenge is to extract not only named entities and their relations to one another, but also unnamed persons and their relationships. Those who remain unnamed are most often women, servants, slaves, and Indigenous people, the very people about whom scholars are most anxious to know more. This short presentation will share the complete information extraction code, its accuracy, the resulting visualizations, a brief analysis of the case study, and additional use cases that extend far beyond the initial case study to other languages and textual sources.
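To make the problem of unnamed persons concrete, here is a minimal sketch that finds a role noun attached to a named person (for example "la fille de X") and records it as a relation. It deliberately uses plain regular expressions from the Python standard library as a stand-in, not spaCy and not the project's code. The role list, the pattern, the function name, and the sample sentence are all invented for illustration.

```python
import re

# Hypothetical list of French role nouns that often stand in for unnamed persons.
ROLE_NOUNS = ["fille", "fils", "femme", "épouse", "mère", "sœur", "esclave", "servante"]

# A capitalised token (Latin-1 uppercase letters) followed by word characters or hyphens.
NAME = r"[A-ZÀ-ÖØ-Þ][\w-]*"

# role noun + "de" / "du" / "d'" + one or more capitalised tokens.
PATTERN = re.compile(
    r"\b(?P<role>" + "|".join(ROLE_NOUNS) + r")\s+(?:de\s+|du\s+|d['’])"
    r"(?P<anchor>" + NAME + r"(?:\s+" + NAME + r")*)"
)


def extract_unnamed_relations(text):
    """Return (role, named_person) pairs for unnamed persons tied to a named one."""
    return [(m.group("role"), m.group("anchor")) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # Invented sample sentence, not taken from the chronicles.
    sample = "La fille de Hussein Pacha épousa le bey, et une esclave d'Ahmed fut affranchie."
    for role, person in extract_unnamed_relations(sample):
        print(f"unnamed {role} -> {person}")
```

Each pair can serve as an edge in a network in which the unnamed person becomes a node identified only by role. A surface pattern like this misses cases such as "la fille du bey", where the anchor is a title instead of a name, which is one reason a parser-based approach is more suitable for real texts.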

Using Network Analysis to Do Traditional Chinese Phonology Study

Jiajia Hu

Beijing Normal University, People's Republic of China

Traditional Chinese phonology, lacking an alphabetic system of phonetic notation such as the IPA, had to work with large bodies of written material in Chinese characters and used the characters themselves as a tool to analyze the sounds of words. This gives rise to a significant feature of the field: the relationships between the sounds of words matter more than their phonetic values.

Xìlián (literally: "inter-link") is one of the most important methods in traditional Chinese phonology. Its basic principle is to build networks of Chinese characters that share the same syllabic elements. This paper takes the Xìlián of Fǎnqiè in the Guǎngyùn as an example to show how network analysis and visualization software can improve the study of traditional Chinese phonology.
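The linking idea can be sketched as a connected-components computation: each character is joined to the speller character used in its fǎnqiè, and every resulting component is a candidate sound class. The sketch below is only an illustration under stated assumptions. The handful of fǎnqiè entries is a toy sample that should not be treated as verified Guǎngyùn data, and the function name and the union-find implementation are choices made for this example, not the paper's software.

```python
from collections import defaultdict

# Toy fǎnqiè entries: character -> (upper speller, lower speller).
# Illustrative sample only; not verified against the Guǎngyùn.
FANQIE = {
    "冬": ("都", "宗"),
    "當": ("都", "郎"),
    "都": ("當", "孤"),
    "公": ("古", "紅"),
    "古": ("公", "戶"),
}


def xilian_groups(fanqie, position=0):
    """Link each character to its speller and return the connected components.

    position=0 links via the upper speller (initials),
    position=1 links via the lower speller (finals).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for char, spellers in fanqie.items():
        union(char, spellers[position])

    groups = defaultdict(set)
    for char in list(parent):
        groups[find(char)].add(char)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    for group in xilian_groups(FANQIE, position=0):
        print(" ".join(group))
```

With the upper spellers, the toy data falls into two components, one containing 冬, 當, and 都 and one containing 公 and 古. The same edge list could be loaded into a network analysis or visualization tool to inspect the components graphically.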