A Recap on the CLARIN Café on Linguistic Linked Data

Submitted by Linda Stokman on 17 May 2021

About

The CLARIN Café on Linguistic Linked Data took place via Zoom on Thursday, April 29. It was attended by over 80 participants with a range of backgrounds and interests, joining from many different locations (including Australia).

As the name suggests, the café was devoted to the subject of linguistic linked data (LLD) with a focus on the relationship between the CLARIN infrastructure and the LLD community and a view to discussing common goals and strengthening collaborations between the two. The first part of the café consisted of a series of presentations by experts (described in more detail below). The second part was devoted to a discussion.

The event was opened by Dieter Van Uytvanck, Technical Director and Vice Executive Director of CLARIN, who gave a brief introduction to CLARIN's technical and knowledge-sharing infrastructure and briefly described a number of CLARIN initiatives related to linked data. The speakers were then introduced in turn by Fahad Khan, a member of the Italian national CLARIN consortium and researcher at the Istituto di Linguistica Computazionale (ILC-CNR) in Pisa. A number of mouthwatering sweets and pastries also made their appearance throughout the various presentations.

Watch the recording of the Opening and CLARIN 101 on the CLARIN YouTube channel.

LLD: An Introduction

Christian Chiarcos started the proceedings with an introduction to the area of linguistic linked data which covered the foundations of the field. This included a discussion of interoperability in language data. It also covered the basics of linked data, including an introduction to RDF and the linked open data cloud, and the growth in popularity of linguistic linked data.

Watch the recording LLD: An Introduction on the CLARIN YouTube channel.

LLD: Benefits and some ongoing initiatives

Next, a presentation was given by Jorge Gracia, a researcher from the University of Zaragoza and Chair of the “European network for Web-centred linguistic data science” (NexusLinguarum). This short talk reflected on the main benefits and opportunities that linked data technologies bring to the field of language resources. Some ongoing initiatives aimed at exploiting LLD were also briefly introduced, such as the NexusLinguarum COST Action and the Prêt-à-LLOD project.

Watch the recording LLD: Benefits and some ongoing initiatives on the CLARIN YouTube channel.

A view into metadata for LLD

The next presentation was given by Penny Labropoulou, who is a Researcher at ILSP/ARC working on Research Infrastructures, such as the Greek CLARIN, and metadata models for Language Resources. After a short introduction on LD principles and how they apply to metadata, she gave an overview of the most popular metadata vocabularies for LLD. Drawing on the use cases of LingHub and VLO, she discussed the way semantic interoperability is accomplished in catalogues collecting metadata records from various sources and the way they harmonize heterogeneous descriptions of resources. Similarly, she used the Language Resource Switchboard and the clarin:el Workflow Registry to explore semantic interoperability issues across resource types, such as automatic matching between datasets and candidate processing services. The presentation aimed to show the benefits of LD methods and the way these can be exploited within the CLARIN infrastructure.

Watch the recording A view into metadata for LLD on the CLARIN YouTube channel.

Prefixes Matter. CLARIN and LLD in the light of the LiLa Knowledge Base

Marco Passarotti presented an application of LLD to interlink distributed resources for Latin. After discussing a number of ways in which CLARIN and LLD can support each other, he detailed the architecture of the LiLa Knowledge Base, which makes Latin resources interoperable by linking their components (such as tokens in textual corpora and lexical entries in dictionaries) through a large collection of Latin lemmas, described as "canonical forms" following the OntoLex-Lemon model.

Marco also showed the LiLa Knowledge Base in action, detailing the linking between a lemma of the collection of Latin canonical forms and its lexical entry in the Latin WordNet, thus presenting how the information of this specific lexical resource is modeled in LiLa.
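The lemma-based linking described above can be illustrated with a minimal sketch. It is not taken from the presentation and does not reproduce LiLa's actual data: the `example.org` URIs, the `hasLemma` property and the Latin example word are assumptions made for illustration. Only the OntoLex-Lemon namespace and its `canonicalForm` property come from the published model. Triples are kept as plain Python tuples so that the example runs without any RDF library.

```python
# Real namespace of the OntoLex-Lemon model.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
# Hypothetical namespace standing in for the resources being linked.
EX = "http://example.org/"


def uri(namespace, local_name):
    """Return a URI in N-Triples angle-bracket notation."""
    return f"<{namespace}{local_name}>"


# One lemma from the (hypothetical) collection of canonical forms.
LEMMA = uri(EX, "lemma/rosa")

# Components of different resources all point at the same lemma URI.
TRIPLES = [
    # A token in a textual corpus (hasLemma is an assumed property name).
    (uri(EX, "corpus/token/42"), uri(EX, "vocab/hasLemma"), LEMMA),
    # A lexical entry in a dictionary.
    (uri(EX, "dictionary/entry/rosa"), uri(ONTOLEX, "canonicalForm"), LEMMA),
    # A lexical entry in a wordnet-style resource.
    (uri(EX, "wordnet/entry/rosa"), uri(ONTOLEX, "canonicalForm"), LEMMA),
]


def resources_linked_to(lemma, triples):
    """Return every subject whose triple has the given lemma as its object."""
    return sorted(s for s, _p, o in triples if o == lemma)


def to_ntriples(triples):
    """Serialise the triples as one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    for resource in resources_linked_to(LEMMA, TRIPLES):
        print("linked via lemma:", resource)
```

Because the corpus token and both lexical entries share one lemma URI, a single lookup on that URI retrieves all of them, which is the sense in which the lemma collection makes otherwise separate resources interoperable.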

Watch the recording Prefixes Matter. CLARIN and LLD in the light of the LiLa Knowledge Base on the CLARIN YouTube channel.

Discussion and future developments

Prior to the open discussion, Jorge Gracia and Christian Chiarcos gave a quick overview of the main issues and challenges that LLD currently faces, to serve as a basis for later analysis and discussion in the CLARIN Café. Although LLD is a mature field, with a vibrant community behind it and great potential to solve reusability and interoperability issues through FAIR data, a number of challenges have to be addressed to enable broader adoption, for instance finding sustainable hosting solutions and lowering the entry barrier for language resource providers (and consumers). Large infrastructures such as CLARIN ERIC and ELG can play a role here. Hosting services, not limited to the SSH domain, may also be developed within the context of the European Open Science Cloud (EOSC).

Other points raised for discussion included the challenges of creating common linked data vocabularies for differing linguistic theories, and the existing and potential relationships between the CLARIN Concept Registry (CCR) and linked data resources.

After this very promising first café, follow-up meetings are planned in order to set up a joint collaboration. In particular, the organization of hands-on tutorials and activities around the topic of LLD in the context of CLARIN is envisaged.

Watch the recording LLD: Issues and open challenge on the CLARIN YouTube channel.

Watch the recording of the discussion on the CLARIN YouTube channel.

The presentation slides of this CLARIN Café can be found on the event page.

Next CLARIN Cafés

To stay updated on newly scheduled cafés, you can consult the CLARIN news section, subscribe to the CLARIN Newsflash and follow CLARIN on Twitter (#CLARINcafe). The CLARIN Café page will also always provide the latest details.

If you want to receive an individual email for each virtual event organised by CLARIN, you can subscribe to the Virtual Events Announcements mailing list.