CLARIN Workshop at Corpus Linguistics 2023 Conference

Submitted by Karina Berger on 3 August 2023

Written by Paul Rayson

On Sunday 2nd July 2023, CLARIN members hosted an in-person workshop titled ‘What can you do with the CLARIN research infrastructure?’. This took place the day before the Corpus Linguistics (CL2023) conference began at Lancaster University (UK). The workshop focused on the practical ways in which corpus linguists can benefit from the CLARIN network and infrastructure. It was organised by Darja Fišer, Francesca Frontini, Paul Rayson and Martin Wynne, and included a special preview session on ParlaMint, presented by Dario Del Fante. The workshop attracted 25 participants.

The three-hour workshop began with a welcome from Paul Rayson, a CLARIN ambassador from the UCREL research centre at Lancaster University, who introduced the participants to the aims of the event. Next, Martin Wynne from Oxford University, National Coordinator of CLARIN-UK, gave a broad overview of the CLARIN infrastructure, covering the distributed network of over 60 centres of expertise, Resource Families, and key FAIR principles, including the interoperability of language resources, corpora, metadata and tools.

Dario Del Fante from the University of Ferrara (member of the Italian node of CLARIN) then presented the flagship CLARIN project, ParlaMint, which is creating comparable multilingual corpora of parliamentary debates across Europe. This vast resource forms part of CLARIN’s Parliamentary Corpora Resource Family, and brings together transcripts of parliamentary proceedings from multiple countries under a common XML schema, with Universal Dependencies annotation for morphosyntax, named entities and semantic tagging. Workshop participants were presented with potential use cases in political science, digital humanities and corpus linguistics, and were guided through several practical activities, including a corpus-assisted gender analysis and an exploration of parliamentary language around crises. The workshop took place a few days before the full open-access release of ParlaMint 3.0.

Finally, in the second half of the workshop, Martin, Dario and Paul supported the participants as they further explored their chosen case studies, which were selected from a wide set of hands-on activities related to CLARIN’s Virtual Language Observatory, Switchboard, and Federated Content Search. These included extended analyses of gender in parliamentary discourse, creating a linguistically annotated corpus of 19th-century English novels, and searching for linguistic patterns across corpora, as well as advance access to UPSKILLS learning content prepared by Iulianna van der Lek, Training and Education Officer at CLARIN.

All slides and handouts are available on the workshop web page.