Programme CLARIN Annual Conference 2023

Event name: CLARIN Annual Conference 2023
Date: Monday, 16 October 2023 - Wednesday, 18 October 2023 (all times are CEST)
Location: Irish College Leuven, Leuven, Belgium
CLARIN 2023 | Proceedings

Conference Programme Outline

Lost in Meaning - Found in Translation

Jörg Tiedemann

University of Helsinki

Monday, 16 October, 16:15 - 17:00

Ethical Issues of Generative AI

Laurence Devillers

University Paris-Sorbonne IV/LIMSI CNRS

Wednesday, 18 October, 11:00 - 11:45



Conference Programme Details

Day One

Time Monday 16 October 2023 Room
16:00 - 16:15
  • Conference Opening Session
  • Steven Krauwer Award
16:15 - 17:00

Keynote by Jörg Tiedemann

Lost in Meaning - Found in Translation: Natural Language Understanding with Multilingual Data (slides)

The task of translation involves both language understanding and generation and thus naturally combines the two essential challenges in computational linguistics and language technology. In the FoTran project, we are interested in the ability of neural translation models to pick up linguistic properties and to generalise to meaningful representations when trained on large amounts of multilingual data. Our focus is on the effect of linguistic diversity on abstraction and generalisation. To study this, we need to create the necessary resources and infrastructure. In this talk, I will first introduce the OPUS ecosystem that fuels our research. In the second part, I will concentrate on the experiments, studies and developments that this ecosystem enables within and outside of FoTran. I also welcome discussion of further directions for the multilingual infrastructure we are building and look forward to your input.
17:00 - 18:00

Papers (Poster Format)

Linguistic Resources and Tools for Ukrainian: Grounds for Creating a K-Centre
Olha Kanishcheva and Maria Shvedova
The Making of the CLARIN Resource Family for Oral History: Lessons Learned from ‘Voices from Ravensbrück’ (poster)
Stefania Scagliola, Silvia Calamai, Henk Van Den Heuvel and Christoph Draxler
Libraries as Data Infrastructures
Martin Wynne, Andreas Witt, Peter Leinen and Sally Chambers
A Continuous Integration (CI) Workflow for Quality Assurance Checks for Corpora of Multimodal Interaction (poster)
Anne Ferger, André Frank Krause and Karola Pitsch
The LiRI Corpus Platform (poster)
Jonathan Schaber, Johannes Graën, Daniel McDonald, Igor Mustač, Nikolina Rajović, Gerold Schneider and Noah Bubenhofer
DBBErt: Part-of-Speech Tagging of Pre-Modern Greek Text
Colin Swaelens, Els Lefever and Ilse De Vos
A Multilingual Database for Icelandic L2 Flashcards
Xindan Xu, Þórunn Arnardóttir and Anton Karl Ingason
Korpusnik: A Corpus Summarizing Tool for Slovene
Iztok Kosem, Jaka Čibej, Kaja Dobrovoljc and Simon Krek
Topics in Swedish News on Climate Change: A Timeline 2016 - 2023
Maria Skeppstedt
Sharing the Finnish Dark Web Marketplace Corpus (FINDarC) (poster)
Krister Lindén, Teemu Ruokolainen, Lasse Hämäläinen and Tuomas Harviainen
Swissdox@LiRI – A Large Database of Media Articles Made Accessible to Researchers (poster)
Johannes Graën, Igor Mustač, Nikolina Rajović, Jonathan Schaber, Gerold Schneider and Noah Bubenhofer
Analyses of Information Security Standards on Data Crawled from Company Web Sites Using SweClarin Resources
Arne Jönsson, Subhomoy Bandyopadhyay, Svjetlana Pantic Dragisic and Andrea Fried
Building and Consolidating a FAIR-Compliant Ecosystem of Infrastructures
Cristina Grisot, Noah Bubenhofer, Andrea Malits, Stefanie Strebel, Johannes Graën and Stefan Buerli
Dynamically Chaining APIs: from Dracor to TEITOK
Maarten Janssen
The ACoDe Project: Creating a Dementia Corpus for Icelandic
Elena Callegari, Anton Karl Ingason and Agnes Sólmundsdóttir
Emotion and Abstractness in Austrian Parliamentary Discourse
Tanja Wissik and Klaus Hofmann
Developing Manually-Annotated Corpora for Teaching and Learning Purposes of Brazilian Portuguese, Dutch, Estonian, and Slovene (the CrowLL Project)
Tanara Zingano Kuhn, Carole Tiberius, Špela Arhar Holdt, Kristina Koppel, Iztok Kosem, Rina Zviel Girshin and Ana R. Luís
Dining room wing
18:30 - 19:30 Welcome Reception
Historic Town Hall
Grote Markt 9
19:30 - 22:00 Welcome Dinner
Tiensestraat 8

Day Two

Time Tuesday 17 October 2023 Room
09:00 - 09:10 Presentation by Programme Committee Chair (slides) Aula
09:10 - 09:15 Presentation by Local National Coordinator
09:15 - 10:00 Pitches by CLARIN Committees (slides)
10:00 - 10:30 State of the Technical Infrastructure (slides) Aula
10:30 - 11:00 Coffee Break  
11:00 - 13:00

Thematic Session: Infrastructure

Chair: Jurgita Vaičenonienė

11:00 - 11:20
Standards Information System for CLARIN Centres and Beyond (slides)
Piotr Bański and Eliza Margaretha Illig
11:20 - 11:40

The CLARIN:EL Infrastructure (slides)

Maria Gavriilidou, Stelios Piperidis, Dimitrios Galanis, Juli Bakagianni, Penny Labropoulou, Athanasia Kolovou, Dimitris Gkoumas, Miltos Deligiannis, Kanella Pouli, Iro Tsiouli, Leon Voukoutis and Katerina Gkirtzou
11:40 - 12:00
NB DH-LAB: A Corpus Infrastructure for Social Sciences and Humanities (slides)
Magnus Breder Birkenes, Lars G. Johnsen and Andre Kåsen
12:00 - 12:20 
CORLI CLARIN K-Centre: Development and Perspectives (slides)
Christophe Parisse and Céline Poudat
12:20 - 12:40
The SSH Open Marketplace and CLARIN (slides)
Alexander König, Laure Barbot, Cristina Grisot, Michael Kurzmeier and Edward J. Gray
12:40 - 13:00
CLARIN-IT: Texts, Documents and New Contexts (slides)
Federico Boschetti, Angelo Mario Del Grosso, Riccardo Del Gratta, Francesca Frontini and Monica Monachini
11:00 - 13:00
Teachers' workshop: Using CLARIN in Training and Education (slides)
For more information about the abstracts, please visit the workshop programme page.

11:00 - 12:00 Presentations of Accepted Abstracts

11:00 - 11:10 Welcome and Introduction
Francesca Frontini
11:10 - 11:20 Privacy by Design in Linguistic Research
Henk van den Heuvel

11:20 - 11:30 Teaching Syntax with CLARIN Corpora and Resources 

Antonio Balvet

11:30 - 11:40 Learning Programming in Python for Linguistics and Language Studies

Koenraad De Smedt

11:40 - 11:50 NLP Annotation for Digital Scholars 

Maarten Janssen and Silvie Cinková 
11:50 - 12:00 DH-Course Registry: A Bridge Between Infrastructures, DH Masters Degrees and Industry? 
Amelia Sanz, Vicky Garnett, Tom Gheldof, Adeline Joffres, Iulianna van der Lek, Edward Gray,

12:00 - 12:10 Discussion

12:10 - 13:00 Demo of the CLARIN Learning Content in the UPSKILLS project 

12:10-12:20 Introduction to the UPSKILLS Project 
Stavros Assimakopoulos 
12:20 - 12:35 Introduction to Language Data: Standards and Repositories
Iulianna van der Lek 
12:35 - 12:50 Automatic Speech Recognition and Forced Alignment
Louis ten Bosch 

12:50 - 13:00 Discussion & Wrap-Up

13:00 - 13:45 Lunch
13:30 - 14:30 PhD Poster Session Dining room wing
14:30 - 15:30

Thematic Session: ParlaMint

Chair: Maciej Piasecki

14:30 - 14:50

The ParlaMint Project: Ever-Growing Family of Comparable and Interoperable Parliamentary Corpora (slides)

Maciej Ogrodniczuk, Petya Osenova, Tomaž Erjavec, Darja Fišer, Nikola Ljubešić, Çağrı Çöltekin, Matyáš Kopp, Katja Meden and Taja Kuzman

14:50 - 15:10

Workflow and Metadata Challenges in the ParlaMint Project: Insights from Building the ParlaMint-UA Corpus (slides)

Anna Kryvenko and Matyáš Kopp
15:10 - 15:30

Adding Political Orientation Metadata to ParlaMint Corpora (slides)

Tomaž Erjavec, Katja Meden and Jure Skubic
15:30 - 16:00 Coffee Break
16:00 - 17:20

Thematic Session: Tools

Chair: Vincent Vandeghinste

16:00 - 16:20

MATEO: Machine Translation Evaluation for Users and Developers (slides)

Bram Vanroy
16:20 - 16:40
Domain-Specific Languages for Epigraphy: The Case of ItAnt (slides)
Luca Rigobianco, Federico Boschetti and Valeria Quochi
16:40 - 17:00

Finding Dutch Multiword Expressions (slides)

Jan Odijk, Martin Kroon, Tijmen Baarda, Ben Bonfil and Sheean Spoel
17:00 - 17:20
Automatic Anonymisation of Human Faces in Images of Authentic Social Interaction: A Web Application (slides)
André Frank Krause, Anne Ferger and Karola Pitsch
17:30 - 19:00 Bazaar Poster Session Dining room wing
19:30 - 22:30 Conference Dinner
Faculty Club
Groot Begijnhof 14

Day Three

Time Wednesday 18 October 2023 Room
09:00 - 10:20

Thematic Session: Corpora

Chair: Tomaž Erjavec

09:00 - 09:20
A Spoken Academic Belgian Dutch Corpus (slides)
Vincent Vandeghinste, Jolien Mathysen, Patrick Wambacq and Elke Peters
09:20 - 09:40
NGT-HoReCo and GoSt-ParC-Sign: Two New Sign Language - Spoken Language Parallel Corpora (slides)
Mirella De Sisto, Dimitar Shterionov, Lien Soetemans, Vincent Vandeghinste and Caro Brosens
09:40 - 10:00
Teaching Syntax with CLARIN Corpora and Resources (slides)
Antonio Balvet
10:00 - 10:20
A New CLARIN Resource Family for Lexical Semantic Change Research (slides)
Paola Marongiu, Fahad Khan and Barbara McGillivray
10:20 - 11:00 Group Photo and Coffee Break
11:00 - 11:45

Keynote by Laurence Devillers

Ethical Issues of Generative AI (slides)

In this keynote, I offer studies and reflections on the ethical issues of generative artificial intelligence (AI). What distinguishes generative AI systems is that they are built on generative models that can produce many kinds of output: text or images for purposes such as translation, production of computer code, chatbots, decision support and so on. These models, pre-trained on large datasets, can be adapted to a new application using little additional task-specific data. The social and economic impact of generative AI systems is likely to be major in many potential uses, for example in the environment or in healthcare. However, these systems raise many ethical, epistemological, anthropological, psychological, economic, social, political and cultural questions. Some of these issues will keep arising as these technologies are put to new uses, and it is not yet possible to predict all the effects they will have on individuals and society. Since the end of 2022, economic and political actors in several countries have been discussing the impact of language models built with these generative AI systems. Some of these models have an impressive number of parameters. The race for the largest model is ongoing, but it is not certain that larger models deliver higher performance. I co-wrote Opinion No. 7 on the ethical issues of generative artificial intelligence, issued by the CNPEN (National Pilot Committee for Digital Ethics). In this opinion, the CNPEN focuses on the most important ethical issues in light of current experience with generative AI systems, mainly language models.
11:45 - 12:45

Thematic Session: Metadata and Annotations

Chair: Andreas Witt
11:45 - 12:05

Documenting Corpus Annotation in CMDI: State of Affairs (slides)

Jakob Lenardič
12:05 - 12:25
Do Chatbots Dream of Copyright? Copyright in AI-generated Language Data (slides)
Pawel Kamocki, Toby Bond, Krister Lindén and Thomas Margoni
12:25 - 12:45
Between Lexicon and Grammar: Towards Integrated Valencies for Bulgarian (slides)
Petya Osenova and Kiril Simov
12:45 - 13:00
  • Best PhD Poster Award
  • Closing Remarks (slides)
13:00 - 14:00 Lunch
14:00 - 16:00 SAB Meeting
Board room
14:00 - 17:00
K-Centre Workshop (Part I) (Invite-only)

Annual workshop for K-centre representatives, see the event page.
SSH Open Marketplace Workshop (cancelled)

This workshop aims at supporting researchers interested in creating a workflow in the SSH Open Marketplace. Following a brief presentation of what the SSH Open Marketplace is and how it works, participants will be supported by members of the Editorial Board of this discovery portal to write and document their research scenarios, based on the use of CLARIN tools, services and data, for example the CLARIN Resource Families or tools from the Language Resource Switchboard. Workflows are an ideal way to share one's research resources and to harness the power of the SSH Open Marketplace to contextualise tools and services with publications, datasets and training resources, thus presenting a research activity from A to Z in an easy-to-follow and reproducible way.
EuReCo Workshop (Invite-only)

The EuReCo workshop brings together representatives of National Corpora from CLARIN countries. Its aim is to explore the possibilities of launching an initiative toward a large multilingual and distributed reference corpus for European languages that would connect these existing resources. Such an initiative could potentially develop into a new CLARIN flagship project. It would enable linguists to explore corpora of different languages, especially annotated ones, by means of the CLARIN infrastructure. Eventually, this project could lead to the creation of a large comparable corpus of European languages accessible through a single access point. For more details, including the agenda, please refer to this link.

  • CR2
  • CR7
  • Aula

Day Four

Time Thursday 19 October 2023 Room
09:00 - 13:00 K-Centre Workshop (Part II) (Invite-only) CR2