General Information

  • 21-22-23 June 2022: Main Conference
  • 20-24-25 June 2022: Workshops & Tutorials


LREC is the major event on Language Resources (LRs) and Evaluation for Human Language Technologies (HLT). The conference provides an overview of the state of the art in LRs and their applications. Participants can exchange information and discuss methodologies, industrial use cases and requirements coming from e-science and e-society, with respect to scientific and technological issues as well as policy and organisational ones.

CLARIN-related activities at LREC 2022

Contributions to the Main Conference


ParlaCLARIN III Workshop – organised by CLARIN ERIC

Monday 20 June, from 9:00 to 13:00 and from 14:00 to 18:00

Palais du Pharo, Old Palace Level 1, Room: Grand Large (floor map)

The ParlaCLARIN III workshop at LREC2022 will focus on the topic of ‘Creating, Enriching and Using Parliamentary Corpora’. Parliamentary (language) data serves as a communication channel between elected political representatives and members of society, thus reflecting socio-politically relevant information. The development of accessible, comprehensive and well-annotated parliamentary corpora is crucial for a number of disciplines, such as political science, sociology, history, and (socio)linguistics. The workshop will bring together developers, curators and researchers of regional, national and international parliamentary debates from across diverse disciplines in the humanities and social sciences.

LEGAL 2022: Legal and Ethical Workshop – co-organised by Ingo Siegert, Khalid Choukri, Mickaël Rigault, Paweł Kamocki, Andreas Witt, Krister Lindén

Friday 24 June, from 9:00 to 13:00 and from 14:00 to 18:00

Palais du Pharo, Old Palace Level 2, Room: Mucem (floor map)

Deep learning technologies for language resources and the demand for high-quality data interactions have increased the need for data collections, which are largely subject to legal constraints. Legal frameworks continuously need to adapt to the advancements in technology, while also taking into consideration the interests of stakeholders. This workshop invites technology and legal experts to discuss current legal and ethical issues concerning human language technology.

SIGUL 2022 Workshop – organised by CLARIN-IT

Friday 24 and Saturday 25 June, from 14:00 to 18:00

Palais du Pharo, Old Palace Level 1, Room: Grand Large (floor map)

The first annual meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022) will take place as part of the LREC2022 conference. The workshop will provide academic and industry researchers with a forum for networking, as well as discussing and presenting cutting-edge research in the sector of natural language processing for under-resourced languages. In the tradition of the CCURL-SLTU Workshop Series, SIGUL 2022 spans the research interest areas of less-resourced, under-resourced, endangered, minority and minoritised languages.

The 4th Financial Narrative Processing Workshop (FNP 2022) – co-organised, among others, by CLARIN ambassador Paul Rayson

Friday 24 June, from 9:00 to 13:00 and from 14:00 to 18:00

Palais du Pharo, Old Palace Level 1, Room: Estaque (floor map)

Oral and Poster Presentations

Day 1, Tuesday 21 June
11:40 - 13:00 (Poster Area 1)
Session P1: Language Resource Infrastructures and Policy issues. Chair: Labropoulou, Penny
Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project (Federica Gamba, Francesca Frontini, Daan Broeder and Monica Monachini) 
15:15 - 16:35 (Auditorium)
Session O5: Language Resource Policies and Management.
Chair: Di Persio, Denise, Co-Chair: Frontini, Francesca
Ethical Issues in Language Resources and Language Technology – A Tentative Categorisation (Paweł Kamocki and Andreas Witt)
16:55 - 18:15 (Poster Area 1)
Session P12: Evaluation and Validation Methodologies (1)
Chair: Refaee, Eshrag Ali A.
The Subject Annotations of the Danish Parliament Corpus (2009-2017) - Evaluation with Automatic Multi-label Classification. (Costanza Navarretta and Dorte Haltrup Hansen)
16:55 - 18:15 (Poster Area 1)
Session P10: Lexicons (1)
Chair: Olsen, Sussi
Making a Semantic Event-type Ontology Multilingual (Zdenka Uresova, Karolina Zaczynska, Peter Bourgonje, Eva Fučíková, Georg Rehm and Jan Hajic; Charles University, German Research Center for Artificial Intelligence (DFKI), Morningsun Technology)
NomVallex: A Valency Lexicon of Czech Nouns and Adjectives (Veronika Kolářová and Anna Vernerová; Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics)
Day 2, Wednesday 22 June
9:30 - 10:50 (Poster Area 2)
Session P14: Corpora and Annotation (2) 
Chair: Ogrodniczuk, Maciej
11:10 - 12:30 (Poster Area 1)
Session P18: Corpora and Annotation (3)  
Chair: Montemagni, Simonetta
Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus (Starkaður Barkarson, Steinþór Steingrímsson, Hildur Hafsteinsdóttir)
15:15 - 16:35 (Poster Area 2)
Session P22: Lexicons (2)  
Chair: Yildiz, Olcay Taner
Constructing a Lexical Resource of Russian Derivational Morphology (Lukáš Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky and Zdeněk Žabokrtský)
15:15 - 16:35 (Poster Area 2)
Session P26: Dialogue and Conversational Systems (2)
Chair: Hartholt, Arno
ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech (Anna Nedoluzhko, Muskaan Singh, Marie Hledíková, Tirthankar Ghosal and Ondřej Bojar)
15:15 - 16:35 (Poster Area 2)
Session P24: Evaluation and Validation Methodologies (2)
Chair: Zeldes, Amir
Quality and Efficiency of Manual Annotation: Pre-annotation Bias (Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková and Jan Hajic; Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics)
16:55 - 18:35 (Poster Area 1)
Session P27: Corpora and Annotation (4)
Chair: Pęzik, Piotr  
The Bulgarian Event Corpus: Overview and Initial NER Experiments (Petya Osenova, Kiril Simov, Iva Marinova and Melania Berbatova)
Day 3, Thursday 23 June
9:30 - 10:50 (Salle 120)
Session O31: Document Classification, Text Categorisation
Chair: Volk, Martin 
Co-Chair: Zhang, Mike
HeLI-OTS, Off-the-shelf Language Identifier for Text (Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén)
9:50 - 10:10 (Salle 92)
Session O32: Lexicon and WordNet 
Chair: Vossen, Piek 
Co-Chair: Frontini, Francesca
Towards the Construction of a WordNet for Old English (Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Diaz Vera, John P. McCrae, Ciara O'Loughlin, William Michael Short and Sander Stolk)
15:15 - 16:35 (Poster Area 2)
Session P38: Less-Resourced Languages (2)
Chair: Soroa, Aitor
Latvian National Corpora Collection – (Baiba Saulite, Roberts Darģis, Normunds Gruzitis, Ilze Auzina, Kristīne Levāne-Petrova, Lauma Pretkalniņa, Laura Rituma, Peteris Paikens, Arturs Znotins, Laine Strankale, Kristīne Pokratniece, Ilmārs Poikāns, Guntis Barzdins, Inguna Skadiņa, Anda Baklāne and Valdis Saulespurēns)
15:35 - 15:55 (Salle 120)
Session O37: Anaphora and Coreference
Chair: Magnini, Bernardo
Co-Chair: De Bruyne, Luna
CorefUD 1.0: Coreference Meets Universal Dependencies (Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes and Daniel Zeman)
Friday 24 June (remote)
Session R2: Corpora and Annotation
Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support. (António Branco, João Silva, Luís Gomes, João Rodrigues)

Contributions to Co-located Events

Oral and Poster Presentations at Co-located Workshops

Monday 20 June
  • Immigration in the Manifestos and Parliament Speeches of Danish Left and Right Wing Parties between 2009 and 2020 (Costanza Navarretta, Dorte Haltrup Hansen and Bart Jongejan; Accepted at ParlaCLARIN III)
  • What if Ground Truth is Subjective? Personalized Deep Neural Hate Speech Detection (Kamil Kanclerz, Marcin Gruza, Konrad Karanowski, Julita Bielaniewicz, Piotr Milkowski, Jan Kocon and Przemyslaw Kazienko; Accepted at NLP Perspective workshop)
  • StudEmo: A Non-aggregated Review Dataset for Personalized Emotion Recognition (Anh Ngo, Agri Candri, Teddy Ferdinan, Jan Kocon and Wojciech Korczynski; Accepted at NLP Perspective workshop)
Friday 24 June
  • 9:30–9:50 Extending the SSJ Universal Dependencies Treebank for Slovenian: Was it Worth it? (Kaja Dobrovoljc and Nikola Ljubešić; Accepted at LAW XVI, the 16th Linguistic Annotation Workshop)
  • 11:40–12:40 Advantages of a complex multilayer annotation scheme: The case of the Prague Dependency Treebank (Eva Hajičová, Marie Mikulová, Barbora Štěpánková and Jiří Mírovský; Accepted at LAW XVI, the 16th Linguistic Annotation Workshop)

CLARIN Booth at LREC2022

CLARIN will be present throughout the whole conference with a booth. Visit us to get to know CLARIN better, to talk to people from the CLARIN network, or to browse through our latest publications. At the booth you can also watch tutorial videos such as ‘CLARIN and EOSC’ and ‘CLARIN and notebooks’.
Booth Attendance Schedule (Tuesday 21, Wednesday 22 and Thursday 23 June)

Morning coffee break
11:20 - 11:40
  • Members of the CLARIN ERIC Board of Directors: Franciska de Jong, Dieter Van Uytvanck
10:50 - 11:10
  • Kaja Dobrovoljc, dedicated to the paper ‘Spoken Language Treebanks in Universal Dependencies: an Overview’ (Kaja Dobrovoljc)
  • Petya Osenova, Kiril Simov, dedicated to the paper ‘The Bulgarian Event Corpus: Overview and Initial NER Experiments’ (Petya Osenova, Kiril Simov, Iva Marinova and Melania Berbatova)
  • Fahad Khan, dedicated to the paper ‘Towards the Construction of a WordNet for Old English’ (Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Diaz Vera, John P. McCrae, Ciara O'Loughlin, William Michael Short and Sander Stolk)

Lunch
13:00 - 14:30
  • Francesca Frontini, Monica Monachini, dedicated to the paper ‘Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project’ (Federica Gamba, Francesca Frontini, Daan Broeder and Monica Monachini)
  • Starkaður Barkarson, dedicated to the paper ‘Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus’ (Starkaður Barkarson, Steinþór Steingrímsson, Hildur Hafsteinsdóttir)
  • Tommi Jauhiainen, dedicated to the paper ‘HeLI-OTS, Off-the-shelf Language Identifier for Text’ (Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén; University of Helsinki)

Afternoon coffee break
  • Paweł Kamocki, dedicated to the paper ‘Ethical Issues in Language Resources and Language Technology – A Tentative Categorisation’ (Paweł Kamocki and Andreas Witt)

