CLARIN Café - ParlaMint

General Information

This CLARIN Café is organised by Maciej Ogrodniczuk (Polish Academy of Sciences), Petya Osenova (Bulgarian Academy of Sciences), and Tomaž Erjavec (Jožef Stefan Institute).

The CLARIN host is Darja Fišer, CLARIN's Executive Director.

  • Date: 30/01/2024
  • Time: 14:00 - 16:00 (CET)
  • Venue: CLARIN virtual Zoom meeting

ParlaMint was a CLARIN Flagship project focused on creating comparable and uniformly annotated corpora of parliamentary debates in Europe. The project produced several releases of the corpora. The latest, ParlaMint 4.0, comprises 29 corpora totalling over 1.1 billion words. The corpora are available in three versions: the base TEI-encoded corpora, a linguistically annotated variant, and a variant machine-translated into English with semantic tags. In addition, the corpora can be explored in the noSketch Engine, KonText and TEITOK concordancers.

The corpora were built according to the Parla-CLARIN TEI recommendation, but following the much stricter ParlaMint encoding guidelines and schemas.

The transcriptions mark each speech with its speaker and the speaker's role, and also include marked-up transcriber comments. The corpora have extensive metadata, most importantly on speakers (name, gender, MP and minister status, party affiliation) and on political parties and parliamentary groups (name, coalition/opposition status, Wikipedia-sourced left-to-right political orientation, and CHES variables). The linguistic annotation includes tokenisation; sentence segmentation; lemmatisation; Universal Dependencies part-of-speech tags, morphological features, and syntactic dependencies; and the four CoNLL-2003 named entity classes.

The ParlaMint project was the first of its kind to make national and regional parliaments ‘talk’ to each other and to interested parties in the fields of Natural Language Processing and Digital Humanities.

The presentations will cover the following topics: an overview of the project, the specifics of the ParlaMint 4.0 corpora, the specifics of the machine-translated and semantically tagged version, various experiences in building a ParlaMint corpus, the utility of the ParlaMint corpora and some inspiring impact stories, and the future of the ParlaMint spirit.


How to Join

You can register for free using this link to receive the meeting room details.


Programme
14:00 - 14:05 Opening and CLARIN 1-0-1 (Francesca Frontini, Member of the CLARIN Board of Directors)

14:05 - 14:15 Introduction to ParlaMint (Maciej Ogrodniczuk and Petya Osenova)

14:15 - 14:25 ParlaMint 4.0 corpora (Tomaž Erjavec)

14:25 - 14:30 Adding metadata (Katja Meden and Jure Skubic)

14:30 - 14:35 Machine-translated version (Nikola Ljubešić and Taja Kuzman)

14:35 - 14:40 Semantic tagging (Paul Rayson)

14:40 - 14:45 Impact story from a Computational Linguistics point of view (Bojan Evkoski)

14:45 - 14:50 Talking War: Keeping the Past Alive in the Parliaments of former Yugoslavia (Michal Mochtak)

14:50 - 15:00 The Catalan ParlaMint corpus (Nuria Bel)

15:00 - 15:10 The Hungarian ParlaMint corpus (Noémi Ligeti-Nagy)

15:10 - 15:20 The Austrian ParlaMint corpus (Tanja Wissik and Hannes Pirker)

15:20 - 16:00 Q&A


Recording and Slides