Tour de CLARIN: Interview with Amalia Todirascu

Submitted by Karina Berger on 4 February 2021

Amalia Todirascu is a computational linguist who specialises in Natural Language Processing (NLP). She is a member of the steering committee of CORLI, a group of experts in linguistics, and has successfully used CLARIN language technologies in teaching and research. The interview was conducted via email.

Please introduce yourself. Could you describe your academic background and your current academic position?

I was trained in computer science with a specialisation in NLP. My thesis was about building ontologies from texts. My work is at the crossroads of NLP and linguistics. For instance, I was part of the ALECTOR project, which aimed to measure the complexity of a text and to work out how to simplify it so that it is more accessible to children with dyslexia. I’m also involved in the adaptation of NLP resources for education, specifically for language learners.

I now specialise in the construction and preparation of linguistic resources for automatic translation, semantic and discourse annotation, etc. For instance, I was part of the team that built the DEMOCRAT corpus, a large manually annotated coreference corpus, and that also developed specialised tools for automatic coreference detection (cofr), for manual coreference annotation (SACR) and for corpus exploration (TXM extensions for handling coreference-annotated corpora). In 2004, I joined the LiLPa laboratory, which focuses on linguistics, phonetics, language learning, sociolinguistics and computational linguistics.

How did you get to know CLARIN?

I discovered CLARIN a long time ago, in the context of a collaborative project on collocation detection in German, Romanian and French. My German colleague (Ulrich Heid) and my Romanian colleague (Dan Tufis) were very active in CLARIN at that time (2008/2009), and I was invited to attend a workshop in Berlin organised by CLARIN-DE. I was also able to participate in different workshops organised by the German CLARIN consortium.

At that time, France was not officially involved in CLARIN activities, but some people, like Jean-Marie Pierrel (ORTOLANG founder) and Laurent Romary (former director of DARIAH) were already very active in disseminating information about CLARIN among French researchers.

What is your role in the French consortium?

I am a member of the steering committee of CORLI, a group of experts in linguistics funded by Huma-Num, which is the cornerstone of the French CLARIN-FR consortium.

As a committee member, I was actively involved in setting up the recently established CORLI Knowledge Center. I think the centre is important at a national level because it will provide help and training activities related to the use of French linguistic tools and resources, such as those that are offered by the COCOON and ORTOLANG Centres to different research communities in France. The K-centre also aims to give more visibility to French resources and tools, especially by organising working groups that create and moderate research networks focused on tools and practices in French linguistics.

As a teacher, I realise that it is necessary to introduce doctoral students and even Master's students to the tools and practices around digital technologies as early as possible. For this purpose, the K-centre is an opportunity to generalise digital practices and the general philosophy of CLARIN.

What CLARIN tools and resources do you use in your own work?

I mostly use CLARIN resources in Master’s courses in linguistics and language technologies. CLARIN is particularly important in my introduction to corpus linguistics class, where I use resources (Language Resource Inventory, Virtual Language Observatory) to find corpora, but also tools like WebLicht to show students how they can search for information in a corpus, as well as how to build a simple corpus from scratch. I also use them to illustrate the usefulness of annotations and how to encode corpora in TEI.

WebLicht in particular is a very user-friendly tool for showing students and young researchers who lack experience with NLP or computational linguistics what can be done with language tools. To give a simple example, the fact that you can annotate simple formats, like .docx files produced by Word, with most tools that are offered on the CLARIN Switchboard has proven crucial from the perspective of accessibility.

Generally, the tools and resources provided by CLARIN are very interesting not only for linguists, but also for other researchers in the social sciences. For example, a historical corpus (The Chronicles of Jean Froissart about the first part of the Hundred Years’ War) was annotated with named entities for persons, organisations and places. This information was used to retrieve all the occurrences of an entity in the corpus and to study its relations to the organisations, places or persons represented in these texts.

However, it must be taken into account that there is a lack of NLP tools for French in the CLARIN infrastructure; for instance, currently only UDPipe, that is, one out of eight tools on the Switchboard, offers parsing for French. Going forward, I believe it is necessary for CLARIN-FR to identify the missing tools dedicated to French, such as keyword extractors, terminology extractors, coreference annotators or NER tools, and to integrate them into existing toolchains like WebLicht in order to facilitate their dissemination and use.

What do you think needs to be developed to enrich CLARIN and make it better known within the French communities?

First, we need to develop more training materials that showcase the use of CLARIN services, especially the language tools and how they can be applied to the resources in repositories like ORTOLANG in the case of France. For this purpose, it would be a good idea to invite external collaborators from the CLARIN network who have already prepared and used such materials.

Additionally, it would be a good idea for Huma-Num and the K-centre to organise workshops or webinars about the use of tools for French, with a focus on practical examples that pertain to applied fields in digital humanities and social sciences; for instance, how to choose the right tools for named entity recognition for a French text in geography. SSH communities would also benefit from CLARIN-FR promoting the adoption of good practices, like the use of standards.

For young researchers (but not only them), the French consortium should make better use of the mobility grants offered by CLARIN to allow them to discover research done in other consortia, thus strengthening cross-border collaboration.

An important point, which is perhaps unique to the situation in France, is to facilitate exchange between different research communities dealing with linguistics, which are still quite uncoordinated.