
New CLARIN Resource Family: Corpora of Disordered Speech

Submitted by l.gusan@uu.nl on

The CLARIN Resource Families (CRF) provide user-friendly overviews of the language resources available in the CLARIN infrastructure for researchers in the digital humanities, social sciences and human language technologies. The families cover resources of various types, including corpora, lexicons, and software applications and tools. Until now, one family was missing: the Corpora for Speech with Disorders (CSD), that is, corpora containing speech from individuals with language and speech disorders.

CSD are invaluable resources for education and research. However, they are costly and hard to build, and they can be difficult to share: the privacy and confidentiality of the participants must be preserved, and formatting the datasets for sharing and hosting in a repository can require extra work and cost. Overcoming these challenges is important, because sharing data enables better science. Additionally, re-analysis of raw data improves the reproducibility and robustness of research.

This CRF was set up by members of the DELAD Steering group. We made an inventory of the material (datasets and resources) offered through DELAD and through CLARIN centres with expertise in CSD. For DELAD, we consulted our members for any updates. For TalkBank, we concentrated on the relevant resources in TalkBank's Clinical Banks. We also inspected other relevant datasets in CLARIN's Virtual Language Observatory, the ELRA catalogue, and ELRA's LRE Map. Finally, we looked for potential candidates in other resource families. We prepared an online questionnaire and asked potential contributors to submit information on each dataset that could be registered as a resource. The questionnaire covered the details required for the CLARIN resource family listing: name, corpus URL (if available), description, speaker and disorder characteristics, language and size of the dataset, possible annotations, and key publications.

Please feel encouraged to contact us with your feedback and new resources to add!

Contact persons: N.Bessell [at] ucc.ie (Nicola Bessell) and a.lee [at] ucc.ie (Alice Lee) (DELAD, K-centre ACE)

 

Useful information:

  • Paper in CLARIN Annual Conference 2024: Henk Van den Heuvel, Nicola Bessell, Katarzyna Klessa, Alice Lee, Satu Saalasti and Eric Sanders - A CLARIN Resource Family for Corpora of Communication Disorders. [link to be added]