One of the fundamental services of the CLARIN infrastructure is ensuring that language resources can be archived and made available to the community in a reliable manner. To help researchers store their resources (e.g. corpora, lexica, audio and video recordings, annotations, grammars) in a sustainable way, many of the CLARIN centres offer a depositing service. They store the resources in their repository and assist with the technical and organisational details. Depositing has a wide range of advantages:
- Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases)
- Resources can be cited easily with a persistent identifier
- The resources and their metadata will be integrated into the infrastructure, making it possible to search for them efficiently
- Password-protected resources can be made available via an institutional login
- Once resources are integrated in the CLARIN infrastructure, they can be analysed and enriched more easily with various linguistic tools (e.g. automated part-of-speech tagging, phonetic alignment or audio/video analysis).
Centres Offering Depositing Services
The following certified CLARIN B-centres offer depositing services:
Centre | Location | Depositing offer |
---|---|---|
ACDH-CH | Austria | Any linguistic and/or NLP data and tools |
LINDAT-CLARIAH/CZ | Czech Republic | Any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, machine translation systems, web services, etc. |
LINDAT-CLARIAH/CZ | Czech Republic | Language Resource Inventory: An easy-to-use inventory for language resources (and tools), which allows you to browse through submissions and submit metadata. It differs from other depositing services in that it does not require users to upload (meta)data and that it can be used immediately, without contacting the host first. |
CLARIN-DK-UCPH | Denmark | Danish language resources: The focus is on written and spoken language resources. Accepted deposits include text corpora and annotated texts, IMDI sessions containing audio, video and their annotations, as well as lexicons and other data. |
CELR | Estonia | Estonian language resources: Texts, corpora, audio and video recordings, lexical data, terminologies, tools for , etc. |
ORTOLANG | France | Archiving of oral and linguistic data. |
FIN-CLARIN | Finland | All language resources related to Finnish, Finland Swedish and the Fenno-Ugric languages, as well as other language resources created in Finland. |
BAS | Germany | Corpora of spoken languages that contain at least one measured signal based on the physical processes of speech production (e.g. acoustic signals, videos, series of measurements, series of pictures). |
BBAW | Germany | German corpora or parallel corpora, historical prints and manuscripts (in German), lexical resources (also in German). |
IDS | Germany | Resources on the German language. |
IMS | Germany | Language resources (e.g. corpora, treebanks, lexical resources) and NLP tools (also language models, web services etc.); special focus on domain adaptation. |
SFS | Germany | All language resources. |
UdS | Germany | Multilingual corpora (parallel, comparable) and corpora including specific registers. |
CLARIN:el | Greece | Greek language resources. |
ILC4CLARIN | Italy | Depositing services for language datasets and tools for research, especially for Italian and classical languages. |
Dutch Language Institute | Netherlands | Dutch and Belgian Dutch language resources. |
Meertens Instituut/HuC | Netherlands | Resources pertaining to Dutch language and culture. |
The Language Archive | Netherlands | All language data, in particular data related to the languages and cultures of small and endangered speech communities. |
CLARINO Bergen Centre | Norway | Depositing services for language datasets and tools for research. |
CLARIN-PL | Poland | Polish language resources. |
PORTULAN CLARIN | Portugal | Language data and tools: corpora, treebanks, lexica, language models, parsers, taggers, etc. |
CLARIN.SI | Slovenia | Any linguistic and/or NLP data and tools. |
Other Organisations
These organisations also offer reliable and largely compatible depositing services:
Centre | Location | Depositing offer |
---|---|---|
Language Archive Cologne | Germany | All audio and audio-visual language resources, in particular from endangered and under-resourced languages, as well as recordings of oral literature. |
ERCC | Italy | The Eurac Research CLARIN Centre (ERCC) is a dedicated repository for language data hosted by the Institute for Applied Linguistics (IAL), Eurac Research. Focus on learner corpora, CMC data and language variety data. |
| | Netherlands | All digital research data. |
Språkbanken | Norway | C-centre with depositing services for language datasets for R&D involving Norwegian (Bokmål, Nynorsk) or official minority languages in Norway (Sami, Kven). |
TROLLing | Norway | The Tromsø Repository of Language and Linguistics (TROLLing), a repository of data, code, and other related materials used in linguistic research. |
Oxford Text Archive | UK | Electronic literary and linguistic resources. |
If your centre or repository is not listed here and you would like it to be added, please get in touch by sending an email to clarin [at] clarin.eu.