
Standard recommendations

Several repositories point users to this page for advice on what formats are suitable for data submission. Given the decentralised nature of the network, and given that CLARIN's strength comes at least in part from combining various kinds of expertise offered by individual centres, no single list of "universally" recommended formats can exist. For the purpose of storing centres' preferences regarding data formats, the Standards Information System (SIS) has been set up.

You can use the SIS in the following ways when preparing to deposit your data at a CLARIN centre:

  1. access the centre's own list of format recommendations by choosing your centre from the list of centres in the SIS,
  2. check the list of most popular formats for the given purpose,
  3. check which centres are willing to accept data in the format that it is in right now (you can filter the list according to various criteria).

Solution (1) is the most effective as long as the centre has stored its preferences in the SIS. If you see a red warning on the information page of that centre in the SIS, do consider asking that centre to improve the listing -- the repository service is for users like you, and you have the right to access the list of recommendations well before you engage in the deposition process. (Making such a list available to users is a requirement for B-centres and a matter of courtesy, and common sense, for C-centres.)

Solution (2) is the next-best tactic. Given the variety of research interests among centres, it may happen that the format that is the most popular in general is nevertheless not fully recommended by your centre of choice, which might complicate the deposition process. Again, it is your right as a user to ask the centre to state its recommendations clearly (and, preferably, via the SIS -- be assured that the centre already knows that and may need just this one tiny nudge from you to address the issue).

Solution (3) is the most roundabout way, but it might turn out to be the most effective, because it presents the data depositor (= you) with an overview of centres that are willing to handle the given data format. Please bear in mind that centres that display a red warning about uncurated recommendations should be approached with a degree of caution, and perhaps even contacted before the deposition process is started.
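
To make the logic of solution (3) concrete, here is a minimal Python sketch of the filtering step. Everything in it is an assumption made for illustration: the `CentreListing` record, the `centres_accepting` function, the centre names and the format assignments are all hypothetical and do not reflect the actual data model or interface of the SIS. The sketch only shows the idea of keeping the centres that list a given format and flagging those whose recommendations are uncurated.

```python
from dataclasses import dataclass, field


@dataclass
class CentreListing:
    """Hypothetical, simplified stand-in for one centre's entry in the SIS."""
    name: str
    curated: bool  # False corresponds to the red "uncurated" warning
    accepted_formats: set[str] = field(default_factory=set)


def centres_accepting(fmt: str, listings: list[CentreListing]) -> list[tuple[str, bool]]:
    """Return (centre name, contact_first) pairs for centres that list `fmt`.

    Centres with curated listings come first; uncurated ones are kept
    but flagged so that the depositor knows to contact them beforehand.
    """
    matches = [c for c in listings if fmt in c.accepted_formats]
    matches.sort(key=lambda c: (not c.curated, c.name))
    return [(c.name, not c.curated) for c in matches]


# Invented example data; not taken from the SIS.
listings = [
    CentreListing("Centre A", curated=True, accepted_formats={"TEI", "CSV"}),
    CentreListing("Centre B", curated=False, accepted_formats={"TEI"}),
    CentreListing("Centre C", curated=True, accepted_formats={"WAV"}),
]

for name, contact_first in centres_accepting("TEI", listings):
    note = "contact before depositing" if contact_first else "listing is curated"
    print(f"{name}: {note}")
```

In practice the filtering is done in the SIS itself; the sketch merely mirrors the reasoning behind it, namely that an uncurated listing is a reason to get in touch with the centre first, not a reason to rule it out.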

 

For more information on the use of standards within CLARIN, see the following web pages:
 
The following document, put together by a team of experts in 2009, is included here mostly for its historical and sentimental value. It is probably the first overview of language research and technology standards recommended in the preparatory phase of the CLARIN project: "Standards for LRT" v6 (2009). If a centre points you to that document as providing information about "recommended data formats", do kindly consider doing yourself and them the favour of pointing out that the information value of that document is, nowadays, close to nil (pieces of it have negative information value), and that the Standards Information System should be the way to go (again, the centre knows that and may just need a little nudge from you).