CLARIN DSpace Workshop

Submitted by Pavel Straňák on

 

The CLARIN DSpace (formerly LINDAT DSpace) is a digital repository system that meets the requirements of a modern, flexible and CLARIN-compatible digital repository. Every day it serves many gigabytes of linguistic data across different CLARIN centres. By November 2016 it had been deployed in eight CLARIN member countries, and other member countries were evaluating it as well.

The CLARIN DSpace has been developed primarily by one centre. However, as it is deployed in more countries and centres, the time has come to revisit the development model to improve the experience and sustainability for all centres and users. To shape the future development of the common CLARIN DSpace repository system, a workshop was held in Prague on 8-10 November 2016.

The workshop explored the priorities of all CLARIN centres that use DSpace. Its main goal was reached through a gentlemen's agreement to commit 0.3-0.5 FTE per centre to the project in the long term. Given the deployment base, this is enough to ensure sustainable development and deployment support for the entire community of CLARIN centres. A dedicated development and support space (a CLARIN Slack channel) was created to share knowledge efficiently and to avoid depending on a single CLARIN centre or a single developer. In addition, an implementation plan was created through the concerted efforts of all involved centres, with issues suggested, commented on and voted for by multiple centres and countries.

Everybody is welcome to join us on the CLARIN Slack at https://clarineric.slack.com/messages/dspace/ and to follow the development at https://github.com/ufal/clarin-dspace/

The following roadmap for the post-workshop implementation project was initiated at the workshop:

  • Unanimous agreement was reached on the principle that we establish a common implementation plan open to developers from any centre.
  • Issues were created and documented in accordance with the original proposal, then discussed and voted on until 12 December. Effort estimates were assigned to all of them. Progress on all issues will be recorded (links below).
  • Developers could volunteer to take up issues until 18 December. After that, the LINDAT/CLARIN team starts picking up and implementing issues in the order of priority set by the voting. Any unassigned issues remain available to interested developers.
  • When all the scheduled issues have been implemented, the budget reserved for the implementation will be distributed to the developers' home centres according to the effort spent.

Some of the work has already progressed over the last couple of weeks.

Workshop goals linked to CLARIN’s strategic priorities

The CLARIN strategy is based on nine pillars. The workshop output and the implementation plan address four of these pillars:

  1. Sustainability:
    1. The main goal of the workshop was reached through a gentlemen's agreement to commit 0.3-0.5 FTE per centre to the project in the long term. Given the deployment base, this is enough to ensure sustainable development. A dedicated development and support space (a CLARIN Slack channel) was created to share knowledge efficiently and to avoid depending on a single CLARIN centre or a single developer.
    2. Simplifying localisations and overlays (local customisations), as specified in (4), will minimise the differences between local branches of the project, ensuring that none of them falls behind through excessive divergence.
  2. Crossing borders:
    1. Every effort has been made to involve developers from as many countries as possible. The implementation plan is kept open, and support is provided for all new developers.
    2. The implementation plan was created through the concerted efforts of all involved centres, with issues suggested, commented on and voted for by multiple centres and countries.
  3. Integration of services:
    1. There are currently several different prototypes that integrate the DSpace repository (which provides the data) with other services such as searching or processing the data: at LINDAT/CLARIN, at CLARIN-PL and, partially, elsewhere. In (5) we plan to unify the approach as much as possible.
    2. In (6) we improve the integration with the Piwik statistics engine.
    3. In (8) we provide integration with rudimentary processing and analysis of popular file formats.
    4. In (3) we integrate the submission workflow with cloud storage services.
  4. User Involvement:
    1. User-friendly integration of popular cloud storage services, making the provision of data for submissions effortless and efficient (3).
    2. Easing the use of the repository by simplifying transfers of big data (1, 3) and by simplifying localisation, thus ensuring that more content is localised and that localisations are better and kept up to date (4).
    3. Supporting users in reporting on their data by providing them with statistics and complete reports on the usage (visits and downloads) of their data (6).

 

List of presentations

Presenter: download

  • Martin Wynne: OTA_repository_model.pdf
  • Menzo Windhouwer: MIandMPI-requirements.pptx
  • Agustin Caminero: CLARIN-DSpace_2016-11-07.pdf
  • Hemed Al Ruwehy: CLARINO_Repo.pptx
  • Riccardo Del Gratta: Dspace_ILC4CLARIN_CLARIN-IT.pdf
  • Cyprian Laskowski: dspace-clarin.si.pdf
  • Marcin Pol: CLARIN-PL_D-SPACE.pdf
  • Pavel Straňák: DSpace_in_CLARIN_overview.pdf
  • Jozef Mišutka: clarin-dspace_prague.pdf

Workshop organizers

Jan Hajič, Charles University in Prague (LINDAT/CLARIN)

Tomaž Erjavec, Jožef Stefan Institute (CLARIN.SI)

Martin Wynne, University of Oxford (CLARIN-UK)

Marcin Pol, Wrocław University of Technology (CLARIN-PL)

Menzo Windhouwer, Meertens Instituut (Clariah)

Martin Matthiesen, CSC - IT Center for Science (Fin-CLARIN)

Pavel Straňák, Charles University in Prague (LINDAT/CLARIN)

Jozef Mišutka, Charles University in Prague (LINDAT/CLARIN)