Program (December 10)
Time | Subject | Authors |
09:00 | Keynote talk | |
10:00 | Introduction to the Workshop | P. Wittenburg |
10:15 | No Claims for Universal Solutions | T. Blanke, A. Aschenbrenner, M. Küster, C. Ludwig |
11:15 | Coffee Break | |
11:30 | Managing and Integrating very large Multimedia Archives | D. Broeder, E. Auer, M. Kemps-Snijders, H. Sloetjes, P. Wittenburg, C. Zinn |
12:30 | Lunch Break | |
13:30 | The e-Linguistics Toolkit | S. Farrar, S. Moran |
14:30 | Visualization of Dialect Data | E. Hinrichs, T. Zastrow |
15:30 | Putting Data Categories in their Semantic Context | M. Kemps-Snijders, M. Windhouwer, S. Wright |
16:30 | eAQUA - Bringing modern Text Mining approaches to two thousand years old ancient texts | M. Buechler, G. Heyer, S. Gründer |
17:30 | Discussion and Conclusions | |
18:00 | End of workshop and start of poster session | |
In the Humanities, the availability of new digital technology and increasing amounts of digitized data have triggered the development of several novel research methods. The capability to create and use large digital collections of structured and unstructured resources, together with the emergence of powerful algorithms for processing the data from multiple perspectives, is already affecting all Humanities disciplines. However, to reap the full benefit of e-Science approaches, a number of issues that are specific to the Humanities must be addressed. This workshop aims to address them.
In the past, many resources have been made available in digital form. These include texts and multimedia documents, but also a wide range of meta-data, from annotations of documents, via lexicons and taxonomies, to grammatical descriptions of many natural languages. Because these resources were created independently, in the absence of standards for character encoding, file formats, annotation systems, access rights and IPR, they do not interoperate. Yet the full benefits of e-Humanities can only be realized if independently created resources can be combined as if they formed one large resource. Substantial work therefore remains to be done before each scholar can peruse the combined resources with the same ease as a single homogeneous resource.
So far, only a fraction of the existing documents that are of interest to the Humanities has been digitized. The same holds for knowledge sources such as lexicons and grammars. We are therefore seeing, and will continue to see, projects aimed at digitizing additional resources. To avoid expensive repair measures to achieve interoperability after these projects are completed, standards must be developed for all levels, from character encoding to the semantics of meta-data. Standardization activities are under way, but they are far from complete.
The distributed character of the resources, combined with the local expertise needed to keep them up to date, naturally leads to a Data Grid. The enormous amount of computation required for advanced automatic pattern detection and other machine learning techniques gives rise to the need for Grid Computing. Both aspects of Grid-based processing are likely to impose special requirements related to the type of data, the type of questions that scientists ask, and access rights.
The specific questions addressed in the Humanities and the specific types of data of interest require the development of dedicated algorithms. Even where these algorithms can be adapted from related disciplines, a large amount of work remains before the e-Humanities research toolbox is reasonably complete and before existing tools can easily be combined into workflow chains by humanities scholars who are not technical experts.
e-Humanities can only be successful if it is possible to provide computer tools that support scholars in their research, rather than forcing them to spend a great deal of time learning how to use new tools or, even worse, developing new tools. To prepare researchers for using the emerging e-Humanities tools, new courses must be developed for undergraduate and graduate programs. However, even the best possible education cannot compensate for poorly designed tools. Therefore, the e-Humanities toolbox must come with an excellent user interface.
Papers submitted for presentation at the workshop should report original research that has not been published elsewhere. In addition, we invite position papers that make solid contributions to the design of a research roadmap for the e-Humanities.
All papers submitted for presentation at the workshop will be reviewed by at least three members of the Program Committee.
In line with the general aim of the workshop, we invite papers in all of the areas indicated above, including the following topics:
- advanced e-Humanities research scenarios supported by language resources and technology
- advanced collaboration scenarios for geographically distributed collaborative research
- text and media integration and interoperability
- advanced computational modeling
- development of novel tools for Humanities research
- flexible knowledge weaving technology
- data and compute Grids
- advanced user interfaces supporting advanced e-Humanities methods
- education and training for e-Humanities researchers
- accessibility, legal and ethical issues involved in e-Humanities scenarios
- impact of e-Humanities on the research process and changes of the role of the researcher
- other topics that fit the general goal of the workshop
The full-day workshop will comprise two invited lectures, oral and poster presentations. The workshop will conclude with a discussion that should contribute to the roadmap for future research in the field.
Accepted papers will be published in the workshop proceedings. We intend to publish extended versions of the most interesting papers, together with the results of the panel discussion, as a book or as a special issue of a leading journal in the field.
Final submission of camera-ready papers: 24 October 2008
Submitted papers must not exceed eight pages and must follow the conference formatting instructions. Only PDF documents without page numbers will be accepted.
Please send your paper as an email attachment to ehumanities [at] clarin.eu
Organizers
(CLARIN (http://www.clarin.eu/) and DARIAH (http://www.dariah.eu/) will take care of continuity)
Peter Wittenburg | MPI, Nijmegen (chair) |
Laurent Romary | MPDL, Berlin |
Sheila Anderson | AHDS, London |
Peter Doorn | DANS, Den Haag |
Tamas Varadi | Academy of Sciences, Budapest |
Steven Krauwer | Utrecht University |
Program Committee
Nicoletta Calzolari | CNR, Pisa |
Martin Wynne | OTA, Oxford |
Gerhard Budin | U. Vienna |
Tamas Varadi | Academy of Sciences, Budapest |
Stelios Piperidis | ILSP, Athens |
Carlos Levinho | Museu do Índio, Rio |
Sven Strömquist | U. Lund |
Kiril Simov | Academy of Sciences, Sofia |
Bente Maegaard | U. Copenhagen |
Jost Gippert | U. Frankfurt |
Eva Hajicova | CU Prague |
Dan Tufis | Academy of Sciences, Bucharest |
Walter Daelemans | U. Antwerp |
Kee-Sun Choi | KAIST, Daejon |
Helen Aristar-Dry | Eastern Michigan U. |
Gary Simons | SIL, Atlanta |
Sadaoki Furui | Tokyo Institute of Technology |
Marc Kemps-Snijders | MPI, Nijmegen |
Laurent Romary | MPDL Berlin |
Sheila Anderson | AHDS, London |
Steven Krauwer | Utrecht University |
Peter Wittenburg | MPI, Nijmegen |
Chu Ren Huang | HK Poly U. HK and Acad. Sinica, Taipei |
Peter Doorn | DANS, Den Haag |
Sue Ellen Wright | Kent State University, Ohio |
Linda Barwick | Paradisec, Sydney University |
Paul Doorenbosch | Dutch Royal Library, Den Haag |
Heike Neuroth | SUB Göttingen |
Peter Gietz | DAASI, Tübingen |
Fotis Jannidis | TU Darmstadt |
Tony Hey | Microsoft Research |
Abstracts
No Claims for Universal Solutions
The aim of this paper is to review some emerging successful practices in e-Humanities from the perspective of Germany and the UK. All of the reviewed projects involve collaboration between science and computing practices on the one hand and the arts and humanities on the other. The paper argues that claims to universality are the wrong way of promoting future research in e-Humanities. Instead, we need to open local spaces in which arts and humanities researchers can engage with e-Science tools and methodologies in the same way they are used to engaging with other research in their domain. This suggests ‘embedding’ e-Science in arts and humanities research practices.
Managing and Integrating very large Multimedia Archives
Research in the humanities is relying more and more on digital data. In this paper we highlight the specific needs and issues in this field, based on our experience with setting up a large-scale multimedia archive. The focus is especially on interoperability, long-term preservation and future developments.
The e-Linguistics Toolkit
In order to achieve the objectives of the e-Humanities framework, it is necessary for individual humanities fields to take charge and to create their own specific techniques that apply to their own unique varieties of data. The e-Linguistics Toolkit is presented to aid the move of linguistics into this new digital era. We show that achieving this requires data interoperability in terms of encoding, format, and content. Once data are made interoperable, data manipulation in the form of validation and merging is necessary. As one way to ensure immediate usefulness to the ordinary working linguist, the toolkit offers data transformation services among various working formats and lays the foundations for intelligent search.
Visualization of Dialect Data
The field of dialectology relies on lexical knowledge in the form of pronunciation and lexical data. The present paper focuses on the recently developed approach of computational dialectometry, particularly on the scientific visualization techniques that have been developed within this approach. Existing visualization software packages are mature enough to automatically generate maps and other visualization formats in a matter of minutes. This provides the basis for a more in-depth analysis of the primary data. Electronic archives offer the possibility of including scientific visualizations of the analytic results in conjunction with the primary data and the raw output of the analytic algorithms used. This in turn enables other researchers to replicate the results or to use the primary data for further analysis.
Putting Data Categories in their Semantic Context
The TC 37 Data Category Registry (DCR; www.isocat.org) specifies names, authoritative definitions, and other information and constraints for data categories used in a wide range of linguistic resources. Data category selections subsetted and exported from the DCR in the Data Category Interchange Format can be used as the basis for configuring diverse applications. Furthermore, authoritative standardized data category definitions can contribute trustworthy semantic content for the creation of Relation Registries in the extended environment of the DCR, in support of external ontologies and other semantic web resources. Resources that reference DCR specifications will require annotation using unique, location-independent, persistent identifiers, and procedures must be established for maintaining ongoing coordination of external resources that reference the dynamic DCR. Developers are currently exploring approaches to data category modeling in RDF(S) and OWL-DL and plotting navigation strategies for traversing a network of data category and relation registries, as well as linguistic resources.
eAQUA - Bringing modern Text Mining approaches to two thousand years old ancient texts
In this paper we give an overview of our work on a new research project that brings together ancient texts and modern methods from the field of text mining. The project is structured so that it comprises data, algorithms, and applications. We first give a short introduction to the current state of the art and then describe what eAQUA will do and what our methodology is.
Indianapolis
United States