IEEE e-Humanities Workshop

Tuesday, 9 December 2008, 23:00 - Wednesday, 10 December 2008, 22:59

e-Humanities – an emerging discipline: Workshop in the 4th IEEE International Conference on e-Science

Program (December 10)

Time	Subject	Authors
09:00	Keynote talk
10:00	Introduction to the Workshop [slides]	P. Wittenburg
10:15	No Claims for Universal Solutions	T. Blanke, A. Aschenbrenner, M. Küster, C. Ludwig
11:15	Coffee Break
11:30	Managing and Integrating very large Multimedia Archives	D. Broeder, E. Auer, M. Kemps-Snijders, H. Sloetjes, P. Wittenburg, C. Zinn
12:30	Lunch Break
13:30	The e-Linguistics Toolkit	S. Farrar, S. Moran
14:30	Visualization of Dialect Data	E. Hinrichs, T. Zastrow
15:30	Putting Data Categories in their Semantic Context	M. Kemps-Snijders, M. Windhouwer, S. Wright
16:30	eAQUA - Bringing modern Text Mining approaches to two thousand years old ancient texts	M. Buechler, G. Heyer, S. Gründer
17:30	Discussion and Conclusions
18:00	End workshop & start poster session

Aim of the workshop

In the Humanities, the availability of new digital technology and increasing amounts of digitized data has triggered the development of several novel research methods. The capability of creating and using large digital collections of structured and unstructured resources and the emergence of powerful algorithms for processing the data from multiple perspectives are already affecting all Humanities disciplines. However, to reap the full benefit of e-Science approaches, a number of issues that are specific to the Humanities must be addressed. It is the aim of this workshop to do just this.

In the past, many resources have been made available in digital form. These include texts and multimedia documents, but also a wide range of meta-data, from annotations of documents, via lexicons and taxonomies, to grammatical descriptions of many natural languages. Since these resources have been created independently, in the absence of standards for character encoding, file formats, annotation systems, access rights and IPR, they do not interoperate. Yet the full benefits of e-Humanities can only be had if independently created resources can be combined as if they formed one large resource. Therefore, substantial work remains to be done to reach a situation in which each scholar can peruse the combined resources with the same ease as if they formed one homogeneous resource.

So far, only a fraction of the existing documents that are of interest to the Humanities has been digitized. The same holds for knowledge sources such as lexicons and grammars. Thus, we are seeing, and we will be seeing, projects aimed at digitizing additional resources. To avoid the need for expensive repair measures to enable interoperability after the completion of these projects, standards for all levels, from character encoding to the semantics of meta-data, must be developed. Standardization activities are under way, but they are far from completion.

The distributed character of the resources, in combination with the local expertise that is needed to keep them up-to-date, naturally leads to a Data Grid. The enormous amounts of computation necessary for advanced automatic pattern detection and other machine learning techniques give rise to the need for Grid Computing. Both aspects of Grid-based processing are likely to pose special requirements related to the type of data, the type of questions that scientists ask, and the access rights.

The specific questions addressed in the Humanities and the specific types of data that are of interest require the development of dedicated algorithms. Even if these algorithms can be adapted from related disciplines, there is still a large amount of work to be done before the toolbox for e-Humanities research is reasonably complete and before existing tools can easily be combined into workflow chains by the humanities scholar who is not an expert.

e-Humanities can only be successful if it is possible to provide computer tools that support scholars in their research, rather than force them to spend lots of time learning how to use new tools or, even worse, developing new tools. To prepare researchers for using the emerging e-Humanities tools, novel courses must be developed for undergraduate and graduate programs. However, even the best possible education cannot compensate for bad design of the tools. Therefore, the e-Humanities toolbox must come with an excellent user interface.

Call for Papers

Papers submitted for presentation at the workshop should report original research that has not been published elsewhere. In addition, we invite position papers that make solid contributions to the design of a research roadmap for the e-Humanities.

All papers submitted for presentation at the workshop will be reviewed by at least three members of the Program Committee.

Against the background of the general aim of the workshop we invite papers in all areas indicated above. Thus, the following topics will be covered:

advanced e-Humanities research scenarios supported by language resources and technology
advanced collaboration scenarios for geographically distributed collaborative research
text and media integration, interoperability
advanced computational modeling
development of novel tools for Humanities research
flexible knowledge weaving technology
data and compute Grids
advanced user interfaces supporting advanced e-Humanities methods
education and training for e-Humanities researchers
accessibility, legal and ethical issues involved in e-Humanities scenarios
impact of e-Humanities on the research process and changes of the role of the researcher
other topics that fit in the general goal of the workshop

The full-day workshop will comprise two invited lectures as well as oral and poster presentations. The workshop will conclude with a discussion that should contribute to the roadmap for future research in the field.

Accepted papers will be published in the workshop proceedings. We intend to publish extended versions of the most interesting papers and the result of the panel discussion in the form of a book, or as a special issue of a leading journal in the field.

Important dates

1st Call for Papers: 5 May 2008

2nd Call for Papers: 16 June 2008

Deadline for Submission of full papers: 5 September 2008 (extended deadline)

Notification of Acceptance: 7 October 2008

Final submission of camera-ready papers: 24 October 2008

Final Program published on the Web: 24 October 2008

Conference and Workshop: 7-12 December 2008

Papers must not exceed eight pages and must follow the conference format instructions. Only PDF documents without page numbering will be accepted.

Please send your paper as an email attachment to ehumanities [at] clarin.eu

Organizers

(CLARIN (http://www.clarin.eu/) and DARIAH (http://www.dariah.eu/) will take care of continuity)

Peter Wittenburg	MPI, Nijmegen (chair)
Laurent Romary	MPDL, Berlin
Sheila Anderson	AHDS, London
Peter Doorn	DANS, Den Haag
Tamas Varadi	Academy of Sciences, Budapest
Steven Krauwer	Utrecht University

Program Committee

Nicoletta Calzolari	CNR, Pisa
Martin Wynne	OTA, Oxford
Gerhard Budin	U. Vienna
Tamas Varadi	Academy of Sciences, Budapest
Stelios Piperidis	ILSP, Athens
Carlos Levinho	Museo d'Indio, Rio
Sven Strömquist	U. Lund
Kiril Simov	Academy of Sciences, Sofia
Bente Maegaard	U. Copenhagen
Jost Gippert	U. Frankfurt
Eva Hajicova	CU Prague
Dan Tufis	Academy of Sciences, Bucharest
Walter Daelemans	U. Antwerp
Kee-Sun Choi	KAIST, Daejon
Helen Aristar-Dry	Eastern Michigan U.
Gary Simons	SIL, Atlanta
Sadaoki Furui	Tokyo Institute of Technology
Marc Kemps-Snijders	MPI, Nijmegen
Laurent Romary	MPDL, Berlin
Sheila Anderson	AHDS, London
Steven Krauwer	Utrecht University
Peter Wittenburg	MPI, Nijmegen
Chu Ren Huang	HK Poly U. HK and Acad. Sinica, Taipei
Peter Doorn	DANS, Den Haag
Sue Ellen Wright	Kent State University, Ohio
Linda Barwick	Paradisec, Sydney University
Paul Doorenbosch	Dutch Royal Library, Den Haag
Heike Neuroth	SUB Göttingen
Peter Gietz	DAASI, Tübingen
Fotis Jannidis	TU Darmstadt
Tony Hey	Microsoft Research

Abstracts

No Claims for Universal Solutions [download]
The aim of this paper is to review some emerging successful practices in e-Humanities from the perspective of Germany and the UK. All the reviewed projects work on the collaboration of science and computing practices with arts and humanities. This paper argues that claims to universality are the wrong way of promoting future research in e-Humanities. We need to open local spaces in which arts and humanities researchers can engage with e-Science tools and methodologies in the way they are used to engaging with other research in their domain. This suggests ‘embedding’ e-Science in arts and humanities research practices.

Managing and Integrating very large Multimedia Archives [paper] [slides]
Research in the humanities is relying more and more on digital data. In this paper we highlight the specific needs and issues in this field, based on our experience with setting up a large-scale multimedia archive. We focus especially on interoperability, long-term preservation and future evolution.

The e-Linguistics Toolkit [download]
In order to achieve the objectives of the e-Humanities framework, it is necessary for individual humanities fields to take charge and to create their own specific techniques that apply to their own unique varieties of data. The e-Linguistics Toolkit is presented to aid the move of linguistics into this new digital era. We show that achieving this requires data interoperability in terms of encoding, format, and content. Once data are made interoperable, data manipulation in the form of validation and merging is necessary. As one way to ensure immediate usefulness to the ordinary working linguist, the toolkit offers data transformation services among various working formats and lays the foundations for intelligent search.

Visualization of Dialect Data [download]
The field of dialectology relies on lexical knowledge in the form of pronunciation and lexical data. The present paper focuses on the recently developed approach of computational dialectometry, particularly on the scientific visualization techniques that have been developed within this approach. Existing visualization software packages are mature enough to automatically generate maps and other visualization formats in a matter of minutes. This provides the basis for a more in-depth analysis of the primary data. Electronic archives offer the possibility of including scientific visualizations of the analytic results in conjunction with the primary data and the raw output of the analytic algorithms used. This in turn enables other researchers to replicate the results or to use the primary data for further analysis.

Putting Data Categories in their Semantic Context [download]
The TC 37 Data Category Registry (DCR; www.isocat.org) specifies names, authoritative definitions, and other information and constraints for data categories used in a wide range of linguistic resources. Data category selections subsetted and exported from the DCR in the Data Category Interchange Format can be used as the basis for configuring diverse applications. Furthermore, authoritative standardized data category definitions can contribute trustworthy semantic content for the creation of Relation Registries in the extended environment of the DCR in support of external ontologies and other semantic web resources. Resources that reference DCR specifications will require annotation using unique, location-independent, persistent identifiers, and procedures must be established for maintaining ongoing coordination of external resources referencing the dynamic DCR. Developers are currently exploring approaches to data category modeling in RDF(S) and OWL-DL and plotting navigation strategies for traversing a network of data category and relation registries, as well as linguistic resources.

eAQUA - Bringing modern Text Mining approaches to two thousand years old ancient texts [download]
In this paper we give an overview of our work on a new research project, which brings together ancient texts and modern methods from the field of text mining. The project is structured so that it comprises data, algorithms, and applications. We first give a short introduction to the current state of the art. After that we describe what eAQUA will do and what our methodology is.

Address

Indianapolis
United States