Digitization Lifecycle Mapping the Landscape of eResearch

Mapping the Landscape of eResearch

Text - Image - Annotation

Harnack Haus, Berlin, Germany, February 22-23, 2012

The workshop at a glance

To deepen insight into current eResearch applications and to stimulate broader discussion, several Max Planck Institutes within the Humanities and Social Sciences Section launched this workshop, which is organized by the Max Planck Digital Library.

The workshop aims to bring researchers and scholars, IT professionals, "cybrarians" and project staff into contact with one another. Selected professionals will present their projects and findings on text, image and annotation to an expert audience.

It is an explicit goal of the workshop that participants think outside the box and beyond their institutional and technical context. Contributors and participants are asked to report in depth on their topics, solutions and tools, and especially on the problems they encountered and ways to address them. Additionally, the workshop aims to provide an open space for discussion and a platform for networking. Communication, exchange and any resulting collaboration are explicitly encouraged.

Mapping the Landscape of eResearch addresses core fields, key issues and problems of digitization projects and Virtual Research Environments that recur in very different research contexts when such projects are planned and realized:

Status and future development of the Text Encoding Initiative (TEI)

Linguistic tools and processes, linguistic computing

Images: Viewing specifications, image administration and presentation tools, visualization

Referencing: reference data and annotations, layer solutions, markup tools

Background

This workshop was planned and realized on behalf of the MPG project Digitization Lifecycle. For more information about Digitization Lifecycle, please visit our project website (text in German) or download the project description in English.

Programme

Each main lecture lasts 45 minutes: 30 minutes for the presentation and 15 minutes for discussion.

22.02.2012	Topic
12:30-13:00	Registration
13:00-13:10	Karl Härter (Max Planck Institute for European Legal History): Workshop Introduction
13:10-13:50	Introducing Digitization Lifecycle (DLC). Malte Dreyer (Max Planck Digital Library): Technical Implications; Jan Simane (Kunsthistorisches Institut in Florenz, Max-Planck-Institut): Scientific Implications
13:50-14:00	Introduction: Text in DLC
14:00-15:30	Lou Burnard (formerly Oxford University Computing Services), Sebastian Rahtz (Oxford University Computing Services): The TEI: private and public concerns; Christoph Ringlstetter (Center for Information and Language Processing CIS, University of Munich): Improving Access to Historical Documents - Special Lexica and Resources
15:30-16:00	Coffee Break
16:00-16:10	Introduction: Images in DLC
16:10-17:40	Ute Dercks (Photo Library of the Kunsthistorisches Institut in Florenz, Max-Planck-Institut): Cutting-edge technology meets the Middle Ages. CENOBIUM - A Project for the Multimedia Representation of Romanesque Cloister Capitals in the Mediterranean Region; Irena Murray (Royal Institute of British Architects): Architecture Global: Linking Collections in the Digital Universe
19:00	Conference Dinner (*)

(*) Please pay individually

23.02.2012	Topic
9:00-9:10	Introduction: Synthesis of the First Day, Text in DLC
9:10-10:40	Georg Vogeler (University of Graz): Lessons from Monasterium.net: More Efficient Cooperation between Science and Cultural Heritage Institutions through Online Collaboration; Christian Thomas (Berlin-Brandenburg Academy of Sciences and Humanities): DTAE: Enlarging the Reference Corpus of the Deutsches Textarchiv (DTA) - Production, Conversion and Interchange of XML/TEI Encoded Full Text
10:40-11:10	Coffee Break
11:10-11:20	Introduction: Annotations in DLC
11:20-13:00	Carsten Blüm (Goethe University Frankfurt): Sandrart.net: An Enriched Online Edition of a 17th Century Text; Erhard Hinrichs, Kathrin Beck (Eberhard Karls University Tübingen): Web-Based Linguistic Annotation: Current Practice and Future Directions
13:00-14:00	Lunch
14:00-15:30	Rainer Simon (Austrian Institute of Technology): Collaborative Media Annotation with YUMA; Georg Schelbert (Humboldt-University Berlin): The Topography of Knowledge. On Georeferencing of Cultural History Data
15:30-16:00	Andreas Thielemann (Bibliotheca Hertziana, Max-Planck-Institut für Kunstgeschichte): Final Remarks and Farewell

Abstracts

Beck, Kathrin; Hinrichs, Erhard (Eberhard Karls University Tübingen): Web-Based Linguistic Annotation: Current Practice and Future Directions

In this talk, we will discuss the potential and the challenges of web-based linguistic annotation in an eHumanities context. We will introduce the virtual research environment WebLicht as a case study to illustrate the general issues that arise in web-based annotation. WebLicht is available as part of the ESFRI infrastructure project CLARIN, whose mission is to establish an integrated and interoperable research infrastructure of language resources and technology. CLARIN aims to overcome the current fragmentation by offering a stable, persistent, accessible and extensible infrastructure.

Blüm, Carsten (Goethe University Frankfurt): Sandrart.net: An enriched online edition of a 17th century text

Sandrart.net is a cooperation project between the Goethe-Universität Frankfurt am Main and the Kunsthistorisches Institut in Florenz (Max-Planck-Institut), funded by the Deutsche Forschungsgemeinschaft. The initial goal was a web-based edition of Joachim von Sandrart’s “Teutsche Academie der Bau-, Bild- und Mahlerey-Künste” (1675/1679/1680); the project has since evolved into a website where the original content has been augmented with metadata, images, translations, annotations and accompanying tools. Technically, the project is a TEI-/database-backed hybrid application that has been created largely using web-based tools.

Burnard, Lou (formerly Oxford University Computing Services); Rahtz, Sebastian (Oxford University Computing Services): The TEI: private and public concerns

The Text Encoding Initiative was designed from the start as a dynamic model that could provide both a firmly anchored model for well-understood structural components and analyses of digital texts, and a framework in which scholars could record information freely, in an open-ended and non-prescriptive way. Underlying this was an assumption that the results would be interoperable, but only relatively recently has this been tested in large-scale practice. Tensions have now started to emerge between those who want the TEI to be entirely prescriptive, or to have more mandatory components, and those who argue that it is a purely descriptive decoration whose appearance of general machine interoperability was never a real possibility.

In this talk we will look at some of the components of the TEI that cause tensions (loose and multi-choice content models, shortcuts, open-ended attribute values, etc.), and at some of the ways the TEI community can consider safely exposing texts to interchange (data extraction to RDF, mapping equivalences to simplify markup, manifesting constraints in ODD, etc.).

Dercks, Ute (Photo Library of the Kunsthistorisches Institut in Florenz, Max-Planck-Institut): Cutting-edge technology meets the Middle Ages. CENOBIUM - A Project for the Multimedia Representation of Romanesque Cloister Capitals in the Mediterranean Region

The use of new technologies in the documentation and study of Cultural Heritage sites has been an important issue since the 19th century. The invention and diffusion of new means of acquiring and visualizing information have brought revolutionary changes to the way objects and monuments are analyzed in art history and other contexts. The CENOBIUM project combines new techniques of visual representation with web technology in pursuit of new insights into these artifacts, focusing on trans-cultural contacts in the architectural decoration of the twelfth and thirteenth centuries.

Murray, Irena (Royal Institute of British Architects): Architecture Global: Linking Collections in the Digital Universe

Envisioning a progressively linked network of digitized collections in the architecture field is a natural ambition for the Royal Institute of British Architects in London. With collections of over four million items, including drawings, photographs, models, artifacts and rare books, the Institute is developing an innovative model for their digital delivery not just to its 45,000 architect members, but also to students, curators, scholars and the interested public worldwide. The model anticipates a progressive network of international partnerships through which digitized collections can be linked to an underlying structure for worldwide access. The initial concept and modeling will be discussed.

Ringlstetter, Christoph (Center for Information and Language Processing CIS, University of Munich): Improving access to historical documents - special lexica and resources

The vocabulary of historical texts differs from modern vocabulary in various ways. Especially for OCR and Information Retrieval on historical text collections, special electronic lexica are needed. We report on our experience in collecting and analyzing suitable diachronic corpus material and in constructing special lexical resources for historical texts from four centuries: the "hypothetical" lexicon covering rule-based variants, the manually verified lexicon for Information Retrieval, and first experiments on historical named entity recognition with special background resources.

Schelbert, Georg (Humboldt-University Berlin): The Topography of Knowledge. On Georeferencing of Cultural History Data

In geography and its fields of application, GIS systems have been indispensable for many years. With their increasing use in archaeology and in monument preservation, such systems have reached the territory of historical studies as well. As documentation theory tends increasingly towards network models (linked data, semantic web), the concept of place, as a language-independent entity and reference, is also gaining importance in general. However, the GIS systems used so far are often equipped only with a relatively simple database that cannot handle complex metadata models. In addition, information technology expertise and standards are not yet widespread in the cultural history disciplines, so considerable development work remains to be done, as I would like to show with a few practical examples from the domain of Art History and the History of Architecture.

Simon, Rainer (Austrian Institute of Technology): Collaborative Media Annotation with YUMA

The practice of annotation has traditionally played a crucial role in scholarly research: on the one hand, annotations enable scholars to share and exchange knowledge and to work collaboratively on the interpretation and analysis of source material. On the other hand, annotations are a valuable addition to traditional metadata, which is essential for organising and cataloguing, as well as for searching and retrieving objects within collections. The YUMA Universal Media Annotator (YUMA) is an end-user annotation toolkit for different types of digital media content. With YUMA, users create 'Post-It'-style free-text annotations, as well as Semantic Tags that add structured context information. YUMA was developed as a prototype within the EU-funded EuropeanaConnect project and is currently in transition to an Open Source community project, located at http://yuma-js.github.com.

Thomas, Christian (Berlin-Brandenburg Academy of Sciences and Humanities): DTAE: Enlarging the Reference Corpus of the Deutsches Textarchiv (DTA) - Production, Conversion and Interchange of XML/TEI Encoded Full Text

In the course of the project (which runs until 2014), the DTA aims to publish around 1,300 volumes on its own. To further enlarge this ›core collection‹, the software module DTAE (“E” stands for Enlargement or Extension) was developed. With the help of DTAE, external projects can integrate their historical text collections into the DTA reference corpus. They can present their data in a larger context and benefit from the elaborate linguistic search engine and text processing routines of the DTA. In addition, external contributors can integrate or re-import the processed text and metadata into their own websites via <iframe>. DTAE provides routines for uploading metadata, text and images, as well as semi-automatic tools for converting different source formats (plain text, MS Word, TUSTEP, HTML, TEI-XML, …) into the XML/TEI-conformant ›base format‹ of the DTA. DTAE thus demonstrates how interchange and interoperability among projects can work on a large scale. The presentation illustrates this approach with examples of text interchange and text production partnerships between the DTA and its external partners, namely the MPI, the HAB Wolfenbüttel and the Göttingen Academy of Sciences and Humanities. The possibilities and challenges of exchanging XML/TEI documents will be discussed.

Vogeler, Georg (University of Graz): Lessons from Monasterium.net: More Efficient Cooperation between Science and Cultural Heritage Institutions through Online Collaboration

Crowdsourcing and online collaboration are buzzwords in the current public discussion. Monasterium.net is the largest charter database in Europe. From the very beginning, it has tried to implement an environment that supports online cooperation between archives and their users. The talk reports on the experience gained on the way to the project's current state. It presents the concepts behind the core application of this approach, the Monasterium Collaborative Archive (MOM-CA). Finally, it discusses why important obstacles to effective cooperation between cultural heritage institutions and scholars cannot be solved technically.
