ViRR

Revision as of 10:19, 12 October 2007 by Uat (talk | contribs) (→‎Data formats)

ViRR (Virtueller Raum Reichsrecht) is a collaboration of the MPDL with the Max Planck Institute for European History of Law.


Introduction

The institute already has some experience with digital collections, e.g. in the scope of its Digital Library.


Vision

A lot of information about the law of the Holy Roman Empire exists, but it is distributed over several institutions and libraries. To bring these decentralized sources together, a common internet platform is desirable.

The first step should be to make the already existing digital scans of 15 books (together more than 16000 images) available online with free browsing and navigation functionality. After that, the collection should be expanded with transcriptions of the tables of contents of the books, and later with full transcriptions of all pages.

One step further would be to expand the collection with pictures, secondary literature, sources, publications and digitized works from other institutions and projects such as the DRW, the cooperation between BSB and GoogleBooks, information from library catalogs, databases of the institute and further documentation such as the Polizeyordnungen.

In addition, a comprehensive collection of links relevant to the "Reichsrecht" should be integrated, analogous to the institute's link collection covering the whole European History of Law.

Finally, a search across all integrated sources is desirable. This search should include an index and a list of synonyms for the different spellings of the titles.


Aim

The "Virtueller Raum Reichsrecht" will provide a digital compilation and working environment for various artefacts from the period of the Holy Roman Empire. ViRR will be both a published collection and a cooperative working environment. The collection will be indexed, edited, and enlarged cooperatively within the discipline.

Possible working scenarios are:

  • Submission of digitized work
  • Transcription and markup of the digitized work
  • Assignment of keywords and synonyms to the digitized work
  • Annotations (with different visibilities) of the digitized work
  • Adding (external) lexical corpora
  • Publishing of digitized works
  • Searching the collection
  • Browsing and displaying the collection
  • Creating personal working sets within the environment

Current Status

Currently, only the digitized works ("Digitalisate") of the relevant books are available here (they belong to the other digitized works from the institute's cooperation with the Heidelberger Akademie). All images are stored in four different sizes: fil < film < indiv < max. The original TIFFs are hosted by the GWDG. The names of the folders and files are derived from the institute's signatures (shelf marks), analogous to the list of digitized works.

  • The images are available as TIFF, PDF, JPEG (for color images) and GIF (for black-and-white images).
  • During digitization, uncompressed high-resolution images with 24-bit color depth were created. These images serve as the archive format. The other formats for the internet presentation and for download are derived from them.
  • The bibliographic data of the digitized works can be delivered by the institute in SISIS MAB ("Maschinelles Austauschformat für Bibliotheken"). MAB is a data structure used by libraries for the exchange of metadata.

Requirements

Functional Requirements

Visibility and privileges

  • Two different views of the data are required. One for registered users and one for unregistered users.
  • The digitized works are all publicly visible.
  • The privileges of a registered user have to be defined.


Browsing

  • Structural navigation through the digitized works (browsing across all digitized items)
  • Paginator (go to original or scanned page number)
  • Google site map or similar site structure view
  • Zoom functionality (e.g. Digilib)
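
The paginator requirement implies two ways of addressing a page: by its position in the scan sequence and by the page number printed in the original. The following is a minimal sketch of such a lookup, assuming each scan carries an optional printed label; the class and field names (`Scan`, `Paginator`, `printed_label`) are hypothetical and not part of eSciDoc.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scan:
    sequence: int                  # 1-based position of the image in the scanned volume
    printed_label: Optional[str]   # page number as printed in the original, None if unnumbered


class Paginator:
    """Resolve a scan sequence number or a printed page label to a scan."""

    def __init__(self, scans):
        self._by_sequence = {s.sequence: s for s in scans}
        # If a printed label occurs more than once (e.g. a misprint),
        # the earliest scan with that label wins.
        self._by_label = {}
        for s in sorted(scans, key=lambda s: s.sequence):
            if s.printed_label is not None:
                self._by_label.setdefault(s.printed_label, s)

    def go_to_scan(self, sequence):
        return self._by_sequence.get(sequence)

    def go_to_printed(self, label):
        return self._by_label.get(label)


# Hypothetical volume: an unnumbered title page, two roman-numbered pages, then arabic numbering.
scans = [Scan(1, None), Scan(2, "I"), Scan(3, "II"), Scan(4, "1"), Scan(5, "2")]
paginator = Paginator(scans)
```

Both lookups return `None` for an unknown page, so a user interface could fall back to the nearest scan instead of failing.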

Display

  • display of metadata (details? configurable? re-use?)

Download

  • Download of a single image in the original (or a different?) format
  • Downloading (and printing) several digitized works in one document (conversion from several TIFFs to one PDF)

Export

  • Export to which formats required?
  • What is exported?

Relations

  • page is part of book
  • toc is part of book
  • transcription is for page
  • translations?
  • versions?
  • other relations?
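
The relations that are already settled (page is part of book, toc is part of book, transcription is for page) can be sketched as a small object model. This is an illustration only: all class and field names are hypothetical, the example values are invented, and the open questions (translations, versions, other relations) are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcription:
    text: str


@dataclass
class Page:
    sequence: int                                    # position of the page in the scanned volume
    image_file: str
    transcription: Optional[Transcription] = None    # "transcription is for page"


@dataclass
class TocEntry:
    title: str
    start_page: int                                  # sequence number of the page where the entry begins


@dataclass
class Book:
    signature: str
    title: str
    pages: List[Page] = field(default_factory=list)      # "page is part of book"
    toc: List[TocEntry] = field(default_factory=list)    # "toc is part of book"

    def page_for(self, entry):
        """Return the page a table-of-contents entry points to, or None."""
        for page in self.pages:
            if page.sequence == entry.start_page:
                return page
        return None


# Invented example data.
book = Book(signature="EXAMPLE-1", title="Example volume")
book.pages.append(Page(1, "0001.tif"))
book.pages.append(Page(2, "0002.tif", Transcription("Example text")))
book.toc.append(TocEntry("Example chapter", start_page=2))
```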

Persistent Identifier (PID)

PIDs are needed not only for the books, but also for special parts of the books (details have to be specified).

Synonym Lists

  • A controlled list of synonyms for subject headings (first, generate basic subject headings from the titles; second, enrich the headings with the help of the synonym list)
  • Controlled by whom, and with which privileges?
  • How is the initial content created?
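
The two steps named above (basic headings from the title, then enrichment via the synonym list) could look roughly as follows. This is a sketch under assumptions: the stop words and synonym entries are invented placeholders, and "enrichment" is read here as mapping spelling variants to one preferred heading, which is only one possible interpretation.

```python
import re

# Hypothetical stop words and synonym list; real content would have to come from the institute.
STOP_WORDS = {"der", "die", "das", "des", "und", "von", "zu"}
SYNONYMS = {
    "policey": "polizei",
    "polizey": "polizei",
}


def basic_headings(title):
    """Step 1: derive candidate subject headings from the words of a title."""
    words = re.findall(r"[\w-]+", title.lower())
    return [w for w in words if w not in STOP_WORDS]


def enrich(headings):
    """Step 2: map each heading to its preferred form and drop duplicates, keeping order."""
    result = []
    for heading in headings:
        preferred = SYNONYMS.get(heading, heading)
        if preferred not in result:
            result.append(preferred)
    return result


headings = enrich(basic_headings("Ordnung der Policey und Polizey"))
```

Keeping the synonym list as plain data separates the question of who may edit it from the code that applies it.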

Technical Requirements

Corporate Design Requirements

  • The name of the institute should be visible on every page.

Metadata

Technical Metadata

JHOVE, a component of the eSciDoc framework, is able to extract all technical metadata currently stored in the images.



Data formats

  • Requirement: interoperability (with which systems? in which formats?).

  • The use of a metadata editor (e.g. Goobi for METS) has to be considered for offline use and preparation of metadata.

Several metadata standards for the representation of the structural metadata are under discussion.

1. eBind

eBind was developed in 1996 but was never completed as an international standard. Because a small tool for converting eBind files into HTML pages was available (eBind2HTML), eBind was nevertheless used very often. In 1998, "MOA2" (Making of America) was developed from eBind. MOA2 then became METS as part of the standardization process.


Pro:

  • eBind allows texts to be structured into paragraphs.

Con:

  • eBind cannot be used to mark up journals, because it is not possible to define one author per article, only per book.


2. METS

Pro:

  • METS allows front matter, cover, etc. to be marked up.
  • METS is an international standard. It represents the hierarchical structure, the names and storage locations of the files, and the metadata of objects. METS can therefore be used for descriptive metadata as well as a container format.


Con:

  • No direct translation into the eSciDoc internal format (basic services).
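
To illustrate what "hierarchical structure plus name and location of the data" means in METS, the following sketch builds a minimal METS document with a file section and a physical structural map for the pages of one book. It is an assumption-laden illustration: the function name, the `USE` and `TYPE` values and the file names are hypothetical, no descriptive metadata section is included, and nothing here reflects the eSciDoc ingest format.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Build a namespace-qualified tag or attribute name for ElementTree."""
    return "{%s}%s" % (ns, tag)


def build_mets(book_label, image_files):
    root = ET.Element(q(METS_NS, "mets"))

    # fileSec: name and location of the stored files
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "archive"})

    # structMap: hierarchical structure (book -> pages)
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    book_div = ET.SubElement(struct_map, q(METS_NS, "div"),
                             {"TYPE": "book", "LABEL": book_label})

    for order, href in enumerate(image_files, start=1):
        file_id = "FILE_%04d" % order
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"),
                                {"ID": file_id, "MIMETYPE": "image/tiff"})
        ET.SubElement(file_el, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): href})
        # Each page div points to its image file through the file ID.
        page_div = ET.SubElement(book_div, q(METS_NS, "div"),
                                 {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page_div, q(METS_NS, "fptr"), {"FILEID": file_id})

    return root


mets = build_mets("Example volume", ["0001.tif", "0002.tif"])
xml_text = ET.tostring(mets, encoding="unicode")
```

Chapters or a table of contents would be further nested `div` elements, typically in a second, logical structural map.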

3. eSciDoc native format

To be checked.

Further information

Tools

The following tools are needed for the production of raw data

  • An XML editor for the acquisition of structural data.
  • OCR for the production of transcriptions.
  • A tool for transforming the archive images (TIFF) into formats appropriate for the internet (GIF, JPEG, PDF). This tool should include automatic image quality correction.

Copyright

  • The sources of the digitized works are in the public domain. The digitized works themselves are not subject to copyright; they are free for further use.
  • Only integrated secondary literature has to be checked for copyright licenses.


Related links/interesting digitization projects

  • Later on, further books should be digitized. A collaboration with the DTA (Deutsches Textarchiv) will therefore be discussed. The aim of the DTA is to create a digital collection of several hundred million tokens (words) from German documents. This collection should contain the scans and transcriptions of the documents and should give a representative picture of linguistic and cultural development from the middle of the 17th century until now.

Project Management

Meetings

18.06.07 (Frankfurt): Kick off Meeting

22.10.07 (Frankfurt): Follow up Meeting


Work in Progress

Next steps:

  • Preparatory work for starting the ingestion of 2 volumes (scanned pages and bibliographic data)
  • Conversion of local MAB data to eSciDoc XML
  • Conversion of eBind to METS (METS as import and export format)
  • Evaluate METS as a possible format for local generation/enrichment of metadata (by 22 October at the latest, the date of the next meeting)
  • Check an editor tool for preparing/enriching metadata via METS (Goobi)


ViRR Schedule ViRR Scope