Difference between revisions of "ViRR"

From MPDLMediaWiki
Revision as of 15:42, 11 October 2007


ViRR (Virtueller Raum Reichsrecht) is a collaboration of the MPDL with the Max Planck Institute for European History of Law.


Introduction

The institute already has some experience with digital collections, e.g. through its Digital Library.


Vision

A lot of information about the law of the Holy Roman Empire exists, but it is distributed over several institutions and libraries. To bring this scattered material together, a single overarching internet platform is desirable.

The first step is to make the already existing digital scans of 15 books (more than 16,000 images in total) available online with free browsing and navigation functionality. After that, the collection should be expanded with transcriptions of the books' tables of contents, and later with full transcriptions of all pages.

One step further would be to expand the collection with pictures, secondary literature, sources, publications and digitized works from other institutions and projects such as the DRW, the cooperation between BSB and GoogleBooks, information from library catalogs, databases of the institute and further documentation such as the Polizeyordnungen.

In addition, a comprehensive collection of links relevant to the "Reichsrecht" should be integrated, analogous to the institute's link collection covering the whole of European legal history.

Last but not least, a search across all integrated sources is desirable. This search should include an index and a list of synonyms for the different spellings of the titles.


Aim

The "Virtueller Raum Reichsrecht" will provide a digital collection and working environment for various artefacts from the period of the Holy Roman Empire. ViRR will not only be a published collection, but also a cooperative working environment. The collection will be indexed, edited, and enlarged cooperatively within the discipline.

Possible working scenarios are:

  • Assignment of keywords and synonyms to the digitized work
  • Transcription and markup of the digitized work
  • Submission of documents
  • Annotations (with different visibilities) of the digitized work
  • Adding (external) lexical corpora (?)

Current Status

Currently, only the digitized works ("Digitalisate") of the relevant books are available (among other digitized works from the institute's cooperation with the Heidelberger Akademie). All images are stored in four different sizes: fil < film < indiv < max. The original TIFFs are hosted by the GWDG. The names of the folders and files are derived from the institute's shelf marks (analogous to the list of digitized works).


Requirements

Functional Requirements

Visibility

  • Two different views of the data are required. One for registered users and one for unregistered users.
  • The digitized works are all publicly visible.
  • The privileges for a registered user are still unclear.

Indexing

If available, the following information should be indexed:

  • Transcriptions
  • Tables of contents and registers
  • Titles of the laws
  • Metadata (details need to be specified)

Search

Advanced search in all metadata (and in transcriptions)

  • Search for keywords (based on list of synonyms and thesauri)
  • Search for bibliographical data such as
      • Form of the law (e.g. order, mandate, edict) - unstandardized terms
      • Place where the law was created - standardized terms
      • Date (period) when the law was ratified (problem: until 1692, two calendars exist)
      • Lawmaker
      • Legislature ("Körperschaft", e.g. emperor, parliament) - standardized terms
      • Genre (e.g. digitized work, secondary literature, collections of images like logos)
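The calendar problem noted for the date search can be illustrated with a small sketch. It assumes that the two calendars are the Julian ("old style") and the Gregorian ("new style") calendar, which the page does not state explicitly, and it shows how a date recorded in the Julian calendar could be normalized to a Gregorian date so that both readings can be indexed. The function names are hypothetical and not part of any ViRR or eSciDoc component.

```python
from datetime import date

# Offset between a Julian Day Number and Python's proleptic Gregorian ordinal.
JDN_TO_ORDINAL = 1721425


def julian_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to the matching Gregorian date."""
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_TO_ORDINAL)


# A law dated 14 October 1648 (old style) would also be found under:
print(julian_to_gregorian(1648, 10, 14))  # 1648-10-24
```

Storing the normalized date next to the date as printed in the source would allow a period search to match regardless of which calendar the lawmaker used.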

Browsing

  • Paginator (go to original page xy)
  • Browsing in all digitized items (title list, list with some metadata)
  • Google site map or similar site structure view
  • Zoom functionality (e.g. Digilib)

Display

  • The metadata should be displayed in different, flexibly defined formats

Download

Report creation for downloading (and printing) several digitized works as one document (conversion of several TIFFs into one PDF)

Relations

  • Translations are not the norm (will only occur in special cases)
  • One item is part of another item
  • Relations between different versions of one law

Persistent Identifier (PID)

PIDs are needed not only for the books, but also for special parts of the books (details have to be specified).

Synonym Lists

  • A synonym list for subject headings (first, generate basic subject headings from the title; second, enrich the headings with the help of the synonym list)
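The two-step approach to subject headings can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the synonym entries and the example title are invented for the example, and the real synonym list would have to come from the institute.

```python
import re

# Hypothetical stop words and spelling variants, for illustration only.
STOP_WORDS = {"der", "die", "das", "des", "und", "von", "zu"}
SYNONYMS = {
    "polizey": {"policey", "polizei"},
    "ordnung": {"ordenung"},
}


def basic_headings(title):
    """Step 1: derive basic subject headings from the words of a title."""
    headings = []
    for token in re.findall(r"\w+", title.lower()):
        if token not in STOP_WORDS and token not in headings:
            headings.append(token)
    return headings


def enrich(headings, synonyms):
    """Step 2: add every known spelling variant of each heading."""
    enriched = set()
    for heading in headings:
        enriched.add(heading)
        enriched.update(synonyms.get(heading, ()))
    return sorted(enriched)


headings = basic_headings("Des Reichs Polizey Ordnung")
print(headings)                    # ['reichs', 'polizey', 'ordnung']
print(enrich(headings, SYNONYMS))
```

Keeping the two steps separate means the synonym list can be extended later without regenerating the basic headings.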


Technical Requirements

Data Formats

  • The images are available as TIFF, PDF, JPEG (for color images) and GIF (for black-and-white images).
  • During digitization, uncompressed images with a high resolution and a 24-bit color depth were created. These images serve as the archive format. The other formats for the internet presentation and for download are created from them.


Corporate Design Requirements

  • The institute does not have its own logo
  • The name of the institute should be visible on every page


Metadata

Technical Metadata

JHOVE, a component used within the eSciDoc framework, is able to extract all technical metadata currently stored in the images.

Structural Metadata

The bibliographical data of the digitized works can be delivered by the institute in SISIS MAB ("Maschinelles Austauschformat für Bibliotheken"). MAB is a data structure used by libraries for the exchange of metadata.

Markup

The aim is to display the hierarchical structure of the texts (chapter, subchapter). The level of detail of this structure still has to be specified.


Metadata Standards

Several metadata standards for the representation of the structural metadata are under discussion.

eBind was developed in 1996 but was never finalized as an international standard. However, because a small tool for converting eBind files into HTML pages (eBind2HTML) was available, eBind was used very often. In 1998, "MOA2" (Making of America) was developed out of eBind. MOA2 then became METS as part of the standardization process.

Another markup language, TEI (Text Encoding Initiative), has already been ruled out for this project, because creating the markup requires a lot of effort.


1. eBind

Pro:

  • eBind makes it possible to structure the texts into paragraphs.

Con:

  • eBind cannot be used to mark up journals, because it is not possible to define one author per article, only per book.


2. METS

Pro:

  • METS enables the markup of front matter, cover, etc.
  • METS is an international standard. It represents the hierarchical structure, the name and storage location of the data, and the metadata of objects. In this sense, METS works like a container.

If working with METS, a METS editor is needed!
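To make the container idea more concrete, the following sketch builds a very small METS document with Python's standard library. It is an illustration only, not the project's import format: the function name, the file IDs, the `book` and `chapter` type labels and the example URLs are assumptions, and the output is not validated against the METS schema. The file section lists the image locations, and the structural map holds the chapter hierarchy and points to the files.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(title, chapters):
    """chapters: list of (chapter label, [image URL, ...]) tuples."""
    root = ET.Element(q("mets"))
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(root, q("structMap"), TYPE="LOGICAL")
    book = ET.SubElement(struct_map, q("div"), TYPE="book", LABEL=title)
    counter = 0
    for label, urls in chapters:
        chapter = ET.SubElement(book, q("div"), TYPE="chapter", LABEL=label)
        for url in urls:
            counter += 1
            file_id = f"FILE_{counter:04d}"  # hypothetical ID scheme
            file_el = ET.SubElement(file_grp, q("file"),
                                    ID=file_id, MIMETYPE="image/jpeg")
            ET.SubElement(file_el, q("FLocat"),
                          {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
            # The structural map only references files by their ID.
            ET.SubElement(chapter, q("fptr"), FILEID=file_id)
    return root


doc = build_mets("Example volume", [
    ("Chapter 1", ["http://example.org/max/0001.jpg",
                   "http://example.org/max/0002.jpg"]),
    ("Chapter 2", ["http://example.org/max/0003.jpg"]),
])
print(ET.tostring(doc, encoding="unicode"))
```

Because structure and file locations are kept in separate sections, the same structural map could later point to several image sizes of the same page.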


Further information

Tools

The following tools are needed for the production of raw data:

  • An XML editor for the acquisition of structural data.
  • OCR for the production of transcriptions.
  • A tool for transforming the archive images (TIFF) into formats suitable for the internet (GIF, JPEG, PDF). This tool should include automatic image quality correction.

Copyright

  • The sources of the digitized works are in the public domain. The digitized works themselves are not under copyright; they are free for further use.
  • Only integrated secondary literature has to be checked for copyright licenses.


Related links/interesting digitisation projects

  • Later on, further books should be digitized. Therefore, a collaboration with the DTA (Deutsches Text Archiv, http://www.bbaw.de/bbaw/Forschung/Forschungsprojekte/dta/de/Startseite) will be discussed. The aim of the DTA is to create a digital collection with several hundred million tokens (words) from German documents. This collection should contain the scans and transcriptions of the documents and should give a representative picture of the linguistic and cultural development from the middle of the 17th century until now.
  • Rules for digitisation (DFG, March 2007, in German): Praxisregeln Digitalisierung (http://www.gdz-cms.de/uploads/media/Praxisregeln_Digitalisierung_Maerz_2007_DFG.pdf)

Project Management

Meetings

18.06.07 (Frankfurt): Kick-off meeting

22.10.07 (Frankfurt): Follow-up meeting


Work in Progress

Next steps:

  • preparatory work for starting the ingestion of 2 volumes (scanned pages and bibliographic data)
  • conversion of local MAB data to eSciDoc XML
  • conversion of eBind to METS (METS as import and export format)
  • evaluate METS as a possible format for local generation/enrichment of metadata (by the next meeting on 22 October at the latest)
  • check the editor tool (GOOBI) for preparing/enriching metadata via METS


ViRR Schedule

ViRR Scope