Difference between revisions of "Talk:ViRR Specification"

From MPDLMediaWiki
== Discussion on data formats ==


=== Possible formats ===


==== 1. eBind ====
[http://sunsite.berkeley.edu/Ebind/ eBind] was developed in 1996 but was never finalized as an international standard. Nevertheless, because a small tool for converting eBind files into HTML pages was available ([http://sunsite.berkeley.edu/cgi-bin/ebind2html/2/breen?cap eBind2HTML]), eBind was widely used. In 1998, "MOA2" ([http://sunsite3.berkeley.edu/MOA2/ Making of America]) was developed out of eBind. MOA2 then became [http://www.loc.gov/standards/mets/ METS] as part of the standardization process.


'''Pro:'''
* eBind makes it possible to structure texts into paragraphs.


'''Contra:'''
* eBind cannot mark up journals, because an author can only be defined per book, not per article.


==== 2. METS ====
'''Pro:'''
* METS enables the markup of front matter, cover, etc.
* METS is an international standard. It represents the hierarchical structure, the name and location of the data storage, and the metadata of objects. METS can therefore be used for descriptive metadata as well as a container format.


'''Contra:'''
* No direct translation into the eSciDoc internal format (basic services).


In the interest of further projects (for which DFG funding might be applied for), METS is almost mandatory. As an alternative to this page-oriented format, TEI would be an option as a document-oriented format. The MPI clearly prefers METS, as it meets the current requirements and involves less effort. See also the DFG-Praxisregeln, p. 17. 18.10.2007, S. Amedick


==== 3. eSciDoc native format ====
To be checked together with FIZ.


=== Discussion ===
Results from the internal meeting on 12 October


In general, an external format (like METS/eBind/eSciDoc) can be used in the following ways:
# importing digital objects in eSciDoc's native format
# importing from METS format: this might be very problematic from Natasa's point of view, e.g. because METS is very broad and only a specific import for ViRR METS can be done
# supporting METS as a native format in eSciDoc: this would require a lot of redesign in the basic services. According to Malte, there are related requirements coming from the GBV
# exporting to METS: export is probably not very problematic


Questions:
# Where does the concrete METS requirement come from? Does the MPIeR have concrete needs, or is it more a "best practice" and assumption?
# Is the eSciDoc native format rich/flexible enough to represent the [structure of the] digital objects as required by MPIeR?
# If yes, does this mean we need to provide an offline editor for the eSciDoc native format ourselves?


Result: This question is the major decision in the project and will substantially influence the required/chosen implementation. The decision needs to be taken by January 2008! We decided to prepare a detailed evaluation together with FIZ.

Revision as of 12:45, 30 January 2008

To-dos and comments for each release

FIRST PHASE - Publication of the digital collection

Release one

  1. Ingestion
    • scans: derive a first skeleton of the TOC from the file structure (name of book, sequence of pages)
    • bibliographic metadata (currently in MAB) - either entered manually or ingested
    • derive basic keywords from title of book (to be checked with institute)
  2. Display (basic)
    • thumbnail lists
    • basic bibliographic metadata (name of book, page)
    • scans
  3. Browsing (basic)
    • browsing tree (books and pages), sorted alphabetically by title
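
The ingestion step above (deriving a first TOC skeleton from the file structure) could be sketched roughly as follows. This is only an illustration under assumptions that are not stated on this page: one directory per book, one TIFF file per page, and file names that sort in page order. The function name toc_skeleton is hypothetical.

```python
from pathlib import Path


def toc_skeleton(scan_root):
    """Return {book name: [page file names in sequence]} for a scan directory.

    Assumed layout (hypothetical): <scan_root>/<book name>/<page>.tif
    """
    skeleton = {}
    root = Path(scan_root)
    # Books sorted alphabetically by directory name, as in the basic browsing tree.
    for book_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only TIFF scans count as pages; other files are ignored.
        pages = sorted(
            f.name
            for f in book_dir.iterdir()
            if f.is_file() and f.suffix.lower() in {".tif", ".tiff"}
        )
        skeleton[book_dir.name] = pages
    return skeleton
```

The result is a plain mapping from book to page sequence, which is enough for thumbnail lists and the alphabetical browsing tree; bibliographic metadata would be attached separately.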

ToDos:

  • ingestion of bibliographic data: either map MAB to eSciDoc XML or to Dublin Core, or enter data manually (evaluation by Inga?)
  • relation to eSciDoc METS profile (first draft based on Inga's mapping eBind <=> METS)?

Already some input from Inga: A generic mapping MAB => eSciDoc is not of great help, as MAB is poor in bibliographic information. In addition, each MAB user uses their own adapted MAB, and using MAB means running into severe character set problems. If a mapping is needed, mapping to DC might be sufficient. Instead of mapping, manual data entry should also be considered, especially when dealing with only 2 books. In any case, a new "eSciDoc ViRR profile" might be needed, as the genre types and current PubMan metadata won't cover the ViRR material.

  • improve quality of image files (based on TIFFs) for improved thumbnails and additional resolution for web presentation => check concrete requirements with the institute (the black borders, "schwarze Ränder", on the TIFFs would have to be handled by the institute). Check the resolution requirements of digilib
  • functional prototype for Display and browsing
  • start collecting requirements for the viewing environment DigiLib and set up meeting with user group (Contact@FIZ: Frank Schwichtenberg) (Kristina, Tobias)

Release two

  1. Editing
    • enrich TOC skeleton with information on chapters (i.e. bundles) (e.g. pages 1-5 = chapter 1)
    • add metadata about the chapters, e.g. keywords
    • editing via simple edit mask or already with METS editor (selection of Editor depends on eSciDoc METS profile)
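
As a rough illustration of the editing step above, grouping a book's page sequence into chapters (bundles) by page range might look like the sketch below. The function name add_chapters and the dictionary layout are assumptions made for this example only; they are not part of eSciDoc or of any METS editor.

```python
def add_chapters(pages, chapters):
    """Group a book's page sequence into chapter bundles.

    pages    -- page identifiers in reading order
    chapters -- list of (title, first_page, last_page), 1-based and inclusive
    """
    bundles = []
    for title, first, last in chapters:
        # Reject ranges that fall outside the book or run backwards.
        if not 1 <= first <= last <= len(pages):
            raise ValueError(f"invalid page range for {title!r}: {first}-{last}")
        bundles.append({
            "title": title,
            "pages": pages[first - 1:last],
            "keywords": [],  # chapter metadata to be filled in during editing
        })
    return bundles
```

For example, add_chapters(pages, [("Chapter 1", 1, 5)]) bundles the first five pages into one chapter.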

TO DO:

  • prepare first draft of the eSciDoc METS profile (based on the bibliographic and descriptive data needed)
  • decide on recommended METS editor
  • prepare requirements for FIZ for the METS integration

Release three

  1. Display (detailed)
    • integration of digilib functionalities (minimum: zoom in, zoom out)
    • dynamic generation and integration of "identification stamp" ("Herkunftsnachweis") on the images (whole image, selected part of image) --> new Digilib requirement
  2. Browsing (detailed)
    • extension of the alphabetical browsing tree (chapters)
    • chronological navigation on book and/or chapter level? (depends on descriptive metadata!)
    • paginator (for lists)
    • paging for images (i.e. leafing through the book, "im Buch blättern")
  3. Search
    • simple search (one search field "any field")
    • advanced search (several special search fields, e.g. one for title, one for author)

Release four

  1. Functional definition of eSciDoc METS profile
    • needed for import / export
  2. Export
    • image selection
    • downloading/printing of selected images (as separate JPGs)
    • downloading/printing of selected images (in one PDF with a cover page)
    • downloading/printing of a selected part of an image
    • downloading of METS XML
  3. Display keywords as list (cf. Index in a book)
  4. Persistent Identifier (PID)
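
To make the "downloading of METS XML" item more concrete, the sketch below builds a minimal METS structural map (book, pages in order) with Python's standard library. It is not the eSciDoc METS profile, which is still to be defined; the function name, the TYPE values and the page identifiers are assumptions. A complete METS document would also need a fileSec so that the FILEID references resolve.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def mets_struct_map(book_title, page_ids):
    """Serialize a book and its page sequence as a METS structMap fragment."""
    ET.register_namespace("mets", METS_NS)
    mets = ET.Element(f"{{{METS_NS}}}mets")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="PHYSICAL")
    book = ET.SubElement(struct_map, f"{{{METS_NS}}}div", TYPE="book", LABEL=book_title)
    for order, page_id in enumerate(page_ids, start=1):
        # One div per page; ORDER records the page sequence.
        page = ET.SubElement(book, f"{{{METS_NS}}}div", TYPE="page", ORDER=str(order))
        # FILEID would point at a file entry in the (omitted) fileSec.
        ET.SubElement(page, f"{{{METS_NS}}}fptr", FILEID=page_id)
    return ET.tostring(mets, encoding="unicode")
```

Chapters from the editing step could be represented as an additional div level between the book and its pages.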

SECOND PHASE - Virtual research environment

The following is a list of requirements to be met; detailed release planning will follow at a later stage.

  • Workflow for edition process of collection, incl. metadata, images, annotations, external sources (upload, editing, annotating, scientific review etc.)
  • User Management to support workflow
  • Fulltext transcription online (offline client at later stage) - in METS
  • Ingestion/Upload of additional books (digital images + bibliographic metadata) - local resources, BBAW-DTA
  • Adding and editing of bibliographic and descriptive metadata
  • Adding annotations
  • Adding relations
  • Adding comments
  • Integration of external resources (Deutsches Rechtswörterbuch/Heidelberg)
  • Creation and maintenance of synonyms
  • Offering metadata to the ZVDD and other virtual libraries - OAI interface
  • Sitemap protocol for crawlers
  • Integration of research literature for download (bibliographic lists? articles?)
  • Linking to other digital archives / OPACs /research projects


TO DO:

  • Structural analysis of the data of the Deutsches Rechtswörterbuch
  • Analysis of the requirements of the ZVDD
  • Text editor for the creation of transcriptions is needed

THIRD PHASE - Production environment

  • Preparation of the production environment (hardware, support, policies)
  • Offline tool for image processing to improve image quality
  • Fulltext transcription in TEI?
  • Additional functionality for historical-critical editing work ("historisch-kritische Editionsarbeit")?
  • ViRR concept for other local/MPG projects (e.g. Policey-Ordnung)