Difference between revisions of "ViRR Scope"

=== User Goals ===
The functional Prototype for R2 should enable the following user goals:
# (Multi)Volumes
#* The user wants to enhance the metadata of a multivolume or a volume
# Chapters
#* The user wants to define chapters as a set of separate pages (i.e. mark which pages belong together in the form of a chapter)
#* The user wants to number the defined chapters serially
#* The user wants to define metadata for each chapter
# Pages
#* The user wants to assign an order label to each page
#* The user wants to assign a page label to some pages
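
The chapter and page goals above can be sketched as a small data model. This is only an illustration under stated assumptions, not the ViRR implementation: the class names `Page` and `Chapter`, the helper `define_chapters`, and the choice of inclusive order-label ranges for a bundle are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    order_label: int                  # position of the scan within the volume
    page_label: Optional[str] = None  # printed label, present only on some pages


@dataclass
class Chapter:
    number: int                       # serial number, assigned automatically
    title: str
    pages: List[Page]
    metadata: dict = field(default_factory=dict)


def define_chapters(pages, bundles):
    """Group pages into serially numbered chapters.

    `bundles` is a list of (title, first_order, last_order) tuples with
    inclusive order-label bounds, e.g. ("Preface", 1, 5).
    """
    chapters = []
    for number, (title, first, last) in enumerate(bundles, start=1):
        members = [p for p in pages if first <= p.order_label <= last]
        chapters.append(Chapter(number=number, title=title, pages=members))
    return chapters


# Hypothetical sample data: ten scans, one of them with a printed page label.
pages = [Page(order_label=i) for i in range(1, 11)]
pages[4].page_label = "V"
chapters = define_chapters(pages, [("Preface", 1, 5), ("Chapter one", 6, 10)])
chapters[0].metadata["keywords"] = ["introduction"]
```

Keeping the order label mandatory and the page label optional mirrors the two page goals: every page gets a position, while only some pages carry a printed label.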





Revision as of 09:31, 16 June 2008


Planning

The solution ViRR will be planned and developed in several phases, each of them with different focus.

The first phase focuses on publication of the start content on the web, with browsing, display and search features. In the first phase of the prototype we will start with 2 books to get a deeper understanding of the data formats, metadata and mappings needed. Each release will integrate further books so that, by the end, all books are ingested. In addition, first assumptions about the GUI can be made.

The second phase focuses on the collaborative and interdisciplinary aspects to foster a virtual research environment.

The productive environment for each phase of the solution can start once that phase is finished. It covers re-use of the solution for other projects and might include "nice-to-have" add-ons, e.g. for improving image quality.

Each phase will be divided into several implementation steps (releases), to allow gradual but ongoing specification and development.

Specification and functional prototypes for each release will be available on the Wiki.

Please note that phases and their respective releases might be adapted during the development life cycle.

See ToDos for each release on the discussion page.


FIRST PHASE - Publication of the digital collection

Release one

  1. Ingestion (no user interface)
    • scans --> derive a basic skeleton of the table of contents (toc) from the file structure
    • bibliographic metadata: MAB mapping to MODS
    • structural metadata: eSciDoc container
    • derive basic keywords from bibliographic metadata
  2. Browsing and Display (basic)
    • alphabetically sorted browsing tree (multi-volume works, parts and pages)
    • display of basic bibliographic metadata (name of book, page)
    • display of scans in detailed view
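
As an illustration of how a basic toc skeleton might be derived from the file structure of the scans, here is a minimal sketch. It assumes a hypothetical directory layout of `work/volume/page-file`, which the plan above does not specify, and the function name `build_toc_skeleton` is likewise invented for this example.

```python
from pathlib import PurePosixPath


def build_toc_skeleton(scan_paths):
    """Build {work: {volume: [page files]}} from relative scan paths.

    Assumes a hypothetical layout of work/volume/page-file; anything
    that does not match this depth is ignored.
    """
    toc = {}
    for raw in sorted(scan_paths):
        parts = PurePosixPath(raw).parts
        if len(parts) != 3:
            continue  # skip files outside the assumed layout
        work, volume, page = parts
        toc.setdefault(work, {}).setdefault(volume, []).append(page)
    return toc


# Hypothetical scan listing, deliberately unsorted.
scans = [
    "work_b/vol_1/0002.tif",
    "work_a/vol_2/0001.tif",
    "work_a/vol_1/0002.tif",
    "work_a/vol_1/0001.tif",
    "readme.txt",
]
toc = build_toc_skeleton(scans)
```

Because the paths are sorted before insertion, iterating over the resulting dictionary yields works, volumes and pages in alphabetical order, which is the order the basic browsing tree needs.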

Related Links


Release two

  1. Editing
    • enrich toc skeleton with information on chapters, i.e. bundles (e.g. pages 1-5 = chapter 1)
    • add metadata about the chapters, e.g. keywords
  2. Support viewing content by DFG viewer
  3. Ingestion of all pages of the 2 books
  4. Testing of fulltext ingestion


Release three

  1. Browsing and Display (detailed)
    • extension of the alphabetical browsing tree (chapters)
    • systematic browsing tree as an alternative entry point to the collection
    • paginator (for lists)
    • paging for images (i.e. leafing through the book)
    • integration of Digilib functionalities (minimum: zoom in, zoom out)
    • dynamic generation and integration of an "identification stamp" (proof of provenance, "Herkunftsnachweis") on the images (whole image, selected part of image) --> new Digilib requirement
  2. Search
    • simple search (one search field "any field")
    • advanced search (several special search fields, e.g. one for title, one for author)
  3. Import and Export of METS xml
    • based on eSciDoc METS profile (i.e. minimum DFG viewer METS profile, plus local extensions needed)
  4. Ingestion of all 15 books of the start content
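
The paginator for lists mentioned above could work roughly as in the following sketch. The function name `paginate`, the default of 10 entries per page, and the clamping of out-of-range page numbers are assumptions made for illustration; the plan does not define this behaviour.

```python
import math


def paginate(items, page, per_page=10):
    """Return the slice for a 1-based page number and the total page count."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    total_pages = max(1, math.ceil(len(items) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range requests
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages


# Hypothetical result list with 23 entries.
titles = [f"Volume {n}" for n in range(1, 24)]
first_page, total = paginate(titles, page=1)
last_page, _ = paginate(titles, page=99)  # clamped to the last page
```

Clamping instead of raising an error means a stale or hand-edited page number still shows a valid list page.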


Release four

  1. Export
    • image selection
    • downloading of selected images (as separate jpgs)
    • downloading of selected images (in one pdf with a cover page)
    • downloading selected part of an image
    • downloading of METS-xml
  2. Display keywords as list (cf. Index in a book)
  3. Persistent Identifier (PID)


Release ???

required for DFG

  1. Collection description
  2. URN handling (in the context of an assignment of parts to a multi-volume work)


SECOND PHASE - Virtual research environment

The following is a list of requirements; detailed release planning will come at a later stage.

  • Workflow for edition process of collection, incl. metadata, images, annotations, external sources (upload, editing, annotating, scientific review etc.)
  • User Management to support workflow
  • Fulltext transcription online (offline client at later stage) - in METS
  • Ingestion/Upload of additional books (digital images + bibliographic metadata) - local resources, BBAW-DTA
  • Adding and editing of bibliographic and descriptive metadata
  • Adding annotations / comments
  • Adding relations
  • Integration of external resources (Deutsches Rechtswörterbuch/Heidelberg)
  • Creation and maintenance of synonyms
  • Offering metadata to the ZVDD (Zentrales Verzeichnis Digitalisierter Drucke, the central index of digitized prints) and other virtual libraries - OAI interface for the exchange of metadata
  • Sitemap protocol for crawlers
  • Integration of research literature for download (bibliographic lists? articles?)
  • Linking to other digital archives / OPACs / research projects
  • Delivery of one complete dataset for the DNB for long term archiving
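
For the sitemap requirement in the list above, a minimal sketch of generating a sitemap file with the Python standard library might look as follows. The URLs under `virr.example.org` and the function name `build_sitemap` are placeholders, not actual ViRR addresses or APIs; only the `urlset`/`url`/`loc` structure and the namespace come from the Sitemap protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialize a list of absolute URLs as a sitemap XML string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs; the real address scheme is not defined in this plan.
sitemap_xml = build_sitemap([
    "https://virr.example.org/work/1",
    "https://virr.example.org/work/1/page/1",
])
```

Optional elements such as `lastmod` are left out here; they could be added per entry once it is clear which modification dates the repository exposes.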

ToDos (Discussion):

  • Structural analysis of the data of the Deutsches Rechtswörterbuch
  • Analysis of the requirements of the ZVDD
  • Text editor for the creation of transcriptions is needed

Productive Environment

Each phase can go into production after a decision by the institute.

  • Preparation of productive environment (hardware, support, policies)
  • Offline tool for image processing to improve image quality
  • Fulltext transcription in TEI?
  • Additional functionality for historical-critical edition work (historisch-kritische Editionsarbeit)?
  • Concept ViRR for other local/MPG projects (e.g. Policey-Ordnung)


Expectations

Expectations MPIeR

  1. The content of the collection ViRR will be digitally preserved and persistently identified.
  2. The data of the collection ViRR will be published open access.
  3. ViRR will be an open collection, so the import of further digitized work will be possible after the solution is in production.
  4. The ViRR solution has to be configurable so that the institute will be able to use it independently for further digitization projects.
  5. The solution, services and framework are continuously maintained and further developed by a central unit.
  6. The ViRR project has to follow the DFG Praxisregeln, the DFG's practical guidelines (handout provided by Ms. Amedick). The paper covers the following aspects:
    • selection of works
    • digitization techniques
    • digital preservation
    • metadata (METS and TEI for structural metadata, should)
    • re-use and integration with portals (OAI-PMH, must)
    • persistent identification (URNs and/or DOIs, should)
    • accessibility of metadata & digitized works (open access, must)
    • required functionalities for representation


Expectations MPDL

  • ViRR will be a service based on the eSciDoc infrastructure for handling scanned books.
  • ViRR will be delivered as an open source self-contained solution, which can be installed and run with predefined standard set-up.
  • The MPDL will use the ViRR solution as showcase for demonstrating possible research data scenarios based on the infrastructure. The institute's staff will support respective outreach activities by reporting on their experiences.
  • The MPDL has access to the root account for administration purposes.
  • The data of ViRR will be hosted at the MPDL (and/or its partners such as the GWDG), which are also responsible for the server administration. Details will be determined in a service level agreement.