ViRR Specification

The initial content (the images) is available as TIFF, PDF and JPEG. The TIFFs (up to 50 MB per picture) are the original archive versions, the PDFs are needed for download, and the JPEGs for the web presentation.


FIRST PHASE

Release one

Ingestion

  • Technical ingestion of the scans (without a user interface)
  • Automatic derivation of a first skeleton of the ToC (based on the signature of the book and the page numbers)
  • Ingestion or manual submission of the bibliographic metadata (available in MAB)
  • Automatic derivation of the basic keywords from the book titles as the basis for an index list
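
The derivation of the ToC skeleton could be sketched as follows. This is a minimal sketch, assuming the scan files are named `<signature>_<page number>.<extension>`; that naming scheme, the function name `toc_skeleton` and the sample file names are hypothetical and not part of this specification.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) file naming: <book signature>_<page number>.<extension>
SCAN_NAME = re.compile(
    r"^(?P<signature>.+)_(?P<page>\d+)\.(?:tiff?|jpe?g|pdf)$", re.IGNORECASE
)

def toc_skeleton(filenames):
    """Group scan files by book signature and order them by page number."""
    books = defaultdict(list)
    for name in filenames:
        match = SCAN_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        books[match["signature"]].append((int(match["page"]), name))
    # One flat, ordered page list per book; chapters are added later by editors.
    return {sig: [name for _, name in sorted(pages)] for sig, pages in books.items()}

# Hypothetical sample input
skeleton = toc_skeleton(["A12_0002.tif", "A12_0001.tif", "B7_0001.tif", "notes.txt"])
```

The result is only a flat page list per book; the chapter structure is added manually in release two.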

Browsing

  • Alphabetical (sorted by title) browsing tree as entrance to the collection's content

Display

  • A thumbnail list with all pictures of one book
  • Bibliographic metadata of the book
  • Each picture in a "detailed view"

Open Points

  1. Bibliographic Metadata
    • A list of all MAB fields used by the institute is necessary to prepare the mapping.
  2. Quality correction of the pictures
    • The institute is satisfied neither with the quality of the original TIFFs (e.g. the black frames of the scans should be removed) nor with the generated jpgs. The MPDL can only offer an automatic refinement of the presentation versions of the pictures (jpgs), not of the original files. If a correction of the original files is needed, it has to be done manually by the institute.


Release two

Editing

  • Enrichment of the already available ToC skeleton with information about chapters (e.g. pages 13-33 belong to chapter 1)
  • Manual addition of metadata to the chapters (e.g. keywords)
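
The chapter enrichment could be modelled roughly as follows. This is a minimal sketch: the `Chapter` class, the `add_chapter` helper and the rule that chapters of one book must not overlap are assumptions made for illustration, not requirements stated by the institute.

```python
from dataclasses import dataclass, field

@dataclass
class Chapter:
    title: str
    first_page: int
    last_page: int
    keywords: list = field(default_factory=list)

def add_chapter(chapters, chapter, page_count):
    """Add a chapter after checking its page range against the book and the other chapters."""
    if not 1 <= chapter.first_page <= chapter.last_page <= page_count:
        raise ValueError("page range lies outside the book")
    for other in chapters:
        # Assumption: chapters of one book must not overlap.
        if chapter.first_page <= other.last_page and other.first_page <= chapter.last_page:
            raise ValueError(f"page range overlaps with chapter {other.title!r}")
    chapters.append(chapter)
    chapters.sort(key=lambda c: c.first_page)

chapters = []
add_chapter(chapters, Chapter("Chapter 1", 13, 33, keywords=["example keyword"]), page_count=200)
add_chapter(chapters, Chapter("Preface", 1, 12), page_count=200)
```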

Open Points

  1. Descriptive Metadata
    • A list of all metadata that should be marked up is needed from the institute.


Release three

Browsing

  • Extension of the alphabetical browsing tree with the chapters of the books
  • Chronological (sorted by the creation date) browsing tree as entrance to the collection's content
  • Navigation functionalities within the thumbnail lists (paginator)
  • Paging (German: blättern) within the "detailed view" of the pictures of one book
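
The paginator for the thumbnail lists could work along these lines. This is a minimal sketch; the function name `paginate`, the default of 20 thumbnails per page and the clamping of out-of-range page numbers are assumptions.

```python
import math

def paginate(items, page, per_page=20):
    """Return the items of one page plus the (clamped) page number and the page count."""
    total_pages = max(1, math.ceil(len(items) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range requests
    start = (page - 1) * per_page
    return items[start:start + per_page], page, total_pages

# Hypothetical book with 45 thumbnails
thumbnails = [f"page_{n}.jpg" for n in range(1, 46)]
last_page_items, current, total = paginate(thumbnails, page=3)
```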

Display

  • Integration of Digilib functionalities to navigate within one picture (in the "detailed view")
    • Zooming in, zooming out, framing of a zoom area, switching back to see the whole picture
    • Left, right, up and down navigation within a zoomed picture
    • "View negative" functionality
    • Dynamic creation of an "identification stamp" (Herkunftsnachweis) on all pictures in the "detailed view" (also after zooming)
  • Metadata (bibliographic, descriptive, administrative/technical) of a book and a chapter

Search

  • Simple Search
    • One search field, which searches in every metadata field
  • Advanced Search
    • Search within specific metadata fields
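
The difference between the two search modes can be illustrated as follows. This is a minimal in-memory sketch; the function names, the field names `title` and `place` and the sample records are hypothetical, and a real implementation would query a search index instead.

```python
def simple_search(records, term):
    """Return records in which the term occurs in any metadata field (case-insensitive)."""
    needle = term.casefold()
    return [r for r in records if any(needle in str(v).casefold() for v in r.values())]

def advanced_search(records, **criteria):
    """Return records in which every given field contains its search term."""
    return [
        r for r in records
        if all(term.casefold() in str(r.get(name, "")).casefold()
               for name, term in criteria.items())
    ]

# Hypothetical sample records
records = [
    {"title": "Die Polizeiordnungen des heiligen römischen Reiches", "place": "Augsburg"},
    {"title": "Sample law collection", "place": "Heidelberg"},
]

hits_simple = simple_search(records, "augsburg")
hits_advanced = advanced_search(records, place="heidel", title="law")
```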

Open Points

  1. Browsing
    • Shall the chronological browsing tree be sorted by the creation date of the books or of the chapters within the books?
    • Is browsing by the names of the documents (law titles) needed?
  2. Search
    • Are auto-suggest lists needed for specific metadata fields (e.g. author)?


Release four

Keywords

Keywords are used as the basis for a browsable index list.

  • The following content should be indexed by the system and displayed in the list:
    • Bibliographic metadata
    • Descriptive metadata
  • Manual editing of keyword lists (assignment of keywords)
    • Keywords are part of the bibliographic and the descriptive metadata.
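
A browsable index list could be built from the keywords roughly like this. This is a minimal sketch, assuming each item carries its keywords under a `keywords` entry in its bibliographic and descriptive metadata; the data layout, the item identifiers and the function name `build_keyword_index` are hypothetical.

```python
from collections import defaultdict

def build_keyword_index(items):
    """Map each keyword to the sorted list of items that use it, in alphabetical keyword order."""
    index = defaultdict(set)
    for item_id, metadata in items.items():
        for section in ("bibliographic", "descriptive"):
            for keyword in metadata.get(section, {}).get("keywords", []):
                index[keyword].add(item_id)
    ordered = sorted(index.items(), key=lambda entry: entry[0].casefold())
    return {keyword: sorted(ids) for keyword, ids in ordered}

# Hypothetical sample items
items = {
    "book-1": {
        "bibliographic": {"keywords": ["Reich", "Polizei"]},
        "descriptive": {"keywords": ["Augsburg"]},
    },
    "book-2": {"bibliographic": {"keywords": ["Polizei"]}},
}
keyword_index = build_keyword_index(items)
```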



Open Questions:

  1. Should the keyword list serve as a browsing option?
  2. Search: Should suggestions be offered for the "keywords" field (after the first letters of a term have been entered)?
  3. Is a web interface needed for creating the keyword list?
  4. Are these freely chosen keywords or standardized subject headings (subject to certain rules)?
  5. Should there be two separate lists (one with the keywords and one with the index)?


Export

Each picture can be selected for a later export.

  • Downloading of pictures
    • Download of one or several single pages as jpgs (each image is one file)
    • Saving and downloading of a (zoomed) part of the picture (as jpg)
    • Downloading of several selected pages of one book (or a whole book) in one document (conversion from several jpgs to one pdf). This pdf should be automatically extended with a cover page containing (identification) information like the origin of the pages, the usage rights, the URL and the bibliographical metadata about the book.
  • Downloading of metadata
    • Downloading of the METS document in XML for one book

The generated pdfs should not be too big, so a compression method is needed.

Persistent Identification

The following objects should be referenced:

  1. the book
  2. the ToC
  3. each page

Later on (after the creation of the METS data), we also need:

  1. each document (separate law, written by different authors)
  2. each chapter, paragraph, etc.

Open Points

  1. Export
    • Is downloading the metadata of a book or chapter in (several) citation styles needed? If yes, which citation styles should be supported?

SECOND PHASE

User Management (Visibility, privileges)

  1. User roles
    1. Unregistered user (full viewing rights)
    2. Account user (full viewing rights and the right to make proposals for the content of the descriptive metadata, keywords and synonyms)
    3. Local administrator (full viewing and editing rights, authority to accept or reject the proposals of the account users)
  2. User management workspace (only visible and editable for local administrators)
    • All information about the account users will be stored in the system: new users can be added, existing users can be edited or deactivated (not deleted).
    • User group creation will be added at a later stage (together with baskets).
  3. Bibliographical metadata should only be writable for administrators
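
The three roles and their privileges could be expressed as a simple permission table. This is a minimal sketch; the action names (`view`, `propose_metadata`, `edit_metadata`, `review_proposals`, `manage_users`) are hypothetical labels chosen for illustration.

```python
from enum import Enum

class Role(Enum):
    UNREGISTERED = "unregistered user"
    ACCOUNT_USER = "account user"
    LOCAL_ADMIN = "local administrator"

# Hypothetical action names derived from the role descriptions above
PERMISSIONS = {
    Role.UNREGISTERED: {"view"},
    Role.ACCOUNT_USER: {"view", "propose_metadata"},
    Role.LOCAL_ADMIN: {"view", "edit_metadata", "review_proposals", "manage_users"},
}

def is_allowed(role, action):
    """Check whether a role may perform an action."""
    return action in PERMISSIONS[role]
```

For example, `is_allowed(Role.ACCOUNT_USER, "edit_metadata")` is false, which reflects that metadata is only writable for administrators.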

Open Questions:

  1. Are further roles needed (e.g. librarian, scientist)? With what privileges?

Annotations

  • What should be annotated (only the transcriptions or something else)?
  • What markup is needed for that annotation?

Synonyms

  1. Creation of a synonym list
    • Currently, no standardized synonym list is available at the institute.
    • The initial content for the synonym list will be manually created by the institute based on the bibliographical and structural metadata.
  2. Assignment of synonyms
    • During the creation/editing of metadata it should be possible to specify new synonyms, which should then be automatically added to the synonym list.
      As metadata is only writable for administrators, synonyms can only be specified by administrators. However, scientists will have the opportunity to make proposals via a workflow.
    • If a synonym is not valid for the whole term of a metadata field, it should be possible to mark the relevant part of the term (e.g. for "Die Polizeiordnungen des heiligen römischen Reiches" one synonym is "Polizeyordnungen").
  3. Scenarios based on synonyms
    • Search: When searching for a specific term, the items which use a synonym of this term in their metadata will also be found (e.g. the search for "Augsburg" also finds "Augspurg")
    • To do: Further scenarios will be named by the institute.
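
The synonym-based search scenario could be sketched as follows. This is a minimal in-memory sketch, assuming the synonym list is kept as groups of equivalent terms; the function names, the `place` field and the sample records are hypothetical.

```python
def expand_term(term, synonym_groups):
    """Return the term together with all of its synonyms, case-folded."""
    key = term.casefold()
    for group in synonym_groups:
        folded = {synonym.casefold() for synonym in group}
        if key in folded:
            return folded
    return {key}

def synonym_search(records, term, synonym_groups):
    """Find records whose metadata contains the term or one of its synonyms."""
    variants = expand_term(term, synonym_groups)
    return [
        r for r in records
        if any(v in str(value).casefold() for value in r.values() for v in variants)
    ]

# Hypothetical synonym list and sample records
synonym_groups = [{"Augsburg", "Augspurg"}]
records = [{"place": "Augspurg"}, {"place": "Augsburg"}, {"place": "Heidelberg"}]
hits = synonym_search(records, "Augsburg", synonym_groups)
```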

Open Questions:

  1. Which metadata does the synonym list refer to (e.g. author, title, place of publication)?
  2. Should all synonyms of a metadata value always be displayed (in the edit form as well as in the detailed view)?
  3. Is one synonym list sufficient? Should this list also be visible separately from the metadata?
  4. Search: Should suggestions be offered (after the first letters of a term have been entered)? This would mean, however, that a separate list is needed for each metadata field (or should the suggestions only be shown in the simple search?).
  5. Is a web interface needed for creating the synonym list?
  6. Proposal: A keyword list is created that can be supplemented with synonyms and serves as the index.

Ingestion

  1. Batch methods for the ingestion of large, already available data streams (scans, metadata, transcriptions?)
  2. Individual creation and submission form for bibliographical metadata about the books (web interface)
  3. A tool for the transformation of the archive pictures (tiff) to formats appropriate for the internet (jpeg for the viewing tool, pdf for the export).

Editing

  1. Individual creation and submission form for descriptive metadata about the documents of the books (web interface)
    Descriptive metadata should only be writable for administrators. However, scientists should be able to propose content for the descriptive metadata. These proposals have to be checked by an administrator, who then decides whether they can be published as alternatives. A workflow engine is therefore needed, but not until the second phase.
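
The proposal workflow described above could be reduced to the following states. This is a minimal sketch, not a workflow engine; the `Proposal` class, the status names and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass
class Proposal:
    field_name: str
    value: str
    proposed_by: str
    status: Status = Status.PENDING

def review(proposal, reviewer_is_admin, accept):
    """Let an administrator accept or reject a pending proposal."""
    if not reviewer_is_admin:
        raise PermissionError("only administrators may review proposals")
    if proposal.status is not Status.PENDING:
        raise ValueError("proposal has already been reviewed")
    proposal.status = Status.ACCEPTED if accept else Status.REJECTED
    return proposal

def published_alternatives(proposals):
    """Only accepted proposals become visible as alternatives."""
    return [p.value for p in proposals if p.status is Status.ACCEPTED]

# Hypothetical usage
proposals = [
    Proposal("keywords", "Polizeyordnungen", proposed_by="scientist-1"),
    Proposal("keywords", "unrelated term", proposed_by="scientist-2"),
]
review(proposals[0], reviewer_is_admin=True, accept=True)
review(proposals[1], reviewer_is_admin=True, accept=False)
```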

THIRD PHASE

Transcriptions

  • The transcription of all pages is desirable. Perhaps an OCR method can be integrated, whose output has to be corrected manually.
  • Heidelberg will deliver several transcriptions of the texts, which can be integrated into ViRR. There, the scientists shall be able to rework them.
    ViRR shall support annotating external objects (like the transcriptions from Heidelberg) without storing the object directly in ViRR (redundant data storage is not desirable).
    This can only work if the users of ViRR have the privileges to change the external objects.
  1. Tool for the online creation of transcriptions. This tool needs the following functionalities:
    • Automatic display of "who has done what"
    • Workflow engine: every user should be able to create or update a transcription, but the changes will only be released once they are authorized by a legitimate person.
  2. For the search functionality, the following content should also be indexed (in addition to the bibliographic and descriptive metadata):
    • Transcriptions