ViRR Specification
In rework, please don't edit!
The initial content (the images) is available as TIFF, PDF and JPEG. The TIFFs (up to 50 MB per picture) are the original archive version, the PDFs are needed for the download and the JPEGs for the web presentation.
FIRST PHASE
Release one
Ingestion
- Technical ingestion of the scans (without a user interface)
- Automatic derivation of a first skeleton of the ToC (based on the signature of the book and the page numbers)
- Ingestion or manual submission of the bibliographic metadata (available in MAB)
- Automatic derivation of the basic keywords from the book titles as basis for an index
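The two automatic derivations above could be sketched as follows. This is only a minimal illustration: the function names, the record layout and the stopword list are assumptions made for this example and are not part of the specification.

```python
import re

# Hypothetical stopword list; a real one would have to be agreed with the institute.
STOPWORDS = {"der", "die", "das", "des", "und", "von", "the", "of", "and"}


def toc_skeleton(signature, page_count):
    """Return one flat ToC entry per scanned page, keyed by the book signature."""
    return [
        {"book": signature, "page": n, "label": f"Page {n}"}
        for n in range(1, page_count + 1)
    ]


def title_keywords(title):
    """Split a title into lowercase words, dropping stopwords and duplicates."""
    keywords = []
    for word in re.findall(r"\w+", title.lower()):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords
```

The skeleton is deliberately flat; chapter information is only added later during editing.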
Browsing
- Alphabetical (sorted by title) browsing tree as entrance to the collection's content
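As a rough sketch, an alphabetical browsing tree can be built by grouping titles under their first letter. The function name and the use of "#" for empty titles are assumptions for this example only.

```python
from collections import defaultdict


def alphabetical_tree(titles):
    """Group titles by their upper-cased first letter, sorted case-insensitively."""
    groups = defaultdict(list)
    for title in titles:
        letter = title[0].upper() if title else "#"
        groups[letter].append(title)
    return {
        letter: sorted(groups[letter], key=str.casefold)
        for letter in sorted(groups)
    }
```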
Display
- A thumbnail list with all pictures of one book
- Bibliographic metadata of book
- Each picture in a "detailed view"
Open Points
- Bibliographic Metadata
- A list of all MAB fields used by the institute is needed to prepare the mapping.
- Quality correction of the pictures
- The institute is not satisfied with the quality of the original TIFFs (e.g. the black frames of the scans should be removed) or with the derived JPEGs. The MPDL can only offer an automatic refinement of the presentation of the pictures (JPEGs), not of the original files. If a correction of the original files is needed, this has to be done manually by the institute.
Release two
Editing
- Enrichment of the already available ToC skeleton with information about chapters (e.g. pages 13-33 belong to chapter 1)
- Manual addition of metadata to the chapters (e.g. keywords)
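The enrichment step can be illustrated with a small sketch that attaches chapter titles to the pages of a flat ToC. The tuple layout `(title, first_page, last_page)` and the function name are assumptions for illustration.

```python
def assign_chapters(page_numbers, chapters):
    """Attach a chapter title to each page.

    chapters is a list of (title, first_page, last_page) tuples with
    inclusive page ranges; pages outside every range get chapter None.
    """
    toc = []
    for page in page_numbers:
        chapter = None
        for title, first, last in chapters:
            if first <= page <= last:
                chapter = title
                break
        toc.append({"page": page, "chapter": chapter})
    return toc
```

With the example from the list above, `assign_chapters(range(12, 35), [("Chapter 1", 13, 33)])` marks pages 13 to 33 as belonging to chapter 1 and leaves pages 12 and 34 unassigned.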
Open Points
- Descriptive Metadata
- A list of all metadata that should be marked up is needed from the institute.
Release three
Browsing
- Extension of the alphabetical browsing tree with the chapters of the books
- Chronological (sorted by the creation date) browsing tree as entrance to the collection's content
- Navigating functionalities within the thumbnail lists (paginator)
- Paging (leafing through the pages) within the "detailed view" of the pictures of one book
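A paginator for the thumbnail lists could work roughly as sketched below. The page size of 20 and the shape of the returned dictionary are assumptions, not requirements from this specification.

```python
import math


def paginate(items, page, per_page=20):
    """Return one page of items plus the data needed for previous/next links."""
    total_pages = max(1, math.ceil(len(items) / per_page))
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "previous": page - 1 if page > 1 else None,
        "next": page + 1 if page < total_pages else None,
    }
```

The same function can serve the paging in the "detailed view" by calling it with `per_page=1`.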
Display
- Integration of Digilib functionalities to navigate within one picture (in the "detailed view")
- Zooming in, zooming out, framing of a zoom area, switching back to see the whole picture
- Left, right, up and down navigation within a zoomed picture
- "View negative" functionality
- Dynamic creation of an "identification stamp" (proof of origin, German: Herkunftsnachweis) on all pictures in the "detailed view" (also after zooming)
- Metadata (bibliographic, descriptive, administrative/technical) of a book and a chapter
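The zoom and navigation functions listed above amount to arithmetic on a rectangular zoom area. The sketch below is generic and does not use the Digilib API; the class `ZoomArea` and its relative 0..1 coordinate convention are assumptions made for illustration.

```python
from dataclasses import dataclass


def _clamp(value, low, high):
    return max(low, min(value, high))


@dataclass(frozen=True)
class ZoomArea:
    """Visible part of a picture in relative coordinates (0.0 to 1.0)."""
    x: float = 0.0
    y: float = 0.0
    width: float = 1.0
    height: float = 1.0

    def zoom(self, factor):
        """factor > 1 zooms in, factor < 1 zooms out, keeping the centre."""
        width = _clamp(self.width / factor, 0.0, 1.0)
        height = _clamp(self.height / factor, 0.0, 1.0)
        centre_x = self.x + self.width / 2
        centre_y = self.y + self.height / 2
        return ZoomArea(
            _clamp(centre_x - width / 2, 0.0, 1.0 - width),
            _clamp(centre_y - height / 2, 0.0, 1.0 - height),
            width,
            height,
        )

    def move(self, dx, dy):
        """Shift by multiples of the visible width/height, staying inside the picture."""
        return ZoomArea(
            _clamp(self.x + dx * self.width, 0.0, 1.0 - self.width),
            _clamp(self.y + dy * self.height, 0.0, 1.0 - self.height),
            self.width,
            self.height,
        )

    def reset(self):
        """Switch back to the whole picture."""
        return ZoomArea()
```

Using relative coordinates keeps the zoom state independent of the pixel size of the delivered image.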
Search
- Simple Search
- One search field, which searches in every metadata field
- Advanced Search
- Search in specific metadata fields
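The difference between the two search modes can be sketched as follows, assuming each item's metadata is a flat dictionary of field names to values. The function names and the substring matching are illustrative assumptions; a real implementation would most likely rely on a search index.

```python
def simple_search(records, term):
    """Match the term against every metadata field (case-insensitive substring)."""
    needle = term.casefold()
    return [
        record for record in records
        if any(needle in str(value).casefold() for value in record.values())
    ]


def advanced_search(records, **criteria):
    """Match each given term only against its own metadata field."""
    def matches(record):
        return all(
            term.casefold() in str(record.get(field, "")).casefold()
            for field, term in criteria.items()
        )
    return [record for record in records if matches(record)]
```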
Open Points
- Browsing
- Shall the chronological browsing tree be sorted by the creation date of the books or the chapters within the books?
- Is a browsing via the names of the documents (law titles) needed?
- Search
- Are auto-suggest lists needed for specific metadata fields (e.g. author)?
Release four
Keywords
- Index
- An overall index of the collection in the form of a browsing list is needed (similar to the simple search).
- The following content should be indexed and visible in this list:
- Bibliographic metadata
- Descriptive metadata
- For the search functionality, also the following content should be indexed:
- Transcriptions
- Creation of a keyword list
- Currently, no standardized keyword list exists at the institute.
- The initial content of the keyword list will be created manually by the institute, based on the titles of the books and of the documents they contain.
- Assignment of keywords
- Keywords are part of the bibliographic and the descriptive metadata.
- Keywords should be based on an authority file.
- During the creation/editing of metadata it should be possible to use keywords from an authority file or to specify new keywords, which should then be added automatically to the keyword list.
- As metadata is only writable for administrators, keywords can only be specified by administrators. However, scientists will have the opportunity to make proposals via a workflow.
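The assignment rules above could look roughly like this in code. The class `KeywordList`, the `is_admin` flag and the case-insensitive matching are assumptions for illustration; the proposal workflow for scientists is not covered here.

```python
class KeywordList:
    """Keyword list that grows automatically when new keywords are assigned."""

    def __init__(self, keywords=()):
        # Map the case-folded form to the canonical spelling.
        self._keywords = {keyword.casefold(): keyword for keyword in keywords}

    def assign(self, record, keyword, is_admin):
        """Assign a keyword to a metadata record; only administrators may do so."""
        if not is_admin:
            raise PermissionError("only administrators may assign keywords")
        # Reuse the existing spelling, or add the new keyword to the list.
        canonical = self._keywords.setdefault(keyword.casefold(), keyword)
        assigned = record.setdefault("keywords", [])
        if canonical not in assigned:
            assigned.append(canonical)
        return canonical

    def all(self):
        """Return the keyword list in alphabetical order."""
        return sorted(self._keywords.values(), key=str.casefold)
```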
Open Questions:
- Should the keyword list serve as a browsing option?
- Search: Should suggestions be offered for the "keywords" field (after the first letters of a term have been entered)?
- Is a web interface needed for creating the keyword list?
- Are these freely chosen keywords or standardized subject headings (subject to certain rules)?
- Should there be two separate lists (one with the keywords and one with the index)?
Download / Export
Each picture, each book, each document (in each view) can be selected for a later export.
- Download of one or several single pages as jpegs
- Downloading of several selected pages of one book in one document (conversion from several jpegs to one pdf). This pdf should be automatically expanded with a cover page containing (identification) information like the origin of the pages, the usage rights, the URL and some bibliographical metadata.
- Downloading of a whole book as pdf (expanded with a cover page)
- Downloading of the METS document in XML
- Downloading of the transcriptions in xml and in pdf
- Export of only a part of the picture (after zooming)
The generated pdfs should not be too big, so a compression method is needed.
All downloads should also be printable (without downloading them).
Open Questions:
- Is downloading of the metadata of a book, collective title or document in (several) citation styles needed? If yes, which citation styles should be supported?
Persistent Identification
The following objects should be referenced:
- the book
- the ToC
- each page
Later on (after the creation of the METS data), we also need:
- each document (separate law, written by different authors)
- each chapter, paragraph, etc.
Open Points
SECOND PHASE
User Management (Visibility, privileges)
- User roles
- Unregistered user (full viewing rights)
- Account user (full viewing rights and the right to make proposals for the content of the descriptive metadata, keywords and synonyms)
- Local administrator (full viewing and editing rights, competence to accept or refuse the proposals of the account users)
- User management workspace (only visible and editable for local administrators)
- All information about the account users will be stored in the system: new users can be added, and existing users can be edited or deactivated (not deleted).
- User group creation will be added in a later stage (together with baskets).
- Bibliographical metadata should only be writable for administrators
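The three roles and their rights can be expressed as a simple permission table. The action names (`view`, `propose`, `edit`, `review_proposals`, `manage_users`) are assumptions chosen for this sketch; only the role descriptions themselves come from the list above.

```python
from enum import Enum


class Role(Enum):
    UNREGISTERED = "unregistered user"
    ACCOUNT = "account user"
    LOCAL_ADMIN = "local administrator"


# Hypothetical action names derived from the role descriptions.
PERMISSIONS = {
    Role.UNREGISTERED: {"view"},
    Role.ACCOUNT: {"view", "propose"},
    Role.LOCAL_ADMIN: {"view", "edit", "review_proposals", "manage_users"},
}


def is_allowed(role, action):
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS[role]
```

Further roles, if the open question below is answered with yes, would only need another entry in the table.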
Open Questions:
- Are further roles needed (e.g. librarian, scientist)? If so, with what privileges?
Annotations
- What should be annotated (only the transcriptions or something else)?
- What markup is needed for that annotation?
Synonyms
- Creation of a synonym list
- Currently, no standardized synonym list is available at the institute.
- The initial content of the synonym list will be created manually by the institute, based on the bibliographical and structural metadata.
- Assignment of synonyms
- During the creation/editing of metadata it should be possible to specify new synonyms which then should be automatically added to the synonym list.
- As metadata is only writable for administrators, synonyms can only be specified by administrators. But scientists will have the opportunity to make proposals via a workflow.
- If one synonym is not valid for the whole term of a metadata field, it should be possible to mark the part of the term (e.g. for "Die Polizeiordnungen des heiligen römischen Reiches" one synonym is "Polizeyordnungen").
- Scenarios based on synonyms
- Search: When searching for a particular term, items that use a synonym of this term in their metadata will also be found (e.g. the search for "Augsburg" also finds "Augspurg")
- To do: Further scenarios will be named by the institute.
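The search scenario can be sketched by expanding the search term to all of its synonyms before matching. The function names and the representation of the synonym list as groups of equivalent terms are assumptions; the sketch also assumes that no term appears in more than one group.

```python
def build_synonym_index(groups):
    """Map every term (case-folded) to the full set of its synonyms."""
    index = {}
    for group in groups:
        folded = {term.casefold() for term in group}
        for term in folded:
            index[term] = folded
    return index


def search_with_synonyms(records, term, index):
    """Find records whose metadata contains the term or one of its synonyms."""
    needle = term.casefold()
    variants = index.get(needle, {needle})
    return [
        record for record in records
        if any(
            variant in str(value).casefold()
            for value in record.values()
            for variant in variants
        )
    ]
```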
Open Questions:
- Which metadata does the synonym list apply to (e.g. author, title, place of publication)?
- Should all synonyms of a metadata value always be displayed (in the edit form as well as in the detailed view)?
- Is one synonym list sufficient? Should this list also be visible separately from the metadata?
- Search: Should suggestions be offered (after the first letters of a term have been entered)? This would mean, however, that each metadata field needs its own list (or should the suggestions only be shown in the simple search?).
- Is a web interface needed for creating the synonym list?
- Proposal: A keyword list is created that can be supplemented with synonyms and serves as the index.
Ingestion
- Batch methods for the ingestion of large, already available data streams (scans, metadata, transcriptions?)
- Individual creation and submission form for bibliographical metadata about the books (web interface)
- A tool for the transformation of the archive pictures (tiff) to formats appropriate for the internet (jpeg for the viewing tool, pdf for the export).
Editing
- Individual creation and submission form for descriptive metadata about the documents of the books (web interface)
- Descriptive metadata should only be writable for administrators. However, scientists should be able to propose content for the descriptive metadata. These proposals have to be checked by an administrator, who then decides whether they can be published as alternatives. A workflow engine is therefore needed, but not before the second phase.
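The proposal workflow could be modelled as a small state machine, sketched below. The class `Proposal`, the status names and the `reviewer_is_admin` flag are assumptions for illustration and do not describe a particular workflow engine.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    metadata_field: str
    value: str
    proposed_by: str
    status: str = "pending"  # pending -> accepted | refused


def review(proposal, reviewer_is_admin, accept):
    """Let an administrator accept or refuse a pending proposal."""
    if not reviewer_is_admin:
        raise PermissionError("only administrators may review proposals")
    if proposal.status != "pending":
        raise ValueError("proposal has already been reviewed")
    proposal.status = "accepted" if accept else "refused"
    return proposal


def published_alternatives(proposals, metadata_field):
    """Return the accepted proposals for one field, i.e. the published alternatives."""
    return [
        p.value for p in proposals
        if p.metadata_field == metadata_field and p.status == "accepted"
    ]
```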
THIRD PHASE
Transcriptions
- The transcription of all pages is desirable. Perhaps an OCR method can be integrated, the output of which would have to be corrected manually.
- Heidelberg will deliver several transcriptions of the texts, which can be integrated into ViRR, where the scientists shall be able to rework them.
- ViRR shall support annotating external objects (like the transcriptions from Heidelberg) without storing the object itself in ViRR (redundant data storage is not desirable).
- This can only work if the users of ViRR have the privileges to change the external objects.
- Tool for the online creation of transcriptions. This tool needs the following functionalities:
- Automatic display of "who has done what"
- Workflow engine: every user should be able to create or update a transcription, but the changes will only be released once they have been authorized by an entitled person.
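The two functionalities above can be combined in a revision history, sketched here under stated assumptions: the classes `Revision` and `Transcription` are hypothetical, and the check of who counts as entitled to release a revision is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Revision:
    author: str
    text: str
    created: datetime
    released_by: Optional[str] = None  # set once the revision is authorized


class Transcription:
    """Keeps every submitted revision; only released ones become visible."""

    def __init__(self):
        self.revisions = []

    def submit(self, author, text):
        revision = Revision(author, text, datetime.now(timezone.utc))
        self.revisions.append(revision)
        return revision

    def release(self, revision, authorizer):
        # Assumption: the caller has already verified the authorizer's rights.
        revision.released_by = authorizer

    def released_text(self):
        """Return the text of the newest released revision, or None."""
        for revision in reversed(self.revisions):
            if revision.released_by is not None:
                return revision.text
        return None

    def history(self):
        """Return "who has done what" as (author, state) pairs in submission order."""
        return [
            (r.author, "released" if r.released_by else "pending")
            for r in self.revisions
        ]
```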